Linux menu

Tuesday, September 23, 2014

Awk and Gawk In Linux

What is awk?


Awk is a pattern scanning and processing language designed for working with text files such as system dumps and log files. Its support for regular expressions and pattern matching makes it a very powerful language. The name Awk comes from the surnames of its original authors: Alfred V. Aho, Peter J. Weinberger and Brian W. Kernighan. Gawk (GNU Awk) is the GNU implementation of Awk.

Like many programming languages, Awk can handle variables, loops, conditional processing and arithmetic. Below are some simple examples of using awk to display various fields of information. This page is intended only as a quick introduction to awk.

Basic Examples of Awk


Contents of test file test.txt


one      two       three    four      five          six
John     246810    team01   UK        Birmingham    FTE
Paul     135790    team02   UK        Glasgow       FTE
Marcus   049583    team03   DE        Bremen        PTE
Foxy     903485    team01   UK        Aston         PTE

Print all Lines and fields in a file


Awk reads each line of your file or input and splits it into fields. By default, white space (spaces and tabs) is used to separate the fields. Each of these fields is then stored in a variable. To display the entire line we use the variable $0; for field one we would use $1, for field two $2, and so on.

john@sles01:~/testing> awk '{ print $0 }' test.txt
one      two       three    four      five          six
John     246810    team01   UK        Birmingham    FTE
Paul     135790    team02   UK        Glasgow       FTE
Marcus   049583    team03   DE        Bremen        PTE
Foxy     903485    team01   UK        Aston         PTE

Print field one



john@sles01:~/testing> awk '{ print $1 }' test.txt
one
John
Paul
Marcus
Foxy

Print field two



john@sles01:~/testing> awk '{ print $2 }' test.txt
two
246810
135790
049583
903485

Print fields one and three



john@sles01:~/testing> awk '{ print $1,$3 }' test.txt
one three
John team01
Paul team02
Marcus team03
Foxy team01
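
Because the first line of test.txt is a header row, it can be useful to skip it when printing fields. The following is a minimal sketch, assuming the file contents shown earlier. The sample_data function is a hypothetical helper used only to make the snippet self-contained, and NR is awk's built-in record counter:

```shell
# sample_data is a hypothetical helper that reproduces the test.txt contents.
sample_data() {
cat <<'EOF'
one      two       three    four      five          six
John     246810    team01   UK        Birmingham    FTE
Paul     135790    team02   UK        Glasgow       FTE
Marcus   049583    team03   DE        Bremen        PTE
Foxy     903485    team01   UK        Aston         PTE
EOF
}

# NR holds the current line number, so NR > 1 skips the header row.
names_and_teams=$(sample_data | awk 'NR > 1 { print $1, $3 }')
echo "$names_and_teams"
```

With a real file, the equivalent command would be: awk 'NR > 1 { print $1, $3 }' test.txt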

Print only lines containing a certain string


The following example prints only lines that contain the string "team01":


john@sles01:~/testing> awk '/team01/ { print $0}' test.txt
John     246810    team01   UK        Birmingham    FTE
Foxy     903485    team01   UK        Aston         PTE

The following example prints fields one, two and three of only those lines that contain the string "team01":


john@sles01:~/testing> awk '/team01/ { print $1,"-",$2,"-",$3}' test.txt
John - 246810 - team01
Foxy - 903485 - team01

The following examples use the "$" anchor to print only lines that end with the specified string:


john@sles01:~/testing> awk '/FTE$/ { print $0 }' test.txt
John     246810    team01   UK        Birmingham    FTE
Paul     135790    team02   UK        Glasgow       FTE

john@sles01:~/testing> awk '/PTE$/ { print $0 }' test.txt
Marcus   049583    team03   DE        Bremen        PTE
Foxy     903485    team01   UK        Aston         PTE
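
A pattern written as /team01/ is tested against the whole line, so it also matches when the string turns up in an unexpected field. The sketch below compares it with tests on field three alone. The two records are hypothetical and exist only to show the difference:

```shell
# Two hypothetical records: the second has "team01" only in field five.
records='John 246810 team01 UK Birmingham FTE
Tina 111111 team02 UK team01-annex FTE'

# /team01/ matches anywhere in the line, so both names are printed.
whole_line=$(printf '%s\n' "$records" | awk '/team01/ { print $1 }')

# $3 == "team01" compares field three only, so just John is printed.
field_only=$(printf '%s\n' "$records" | awk '$3 == "team01" { print $1 }')

# $3 ~ /^team/ applies a regular expression to field three only.
field_regex=$(printf '%s\n' "$records" | awk '$3 ~ /^team/ { print $1 }')

echo "$whole_line"
echo "$field_only"
echo "$field_regex"
```

The == form is an exact string comparison, while the ~ form keeps the flexibility of regular expressions but limits them to a single field.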

Field Separator


By default awk splits each input line into fields on white space. To change the field separator you can use the "-F" flag. A simple way to demonstrate this is to process the "/etc/passwd" file, whose fields are separated by colons (":").

For example, issue the command: awk -F: '{ print $1,"-",$6,"-",$7 }' /etc/passwd

We can now see that the user name, home directory and login shell are displayed:


john - /home/john - /bin/bash
johnny - /home/johnny - /bin/bash
oracle - /home/oracle - /bin/bash
oralint - /home/oralint - /bin/bash
lol - /home/lol - /bin/bash
test - /home/test - /bin/bash
testuser - /home/testuser - /bin/bash
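
The separator can also be set inside the program by assigning to awk's built-in FS variable in a BEGIN block. The sketch below shows both forms side by side. The passwd-style records are hypothetical stand-ins so that the snippet does not depend on the accounts present on a particular machine:

```shell
# Hypothetical passwd-style records; real entries live in /etc/passwd.
passwd_sample='john:x:1000:100:John:/home/john:/bin/bash
oracle:x:1001:100:Oracle:/home/oracle:/bin/bash'

# Separator given on the command line with -F.
with_flag=$(printf '%s\n' "$passwd_sample" | awk -F: '{ print $1, "-", $6, "-", $7 }')

# Same separator assigned to FS before any input is read.
with_begin=$(printf '%s\n' "$passwd_sample" | awk 'BEGIN { FS = ":" } { print $1, "-", $6, "-", $7 }')

echo "$with_begin"
```

Both commands produce the same output. Setting FS in a BEGIN block is mostly a convenience for awk script files, where there is no command line to carry the -F flag.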

Basic Arithmetic with awk


The examples below show very basic addition, subtraction, multiplication and division calculations:


john@sles01:~/testing> echo 10 2 | awk '{ print $1 + $2 }'
12

john@sles01:~/testing> echo 10 2 | awk '{ print $1 - $2 }'
8

john@sles01:~/testing> echo 10 2 | awk '{ print $1 * $2 }'
20

john@sles01:~/testing> echo 10 2 | awk '{ print $1 / $2 }'
5
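
The division above happens to give a whole number. As a short sketch of what to expect otherwise, the commands below divide 10 by 3 and show how printf, the % operator and the int() function can be used to control the result. The shell variable names are illustrative only:

```shell
# print uses awk's default number format, which keeps six significant digits.
default_division=$(echo 10 3 | awk '{ print $1 / $2 }')

# printf gives explicit control over the number of decimal places.
two_decimals=$(echo 10 3 | awk '{ printf("%.2f\n", $1 / $2) }')

# The % operator returns the remainder; int() truncates toward zero.
remainder=$(echo 10 3 | awk '{ print $1 % $2 }')
whole_part=$(echo 10 3 | awk '{ print int($1 / $2) }')

# prints: 3.33333 3.33 1 3
echo "$default_division $two_decimals $remainder $whole_part"
```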

Basic Loop Example


Awk supports several loop types; the most commonly used are "while", "do while" and "for". A simple example of a while loop can be found below:


john@sles01:~/testing> awk 'BEGIN {
    x = 1
    while (1) {
        print "Count = ", x
        if (x == 10)
            break
        x++
    }
}'

Count =  1
Count =  2
Count =  3
Count =  4
Count =  5
Count =  6
Count =  7
Count =  8
Count =  9
Count =  10
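
For comparison, here is a sketch of the same kind of counter written with the other two loop types, "for" and "do while". The upper limit of 3 is chosen only to keep the output short:

```shell
# for loop: initialisation, test and increment all sit in the loop header.
for_output=$(awk 'BEGIN { for (x = 1; x <= 3; x++) print "Count =", x }')

# do-while loop: the body always runs once before the test is evaluated.
do_output=$(awk 'BEGIN { x = 1; do { print "Count =", x; x++ } while (x <= 3) }')

echo "$for_output"
echo "$do_output"
```

Because the limit is part of the loop condition in both forms, no explicit break statement is needed.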

Awk Scripting


An awk program can have up to three parts, each of which is optional: a BEGIN block, which is executed once before any input is read; a main section, which is executed for every line of input; and an END block, which is executed after all of the input has been read.

A simple example of an awk script:



#!/usr/bin/awk -f
#
# Test awk script
#

BEGIN {
       print "--- I am a test awk file ---"
       count=0
      }

{
      if ($3 =="team01") {
         print "team01 members found: "$1,"-",$3
         count=count+1
      }
}

END {
     print "------------------------------"
     printf("\tTotal Number of Records Processed:\t%d\n", NR)
     printf("\tNumber of team01 members found :\t%d\n", count)
}

john@sles01:~/testing> ls -l awk01
-rwxr-xr-x 1 john users 416 May 28 15:02 awk01

john@sles01:~/testing> ./awk01 test.txt
--- I am a test awk file ---
team01 members found: John - team01
team01 members found: Foxy - team01
------------------------------
        Total Number of Records Processed:      5
        Number of team01 members found :        2

In the above script, the BEGIN block prints a banner and sets count to 0 ("count=0"). The main section then checks the third field, $3, of every line; if it is exactly equal to "team01", the record is printed and count is incremented by one. Once every line of the input file has been scanned, the END block prints a simple summary of the results: the variable count holds the number of matches and the built-in awk variable NR holds the number of records processed.
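
The "if" test in the main section can also be written as a pattern in front of the action, which is the more common awk idiom. The following is a minimal sketch rather than a replacement for the script above; sample_data is a hypothetical helper that reproduces the test.txt contents so the snippet is self-contained:

```shell
# sample_data is a hypothetical helper that reproduces the test.txt contents.
sample_data() {
cat <<'EOF'
one      two       three    four      five          six
John     246810    team01   UK        Birmingham    FTE
Paul     135790    team02   UK        Glasgow       FTE
Marcus   049583    team03   DE        Bremen        PTE
Foxy     903485    team01   UK        Aston         PTE
EOF
}

summary=$(sample_data | awk '
# The pattern before the braces replaces the explicit if statement.
$3 == "team01" { count++ }
# NR still holds the total number of records once all input is read.
END { printf("%d of %d records matched\n", count, NR) }
')

# prints: 2 of 5 records matched
echo "$summary"
```

Uninitialised awk variables start out as zero when used as numbers, so count does not need a BEGIN block in this short form.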
