A very quick introduction to AWK


Introduction

Of course a quick introduction shouldn't have an introduction....

What it can do

What it can do is:

  • Anything grep does
  • Anything wc does
  • Manipulate text tables of most types
  • Work in a pipe
  • Probably a lot of what sed can do, but just learn to use sed anyway.
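
To make the grep and wc claims concrete, here is a minimal sketch. The sample text and the shell variable names are made up for this illustration; NR and NF are built-in AWK variables holding the record count so far and the field count of the current record.

```shell
# Hypothetical sample input, used only for this illustration.
sample='alpha one
beta two
alpha three'

# Like "grep alpha": a pattern without a code block prints the matching records.
matches=$(printf '%s\n' "$sample" | awk '/alpha/')

# Like "wc -l": at END, NR holds the number of records that were read.
lines=$(printf '%s\n' "$sample" | awk 'END { print NR }')

# Like "wc -w": add up NF, the number of fields in each record.
words=$(printf '%s\n' "$sample" | awk '{ total += NF } END { print total }')

printf '%s\n' "$matches" "$lines" "$words"
```

This should print the two alpha lines, then 3, then 6.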

Syntax

Code is set up in if-match-then code blocks: a code block has a pattern in front of it and runs when that pattern matches. A block without a pattern runs for every record. The most common patterns are BEGIN, END and /regular expression/. None of these blocks is mandatory; all three are shown below.

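BEGIN{
  #code here, this is a comment
}
/[314]*/{
  #runs for each matching record; note that this pattern also
  #matches the empty string, so every record matches it
}
END{
  #runs once, after all input has been read
}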

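The BEGIN block is executed once, before any input is read, and the END block once, after all input has been processed. All other patterns are matched against each record in turn, and a block runs for every record its pattern matches.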
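
A small sketch of that order of execution follows. The three input records and the pattern /[13]/ are invented for the example.

```shell
out=$(printf 'a1\nb2\nc3\n' | awk '
BEGIN   { print "start" }          # runs once, before any input is read
/[13]/  { print "matched: " $0 }   # runs for each record containing a 1 or a 3
END     { print "done" }           # runs once, after the last record
')
printf '%s\n' "$out"
```

The output should be "start", then "matched: a1" and "matched: c3", then "done"; the record b2 matches no pattern and is silently skipped.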

Records are collected from the input by splitting it into chunks at the record separator RS. Each record is then split into fields at the field separator FS. So the input data is scanned as "fieldFSfieldFSfieldRS".

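FS and RS are both variables and can be set using the "=" operator. They default to FS=" " and RS="\n". The default FS=" " is special: fields are separated by any run of spaces, tabs or newlines, and leading whitespace is ignored. Both have counterparts for output: OFS (default " ") and ORS (default "\n").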
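
As a rough illustration of the four separators, the sketch below uses made-up input and made-up separator values. OFS is placed between values that are separated by commas in a print statement, and ORS is written after every print.

```shell
# FS splits the input fields; OFS and ORS shape the output.
out=$(printf 'root:x:0\ndaemon:x:1\n' | awk '
BEGIN { FS = ":"; OFS = " -> "; ORS = ";\n" }
{ print $1, $3 }
')
printf '%s\n' "$out"

# RS splits the input into records; here records end at ";" instead of newline.
out2=$(printf 'a b;c d' | awk 'BEGIN { RS = ";" } { print $2 }')
printf '%s\n' "$out2"
```

The first command should print "root -> 0;" and "daemon -> 1;", the second "b" and "d".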

Once the record has been split up (using FS), its parts are available much like positional parameters in Bash: $0 is the whole record, $1 the first field, $2 the second, etc. The variable NF holds the number of fields in the record.
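
A one-line sketch of the field variables, with an invented input record:

```shell
# Print the whole record, the first field, the third field and the field count.
out=$(echo 'one two three' | awk '{ print $0; print $1; print $3; print NF }')
printf '%s\n' "$out"
```

This should print "one two three", "one", "three" and 3, each on its own line.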

Normal variables have no type and no distinct starting character. They need no declaration and start out empty, which also counts as zero. End statements with a semicolon (";") and/or a newline.
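
The sketch below shows what "no type" means in practice. The variable names n and missing are hypothetical; the same value is used as a number or as a string depending on the operation.

```shell
out=$(awk 'BEGIN {
  n = "3"                  # assigned as a string
  print n + 4              # used as a number
  print n "4"              # used as a string: concatenation
  print missing + 0        # never assigned: counts as zero
  print "[" missing "]"    # never assigned: also the empty string
}')
printf '%s\n' "$out"
```

The expected output is 7, 34, 0 and [], each on its own line.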

Most common functions

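Function                                        Description
print RecordToPrint                             Prints RecordToPrint followed by ORS; prints $0 when no argument is given.
next                                            Jumps to the next record instead of trying the remaining patterns.
system(commandToExecute)                        Executes commandToExecute and returns its exit status.
rand()                                          Generates a random number, zero or more, lower than 1.
gsub(regularExpression, replacement, haystack)  Substitutes every match of regularExpression in haystack with replacement; returns the number of substitutions.
length(value)                                   Returns the length of value.
tolower(value)                                  Returns the lowercase version of value.
toupper(value)                                  Returns the uppercase version of value.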
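
A short sketch of the string functions in action, using an invented input line. Note that gsub changes its third argument in place and returns a count, not the new string.

```shell
out=$(echo 'Hello World' | awk '{
  print length($0)           # number of characters in the record
  print toupper($1)          # first field in uppercase
  print tolower($2)          # second field in lowercase
  n = gsub(/o/, "0", $0)     # replace every "o" in the record, keep the count
  print n, $0                # the count and the changed record
}')
printf '%s\n' "$out"
```

This should print 11, HELLO, world and "2 Hell0 W0rld".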

Execution

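Use one of the following (or look at the manual once):

awk -f program-file
awk -f program-file -- file names to process
awk -- 'program-text' file names to process

Without file names, awk reads its input from stdin. Put the program text in single quotes, so the shell does not expand $1 and friends before awk sees them.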
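
The three forms can be tried side by side with the sketch below. The file names data.txt and sum.awk are hypothetical and are created in a temporary directory just for this demonstration.

```shell
# Hypothetical file names, created in a temporary directory for this demo only.
dir=$(mktemp -d)
printf 'x 1\ny 2\n' > "$dir/data.txt"
printf '{ sum += $2 } END { print sum }\n' > "$dir/sum.awk"

# Program from a file, input from stdin.
from_stdin=$(awk -f "$dir/sum.awk" < "$dir/data.txt")

# Program from a file, input from a named file.
from_file=$(awk -f "$dir/sum.awk" -- "$dir/data.txt")

# Program given inline, input from a named file.
inline=$(awk -- '{ sum += $2 } END { print sum }' "$dir/data.txt")

rm -r "$dir"
printf '%s\n' "$from_stdin" "$from_file" "$inline"
```

All three invocations run the same program on the same data, so each should print 3.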

Examples

This is a very simple user list to very simple HTML conversion in AWK. Execute it using: awk -f thefile.awk -- /etc/passwd > temp.html

BEGIN{
  FS = ":"
  ORS= "<br />\n"
  print "<html><body>"
}
//{
  print "User <b>" $1 "</b>"
  print "lives at " $6
  user += 1
}
END{
  print "";
  print "A total of " user " users on the system."
  print "</body></html>"
}

Line by line, the following is stated above:

  1. In the beginning (before any input is read)...
  2.   Split fields using the colon character (":")
  3.   When outputting records, the output record separator should be "<br />\n"
  4.   Now print the start of the HTML page to stdout
  5. Close the block
  6. For every record (all records will match an empty expression)
  7.   Print "User <b>", the first field of the record (the text before the first FS) and "</b>", followed by ORS
  8.   Print "lives at " followed by the 6th field of the record and ORS
  9.   Increment the user variable by one. It defaults to zero/empty.
  10. Close the block
  11. At the end of all input (at the END)
  12.   Print an empty record (which puts "" + ORS on the output)
  13.   Print "A total of ", the user count and " users on the system." joined into one string, followed by ORS. The spaces are part of the quoted strings; OFS only appears between values separated by commas.
  14.   Print the closing tags of the HTML page, followed by ORS
  15. Close the block
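
The difference between joining values with spaces and separating them with commas is easy to get wrong, so here is a minimal sketch. The OFS and ORS values are chosen only to make the separators visible.

```shell
out=$(awk 'BEGIN {
  OFS = "-"; ORS = "!\n"
  user = 3
  print "A total of " user " users"   # concatenation: one value, no OFS
  print "A total of", user, "users"   # commas: OFS between the values
}')
printf '%s\n' "$out"
```

The first print should give "A total of 3 users!" and the second "A total of-3-users!".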

Here is a quick line to get all /var/log/messages type log lines from a disk image and show how many lines were not matched:
(kept the regular expression large to keep it readable)

strings diskimage.img | awk -- '/^[A-Z][a-z][a-z] [0-9 ][0-9 ] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/{print; next}//{unmatched++}END{print unmatched + 0 " lines unmatched."}'
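
To see the "print; next" idiom work without a disk image at hand, the sketch below feeds four invented lines through the same regular expression. It uses a block without a pattern in place of the empty expression, and "unmatched + 0" so that a zero count prints as 0 and not as an empty string; both are choices made for this sketch.

```shell
# Hypothetical input: two syslog-style lines and two lines of noise.
log='Jul  2 10:15:01 host sshd: ok
garbage line
Jan 12 23:59:59 host cron: run
more junk'

out=$(printf '%s\n' "$log" | awk '
/^[A-Z][a-z][a-z] [0-9 ][0-9 ] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ { print; next }
{ unmatched++ }    # only reached when the block above did not call next
END { print unmatched + 0 " lines unmatched." }
')
printf '%s\n' "$out"
```

This should print the two syslog-style lines followed by "2 lines unmatched.".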

For more examples, read the manual (man awk).

See also

This is such a quick and very dirty introduction to AWK that if you really want to use it a lot, you should read more on it.
Here are some of the online resources:

Last update: July 02 2008
Copyright © Infosnel.nl