AWK
AWK is a powerful text processing language for pattern scanning and data extraction. It is essential for processing text files, extracting columns, and performing calculations on text data.
Getting Started
Introduction to AWK and basic syntax fundamentals
What is AWK
Understanding AWK and its use cases for text processing
AWK overview and capabilities
AWK is a powerful tool for text processing, data extraction, and pattern matching across files and streams.
# AWK is a text processing language designed for pattern scanning
# and manipulation of data. Key features:
# - Process text files line by line
# - Extract and manipulate columns
# - Perform calculations on data
# - Filter records based on patterns
# - Generate reports and formatted output
# AWK stands for: Aho, Weinberger, Kernighan (creators)
# Variants: awk (original), gawk (GNU AWK), mawk (minimal AWK)
which awk && awk --version
/usr/bin/awk
GNU Awk 5.1.0, API: 3.1
- AWK reads input line by line automatically
- Supports regular expressions for pattern matching
- Built-in variables track lines, fields, and more
- Can be used interactively or in shell scripts
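To make these points concrete, a first runnable one-liner using nothing beyond standard awk:

```shell
# Number each input line: NR is the current record (line) number
printf 'alpha\nbeta\n' | awk '{ print NR, $0 }'
# 1 alpha
# 2 beta
```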
AWK vs other text processing tools
AWK excels at processing structured data and performing transformations that combine pattern matching with data manipulation.
# Compare AWK with other text processing tools:
# grep: Search file patterns (matches lines)
# sed: Stream editor (find and replace, transformations)
# awk: Full programming language (matching + calculations)
# AWK is best when you need to:
# - Extract or reorganize columns
# - Perform calculations on text data
# - Process structured text (CSV, logs, reports)
# - Apply conditional logic and complex transformations
echo -e "name,age\nJohn,30\nJane,25" | awk -F',' '{print $1 " is " $2 " years old"}'
name is age years old
John is 30 years old
Jane is 25 years old
- AWK is a Turing-complete programming language
- More powerful than grep or sed for complex text processing
- Easier syntax than writing shell scripts for text processing
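As a sketch of that comparison, the same filtering task in all three tools; grep and sed stop at selecting lines, while awk can also act on fields:

```shell
data='ok 1
err 2
ok 3'
echo "$data" | grep 'err'                      # select lines:  err 2
echo "$data" | sed -n '/err/p'                 # same with sed: err 2
echo "$data" | awk '/err/ { print $2 * 10 }'   # select AND compute: 20
```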
Basic Syntax and Structure
Understanding AWK program structure and execution flow
AWK program structure
Shows the three parts of an AWK program - BEGIN block (runs once before input), main block (runs for each line), and END block (runs once after all input).
# Basic AWK syntax structure:
# awk 'pattern { action }' input-file
# Program structure with BEGIN, pattern/action, and END:
awk '
BEGIN {
    # Initialization, runs before processing input
    print "Starting processing..."
}
pattern {
    # Main processing, runs for each line
    # pattern can be regex, expression, or range
}
END {
    # Finalization, runs after all input processed
    print "Processing complete"
}' input-file
echo -e "apple\nbanana\ncherry" | awk 'BEGIN { print "Fruits:" } { print "- " $0 } END { print "Done" }'
Fruits:
- apple
- banana
- cherry
Done
- BEGIN and END blocks are optional
- Multiple patterns can match the same line
- Pattern can be omitted (all lines match)
- Action can be omitted (default is print)
Inline AWK programs and scripts
Different ways to invoke AWK - inline, from file, with arguments, and from pipes.
# Method 1: Inline program with single quotes
awk '{ print NR, $0 }' file.txt

# Method 2: Program from file
awk -f script.awk file.txt

# Method 3: Multiple input files
awk '{ print FILENAME, NR, $0 }' file1.txt file2.txt

# Method 4: Program with variables
awk -v var=value '{ print var, $0 }' file.txt

# Method 5: Pipe input directly
echo "data" | awk '{ print toupper($0) }'
echo -e "line1\nline2" | awk '{ print NR, $0 }'
1 line1
2 line2
- Single quotes protect AWK syntax from shell interpretation
- -f flag reads program from file
- -v flag passes variables to AWK
- AWK can read from multiple files sequentially
Installation and Setup
Installing AWK and verifying functionality
Install AWK on Linux systems
Installation of GNU AWK (gawk) on various Linux distributions and macOS.
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install -y gawk

# CentOS/RHEL
sudo yum install -y gawk

# macOS
brew install gawk

# Verify installation
awk --version
gawk --version | head -1
GNU Awk 5.1.0, API: 3.1 (GNU MPFR 4.1.0, GNU MP 6.2.1)
- Most systems have awk or gawk installed by default
- gawk (GNU AWK) is the most feature-complete implementation
- mawk is faster but supports fewer features
- Compatibility: gawk includes all mawk features
Creating AWK script file
Creating and using AWK script files for more complex programs.
# Create a simple AWK script file
cat > process.awk << 'EOF'
BEGIN {
    print "Processing file..."
}
{
    total += $1
    count++
}
END {
    print "Total: " total
    print "Average: " (count > 0 ? total / count : 0)
}
EOF

# Run the script
awk -f process.awk numbers.txt

# Make script executable (requires shebang)
echo '#!/usr/bin/awk -f' | cat - process.awk > /tmp/process
chmod +x /tmp/process
/tmp/process numbers.txt
cat process.awk
BEGIN {
    print "Processing file..."
}
{
    total += $1
    count++
}
END {
    print "Total: " total
    print "Average: " (count > 0 ? total / count : 0)
}
- Use -f flag to load program from file
- Script files can be made executable with proper shebang
- Shebang line: #!/usr/bin/awk -f
- Scripts are useful for complex logic and reusability
Field Processing
Working with fields, separators, and field manipulation
Field Variables and NF
Understanding field access and manipulation with field variables
Accessing fields in AWK
Field variables allow accessing individual fields by number, with $NF accessing the last field dynamically.
# Field variables reference:
# $0 = entire line
# $1 = first field
# $2 = second field
# ... $NF = last field
# NF = total number of fields in the line
# Examples:
echo "John Doe 30 Engineer" | awk '{ print $1 }'       # John
echo "John Doe 30 Engineer" | awk '{ print $NF }'      # Engineer
echo "John Doe 30 Engineer" | awk '{ print $(NF-1) }'  # 30
echo "John Doe 30 Engineer" | awk '{ print NF }'       # 4
echo "apple banana cherry" | awk '{ print "Last field:", $NF; print "Field count:", NF }'
Last field: cherry
Field count: 3
- Fields are separated by whitespace by default (FS)
- $0 is the entire line
- Use $(NF-1) for second-to-last field, $(NF-2) for third-to-last, etc.
- NF changes if you modify fields
Modifying fields
Modifying, adding, and removing fields by direct assignment or changing NF value.
# Change field values
echo "apple 5 basket" | awk '{ $2 = 10; print }'        # apple 10 basket

# Add fields
echo "apple basket" | awk '{ $(NF+1) = "10"; print }'   # apple basket 10

# Reconstruct line with modified fields
echo "a b c" | awk '{ $2 = "B"; print $0 }'             # a B c

# Remove fields (truncate)
echo "a b c d e" | awk '{ NF = 3; print }'              # a b c
echo "test 100 active" | awk '{ $2 = 200; NF = 2; print }'
test 200
- Modifying a field updates $0 automatically
- Increasing NF adds empty fields
- Decreasing NF removes fields from the end
- Output uses OFS to join modified fields
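The last point deserves a demonstration: OFS only takes effect when awk rebuilds $0, which the `$1 = $1` idiom forces:

```shell
# Unmodified line: printed as-is, OFS ignored
echo "a b c" | awk -v OFS=',' '{ print }'           # a b c
# Assigning any field rebuilds $0 using OFS
echo "a b c" | awk -v OFS=',' '{ $1 = $1; print }'  # a,b,c
```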
Field Separators
Setting and using different field separator patterns
Setting field separators
Custom field separators allow parsing different delimited formats like CSV, TSV, and colon-separated data.
# FS (Field Separator) - what separates fields in input
# OFS (Output Field Separator) - what separates fields in output

# Default FS is whitespace. Set custom FS with -F flag:
echo "apple,banana,cherry" | awk -F',' '{ print $2 }'   # banana

# Or use BEGIN to set FS
echo "apple:banana:cherry" | awk 'BEGIN { FS=":" } { print $2 }'   # banana

# Set both FS and OFS
echo "apple-banana-cherry" | awk -F'-' -v OFS=',' '{ print $1, $2, $3 }'
# apple,banana,cherry

# Use regex as FS (multiple delimiters)
echo "apple,banana;cherry" | awk -F'[,;]' '{ print $2 }'   # banana
echo "name:age:city" | awk -F':' '{ print "Name=" $1 ", Age=" $2 }'
Name=name, Age=age
- Default FS matches one or more whitespace characters
- FS can be a single character or regex pattern
- -F flag overrides default FS
- BEGIN block can set FS for variable input
Processing CSV and complex formats
Different separators for input parsing and output formatting with FS and OFS.
# Process CSV with quoted fields
# (note: a plain -F',' mis-splits fields that contain commas inside quotes;
# this only cleans up the quote characters)
echo 'name,age,city
"John Smith",30,"New York"
"Jane Doe",25,"Los Angeles"' | awk -F',' 'NR > 1 {
    gsub(/"/, "", $1)    # Remove quotes
    gsub(/ +$/, "", $1)  # Remove trailing spaces
    print $1 " is " $2 " years old"
}'

# Tab-separated values
echo -e "col1\tcol2\tcol3" | awk -F'\t' '{ print $2 }'

# Multi-character delimiter
echo "apple::banana::cherry" | awk -F'::' '{ print $1 }'
echo "a:b:c" | awk -F':' -v OFS='-' '{ print $1, $2, $3 }'
a-b-c
- The -v OFS flag sets the output field separator
- FS regex can handle multiple delimiter types
- gsub useful for cleaning CSV quote characters
- Use -F'\t' for tab-delimited files
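For CSV with commas inside quoted fields, a plain -F',' mis-splits the line. GNU AWK's FPAT variable (gawk-only, not in mawk or POSIX awk) defines what a field *is* rather than what separates fields; a sketch using the pattern from the gawk manual:

```shell
# gawk-only: each field is either non-comma text or a quoted string
echo '"Smith, John",30,"New York"' |
gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" } { print $1 }'
# "Smith, John"
```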
Field Manipulation and Reconstruction
Transform and reorganize field data
Rearranging and extracting fields
Extracting and rearranging fields to create custom output formats.
# Extract and rearrange columns
echo "John Doe john@example.com 30" | awk '{ print $3, $2, $1 }'
# john@example.com Doe John

# Extract specific fields
echo "apple 100 USD red" | awk '{ print "Item: " $1 ", Price: " $2 " " $3 }'
# Item: apple, Price: 100 USD

# Concatenate fields
echo "John Doe" | awk '{ full_name = $1 " " $2; print "Hello, " full_name }'
# Hello, John Doe

# Extract substring from field
echo "example@domain.com" | awk -F'@' '{ print "User: " $1 }'
# User: example
echo "2025-02-28" | awk -F'-' '{ print "Year: " $1 ", Month: " $2 ", Day: " $3 }'
Year: 2025, Month: 02, Day: 28
- Fields can be rearranged in any order
- Easy to combine fields into new strings
- Using FS and field numbers simplifies parsing
Computing values from fields
Performing calculations and transformations on field values.
# Sum and average fields
echo -e "100\n200\n300" | awk '{ sum += $1 } END { print "Total: " sum ", Avg: " sum/NR }'
# Total: 600, Avg: 200

# Product of fields
echo "5 4 3" | awk '{ print "Product: " $1 * $2 * $3 }'
# Product: 60

# Calculate percentage
echo "75 100" | awk '{ print ($1 / $2 * 100) "%" }'
# 75%

# String operations on fields
echo "javascript PYTHON bash" | awk '{ print toupper($1), tolower($2), tolower($3) }'
# JAVASCRIPT python bash
echo "10 20" | awk '{ print "Sum: " $1 + $2 ", Diff: " $1 - $2 ", Prod: " $1 * $2 }'
Sum: 30, Diff: -10, Prod: 200
- AWK automatically converts fields to numbers when needed
- Can perform arithmetic directly on fields
- String functions can transform field values
Patterns and Matching
Pattern matching and regular expressions in AWK
Pattern Types and Matching
Different pattern types for matching records
AWK pattern types
Different pattern types allow flexible filtering and matching of records.
# Pattern types in AWK:
# 1. Regex patterns
awk '/pattern/ { action }' file    # Match lines with pattern
awk '!/pattern/ { action }' file   # Match lines without pattern

# 2. Expression patterns (boolean)
awk '$1 > 100 { action }' file         # Field comparison
awk 'NR >= 2 && NR <= 5' file          # Line number range
awk '$1 == "active" { action }' file   # Exact match

# 3. BEGIN/END patterns
awk 'BEGIN { action }'   # Before processing input
awk 'END { action }'     # After processing input

# 4. Range patterns
awk '/start/,/end/ { action }' file   # From start to end pattern

# 5. No pattern
awk '{ action }' file   # All lines match
echo -e "apple 50\nbanana 120\ncherry 75" | awk '$2 > 100 { print }'
banana 120
- Patterns determine which lines trigger the action block
- Multiple patterns can target different lines
- Patterns are optional (all lines match by default)
- Expression patterns use standard operators
Regex patterns and matching
Regex patterns filter lines based on pattern matching and operator ~.
# Regex pattern matching
awk '/linux/' file     # Contains "linux"
awk '/^linux/' file    # Starts with "linux"
awk '/linux$/' file    # Ends with "linux"
awk '/[0-9]+/' file    # Contains digits
awk '!/error/' file    # Does not contain "error"

# Regex with field matching (~)
awk '$1 ~ /john/' file    # First field matches regex
awk '$2 !~ /test/' file   # Second field doesn't match

# Case-insensitive matching
awk 'tolower($0) ~ /error/' file   # Case-insensitive search
echo -e "error: issue\nwarning: notice\nerror: problem" | awk '/error/'
error: issue
error: problem
- ~ operator checks if field matches regex
- !~ operator checks if field doesn't match
- Use tolower() for case-insensitive matching
- ^ and $ anchor patterns to start and end
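Anchors matter most when validating a whole field; without them the pattern matches anywhere inside the field:

```shell
# Unanchored: "1234" also contains three consecutive digits, so it matches
printf '123\n1234\nabc\n' | awk '$1 ~ /[0-9][0-9][0-9]/'    # 123 and 1234
# Anchored: the field must be exactly three digits
printf '123\n1234\nabc\n' | awk '$1 ~ /^[0-9][0-9][0-9]$/'  # 123 only
```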
BEGIN and END Blocks
Initialization and finalization with BEGIN and END
BEGIN and END blocks
BEGIN runs before processing input and END runs after, useful for setup, cleanup, and reporting.
# BEGIN block - runs once before processing input
awk 'BEGIN {
    print "Processing started..."
    count = 0
}
{
    count++
}
END {
    print "Total lines: " count
}' file

# Example: Sum numbers and show statistics
awk 'BEGIN { sum = 0; count = 0 }
NR > 1 { sum += $1; count++ }
END {
    avg = (count > 0) ? sum / count : 0
    print "Sum: " sum "\nAverage: " avg
}' data.txt
echo -e "10\n20\n30" | awk 'BEGIN { print "Numbers:" } { print "- " $0 } END { print "Done" }'
Numbers:
- 10
- 20
- 30
Done
- BEGIN block runs even if no input is provided
- END block runs even if no lines match patterns
- Variables initialized in BEGIN persist through all input
Header and summary generation
Using BEGIN and END to create formatted reports with headers, footers, and summary statistics.
# Generate report with header and footer
awk 'BEGIN {
    printf "%-15s %-10s %-10s\n", "Name", "Age", "City"
    printf "%-15s %-10s %-10s\n", "----", "---", "----"
}
{
    printf "%-15s %-10s %-10s\n", $1, $2, $3
}
END {
    print ""
    print "Total records: " NR
}' people.txt

# Calculate running statistics
awk 'BEGIN {
    header = "Line\tValue\tRunSum"
    print header
}
{
    sum += $1
    printf "%d\t%d\t%d\n", NR, $1, sum
}
END {
    print "Final sum: " sum
}' numbers.txt
echo -e "5\n10\n15" | awk 'BEGIN { print "Input values:" } { sum += $1; print NR, $1 } END { print "Total: " sum }'
Input values:
1 5
2 10
3 15
Total: 30
- printf in BEGIN/END creates formatted output
- Variables maintain state across BEGIN, main, and END
- Good pattern for generating summaries and reports
Range and Compound Patterns
Using range patterns and combining conditions
Range patterns
Range patterns select a block of lines from first matching pattern to second matching pattern.
# Range patterns match from first pattern to second pattern
# Syntax: /start/,/end/ { action }

# Print lines between BEGIN and END markers
awk '/BEGIN/,/END/' file

# Process sections of a file
awk '/^Chapter 1/,/^Chapter 2/' book.txt

# Extract blocks between identical markers: a plain range /X/,/X/ collapses
# to single lines (the end test matches the same line the range opens),
# so use a toggle flag instead
awk '/^---$/ { f = !f; next } f' doc.md

# Line number ranges
awk 'NR==10, NR==20 { print NR, $0 }' file
echo -e "start\ndata1\ndata2\nend\nignore" | awk '/start/,/end/ { print }'
start
data1
data2
end
- Range state toggles when start pattern matches
- Range stays true until end pattern matches
- Both start and end lines are included
Compound conditions
Compound conditions combine multiple patterns with logical operators for complex filtering.
# Logical AND (&&) - both conditions must be true
awk '$1 == "error" && $2 > 100 { print }' logs.txt

# Logical OR (||) - either condition can be true
awk '$1 == "error" || $1 == "warning" { print }' logs.txt

# Negation (!) - condition must be false
awk '!($1 == "info") { print }' logs.txt

# Complex expressions
awk '($1 ~ /^ERR/ || $1 ~ /^FAIL/) && NR > 100 { print }' logs.txt

# Multiple conditions on fields
awk '$2 >= 500 && $2 <= 1000 && $3 == "active"' data.txt
echo -e "error 150\nerror 50\nwarning 200" | awk '$1 == "error" && $2 > 100 { print }'
error 150
- && has higher precedence than ||
- Use parentheses for clarity in complex expressions
- Patterns are evaluated left to right
Variables and Operators
Built-in variables, operators, and expressions
Built-in Variables
Understanding AWK's built-in variables
Built-in variables reference
Built-in variables track line numbers, field counts, separators, and filenames.
# NR - Total record number (line number across all files)
awk '{ print NR, $0 }' file

# FNR - File record number (line number within current file)
awk '{ print FNR, FILENAME, $0 }' file1 file2

# NF - Number of fields in current record
awk '{ print "Fields: " NF, "Last field: " $NF }' file

# FS - Field separator (input)
awk 'BEGIN { FS=":" } { print $1 }'

# OFS - Output field separator
awk -v OFS="-" '{ print $1, $2, $3 }' file

# RS - Record separator (usually newline)
awk 'BEGIN { RS=";" } { print NR, $0 }'

# ORS - Output record separator
awk 'BEGIN { ORS=";" } { print $0 }'

# FILENAME - Current filename being processed
awk '{ print FILENAME ":" NR ":" $0 }' file1 file2

# ARGC/ARGV - Command line argument count and values
awk 'BEGIN { print "Args: " ARGC; for (i=0; i<ARGC; i++) print ARGV[i] }'
echo -e "a b c\nd e f g" | awk '{ print "NR=" NR " NF=" NF " Last=" $NF }'
NR=1 NF=3 Last=c
NR=2 NF=4 Last=g
- NR increments with each line across all files
- FNR resets for each new file
- NF changes if you modify fields
- FILENAME shows which file is being processed
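NR and FNR together give the classic two-file idiom: `FNR == NR` is true only while the first file is being read, so it can load a lookup table before the second file is processed. The /tmp paths below are just for the demo:

```shell
# Build a lookup from the first file, apply it to the second (demo files)
printf '1 apple\n2 banana\n' > /tmp/names.txt
printf '2\n1\n' > /tmp/ids.txt
awk 'FNR == NR { name[$1] = $2; next }   # first file: remember id -> name
     { print $1, name[$1] }              # second file: resolve ids
' /tmp/names.txt /tmp/ids.txt
# 2 banana
# 1 apple
```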
Additional built-in variables
Additional built-in variables for environment access, arguments, and match results.
# ENVIRON - Environment variables
awk 'BEGIN { print "User: " ENVIRON["USER"] }'
awk 'BEGIN { print "Home: " ENVIRON["HOME"] }'

# ARGC/ARGV - Arguments
awk 'BEGIN { print ARGC; for (i in ARGV) print i, ARGV[i] }' file1 file2

# SUBSEP - Subscript separator for arrays (set it before creating keys;
# existing keys keep the old separator)
awk 'BEGIN { SUBSEP = "-"; a[1,2] = "val"; for (i in a) print i }'   # 1-2

# RSTART/RLENGTH - From match() function
awk 'BEGIN {
    str = "hello world"
    match(str, /world/)
    print "Start: " RSTART " Length: " RLENGTH
}'
awk 'BEGIN { print "Arguments: " ARGC }' file1 file2
Arguments: 3
- ENVIRON allows accessing environment variables
- ARGC is argument count (includes program name)
- ARGV[0] is awk command, ARGV[1] is first file, etc.
Operators and Expressions
Arithmetic, comparison, logical, and string operators
Operators reference
Complete set of AWK operators for arithmetic, comparison, logical, and string operations.
# ARITHMETIC OPERATORS
# +  (addition)        x + y
# -  (subtraction)     x - y
# *  (multiplication)  x * y
# /  (division)        x / y
# %  (modulo)          x % y
# ^  (exponentiation)  x ^ y

# COMPARISON OPERATORS
# <   less than
# <=  less than or equal
# >   greater than
# >=  greater than or equal
# ==  equal
# !=  not equal

# LOGICAL OPERATORS
# &&  logical AND
# ||  logical OR
# !   logical NOT

# STRING OPERATORS
# (space)  concatenation: "hello" " " "world"
# ~   regex match:     $0 ~ /pattern/
# !~  regex not match: $0 !~ /pattern/

# ASSIGNMENT OPERATORS
# =   assignment
# +=  add and assign
# -=  subtract and assign
# *=  multiply and assign
# /=  divide and assign
# %=  modulo and assign
# ^=  exponentiate and assign
# ++  increment
# --  decrement
echo "10 3" | awk '{ print $1 + $2, $1 - $2, $1 * $2, $1 / $2, $1 % $2, $1 ^ $2 }'
13 7 30 3.33333 1 1000
- String concatenation is implicit (space between values)
- Comparison returns 1 (true) or 0 (false)
- Regex operators ~ and !~ test field patterns
Operator examples
Practical examples of operators including compound assignment, concatenation, and ternary operator.
# Arithmetic with increment/decrement
awk 'BEGIN { x = 5; print x++, ++x, x--, --x }'   # 5 7 7 5

# Compound assignment
awk 'BEGIN { x = 10; x += 5; print x }'   # 15

# String concatenation
awk '{ print $1 " is " $2 " years old" }'   # Concatenate strings

# Ternary operator
awk '{ print ($1 > 50) ? "large" : "small" }'   # Conditional value

# Precedence example
awk 'BEGIN { print 2 + 3 * 4 }'     # 14 (not 20)
awk 'BEGIN { print (2 + 3) * 4 }'   # 20
awk 'BEGIN { x = 5; y = 3; print (x > y) ? "x is greater" : "y is greater" }'
x is greater
- Ternary operator syntax: condition ? true_value : false_value
- Post-increment (x++) returns old value
- Pre-increment (++x) returns new value
User-Defined Variables
Creating and managing custom variables
Variable declaration and scope
Variables in AWK are created automatically and are global by default.
# Variables don't need declaration - created on use
awk 'BEGIN { x = 5; y = 10; print x + y }'   # 15

# Uninitialized variables are "" as strings, 0 in numeric context
awk 'BEGIN { print "x=" x, "y=" y+0 }'   # x= y=0

# Variables are global by default
awk 'BEGIN { x = 5 } { print x, $0 } END { print x }'

# Function parameters and local variables
awk 'function add(a, b,    local1, local2) {
    local1 = a
    local2 = b
    return local1 + local2
}
BEGIN { print add(3, 4) }'   # 7

# Command-line variable assignment
awk -v var=value 'BEGIN { print var }'
awk -v name="John" '{ print name, $0 }'
awk 'BEGIN { count = 0 } { count++ } END { print "Total lines: " count }' /etc/hostname
Total lines: 1
- Uninitialized variables are 0 (numeric) or "" (string)
- Variables created on first use
- All variables are global except function parameters
- -v flag passes variables to AWK program
Local and global variable conventions
Global variables persist across functions, while function parameters and extra parameters are local.
# Global variables (accessible everywhere)
awk 'BEGIN { global_var = 10 }
function test() { return global_var * 2 }
BEGIN { print test() }'   # 20

# Function parameters are local
awk 'function f(x) {
    y = 100   # y is global (created inside function)
    return x + y
}
BEGIN { x = 1; y = 2; print f(5); print x, y }'   # 105, then 1 100

# Local variables (extra parameters)
awk 'function f(x, y,    local1, local2) {
    local1 = 10
    local2 = 20
    return x + y + local1 + local2
}
BEGIN { print f(1, 2) }'   # 33

# Array variables
awk 'BEGIN {
    arr[1] = "one"
    arr[2] = "two"
    for (i in arr) print i, arr[i]
}'
awk 'BEGIN { total = 0; for (i = 1; i <= 3; i++) total += i; print "Sum 1-3: " total }'
Sum 1-3: 6
- Extra function parameters after named parameters act as local variables
- Global variables accessible throughout program
- Local variables should be separated by extra whitespace in function signature
Control Flow
Conditional statements, loops, and flow control
If-Else Statements
Conditional execution with if, else, and else if
If-else statements
If-else statements allow conditional execution based on boolean expressions.
# Basic if statement
awk '{
    if ($1 > 50) print "Large: " $1
}' file

# if-else statement
awk '{
    if ($1 > 50)
        print "Large"
    else
        print "Small"
}' file

# Multiple conditions with else-if
awk '{
    if ($1 > 100) print "Very large"
    else if ($1 > 50) print "Large"
    else if ($1 > 10) print "Medium"
    else print "Small"
}' file

# Nested conditions
awk '{
    if ($1 > 50) {
        if ($2 == "active") print "Active and large"
        else print "Inactive and large"
    }
}' file
echo -e "apple 75\nbanana 30" | awk '{ if ($2 > 50) print $1 " is popular"; else print $1 " is unpopular" }'
apple is popular
banana is unpopular
- Condition must evaluate to true (non-zero) or false (zero)
- Braces required for multiple statements in block
- String comparisons use ==, !=, ~, !~
Ternary operator
Ternary operator provides compact conditional value selection.
# Ternary operator: condition ? true_value : false_value
awk '{ print ($1 > 50) ? "large" : "small" }' file

# Nested ternary
awk '{
    status = ($1 > 100) ? "very large" : ($1 > 50) ? "large" : "small"
    print $1 " is " status
}' file

# Ternary in variable assignment
awk '{
    message = (NR == 1) ? "First line" : "Not first"
    print message
}' file

# Ternary with string values
awk '{
    result = ($1 ~ /^[0-9]+$/) ? "number" : "text"
    print result
}' file
echo -e "5\n150\n75" | awk '{ print $1 " is " (($1 > 100) ? "large" : "small") }'
5 is small
150 is large
75 is small
- Ternary operator: condition ? value_if_true : value_if_false
- Useful for simple conditional assignments
- Can be nested for multiple conditions
Loops (For, While, Do-While)
Loop structures for iterating over data
For loops
For loops iterate with counter (C-style) or iterate array keys (for-in).
# C-style for loop
awk 'BEGIN {
    for (i = 1; i <= 5; i++) print i
}'

# For loop with array
awk 'BEGIN {
    arr[1] = "apple"
    arr[2] = "banana"
    arr[3] = "cherry"
    for (i = 1; i <= 3; i++) print arr[i]
}'

# For-in loop (iterate array keys)
awk 'BEGIN {
    data["name"] = "John"
    data["age"] = "30"
    data["city"] = "NYC"
    for (key in data) print key ": " data[key]
}'

# For loop with field iteration
awk '{
    for (i = 1; i <= NF; i++) print "Field " i ": " $i
}' file
awk 'BEGIN { for (i=1; i<=3; i++) for (j=1; j<=3; j++) print i, j }'
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
- For-in loop iterates over array keys in arbitrary order
- C-style for has init, condition, and update
- Empty for (;;) creates infinite loop
While and do-while loops
While loops execute while condition is true; do-while always executes at least once.
# While loop
awk 'BEGIN {
    i = 1
    while (i <= 5) {
        print i
        i++
    }
}'

# While loop over fields
awk '{
    i = 1
    while (i <= NF) {
        print "Field " i ": " $i
        i++
    }
}' file

# Do-while loop (always runs at least once)
awk 'BEGIN {
    i = 1
    do {
        print i
        i++
    } while (i <= 3)
}'

# While loop with break
awk 'BEGIN {
    i = 1
    while (i <= 10) {
        if (i == 5) break
        print i
        i++
    }
}'
awk 'BEGIN { i=1; while (i<=3) { print "i=" i; i++ } }'
i=1
i=2
i=3
- While checks condition before executing
- Do-while checks condition after executing
- Both support break and continue statements
Flow Control (Break, Continue, Next)
Control loop and program flow with break, continue, next, and exit
Break and continue
Break exits loops immediately, continue skips to next iteration.
# Break - exit loop immediately
awk 'BEGIN {
    for (i = 1; i <= 10; i++) {
        if (i == 5) break
        print i
    }
}'   # Output: 1 2 3 4

# Continue - skip to next iteration
awk 'BEGIN {
    for (i = 1; i <= 5; i++) {
        if (i == 3) continue
        print i
    }
}'   # Output: 1 2 4 5

# Break from nested loop
awk 'BEGIN {
    for (i = 1; i <= 3; i++) {
        for (j = 1; j <= 3; j++) {
            if (j == 2) break
            print i, j
        }
    }
}'
awk 'BEGIN { for (i=1; i<=5; i++) { if (i==3) continue; print i } }'
1
2
4
5
- Break only exits innermost loop
- Continue skips rest of loop body and goes to next iteration
Next, nextfile, and exit
Next skips to next line, nextfile skips remaining lines in file, exit terminates program.
# next - skip to next input line
awk '{
    if ($1 == "skip") next
    print "Processing: " $0
}' file

# nextfile - skip to next file
awk '{
    if ($1 == "EOF") nextfile
    print $0
}' file1 file2

# exit - terminate program
awk '{
    if (NR > 10) exit
    print $0
}' largefile

# exit with status code
awk 'BEGIN { exit 0 }'   # Success
awk 'BEGIN { exit 1 }'   # Error

# exit runs END block
awk 'BEGIN { exit } END { print "Cleanup" }'
echo -e "a\nb\nc\nd" | awk 'NR==2 { next } { print }'
a
c
d
- next applies to the main pattern-action block
- nextfile skips to the next file in ARGV
- exit runs END blocks before terminating
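The exit status is visible to the calling shell, so an awk check can drive shell conditionals; a small sketch:

```shell
# exit !found is 0 (success) only if some value exceeded the limit
if printf '50\n150\n' | awk '$1 > 100 { found = 1 } END { exit !found }'; then
    echo "over-limit value present"
fi
# over-limit value present
```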
Built-in Functions
String, mathematical, and array functions
String Functions
Functions for string manipulation and processing
String length and substring
Length, substr, and index extract string information and substrings.
# length(str) - string length
awk 'BEGIN { print length("hello") }'   # 5
awk '{ print NF, length($1) }' file     # Field count and first-field length

# substr(str, start, len) - extract substring
awk 'BEGIN { print substr("hello world", 7) }'      # world
awk 'BEGIN { print substr("hello world", 1, 5) }'   # hello
awk '{ print substr($0, 1, 10) }' file              # First 10 chars of line

# index(str, substr) - find position of substring
awk 'BEGIN { print index("hello world", "world") }'   # 7
awk '{ print index($0, "error") }'                    # Find "error" position

# Practical example: strip a known 4-char suffix (".com") from a domain
awk -F'@' '{ domain = substr($2, 1, length($2)-4); print domain }' emails.txt
awk 'BEGIN { s="hello"; print "len=" length(s), "sub=" substr(s, 2, 3), "idx=" index(s, "ll") }'
len=5 sub=ell idx=3
- length() returns character count
- substr starts at 1 (not 0)
- index() returns 1-based position or 0 if not found
String replacement and formatting
Sub and gsub replace patterns; match tests for matches; toupper/tolower change case.
# sub(regex, repl, target) - replace first match
awk 'BEGIN { s="hello hello"; sub(/hello/, "hi", s); print s }'   # hi hello
awk '{ sub(/old/, "new"); print }' file   # Replace first occurrence in $0

# gsub(regex, repl, target) - replace all matches
awk 'BEGIN { s="hello hello"; gsub(/hello/, "hi", s); print s }'   # hi hi
awk '{ gsub(/ +/, " "); print }' file   # Normalize spaces

# match(str, regex) - test for a match and set RSTART, RLENGTH
awk 'BEGIN {
    if (match("hello world", /wor/))
        print "Found at " RSTART " length " RLENGTH
}'

# sprintf(format, ...) - formatted string
awk 'BEGIN { printf "%s=%d\n", "count", 42 }'
awk 'BEGIN { s = sprintf("%.2f", 3.14159); print s }'   # 3.14

# tolower() and toupper()
awk 'BEGIN { print tolower("HELLO") }'   # hello
awk 'BEGIN { print toupper("hello") }'   # HELLO
awk 'BEGIN { s="hello world"; gsub(/l/, "L", s); print s }'
heLLo worLd
- sub replaces first occurrence, gsub replaces all
- If target omitted, uses $0
- match sets RSTART and RLENGTH variables
- sprintf formats strings without printing
Mathematical Functions
Arithmetic and trigonometric functions
Basic math functions
Math functions perform calculations on numbers.
# int(x) - integer part
awk 'BEGIN { print int(3.7) }'   # 3

# sqrt(x) - square root
awk 'BEGIN { print sqrt(16) }'   # 4

# sin(x), cos(x), atan2(y,x) - trigonometric
awk 'BEGIN { print sin(0), cos(0) }'   # 0 1
awk 'BEGIN {
    pi = atan2(0, -1)
    print "Pi = " pi
    print "Sin(pi/2) = " sin(pi/2)
}'

# exp(x) - e^x
awk 'BEGIN { print exp(1) }'   # 2.71828...

# log(x) - natural logarithm
awk 'BEGIN { print log(2.71828) }'   # 1 (approximately)

# Practical: Calculate percentage
awk 'BEGIN { print (75/100) * 100 "%" }'   # 75%
awk 'BEGIN { print "sqrt(25)=" sqrt(25), "int(3.99)=" int(3.99), "exp(1)=" exp(1) }'
sqrt(25)=5 int(3.99)=3 exp(1)=2.71828
- int() truncates toward zero
- Trigonometric functions use radians
- exp(1) is approximately e
Random numbers
Random functions generate random numbers for simulations and sampling.
# rand() - random number 0 to 1
awk 'BEGIN {
    for (i = 1; i <= 3; i++) print rand()
}'

# Random integer 1-10
awk 'BEGIN {
    for (i = 1; i <= 5; i++) print int(rand() * 10) + 1
}'

# srand(seed) - seed random generator
awk 'BEGIN {
    srand(123)
    print rand()
}'

# srand() with no seed uses current time
awk 'BEGIN {
    srand()
    print rand()
}'
awk 'BEGIN { srand(42); for (i=1; i<=3; i++) print int(rand()*10) }'
4
8
2
(exact sequence varies by awk implementation)
- rand() returns float between 0 and 1
- srand() seeds with current time if no seed given
- Seeding enables reproducible random sequences
Arrays and Array Functions
Creating and manipulating arrays
Array basics
Arrays store multiple values with numeric or string indices.
# Create and access array elements
awk 'BEGIN {
    arr[1] = "one"
    arr[2] = "two"
    arr[3] = "three"
    for (i = 1; i <= 3; i++) print arr[i]
}'

# Associative array (string keys)
awk 'BEGIN {
    person["name"] = "John"
    person["age"] = "30"
    person["city"] = "NYC"
    for (key in person) print key ": " person[key]
}'

# Array from file
awk '{ arr[NR] = $0 } END { for (i in arr) print i ": " arr[i] }' file

# Check if key exists
awk 'BEGIN {
    arr["key"] = "value"
    if ("key" in arr) print "Found"
    if ("missing" in arr) print "Not there"
}'
awk 'BEGIN { a[1]=10; a[2]=20; a[3]=30; for (i in a) sum+=a[i]; print "Sum=" sum }'
Sum=60
- Array indices can be numbers or strings
- For-in loop iterates in arbitrary order
- in operator tests key existence
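Associative arrays shine at counting; a word-frequency sketch, piped through sort because for-in order is arbitrary:

```shell
# Count how often each word appears across all input lines
printf 'red blue red\nblue red\n' |
awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
     END { for (w in count) print w, count[w] }' | sort
# blue 2
# red 3
```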
Array operations
Delete removes elements, split creates arrays from strings, multi-dimensional arrays possible with composite keys.
# Delete array element
awk 'BEGIN {
    a[1] = "one"; a[2] = "two"
    delete a[1]
    for (i in a) print i, a[i]
}'   # Output: 2 two

# Delete entire array
awk 'BEGIN {
    a[1] = 1; a[2] = 2
    delete a
    for (i in a) print i
}'   # No output

# Multi-dimensional arrays
awk 'BEGIN {
    a[1,1] = "top-left"
    a[1,2] = "top-right"
    a[2,1] = "bot-left"
    for (key in a) print key ": " a[key]
}'

# split() function creates array
awk 'BEGIN {
    count = split("a:b:c:d", arr, ":")
    for (i = 1; i <= count; i++) print arr[i]
}'
awk 'BEGIN { n=split("one,two,three", a, ","); for (i=1; i<=n; i++) print a[i] }'
one
two
three
- delete removes single element or entire array
- split returns count of elements created
- Multi-dimensional: a[i,j,k] key is i\034j\034k (SUBSEP)
Advanced Features
User-defined functions, file handling, and advanced techniques
User-Defined Functions
Creating and using custom functions
Defining and calling functions
Functions encapsulate logic for reuse and modularity.
# Basic function definition and exampleawk 'function add(a, b) { return a + b}BEGIN { print "5 + 3 = " add(5, 3)}'
# Function with multiple statementsawk 'function greet(name) { greeting = "Hello, " name "!" return greeting}BEGIN { print greet("Alice") }'
# Function without return statementawk 'function print_info(x) { print "Value: " x}BEGIN { print_info(42) }'
# Function with local variables (extra parameters)
awk 'function calculate(x, y, local_result) {
  local_result = x * y + 10
  return local_result
}
BEGIN { print calculate(3, 4) }'
awk 'function double(x) { return x * 2 } BEGIN { print double(21) }'
42
- Functions are defined at the top level, before or after the rules that call them
- Parameters are passed by value
- Extra parameters act as local variables
- Return statement optional (returns 0/"" if omitted)
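A sketch of why the extra-parameter convention matters: any variable not listed as a parameter is global, so forgetting the convention lets scratch state leak out of the function (leak/safe are hypothetical names for this demo):

```shell
# g leaks into the global scope; l stays local to safe()
awk '
function leak()    { g = 99 }   # g is global
function safe(  l) { l = 99 }   # l is a local (extra parameter)
BEGIN {
  leak(); safe()
  print "g=" g " l=" l          # l is empty here
}'
```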
Advanced function patterns
Functions support recursion, array parameters, and complex logic patterns.
# Recursive function (factorial)
awk 'function fact(n) {
  if (n <= 1) return 1
  return n * fact(n - 1)
}
BEGIN { print "5! = " fact(5) }'
# Function modifying array parameter (arrays passed by reference)
awk 'function fill_array(arr, n, i) {
  for (i = 1; i <= n; i++) arr[i] = i * i
}
BEGIN {
  fill_array(data, 5)
  for (i in data) print i, data[i]
}'
# Helper functions for validation
awk 'function is_number(s) { return s ~ /^[0-9]+$/ }
function is_empty(s) { return length(s) == 0 }
BEGIN {
  print "5 is_number: " is_number("5")
  print "empty is_empty: " is_empty("")
}'
awk 'function max(a, b) { return (a > b) ? a : b } BEGIN { print max(15, 23) }'
23
- Recursive functions must have a base case
- Arrays passed by reference (modifications persist)
- Scalars passed by value (modifications local)
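Combining both rules, a hypothetical join() helper (not a built-in) takes an array by reference while keeping its scratch variables local:

```shell
# join(): concatenate array elements 1..n with a separator
awk '
function join(arr, n, sep,   s, i) {
  s = arr[1]
  for (i = 2; i <= n; i++) s = s sep arr[i]
  return s
}
BEGIN {
  n = split("2024-01-15", parts, "-")
  print join(parts, n, "/")
}'
```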
Advanced Input and Output
Multiple file handling, redirection, and pipes
Processing multiple files
Process multiple files with automatic file tracking.
# FILENAME variable shows current file
awk '{ print FILENAME ":" NR ":" $0 }' file1 file2
# FNR is line number within each file
awk 'FNR == 1 { print "Starting " FILENAME }
{ print FILENAME ":" FNR ":" $0 }' file1 file2
# ARGIND (gawk) - current argument index
awk '{ print ARGIND, FILENAME, NR, FNR }' file1 file2
# Process specific files only
awk 'FILENAME == "file1" { print }' file1 file2
# Skip files based on pattern
awk 'FNR == 1 && FILENAME ~ /skip/ { nextfile }
{ print }' file*
echo "a" > /tmp/f1 && echo "b" > /tmp/f2 && awk '{ print FILENAME, FNR, $0 }' /tmp/f1 /tmp/f2
/tmp/f1 1 a
/tmp/f2 1 b
- NR tracks total lines across all files
- FNR resets for each file
- FILENAME shows current filename
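A related two-file idiom relies on NR == FNR being true only while the first file is read, which turns the first file into a lookup table for the second (the /tmp paths are just for this demo):

```shell
# Print lines of the second file that also appear in the first
printf 'a\nb\n'    > /tmp/allow.txt
printf 'a\nc\nb\n' > /tmp/data.txt
awk 'NR == FNR { keep[$0]; next } $0 in keep' /tmp/allow.txt /tmp/data.txt
```

Referencing keep[$0] is enough to create the key; the second file is then filtered with the bare $0 in keep pattern.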
Output redirection and pipes
Redirect output to files or pipes; getline reads additional lines.
# Redirect output to file
awk '{ print > "output.txt" }' input.txt   # Overwrite
awk '{ print >> "output.txt" }' input.txt  # Append
# Redirect specific pattern to file
awk '/error/ { print > "errors.log" }
!/error/ { print > "clean.log" }' logfile
# Pipe output to command
awk '{ print | "sort" }' unsorted.txt
awk '{ print | "mail -s report user@host" }' data
# Close file or pipe
awk '{
  print > "output.txt"
  close("output.txt")  # Flush and close file
}' input
# getline - read next line
awk '{
  print "Current: " $0
  getline next_line
  print "Next: " next_line
}' file
echo -e "hello\nworld" | awk '{ print | "sort -r" }'
world
hello
- > creates new file, >> appends to file
- | pipes output to command
- close() flushes file and allows reopening
- getline reads next input line into a variable
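Because the filename after > can be any expression, one pass can fan records out into one file per key; a sketch with invented log lines (the /tmp/*.log names are illustrative):

```shell
# Split input into one file per value of the first field
printf 'err boom\ninfo ok\nerr again\n' |
  awk '{ out = "/tmp/" $1 ".log"; print > out }'
cat /tmp/err.log
```

With many distinct keys, call close(out) once a file is finished to avoid hitting the open-file limit.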
Practical Examples
Real-world use cases and data processing scenarios
Common Text Processing Tasks
Practical examples of typical AWK operations
Extracting and reformatting data
Common data extraction and transformation operations.
# Extract specific columns from CSV
awk -F',' '{ print $1, $3 }' data.csv
# Convert CSV to TSV (tab-separated)
awk -F',' -v OFS='\t' '{ print $1, $2, $3 }' data.csv
# Extract IP address from log files
awk '{ print $1 }' /var/log/apache2/access.log
# Count occurrences of each word
awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
END { for (w in count) print w, count[w] }' file.txt
# Filter and print lines with specific pattern
awk '/^ERROR|^WARN/ { print }' system.log
# Sum column of numbers
awk '{ sum += $1 } END { print "Total: " sum }' numbers.txt
echo -e "a:1\nb:2\nc:3" | awk -F':' '{ sum+=$2 } END { print "Sum=" sum }'
Sum=6
- awk excels at column extraction and reformatting
- Easy to change field separators for format conversion
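Reordering columns is just printing fields in a different order; a sketch with a made-up last,first,age row:

```shell
# Swap the first two CSV columns
printf 'Doe,John,42\n' | awk -F',' -v OFS=',' '{ print $2, $1, $3 }'
```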
Log processing and analysis
Processing logs to extract metrics and statistics.
# Count requests by hour from Apache log
awk '{
  time = $4
  sub(/\[/, "", time)
  hour = substr(time, 13, 2)
  count[hour]++
}
END {
  for (h in count) print h ":00 - " count[h] " requests"
}' /var/log/apache2/access.log
# Extract and count HTTP status codes
awk '{ print $(NF-1) }' access.log | sort | uniq -c
# Find top users by request count
awk '{ users[$1]++ }
END { for (u in users) print u, users[u] }' access.log | sort -k2 -rn | head -10
# Calculate average response time
awk '{
  time = $NF
  sum += time
  count++
}
END { print "Average: " sum / count " ms" }' response-times.log
echo -e "error\nerror\nwarn\nerror" | awk '{ count[$1]++ } END { for (t in count) print t, count[t] }'
error 3
warn 1
- AWK ideal for parsing structured log formats
- Pattern matching finds relevant log lines
- Aggregation generates summary statistics
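The counting pattern extends naturally to per-key averages; a sketch assuming each line ends with a status code and a size (the numbers are invented):

```shell
# Average response size per HTTP status code
printf '200 512\n404 100\n200 48\n' |
  awk '{ sum[$1] += $2; n[$1]++ }
       END { for (s in sum) printf "%s avg=%d\n", s, sum[s] / n[s] }' |
  sort
```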
Advanced Data Processing
Complex data transformations and analysis
Data validation and cleaning
Data cleaning operations remove duplicates, whitespace, and invalid records.
# Remove duplicate lines (preserve order)
awk '!seen[$0]++' file
# Remove leading/trailing whitespace
awk '{ gsub(/^[ \t]+|[ \t]+$/, ""); print }' file
# Validate email format
awk '/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/ { print }' emails.txt
# Check if all fields are numeric
awk '{ for (i = 1; i <= NF; i++) if ($i !~ /^[0-9.]+$/) next; print }' data.txt
# Remove lines with specific patterns
awk '!/^#/ && !/^$/ && !/^;/ { print }' config.txt
# Normalize whitespace (multiple spaces to single)
awk '{ gsub(/ +/, " "); $1=$1; print }' file
echo -e "a\nb\na\nc\nb" | awk '!seen[$0]++'
a
b
c
- !seen[$0]++ is an elegant duplicate-removal pattern
- gsub with start/end anchors trims whitespace
- Email validation uses a complex regex pattern
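The cleaning steps above compose into a single pass — trim, drop comments and blanks, then deduplicate; the sample lines here are invented:

```shell
# Trim whitespace, skip comments/blanks, remove duplicates in one pass
printf '  a \n# note\n\nb\na\n' |
  awk '{ gsub(/^[ \t]+|[ \t]+$/, "") }
       /^#/ || /^$/ { next }
       !seen[$0]++'
```

Each rule sees $0 as modified by the rules before it, so the dedup check runs on trimmed lines.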
Statistical calculations and reports
Statistical operations including mean, min/max, and formatted report generation.
# Calculate the mean
awk '{ sum += $1; count++ }
END { printf "Mean: %.2f\n", sum / count }' numbers.txt
# Find min and max values
awk 'NR==1 { min=$1; max=$1; next }
{ if ($1 < min) min=$1; if ($1 > max) max=$1 }
END { print "Min: " min ", Max: " max }' numbers.txt
# Generate formatted report
awk 'BEGIN {
  printf "%-20s %-10s %-10s\n", "Name", "Age", "Score"
  printf "%-20s %-10s %-10s\n", "----", "---", "-----"
}
{
  total += $3; count++
  printf "%-20s %-10d %-10d\n", $1, $2, $3
}
END {
  printf "%-20s %-10s %-10.1f\n", "Average", "", total / count
}' data.txt
# Standard deviation calculation
awk '{
  arr[NR] = $1
  sum += $1
}
END {
  mean = sum / NR
  for (i = 1; i <= NR; i++) dev += (arr[i] - mean)^2
  stddev = sqrt(dev / NR)
  print "Mean: " mean ", StdDev: " stddev
}' data.txt
echo -e "5\n10\n15\n20\n25" | awk '{ sum+=$1; arr[NR]=$1 } END { print "Sum=" sum " Avg=" sum/NR " Count=" NR }'
Sum=75 Avg=15 Count=5
- Arrays are needed for median or percentile calculations
- printf enables precise formatting for reports
- Mathematical functions enable statistical analysis
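The one-pass patterns above combine naturally; min, max, and mean from the same five sample numbers:

```shell
# Min, max, and mean in a single pass
printf '5\n10\n15\n20\n25\n' |
  awk 'NR == 1 { min = max = $1 }
       { if ($1 < min) min = $1
         if ($1 > max) max = $1
         sum += $1 }
       END { printf "min=%d max=%d mean=%.1f\n", min, max, sum / NR }'
```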