AWK

AWK is a powerful text processing language that allows pattern scanning and data extraction. Essential for processing text files, extracting columns, and performing calculations on text data.


Getting Started

Introduction to AWK and basic syntax fundamentals

What is AWK

Understanding AWK and its use cases for text processing

AWK overview and capabilities

AWK is a powerful tool for text processing, data extraction, and pattern matching across files and streams.

Code
Terminal window
# AWK is a text processing language designed for pattern scanning
# and manipulation of data. Key features:
# - Process text files line by line
# - Extract and manipulate columns
# - Perform calculations on data
# - Filter records based on patterns
# - Generate reports and formatted output
# AWK stands for: Aho, Weinberger, Kernighan (creators)
# Variants: awk (original), gawk (GNU AWK), mawk (minimal AWK)
Execution
Terminal window
which awk && awk --version
Output
Terminal window
/usr/bin/awk
GNU Awk 5.1.0, API: 3.1
  • AWK reads input line by line automatically
  • Supports regular expressions for pattern matching
  • Built-in variables track lines, fields, and more
  • Can be used interactively or in shell scripts

AWK vs other text processing tools

AWK excels at processing structured data and performing transformations that combine pattern matching with data manipulation.

Code
Terminal window
# Compare AWK with other text processing tools:
# grep: Search file patterns (matches lines)
# sed: Stream editor (find and replace, transformations)
# awk: Full programming language (matching + calculations)
# AWK is best when you need:
# - Extract or reorganize columns
# - Perform calculations on text data
# - Process structured text (CSV, logs, reports)
# - Conditional logic and complex transformations
Execution
Terminal window
echo -e "name,age\nJohn,30\nJane,25" | awk -F',' '{print $1 " is " $2 " years old"}'
Output
Terminal window
name is age years old
John is 30 years old
Jane is 25 years old
  • AWK is a Turing-complete programming language
  • More powerful than grep or sed for complex text processing
  • Easier syntax than writing shell scripts for text processing
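
To make the comparison concrete, here is the same task done with each tool: keep lines containing "error" and report them. grep and sed can only print (or edit) whole lines, while AWK can select a field directly. A sketch; the file path is illustrative:

```shell
printf 'error 42\nok 7\nerror 13\n' > /tmp/demo.log

grep 'error' /tmp/demo.log                 # whole matching lines
sed -n '/error/p' /tmp/demo.log            # same result via sed
awk '/error/ { print $2 }' /tmp/demo.log   # just the second field
```

The awk variant prints 42 and 13, which grep and sed cannot do without extra piping.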

Basic Syntax and Structure

Understanding AWK program structure and execution flow

AWK program structure

Shows the three parts of an AWK program - BEGIN block (runs once before input), main block (runs for each line), and END block (runs once after all input).

Code
Terminal window
# Basic AWK syntax structure:
# awk 'pattern { action }' input-file
# Program structure with BEGIN, pattern/action, and END:
awk '
BEGIN {
# Initialization, runs before processing input
print "Starting processing..."
}
pattern {
# Main processing, runs for each line
# pattern can be regex, expression, or range
}
END {
# Finalization, runs after all input processed
print "Processing complete"
}
' input-file
Execution
Terminal window
echo -e "apple\nbanana\ncherry" | awk 'BEGIN { print "Fruits:" } { print "- " $0 } END { print "Done" }'
Output
Terminal window
Fruits:
- apple
- banana
- cherry
Done
  • BEGIN and END blocks are optional
  • Multiple patterns can match the same line
  • Pattern can be omitted (all lines match)
  • Action can be omitted (default is print)
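
The "multiple patterns can match the same line" bullet is worth seeing in action: each matching pattern-action pair fires for the same line, in program order. A minimal sketch:

```shell
echo -e "error 5\nwarn 9" | awk '
/error/ { print "has error:", $0 }
$2 > 3  { print "value > 3:", $0 }
'
```

The first line triggers both actions; the second triggers only the numeric test.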

Inline AWK programs and scripts

Different ways to invoke AWK - inline, from file, with arguments, and from pipes.

Code
Terminal window
# Method 1: Inline program with single quotes
awk '{ print NR, $0 }' file.txt
# Method 2: Program from file
awk -f script.awk file.txt
# Method 3: Multiple input files
awk '{ print FILENAME, NR, $0 }' file1.txt file2.txt
# Method 4: Program with variables
awk -v var=value '{ print var, $0 }' file.txt
# Method 5: Pipe input directly
echo "data" | awk '{ print toupper($0) }'
Execution
Terminal window
echo -e "line1\nline2" | awk '{ print NR, $0 }'
Output
Terminal window
1 line1
2 line2
  • Single quotes protect AWK syntax from shell interpretation
  • -f flag reads program from file
  • -v flag passes variables to AWK
  • AWK can read from multiple files sequentially
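
A common use of -v is forwarding a shell variable into the AWK program, since single quotes block shell expansion inside the script. A sketch; `threshold` is a hypothetical shell variable:

```shell
threshold=50   # hypothetical shell variable
echo -e "30\n70\n90" | awk -v limit="$threshold" '$1 > limit { print $1, "exceeds", limit }'
```

This keeps the AWK program static while the filter value comes from the shell.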

Installation and Setup

Installing AWK and verifying functionality

Install AWK on Linux systems

Installation of GNU AWK (gawk) on various Linux distributions and macOS.

Code
Terminal window
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install -y gawk
# CentOS/RHEL
sudo yum install -y gawk
# macOS
brew install gawk
# Verify installation
awk --version
Execution
Terminal window
gawk --version | head -1
Output
Terminal window
GNU Awk 5.1.0, API: 3.1 (GNU MPFR 4.1.0, GNU MP 6.2.1)
  • Most systems have awk or gawk installed by default
  • gawk (GNU AWK) is the most feature-complete implementation
  • mawk is faster but supports fewer features
  • gawk is broadly compatible with scripts written for mawk

Creating AWK script file

Creating and using AWK script files for more complex programs.

Code
Terminal window
# Create a simple AWK script file
cat > process.awk << 'EOF'
BEGIN {
print "Processing file..."
}
{
total += $1
count++
}
END {
print "Total: " total
print "Average: " (count > 0 ? total / count : 0)
}
EOF
# Run the script
awk -f process.awk numbers.txt
# Make script executable (requires shebang)
echo '#!/usr/bin/awk -f' | cat - process.awk > /tmp/process
chmod +x /tmp/process
/tmp/process numbers.txt
Execution
Terminal window
cat process.awk
Output
Terminal window
BEGIN {
print "Processing file..."
}
{
total += $1
count++
}
END {
print "Total: " total
print "Average: " (count > 0 ? total / count : 0)
}
  • Use -f flag to load program from file
  • Script files can be made executable with proper shebang
  • Shebang line: #!/usr/bin/awk -f
  • Scripts are useful for complex logic and reusability

Field Processing

Working with fields, separators, and field manipulation

Field Variables and NF

Understanding field access and manipulation with field variables

Accessing fields in AWK

Field variables allow accessing individual fields by number, with $NF accessing the last field dynamically.

Code
Terminal window
# Field variables reference:
# $0 = entire line
# $1 = first field
# $2 = second field
# ... $NF = last field
# NF = total number of fields in the line
# Examples:
echo "John Doe 30 Engineer" | awk '{ print $1 }' # John
echo "John Doe 30 Engineer" | awk '{ print $NF }' # Engineer
echo "John Doe 30 Engineer" | awk '{ print $(NF-1) }' # 30
echo "John Doe 30 Engineer" | awk '{ print NF }' # 4
Execution
Terminal window
echo "apple banana cherry" | awk '{ print "Last field:", $NF; print "Field count:", NF }'
Output
Terminal window
Last field: cherry
Field count: 3
  • Fields are separated by whitespace by default (FS)
  • $0 is the entire line
  • Use $(NF-1) for second-to-last field, $(NF-2) for third-to-last, etc.
  • NF changes if you modify fields
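
The last bullet can be demonstrated directly: assigning to a field beyond NF grows the record, filling any gap with empty fields. A sketch:

```shell
echo "a b c" | awk '{ print "before:", NF; $5 = "e"; print "after:", NF; print $0 }'
```

NF jumps from 3 to 5, and the rebuilt $0 contains an empty $4 (hence the doubled separator).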

Modifying fields

Modifying, adding, and removing fields by direct assignment or changing NF value.

Code
Terminal window
# Change field values
echo "apple 5 basket" | awk '{ $2 = 10; print }' # apple 10 basket
# Add fields
echo "apple basket" | awk '{ $(NF+1) = "10"; print }' # apple basket 10
# Reconstruct line with modified fields
echo "a b c" | awk '{ $2 = "B"; print $0 }' # a B c
# Remove fields (truncate)
echo "a b c d e" | awk '{ NF = 3; print }' # a b c
Execution
Terminal window
echo "test 100 active" | awk '{ $2 = 200; NF = 2; print }'
Output
Terminal window
test 200
  • Modifying a field updates $0 automatically
  • Increasing NF adds empty fields
  • Decreasing NF removes fields from the end
  • Output uses OFS to join modified fields
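
A common idiom relies on that rebuild behavior: assigning any field to itself forces AWK to rejoin $0 with OFS, converting separators in one step. A sketch:

```shell
echo "a b c" | awk -v OFS=',' '{ $1 = $1; print }'
# a,b,c
```

Without the `$1 = $1`, print would emit the original, space-separated $0 unchanged.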

Field Separators

Setting and using different field separator patterns

Setting field separators

Custom field separators allow parsing different delimited formats like CSV, TSV, and colon-separated data.

Code
Terminal window
# FS (Field Separator) - what separates fields in input
# OFS (Output Field Separator) - what separates fields in output
# Default FS is whitespace. Set custom FS with -F flag:
echo "apple,banana,cherry" | awk -F',' '{ print $2 }' # banana
# Or use BEGIN to set FS
echo "apple:banana:cherry" | awk 'BEGIN { FS=":" } { print $2 }' # banana
# Set both FS and OFS
echo "apple-banana-cherry" | awk -F'-' -v OFS=',' '{ print $1, $2, $3 }'
# apple,banana,cherry
# Use regex as FS (multiple delimiters)
echo "apple,banana;cherry" | awk -F'[,;]' '{ print $2 }' # banana
Execution
Terminal window
echo "name:age:city" | awk -F':' '{ print "Name=" $1 ", Age=" $2 }'
Output
Terminal window
Name=name, Age=age
  • Default FS matches one or more whitespace characters
  • FS can be a single character or regex pattern
  • -F flag overrides default FS
  • BEGIN block can set FS for variable input
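
Since a multi-character FS is treated as a regular expression, it can absorb inconsistent spacing around delimiters. A sketch handling a comma that may or may not be followed by spaces:

```shell
echo "apple, banana,cherry" | awk -F', *' '{ print $2 }'
# banana
```

The regex `, *` matches a comma plus any run of spaces, so both field boundaries split cleanly.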

Processing CSV and complex formats

Different separators for input parsing and output formatting with FS and OFS.

Code
Terminal window
# Process CSV with quoted fields
# (plain -F',' splits inside quotes; gawk's FPAT defines
# what a field looks like instead)
echo 'name,age,city
"Smith, John",30,"New York"
"Doe, Jane",25,"Los Angeles"' | gawk '
BEGIN { FPAT = "([^,]+)|(\"[^\"]*\")" }
NR > 1 {
gsub(/"/, "", $1) # Remove quotes
print $1 " is " $2 " years old"
}'
# Tab-separated values
echo -e "col1\tcol2\tcol3" | awk -F'\t' '{ print $2 }'
# Multi-character delimiter
echo "apple::banana::cherry" | awk -F'::' '{ print $1 }'
Execution
Terminal window
echo "a:b:c" | awk -F':' -v OFS='-' '{ print $1, $2, $3 }'
Output
Terminal window
a-b-c
  • -v OFS=... sets the output field separator
  • FS regex can handle multiple delimiter types
  • gsub useful for cleaning CSV quote characters
  • Use -F'\t' for tab-delimited files

Field Manipulation and Reconstruction

Transform and reorganize field data

Rearranging and extracting fields

Extracting and rearranging fields to create custom output formats.

Code
Terminal window
# Extract and rearrange columns
echo "John Doe john@example.com 30" | awk '{ print $3, $2, $1 }'
# john@example.com Doe John
# Extract specific fields
echo "apple 100 USD red" | awk '{ print "Item: " $1 ", Price: " $2 " " $3 }'
# Item: apple, Price: 100 USD
# Concatenate fields
echo "John Doe" | awk '{ full_name = $1 " " $2; print "Hello, " full_name }'
# Hello, John Doe
# Extract substring from field
echo "example@domain.com" | awk -F'@' '{ print "User: " $1 }'
# User: example
Execution
Terminal window
echo "2025-02-28" | awk -F'-' '{ print "Year: " $1 ", Month: " $2 ", Day: " $3 }'
Output
Terminal window
Year: 2025, Month: 02, Day: 28
  • Fields can be rearranged in any order
  • Easy to combine fields into new strings
  • Using FS and field numbers simplifies parsing

Computing values from fields

Performing calculations and transformations on field values.

Code
Terminal window
# Sum and average fields
echo -e "100\n200\n300" | awk '{ sum += $1 } END { print "Total: " sum ", Avg: " sum/NR }'
# Total: 600, Avg: 200
# Product of fields
echo "5 4 3" | awk '{ print "Product: " $1 * $2 * $3 }'
# Product: 60
# Calculate percentage
echo "75 100" | awk '{ print ($1 / $2 * 100) "%" }'
# 75%
# String operations on fields
echo "javascript PYTHON bash" | awk '{ print toupper($1), tolower($2), tolower($3) }'
# JAVASCRIPT python bash
Execution
Terminal window
echo "10 20" | awk '{ print "Sum: " $1 + $2 ", Diff: " $1 - $2 ", Prod: " $1 * $2 }'
Output
Terminal window
Sum: 30, Diff: -10, Prod: 200
  • AWK automatically converts fields to numbers when needed
  • Can perform arithmetic directly on fields
  • String functions can transform field values
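
The automatic-conversion bullet has a useful corollary: adding 0 forces a numeric reading (keeping only the leading numeric prefix), and concatenating an empty string forces a string reading. A sketch of both coercions:

```shell
echo "42abc" | awk '{ print $1 + 0 }'         # numeric prefix: 42
awk 'BEGIN { n = 7; s = n ""; print s "!" }'  # n as a string: 7!
```

`+ 0` is the standard trick for cleaning numeric columns that carry trailing units or junk.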

Patterns and Matching

Pattern matching and regular expressions in AWK

Pattern Types and Matching

Different pattern types for matching records

AWK pattern types

Different pattern types allow flexible filtering and matching of records.

Code
Terminal window
# Pattern types in AWK:
# 1. Regex patterns
awk '/pattern/ { action }' file # Match lines with pattern
awk '!/pattern/ { action }' file # Match lines without pattern
# 2. Expression patterns (boolean)
awk '$1 > 100 { action }' file # Field comparison
awk 'NR >= 2 && NR <= 5' file # Line number range
awk '$1 == "active" { action }' file # Exact match
# 3. BEGIN/END patterns
awk 'BEGIN { action }' # Before processing input
awk 'END { action }' # After processing input
# 4. Range patterns
awk '/start/,/end/ { action }' file # From start to end pattern
# 5. No pattern
awk '{ action }' file # All lines match
Execution
Terminal window
echo -e "apple 50\nbanana 120\ncherry 75" | awk '$2 > 100 { print }'
Output
Terminal window
banana 120
  • Patterns determine which lines trigger the action block
  • Multiple patterns can target different lines
  • Patterns are optional (all lines match by default)
  • Expression patterns use standard operators
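
Because a pattern with no action defaults to print, a bare NF expression is a compact way to drop blank lines (NF is 0 on an empty line, so it evaluates false). A sketch:

```shell
printf 'one\n\ntwo\n\n\nthree\n' | awk 'NF'
```

Only the three non-empty lines survive; lines containing only whitespace are dropped too, since they also have zero fields.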

Regex patterns and matching

Regex patterns filter lines based on pattern matching and operator ~.

Code
Terminal window
# Regex pattern matching
awk '/linux/' file # Contains "linux"
awk '/^linux/' file # Starts with "linux"
awk '/linux$/' file # Ends with "linux"
awk '/[0-9]+/' file # Contains digits
awk '!/error/' file # Does not contain "error"
# Regex with field matching (~)
awk '$1 ~ /john/' file # First field matches regex
awk '$2 !~ /test/' file # Second field doesn't match
# Case-insensitive matching
awk 'tolower($0) ~ /error/' file # Case-insensitive search
Execution
Terminal window
echo -e "error: issue\nwarning: notice\nerror: problem" | awk '/error/'
Output
Terminal window
error: issue
error: problem
  • ~ operator checks if field matches regex
  • !~ operator checks if field doesn't match
  • Use tolower() for case-insensitive matching
  • ^ and $ anchor patterns to start and end
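
The right-hand side of ~ need not be a regex literal; a string variable works as a dynamic regex, which pairs well with -v. A sketch; `pat` is a hypothetical pattern variable:

```shell
pat="err"   # hypothetical pattern from the shell
echo -e "error 1\nok 2\nerrata 3" | awk -v p="$pat" '$0 ~ p'
```

This lets the search pattern come from user input without rewriting the AWK program.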

BEGIN and END Blocks

Initialization and finalization with BEGIN and END

BEGIN and END blocks

BEGIN runs before processing input and END runs after, useful for setup, cleanup, and reporting.

Code
Terminal window
# BEGIN block - runs once before processing input
awk 'BEGIN {
print "Processing started..."
count = 0
}
{
count++
}
END {
print "Total lines: " count
}' file
# Example: Sum numbers and show statistics
awk 'BEGIN { sum = 0; count = 0 }
NR > 1 { sum += $1; count++ }
END {
avg = (count > 0) ? sum / count : 0
print "Sum: " sum "\nAverage: " avg
}' data.txt
Execution
Terminal window
echo -e "10\n20\n30" | awk 'BEGIN { print "Numbers:" } { print "- " $0 } END { print "Done" }'
Output
Terminal window
Numbers:
- 10
- 20
- 30
Done
  • BEGIN block runs even if no input is provided
  • END block runs even if no lines match patterns
  • Variables initialized in BEGIN persist through all input

Header and summary generation

Using BEGIN and END to create formatted reports with headers, footers, and summary statistics.

Code
Terminal window
# Generate report with header and footer
awk 'BEGIN {
printf "%-15s %-10s %-10s\n", "Name", "Age", "City"
printf "%-15s %-10s %-10s\n", "----", "---", "----"
}
{
printf "%-15s %-10s %-10s\n", $1, $2, $3
}
END {
print ""
print "Total records: " NR
}' people.txt
# Calculate running statistics
awk 'BEGIN {
header = "Line\tValue\tRunSum"
print header
}
{
sum += $1
printf "%d\t%d\t%d\n", NR, $1, sum
}
END {
print "Final sum: " sum
}' numbers.txt
Execution
Terminal window
echo -e "5\n10\n15" | awk 'BEGIN { print "Input values:" } { sum += $1; print NR, $1 } END { print "Total: " sum }'
Output
Terminal window
Input values:
1 5
2 10
3 15
Total: 30
  • printf in BEGIN/END creates formatted output
  • Variables maintain state across BEGIN, main, and END
  • Good pattern for generating summaries and reports

Range and Compound Patterns

Using range patterns and combining conditions

Range patterns

Range patterns select a block of lines from first matching pattern to second matching pattern.

Code
Terminal window
# Range patterns match from first pattern to second pattern
# Syntax: /start/,/end/ { action }
# Print lines between BEGIN and END markers
awk '/BEGIN/,/END/' file
# Process sections of a file
awk '/^Chapter 1/,/^Chapter 2/' book.txt
# Extract fenced code blocks (identical start/end patterns make
# each fence line its own one-line range, so toggle a flag instead)
awk '/^```/ { f = !f; next } f' doc.md
# Line number ranges
awk 'NR==10, NR==20 { print NR, $0 }' file
Execution
Terminal window
echo -e "start\ndata1\ndata2\nend\nignore" | awk '/start/,/end/ { print }'
Output
Terminal window
start
data1
data2
end
  • Range state toggles when start pattern matches
  • Range stays true until end pattern matches
  • Both start and end lines are included

Compound conditions

Compound conditions combine multiple patterns with logical operators for complex filtering.

Code
Terminal window
# Logical AND (&&) - both conditions must be true
awk '$1 == "error" && $2 > 100 { print }' logs.txt
# Logical OR (||) - either condition can be true
awk '$1 == "error" || $1 == "warning" { print }' logs.txt
# Negation (!) - condition must be false
awk '!($1 == "info") { print }' logs.txt
# Complex expressions
awk '($1 ~ /^ERR/ || $1 ~ /^FAIL/) && NR > 100 { print }' logs.txt
# Multiple conditions on fields
awk '$2 >= 500 && $2 <= 1000 && $3 == "active"' data.txt
Execution
Terminal window
echo -e "error 150\nerror 50\nwarning 200" | awk '$1 == "error" && $2 > 100 { print }'
Output
Terminal window
error 150
  • && has higher precedence than ||
  • Use parentheses for clarity in complex expressions
  • Patterns are evaluated left to right

Variables and Operators

Built-in variables, operators, and expressions

Built-in Variables

Understanding AWK's built-in variables

Built-in variables reference

Built-in variables track line numbers, field counts, separators, and filenames.

Code
Terminal window
# NR - Total record number (line number across all files)
awk '{ print NR, $0 }' file
# FNR - File record number (line number within current file)
awk '{ print FNR, FILENAME, $0 }' file1 file2
# NF - Number of fields in current record
awk '{ print "Fields: " NF, "Last field: " $NF }' file
# FS - Field separator (input)
awk 'BEGIN { FS=":" } { print $1 }'
# OFS - Output field separator
awk -v OFS="-" '{ print $1, $2, $3 }' file
# RS - Record separator (usually newline)
awk 'BEGIN { RS=";" } { print NR, $0 }'
# ORS - Output record separator
awk 'BEGIN { ORS=";" } { print $0 }'
# FILENAME - Current filename being processed
awk '{ print FILENAME ":" NR ":" $0 }' file1 file2
# ARGC/ARGV - Command line argument count and values
awk 'BEGIN { print "Args: " ARGC; for (i=0; i<ARGC; i++) print ARGV[i] }'
Execution
Terminal window
echo -e "a b c\nd e f g" | awk '{ print "NR=" NR " NF=" NF " Last=" $NF }'
Output
Terminal window
NR=1 NF=3 Last=c
NR=2 NF=4 Last=g
  • NR increments with each line across all files
  • FNR resets for each new file
  • NF changes if you modify fields
  • FILENAME shows which file is being processed
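
The NR/FNR distinction enables the classic two-file idiom: FNR == NR is true only while reading the first file, so you can load a lookup table from it and then apply the table to the second file. A sketch with hypothetical files:

```shell
printf '1 alice\n2 bob\n' > /tmp/names.txt
printf '2 45\n1 30\n' > /tmp/ages.txt
# First pass fills name[]; second pass joins on the id in $1
awk 'FNR == NR { name[$1] = $2; next } { print name[$1], $2 }' /tmp/names.txt /tmp/ages.txt
```

The `next` is essential; without it the main block would also run on first-file lines.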

Additional built-in variables

Additional built-in variables for environment access, arguments, and match results.

Code
Terminal window
# ENVIRON - Environment variables
awk 'BEGIN { print "User: " ENVIRON["USER"] }'
awk 'BEGIN { print "Home: " ENVIRON["HOME"] }'
# ARGC/ARGV - Arguments
awk 'BEGIN { print ARGC; for(i in ARGV) print i, ARGV[i] }' file1 file2
# SUBSEP - Subscript separator for multi-dimensional array keys
# (set it before assigning, since keys are joined at assignment time)
awk 'BEGIN { SUBSEP = "-"; a[1,2] = "val"; for (i in a) print i }' # 1-2
# RSTART/RLENGTH - From match() function
awk 'BEGIN {
str = "hello world"
match(str, /world/)
print "Start: " RSTART " Length: " RLENGTH
}'
Execution
Terminal window
awk 'BEGIN { print "Arguments: " ARGC }' file1 file2
Output
Terminal window
Arguments: 3
  • ENVIRON allows accessing environment variables
  • ARGC is argument count (includes program name)
  • ARGV[0] is awk command, ARGV[1] is first file, etc.

Operators and Expressions

Arithmetic, comparison, logical, and string operators

Operators reference

Complete set of AWK operators for arithmetic, comparison, logical, and string operations.

Code
Terminal window
# ARITHMETIC OPERATORS
# + (addition) x + y
# - (subtraction) x - y
# * (multiplication) x * y
# / (division) x / y
# % (modulo) x % y
# ^ (exponentiation) x ^ y
# COMPARISON OPERATORS
# < less than
# <= less than or equal
# > greater than
# >= greater than or equal
# == equal
# != not equal
# LOGICAL OPERATORS
# && logical AND
# || logical OR
# ! logical NOT
# STRING OPERATORS
# (space) concatenation: "hello" " " "world"
# ~ regex match: $0 ~ /pattern/
# !~ regex not match: $0 !~ /pattern/
# ASSIGNMENT OPERATORS
# = assignment
# += add and assign
# -= subtract and assign
# *= multiply and assign
# /= divide and assign
# %= modulo and assign
# ^= exponentiate and assign
# ++ increment
# -- decrement
Execution
Terminal window
echo "10 3" | awk '{ print $1 + $2, $1 - $2, $1 * $2, $1 / $2, $1 % $2, $1 ^ $2 }'
Output
Terminal window
13 7 30 3.33333 1 1000
  • String concatenation is implicit (space between values)
  • Comparison returns 1 (true) or 0 (false)
  • Regex operators ~ and !~ test field patterns
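
One comparison gotcha worth noting: string constants compare lexically, so forcing a numeric context with + 0 can flip the result. (Fields and -v values that look numeric compare numerically on their own.) A sketch:

```shell
awk 'BEGIN { a = "10"; b = "9"; print (a < b) }'          # 1: string compare, "1" < "9"
awk 'BEGIN { a = "10"; b = "9"; print (a + 0 < b + 0) }'  # 0: numeric compare, 10 > 9
```

When sorting or filtering numeric columns that pass through string variables, add the `+ 0`.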

Operator examples

Practical examples of operators including compound assignment, concatenation, and ternary operator.

Code
Terminal window
# Arithmetic with increment/decrement
awk 'BEGIN { x = 5; print x++, ++x, x--, --x }' # 5 7 7 5
# Compound assignment
awk 'BEGIN { x = 10; x += 5; print x }' # 15
# String concatenation
awk '{ print $1 " is " $2 " years old" }' # Concatenate strings
# Ternary operator
awk '{ print ($1 > 50) ? "large" : "small" }' # Conditional value
# Precedence example
awk 'BEGIN { print 2 + 3 * 4 }' # 14 (not 20)
awk 'BEGIN { print (2 + 3) * 4 }' # 20
Execution
Terminal window
awk 'BEGIN { x = 5; y = 3; print (x > y) ? "x is greater" : "y is greater" }'
Output
Terminal window
x is greater
  • Ternary operator syntax: condition ? true_value : false_value
  • Post-increment (x++) returns old value
  • Pre-increment (++x) returns new value

User-Defined Variables

Creating and managing custom variables

Variable declaration and scope

Variables in AWK are created automatically and are global by default.

Code
Terminal window
# Variables don't need declaration - created on use
awk 'BEGIN { x = 5; y = 10; print x + y }' # 15
# Uninitialized variables are 0 in numeric context, empty in string context
awk 'BEGIN { print "x=" x+0, "y=" y }' # x=0 y=
# Variables are global by default
awk 'BEGIN { x = 5 } { print x, $0 } END { print x }'
# Function parameters and local variables
awk 'function add(a, b, local1, local2) {
local1 = a
local2 = b
return local1 + local2
}
BEGIN { print add(3, 4) }' # 7
# Command-line variable assignment
awk -v var=value 'BEGIN { print var }'
awk -v name="John" '{ print name, $0 }'
Execution
Terminal window
awk 'BEGIN { count = 0 } { count++ } END { print "Total lines: " count }' /etc/hostname
Output
Terminal window
Total lines: 1
  • Uninitialized variables are 0 (numeric) or "" (string)
  • Variables created on first use
  • All variables are global except function parameters
  • -v flag passes variables to AWK program

Local and global variable conventions

Global variables persist across functions, while function parameters and extra parameters are local.

Code
Terminal window
# Global variables (accessible everywhere)
awk 'BEGIN { global_var = 10 }
function test() { return global_var * 2 }
BEGIN { print test() }' # 20
# Function parameters are local
awk 'function f(x) {
y = 100 # y is global (created inside function)
return x + y
}
BEGIN { x = 1; y = 2; print f(5); print x, y }' # 105, 1 100
# Local variables (extra parameters)
awk 'function f(x, y, local1, local2) {
local1 = 10
local2 = 20
return x + y + local1 + local2
}
BEGIN { print f(1, 2) }' # 33
# Array variables
awk 'BEGIN {
arr[1] = "one"
arr[2] = "two"
for (i in arr) print i, arr[i]
}'
Execution
Terminal window
awk 'BEGIN { total = 0; for (i = 1; i <= 3; i++) total += i; print "Sum 1-3: " total }'
Output
Terminal window
Sum 1-3: 6
  • Extra function parameters after named parameters act as local variables
  • Global variables accessible throughout program
  • By convention, locals are declared as extra parameters, set off with extra spaces in the signature

Control Flow

Conditional statements, loops, and flow control

If-Else Statements

Conditional execution with if, else, and else if

If-else statements

If-else statements allow conditional execution based on boolean expressions.

Code
Terminal window
# Basic if statement
awk '{
if ($1 > 50)
print "Large: " $1
}' file
# if-else statement
awk '{
if ($1 > 50)
print "Large"
else
print "Small"
}' file
# Multiple conditions with else-if
awk '{
if ($1 > 100)
print "Very large"
else if ($1 > 50)
print "Large"
else if ($1 > 10)
print "Medium"
else
print "Small"
}' file
# Nested conditions
awk '{
if ($1 > 50) {
if ($2 == "active")
print "Active and large"
else
print "Inactive and large"
}
}' file
Execution
Terminal window
echo -e "apple 75\nbanana 30" | awk '{ if ($2 > 50) print $1 " is popular"; else print $1 " is unpopular" }'
Output
Terminal window
apple is popular
banana is unpopular
  • Condition must evaluate to true (non-zero) or false (zero)
  • Braces required for multiple statements in block
  • String equality uses == and !=; regex matching uses ~ and !~

Ternary operator

Ternary operator provides compact conditional value selection.

Code
Terminal window
# Ternary operator: condition ? true_value : false_value
awk '{ print ($1 > 50) ? "large" : "small" }' file
# Nested ternary
awk '{
status = ($1 > 100) ? "very large" :
($1 > 50) ? "large" :
"small"
print $1 " is " status
}' file
# Ternary in variable assignment
awk '{
message = (NR == 1) ? "First line" : "Not first"
print message
}' file
# Ternary with string values
awk '{
result = ($1 ~ /^[0-9]+$/) ? "number" : "text"
print result
}' file
Execution
Terminal window
echo -e "5\n150\n75" | awk '{ print $1 " is " (($1 > 100) ? "large" : "small") }'
Output
Terminal window
5 is small
150 is large
75 is small
  • Ternary operator: condition ? value_if_true : value_if_false
  • Useful for simple conditional assignments
  • Can be nested for multiple conditions

Loops (For, While, Do-While)

Loop structures for iterating over data

For loops

For loops iterate with counter (C-style) or iterate array keys (for-in).

Code
Terminal window
# C-style for loop
awk 'BEGIN {
for (i = 1; i <= 5; i++)
print i
}'
# For loop with array
awk 'BEGIN {
arr[1] = "apple"
arr[2] = "banana"
arr[3] = "cherry"
for (i = 1; i <= 3; i++)
print arr[i]
}'
# For-in loop (iterate array keys)
awk 'BEGIN {
data["name"] = "John"
data["age"] = "30"
data["city"] = "NYC"
for (key in data)
print key ": " data[key]
}'
# For loop with field iteration
awk '{
for (i = 1; i <= NF; i++)
print "Field " i ": " $i
}' file
Execution
Terminal window
awk 'BEGIN { for (i=1; i<=3; i++) for (j=1; j<=3; j++) print i, j }'
Output
Terminal window
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
  • For-in loop iterates over array keys in arbitrary order
  • C-style for has init, condition, and update
  • Empty for (;;) creates infinite loop

While and do-while loops

While loops execute while condition is true; do-while always executes at least once.

Code
Terminal window
# While loop
awk 'BEGIN {
i = 1
while (i <= 5) {
print i
i++
}
}'
# While loop reading from file
awk '{
i = 1
while (i <= NF) {
print "Field " i ": " $i
i++
}
}' file
# Do-while loop (always runs at least once)
awk 'BEGIN {
i = 1
do {
print i
i++
} while (i <= 3)
}'
# While loop with break
awk 'BEGIN {
i = 1
while (i <= 10) {
if (i == 5) break
print i
i++
}
}'
Execution
Terminal window
awk 'BEGIN { i=1; while (i<=3) { print "i=" i; i++ } }'
Output
Terminal window
i=1
i=2
i=3
  • While checks condition before executing
  • Do-while checks condition after executing
  • Both support break and continue statements

Flow Control (Break, Continue, Next)

Control loop and program flow with break, continue, next, and exit

Break and continue

Break exits loops immediately, continue skips to next iteration.

Code
Terminal window
# Break - exit loop immediately
awk 'BEGIN {
for (i = 1; i <= 10; i++) {
if (i == 5) break
print i
}
}' # Output: 1 2 3 4
# Continue - skip to next iteration
awk 'BEGIN {
for (i = 1; i <= 5; i++) {
if (i == 3) continue
print i
}
}' # Output: 1 2 4 5
# Break from nested loop
awk 'BEGIN {
for (i = 1; i <= 3; i++) {
for (j = 1; j <= 3; j++) {
if (j == 2) break
print i, j
}
}
}'
Execution
Terminal window
awk 'BEGIN { for(i=1; i<=5; i++) { if(i==3) continue; print i } }'
Output
Terminal window
1
2
4
5
  • Break only exits innermost loop
  • Continue skips rest of loop body and goes to next iteration

Next, nextfile, and exit

Next skips to next line, nextfile skips remaining lines in file, exit terminates program.

Code
Terminal window
# next - skip to next input line
awk '
{
if ($1 == "skip") next
print "Processing: " $0
}' file
# nextfile - skip to next file
awk '
{
if ($1 == "EOF") nextfile
print $0
}' file1 file2
# exit - terminate program
awk '
{
if (NR > 10) exit
print $0
}' largefile
# exit with status code
awk 'BEGIN { exit 0 }' # Success
awk 'BEGIN { exit 1 }' # Error
# exit runs END block
awk 'BEGIN { exit } END { print "Cleanup" }'
Execution
Terminal window
echo -e "a\nb\nc\nd" | awk 'NR==2 { next } { print }'
Output
Terminal window
a
c
d
  • Next applies to main pattern-action block
  • Nextfile skips to next file in ARGV
  • Exit runs END blocks before terminating

Built-in Functions

String, mathematical, and array functions

String Functions

Functions for string manipulation and processing

String length and substring

Length, substr, and index extract string information and substrings.

Code
Terminal window
# length(str) - string length
awk 'BEGIN { print length("hello") }' # 5
awk '{ print NF, length($1) }' file # Fields and field length
# substr(str, start, len) - extract substring
awk 'BEGIN { print substr("hello world", 7) }' # world
awk 'BEGIN { print substr("hello world", 1, 5) }' # hello
awk '{ print substr($0, 1, 10) }' file # First 10 chars of line
# index(str, substr) - find position of substring
awk 'BEGIN { print index("hello world", "world") }' # 7
awk '{ print index($0, "error") }' # Find "error" position
# Practical example: strip a 4-character suffix (e.g. ".com") from an email domain
awk -F'@' '{ domain = substr($2, 1, length($2)-4); print domain }' emails.txt
Execution
Terminal window
awk 'BEGIN { s="hello"; print "len=" length(s), "sub=" substr(s, 2, 3), "idx=" index(s, "ll") }'
Output
Terminal window
len=5 sub=ell idx=3
  • length() returns character count
  • substr starts at 1 (not 0)
  • index() returns 1-based position or 0 if not found
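
One string function the list above omits is split(), which breaks a string into an array on a separator (a string or regex) and returns the number of pieces. A sketch:

```shell
awk 'BEGIN {
n = split("2025-02-28", parts, "-")
print "pieces:", n
print "year:", parts[1], "day:", parts[3]
}'
```

split() is the usual way to re-parse a single field that has its own internal delimiter.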

String replacement and formatting

Sub and gsub replace patterns; match tests for matches; toupper/tolower change case.

Code
Terminal window
# sub(regex, repl, target) - replace first match
awk 'BEGIN { s="hello hello"; sub(/hello/, "hi", s); print s }' # hi hello
awk '{ sub(/old/, "new"); print }' file # Replace first occurrence
# gsub(regex, repl, target) - replace all matches
awk 'BEGIN { s="hello hello"; gsub(/hello/, "hi", s); print s }' # hi hi
awk '{ gsub(/ +/, " "); print }' file # Normalize spaces
# match(str, regex) - test if matches and set RSTART, RLENGTH
awk 'BEGIN {
if (match("hello world", /wor/))
print "Found at " RSTART " length " RLENGTH
}'
# sprintf(format, ...) - formatted string
awk 'BEGIN { printf "%s=%d\n", "count", 42 }'
awk 'BEGIN { s = sprintf("%.2f", 3.14159); print s }' # 3.14
# tolower() and toupper()
awk 'BEGIN { print tolower("HELLO") }' # hello
awk 'BEGIN { print toupper("hello") }' # HELLO
Execution
Terminal window
awk 'BEGIN { s="hello world"; gsub(/l/, "L", s); print s }'
Output
Terminal window
heLLo worLd
  • sub replaces first occurrence, gsub replaces all
  • If target omitted, uses $0
  • match sets RSTART and RLENGTH variables
  • sprintf formats strings without printing

Mathematical Functions

Arithmetic and trigonometric functions

Basic math functions

Math functions perform calculations on numbers.

Code
Terminal window
# int(x) - integer part
awk 'BEGIN { print int(3.7) }' # 3
# sqrt(x) - square root
awk 'BEGIN { print sqrt(16) }' # 4
# sin(x), cos(x), atan2(y,x) - trigonometric
awk 'BEGIN { print sin(0), cos(0) }' # 0 1
awk 'BEGIN {
pi = atan2(0, -1)
print "Pi = " pi
print "Sin(pi/2) = " sin(pi/2)
}'
# exp(x) - e^x
awk 'BEGIN { print exp(1) }' # 2.71828...
# log(x) - natural logarithm
awk 'BEGIN { print log(2.71828) }' # 1
# Practical: Calculate percentage
awk 'BEGIN { print (75/100) * 100 "%"}' # 75%
Execution
Terminal window
awk 'BEGIN { print "sqrt(25)=" sqrt(25), "int(3.99)=" int(3.99), "exp(1)=" exp(1) }'
Output
Terminal window
sqrt(25)=5 int(3.99)=3 exp(1)=2.71828
  • int() truncates toward zero
  • Trigonometric functions use radians
  • exp(1) is approximately e
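Since the trigonometric functions work in radians, degree inputs need the standard conversion rad = deg * pi / 180. A sketch:

```shell
# Convert degrees to radians before calling sin(); pi derived via atan2(0, -1)
awk 'BEGIN { pi = atan2(0, -1); printf "sin(90 deg) = %.0f\n", sin(90 * pi / 180) }'
# sin(90 deg) = 1
```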

Random numbers

Random functions generate random numbers for simulations and sampling.

Code
Terminal window
# rand() - random number 0 to 1
awk 'BEGIN {
  for (i = 1; i <= 3; i++)
    print rand()
}'
# Random integer 1-10
awk 'BEGIN {
  for (i = 1; i <= 5; i++)
    print int(rand() * 10) + 1
}'
# srand(seed) - seed random generator
awk 'BEGIN {
  srand(123)
  print rand()
}'
# srand() with no seed uses current time
awk 'BEGIN {
  srand()
  print rand()
}'
Execution
Terminal window
awk 'BEGIN { srand(42); for(i=1; i<=3; i++) print int(rand()*10) }'
Output
Terminal window
4
8
2
  • rand() returns float between 0 and 1
  • srand() seeds with current time if no seed given
  • Seeding enables reproducible random sequences
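One practical use of a fixed seed is a repeatable random sample of input lines; the exact lines chosen vary by AWK implementation, but the same seed always yields the same sample. A sketch:

```shell
# Roughly 30% sample of lines; srand(42) makes the selection reproducible
seq 1 20 | awk 'BEGIN { srand(42) } rand() < 0.3'
```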

Arrays and Array Functions

Creating and manipulating arrays

Array basics

Arrays store multiple values with numeric or string indices.

Code
Terminal window
# Create and access array elements
awk 'BEGIN {
  arr[1] = "one"
  arr[2] = "two"
  arr[3] = "three"
  for (i = 1; i <= 3; i++)
    print arr[i]
}'
# Associative array (string keys)
awk 'BEGIN {
  person["name"] = "John"
  person["age"] = "30"
  person["city"] = "NYC"
  for (key in person)
    print key ": " person[key]
}'
# Array from file
awk '{ arr[NR] = $0 } END { for (i in arr) print i ": " arr[i] }' file
# Check if key exists
awk 'BEGIN {
  arr["key"] = "value"
  if ("key" in arr) print "Found"
  if (!("missing" in arr)) print "Not there"
}'
Execution
Terminal window
awk 'BEGIN { a[1]=10; a[2]=20; a[3]=30; for(i in a) sum+=a[i]; print "Sum=" sum }'
Output
Terminal window
Sum=60
  • Array indices can be numbers or strings
  • For-in loop iterates in arbitrary order
  • in operator tests key existence
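A common pitfall worth noting: merely referencing a[key] in an expression creates that key with an empty value, while the `in` operator tests without creating. A sketch:

```shell
# Referencing a["x"] silently creates the key; `in` does not
awk 'BEGIN {
  if (a["x"] == "") print "x created by reference:", ("x" in a)
  print "y created by in-test:", ("y" in a)
}'
# x created by reference: 1
# y created by in-test: 0
```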

Array operations

delete removes elements, split() builds arrays from strings, and composite keys simulate multi-dimensional arrays.

Code
Terminal window
# Delete array element
awk 'BEGIN {
  a[1] = "one"; a[2] = "two"
  delete a[1]
  for (i in a) print i, a[i]
}' # Output: 2 two
# Delete entire array
awk 'BEGIN {
  a[1] = 1; a[2] = 2
  delete a
  for (i in a) print i
}' # No output
# Multi-dimensional arrays
awk 'BEGIN {
  a[1,1] = "top-left"
  a[1,2] = "top-right"
  a[2,1] = "bot-left"
  for (key in a) print key ": " a[key]
}'
# split() function creates array
awk 'BEGIN {
  count = split("a:b:c:d", arr, ":")
  for (i = 1; i <= count; i++)
    print arr[i]
}'
Execution
Terminal window
awk 'BEGIN { n=split("one,two,three", a, ","); for(i=1;i<=n;i++) print a[i] }'
Output
Terminal window
one
two
three
  • delete removes single element or entire array
  • split returns count of elements created
  • Multi-dimensional: a[i,j,k] key is i\034j\034k (SUBSEP)
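Since a[i,j] keys are really single strings joined by SUBSEP, the individual indices can be recovered by splitting the key on SUBSEP. A sketch:

```shell
# split() on SUBSEP turns a composite key back into its components
awk 'BEGIN {
  a[1,2] = "cell"
  for (key in a) {
    split(key, idx, SUBSEP)
    print "row=" idx[1], "col=" idx[2]
  }
}'
# row=1 col=2
```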

Advanced Features

User-defined functions, file handling, and advanced techniques

User-Defined Functions

Creating and using custom functions

Defining and calling functions

Functions encapsulate logic for reuse and modularity.

Code
Terminal window
# Basic function definition and example
awk 'function add(a, b) {
  return a + b
}
BEGIN {
  print "5 + 3 = " add(5, 3)
}'
# Function with multiple statements
awk 'function greet(name) {
  greeting = "Hello, " name "!"
  return greeting
}
BEGIN { print greet("Alice") }'
# Function without return statement
awk 'function print_info(x) {
  print "Value: " x
}
BEGIN { print_info(42) }'
# Function with local variables (extra parameters)
awk 'function calculate(x, y,    local_result) {
  local_result = x * y + 10
  return local_result
}
BEGIN { print calculate(3, 4) }'
Execution
Terminal window
awk 'function double(x) { return x * 2 } BEGIN { print double(21) }'
Output
Terminal window
42
  • Functions must be defined before use
  • Parameters are passed by value
  • Extra parameters act as local variables
  • Return statement optional (returns 0/"" if omitted)

Advanced function patterns

Functions support recursion, array parameters, and complex logic patterns.

Code
Terminal window
# Recursive function (factorial)
awk 'function fact(n) {
  if (n <= 1) return 1
  return n * fact(n - 1)
}
BEGIN { print "5! = " fact(5) }'
# Function modifying array parameter (arrays passed by reference)
awk 'function fill_array(arr, n,    i) {
  for (i = 1; i <= n; i++)
    arr[i] = i * i
}
BEGIN {
  fill_array(data, 5)
  for (i in data) print i, data[i]
}'
# Helper functions for validation
awk 'function is_number(s) { return s ~ /^[0-9]+$/ }
function is_empty(s) { return length(s) == 0 }
BEGIN {
  print "5 is_number: " is_number("5")
  print "empty is_empty: " is_empty("")
}'
Execution
Terminal window
awk 'function max(a, b) { return (a > b) ? a : b } BEGIN { print max(15, 23) }'
Output
Terminal window
23
  • Recursive functions must have base case
  • Arrays passed by reference (modifications persist)
  • Scalars passed by value (modifications local)

Advanced Input and Output

Multiple file handling, redirection, and pipes

Processing multiple files

Process multiple files with automatic file tracking.

Code
Terminal window
# FILENAME variable shows current file
awk '{ print FILENAME ":" NR ":" $0 }' file1 file2
# FNR is line number within each file
awk 'FNR == 1 { print "Starting " FILENAME }
{ print FILENAME ":" FNR ":" $0 }' file1 file2
# ARGIND (gawk) - current argument index
awk '{ print ARGIND, FILENAME, NR, FNR }' file1 file2
# Process specific files only
awk 'FILENAME == "file1" { print }' file1 file2
# Skip files based on pattern
awk 'FNR == 1 && FILENAME ~ /skip/ { nextfile }
{ print }' file*
Execution
Terminal window
echo "a" > /tmp/f1 && echo "b" > /tmp/f2 && awk '{ print FILENAME, FNR, $0 }' /tmp/f1 /tmp/f2
Output
Terminal window
/tmp/f1 1 a
/tmp/f2 1 b
  • NR tracks total lines across all files
  • FNR resets for each file
  • FILENAME shows current filename
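Because NR keeps counting while FNR resets, the condition FNR == NR is true only while reading the first file; this is the classic idiom for joining two files via a lookup table. A sketch (the /tmp paths and sample data are illustrative):

```shell
# Build a lookup table from the first file, then apply it to the second
printf 'a apple\nb banana\n' > /tmp/names.txt
printf 'a 1\nb 2\n' > /tmp/counts.txt
awk 'FNR == NR { name[$1] = $2; next } { print name[$1], $2 }' /tmp/names.txt /tmp/counts.txt
# apple 1
# banana 2
```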

Output redirection and pipes

Redirect output to files or pipes; getline reads additional lines.

Code
Terminal window
# Redirect output to file
awk '{ print > "output.txt" }' input.txt # Overwrite
awk '{ print >> "output.txt" }' input.txt # Append
# Redirect specific pattern to file
awk '/error/ { print > "errors.log" }
!/error/ { print > "clean.log" }' logfile
# Pipe output to command
awk '{ print | "sort" }' unsorted.txt
awk '{ print | "mail -s report user@host" }' data
# Close file or pipe
awk '{
  print > "output.txt"
  close("output.txt")  # Flush and close file
}' input
# getline - read next line
awk '{
  print "Current: " $0
  getline next_line
  print "Next: " next_line
}' file
Execution
Terminal window
echo -e "hello\nworld" | awk '{ print | "sort -r" }'
Output
Terminal window
world
hello
  • > creates new file, >> appends to file
  • | pipes output to command
  • close() flushes file and allows reopening
  • getline reads next input line into a variable
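getline also reads from a command through a pipe, which lets an AWK script consume external command output line by line; close() the command string when done. A sketch:

```shell
# "cmd | getline var" reads one line of the command's output per call
awk 'BEGIN {
  cmd = "echo one; echo two"
  while ((cmd | getline line) > 0)
    print "got:", line
  close(cmd)
}'
# got: one
# got: two
```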

Practical Examples

Real-world use cases and data processing scenarios

Common Text Processing Tasks

Practical examples of typical AWK operations

Extracting and reformatting data

Common data extraction and transformation operations.

Code
Terminal window
# Extract specific columns from CSV
awk -F',' '{ print $1, $3 }' data.csv
# Convert CSV to TSV (tab-separated)
awk -F',' -v OFS='\t' '{ print $1, $2, $3 }' data.csv
# Extract IP address from log files
awk '{ print $1 }' /var/log/apache2/access.log
# Count occurrences of each word
awk '{ for(i=1; i<=NF; i++) count[$i]++ }
END { for (w in count) print w, count[w] }' file.txt
# Filter and print lines with specific pattern
awk '/^ERROR|^WARN/ { print }' system.log
# Sum column of numbers
awk '{ sum += $1 } END { print "Total: " sum }' numbers.txt
Execution
Terminal window
echo -e "a:1\nb:2\nc:3" | awk -F':' '{ sum+=$2 } END { print "Sum=" sum }'
Output
Terminal window
Sum=6
  • awk excels at column extraction and reformatting
  • Easy to change field separators for format conversion
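Setting FS and OFS together allows reordering columns and changing the delimiter in a single pass. A sketch (the log line is illustrative):

```shell
# -F sets the input separator; OFS is applied when fields are printed with commas
echo "2024-01-15,INFO,startup" | awk -F',' -v OFS=' | ' '{ print $2, $1, $3 }'
# INFO | 2024-01-15 | startup
```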

Log processing and analysis

Processing logs to extract metrics and statistics.

Code
Terminal window
# Count requests by hour from Apache log
awk '{
  time = $4
  sub(/\[/, "", time)
  gsub(/:/, " ", time)
  hour = substr(time, 13, 2)
  count[hour]++
}
END {
  for (h in count) print h ":00 - " count[h] " requests"
}' /var/log/apache2/access.log
# Extract and count HTTP status codes
awk '{ print $(NF-1) }' access.log | sort | uniq -c
# Find top users by request count
awk '{ users[$1]++ }
END {
  for (u in users) print u, users[u]
}' access.log | sort -k2 -rn | head -10
# Calculate average response time
awk '{
  time = $(NF)
  sum += time
  count++
}
END {
  print "Average: " sum / count " ms"
}' response-times.log
Execution
Terminal window
echo -e "error\nerror\nwarn\nerror" | awk '{ count[$1]++ } END { for (t in count) print t, count[t] }'
Output
Terminal window
error 3
warn 1
  • AWK ideal for parsing structured log formats
  • Pattern matching finds relevant log lines
  • Aggregation generates summary statistics

Advanced Data Processing

Complex data transformations and analysis

Data validation and cleaning

Data cleaning operations remove duplicates, whitespace, and invalid records.

Code
Terminal window
# Remove duplicate lines (preserve order)
awk '!seen[$0]++' file
# Remove leading/trailing whitespace
awk '{ gsub(/^[ \t]+|[ \t]+$/, ""); print }' file
# Validate email format
awk '/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/ { print }' emails.txt
# Check if all fields are numeric
awk '{ for(i=1; i<=NF; i++) if ($i !~ /^[0-9.]+$/) next; print }' data.txt
# Remove lines with specific patterns
awk '!/^#/ && !/^$/ && !/^;/ { print }' config.txt
# Normalize whitespace (multiple spaces to single)
awk '{ gsub(/ +/, " "); $1=$1; print }' file
Execution
Terminal window
echo -e "a\nb\na\nc\nb" | awk '!seen[$0]++'
Output
Terminal window
a
b
c
  • !seen[$0]++ is elegant duplicate removal pattern
  • gsub with start/end anchors trims whitespace
  • Email validation uses complex regex pattern

Statistical calculations and reports

Statistical operations including mean, min/max, and formatted report generation.

Code
Terminal window
# Calculate the mean (values also stored for further statistics)
awk '{
  sum += $1; count++
  arr[count] = $1
}
END {
  printf "Mean: %.2f\n", sum/count
}' numbers.txt
# Find min and max values
awk 'NR==1 { min=$1; max=$1; next }
{ if ($1 < min) min=$1; if ($1 > max) max=$1 }
END { print "Min: " min ", Max: " max }' numbers.txt
# Generate formatted report
awk 'BEGIN {
  printf "%-20s %-10s %-10s\n", "Name", "Age", "Score"
  printf "%-20s %-10s %-10s\n", "----", "---", "-----"
}
{
  total += $3; count++
  printf "%-20s %-10d %-10d\n", $1, $2, $3
}
END {
  printf "%-20s %-10s %-10.1f\n", "Average", "", total/count
}' data.txt
# Standard deviation calculation
awk '{
  arr[NR] = $1
  sum += $1
}
END {
  mean = sum / NR
  for (i = 1; i <= NR; i++)
    dev += (arr[i] - mean)^2
  stddev = sqrt(dev / NR)
  print "Mean: " mean ", StdDev: " stddev
}' data.txt
Execution
Terminal window
echo -e "5\n10\n15\n20\n25" | awk '{ sum+=$1; arr[NR]=$1 } END { print "Sum=" sum " Avg=" sum/NR " Count=" NR }'
Output
Terminal window
Sum=75 Avg=15 Count=5
  • Array needed for min/max or median calculations
  • printf enables precise formatting for reports
  • Mathematical functions enable statistical analysis
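The median needs sorted input; piping through sort -n first keeps the AWK portable (gawk's asort() would also work but is not POSIX). A sketch with illustrative data:

```shell
# Store sorted values by line number, then pick the middle element in END
printf '5\n1\n9\n3\n7\n' | sort -n | awk '{ a[NR] = $1 }
END {
  if (NR % 2) print "Median:", a[(NR + 1) / 2]
  else        print "Median:", (a[NR/2] + a[NR/2 + 1]) / 2
}'
# Median: 5
```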