AWK
AWK is a powerful text processing language for pattern scanning and data extraction. It is essential for processing text files, extracting columns, and performing calculations on text data.
Getting Started
Introduction to AWK and basic syntax fundamentals
What is AWK
Understanding AWK and its use cases for text processing
AWK overview and capabilities
AWK is a powerful tool for text processing, data extraction, and pattern matching across files and streams.
# AWK is a text processing language designed for pattern scanning
# and manipulation of data. Key features:
# - Process text files line by line
# - Extract and manipulate columns
# - Perform calculations on data
# - Filter records based on patterns
# - Generate reports and formatted output
# AWK stands for: Aho, Weinberger, Kernighan (creators)
# Variants: awk (original), gawk (GNU AWK), mawk (minimal AWK)
which awk && awk --version
/usr/bin/awk
GNU Awk 5.1.0, API: 3.1
- AWK reads input line by line automatically
- Supports regular expressions for pattern matching
- Built-in variables track lines, fields, and more
- Can be used interactively or in shell scripts
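To make these points concrete, a first runnable one-liner using nothing beyond standard awk:

```shell
# Number each input line: NR is the current record (line) number
printf 'alpha\nbeta\n' | awk '{ print NR, $0 }'
# 1 alpha
# 2 beta
```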
AWK vs other text processing tools
AWK excels at processing structured data and performing transformations that combine pattern matching with data manipulation.
# Compare AWK with other text processing tools:
# grep: Search file patterns (matches lines)
# sed: Stream editor (find and replace, transformations)
# awk: Full programming language (matching + calculations)
# AWK is best when you need to:
# - Extract or reorganize columns
# - Perform calculations on text data
# - Process structured text (CSV, logs, reports)
# - Apply conditional logic and complex transformations
echo -e "name,age\nJohn,30\nJane,25" | awk -F',' '{print $1 " is " $2 " years old"}'
name is age years old
John is 30 years old
Jane is 25 years old
- AWK is a Turing-complete programming language
- More powerful than grep or sed for complex text processing
- Easier syntax than writing shell scripts for text processing
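As a sketch of that comparison, the same filtering task in all three tools; grep and sed stop at selecting lines, while awk can also act on fields:

```shell
data='ok 1
err 2
ok 3'
echo "$data" | grep 'err'                      # select lines:  err 2
echo "$data" | sed -n '/err/p'                 # same with sed: err 2
echo "$data" | awk '/err/ { print $2 * 10 }'   # select AND compute: 20
```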
Basic Syntax and Structure
Understanding AWK program structure and execution flow
AWK program structure
Shows the three parts of an AWK program - BEGIN block (runs once before input), main block (runs for each line), and END block (runs once after all input).
# Basic AWK syntax structure:
# awk 'pattern { action }' input-file
# Program structure with BEGIN, pattern/action, and END:
awk '
BEGIN {
    # Initialization, runs before processing input
    print "Starting processing..."
}
pattern {
    # Main processing, runs for each line
    # pattern can be regex, expression, or range
}
END {
    # Finalization, runs after all input processed
    print "Processing complete"
}' input-file
echo -e "apple\nbanana\ncherry" | awk 'BEGIN { print "Fruits:" } { print "- " $0 } END { print "Done" }'
Fruits:
- apple
- banana
- cherry
Done
- BEGIN and END blocks are optional
- Multiple patterns can match the same line
- Pattern can be omitted (all lines match)
- Action can be omitted (default is print)
Inline AWK programs and scripts
Different ways to invoke AWK - inline, from file, with arguments, and from pipes.
# Method 1: Inline program with single quotes
awk '{ print NR, $0 }' file.txt

# Method 2: Program from file
awk -f script.awk file.txt

# Method 3: Multiple input files
awk '{ print FILENAME, NR, $0 }' file1.txt file2.txt

# Method 4: Program with variables
awk -v var=value '{ print var, $0 }' file.txt

# Method 5: Pipe input directly
echo "data" | awk '{ print toupper($0) }'
echo -e "line1\nline2" | awk '{ print NR, $0 }'
1 line1
2 line2
- Single quotes protect AWK syntax from shell interpretation
- -f flag reads program from file
- -v flag passes variables to AWK
- AWK can read from multiple files sequentially
Installation and Setup
Installing AWK and verifying functionality
Install AWK on Linux systems
Installation of GNU AWK (gawk) on various Linux distributions and macOS.
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install -y gawk

# CentOS/RHEL
sudo yum install -y gawk

# macOS
brew install gawk

# Verify installation
awk --version
gawk --version | head -1
GNU Awk 5.1.0, API: 3.1 (GNU MPFR 4.1.0, GNU MP 6.2.1)
- Most systems have awk or gawk installed by default
- gawk (GNU AWK) is the most feature-complete implementation
- mawk is faster but supports fewer features
- Compatibility: gawk includes all mawk features
Creating AWK script file
Creating and using AWK script files for more complex programs.
# Create a simple AWK script file
cat > process.awk << 'EOF'
BEGIN {
    print "Processing file..."
}
{
    total += $1
    count++
}
END {
    print "Total: " total
    print "Average: " (count > 0 ? total / count : 0)
}
EOF

# Run the script
awk -f process.awk numbers.txt

# Make script executable (requires shebang)
echo '#!/usr/bin/awk -f' | cat - process.awk > /tmp/process
chmod +x /tmp/process
/tmp/process numbers.txt
cat process.awk
BEGIN {
    print "Processing file..."
}
{
    total += $1
    count++
}
END {
    print "Total: " total
    print "Average: " (count > 0 ? total / count : 0)
}
- Use -f flag to load program from file
- Script files can be made executable with proper shebang
- Shebang line: #!/usr/bin/awk -f
- Scripts are useful for complex logic and reusability
Field Processing
Working with fields, separators, and field manipulation
Field Variables and NF
Understanding field access and manipulation with field variables
Accessing fields in AWK
Field variables allow accessing individual fields by number, with $NF accessing the last field dynamically.
# Field variables reference:
# $0 = entire line
# $1 = first field
# $2 = second field
# ... $NF = last field
# NF = total number of fields in the line
# Examples:
echo "John Doe 30 Engineer" | awk '{ print $1 }'       # John
echo "John Doe 30 Engineer" | awk '{ print $NF }'      # Engineer
echo "John Doe 30 Engineer" | awk '{ print $(NF-1) }'  # 30
echo "John Doe 30 Engineer" | awk '{ print NF }'       # 4
echo "apple banana cherry" | awk '{ print "Last field:", $NF; print "Field count:", NF }'
Last field: cherry
Field count: 3
- Fields are separated by whitespace by default (FS)
- $0 is the entire line
- Use $(NF-1) for second-to-last field, $(NF-2) for third-to-last, etc.
- NF changes if you modify fields
Modifying fields
Modifying, adding, and removing fields by direct assignment or changing NF value.
# Change field values
echo "apple 5 basket" | awk '{ $2 = 10; print }'        # apple 10 basket

# Add fields
echo "apple basket" | awk '{ $(NF+1) = "10"; print }'   # apple basket 10

# Reconstruct line with modified fields
echo "a b c" | awk '{ $2 = "B"; print $0 }'             # a B c

# Remove fields (truncate)
echo "a b c d e" | awk '{ NF = 3; print }'              # a b c
echo "test 100 active" | awk '{ $2 = 200; NF = 2; print }'
test 200
- Modifying a field updates $0 automatically
- Increasing NF adds empty fields
- Decreasing NF removes fields from the end
- Output uses OFS to join modified fields
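The last point deserves a demonstration: OFS only takes effect when awk rebuilds $0, which the `$1 = $1` idiom forces:

```shell
# Unmodified line: printed as-is, OFS ignored
echo "a b c" | awk -v OFS=',' '{ print }'           # a b c
# Assigning any field rebuilds $0 using OFS
echo "a b c" | awk -v OFS=',' '{ $1 = $1; print }'  # a,b,c
```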
Field Separators
Setting and using different field separator patterns
Setting field separators
Custom field separators allow parsing different delimited formats like CSV, TSV, and colon-separated data.
# FS (Field Separator) - what separates fields in input
# OFS (Output Field Separator) - what separates fields in output

# Default FS is whitespace. Set custom FS with -F flag:
echo "apple,banana,cherry" | awk -F',' '{ print $2 }'   # banana

# Or use BEGIN to set FS
echo "apple:banana:cherry" | awk 'BEGIN { FS=":" } { print $2 }'   # banana

# Set both FS and OFS
echo "apple-banana-cherry" | awk -F'-' -v OFS=',' '{ print $1, $2, $3 }'
# apple,banana,cherry

# Use regex as FS (multiple delimiters)
echo "apple,banana;cherry" | awk -F'[,;]' '{ print $2 }'   # banana
echo "name:age:city" | awk -F':' '{ print "Name=" $1 ", Age=" $2 }'
Name=name, Age=age
- Default FS matches one or more whitespace characters
- FS can be a single character or regex pattern
- -F flag overrides default FS
- BEGIN block can set FS for variable input
Processing CSV and complex formats
Different separators for input parsing and output formatting with FS and OFS.
# Process CSV with quoted fields
# (note: a plain -F',' mis-splits fields that contain commas inside quotes;
# this only cleans up the quote characters)
echo 'name,age,city
"John Smith",30,"New York"
"Jane Doe",25,"Los Angeles"' | awk -F',' 'NR > 1 {
    gsub(/"/, "", $1)    # Remove quotes
    gsub(/ +$/, "", $1)  # Remove trailing spaces
    print $1 " is " $2 " years old"
}'

# Tab-separated values
echo -e "col1\tcol2\tcol3" | awk -F'\t' '{ print $2 }'

# Multi-character delimiter
echo "apple::banana::cherry" | awk -F'::' '{ print $1 }'
echo "a:b:c" | awk -F':' -v OFS='-' '{ print $1, $2, $3 }'
a-b-c
- The -v OFS flag sets the output field separator
- FS regex can handle multiple delimiter types
- gsub useful for cleaning CSV quote characters
- Use -F'\t' for tab-delimited files
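For CSV with commas inside quoted fields, a plain -F',' mis-splits the line. GNU AWK's FPAT variable (gawk-only, not in mawk or POSIX awk) defines what a field *is* rather than what separates fields; a sketch using the pattern from the gawk manual:

```shell
# gawk-only: each field is either non-comma text or a quoted string
echo '"Smith, John",30,"New York"' |
gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" } { print $1 }'
# "Smith, John"
```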
Field Manipulation and Reconstruction
Transform and reorganize field data
Rearranging and extracting fields
Extracting and rearranging fields to create custom output formats.
# Extract and rearrange columns
echo "John Doe john@example.com 30" | awk '{ print $3, $2, $1 }'
# john@example.com Doe John

# Extract specific fields
echo "apple 100 USD red" | awk '{ print "Item: " $1 ", Price: " $2 " " $3 }'
# Item: apple, Price: 100 USD

# Concatenate fields
echo "John Doe" | awk '{ full_name = $1 " " $2; print "Hello, " full_name }'
# Hello, John Doe

# Extract substring from field
echo "example@domain.com" | awk -F'@' '{ print "User: " $1 }'
# User: example
echo "2025-02-28" | awk -F'-' '{ print "Year: " $1 ", Month: " $2 ", Day: " $3 }'
Year: 2025, Month: 02, Day: 28
- Fields can be rearranged in any order
- Easy to combine fields into new strings
- Using FS and field numbers simplifies parsing
Computing values from fields
Performing calculations and transformations on field values.
# Sum and average fields
echo -e "100\n200\n300" | awk '{ sum += $1 } END { print "Total: " sum ", Avg: " sum/NR }'
# Total: 600, Avg: 200

# Product of fields
echo "5 4 3" | awk '{ print "Product: " $1 * $2 * $3 }'
# Product: 60

# Calculate percentage
echo "75 100" | awk '{ print ($1 / $2 * 100) "%" }'
# 75%

# String operations on fields
echo "javascript PYTHON bash" | awk '{ print toupper($1), tolower($2), tolower($3) }'
# JAVASCRIPT python bash
echo "10 20" | awk '{ print "Sum: " $1 + $2 ", Diff: " $1 - $2 ", Prod: " $1 * $2 }'
Sum: 30, Diff: -10, Prod: 200
- AWK automatically converts fields to numbers when needed
- Can perform arithmetic directly on fields
- String functions can transform field values
Patterns and Matching
Pattern matching and regular expressions in AWK
Pattern Types and Matching
Different pattern types for matching records
AWK pattern types
Different pattern types allow flexible filtering and matching of records.
# Pattern types in AWK:
# 1. Regex patterns
awk '/pattern/ { action }' file    # Match lines with pattern
awk '!/pattern/ { action }' file   # Match lines without pattern

# 2. Expression patterns (boolean)
awk '$1 > 100 { action }' file         # Field comparison
awk 'NR >= 2 && NR <= 5' file          # Line number range
awk '$1 == "active" { action }' file   # Exact match

# 3. BEGIN/END patterns
awk 'BEGIN { action }'   # Before processing input
awk 'END { action }'     # After processing input

# 4. Range patterns
awk '/start/,/end/ { action }' file   # From start to end pattern

# 5. No pattern
awk '{ action }' file   # All lines match
echo -e "apple 50\nbanana 120\ncherry 75" | awk '$2 > 100 { print }'
banana 120
- Patterns determine which lines trigger the action block
- Multiple patterns can target different lines
- Patterns are optional (all lines match by default)
- Expression patterns use standard operators
Regex patterns and matching
Regex patterns filter lines based on pattern matching and operator ~.
# Regex pattern matching
awk '/linux/' file     # Contains "linux"
awk '/^linux/' file    # Starts with "linux"
awk '/linux$/' file    # Ends with "linux"
awk '/[0-9]+/' file    # Contains digits
awk '!/error/' file    # Does not contain "error"

# Regex with field matching (~)
awk '$1 ~ /john/' file    # First field matches regex
awk '$2 !~ /test/' file   # Second field doesn't match

# Case-insensitive matching
awk 'tolower($0) ~ /error/' file   # Case-insensitive search
echo -e "error: issue\nwarning: notice\nerror: problem" | awk '/error/'
error: issue
error: problem
- ~ operator checks if field matches regex
- !~ operator checks if field doesn't match
- Use tolower() for case-insensitive matching
- ^ and $ anchor patterns to start and end
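Anchors matter most when validating a whole field; without them the pattern matches anywhere inside the field:

```shell
# Unanchored: "1234" also contains three consecutive digits, so it matches
printf '123\n1234\nabc\n' | awk '$1 ~ /[0-9][0-9][0-9]/'    # 123 and 1234
# Anchored: the field must be exactly three digits
printf '123\n1234\nabc\n' | awk '$1 ~ /^[0-9][0-9][0-9]$/'  # 123 only
```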
BEGIN and END Blocks
Initialization and finalization with BEGIN and END
BEGIN and END blocks
BEGIN runs before processing input and END runs after, useful for setup, cleanup, and reporting.
# BEGIN block - runs once before processing input
awk 'BEGIN {
    print "Processing started..."
    count = 0
}
{
    count++
}
END {
    print "Total lines: " count
}' file

# Example: Sum numbers and show statistics
awk 'BEGIN { sum = 0; count = 0 }
NR > 1 { sum += $1; count++ }
END {
    avg = (count > 0) ? sum / count : 0
    print "Sum: " sum "\nAverage: " avg
}' data.txt
echo -e "10\n20\n30" | awk 'BEGIN { print "Numbers:" } { print "- " $0 } END { print "Done" }'
Numbers:
- 10
- 20
- 30
Done
- BEGIN block runs even if no input is provided
- END block runs even if no lines match patterns
- Variables initialized in BEGIN persist through all input
Header and summary generation
Using BEGIN and END to create formatted reports with headers, footers, and summary statistics.
# Generate report with header and footer
awk 'BEGIN {
    printf "%-15s %-10s %-10s\n", "Name", "Age", "City"
    printf "%-15s %-10s %-10s\n", "----", "---", "----"
}
{
    printf "%-15s %-10s %-10s\n", $1, $2, $3
}
END {
    print ""
    print "Total records: " NR
}' people.txt

# Calculate running statistics
awk 'BEGIN {
    header = "Line\tValue\tRunSum"
    print header
}
{
    sum += $1
    printf "%d\t%d\t%d\n", NR, $1, sum
}
END {
    print "Final sum: " sum
}' numbers.txt
echo -e "5\n10\n15" | awk 'BEGIN { print "Input values:" } { sum += $1; print NR, $1 } END { print "Total: " sum }'
Input values:
1 5
2 10
3 15
Total: 30
- printf in BEGIN/END creates formatted output
- Variables maintain state across BEGIN, main, and END
- Good pattern for generating summaries and reports
Range and Compound Patterns
Using range patterns and combining conditions
Range patterns
Range patterns select a block of lines from first matching pattern to second matching pattern.
# Range patterns match from first pattern to second pattern
# Syntax: /start/,/end/ { action }

# Print lines between BEGIN and END markers
awk '/BEGIN/,/END/' file

# Process sections of a file
awk '/^Chapter 1/,/^Chapter 2/' book.txt

# Extract blocks between identical markers: a plain range /X/,/X/ collapses
# to single lines (the end test matches the same line the range opens),
# so use a toggle flag instead
awk '/^---$/ { f = !f; next } f' doc.md

# Line number ranges
awk 'NR==10, NR==20 { print NR, $0 }' file
echo -e "start\ndata1\ndata2\nend\nignore" | awk '/start/,/end/ { print }'
start
data1
data2
end
- Range state toggles when start pattern matches
- Range stays true until end pattern matches
- Both start and end lines are included
Compound conditions
Compound conditions combine multiple patterns with logical operators for complex filtering.
# Logical AND (&&) - both conditions must be true
awk '$1 == "error" && $2 > 100 { print }' logs.txt

# Logical OR (||) - either condition can be true
awk '$1 == "error" || $1 == "warning" { print }' logs.txt

# Negation (!) - condition must be false
awk '!($1 == "info") { print }' logs.txt

# Complex expressions
awk '($1 ~ /^ERR/ || $1 ~ /^FAIL/) && NR > 100 { print }' logs.txt

# Multiple conditions on fields
awk '$2 >= 500 && $2 <= 1000 && $3 == "active"' data.txt
echo -e "error 150\nerror 50\nwarning 200" | awk '$1 == "error" && $2 > 100 { print }'
error 150
- && has higher precedence than ||
- Use parentheses for clarity in complex expressions
- Patterns are evaluated left to right
Variables and Operators
Built-in variables, operators, and expressions
Built-in Variables
Understanding AWK's built-in variables
Built-in variables reference
Built-in variables track line numbers, field counts, separators, and filenames.
# NR - Total record number (line number across all files)
awk '{ print NR, $0 }' file

# FNR - File record number (line number within current file)
awk '{ print FNR, FILENAME, $0 }' file1 file2

# NF - Number of fields in current record
awk '{ print "Fields: " NF, "Last field: " $NF }' file

# FS - Field separator (input)
awk 'BEGIN { FS=":" } { print $1 }'

# OFS - Output field separator
awk -v OFS="-" '{ print $1, $2, $3 }' file

# RS - Record separator (usually newline)
awk 'BEGIN { RS=";" } { print NR, $0 }'

# ORS - Output record separator
awk 'BEGIN { ORS=";" } { print $0 }'

# FILENAME - Current filename being processed
awk '{ print FILENAME ":" NR ":" $0 }' file1 file2

# ARGC/ARGV - Command line argument count and values
awk 'BEGIN { print "Args: " ARGC; for (i=0; i<ARGC; i++) print ARGV[i] }'
echo -e "a b c\nd e f g" | awk '{ print "NR=" NR " NF=" NF " Last=" $NF }'
NR=1 NF=3 Last=c
NR=2 NF=4 Last=g
- NR increments with each line across all files
- FNR resets for each new file
- NF changes if you modify fields
- FILENAME shows which file is being processed
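NR and FNR together give the classic two-file idiom: `FNR == NR` is true only while the first file is being read, so it can load a lookup table before the second file is processed. The /tmp paths below are just for the demo:

```shell
# Build a lookup from the first file, apply it to the second (demo files)
printf '1 apple\n2 banana\n' > /tmp/names.txt
printf '2\n1\n' > /tmp/ids.txt
awk 'FNR == NR { name[$1] = $2; next }   # first file: remember id -> name
     { print $1, name[$1] }              # second file: resolve ids
' /tmp/names.txt /tmp/ids.txt
# 2 banana
# 1 apple
```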
Additional built-in variables
Additional built-in variables for environment access, arguments, and match results.
# ENVIRON - Environment variables
awk 'BEGIN { print "User: " ENVIRON["USER"] }'
awk 'BEGIN { print "Home: " ENVIRON["HOME"] }'

# ARGC/ARGV - Arguments
awk 'BEGIN { print ARGC; for (i in ARGV) print i, ARGV[i] }' file1 file2

# SUBSEP - Subscript separator for arrays (set it before creating keys;
# existing keys keep the old separator)
awk 'BEGIN { SUBSEP = "-"; a[1,2] = "val"; for (i in a) print i }'   # 1-2

# RSTART/RLENGTH - From match() function
awk 'BEGIN {
    str = "hello world"
    match(str, /world/)
    print "Start: " RSTART " Length: " RLENGTH
}'
awk 'BEGIN { print "Arguments: " ARGC }' file1 file2
Arguments: 3
- ENVIRON allows accessing environment variables
- ARGC is argument count (includes program name)
- ARGV[0] is awk command, ARGV[1] is first file, etc.
Operators and Expressions
Arithmetic, comparison, logical, and string operators
Operators reference
Complete set of AWK operators for arithmetic, comparison, logical, and string operations.
# ARITHMETIC OPERATORS
# +  (addition)        x + y
# -  (subtraction)     x - y
# *  (multiplication)  x * y
# /  (division)        x / y
# %  (modulo)          x % y
# ^  (exponentiation)  x ^ y

# COMPARISON OPERATORS
# <   less than
# <=  less than or equal
# >   greater than
# >=  greater than or equal
# ==  equal
# !=  not equal

# LOGICAL OPERATORS
# &&  logical AND
# ||  logical OR
# !   logical NOT

# STRING OPERATORS
# (space)  concatenation: "hello" " " "world"
# ~   regex match:     $0 ~ /pattern/
# !~  regex not match: $0 !~ /pattern/

# ASSIGNMENT OPERATORS
# =   assignment
# +=  add and assign
# -=  subtract and assign
# *=  multiply and assign
# /=  divide and assign
# %=  modulo and assign
# ^=  exponentiate and assign
# ++  increment
# --  decrement
echo "10 3" | awk '{ print $1 + $2, $1 - $2, $1 * $2, $1 / $2, $1 % $2, $1 ^ $2 }'
13 7 30 3.33333 1 1000
- String concatenation is implicit (space between values)
- Comparison returns 1 (true) or 0 (false)
- Regex operators ~ and !~ test field patterns
Operator examples
Practical examples of operators including compound assignment, concatenation, and ternary operator.
# Arithmetic with increment/decrement
awk 'BEGIN { x = 5; print x++, ++x, x--, --x }'   # 5 7 7 5

# Compound assignment
awk 'BEGIN { x = 10; x += 5; print x }'   # 15

# String concatenation
awk '{ print $1 " is " $2 " years old" }'   # Concatenate strings

# Ternary operator
awk '{ print ($1 > 50) ? "large" : "small" }'   # Conditional value

# Precedence example
awk 'BEGIN { print 2 + 3 * 4 }'     # 14 (not 20)
awk 'BEGIN { print (2 + 3) * 4 }'   # 20
awk 'BEGIN { x = 5; y = 3; print (x > y) ? "x is greater" : "y is greater" }'
x is greater
- Ternary operator syntax: condition ? true_value : false_value
- Post-increment (x++) returns old value
- Pre-increment (++x) returns new value
User-Defined Variables
Creating and managing custom variables
Variable declaration and scope
Variables in AWK are created automatically and are global by default.
# Variables don't need declaration - created on use
awk 'BEGIN { x = 5; y = 10; print x + y }'   # 15

# Uninitialized variables are "" as strings, 0 in numeric context
awk 'BEGIN { print "x=" x, "y=" y+0 }'   # x= y=0

# Variables are global by default
awk 'BEGIN { x = 5 } { print x, $0 } END { print x }'

# Function parameters and local variables
awk 'function add(a, b,    local1, local2) {
    local1 = a
    local2 = b
    return local1 + local2
}
BEGIN { print add(3, 4) }'   # 7

# Command-line variable assignment
awk -v var=value 'BEGIN { print var }'
awk -v name="John" '{ print name, $0 }'
awk 'BEGIN { count = 0 } { count++ } END { print "Total lines: " count }' /etc/hostname
Total lines: 1
- Uninitialized variables are 0 (numeric) or "" (string)
- Variables created on first use
- All variables are global except function parameters
- -v flag passes variables to AWK program
Local and global variable conventions
Global variables persist across functions, while function parameters and extra parameters are local.
# Global variables (accessible everywhere)
awk 'BEGIN { global_var = 10 }
function test() { return global_var * 2 }
BEGIN { print test() }'   # 20

# Function parameters are local
awk 'function f(x) {
    y = 100   # y is global (created inside function)
    return x + y
}
BEGIN { x = 1; y = 2; print f(5); print x, y }'   # 105, then 1 100

# Local variables (extra parameters)
awk 'function f(x, y,    local1, local2) {
    local1 = 10
    local2 = 20
    return x + y + local1 + local2
}
BEGIN { print f(1, 2) }'   # 33

# Array variables
awk 'BEGIN {
    arr[1] = "one"
    arr[2] = "two"
    for (i in arr) print i, arr[i]
}'
awk 'BEGIN { total = 0; for (i = 1; i <= 3; i++) total += i; print "Sum 1-3: " total }'
Sum 1-3: 6
- Extra function parameters after named parameters act as local variables
- Global variables accessible throughout program
- Local variables should be separated by extra whitespace in function signature
Control Flow
Conditional statements, loops, and flow control
If-Else Statements
Conditional execution with if, else, and else if
If-else statements
If-else statements allow conditional execution based on boolean expressions.
# Basic if statement
awk '{
    if ($1 > 50) print "Large: " $1
}' file

# if-else statement
awk '{
    if ($1 > 50)
        print "Large"
    else
        print "Small"
}' file

# Multiple conditions with else-if
awk '{
    if ($1 > 100) print "Very large"
    else if ($1 > 50) print "Large"
    else if ($1 > 10) print "Medium"
    else print "Small"
}' file

# Nested conditions
awk '{
    if ($1 > 50) {
        if ($2 == "active") print "Active and large"
        else print "Inactive and large"
    }
}' file
echo -e "apple 75\nbanana 30" | awk '{ if ($2 > 50) print $1 " is popular"; else print $1 " is unpopular" }'
apple is popular
banana is unpopular
- Condition must evaluate to true (non-zero) or false (zero)
- Braces required for multiple statements in block
- String comparisons use ==, !=, ~, !~
Ternary operator
Ternary operator provides compact conditional value selection.
# Ternary operator: condition ? true_value : false_value
awk '{ print ($1 > 50) ? "large" : "small" }' file

# Nested ternary
awk '{
    status = ($1 > 100) ? "very large" : ($1 > 50) ? "large" : "small"
    print $1 " is " status
}' file

# Ternary in variable assignment
awk '{
    message = (NR == 1) ? "First line" : "Not first"
    print message
}' file

# Ternary with string values
awk '{
    result = ($1 ~ /^[0-9]+$/) ? "number" : "text"
    print result
}' file
echo -e "5\n150\n75" | awk '{ print $1 " is " (($1 > 100) ? "large" : "small") }'
5 is small
150 is large
75 is small
- Ternary operator: condition ? value_if_true : value_if_false
- Useful for simple conditional assignments
- Can be nested for multiple conditions
Loops (For, While, Do-While)
Loop structures for iterating over data
For loops
For loops iterate with counter (C-style) or iterate array keys (for-in).
# C-style for loop
awk 'BEGIN {
    for (i = 1; i <= 5; i++) print i
}'

# For loop with array
awk 'BEGIN {
    arr[1] = "apple"
    arr[2] = "banana"
    arr[3] = "cherry"
    for (i = 1; i <= 3; i++) print arr[i]
}'

# For-in loop (iterate array keys)
awk 'BEGIN {
    data["name"] = "John"
    data["age"] = "30"
    data["city"] = "NYC"
    for (key in data) print key ": " data[key]
}'

# For loop with field iteration
awk '{
    for (i = 1; i <= NF; i++) print "Field " i ": " $i
}' file
awk 'BEGIN { for (i=1; i<=3; i++) for (j=1; j<=3; j++) print i, j }'
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
- For-in loop iterates over array keys in arbitrary order
- C-style for has init, condition, and update
- Empty for (;;) creates infinite loop
While and do-while loops
While loops execute while condition is true; do-while always executes at least once.
# While loop
awk 'BEGIN {
    i = 1
    while (i <= 5) {
        print i
        i++
    }
}'

# While loop over fields
awk '{
    i = 1
    while (i <= NF) {
        print "Field " i ": " $i
        i++
    }
}' file

# Do-while loop (always runs at least once)
awk 'BEGIN {
    i = 1
    do {
        print i
        i++
    } while (i <= 3)
}'

# While loop with break
awk 'BEGIN {
    i = 1
    while (i <= 10) {
        if (i == 5) break
        print i
        i++
    }
}'
awk 'BEGIN { i=1; while (i<=3) { print "i=" i; i++ } }'
i=1
i=2
i=3
- While checks condition before executing
- Do-while checks condition after executing
- Both support break and continue statements
Flow Control (Break, Continue, Next)
Control loop and program flow with break, continue, next, and exit
Break and continue
Break exits loops immediately, continue skips to next iteration.
# Break - exit loop immediately
awk 'BEGIN {
    for (i = 1; i <= 10; i++) {
        if (i == 5) break
        print i
    }
}'   # Output: 1 2 3 4

# Continue - skip to next iteration
awk 'BEGIN {
    for (i = 1; i <= 5; i++) {
        if (i == 3) continue
        print i
    }
}'   # Output: 1 2 4 5

# Break from nested loop
awk 'BEGIN {
    for (i = 1; i <= 3; i++) {
        for (j = 1; j <= 3; j++) {
            if (j == 2) break
            print i, j
        }
    }
}'
awk 'BEGIN { for (i=1; i<=5; i++) { if (i==3) continue; print i } }'
1
2
4
5
- Break only exits innermost loop
- Continue skips rest of loop body and goes to next iteration
Next, nextfile, and exit
Next skips to next line, nextfile skips remaining lines in file, exit terminates program.
# next - skip to next input line
awk '{
    if ($1 == "skip") next
    print "Processing: " $0
}' file

# nextfile - skip to next file
awk '{
    if ($1 == "EOF") nextfile
    print $0
}' file1 file2

# exit - terminate program
awk '{
    if (NR > 10) exit
    print $0
}' largefile

# exit with status code
awk 'BEGIN { exit 0 }'   # Success
awk 'BEGIN { exit 1 }'   # Error

# exit runs END block
awk 'BEGIN { exit } END { print "Cleanup" }'
echo -e "a\nb\nc\nd" | awk 'NR==2 { next } { print }'
a
c
d
- next applies to the main pattern-action block
- nextfile skips to the next file in ARGV
- exit runs END blocks before terminating
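The exit status is visible to the calling shell, so an awk check can drive shell conditionals; a small sketch:

```shell
# exit !found is 0 (success) only if some value exceeded the limit
if printf '50\n150\n' | awk '$1 > 100 { found = 1 } END { exit !found }'; then
    echo "over-limit value present"
fi
# over-limit value present
```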
Built-in Functions
String, mathematical, and array functions
String Functions
Functions for string manipulation and processing
String length and substring
Length, substr, and index extract string information and substrings.
# length(str) - string length
awk 'BEGIN { print length("hello") }'   # 5
awk '{ print NF, length($1) }' file     # Field count and first-field length

# substr(str, start, len) - extract substring
awk 'BEGIN { print substr("hello world", 7) }'      # world
awk 'BEGIN { print substr("hello world", 1, 5) }'   # hello
awk '{ print substr($0, 1, 10) }' file              # First 10 chars of line

# index(str, substr) - find position of substring
awk 'BEGIN { print index("hello world", "world") }'   # 7
awk '{ print index($0, "error") }'                    # Find "error" position

# Practical example: strip a known 4-char suffix (".com") from a domain
awk -F'@' '{ domain = substr($2, 1, length($2)-4); print domain }' emails.txt
awk 'BEGIN { s="hello"; print "len=" length(s), "sub=" substr(s, 2, 3), "idx=" index(s, "ll") }'
len=5 sub=ell idx=3
- length() returns character count
- substr starts at 1 (not 0)
- index() returns 1-based position or 0 if not found
String replacement and formatting
Sub and gsub replace patterns; match tests for matches; toupper/tolower change case.
# sub(regex, repl, target) - replace first match
awk 'BEGIN { s="hello hello"; sub(/hello/, "hi", s); print s }'   # hi hello
awk '{ sub(/old/, "new"); print }' file   # Replace first occurrence in $0

# gsub(regex, repl, target) - replace all matches
awk 'BEGIN { s="hello hello"; gsub(/hello/, "hi", s); print s }'   # hi hi
awk '{ gsub(/ +/, " "); print }' file   # Normalize spaces

# match(str, regex) - test for a match and set RSTART, RLENGTH
awk 'BEGIN {
    if (match("hello world", /wor/))
        print "Found at " RSTART " length " RLENGTH
}'

# sprintf(format, ...) - formatted string
awk 'BEGIN { printf "%s=%d\n", "count", 42 }'
awk 'BEGIN { s = sprintf("%.2f", 3.14159); print s }'   # 3.14

# tolower() and toupper()
awk 'BEGIN { print tolower("HELLO") }'   # hello
awk 'BEGIN { print toupper("hello") }'   # HELLO
awk 'BEGIN { s="hello world"; gsub(/l/, "L", s); print s }'
heLLo worLd
- sub replaces first occurrence, gsub replaces all
- If target omitted, uses $0
- match sets RSTART and RLENGTH variables
- sprintf formats strings without printing
Mathematical Functions
Arithmetic and trigonometric functions
Basic math functions
Math functions perform calculations on numbers.
# int(x) - integer part
awk 'BEGIN { print int(3.7) }'   # 3

# sqrt(x) - square root
awk 'BEGIN { print sqrt(16) }'   # 4

# sin(x), cos(x), atan2(y,x) - trigonometric
awk 'BEGIN { print sin(0), cos(0) }'   # 0 1
awk 'BEGIN {
    pi = atan2(0, -1)
    print "Pi = " pi
    print "Sin(pi/2) = " sin(pi/2)
}'

# exp(x) - e^x
awk 'BEGIN { print exp(1) }'   # 2.71828...

# log(x) - natural logarithm
awk 'BEGIN { print log(2.71828) }'   # 1 (approximately)

# Practical: Calculate percentage
awk 'BEGIN { print (75/100) * 100 "%" }'   # 75%
awk 'BEGIN { print "sqrt(25)=" sqrt(25), "int(3.99)=" int(3.99), "exp(1)=" exp(1) }'
sqrt(25)=5 int(3.99)=3 exp(1)=2.71828
- int() truncates toward zero
- Trigonometric functions use radians
- exp(1) is approximately e
Random numbers
Random functions generate random numbers for simulations and sampling.
# rand() - random number 0 to 1
awk 'BEGIN {
    for (i = 1; i <= 3; i++) print rand()
}'

# Random integer 1-10
awk 'BEGIN {
    for (i = 1; i <= 5; i++) print int(rand() * 10) + 1
}'

# srand(seed) - seed random generator
awk 'BEGIN {
    srand(123)
    print rand()
}'

# srand() with no seed uses current time
awk 'BEGIN {
    srand()
    print rand()
}'
awk 'BEGIN { srand(42); for (i=1; i<=3; i++) print int(rand()*10) }'
4
8
2
(exact sequence varies by awk implementation)
- rand() returns float between 0 and 1
- srand() seeds with current time if no seed given
- Seeding enables reproducible random sequences
Arrays and Array Functions
Creating and manipulating arrays
Array basics
Arrays store multiple values with numeric or string indices.
# Create and access array elements
awk 'BEGIN {
    arr[1] = "one"
    arr[2] = "two"
    arr[3] = "three"
    for (i = 1; i <= 3; i++) print arr[i]
}'

# Associative array (string keys)
awk 'BEGIN {
    person["name"] = "John"
    person["age"] = "30"
    person["city"] = "NYC"
    for (key in person) print key ": " person[key]
}'

# Array from file
awk '{ arr[NR] = $0 } END { for (i in arr) print i ": " arr[i] }' file

# Check if key exists
awk 'BEGIN {
    arr["key"] = "value"
    if ("key" in arr) print "Found"
    if ("missing" in arr) print "Not there"
}'
awk 'BEGIN { a[1]=10; a[2]=20; a[3]=30; for (i in a) sum+=a[i]; print "Sum=" sum }'
Sum=60
- Array indices can be numbers or strings
- For-in loop iterates in arbitrary order
- in operator tests key existence
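Associative arrays shine at counting; a word-frequency sketch, piped through sort because for-in order is arbitrary:

```shell
# Count how often each word appears across all input lines
printf 'red blue red\nblue red\n' |
awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
     END { for (w in count) print w, count[w] }' | sort
# blue 2
# red 3
```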
Array operations
Delete removes elements, split creates arrays from strings, multi-dimensional arrays possible with composite keys.
# Delete array element
awk 'BEGIN {
    a[1] = "one"; a[2] = "two"
    delete a[1]
    for (i in a) print i, a[i]
}'   # Output: 2 two

# Delete entire array
awk 'BEGIN {
    a[1] = 1; a[2] = 2
    delete a
    for (i in a) print i
}'   # No output

# Multi-dimensional arrays
awk 'BEGIN {
    a[1,1] = "top-left"
    a[1,2] = "top-right"
    a[2,1] = "bot-left"
    for (key in a) print key ": " a[key]
}'

# split() function creates array
awk 'BEGIN {
    count = split("a:b:c:d", arr, ":")
    for (i = 1; i <= count; i++) print arr[i]
}'
awk 'BEGIN { n=split("one,two,three", a, ","); for (i=1; i<=n; i++) print a[i] }'
one
two
three
- delete removes single element or entire array
- split returns count of elements created
- Multi-dimensional: a[i,j,k] key is i\034j\034k (SUBSEP)
Advanced Features
User-defined functions, file handling, and advanced techniques
User-Defined Functions
Creating and using custom functions
Defining and calling functions
Functions encapsulate logic for reuse and modularity.
# Basic function definition and exampleawk 'function add(a, b) { return a + b}BEGIN { print "5 + 3 = " add(5, 3)}'
# Function with multiple statementsawk 'function greet(name) { greeting = "Hello, " name "!" return greeting}BEGIN { print greet("Alice") }'
# Function without return statementawk 'function print_info(x) { print "Value: " x}BEGIN { print_info(42) }'
# Function with local variables (extra parameters)
awk 'function calculate(x, y, local_result) {
  local_result = x * y + 10
  return local_result
}
BEGIN { print calculate(3, 4) }'
awk 'function double(x) { return x * 2 } BEGIN { print double(21) }'
42
- Functions are defined at the top level, before or after the rules that call them
- Parameters are passed by value
- Extra parameters act as local variables
- Return statement optional (returns 0/"" if omitted)
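A sketch of why the extra-parameter convention matters: any variable not listed as a parameter is global, so forgetting the convention lets scratch state leak out of the function (leak/safe are hypothetical names for this demo):

```shell
# g leaks into the global scope; l stays local to safe()
awk '
function leak()    { g = 99 }   # g is global
function safe(  l) { l = 99 }   # l is a local (extra parameter)
BEGIN {
  leak(); safe()
  print "g=" g " l=" l          # l is empty here
}'
```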
Advanced function patterns
Functions support recursion, array parameters, and complex logic patterns.
# Recursive function (factorial)
awk 'function fact(n) {
  if (n <= 1) return 1
  return n * fact(n - 1)
}
BEGIN { print "5! = " fact(5) }'
# Function modifying array parameter (arrays passed by reference)
awk 'function fill_array(arr, n, i) {
  for (i = 1; i <= n; i++) arr[i] = i * i
}
BEGIN {
  fill_array(data, 5)
  for (i in data) print i, data[i]
}'
# Helper functions for validation
awk 'function is_number(s) { return s ~ /^[0-9]+$/ }
function is_empty(s) { return length(s) == 0 }
BEGIN {
  print "5 is_number: " is_number("5")
  print "empty is_empty: " is_empty("")
}'
awk 'function max(a, b) { return (a > b) ? a : b } BEGIN { print max(15, 23) }'
23
- Recursive functions must have a base case
- Arrays passed by reference (modifications persist)
- Scalars passed by value (modifications local)
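Combining both rules, a hypothetical join() helper (not a built-in) takes an array by reference while keeping its scratch variables local:

```shell
# join(): concatenate array elements 1..n with a separator
awk '
function join(arr, n, sep,   s, i) {
  s = arr[1]
  for (i = 2; i <= n; i++) s = s sep arr[i]
  return s
}
BEGIN {
  n = split("2024-01-15", parts, "-")
  print join(parts, n, "/")
}'
```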
Advanced Input and Output
Multiple file handling, redirection, and pipes
Processing multiple files
Process multiple files with automatic file tracking.
# FILENAME variable shows current file
awk '{ print FILENAME ":" NR ":" $0 }' file1 file2
# FNR is line number within each file
awk 'FNR == 1 { print "Starting " FILENAME }
{ print FILENAME ":" FNR ":" $0 }' file1 file2
# ARGIND (gawk) - current argument index
awk '{ print ARGIND, FILENAME, NR, FNR }' file1 file2
# Process specific files only
awk 'FILENAME == "file1" { print }' file1 file2
# Skip files based on pattern
awk 'FNR == 1 && FILENAME ~ /skip/ { nextfile }
{ print }' file*
echo "a" > /tmp/f1 && echo "b" > /tmp/f2 && awk '{ print FILENAME, FNR, $0 }' /tmp/f1 /tmp/f2
/tmp/f1 1 a
/tmp/f2 1 b
- NR tracks total lines across all files
- FNR resets for each file
- FILENAME shows current filename
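A related two-file idiom relies on NR == FNR being true only while the first file is read, which turns the first file into a lookup table for the second (the /tmp paths are just for this demo):

```shell
# Print lines of the second file that also appear in the first
printf 'a\nb\n'    > /tmp/allow.txt
printf 'a\nc\nb\n' > /tmp/data.txt
awk 'NR == FNR { keep[$0]; next } $0 in keep' /tmp/allow.txt /tmp/data.txt
```

Referencing keep[$0] is enough to create the key; the second file is then filtered with the bare $0 in keep pattern.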
Output redirection and pipes
Redirect output to files or pipes; getline reads additional lines.
# Redirect output to file
awk '{ print > "output.txt" }' input.txt   # Overwrite
awk '{ print >> "output.txt" }' input.txt  # Append
# Redirect specific pattern to file
awk '/error/ { print > "errors.log" }
!/error/ { print > "clean.log" }' logfile
# Pipe output to command
awk '{ print | "sort" }' unsorted.txt
awk '{ print | "mail -s report user@host" }' data
# Close file or pipe
awk '{
  print > "output.txt"
  close("output.txt")  # Flush and close file
}' input
# getline - read next line
awk '{
  print "Current: " $0
  getline next_line
  print "Next: " next_line
}' file
echo -e "hello\nworld" | awk '{ print | "sort -r" }'
world
hello
- > creates new file, >> appends to file
- | pipes output to command
- close() flushes file and allows reopening
- getline reads next input line into a variable
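Because the filename after > can be any expression, one pass can fan records out into one file per key; a sketch with invented log lines (the /tmp/*.log names are illustrative):

```shell
# Split input into one file per value of the first field
printf 'err boom\ninfo ok\nerr again\n' |
  awk '{ out = "/tmp/" $1 ".log"; print > out }'
cat /tmp/err.log
```

With many distinct keys, call close(out) once a file is finished to avoid hitting the open-file limit.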
Practical Examples
Real-world use cases and data processing scenarios
Common Text Processing Tasks
Practical examples of typical AWK operations
Extracting and reformatting data
Common data extraction and transformation operations.
# Extract specific columns from CSV
awk -F',' '{ print $1, $3 }' data.csv
# Convert CSV to TSV (tab-separated)
awk -F',' -v OFS='\t' '{ print $1, $2, $3 }' data.csv
# Extract IP address from log files
awk '{ print $1 }' /var/log/apache2/access.log
# Count occurrences of each word
awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
END { for (w in count) print w, count[w] }' file.txt
# Filter and print lines with specific pattern
awk '/^ERROR|^WARN/ { print }' system.log
# Sum column of numbers
awk '{ sum += $1 } END { print "Total: " sum }' numbers.txt
echo -e "a:1\nb:2\nc:3" | awk -F':' '{ sum+=$2 } END { print "Sum=" sum }'
Sum=6
- awk excels at column extraction and reformatting
- Easy to change field separators for format conversion
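Reordering columns is just printing fields in a different order; a sketch with a made-up last,first,age row:

```shell
# Swap the first two CSV columns
printf 'Doe,John,42\n' | awk -F',' -v OFS=',' '{ print $2, $1, $3 }'
```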
Log processing and analysis
Processing logs to extract metrics and statistics.
# Count requests by hour from Apache log
awk '{
  time = $4
  sub(/\[/, "", time)
  hour = substr(time, 13, 2)
  count[hour]++
}
END {
  for (h in count) print h ":00 - " count[h] " requests"
}' /var/log/apache2/access.log
# Extract and count HTTP status codes
awk '{ print $(NF-1) }' access.log | sort | uniq -c
# Find top users by request count
awk '{ users[$1]++ }
END { for (u in users) print u, users[u] }' access.log | sort -k2 -rn | head -10
# Calculate average response time
awk '{
  time = $NF
  sum += time
  count++
}
END { print "Average: " sum / count " ms" }' response-times.log
echo -e "error\nerror\nwarn\nerror" | awk '{ count[$1]++ } END { for (t in count) print t, count[t] }'
error 3
warn 1
- AWK ideal for parsing structured log formats
- Pattern matching finds relevant log lines
- Aggregation generates summary statistics
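The counting pattern extends naturally to per-key averages; a sketch assuming each line ends with a status code and a size (the numbers are invented):

```shell
# Average response size per HTTP status code
printf '200 512\n404 100\n200 48\n' |
  awk '{ sum[$1] += $2; n[$1]++ }
       END { for (s in sum) printf "%s avg=%d\n", s, sum[s] / n[s] }' |
  sort
```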
Advanced Data Processing
Complex data transformations and analysis
Data validation and cleaning
Data cleaning operations remove duplicates, whitespace, and invalid records.
# Remove duplicate lines (preserve order)
awk '!seen[$0]++' file
# Remove leading/trailing whitespace
awk '{ gsub(/^[ \t]+|[ \t]+$/, ""); print }' file
# Validate email format
awk '/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/ { print }' emails.txt
# Check if all fields are numeric
awk '{ for (i = 1; i <= NF; i++) if ($i !~ /^[0-9.]+$/) next; print }' data.txt
# Remove lines with specific patterns
awk '!/^#/ && !/^$/ && !/^;/ { print }' config.txt
# Normalize whitespace (multiple spaces to single)
awk '{ gsub(/ +/, " "); $1=$1; print }' file
echo -e "a\nb\na\nc\nb" | awk '!seen[$0]++'
a
b
c
- !seen[$0]++ is an elegant duplicate-removal pattern
- gsub with start/end anchors trims whitespace
- Email validation uses a complex regex pattern
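The cleaning steps above compose into a single pass — trim, drop comments and blanks, then deduplicate; the sample lines here are invented:

```shell
# Trim whitespace, skip comments/blanks, remove duplicates in one pass
printf '  a \n# note\n\nb\na\n' |
  awk '{ gsub(/^[ \t]+|[ \t]+$/, "") }
       /^#/ || /^$/ { next }
       !seen[$0]++'
```

Each rule sees $0 as modified by the rules before it, so the dedup check runs on trimmed lines.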
Statistical calculations and reports
Statistical operations including mean, min/max, and formatted report generation.
# Calculate the mean
awk '{ sum += $1; count++ }
END { printf "Mean: %.2f\n", sum / count }' numbers.txt
# Find min and max values
awk 'NR==1 { min=$1; max=$1; next }
{ if ($1 < min) min=$1; if ($1 > max) max=$1 }
END { print "Min: " min ", Max: " max }' numbers.txt
# Generate formatted report
awk 'BEGIN {
  printf "%-20s %-10s %-10s\n", "Name", "Age", "Score"
  printf "%-20s %-10s %-10s\n", "----", "---", "-----"
}
{
  total += $3; count++
  printf "%-20s %-10d %-10d\n", $1, $2, $3
}
END {
  printf "%-20s %-10s %-10.1f\n", "Average", "", total / count
}' data.txt
# Standard deviation calculation
awk '{
  arr[NR] = $1
  sum += $1
}
END {
  mean = sum / NR
  for (i = 1; i <= NR; i++) dev += (arr[i] - mean)^2
  stddev = sqrt(dev / NR)
  print "Mean: " mean ", StdDev: " stddev
}' data.txt
echo -e "5\n10\n15\n20\n25" | awk '{ sum+=$1; arr[NR]=$1 } END { print "Sum=" sum " Avg=" sum/NR " Count=" NR }'
Sum=75 Avg=15 Count=5
- Arrays are needed for median or percentile calculations
- printf enables precise formatting for reports
- Mathematical functions enable statistical analysis
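The one-pass patterns above combine naturally; min, max, and mean from the same five sample numbers:

```shell
# Min, max, and mean in a single pass
printf '5\n10\n15\n20\n25\n' |
  awk 'NR == 1 { min = max = $1 }
       { if ($1 < min) min = $1
         if ($1 > max) max = $1
         sum += $1 }
       END { printf "min=%d max=%d mean=%.1f\n", min, max, sum / NR }'
```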