Merging Two Files with AWK: A Guide

To merge two files using awk, you can use different strategies depending on whether you want to combine columns (e.g., join files based on a common key) or append rows (e.g., concatenate files vertically). Below are practical examples for both scenarios:

1. Merge Two Files Line-by-Line (Combine Columns)

If both files have the same number of lines and you want to merge them side by side (like paste), have awk store the first file in an array and then print each stored line next to the line with the same number in the second file.

Example Files:

file1.txt:

Apple
Banana
Cherry

file2.txt:

100
200
300

AWK Command:

awk 'NR==FNR {a[NR]=$0; next} {print a[FNR], $0}' file1.txt file2.txt

Output:

Apple 100
Banana 200
Cherry 300

Explanation:

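  • NR==FNR: NR counts records across all input files, while FNR restarts at 1 for each file, so the two are equal only while awk is reading the first file (file1.txt).
  • a[NR]=$0: Store each line of file1.txt in the array a, indexed by its line number.
  • next: Stop processing the current line and move on to the next one, so the print rule never runs on lines from file1.txt.
  • print a[FNR], $0: For each line of file2.txt, print the stored file1.txt line with the same line number, followed by the current line. The comma inserts the output field separator (a space by default).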
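As a quick check of the explanation above, the sketch below recreates the two sample files in a scratch directory and compares the awk output with paste. The scratch directory and the merged_awk.txt and merged_paste.txt names are assumptions made for illustration.

```shell
# Recreate the sample files in a scratch directory (hypothetical location).
tmp=$(mktemp -d)
printf '%s\n' Apple Banana Cherry > "$tmp/file1.txt"
printf '%s\n' 100 200 300 > "$tmp/file2.txt"

# awk version: buffer file1.txt, then pair lines by line number.
awk 'NR==FNR {a[NR]=$0; next} {print a[FNR], $0}' \
    "$tmp/file1.txt" "$tmp/file2.txt" > "$tmp/merged_awk.txt"

# paste version: -d ' ' joins with a single space instead of the default tab.
paste -d ' ' "$tmp/file1.txt" "$tmp/file2.txt" > "$tmp/merged_paste.txt"

# cmp -s exits with status 0 when the two files are byte-identical.
cmp -s "$tmp/merged_awk.txt" "$tmp/merged_paste.txt" && echo "outputs match"
```

For this input both commands produce the same three lines. The awk form becomes the better choice once you need to filter or reformat the lines while merging.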

2. Merge Two Files Based on a Common Key (Join Columns)

Example Files:

users.txt (Key: Column 1):

101 Alice
102 Bob
103 Charlie

scores.txt (Key: Column 1):

101 85
102 92
103 78

AWK Command:

awk 'NR==FNR {user[$1]=$2; next} $1 in user {print $0, user[$1]}' users.txt scores.txt

Output:

101 85 Alice
102 92 Bob
103 78 Charlie

Explanation:

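  • NR==FNR {user[$1]=$2; next}: While reading users.txt, store each name (column 2) in the user array under its ID (column 1). Only the second field is kept, so a name containing spaces would be cut off at the first word.
  • $1 in user {print $0, user[$1]}: While reading scores.txt, print the line followed by the stored name, but only if its ID also appears in users.txt. Lines without a matching ID are skipped.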
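The command above drops any score whose ID is missing from users.txt. If you would rather keep those lines, one possible variant is sketched below. The 104 record and the UNKNOWN placeholder are made up for illustration and are not part of the original example files.

```shell
tmp=$(mktemp -d)
printf '%s\n' '101 Alice' '102 Bob' '103 Charlie' > "$tmp/users.txt"
# 104 is a hypothetical ID with no entry in users.txt.
printf '%s\n' '101 85' '102 92' '104 60' > "$tmp/scores.txt"

# Print every score line; fall back to a placeholder when the ID is unknown.
awk 'NR==FNR {user[$1]=$2; next}
     {name = ($1 in user) ? user[$1] : "UNKNOWN"; print $0, name}' \
    "$tmp/users.txt" "$tmp/scores.txt" > "$tmp/joined.txt"

cat "$tmp/joined.txt"
```

The test ($1 in user) checks for the key without creating an empty array entry, which a plain user[$1] lookup would do.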

3. Merge Files with Different Delimiters

If your files use different delimiters (e.g., one uses commas and the other tabs), set the first file's delimiter with -F and split the lines of the second file explicitly with split().

file1.csv:

A,Apple
B,Banana
C,Cherry

file2.tsv:

A       Red
B       Yellow
D       Green

Command:

awk -F',' 'NR==FNR {data[$1]=$2; next} {split($0, a, "\t")} a[1] in data {print a[1] "\t" a[2] "\t" data[a[1]]}' file1.csv file2.tsv

Output (C and D are missing because each key appears in only one of the files):

A       Red     Apple
B       Yellow  Banana
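An alternative to calling split() is to change FS between the two file names on the command line, since awk applies a var=value operand just before it opens the next file. The sketch below assumes an awk that follows this POSIX behavior (gawk, mawk, and the BWK awk do); the scratch directory and the merged.tsv name are illustrative.

```shell
tmp=$(mktemp -d)
printf 'A,Apple\nB,Banana\nC,Cherry\n' > "$tmp/file1.csv"
printf 'A\tRed\nB\tYellow\nD\tGreen\n' > "$tmp/file2.tsv"

# FS=',' applies to file1.csv and FS='\t' to file2.tsv.
# OFS is set once so that print joins the output fields with tabs.
awk 'BEGIN {OFS="\t"}
     NR==FNR {data[$1]=$2; next}
     $1 in data {print $1, $2, data[$1]}' \
    FS=',' "$tmp/file1.csv" FS='\t' "$tmp/file2.tsv" > "$tmp/merged.tsv"

cat "$tmp/merged.tsv"
```

With the separator set per file, both files can be addressed through $1 and $2, which keeps the rules easier to read than the split() version.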

4. Merge and Append Rows (Concatenate Files)

If you want to stack two files vertically (like cat):

awk '{print}' file1.txt file2.txt

Output:

Apple
Banana
Cherry
100
200
300
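Plain concatenation is what cat already does, so awk mainly pays off here when each row needs some processing on the way through. As a small sketch, the command below tags every line with its source file and per-file line number using the built-in FILENAME and FNR variables; the scratch directory and the tagged variable are assumptions for illustration.

```shell
tmp=$(mktemp -d)
printf '%s\n' Apple Banana Cherry > "$tmp/file1.txt"
printf '%s\n' 100 200 300 > "$tmp/file2.txt"

# The command substitution runs in a subshell, so the cd does not
# change the directory of the calling shell.
tagged=$(cd "$tmp" && awk '{print FILENAME ":" FNR ": " $0}' file1.txt file2.txt)

printf '%s\n' "$tagged"
```

Each output line has the form file1.txt:1: Apple, which makes it possible to tell where a row came from after the files have been stacked.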

Key Notes

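  • Memory Usage: The NR==FNR pattern holds the entire first file in an array, so list the smaller file first when the inputs are large.
  • Sort First: The awk array approach does not need sorted input, but join does. If you plan to use join and your keys aren’t sorted, run sort on each file first.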

Sort Example:

sort file1.txt > sorted_file1.txt
sort file2.txt > sorted_file2.txt

Use join for Simplicity:

join -1 1 -2 1 users.txt scores.txt

Output:

101 Alice 85
102 Bob 92
103 Charlie 78
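The sample files happen to be sorted already. When they are not, join needs both inputs sorted on the key field first. A minimal sketch, assuming shuffled copies of the same records and hypothetical *.sorted.txt file names:

```shell
tmp=$(mktemp -d)
# Same records as above, deliberately out of order.
printf '%s\n' '103 Charlie' '101 Alice' '102 Bob' > "$tmp/users.txt"
printf '%s\n' '102 92' '103 78' '101 85' > "$tmp/scores.txt"

# Sort each file on field 1, the join key, before handing it to join.
sort -k1,1 "$tmp/users.txt" > "$tmp/users.sorted.txt"
sort -k1,1 "$tmp/scores.txt" > "$tmp/scores.sorted.txt"

# Field 1 is join's default key in both files, so no -1/-2 options are needed.
join "$tmp/users.sorted.txt" "$tmp/scores.sorted.txt" > "$tmp/joined.txt"

cat "$tmp/joined.txt"
```

Run sort and join under the same locale settings, because join expects the ordering that sort produced.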

When to Use awk vs. Other Tools

  • awk: For complex merging, filtering, or transformation logic.
  • paste: For simple side-by-side merging of lines.
  • join: For fast, key-based merging of sorted files.
