Chapter 15: Regular Expressions

In this chapter, we’ll explore regular expressions (regex), a powerful tool for matching and manipulating text. You’ll learn how to use regex with tools like grep, sed, and awk to search, filter, and transform text efficiently. By the end of this chapter, you’ll be able to harness the full power of regex in your scripts and command-line workflows.

1. What Are Regular Expressions?

Regular expressions are patterns used to match and manipulate text. They’re widely used in search operations, text processing, and data validation. Regex allows you to define complex search patterns using a combination of literals, metacharacters, character classes, and anchors.

Basic Regex Components

Literals: Match exact characters (e.g., a matches the letter “a”).
Metacharacters: Special characters with specific meanings (e.g., . matches any single character, * matches zero or more occurrences of the preceding element).
Character Classes: Match a set of characters (e.g., [a-z] matches any lowercase letter).
Anchors: Match positions in the text (e.g., ^ for the start of a line, $ for the end of a line).
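To make the four components concrete, here is a minimal sketch that runs one pattern of each kind against a small sample file. The file contents and variable names are invented for illustration.

```shell
# Build a small sample file (contents are made up for this illustration).
sample=$(mktemp)
printf '%s\n' 'cat' 'cot' 'cut' 'scatter' 'Cat' > "$sample"

# Literal: lines containing the exact text "cat".
literal=$(grep 'cat' "$sample")        # cat, scatter

# Metacharacter: "." stands for any single character between c and t.
dot=$(grep 'c.t' "$sample")            # cat, cot, cut, scatter

# Character class: only "a" or "o" is allowed between c and t.
class=$(grep 'c[ao]t' "$sample")       # cat, cot, scatter

# Anchors: the whole line must be c, one character, t.
anchored=$(grep '^c.t$' "$sample")     # cat, cot, cut

printf '%s\n' "$literal" '---' "$dot" '---' "$class" '---' "$anchored"
rm -f "$sample"
```

Note that "Cat" never matches, because regex matching is case-sensitive unless you ask otherwise.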

2. Using `grep` with Regex

grep is a command-line tool for searching text using regex. It’s one of the most commonly used tools for filtering lines of text that match a pattern.

Basic Usage

$ grep "pattern" file.txt

Common Options

-i: Ignore case.
-v: Invert match (show lines that don’t match).
-E: Use extended regex (supports +, ?, |, etc.).
-o: Show only the matching part of the line.
-c: Count the number of matching lines.
-n: Display line numbers along with matching lines.
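The options above are easier to compare side by side. The following sketch applies each one to a tiny, made-up log file; the log lines and variable names are assumptions for the example only.

```shell
# Hypothetical log file used only for this demonstration.
log=$(mktemp)
printf '%s\n' \
  'INFO service started' \
  'ERROR disk full' \
  'warning: low memory' \
  'error: timeout after 30s' > "$log"

ignore_case=$(grep -i 'error' "$log")        # matches both ERROR and error
inverted=$(grep -v -i 'error' "$log")        # lines without any form of "error"
only_match=$(grep -o '[0-9][0-9]*s' "$log")  # prints just the matched text: 30s
count=$(grep -c -i 'error' "$log")           # number of matching lines: 2
numbered=$(grep -n 'warning' "$log")         # 3:warning: low memory

printf '%s\n' "$ignore_case" '---' "$inverted" '---' "$only_match" "$count" "$numbered"
rm -f "$log"
```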

Examples

Match lines containing “error”:
$ grep "error" logfile.txt
Match lines starting with “warning”:
$ grep "^warning" logfile.txt
Match lines ending with “success”:
$ grep "success$" logfile.txt
Count the number of lines containing “error”:
$ grep -c "error" logfile.txt

Advanced `grep` Usage

Search recursively in directories:
$ grep -r "pattern" /path/to/directory
Use extended regex to match multiple patterns:
$ grep -E "error|warning" logfile.txt
Highlight matches in color:
$ grep --color "pattern" file.txt

3. Using `sed` with Regex

sed (stream editor) is a tool for filtering and transforming text using regex. It’s particularly useful for performing search-and-replace operations.

Basic Usage

$ sed 's/pattern/replacement/' file.txt

Common Commands

s: Substitute (replace) text.
p: Print lines.
d: Delete lines.
i: Insert text before a line.
a: Append text after a line.
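As a rough illustration of these five commands, the sketch below applies each one to the same three-line file. The file contents are invented, and the i and a commands are written in the backslash-newline form, which should work in both GNU and BSD sed.

```shell
# Hypothetical three-line input file.
notes=$(mktemp)
printf '%s\n' 'alpha' 'beta error' 'gamma' > "$notes"

# s: substitute the first match on each line.
substituted=$(sed 's/alpha/ALPHA/' "$notes")

# p: print matching lines (-n suppresses the default output).
printed=$(sed -n '/error/p' "$notes")

# d: delete matching lines.
deleted=$(sed '/error/d' "$notes")

# i: insert a line before line 1.
inserted=$(sed '1i\
# header' "$notes")

# a: append a line after the last line ($ is the last-line address).
appended=$(sed '$a\
# footer' "$notes")

printf '%s\n' "$substituted" '---' "$printed" '---' "$deleted" '---' "$inserted" '---' "$appended"
rm -f "$notes"
```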

Examples

Replace “foo” with “bar”:
$ sed 's/foo/bar/' file.txt
Delete lines containing “error”:
$ sed '/error/d' file.txt
Print lines matching “warning”:
$ sed -n '/warning/p' file.txt

Advanced `sed` Usage

Global replacement (replace all occurrences in a line):
$ sed 's/foo/bar/g' file.txt
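In-place editing (modify the file directly; this is GNU sed syntax, while BSD/macOS sed expects a backup suffix argument after -i, which may be empty, as in -i ''):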
$ sed -i 's/foo/bar/' file.txt
Multiple commands:
$ sed 's/foo/bar/; s/baz/qux/' file.txt
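The difference between a plain substitution, a global one, and a chain of commands is easiest to see on a single line of input. This is a small sketch using an invented input string:

```shell
line='foo foo baz'

# Without g, only the first "foo" on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/foo/bar/')

# With g, every "foo" on the line is replaced.
every=$(printf '%s\n' "$line" | sed 's/foo/bar/g')

# Two commands separated by ";" run in order on each line.
chained=$(printf '%s\n' "$line" | sed 's/foo/bar/; s/baz/qux/')

printf '%s\n' "$first_only" "$every" "$chained"
```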

4. Using `awk` with Regex

awk is a powerful text-processing tool that supports regex for pattern matching. It’s particularly useful for working with structured data like CSV files.

Basic Usage

$ awk '/pattern/ { action }' file.txt

Common Actions

print: Print the matching line.
gsub: Globally substitute text.
if/else: Conditional logic.
BEGIN/END: Pre- and post-processing blocks.

Examples

Print lines containing “error”:
$ awk '/error/ { print }' logfile.txt
Replace “foo” with “bar” and print the line:
$ awk '{ gsub(/foo/, "bar"); print }' file.txt
Print specific fields from a CSV file:
$ awk -F ',' '{ print $1, $3 }' data.csv

Advanced `awk` Usage

Calculate the sum of a column:
$ awk '{ total += $1 } END { print "Total:", total }' numbers.txt
Filter rows based on a condition:
$ awk '$3 > 50 { print $1, $3 }' data.txt
Use BEGIN and END blocks:
$ awk 'BEGIN { print "Start processing..." } { print } END { print "Processing complete." }' file.txt
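The examples above can be combined: awk can test a regex against a single field with the ~ operator, skip a header row, and accumulate totals. The sketch below assumes a small CSV file with invented column names and values; it does not handle quoted fields that contain commas.

```shell
# Hypothetical CSV data with a header row.
data=$(mktemp)
printf '%s\n' \
  'name,dept,score' \
  'alice,eng,72' \
  'bob,ops,45' \
  'carol,eng,90' > "$data"

# Skip the header (NR > 1) and keep rows whose 2nd field matches ^eng$.
eng_rows=$(awk -F ',' 'NR > 1 && $2 ~ /^eng$/ { print $1, $3 }' "$data")

# Filter on a numeric condition in the 3rd field.
high=$(awk -F ',' 'NR > 1 && $3 > 50 { print $1 }' "$data")

# Sum the 3rd column and report it once all lines are read.
total=$(awk -F ',' 'NR > 1 { total += $3 } END { print "Total:", total }' "$data")

printf '%s\n' "$eng_rows" '---' "$high" '---' "$total"
rm -f "$data"
```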

5. Common Regex Patterns

Here are some commonly used regex patterns:

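Matching Email Addresses: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
Matching URLs: https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
Matching Dates (YYYY-MM-DD): \d{4}-\d{2}-\d{2}
Matching Phone Numbers: \+?\d{1,3}[-.\s]?\(?\d{1,4}\)?[-.\s]?\d{1,4}[-.\s]?\d{1,9}

Note: The \d and \s shorthands are Perl-style (PCRE) syntax, which GNU grep accepts with -P. POSIX tools such as grep -E, sed -E, and awk do not understand \d, so write [0-9] there, and use [[:space:]] for whitespace inside a bracket expression.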
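One way to try such a pattern is to wrap it in a small shell function and feed it candidate strings. The sketch below does this for the date pattern, written with [0-9] so that it runs under grep -E; the function name is_date is hypothetical. It only checks the shape of the string, so an impossible date such as 2024-99-99 would still pass.

```shell
# Date pattern in POSIX ERE form, anchored so the whole string must match.
date_re='^[0-9]{4}-[0-9]{2}-[0-9]{2}$'

# is_date is a hypothetical helper: succeeds if its argument looks like YYYY-MM-DD.
is_date() {
  printf '%s\n' "$1" | grep -Eq "$date_re"
}

if is_date '2024-03-15'; then good=yes; else good=no; fi
if is_date '15/03/2024'; then bad=yes; else bad=no; fi

echo "2024-03-15: $good"
echo "15/03/2024: $bad"
```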

6. Combining Tools

You can combine grep, sed, and awk with regex to create powerful text-processing pipelines.

Example: Extract Email Addresses from a File

$ grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt

Example: Replace Dates in a File

$ sed -E 's/[0-9]{4}-[0-9]{2}-[0-9]{2}/[DATE]/g' file.txt
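The two examples above each use a single tool. As a sketch of an actual pipeline, the following chains all three: grep selects the interesting lines, sed masks the dates, and awk counts lines per severity level. The log format and variable names are assumptions made for this example.

```shell
# Hypothetical log: date, level, message.
log=$(mktemp)
printf '%s\n' \
  '2024-03-15 error disk full' \
  '2024-03-15 info backup done' \
  '2024-03-16 warning low memory' \
  '2024-03-17 error timeout' > "$log"

# Step 1 and 2: keep error/warning lines, then mask the date.
masked=$(grep -E 'error|warning' "$log" |
  sed -E 's/[0-9]{4}-[0-9]{2}-[0-9]{2}/[DATE]/')

# Step 3: count lines per level (2nd field); sort for stable output order.
summary=$(printf '%s\n' "$masked" |
  awk '{ count[$2]++ } END { for (level in count) print level, count[level] }' |
  sort)

printf '%s\n' "$masked" '---' "$summary"
rm -f "$log"
```

The final sort is there because awk does not guarantee any particular order when looping over an array.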

7. Areas to Expand for Beginners

1. Regex Basics

Expand on Metacharacters: Provide more examples of how metacharacters like ., *, +, and ? work.
- Example: Explain that a+ matches one or more a characters, while a* matches zero or more.
Explain Escaping: Clarify how to escape special characters (e.g., \. matches a literal period).
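A short experiment shows how these quantifiers and escapes differ. The word list below is invented; each pattern is anchored with ^ and $ so that it must match the whole line.

```shell
# Sample words, one per line.
words=$(printf '%s\n' 'b' 'ab' 'aab' 'a.b' 'axb')

star=$(printf '%s\n' "$words" | grep -E '^a*b$')         # zero or more a: b, ab, aab
plus=$(printf '%s\n' "$words" | grep -E '^a+b$')         # one or more a: ab, aab
optional=$(printf '%s\n' "$words" | grep -E '^a?b$')     # zero or one a: b, ab
any_char=$(printf '%s\n' "$words" | grep -E '^a.b$')     # any middle character: aab, a.b, axb
literal_dot=$(printf '%s\n' "$words" | grep -E '^a\.b$') # escaped dot: a.b only

printf '%s\n' "$star" '---' "$plus" '---' "$optional" '---' "$any_char" '---' "$literal_dot"
```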

2. `grep` Section

Add More Beginner-Friendly Examples:
- Show how to search for multiple patterns:
  $ grep -E "error|warning" logfile.txt
- Explain how to search for whole words using -w:
  $ grep -w "error" logfile.txt
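To see what -w changes, compare a plain search with a whole-word search on lines where "error" also appears inside longer words. The input lines are invented for this sketch.

```shell
lines=$(printf '%s\n' 'error found' 'no errors here' 'terror alert')

# A plain search matches "error" anywhere, including inside other words.
substring_count=$(printf '%s\n' "$lines" | grep -c 'error')

# -w requires the match to be delimited by non-word characters.
whole_word=$(printf '%s\n' "$lines" | grep -w 'error')

echo "substring matches: $substring_count"
echo "whole-word matches: $whole_word"
```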

3. `sed` Section

Explain In-Place Editing: Clarify that -i modifies the file directly and can create backups:
$ sed -i.bak 's/foo/bar/' file.txt
Add a Simple Example for Line Deletion:
$ sed '3d' file.txt # Deletes the third line
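Because in-place editing changes files on disk, it is worth trying in a scratch directory first. The sketch below uses a temporary directory and invented file contents to show that the .bak copy keeps the original text, and then demonstrates deleting a line by number.

```shell
# Work in a throwaway directory so nothing real is modified.
workdir=$(mktemp -d)
printf '%s\n' 'foo one' 'foo two' > "$workdir/file.txt"

# Edit in place; the original is saved as file.txt.bak.
sed -i.bak 's/foo/bar/' "$workdir/file.txt"

edited=$(cat "$workdir/file.txt")
backup=$(cat "$workdir/file.txt.bak")

# Deleting by line number: remove the third line of the input.
third_removed=$(printf '%s\n' 'a' 'b' 'c' 'd' | sed '3d')

printf '%s\n' "$edited" '---' "$backup" '---' "$third_removed"
rm -rf "$workdir"
```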

4. `awk` Section

Explain Field Separators: Show how to change the field separator for CSV files:
$ awk -F ',' '{ print $1 }' data.csv
Add a Simple Example for Printing Specific Columns:
$ awk '{ print $1, $3 }' file.txt

8. Modern Alternatives to `grep`, `sed`, and `awk`

While grep, sed, and awk are powerful and widely used, there are modern alternatives that offer improved performance, usability, and additional features. Here are a few:

1. `ripgrep` (rg)

Purpose: A faster alternative to grep that respects .gitignore by default.
Features:
- Recursive search by default.
- Ignores hidden files and binary files automatically.
- Supports regex and multi-threading for faster searches.
Example: $ rg "error" # Searches for "error" recursively in the current directory

2. `fd`

Purpose: A simpler and faster alternative to find for searching files and directories.
Features:
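- Intuitive syntax (e.g., fd -e txt finds files by extension; search patterns are regular expressions by default, and -g switches to glob patterns such as "*.txt").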
- Ignores hidden files and respects .gitignore by default.
- Faster than find for most use cases.
Example: $ fd "\.txt$" # Finds all `.txt` files recursively

3. `jq`

Purpose: A tool for processing JSON data, similar to awk for structured text.
Features:
- Extracts and manipulates JSON data with a simple syntax.
- Supports filtering, mapping, and transforming JSON.
Example: $ echo '{"name": "Alice", "age": 30}' | jq '.name' # Extracts the "name" field

4. `sd`

Purpose: A modern alternative to sed for search-and-replace operations.
Features:
- Simpler syntax than sed.
- Supports regex and is faster for large files.
Example: $ echo "foo bar" | sd "foo" "baz" # Replaces "foo" with "baz"

These tools are particularly useful for beginners due to their intuitive syntax and improved performance.

9. What is `egrep`?

egrep is essentially a variant of grep that supports extended regular expressions (ERE) by default. Here’s a detailed explanation of the relationship between grep, egrep, and other related tools:

egrep stands for “extended grep”.
It is a version of grep that uses extended regular expressions (ERE) by default, which support additional metacharacters like +, ?, |, and () for grouping.
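In modern systems, egrep is often implemented as a thin wrapper, symbolic link, or alias for grep -E, which enables extended regex support. Recent GNU grep releases (3.8 and later) print a warning that egrep is obsolescent, so grep -E is the safer choice in new scripts.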

Key Differences Between `grep` and `egrep`

Feature	`grep`	`egrep`
Regex Type	Basic Regular Expressions (BRE)	Extended Regular Expressions (ERE)
Metacharacters	`+`, `?`, `|`, `()` are literal characters; the escaped forms `\(\)` group, and GNU grep also treats `\+`, `\?`, `\|` as operators	`+`, `?`, `|`, `()` work as operators without escaping
Usage	`grep "pattern" file.txt`	`egrep "pattern" file.txt`
Modern Equivalent	`grep` (BRE is the default)	`grep -E`
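The practical effect of the BRE/ERE difference can be checked with the same pattern run in both modes. In this sketch the sample lines are invented, and the escaped form \| in the last command is a GNU extension rather than POSIX behavior, so it may not work in every grep implementation.

```shell
samples=$(printf '%s\n' 'aaa' 'a+' 'error' 'warning' 'error|warning')

# BRE: "+" is an ordinary character, so only the literal line "a+" matches.
bre_plus=$(printf '%s\n' "$samples" | grep '^a+$')

# ERE: "+" means one or more, so "aaa" matches instead.
ere_plus=$(printf '%s\n' "$samples" | grep -E '^a+$')

# BRE: "|" is literal, so only the line containing "error|warning" matches.
bre_alt=$(printf '%s\n' "$samples" | grep 'error|warning')

# ERE: "|" is alternation, so any line with "error" or "warning" matches.
ere_alt=$(printf '%s\n' "$samples" | grep -E 'error|warning')

# GNU BRE extension: "\|" behaves like ERE alternation.
gnu_alt=$(printf '%s\n' "$samples" | grep 'error\|warning')

printf '%s\n' "$bre_plus" "$ere_plus" '---' "$bre_alt" '---' "$ere_alt" '---' "$gnu_alt"
```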

Examples of `egrep` Usage

Basic Usage

$ egrep "pattern" file.txt

Example 1: Match Lines with “error” or “warning”

$ egrep "error|warning" logfile.txt

In grep, you would need to use -E or escape the |:

$ grep -E "error|warning" logfile.txt
$ grep "error\|warning" logfile.txt

Example 2: Match Lines with One or More Digits

$ egrep "[0-9]+" file.txt

In grep, you would need to escape the +:

$ grep "[0-9]\+" file.txt

Modern Usage: `grep -E`

In modern systems, egrep is often just a shortcut for grep -E. The -E flag enables extended regular expressions in grep, making it functionally equivalent to egrep.

Example

$ grep -E "error|warning" logfile.txt

This is the same as:

$ egrep "error|warning" logfile.txt

Other Related Tools

`fgrep` (Fixed grep)

Purpose: Matches fixed strings (no regex support).
Modern Equivalent: grep -F.
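Fixed-string matching matters when the search text contains characters that regex would otherwise interpret. A minimal sketch with invented input lines:

```shell
entries=$(printf '%s\n' 'version 1.5' 'version 1x5' 'version 2.0')

# As a regex, "." matches any character, so "1x5" is matched too.
as_regex=$(printf '%s\n' "$entries" | grep '1.5')

# With -F the pattern is plain text, so only the literal "1.5" matches.
as_fixed=$(printf '%s\n' "$entries" | grep -F '1.5')

printf '%s\n' "$as_regex" '---' "$as_fixed"
```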

`rgrep` (Recursive grep)

Purpose: Searches recursively through directories.
Modern Equivalent: grep -r.

When to Use `egrep` or `grep -E`

Use egrep or grep -E when you need extended regex features like +, ?, |, or () without escaping them.
Use plain grep when basic regex is enough, or when characters such as + and | in your pattern should be treated as literal text.
Prefer grep -E over egrep in scripts, since egrep is usually just an alias or wrapper for it.

Example Comparison

Using `egrep`

$ egrep "error|warning" logfile.txt

Using `grep -E`

$ grep -E "error|warning" logfile.txt

Using `grep` (with escaping)

$ grep "error\|warning" logfile.txt

By understanding the relationship between grep, egrep, and grep -E, you can choose the right tool for your text-searching needs.

10. Summary

By mastering regex and tools like grep, sed, and awk, you can efficiently search, filter, and transform text in Linux. These tools are indispensable for system administrators, developers, and anyone working with text data.

Practice Time!

Let’s put your new skills to the test:

1. Use grep to find all lines in a file that contain a valid email address.

2. Use sed to replace all occurrences of “foo” with “bar” in a file.

3. Use awk to print the second column of a CSV file.

4. Write a regex pattern to match phone numbers in a file.

5. Try using ripgrep to search for a pattern recursively in a directory.

6. Use sd to replace “foo” with “baz” in a file.

That’s it for Chapter 15! You’ve now learned how to use regular expressions with grep, sed, and awk to search, filter, and transform text. In the next chapter, we’ll dive into text processing with tools like cut, sort, uniq, and wc. Until then, practice using regex to become more comfortable with its capabilities.
