Chapter 15: Regular Expressions

In this chapter, we’ll explore regular expressions (regex), a powerful tool for matching and manipulating text. You’ll learn how to use regex with tools like grep, sed, and awk to search, filter, and transform text efficiently. By the end of this chapter, you’ll be able to harness the full power of regex in your scripts and command-line workflows.


1. What Are Regular Expressions?

Regular expressions are patterns used to match and manipulate text. They’re widely used in search operations, text processing, and data validation. Regex allows you to define complex search patterns using a combination of literals, metacharacters, character classes, and anchors.

Basic Regex Components

  • Literals: Match exact characters (e.g., a matches the letter “a”).
  • Metacharacters: Special characters with specific meanings (e.g., . matches any single character, * matches zero or more occurrences of the preceding element).
  • Character Classes: Match a set of characters (e.g., [a-z] matches any lowercase letter).
  • Anchors: Match positions in the text (e.g., ^ for the start of a line, $ for the end of a line).
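To see the four building blocks side by side, here is a minimal sketch that runs grep against a few made-up lines. The sample text and the variable names are assumptions chosen for illustration only.

```shell
# Byte-wise matching, so ranges like [A-Z] behave the same in every locale.
export LC_ALL=C

# Sample input, invented for this illustration.
sample='apple
banana
cherry pie
Apple tart'

# Literal: lines containing the exact text "an".
literal=$(printf '%s\n' "$sample" | grep 'an')

# Metacharacter: "a", then any single character, then "p".
meta=$(printf '%s\n' "$sample" | grep 'a.p')

# Character class with a start anchor: lines beginning with an uppercase letter.
class=$(printf '%s\n' "$sample" | grep '^[A-Z]')

# End anchor: lines that end with "e".
anchor=$(printf '%s\n' "$sample" | grep 'e$')

printf 'literal: %s\n' "$literal"
printf 'meta:    %s\n' "$meta"
printf 'class:   %s\n' "$class"
printf 'anchor:\n%s\n' "$anchor"
```

The last pattern matches two lines ("apple" and "cherry pie"), which shows that an anchor restricts where a match may occur rather than how many lines can match.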

2. Using grep with Regex

grep is a command-line tool for searching text using regex. It’s one of the most commonly used tools for filtering lines of text that match a pattern.

Basic Usage

$ grep "pattern" file.txt

Common Options

  • -i: Ignore case.
  • -v: Invert match (show lines that don’t match).
  • -E: Use extended regex (+, ?, |, and () work without backslashes).
  • -o: Show only the matching part of the line.
  • -c: Count the number of matching lines.
  • -n: Display line numbers along with matching lines.
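The options above can be tried on a small in-memory log. This is a sketch with invented log lines; the variable names are hypothetical.

```shell
export LC_ALL=C

# Made-up log content for the demonstration.
log='INFO service started
ERROR disk full
warning: low memory
error: retry 3 failed
INFO done'

# -i with -c: count lines matching "error" in any letter case.
count_ci=$(printf '%s\n' "$log" | grep -ic 'error')

# -v with -c: count lines that do NOT start with "INFO".
not_info=$(printf '%s\n' "$log" | grep -vc '^INFO')

# -o with -E: print only the matched text (here, runs of digits).
digits=$(printf '%s\n' "$log" | grep -oE '[0-9]+')

# -n: prefix each matching line with its line number.
numbered=$(printf '%s\n' "$log" | grep -n 'warning')

echo "case-insensitive error lines: $count_ci"
echo "non-INFO lines: $not_info"
echo "digits found: $digits"
echo "numbered match: $numbered"
```

Options can be combined in a single flag group, as -ic and -vc do here.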

Examples

  • Match lines containing “error”
    $ grep "error" logfile.txt
  • Match lines starting with “warning”
    $ grep "^warning" logfile.txt
  • Match lines ending with “success”
    $ grep "success$" logfile.txt
  • Count the number of lines containing “error”
    $ grep -c "error" logfile.txt

Advanced grep Usage

  • Search recursively in directories
    $ grep -r "pattern" /path/to/directory
  • Use extended regex to match multiple patterns
    $ grep -E "error|warning" logfile.txt
  • Highlight matches in color
    $ grep --color "pattern" file.txt

3. Using sed with Regex

sed (stream editor) is a tool for filtering and transforming text using regex. It’s particularly useful for performing search-and-replace operations.

Basic Usage

$ sed 's/pattern/replacement/' file.txt

Common Commands

  • s: Substitute (replace) text.
  • p: Print lines.
  • d: Delete lines.
  • i: Insert text before a line.
  • a: Append text after a line.
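The i and a commands are the least obvious of the five, so here is a small sketch of all of them on three invented lines. It uses the portable form of i and a (a backslash followed by a newline); the sample text and variable names are assumptions.

```shell
# Three sample lines, made up for this example.
lines='first
second
third'

# i: insert a line before line 2.
inserted=$(printf '%s\n' "$lines" | sed '2i\
NEW before second')

# a: append a line after every line matching /second/.
appended=$(printf '%s\n' "$lines" | sed '/second/a\
NEW after second')

# p with -n: print only line 3 (-n suppresses the default output).
printed=$(printf '%s\n' "$lines" | sed -n '3p')

# d: delete line 1 and print the rest.
deleted=$(printf '%s\n' "$lines" | sed '1d')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$inserted" "$appended" "$printed" "$deleted"
```

Each command can be prefixed with an address, either a line number (2, 3) or a /regex/, which selects the lines it applies to.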

Examples

  • Replace “foo” with “bar”
    $ sed 's/foo/bar/' file.txt
  • Delete lines containing “error”
    $ sed '/error/d' file.txt
  • Print lines matching “warning”
    $ sed -n '/warning/p' file.txt

Advanced sed Usage

  • Global replacement (replace all occurrences in a line): 
    $ sed 's/foo/bar/g' file.txt
  • In-place editing (modify the file directly): 
    $ sed -i 's/foo/bar/' file.txt
  • Multiple commands:
    $ sed 's/foo/bar/; s/baz/qux/' file.txt
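The difference between a plain substitution and one with the g flag is easiest to see on a line with repeated matches. A short sketch, using an invented input string:

```shell
# One line with two occurrences of "foo" (sample data).
text='foo foo baz'

# Without g, only the first match on each line is replaced.
first=$(printf '%s\n' "$text" | sed 's/foo/bar/')

# With g, every match on the line is replaced.
global=$(printf '%s\n' "$text" | sed 's/foo/bar/g')

# Two commands separated by ";" run in order on each line.
multi=$(printf '%s\n' "$text" | sed 's/foo/bar/; s/baz/qux/')

echo "$first"    # bar foo baz
echo "$global"   # bar bar baz
echo "$multi"    # bar foo qux
```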

4. Using awk with Regex

awk is a powerful text-processing tool that supports regex for pattern matching. It’s particularly useful for working with structured data like CSV files.

Basic Usage

$ awk '/pattern/ { action }' file.txt

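Common Building Blocks

  • print: Print the current line or selected fields.
  • gsub: Function that globally substitutes text within a line or field.
  • if/else: Conditional logic inside an action.
  • BEGIN/END: Special patterns whose actions run before the first line and after the last line of input.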
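These pieces are usually combined in one program. The following sketch shows print, gsub, if/else, and BEGIN/END together; the comma-separated scores and the pass mark of 50 are invented for the example.

```shell
# Hypothetical name,score records.
scores='alice,82
bob,47
carol,65'

report=$(printf '%s\n' "$scores" | awk -F ',' '
BEGIN { print "name result" }        # runs once, before any input is read
{
    gsub(/a/, "A", $1)               # replace every "a" in field 1
    if ($2 >= 50) print $1, "pass"   # conditional logic on field 2
    else          print $1, "fail"
}
END { print NR, "rows" }             # runs once, after the last line
')

printf '%s\n' "$report"
```

The third argument to gsub limits the substitution to one field; without it, gsub works on the whole line.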

Examples

  • Print lines containing “error”
    $ awk '/error/ { print }' logfile.txt
  • Replace “foo” with “bar” and print the line:
    $ awk '{ gsub(/foo/, "bar"); print }' file.txt
  • Print specific fields from a CSV file
    $ awk -F ',' '{ print $1, $3 }' data.csv

Advanced awk Usage

  • Calculate the sum of a column
    $ awk '{ total += $1 } END { print "Total:", total }' numbers.txt
  • Filter rows based on a condition
    $ awk '$3 > 50 { print $1, $3 }' data.txt
  • Use BEGIN and END blocks
    $ awk 'BEGIN { print "Start processing..." } { print } END { print "Processing complete." }' file.txt
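To check the sum and filter examples against known numbers, here is a sketch on three invented rows of whitespace-separated data (name, quantity, score). The column layout is an assumption for this example only.

```shell
# Sample rows: name, quantity, score.
data='widget 10 75
gadget 20 30
gizmo 5 90'

# Sum the second column; the END block prints after all rows are read.
total=$(printf '%s\n' "$data" | awk '{ total += $2 } END { print "Total:", total }')

# Keep only rows whose third column is greater than 50.
high=$(printf '%s\n' "$data" | awk '$3 > 50 { print $1, $3 }')

echo "$total"
echo "$high"
```

Here the condition $3 > 50 takes the place of a /regex/ pattern: any expression that evaluates to true selects the line.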

5. Common Regex Patterns

Here are some commonly used regex patterns:

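  • Matching Email Addresses: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
  • Matching URLs (scheme and host only): https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
  • Matching Dates (YYYY-MM-DD): \d{4}-\d{2}-\d{2}
  • Matching Phone Numbers: \+?\d{1,3}[-.\s]?\(?\d{1,4}\)?[-.\s]?\d{1,4}[-.\s]?\d{1,9}

Note that \d (digit) and \s (whitespace) are Perl-style shorthands. They work with grep -P and in most programming languages, but neither is part of POSIX basic or extended regex. For portable patterns with grep -E, sed, or awk, write [0-9] and [[:space:]] instead.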
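A quick way to sanity-check such patterns is to run them with grep -oE on a few known strings. In this sketch the sample text is invented, and the date pattern is spelled with [0-9] so that it runs under grep -E.

```shell
export LC_ALL=C

# Made-up text containing one email address, one URL, and one date.
text='Contact: alice@example.com on 2024-03-15
Docs at https://example.org/start
No match on this line'

email=$(printf '%s\n' "$text" | grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')
url=$(printf '%s\n' "$text" | grep -oE 'https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')
found_date=$(printf '%s\n' "$text" | grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}')

echo "email: $email"
echo "url:   $url"
echo "date:  $found_date"
```

The URL pattern stops at the host name, so the /start path is not part of the match. The date pattern only checks the shape, so it would also accept an impossible date such as 2024-99-99.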

6. Combining Tools

You can combine grep, sed, and awk with regex to create powerful text-processing pipelines.

Example: Extract Email Addresses from a File

$ grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt

Example: Replace Dates in a File

$ sed -E 's/[0-9]{4}-[0-9]{2}-[0-9]{2}/[DATE]/g' file.txt
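As a sketch of an actual pipeline, the following chains grep, sed, and awk: grep keeps the problem lines, sed strips the leading date, and awk counts how often each message occurs. The log format (date, level, message) is an assumption made up for this example.

```shell
export LC_ALL=C

# Hypothetical log lines: date, level, message.
log='2024-03-15 ERROR disk full
2024-03-15 INFO backup ok
2024-03-16 ERROR disk full
2024-03-17 WARNING low memory'

# 1. grep keeps only ERROR and WARNING lines.
# 2. sed removes the leading YYYY-MM-DD date and the space after it.
# 3. awk counts identical messages, then prints "count message".
# 4. sort -rn puts the most frequent message first.
summary=$(printf '%s\n' "$log" |
    grep -E 'ERROR|WARNING' |
    sed -E 's/^[0-9]{4}-[0-9]{2}-[0-9]{2} //' |
    awk '{ count[$0]++ } END { for (m in count) print count[m], m }' |
    sort -rn)

printf '%s\n' "$summary"
```

Each stage does one small job, which makes the pipeline easy to debug: run it with the later stages removed to see the intermediate output.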

7. Areas to Expand for Beginners

1. Regex Basics

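  • More on metacharacters: . matches any single character, * matches zero or more of the preceding element, + matches one or more, and ? matches zero or one.
    • Example: a+ matches one or more a characters (a, aa, aaa), while a* also matches when there is no a at all. Note that + and ? need grep -E (or must be written \+ and \? in GNU grep’s basic mode).
  • Escaping: To match a metacharacter literally, put a backslash in front of it (e.g., \. matches a literal period, while an unescaped . matches any character).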
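The quantifiers and the effect of escaping can be compared by counting how many of a few test strings each pattern accepts. This is a sketch; the five test strings are made up, and each pattern is anchored with ^ and $ so it must match the whole line.

```shell
export LC_ALL=C

# Five short test strings, one per line.
words='b
ab
aab
a.b
axb'

# a+ needs at least one "a": matches ab and aab.
plus=$(printf '%s\n' "$words" | grep -cE '^a+b$')

# a* allows zero "a"s: matches b, ab, and aab.
star=$(printf '%s\n' "$words" | grep -cE '^a*b$')

# a? allows zero or one "a": matches b and ab.
opt=$(printf '%s\n' "$words" | grep -cE '^a?b$')

# Unescaped . matches any single character: aab, a.b, and axb.
dot=$(printf '%s\n' "$words" | grep -cE '^a.b$')

# Escaped \. matches only a literal period: a.b.
esc=$(printf '%s\n' "$words" | grep -cE '^a\.b$')

echo "a+: $plus  a*: $star  a?: $opt  a.b: $dot  a\\.b: $esc"
```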

2. grep Section

  • More beginner-friendly examples:
    • Search for multiple patterns at once with -E and the | (alternation) operator:
      $ grep -E "error|warning" logfile.txt
    • Search for whole words with -w, so that “error” does not also match “errors” or “terror”:
      $ grep -w "error" logfile.txt
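The effect of -w is easiest to see next to a plain search. In this sketch (sample lines invented), every line contains the letters "error", but only one contains it as a separate word.

```shell
# Four lines that all contain the substring "error".
log='error: disk full
3 errors found
no_error here
terror alert'

# Plain search: counts every line containing "error" anywhere.
plain=$(printf '%s\n' "$log" | grep -c 'error')

# -w: the match must not touch a letter, digit, or underscore on either side.
whole=$(printf '%s\n' "$log" | grep -cw 'error')

echo "substring matches: $plain"
echo "whole-word matches: $whole"
```

Note that the underscore counts as a word character, which is why "no_error" is not a whole-word match.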

3. sed Section

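  • In-place editing: -i modifies the file directly instead of printing the result. Adding a suffix keeps a backup of the original (here, file.txt.bak). On BSD and macOS sed, -i requires a suffix argument, so an in-place edit without a backup is written as -i ''.
    $ sed -i.bak 's/foo/bar/' file.txt
  • Line deletion by number:
    $ sed '3d' file.txt # Deletes the third line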
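Because in-place editing changes files on disk, it is worth trying first in a scratch directory. The sketch below does that with mktemp; the file name and contents are invented for the example.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
printf 'foo one\nfoo two\nkeep\n' > "$workdir/file.txt"

# Edit in place, keeping the original as file.txt.bak.
sed -i.bak 's/foo/bar/' "$workdir/file.txt"

edited=$(cat "$workdir/file.txt")
backup=$(cat "$workdir/file.txt.bak")

# Without -i, sed leaves the file alone and writes the result to stdout.
without_third=$(sed '3d' "$workdir/file.txt")

rm -r "$workdir"

printf 'edited:\n%s\n\nbackup:\n%s\n\nwithout line 3:\n%s\n' \
    "$edited" "$backup" "$without_third"
```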

4. awk Section

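  • Field separators: By default, awk splits each line on runs of spaces and tabs. Use -F to set a different separator, such as a comma for simple CSV files (this does not handle quoted fields that contain commas):
    $ awk -F ',' '{ print $1 }' data.csv
  • Printing specific columns from whitespace-separated input:
    $ awk '{ print $1, $3 }' file.txt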

8. Modern Alternatives to grep, sed, and awk

While grep, sed, and awk are powerful and widely used, there are modern alternatives that offer improved performance, usability, and additional features. Here are a few:

1. ripgrep (rg)

  • Purpose: A faster alternative to grep that respects .gitignore by default.
  • Features:
    • Recursive search by default.
    • Ignores hidden files and binary files automatically.
    • Supports regex and multi-threading for faster searches.
  • Example:
    $ rg "error" # Searches for "error" recursively in the current directory

2. fd

  • Purpose: A simpler and faster alternative to find for searching files and directories.
  • Features:
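    • Intuitive syntax: patterns are regular expressions by default, with -g for glob patterns (e.g., fd -g "*.txt") and -e to match by extension (e.g., fd -e txt).
    • Ignores hidden files and respects .gitignore by default.
    • Faster than find for most use cases.
  • Example:
    $ fd "\.txt$" # Finds all `.txt` files recursively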

3. jq

  • Purpose: A tool for processing JSON data, similar to awk for structured text.
  • Features:
    • Extracts and manipulates JSON data with a simple syntax.
    • Supports filtering, mapping, and transforming JSON.
  • Example:
    $ echo '{"name": "Alice", "age": 30}' | jq '.name' # Extracts the "name" field

4. sd

  • Purpose: A modern alternative to sed for search-and-replace operations.
  • Features:
    • Simpler syntax than sed.
    • Supports regex and is faster for large files.
  • Example:
    $ echo "foo bar" | sd "foo" "baz" # Replaces "foo" with "baz"

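These tools are particularly useful for beginners due to their intuitive syntax and improved performance. Unlike grep, sed, and awk, they are usually not installed by default, so you may need to install them with your package manager first.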

 

 


9. What is egrep?

egrep is essentially a variant of grep that supports extended regular expressions (ERE) by default. Here’s how grep, egrep, and the related tools fit together:

  • egrep stands for “extended grep”.
  • It is a version of grep that uses extended regular expressions (ERE) by default, which support additional metacharacters like +, ?, |, and () for grouping.
  • In modern systems, egrep is typically a thin wrapper around grep -E. GNU grep treats egrep and fgrep as obsolescent, and since version 3.8 they print a warning that recommends grep -E and grep -F instead.

Key Differences Between grep and egrep

Feature            grep                                             egrep
Regex Type         Basic Regular Expressions (BRE)                  Extended Regular Expressions (ERE)
Metacharacters     +, ?, |, () must be escaped (\+, \?, \|, \(\))   +, ?, |, () work without escaping
Usage              grep "pattern" file.txt                          egrep "pattern" file.txt
Modern Equivalent  grep -E (enables ERE)                            egrep is essentially grep -E

Examples of egrep Usage

Basic Usage

$ egrep "pattern" file.txt

Example 1: Match Lines with “error” or “warning”

$ egrep "error|warning" logfile.txt
  • In grep, you would need to use -E or escape the |:
    $ grep -E "error|warning" logfile.txt
    $ grep "error\|warning" logfile.txt

Example 2: Match Lines with One or More Digits

$ egrep "[0-9]+" file.txt
  • In grep, you would need to escape the +:
    $ grep "[0-9]\+" file.txt
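What happens when the + is left unescaped in basic mode is worth seeing once: grep does not report an error, it simply treats + as a literal plus sign. A sketch with invented input follows. The \{1,\} interval used here is the POSIX basic-regex way to write "one or more", while \+ is a GNU extension.

```shell
export LC_ALL=C

# Sample lines: digits, a digit followed by a literal "+", and no digits.
input='abc123
9+
no digits'

# Basic mode, unescaped +: matches a digit followed by a literal "+".
bre_plain=$(printf '%s\n' "$input" | grep '[0-9]+')

# Extended mode: + means "one or more digits".
ere=$(printf '%s\n' "$input" | grep -E '[0-9]+')

# Basic mode, portable spelling of "one or more": the \{1,\} interval.
bre_interval=$(printf '%s\n' "$input" | grep '[0-9]\{1,\}')

printf 'BRE with +:\n%s\n\nERE with +:\n%s\n\nBRE with \\{1,\\}:\n%s\n' \
    "$bre_plain" "$ere" "$bre_interval"
```

A pattern that silently matches less than intended is a common source of bugs, so it helps to pick one mode (usually -E) and use it consistently.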

Modern Usage: grep -E

In modern systems, egrep is often just a shortcut for grep -E. The -E flag enables extended regular expressions in grep, making it functionally equivalent to egrep.

Example

$ grep -E "error|warning" logfile.txt

This is the same as:

$ egrep "error|warning" logfile.txt

Other Related Tools

fgrep (Fixed grep)

  • Purpose: Matches fixed strings (no regex support).
  • Modern Equivalent: grep -F.

rgrep (Recursive grep)

  • Purpose: Searches recursively through directories.
  • Modern Equivalent: grep -r.
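Fixed-string matching matters when the search text contains regex metacharacters. In this sketch (version strings invented), the period in 1.5 behaves as a wildcard unless -F is given.

```shell
# Three sample lines that differ only in the middle character.
versions='version 1.5
version 105
version 1x5'

# As a regex, the "." matches any character, so all three lines match.
regex_hits=$(printf '%s\n' "$versions" | grep -c '1.5')

# With -F, the pattern is a plain string and only the literal "1.5" matches.
fixed_hits=$(printf '%s\n' "$versions" | grep -cF '1.5')

echo "regex matches: $regex_hits"
echo "fixed-string matches: $fixed_hits"
```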

When to Use egrep or grep -E

  • Use grep -E (or egrep, where it is still available) when you need extended regex features like +, ?, |, or () without escaping them.
  • Use plain grep for simple patterns made of literals, character classes, and anchors, where basic regex is enough.
  • In new scripts, prefer grep -E over egrep, since egrep is typically just a compatibility alias for it.

Example Comparison

Using egrep

$ egrep "error|warning" logfile.txt

Using grep -E

$ grep -E "error|warning" logfile.txt

Using grep (with escaping)

$ grep "error\|warning" logfile.txt

By understanding the relationship between grep, egrep, and grep -E, you can choose the right tool for your text-searching needs.


10. Summary

By mastering regex and tools like grep, sed, and awk, you can efficiently search, filter, and transform text in Linux. These tools are indispensable for system administrators, developers, and anyone working with text data.


Practice Time!

Let’s put your new skills to the test:

1. Use grep to find all lines in a file that contain a valid email address.

2. Use sed to replace all occurrences of “foo” with “bar” in a file.

3. Use awk to print the second column of a CSV file.

4. Write a regex pattern to match phone numbers in a file.

5. Try using ripgrep to search for a pattern recursively in a directory.

6. Use sd to replace “foo” with “baz” in a file.


That’s it for Chapter 15! You’ve now learned how to use regular expressions with grep, sed, and awk to search, filter, and transform text. In the next chapter, we’ll dive into text processing, using tools like cut, sort, uniq, and wc to manipulate text files. Until then, practice using regex to become more comfortable with its powerful capabilities.

