In this chapter, we’ll explore regular expressions (regex), a powerful tool for matching and manipulating text. You’ll learn how to use regex with tools like grep, sed, and awk to search, filter, and transform text efficiently. By the end of this chapter, you’ll be able to harness the full power of regex in your scripts and command-line workflows.
Regular expressions are patterns used to match and manipulate text. They’re widely used in search operations, text processing, and data validation. Regex allows you to define complex search patterns using a combination of literals, metacharacters, character classes, and anchors.
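- Literals: ordinary characters that match themselves (e.g., `a` matches the letter “a”).
- Metacharacters: characters with a special meaning (e.g., `.` matches any single character, `*` matches zero or more occurrences of the preceding element).
- Character classes: sets of characters (e.g., `[a-z]` matches any lowercase letter).
- Anchors: positions rather than characters (e.g., `^` for the start of a line, `$` for the end of a line).

## grep with Regex

`grep` is a command-line tool for searching text using regex. It’s one of the most commonly used tools for filtering lines of text that match a pattern.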
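To see the four building blocks (literals, metacharacters, character classes, and anchors) in action, here is a minimal sketch that runs `grep` against a throwaway file. The sample words and variable names are assumptions made up for illustration, and `LC_ALL=C` is set only to keep the `[a-z]` range predictable across locales.

```bash
#!/usr/bin/env bash
export LC_ALL=C

# Hypothetical sample data written to a temporary file
sample="$(mktemp)"
printf '%s\n' 'apple' 'banana' 'Cherry' 'cat' 'c9t' 'tac' > "$sample"

# Literal: lines containing the character sequence "an"
literal_hits="$(grep 'an' "$sample")"

# Metacharacter: "c", then any single character, then "t"
dot_hits="$(grep 'c.t' "$sample")"

# Character class: "c", then one lowercase letter, then "t"
class_hits="$(grep 'c[a-z]t' "$sample")"

# Anchors: lines that start with "c", and lines that end with "c"
start_hits="$(grep '^c' "$sample")"
end_hits="$(grep 'c$' "$sample")"

printf 'literal:\n%s\n\n' "$literal_hits"
printf 'dot:\n%s\n\n' "$dot_hits"
printf 'class:\n%s\n\n' "$class_hits"
printf 'start anchor:\n%s\n\n' "$start_hits"
printf 'end anchor:\n%s\n' "$end_hits"

rm -f "$sample"
```

Note how `c.t` matches both `cat` and `c9t`, while the narrower `c[a-z]t` matches only `cat`.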
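The basic syntax is:

```bash
$ grep "pattern" file.txt
```

Common options:

- `-i`: Ignore case.
- `-v`: Invert the match (show lines that don’t match).
- `-E`: Use extended regex (supports `+`, `?`, `|`, etc. without escaping).
- `-o`: Show only the matching part of the line.
- `-c`: Count the number of matching lines.
- `-n`: Display line numbers along with matching lines.

Examples:

```bash
$ grep "error" logfile.txt       # lines containing "error"
$ grep "^warning" logfile.txt    # lines starting with "warning"
$ grep "success$" logfile.txt    # lines ending with "success"
$ grep -c "error" logfile.txt    # number of lines containing "error"
```

### More grep Usage

```bash
$ grep -r "pattern" /path/to/directory   # search a directory recursively
$ grep -E "error|warning" logfile.txt    # match either "error" or "warning"
$ grep --color "pattern" file.txt        # highlight the matching text
```

## sed with Regex

`sed` (stream editor) is a tool for filtering and transforming text using regex. It’s particularly useful for performing search-and-replace operations.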
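Because `sed` works on a stream, a low-risk way to experiment is to pipe text into it, so nothing on disk is touched. The following is a minimal sketch; the sample strings and variable names are made up for illustration.

```bash
#!/usr/bin/env bash

# Without the g flag, only the first match on each line is replaced
first_only="$(printf 'foo foo\n' | sed 's/foo/bar/')"

# With the g flag, every match on the line is replaced
every_match="$(printf 'foo foo\n' | sed 's/foo/bar/g')"

# The pattern is a regex: [0-9][0-9]* means "one or more digits" in basic regex
digits_masked="$(printf 'id 42, id 7\n' | sed 's/[0-9][0-9]*/N/g')"

echo "$first_only"     # bar foo
echo "$every_match"    # bar bar
echo "$digits_masked"  # id N, id N
```

Once a substitution behaves as expected on piped input, the same expression can be pointed at a file.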
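The basic syntax is:

```bash
$ sed 's/pattern/replacement/' file.txt
```

Common commands:

- `s`: Substitute (replace) text.
- `p`: Print lines.
- `d`: Delete lines.
- `i`: Insert text before a line.
- `a`: Append text after a line.

Examples:

```bash
$ sed 's/foo/bar/' file.txt       # replace the first "foo" on each line with "bar"
$ sed '/error/d' file.txt         # delete lines containing "error"
$ sed -n '/warning/p' file.txt    # print only lines containing "warning"
```

### More sed Usage

```bash
$ sed 's/foo/bar/g' file.txt               # replace every "foo" on each line
$ sed -i 's/foo/bar/' file.txt             # edit the file in place (GNU sed; BSD/macOS sed needs -i '')
$ sed 's/foo/bar/; s/baz/qux/' file.txt    # run several commands in one pass
```

## awk with Regex

`awk` is a powerful text-processing tool that supports regex for pattern matching. It’s particularly useful for working with structured, column-oriented data like CSV files.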
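As a first taste of how `awk` combines field splitting with regex matching, the sketch below applies a pattern to a whole line and then to a single column. The file contents, column meanings, and variable names are assumptions for illustration. Splitting on a bare comma also assumes the data has no quoted fields containing commas.

```bash
#!/usr/bin/env bash

# Hypothetical CSV with the columns: name, role, score
csv="$(mktemp)"
printf '%s\n' 'alice,admin,42' 'bob,guest,17' 'carol,admin,58' > "$csv"

# A bare /regex/ is tested against the whole line; the default action prints it
b_lines="$(awk '/^b/' "$csv")"

# -F ',' splits each line on commas; $2 ~ /regex/ tests only the second field
admins="$(awk -F ',' '$2 ~ /^admin$/ { print $1 }' "$csv")"

printf 'lines starting with b:\n%s\n\n' "$b_lines"
printf 'admin names:\n%s\n' "$admins"

rm -f "$csv"
```

Matching against a single field avoids false positives, such as a user whose name happens to contain the word "admin".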
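The basic syntax is:

```bash
$ awk '/pattern/ { action }' file.txt
```

Common building blocks:

- `print`: Print the matching line.
- `gsub`: Globally substitute text.
- `if`/`else`: Conditional logic.
- `BEGIN`/`END`: Pre- and post-processing blocks.

Examples:

```bash
$ awk '/error/ { print }' logfile.txt            # print lines containing "error"
$ awk '{ gsub(/foo/, "bar"); print }' file.txt   # replace every "foo" with "bar"
$ awk -F ',' '{ print $1, $3 }' data.csv         # print columns 1 and 3 of a CSV file
```

### More awk Usage

Sum the first column:

```bash
$ awk '{ total += $1 } END { print "Total:", total }' numbers.txt
```

Print the first and third fields of lines whose third field is greater than 50:

```bash
$ awk '$3 > 50 { print $1, $3 }' data.txt
```

Use `BEGIN` and `END` blocks:

```bash
$ awk 'BEGIN { print "Start processing..." } { print } END { print "Processing complete." }' file.txt
```

## Common Regex Patterns

Here are some commonly used regex patterns: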
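Email address:

```regex
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
```

URL:

```regex
https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
```

Date (YYYY-MM-DD):

```regex
\d{4}-\d{2}-\d{2}
```

Phone number:

```regex
\+?\d{1,3}[-.\s]?\(?\d{1,4}\)?[-.\s]?\d{1,4}[-.\s]?\d{1,9}
```

The shorthands `\d` (digit) and `\s` (whitespace) are Perl-style. POSIX tools such as `grep -E`, `sed`, and `awk` do not support `\d`, so write `[0-9]` and `[[:space:]]` there instead.

## Combining grep, sed, and awk

You can combine grep, sed, and awk with regex to create powerful text-processing pipelines.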
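A minimal sketch of such a pipeline follows. It assumes a hypothetical log format of `DATE LEVEL MESSAGE`; the log lines and variable names are invented for illustration. `grep` filters the lines, `sed` masks the dates, and `awk` counts the entries per level.

```bash
#!/usr/bin/env bash

# Hypothetical log file: date, level, message
log="$(mktemp)"
printf '%s\n' \
  '2024-01-15 error disk full' \
  '2024-01-15 info started' \
  '2024-01-16 warning low memory' \
  '2024-01-17 error disk full' > "$log"

# Step 1 (grep): keep only error and warning lines
# Step 2 (sed):  mask the leading date; [0-9] is used because sed has no \d
masked="$(grep -E 'error|warning' "$log" \
  | sed -E 's/^[0-9]{4}-[0-9]{2}-[0-9]{2}/[DATE]/')"

# Step 3 (awk): count how often each level (second field) appears
summary="$(printf '%s\n' "$masked" \
  | awk '{ count[$2]++ } END { print "error:", count["error"], "warning:", count["warning"] }')"

printf '%s\n\n' "$masked"
echo "$summary"   # error: 2 warning: 1

rm -f "$log"
```

Each tool does the one job it is best at, and the intermediate result stays inspectable because `masked` is captured before the counting step.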
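Extract every email address from a file (the pattern needs only extended regex, so `-E` is enough and more portable than the GNU-only `-P`):

```bash
$ grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt
```

Replace every date with a placeholder (`sed` does not understand `\d`, so the digits are written as `[0-9]`):

```bash
$ sed -E 's/[0-9]{4}-[0-9]{2}-[0-9]{2}/[DATE]/g' file.txt
```

## Tips

- Make sure you understand how the metacharacters `.`, `*`, `+`, and `?` work.
- `a+` matches one or more `a` characters, while `a*` matches zero or more.
- Escape a metacharacter with a backslash to match it literally (e.g., `\.` matches a literal period).

### grep Tips

Use `-E` to match any of several alternatives:

```bash
$ grep -E "error|warning" logfile.txt
```

Match whole words only with `-w`:

```bash
$ grep -w "error" logfile.txt
```

### sed Tips

`-i` modifies the file directly and can create backups:

```bash
$ sed -i.bak 's/foo/bar/' file.txt
```

Delete a line by its number:

```bash
$ sed '3d' file.txt # Deletes the third line
```

### awk Tips

Set the field separator with `-F`:

```bash
$ awk -F ',' '{ print $1 }' data.csv
```

Without `-F`, fields are split on whitespace:

```bash
$ awk '{ print $1, $3 }' file.txt
```

## Modern Alternatives to grep, sed, and awk

While grep, sed, and awk are powerful and widely used, there are modern alternatives that offer improved performance, usability, and additional features. Here are a few: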
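### ripgrep (rg)

- A faster alternative to `grep` that respects `.gitignore` by default.

```bash
$ rg "error" # Searches for "error" recursively in the current directory
```

### fd

- A simpler alternative to `find` for searching files and directories.
- Simple syntax: patterns are regular expressions by default, and `-g` switches to glob patterns (e.g., `fd -g "*.txt"`).
- Respects `.gitignore` by default.
- Faster than `find` for most use cases.

```bash
$ fd "\.txt$" # Finds all `.txt` files recursively
```

### jq

- A command-line JSON processor, similar in spirit to `awk` for structured text.

```bash
$ echo '{"name": "Alice", "age": 30}' | jq '.name' # Extracts the "name" field
```

### sd

- A simpler alternative to `sed` for search-and-replace operations.
- Uses a more intuitive syntax than `sed`.

```bash
$ echo "foo bar" | sd "foo" "baz" # Replaces "foo" with "baz"
```

These tools are particularly useful for beginners due to their intuitive syntax and improved performance.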
## What is egrep?

`egrep` is essentially a variant of grep that supports extended regular expressions (ERE) by default. Here’s a detailed explanation of the relationship between grep, egrep, and other related tools:
- `egrep` stands for “extended grep”.
- It is a variant of `grep` that uses extended regular expressions (ERE) by default, which support additional metacharacters like `+`, `?`, `|`, and `()` for grouping.
- `egrep` is often implemented as a symbolic link or alias to `grep -E`, which enables extended regex support.

### Differences Between grep and egrep

| Feature | grep | egrep |
|---|---|---|
| Regex Type | Basic Regular Expressions (BRE) | Extended Regular Expressions (ERE) |
| Metacharacters | `+`, `?`, the alternation bar, and `()` must be backslash-escaped to act as operators (e.g., `\+`, `\?`, `\(\)`); unescaped, they are literal characters | `+`, `?`, the alternation bar, and `()` act as operators without escaping |
| Usage | `grep "pattern" file.txt` | `egrep "pattern" file.txt` |
| Modern Equivalent | `grep -E` (enables ERE) | `egrep` is essentially `grep -E` |
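The Metacharacters row can be checked directly. The sketch below feeds the same two lines to `grep` in both modes; the input strings and variable names are made up for illustration, and the backslash-escaped `\+` form is a GNU extension that may not behave the same in every `grep` implementation.

```bash
#!/usr/bin/env bash

# BRE (default): an unescaped + is an ordinary character, so only "a+b" matches
bre_plain="$(printf 'a+b\naab\n' | grep 'a+b')"

# ERE (-E): + means "one or more of the preceding item", so only "aab" matches
ere_plain="$(printf 'a+b\naab\n' | grep -E 'a+b')"

# BRE with the + escaped behaves like the ERE operator (GNU extension)
bre_escaped="$(printf 'a+b\naab\n' | grep 'a\+b')"

echo "BRE  a+b  -> $bre_plain"     # a+b
echo "ERE  a+b  -> $ere_plain"     # aab
echo "BRE  a\\+b -> $bre_escaped"  # aab
```

The same pattern text selects different lines depending on the mode, which is why it matters whether `-E` is in effect.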
### egrep Usage

```bash
$ egrep "pattern" file.txt
```
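Example 1: Match Lines with “error” or “warning”

```bash
$ egrep "error|warning" logfile.txt
```

With `grep`, you would need to use `-E` or escape the `|`:

```bash
$ grep -E "error|warning" logfile.txt
$ grep "error\|warning" logfile.txt
```

Example 2: Match Lines Containing One or More Digits

```bash
$ egrep "[0-9]+" file.txt
```

With `grep`, you would need to escape the `+`:

```bash
$ grep "[0-9]\+" file.txt
```

### Modern Usage: grep -E

In modern systems, egrep is often just a shortcut for grep -E. The -E flag enables extended regular expressions in grep, making it functionally equivalent to egrep. GNU grep 3.8 and later print a warning that `egrep` is obsolescent, so prefer `grep -E` in new scripts.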
```bash
$ grep -E "error|warning" logfile.txt
```

This is the same as:
```bash
$ egrep "error|warning" logfile.txt
```

### Related Tools

- `fgrep` (Fixed grep): treats the pattern as a fixed string rather than a regex. Equivalent to `grep -F`.
- `rgrep` (Recursive grep): searches directories recursively. Equivalent to `grep -r`.

### When to Use egrep or grep -E

- Use `egrep` or `grep -E` when you need extended regex features like `+`, `?`, `|`, or `()` without escaping them.
- Use `grep` for basic regex or when you want to stick to the simpler syntax.

### Summary

- `egrep` is a variant of `grep` that supports extended regular expressions (ERE).
- On modern systems, `egrep` is often just an alias for `grep -E`.
- Use `egrep` or `grep -E` when you need advanced regex features without escaping metacharacters.

Using `egrep`:

```bash
$ egrep "error|warning" logfile.txt
```

Using `grep -E`:

```bash
$ grep -E "error|warning" logfile.txt
```

Using `grep` (with escaping):

```bash
$ grep "error\|warning" logfile.txt
```

By understanding the relationship between grep, egrep, and grep -E, you can choose the right tool for your text-searching needs.
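To make that choice concrete, the sketch below runs the same pattern text through plain `grep`, `grep -E`, and `grep -F`. The sample lines and variable names are assumptions for illustration. The flag forms are used instead of `egrep` and `fgrep` because those commands may print warnings or be absent on some systems.

```bash
#!/usr/bin/env bash

# Hypothetical input that contains regex metacharacters as literal text
data="$(mktemp)"
printf '%s\n' 'a.b' 'axb' 'error|warning' 'error' > "$data"

# Plain grep (BRE): . is a wildcard, so both "a.b" and "axb" match
dot_regex="$(grep 'a.b' "$data")"

# grep -F (fixed string): . is literal, so only "a.b" matches
dot_fixed="$(grep -F 'a.b' "$data")"

# Plain grep (BRE): an unescaped | is literal, so only the line containing it matches
bar_bre="$(grep 'error|warning' "$data")"

# grep -E (ERE): | is alternation, so any line with "error" or "warning" matches
bar_ere="$(grep -E 'error|warning' "$data")"

printf 'BRE a.b:\n%s\n\n' "$dot_regex"
printf 'fixed a.b:\n%s\n\n' "$dot_fixed"
printf 'BRE error|warning:\n%s\n\n' "$bar_bre"
printf 'ERE error|warning:\n%s\n' "$bar_ere"

rm -f "$data"
```

As a rule of thumb, reach for `-F` when the search text should be taken literally and `-E` when it relies on operators such as `|` or `+`.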
By mastering regex and tools like grep, sed, and awk, you can efficiently search, filter, and transform text in Linux. These tools are indispensable for system administrators, developers, and anyone working with text data.
Let’s put your new skills to the test:
1. Use grep to find all lines in a file that contain a valid email address.
2. Use sed to replace all occurrences of “foo” with “bar” in a file.
3. Use awk to print the second column of a CSV file.
4. Write a regex pattern to match phone numbers in a file.
5. Try using ripgrep to search for a pattern recursively in a directory.
6. Use sd to replace “foo” with “baz” in a file.
That’s it for Chapter 15! You’ve now learned how to use regular expressions with grep, sed, and awk to search, filter, and transform text. In the next chapter, we’ll dive into text processing, using tools like cut, sort, uniq, and wc to manipulate text files. Until then, practice using regex to become more comfortable with its powerful capabilities.