Chapter 16: Working with Text

In this chapter, we’ll explore tools and techniques for working with text in Linux. You’ll learn how to use commands like cut, sort, uniq, and wc to manipulate and analyze text files. We’ll also cover fzf for interactive fuzzy searching and the groff/nroff family for document formatting. By the end of this chapter, you’ll be able to efficiently process and extract meaningful information from text data.


1. Why Learn Text Processing?

Text processing is a fundamental skill for working with logs, configuration files, CSV data, and more. Whether you’re extracting specific fields, sorting data, or counting occurrences, these tools will help you work with text files like a pro.


2. The cut Command

The cut command extracts specific columns or fields from a file.

Basic Usage

$ cut -d',' -f1 file.csv
  • -d',': Use a comma as the delimiter.
  • -f1: Extract the first field.

Examples

  • Extract the first column of a CSV file: $ cut -d',' -f1 file.csv
  • Extract the second and third columns: $ cut -d',' -f2,3 file.csv
  • Extract characters 1 to 5 from each line: $ cut -c1-5 file.txt
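The options above can be combined in one short, self-contained session. The CSV file name and its contents below are invented for illustration:

```shell
# Create a small throwaway CSV (hypothetical data).
printf 'name,age,city\nalice,30,berlin\nbob,25,paris\n' > /tmp/people.csv

# First field of every line, comma-delimited:
cut -d',' -f1 /tmp/people.csv
# name
# alice
# bob

# Second and third fields together (the delimiter is kept in the output):
cut -d',' -f2,3 /tmp/people.csv
# age,city
# 30,berlin
# 25,paris

# Character positions work too -- the first five characters of each line:
cut -c1-5 /tmp/people.csv
```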

3. The sort Command

The sort command sorts lines in a file.

Basic Usage

$ sort file.txt

Common Options

  • -r: Reverse the sort order.
  • -n: Sort numerically.
  • -k: Sort by a specific field; combine with -t to set the field delimiter (useful for CSV files).

Examples

  • Sort a file alphabetically: $ sort file.txt
  • Sort a file numerically: $ sort -n numbers.txt
  • Sort a CSV file by the second column: $ sort -t',' -k2 file.csv
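A quick self-contained demonstration of why -n matters, plus the -t/-k combination; the sample data is invented for illustration:

```shell
# Sample numbers, one per line (made-up data).
printf '10\n2\n33\n4\n' > /tmp/numbers.txt

# Default sort is lexicographic, so "10" sorts before "2":
sort /tmp/numbers.txt
# 10
# 2
# 33
# 4

# Numeric sort compares the values themselves:
sort -n /tmp/numbers.txt
# 2
# 4
# 10
# 33

# Sort a comma-delimited file numerically by its second field:
printf 'alice,30\nbob,25\ncarol,27\n' | sort -t',' -k2 -n
# bob,25
# carol,27
# alice,30
```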

4. The uniq Command

The uniq command filters out adjacent duplicate lines, which is why its input is usually sorted first.

Basic Usage

$ uniq file.txt

Common Options

  • -c: Count occurrences of each line.
  • -d: Show only duplicate lines.
  • -u: Show only lines that appear exactly once.

Examples

  • Remove duplicates from a file: $ sort file.txt | uniq
  • Count occurrences of each line: $ sort file.txt | uniq -c
  • Show only duplicate lines: $ sort file.txt | uniq -d
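Here is the sort-then-uniq idiom end to end on a throwaway file (contents invented for illustration):

```shell
# Sample data with repeats.
printf 'apple\nbanana\napple\ncherry\nbanana\napple\n' > /tmp/fruit.txt

# uniq only collapses *adjacent* duplicates, so sort first:
sort /tmp/fruit.txt | uniq
# apple
# banana
# cherry

# Count how often each line occurs, most frequent first
# (the count column is space-padded, so exact spacing varies):
sort /tmp/fruit.txt | uniq -c | sort -nr
```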

5. The wc Command

The wc (word count) command counts lines, words, and characters in a file.

Basic Usage

$ wc file.txt

Common Options

  • -l: Count lines.
  • -w: Count words.
  • -c: Count bytes (use -m to count multibyte characters).

Examples

  • Count lines in a file: $ wc -l file.txt
  • Count words in a file: $ wc -w file.txt
  • Count bytes in a file: $ wc -c file.txt
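A short self-contained run; the sample text is invented, and reading from stdin with < keeps the filename out of wc's output:

```shell
# Two lines, five words in total.
printf 'hello world\nsecond line here\n' > /tmp/sample.txt

wc -l < /tmp/sample.txt   # line count: 2
wc -w < /tmp/sample.txt   # word count: 5
wc -c < /tmp/sample.txt   # byte count (includes the newlines)
```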

6. Combining Commands

You can combine these commands to create powerful text-processing pipelines.

Example: Count Unique Words in a File

$ cat file.txt | tr ' ' '\n' | sort | uniq -c | sort -nr
  • tr ' ' '\n': Replace spaces with newlines to split words into separate lines.
  • sort: Sort the words alphabetically.
  • uniq -c: Count occurrences of each word.
  • sort -nr: Sort by count in descending order.
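Run end to end on a throwaway file (contents invented for illustration), the pipeline’s top line is the most frequent word:

```shell
printf 'the quick fox jumps over the lazy dog the\n' > /tmp/words.txt

# Split on spaces, then count and rank the words:
tr ' ' '\n' < /tmp/words.txt | sort | uniq -c | sort -nr | head -n 1
# prints "3 the" (with a space-padded count): "the" appears three times
```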

Example: Extract the Most Common Error from a Log File

$ grep "ERROR" logfile.txt | cut -d' ' -f4- | sort | uniq -c | sort -nr | head -n 1
  • grep "ERROR": Filter lines containing “ERROR”.
  • cut -d' ' -f4-: Extract the error message (starting from the 4th field).
  • sort | uniq -c: Count occurrences of each error.
  • sort -nr: Sort by count in descending order.
  • head -n 1: Display the most common error.
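A self-contained run of the same pipeline. The log format below (date, time, level, message) is hypothetical; -f4- assumes the message starts at the fourth space-separated field, so adjust it to your own log layout:

```shell
# A made-up log file for illustration.
printf '%s\n' \
  '2024-01-01 10:00:00 ERROR disk full' \
  '2024-01-01 10:05:00 INFO service started' \
  '2024-01-01 10:06:00 ERROR disk full' \
  '2024-01-01 10:07:00 ERROR connection timeout' > /tmp/logfile.txt

grep "ERROR" /tmp/logfile.txt | cut -d' ' -f4- | sort | uniq -c | sort -nr | head -n 1
# the most frequent error message is "disk full", with a count of 2
```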

7. Advanced Text Processing with awk

awk is a powerful tool for text processing and can often replace multiple commands. We have covered it in detail in the previous chapter.

Example: Print the Second Column of a CSV File

$ awk -F',' '{ print $2 }' file.csv
  • -F',': Use a comma as the field separator.
  • { print $2 }: Print the second field.
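This does the same job as cut -d',' -f2, but awk can also reorder or compute on fields, which cut cannot. A quick sketch on an invented CSV:

```shell
printf 'name,age\nalice,30\nbob,25\n' > /tmp/people2.csv

# Print only the second field:
awk -F',' '{ print $2 }' /tmp/people2.csv
# age
# 30
# 25

# Unlike cut, awk can swap the field order on the way out:
awk -F',' '{ print $2, $1 }' /tmp/people2.csv
# age name
# 30 alice
# 25 bob
```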

Example: Sum Numbers in a File

$ awk '{ sum += $1 } END { print sum }' numbers.txt
  • { sum += $1 }: Add the value of the first field to sum.
  • END { print sum }: Print the total sum after processing all lines.
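The same one-liner run against a small invented input file:

```shell
printf '10\n20\n12\n' > /tmp/nums.txt

# sum starts at 0 (awk variables default to 0), accumulates per line,
# and END runs once after the last line has been read:
awk '{ sum += $1 } END { print sum }' /tmp/nums.txt
# 42
```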

8. Searching Files and Text

find

  • Purpose: Search for files and directories based on criteria like name, size, and modification time.
  • Examples:
  • Find all .txt files in the current directory: $ find . -name "*.txt"
  • Find files modified in the last 7 days: $ find . -mtime -7
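A self-contained sketch using a throwaway directory tree (all names invented for illustration):

```shell
# Build a small sample tree.
mkdir -p /tmp/find-demo/sub
touch /tmp/find-demo/a.txt /tmp/find-demo/sub/b.txt /tmp/find-demo/c.log

# find recurses into subdirectories by default; quote the pattern
# so the shell does not expand the * before find sees it:
find /tmp/find-demo -name "*.txt" | sort
# /tmp/find-demo/a.txt
# /tmp/find-demo/sub/b.txt
```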

fzf (Fuzzy Finder)

  • Purpose: An interactive tool for searching and filtering text.
  • Features:
  • Fuzzy matching for quick searches.
  • Works with pipes and integrates with other commands.
  • Examples:
  • Search through a file interactively: $ cat file.txt | fzf
  • Search for files in the current directory: $ fzf

9. Modern Text Processing Tools


jq

  • Purpose: A tool for processing JSON data.
  • Example: $ echo '{"key": "value"}' | jq '.key'
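jq expressions compose with dots and pipes. A quick sketch on an inline JSON document (the keys and values below are invented); this assumes jq is installed:

```shell
json='{"user": {"name": "alice"}, "tags": ["x", "y"]}'

# Drill into nested objects; -r strips the JSON quotes from strings:
echo "$json" | jq -r '.user.name'
# alice

# jq has built-in functions, chained with | inside the expression:
echo "$json" | jq '.tags | length'
# 2
```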

10. Document Formatting with groff and nroff

groff (GNU Troff)

  • Purpose: A typesetting system for formatting documents.
  • Common Uses:
  • Formatting man pages.
  • Creating professional-quality documents.
  • Example: $ groff -T ps file > output.ps

nroff

  • Purpose: The classic formatter for plain-text terminal output; on modern systems it is typically a thin wrapper that runs groff with a terminal output device.
  • Example: $ nroff -man file.1 | less

11. Summary

By mastering text-processing tools like cut, sort, uniq, wc, and awk, along with fzf for interactive searching, jq for JSON, and the groff/nroff formatters, you can efficiently manipulate and analyze text data. These tools are indispensable for system administrators, developers, and anyone working with text files.


Practice Time!

Let’s put your new skills to the test:

  1. Use cut to extract the third column of a CSV file.
  2. Use sort and uniq to find unique lines in a file.
  3. Use wc to count the number of words in a file.
  4. Combine grep, cut, sort, and uniq to find the most frequent error in a log file.
  5. Use find to locate all .log files in the current directory.
  6. Try using fzf to interactively search through a file.


That’s it for this chapter! You’ve now learned how to use text-processing tools to manipulate and analyze text files. In the next chapter, we’ll dive into networking—using commands like ping, curl, and wget to interact with networks and the web. Until then, practice these tools to become more comfortable with text processing.


Prev: Chapter 15 | Next: Chapter 17