Chapter 16: Working with Text

In this chapter, we’ll explore tools and techniques for working with text in Linux. You’ll learn how to use commands like cut, sort, uniq, and wc to manipulate and analyze text files. We’ll also cover fzf for interactive fuzzy searching and the groff/nroff family for document formatting. By the end of this chapter, you’ll be able to efficiently process and extract meaningful information from text data.


1. Why Learn Text Processing?

Text processing is a fundamental skill for working with logs, configuration files, CSV data, and more. Whether you’re extracting specific fields, sorting data, or counting occurrences, these tools will help you work with text files like a pro.


2. The cut Command

The cut command extracts specific columns or fields from a file.

Basic Usage

$ cut -d',' -f1 file.csv
  • -d',': Use a comma as the delimiter.
  • -f1: Extract the first field.

Examples

  • Extract the first column of a CSV file: $ cut -d',' -f1 file.csv
  • Extract the second and third columns: $ cut -d',' -f2,3 file.csv
  • Extract characters 1 to 5 from each line: $ cut -c1-5 file.txt
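The options above can be combined in one short, self-contained session. The CSV file name and its contents below are invented for illustration:

```shell
# Create a small throwaway CSV (hypothetical data).
printf 'name,age,city\nalice,30,berlin\nbob,25,paris\n' > /tmp/people.csv

# First field of every line, comma-delimited:
cut -d',' -f1 /tmp/people.csv
# name
# alice
# bob

# Second and third fields together (the delimiter is kept in the output):
cut -d',' -f2,3 /tmp/people.csv
# age,city
# 30,berlin
# 25,paris

# Character positions work too -- the first five characters of each line:
cut -c1-5 /tmp/people.csv
```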

3. The sort Command

The sort command sorts lines in a file.

Basic Usage

$ sort file.txt

Common Options

  • -r: Reverse the sort order.
  • -n: Sort numerically.
  • -k: Sort by a specific field; combine with -t to set the field delimiter (useful for CSV files).

Examples

  • Sort a file alphabetically: $ sort file.txt
  • Sort a file numerically: $ sort -n numbers.txt
  • Sort a CSV file by the second column: $ sort -t',' -k2 file.csv
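A quick self-contained demonstration of why -n matters, plus the -t/-k combination; the sample data is invented for illustration:

```shell
# Sample numbers, one per line (made-up data).
printf '10\n2\n33\n4\n' > /tmp/numbers.txt

# Default sort is lexicographic, so "10" sorts before "2":
sort /tmp/numbers.txt
# 10
# 2
# 33
# 4

# Numeric sort compares the values themselves:
sort -n /tmp/numbers.txt
# 2
# 4
# 10
# 33

# Sort a comma-delimited file numerically by its second field:
printf 'alice,30\nbob,25\ncarol,27\n' | sort -t',' -k2 -n
# bob,25
# carol,27
# alice,30
```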

4. The uniq Command

The uniq command filters out adjacent duplicate lines, which is why its input is usually sorted first.

Basic Usage

$ uniq file.txt

Common Options

  • -c: Count occurrences of each line.
  • -d: Show only duplicate lines.
  • -u: Show only lines that appear exactly once.

Examples

  • Remove duplicates from a file: $ sort file.txt | uniq
  • Count occurrences of each line: $ sort file.txt | uniq -c
  • Show only duplicate lines: $ sort file.txt | uniq -d
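Here is the sort-then-uniq idiom end to end on a throwaway file (contents invented for illustration):

```shell
# Sample data with repeats.
printf 'apple\nbanana\napple\ncherry\nbanana\napple\n' > /tmp/fruit.txt

# uniq only collapses *adjacent* duplicates, so sort first:
sort /tmp/fruit.txt | uniq
# apple
# banana
# cherry

# Count how often each line occurs, most frequent first
# (the count column is space-padded, so exact spacing varies):
sort /tmp/fruit.txt | uniq -c | sort -nr
```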

5. The wc Command

The wc (word count) command counts lines, words, and characters in a file.

Basic Usage

$ wc file.txt

Common Options

  • -l: Count lines.
  • -w: Count words.
  • -c: Count bytes (use -m to count multibyte characters).

Examples

  • Count lines in a file: $ wc -l file.txt
  • Count words in a file: $ wc -w file.txt
  • Count bytes in a file: $ wc -c file.txt
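A short self-contained run; the sample text is invented, and reading from stdin with < keeps the filename out of wc's output:

```shell
# Two lines, five words in total.
printf 'hello world\nsecond line here\n' > /tmp/sample.txt

wc -l < /tmp/sample.txt   # line count: 2
wc -w < /tmp/sample.txt   # word count: 5
wc -c < /tmp/sample.txt   # byte count (includes the newlines)
```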

6. Combining Commands

You can combine these commands to create powerful text-processing pipelines.

Example: Count Unique Words in a File

$ cat file.txt | tr ' ' '\n' | sort | uniq -c | sort -nr
  • tr ' ' '\n': Replace spaces with newlines to split words into separate lines.
  • sort: Sort the words alphabetically.
  • uniq -c: Count occurrences of each word.
  • sort -nr: Sort by count in descending order.
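Run end to end on a throwaway file (contents invented for illustration), the pipeline’s top line is the most frequent word:

```shell
printf 'the quick fox jumps over the lazy dog the\n' > /tmp/words.txt

# Split on spaces, then count and rank the words:
tr ' ' '\n' < /tmp/words.txt | sort | uniq -c | sort -nr | head -n 1
# prints "3 the" (with a space-padded count): "the" appears three times
```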

Example: Extract the Most Common Error from a Log File

$ grep "ERROR" logfile.txt | cut -d' ' -f4- | sort | uniq -c | sort -nr | head -n 1
  • grep "ERROR": Filter lines containing “ERROR”.
  • cut -d' ' -f4-: Extract the error message (starting from the 4th field).
  • sort | uniq -c: Count occurrences of each error.
  • sort -nr: Sort by count in descending order.
  • head -n 1: Display the most common error.
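A self-contained run of the same pipeline. The log format below (date, time, level, message) is hypothetical; -f4- assumes the message starts at the fourth space-separated field, so adjust it to your own log layout:

```shell
# A made-up log file for illustration.
printf '%s\n' \
  '2024-01-01 10:00:00 ERROR disk full' \
  '2024-01-01 10:05:00 INFO service started' \
  '2024-01-01 10:06:00 ERROR disk full' \
  '2024-01-01 10:07:00 ERROR connection timeout' > /tmp/logfile.txt

grep "ERROR" /tmp/logfile.txt | cut -d' ' -f4- | sort | uniq -c | sort -nr | head -n 1
# the most frequent error message is "disk full", with a count of 2
```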

7. Advanced Text Processing with awk

awk is a powerful tool for text processing and can often replace multiple commands. We have covered it in detail in the previous chapter.

Example: Print the Second Column of a CSV File

$ awk -F',' '{ print $2 }' file.csv
  • -F',': Use a comma as the field separator.
  • { print $2 }: Print the second field.
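This does the same job as cut -d',' -f2, but awk can also reorder or compute on fields, which cut cannot. A quick sketch on an invented CSV:

```shell
printf 'name,age\nalice,30\nbob,25\n' > /tmp/people2.csv

# Print only the second field:
awk -F',' '{ print $2 }' /tmp/people2.csv
# age
# 30
# 25

# Unlike cut, awk can swap the field order on the way out:
awk -F',' '{ print $2, $1 }' /tmp/people2.csv
# age name
# 30 alice
# 25 bob
```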

Example: Sum Numbers in a File

$ awk '{ sum += $1 } END { print sum }' numbers.txt
  • { sum += $1 }: Add the value of the first field to sum.
  • END { print sum }: Print the total sum after processing all lines.
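The same one-liner run against a small invented input file:

```shell
printf '10\n20\n12\n' > /tmp/nums.txt

# sum starts at 0 (awk variables default to 0), accumulates per line,
# and END runs once after the last line has been read:
awk '{ sum += $1 } END { print sum }' /tmp/nums.txt
# 42
```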

8. Searching Files and Text

find

  • Purpose: Search for files and directories based on criteria like name, size, and modification time.
  • Examples:
  • Find all .txt files in the current directory: $ find . -name "*.txt"
  • Find files modified in the last 7 days: $ find . -mtime -7
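A self-contained sketch using a throwaway directory tree (all names invented for illustration):

```shell
# Build a small sample tree.
mkdir -p /tmp/find-demo/sub
touch /tmp/find-demo/a.txt /tmp/find-demo/sub/b.txt /tmp/find-demo/c.log

# find recurses into subdirectories by default; quote the pattern
# so the shell does not expand the * before find sees it:
find /tmp/find-demo -name "*.txt" | sort
# /tmp/find-demo/a.txt
# /tmp/find-demo/sub/b.txt
```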

fzf (Fuzzy Finder)

  • Purpose: An interactive tool for searching and filtering text.
  • Features:
  • Fuzzy matching for quick searches.
  • Works with pipes and integrates with other commands.
  • Examples:
  • Search through a file interactively: $ cat file.txt | fzf
  • Search for files in the current directory: $ fzf

9. Modern Text Processing Tools


jq

  • Purpose: A tool for processing JSON data.
  • Example: $ echo '{"key": "value"}' | jq '.key'
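jq expressions compose with dots and pipes. A quick sketch on an inline JSON document (the keys and values below are invented); this assumes jq is installed:

```shell
json='{"user": {"name": "alice"}, "tags": ["x", "y"]}'

# Drill into nested objects; -r strips the JSON quotes from strings:
echo "$json" | jq -r '.user.name'
# alice

# jq has built-in functions, chained with | inside the expression:
echo "$json" | jq '.tags | length'
# 2
```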

10. Document Formatting with groff and nroff

groff (GNU Troff)

  • Purpose: A typesetting system for formatting documents.
  • Common Uses:
  • Formatting man pages.
  • Creating professional-quality documents.
  • Example: $ groff -T ps file > output.ps

nroff

  • Purpose: The classic formatter for plain-text terminal output; on modern systems it is typically a thin wrapper that runs groff with a terminal output device.
  • Example: $ nroff -man file.1 | less

11. Summary

By mastering text-processing tools like cut, sort, uniq, wc, and awk, along with fzf for interactive searching, jq for JSON, and the groff/nroff formatters, you can efficiently manipulate and analyze text data. These tools are indispensable for system administrators, developers, and anyone working with text files.


Practice Time!

Let’s put your new skills to the test:

  1. Use cut to extract the third column of a CSV file.
  2. Use sort and uniq to find unique lines in a file.
  3. Use wc to count the number of words in a file.
  4. Combine grep, cut, sort, and uniq to find the most frequent error in a log file.
  5. Use find to locate all .log files in the current directory.
  6. Try using fzf to interactively search through a file.


That’s it for this chapter! You’ve now learned how to use text-processing tools to manipulate and analyze text files. In the next chapter, we’ll dive into networking—using commands like ping, curl, and wget to interact with networks and the web. Until then, practice these tools to become more comfortable with text processing.


Prev: Chapter 15 | Next: Chapter 17