In this chapter, we’ll explore tools and techniques for working with text in Linux. You’ll learn how to use commands like cut, sort, uniq, and wc to manipulate and analyze text files. We’ll also cover modern tools like fzf for interactive text searching and nroff for document formatting. By the end of this chapter, you’ll be able to efficiently process and extract meaningful information from text data.
Text processing is a fundamental skill for working with logs, configuration files, CSV data, and more. Whether you’re extracting specific fields, sorting data, or counting occurrences, these tools will help you work with text files like a pro.
cut Command

The cut command extracts specific columns or fields from a file.
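As a quick sketch before the option breakdown (the file name and its contents are made up for illustration), here is cut pulling one column out of a small CSV:

```shell
# Sample CSV for illustration (hypothetical data).
printf 'alice,london\nbob,paris\ncarol,tokyo\n' > people.csv

# Keep only the first comma-separated field of each line.
cut -d',' -f1 people.csv
# alice
# bob
# carol
```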
$ cut -d',' -f1 file.csv

-d',': Use a comma as the delimiter.
-f1: Extract the first field.

More examples:

$ cut -d',' -f2,3 file.csv    # extract the second and third fields
$ cut -c1-5 file.txt          # extract characters 1 through 5 of each line

sort Command

The sort command sorts lines in a file.
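A small sketch shows why sort's numeric mode matters (the numbers.txt contents are made up): by default sort compares lines as strings, so "10" sorts before "2".

```shell
# Made-up numeric data, one number per line.
printf '10\n2\n33\n4\n' > numbers.txt

# Default sort is lexicographic: "10" comes before "2".
sort numbers.txt      # prints 10, 2, 33, 4
# Numeric sort compares values instead of characters.
sort -n numbers.txt   # prints 2, 4, 10, 33
```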
$ sort file.txt

-r: Reverse the sort order.
-n: Sort numerically.
-k: Sort by a specific column.
-t: Set the field separator (useful for CSV files).

More examples:

$ sort -n numbers.txt
$ sort -t',' -k2 file.csv    # sort CSV rows by their second field

uniq Command

The uniq command removes duplicate lines from a sorted file.
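The sort-then-uniq pattern is worth seeing end to end. In this sketch (the fruit.txt contents are invented), uniq -c turns a word list into a frequency table:

```shell
# Made-up word list with repeats.
printf 'apple\nbanana\napple\ncherry\nbanana\napple\n' > fruit.txt

# uniq only collapses adjacent duplicates, so sort first;
# -c prefixes each distinct line with its count, and the final
# sort -nr puts the most frequent word on top.
sort fruit.txt | uniq -c | sort -nr
```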
$ uniq file.txt

-c: Count occurrences of each line.
-d: Show only duplicate lines.
-u: Show only unique lines.

Because uniq only collapses adjacent duplicates, pipe sorted output into it:

$ sort file.txt | uniq
$ sort file.txt | uniq -c
$ sort file.txt | uniq -d

wc Command

The wc (word count) command counts lines, words, and bytes in a file.
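A tiny worked example makes the three counts concrete (the words.txt contents are made up; note that the byte count includes the newline at the end of each line):

```shell
# Two lines, five words (sample file).
printf 'one two three\nfour five\n' > words.txt

wc -l words.txt   # line count: 2
wc -w words.txt   # word count: 5
wc -c words.txt   # byte count, newlines included
```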
$ wc file.txt

-l: Count lines.
-w: Count words.
-c: Count bytes (use -m to count characters).

$ wc -l file.txt
$ wc -w file.txt
$ wc -c file.txt

You can combine these commands to create powerful text-processing pipelines.
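As a sketch of such a pipeline (the access.log file and its space-delimited, IP-first format are assumptions for illustration), here is one way to find the busiest client in a web server log:

```shell
# Hypothetical access log: "IP METHOD PATH" per line.
printf '1.2.3.4 GET /a\n5.6.7.8 GET /b\n1.2.3.4 POST /c\n' > access.log

# field 1 (the IP) -> group duplicates -> count -> biggest first -> top line
cut -d' ' -f1 access.log | sort | uniq -c | sort -nr | head -n 1
```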
$ cat file.txt | tr ' ' '\n' | sort | uniq -c | sort -nr

tr ' ' '\n': Replace spaces with newlines so each word is on its own line.
sort: Sort the words alphabetically.
uniq -c: Count occurrences of each word.
sort -nr: Sort by count in descending order.

$ grep "ERROR" logfile.txt | cut -d' ' -f4- | sort | uniq -c | sort -nr | head -n 1

grep "ERROR": Filter lines containing "ERROR".
cut -d' ' -f4-: Extract the error message (everything from the fourth field onward).
sort | uniq -c: Count occurrences of each distinct error.
sort -nr: Sort by count in descending order.
head -n 1: Display the most common error.

awk

awk is a powerful tool for text processing and can often replace several of the commands above. We covered it in detail in the previous chapter.
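To illustrate how awk can stand in for a whole pipeline (the scores.csv file and its name,score layout are made up), a single program below splits fields, filters rows, and totals a column in one pass:

```shell
# Made-up CSV of name,score pairs.
printf 'alice,10\nbob,20\ncarol,12\n' > scores.csv

# Split on commas, keep rows whose score exceeds 10, sum that column.
awk -F',' '$2 > 10 { total += $2 } END { print total }' scores.csv
# 32
```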
$ awk -F',' '{ print $2 }' file.csv

-F',': Use a comma as the field separator.
{ print $2 }: Print the second field.

$ awk '{ sum += $1 } END { print sum }' numbers.txt

{ sum += $1 }: Add the value of the first field to sum.
END { print sum }: Print the total after all lines have been processed.

find Command

The find command searches for files. To find .txt files in the current directory:

$ find . -name "*.txt"

To find files modified within the last seven days:

$ find . -mtime -7

fzf (Fuzzy Finder)

fzf provides interactive fuzzy searching. Pipe text into it, or run it alone to search file names:

$ cat file.txt | fzf
$ fzf

jq

jq queries and transforms JSON data:

$ echo '{"key": "value"}' | jq '.key'

groff and nroff

groff (GNU Troff) formats text documents, for example into PostScript:

$ groff -T ps file > output.ps

nroff wraps groff for terminal-based formatting and is commonly used to render man pages:

$ nroff -man file.1 | less

By mastering text-processing tools like cut, sort, uniq, wc, and awk, as well as tools like find, fzf, jq, and nroff, you can efficiently manipulate and analyze text data. These tools are indispensable for system administrators, developers, and anyone working with text files.
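These tools also compose with one another. As a final sketch before the exercises (the demo directory and file names are invented), find can feed a file list straight into the counting tools from earlier:

```shell
# Scratch directory with a mix of file types (names are made up).
mkdir -p demo
touch demo/a.txt demo/b.txt demo/c.log

# List the .txt files under demo/, then count how many there are.
find demo -name '*.txt' | wc -l
# 2
```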
Let’s put your new skills to the test:
1. Use cut to extract the third column of a CSV file.
2. Use sort and uniq to find unique lines in a file.
3. Use wc to count the number of words in a file.
4. Combine grep, cut, sort, and uniq to find the most frequent error in a log file.
5. Use find to locate all .log files in the current directory.
6. Try using fzf to interactively search through a file.
That’s it for this chapter! You’ve now learned how to use text-processing tools to manipulate and analyze text files. In the next chapter, we’ll dive into networking—using commands like ping, curl, and wget to interact with networks and the web. Until then, practice these tools to become more comfortable with text processing.