
Split Text Files on Ubuntu Linux - Command Line Guide

September 19, 2006
Topics: Linux, Files

Working with Large Text Files in Linux

Large text files can be slow to process, particularly when the goal is to import the data into a spreadsheet. At other times, only a specific subset of lines is needed. For these tasks, Linux provides a set of command-line tools, including split, wc, tail, cat, and grep, and sed and awk are also invaluable. This article focuses on using split and wc to manage large files.

Examining a Log File

Let's begin by inspecting a sample log file to understand its size. The following command lists the file details:

> ls -l
-rw-r--r-- 1 thegeek ggroup 40465200 2006-09-19 11:42 access.log

As shown, access.log occupies roughly 40 MB. The number of lines matters just as much, however: Excel 2003 and earlier, for instance, can hold at most 65,536 rows per worksheet.

Counting Lines with wc

The wc ("word count") utility also counts lines when given the -l option:

> wc -l access.log
146330 access.log

The output shows that the file contains 146,330 lines, well over the 65,536-row limit, so the file must be split into smaller pieces before it can be imported.
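The number of pieces follows from a ceiling division of the line count by the row limit. The sketch below assumes a 65,536-row limit and uses demo.log, a hypothetical stand-in generated with seq that has the same 146,330 lines as access.log; it is an illustration, not part of the original walkthrough.

```shell
#!/bin/sh
# Sketch: how many parts are needed to stay under a per-file row limit.
# Assumption: a 65,536-row limit (Excel 2003 and earlier); change LIMIT as needed.
LIMIT=65536

# Work in a throwaway directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# demo.log is a hypothetical stand-in for access.log with the same 146,330 lines.
seq 146330 > demo.log

# Redirecting the file into wc makes it print only the count, without the filename.
lines=$(wc -l < demo.log)

# Integer ceiling division: parts = ceil(lines / LIMIT).
parts=$(( (lines + LIMIT - 1) / LIMIT ))

echo "$lines lines need $parts parts of at most $LIMIT lines"
```

With 146,330 lines and a 65,536-row limit, this arrives at three parts, which matches the three-way split used in the next section.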

Splitting the File with split

The split utility divides a file into smaller parts. Its -l option sets the number of lines per output file; with 60,000 lines per file, access.log is split into three segments:

> split -l 60000 access.log
> ls -l
total 79124
-rw-rw-r-- 1 thegeek ggroup 40465200 2006-09-19 12:00 access.log
-rw-rw-r-- 1 thegeek ggroup 16598163 2006-09-19 12:05 xaa
-rw-rw-r-- 1 thegeek ggroup 16596545 2006-09-19 12:05 xab
-rw-rw-r-- 1 thegeek ggroup 7270492 2006-09-19 12:05 xac

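This command creates three new files, xaa, xab, and xac, named with split's default x prefix and alphabetical suffixes. The first two contain exactly 60,000 lines each, which is below the 65,536-row limit, and the last file, xac, holds the remaining 26,330 lines.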
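Before importing the parts, it can be worth confirming the line counts and that nothing was lost. The following is a minimal sketch that repeats the split on demo.log, a hypothetical 146,330-line stand-in for access.log, then checks the parts with wc and reassembles them with cat for comparison against the original.

```shell
#!/bin/sh
# Sketch: reproduce the 60,000-line split on a stand-in file and check the result.
# demo.log is a hypothetical stand-in for access.log (146,330 numbered lines).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 146330 > demo.log

# Same options as the command above: 60,000 lines per output file,
# written to split's default names xaa, xab, xac, ...
split -l 60000 demo.log

# One count per part, followed by a "total" line.
wc -l xa*

# Concatenating the parts in name order must reproduce the original exactly.
if cat xa* | cmp -s - demo.log; then
    echo "parts reassemble to the original"
fi
```

The wc output should report 60,000 lines for xaa and xab and 26,330 for xac. Because the suffixes sort alphabetically, the glob xa* lists the parts in the order split wrote them.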

Alternative Splitting

To make the parts equal in size, choose a line count that divides the total evenly. For example, 146,330 / 2 = 73,165, so the following command produces two files of exactly 73,165 lines each:

> split -l 73165 access.log

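This gives a perfectly balanced split. Note, however, that each 73,165-line piece exceeds the 65,536-row limit; for an Excel import, three parts of 48,777 lines (146,330 / 3, rounded up) would keep every file under the limit.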
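The per-file line count for N equal parts is the total divided by N, rounded up. The sketch below generalizes the 73,165 figure; N and demo.log (a hypothetical 146,330-line stand-in for access.log) are illustrative assumptions, with N set to 3 so that every part stays under 65,536 rows.

```shell
#!/bin/sh
# Sketch: pick a per-file line count that splits a file into N near-equal parts.
# demo.log is a hypothetical stand-in for access.log (146,330 numbered lines);
# N is the reader's choice.
N=3

workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 146330 > demo.log

lines=$(wc -l < demo.log)

# Round up so that split never produces more than N files.
# N=2 gives 73165 (the figure used above); N=3 gives 48777.
per_file=$(( (lines + N - 1) / N ))

split -l "$per_file" demo.log
wc -l xa*
```

With N=3 this produces xaa and xab with 48,777 lines each and xac with 48,776. Newer GNU coreutils releases also accept split -n l/N, which writes N files without breaking lines, but that option balances the parts by byte size rather than by line count.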

In short, wc tells you how large a file really is, and split offers a simple way to divide it into smaller pieces that are easier to process or import into other applications.
