
I have some text files, 100 to 300 MB in size, that I want to view in Emacs, but Emacs runs into performance problems opening and traversing files that large. I'm therefore looking for an easy utility to split a file into manageable chunks of, say, 50 MB each, naming the chunks with the original name plus a suffix that indicates their position in the sequence. Each chunk would pick up where the previous one was cut off and be no longer than 50 MB, with the last chunk possibly being shorter.

Is there an easy tool to do this on a Linux machine, perhaps something like head or tail that produces multiple results, one for each chunk?

E.g., given a file test.out that is 120 MB long, break it into test.out.1 for the first 50 MB, test.out.2 for the second 50 MB, and test.out.3 for the remaining 20 MB at the end of the file.

I could use combinations of head and tail to get the pieces, but I'd like a tool that abstracts all of this away, perhaps a Perl or Python script someone has already written for this task?
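The kind of head and tail combination I have in mind would look roughly like this (just a sketch for the 120 MB test.out example, assuming GNU head and tail so that -c accepts size suffixes like 50M):

    head -c  50M test.out                > test.out.1    # first 50 MB
    head -c 100M test.out | tail -c 50M  > test.out.2    # second 50 MB
    tail -c  20M test.out                > test.out.3    # last 20 MB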

1 Answer


There already is a nice tool for that: split

> man 1 split 

NAME
     split -- split a file into pieces

SYNOPSIS
     split [-l line_count] [-a suffix_length] [file [prefix]]
     split -b byte_count[K|k|M|m|G|g] [-a suffix_length] [file [prefix]]
     split -p pattern [-a suffix_length] [file [prefix]]


split --bytes 50M test.out test.out_ would split the file test.out into test.out_aa, test.out_ab, test.out_ac, ... (the given prefix replaces the default prefix x).
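If your split comes from GNU coreutils, you can also get numeric suffixes much closer to the test.out.1, test.out.2 naming from the question (a sketch only; it assumes a coreutils version recent enough for --numeric-suffixes to take a start value):

    # 50 MB chunks, suffixes .1, .2, .3 appended to the given prefix
    split --bytes 50M --numeric-suffixes=1 --suffix-length=1 test.out test.out.

For a 120 MB test.out this should produce test.out.1, test.out.2 and test.out.3, the last one holding the remaining 20 MB.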

A much uglier solution would be to use dd.

dd if=test.out of=test.out.part1 bs=1M count=50 skip=0 creates a file named test.out.part1 with the first 50 MB of test.out. Since skip counts blocks of bs (1 MB here), increase it by 50 for each further chunk: skip=50 for the second chunk, skip=100 for the third, and so on. Just make sure to also change the output filename, or you will keep overwriting the same file.
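A minimal sketch of how that could be scripted, assuming a POSIX shell and the 120 MB test.out from the question (the chunk count of 3 is hard-coded purely for illustration):

    # Extract 50 MB chunks into test.out.part1, test.out.part2, test.out.part3
    for n in 1 2 3; do
        dd if=test.out of=test.out.part$n bs=1M count=50 skip=$(( (n - 1) * 50 ))
    done

The last dd simply stops at the end of the input, so test.out.part3 ends up with only the remaining 20 MB.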
