I have a text file of ~1GB with about 6k rows (each row is very long) and I need to randomly shuffle its rows. Is it possible? Possibly with awk?
6 Answers
You can use the shuf command from GNU coreutils. It is fast and should take less than a minute to shuffle a 1 GB file.
The command below is safe to run even though the input and output are the same file, because shuf reads the complete input before opening the output file:
$ shuf -o File.txt < File.txt
Python 3 one-liner:
python3 -c "import sys, random; L = sys.stdin.readlines(); random.shuffle(L); print(''.join(L), end='')"
Python 2 one-liner:
python2 -c "import sys, random; L = sys.stdin.readlines(); random.shuffle(L); print ''.join(L),"
Both versions read all the lines from standard input, shuffle them in place, then print them without adding an extra trailing newline (note the end='' in the Python 3 version and the trailing comma in the Python 2 version).
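If you would rather have a small script than a one-liner, the same read, shuffle, write steps can be sketched as below. This is a minimal sketch, not a tested tool: the function name shuffle_file_in_place and the seed parameter are my own, and it assumes the whole file fits in memory (as the one-liners do). It writes to a temporary file and renames it over the original so an interrupted run does not leave a half-written file.

```python
import os
import random
import tempfile


def shuffle_file_in_place(path, seed=None):
    """Shuffle the lines of `path` and replace the file with the result.

    Hypothetical helper for illustration; `seed` is optional and only
    needed if you want a repeatable order.
    """
    rng = random.Random(seed)
    with open(path, "rb") as f:
        lines = f.readlines()  # the whole file is held in memory

    # Make sure the last line ends with a newline so that it does not
    # get glued to another line after shuffling.
    if lines and not lines[-1].endswith(b"\n"):
        lines[-1] += b"\n"

    rng.shuffle(lines)

    # Write next to the original, then rename over it in one step.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as out:
            out.writelines(lines)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Call it as shuffle_file_in_place("File.txt"). Note that the temporary file is created with restrictive permissions, so the replaced file may end up with different permissions than the original.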
If, like me, you came here looking for an alternative to shuf on macOS, use randomize-lines.
Install the randomize-lines package from Homebrew. It provides an rl command with functionality similar to shuf.
brew install randomize-lines
Usage: rl [OPTION]... [FILE]...
Randomize the lines of a file (or stdin).

  -c, --count=N          select N lines from the file
  -r, --reselect         lines may be selected multiple times
  -o, --output=FILE      send output to file
  -d, --delimiter=DELIM  specify line delimiter (one character)
  -0, --null             set line delimiter to null character
                         (useful with find -print0)
  -n, --line-number      print line number with output lines
  -q, --quiet, --silent  do not output any errors or warnings
  -h, --help             display this help and exit
  -V, --version          output version information and exit
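Going only by the help text above, --count and --reselect look like sampling without and with replacement. Here is a rough Python sketch of that distinction. The function pick_lines and its parameters are made up for illustration, and I have not checked it against rl's actual behaviour in edge cases.

```python
import random


def pick_lines(lines, count, reselect=False, seed=None):
    """Return `count` randomly chosen lines (hypothetical helper)."""
    rng = random.Random(seed)
    if reselect:
        # With replacement: the same line may be picked more than once.
        return rng.choices(lines, k=count)
    # Without replacement: each line is picked at most once, so we
    # cannot return more lines than the input has.
    return rng.sample(lines, min(count, len(lines)))
```

With reselect=False and count equal to the number of lines, this is just a full shuffle.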
On macOS, the Homebrew coreutils package installs shuf under the name gshuf:
brew install coreutils
gshuf -o File.txt < File.txt
I forgot where I found this, but here's the shuffle.pl that I use:
#!/usr/bin/perl -w
# @(#) randomize  Effectively unsort a text file into random order.
# 96.02.26 / drl.
# Based on Programming Perl, p 245, "Selecting random element ..."

# Set the random seed, PP, p 188
srand(time|$$);

# Suck in everything in the file.
@a = <>;

# Get random lines, write 'em out, mark 'em done.
while ( @a ) {
    $choice = splice(@a, rand @a, 1);
    print $choice;
}
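The loop above removes one randomly chosen line from the array on each pass until nothing is left. For comparison, here is a minimal Python sketch of the closely related Fisher-Yates shuffle, which swaps elements in place instead of removing them. The name fisher_yates and the rng parameter are mine, and this is not a translation of the Perl script; in practice random.shuffle already does this for you.

```python
import random


def fisher_yates(items, rng=random):
    """Shuffle `items` in place and return it (illustrative helper)."""
    # Walk from the last index down to 1, swapping each element with a
    # randomly chosen element at or before it.
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive, so 0 <= j <= i
        items[i], items[j] = items[j], items[i]
    return items
```

For example, fisher_yates(open("File.txt").readlines()) returns the lines of the file in random order.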