3

I'm working on a CentOS server and I have to move around and cat together millions of files. I've tried many incarnations of something like the commands below, but all of them fail with an "Argument list too long" error.

command:

find ./ -iname out.* -type f -exec mv {} /home/user/trash
find ./paramsFile.* -exec cat > parameters.txt 

error:

-bash: /usr/bin/find: Argument list too long
-bash: /bin/cat: Argument list too long

or

echo ./out.* | xargs -I '{}' mv /home/user/trash
(echo ./paramsFile.* | xargs cat) > parameters.txt  

error:

xargs: argument line too long
xargs: argument line too long              

The second command also never finished. I've heard some things about globbing, but I'm not sure I understand it completely. Any hints or suggestions are welcome!

3 Answers

5

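There are several mistakes. Escape the * so that the shell passes it to find instead of expanding it. Put {} in quotes (find already hands each filename to the command as a single argument, but quoting protects the braces from shells that treat them specially), and end the -exec with \;.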

find ./ -iname out.\* -type f -exec mv "{}" /home/user/trash \;
find . -name paramsFile.\* -exec cat "{}" \; >> parameters.txt

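The problem here is that the shell expands the unescaped * into every matching filename in your directory before find or cat even starts, and the resulting command line exceeds the system's argument-length limit, hence the error. If find locates the files instead of shell globbing, each file is handled individually, and a tool like xargs gets individual filenames that it can use to construct command lines of an acceptable length.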
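To see the effect of quoting on a small scale, here is a minimal sketch that runs in throwaway directories. The directory variables and the file names are made up for illustration and stand in for the real working directory and /home/user/trash.

```shell
#!/bin/sh
# Scratch directories so nothing real is touched (hypothetical stand-ins).
work=$(mktemp -d)
trash=$(mktemp -d)
cd "$work" || exit 1

# A few stand-in files; the real case has millions.
touch out.1 out.2 OUT.3 keep.txt

# Quoted pattern: the shell passes the literal string out.* to find,
# and find does the matching itself, running mv once per file.
find . -iname 'out.*' -type f -exec mv {} "$trash" \;

# Count what was moved and what stayed behind.
moved=$(ls "$trash" | wc -l)
left=$(ls "$work" | wc -l)
echo "moved=$moved left=$left"
```

Because the pattern is quoted, the length of find's own command line no longer depends on how many files match.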

Bernhard
  • 1,143
2

Try this:

find . -iname 'out.*' -type f -exec mv '{}' /home/user/trash \;
find . -name 'paramsFile.*' -print0 | xargs -0 cat >> parameters.txt

The >> appends to parameters.txt, so make sure the file starts out empty (or delete it first). If you really have a huge number of files, xargs will run cat several times, but every invocation writes to the same output file, because the shell opens it once for the whole pipeline.
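As a small-scale check of the concatenation step, the following sketch builds three stand-in files in a temporary directory and joins them. The file contents and the scratch directory are assumptions for illustration only.

```shell
#!/bin/sh
# Scratch directory with a few stand-in parameter files (hypothetical data).
work=$(mktemp -d)
cd "$work" || exit 1

for i in 1 2 3; do
    echo "param $i" > "paramsFile.$i"
done

# Start from an empty output file so reruns do not accumulate old data.
: > parameters.txt

# NUL-separated names survive spaces and newlines in filenames; xargs splits
# the list into as many cat invocations as the argument limit requires.
find . -name 'paramsFile.*' -type f -print0 | xargs -0 cat >> parameters.txt

lines=$(wc -l < parameters.txt)
echo "lines=$lines"
```

Note that find does not promise any particular order, so the lines in parameters.txt may not follow the numeric order of the file names.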

slhck
  • 235,242
jjlin
  • 16,120
0

I don't have a box handy at the moment to give a very good (i.e., tested) answer, but I think this is a good use for parallel.

If I understand right, your command

find ./ -iname out.* -type f -exec mv "{}" /home/user/trash

is expanded by the shell into one huge command before find even starts:

find ./ -iname out.1 out.2 out.3 out.4 ... out.10100942 -type f -exec mv "{}" /home/user/trash

Instead, something like

find ./ -iname 'out.*' -type f | parallel mv "{}" /home/user/trash

will execute millions of smaller commands:

mv out.1 /home/user/trash
mv out.2 /home/user/trash
...

You might want to look into some of parallel's options, specifically -j and -i so you don't unexpectedly overload your server.
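If parallel is not installed, a similar "many small commands, but only a few at a time" effect can be sketched with xargs and its -P option. This is a different tool from the one suggested above, and the scratch directories, file names, and the limit of two concurrent processes are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Scratch directories so nothing real is touched (hypothetical stand-ins).
work=$(mktemp -d)
trash=$(mktemp -d)
cd "$work" || exit 1

touch out.1 out.2 out.3 out.4 out.5 keep.txt

# -I {} runs one mv per filename; -P 2 allows at most two mv processes
# at the same time, which bounds the load on the machine.
find . -iname 'out.*' -type f -print0 | xargs -0 -P 2 -I {} mv {} "$trash"

moved=$(ls "$trash" | wc -l)
echo "moved=$moved"
```

Raising the value given to -P trades more load for a shorter run time, so it is worth starting low on a shared server.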

PS: Follow @Bernhard's advice: whenever you use a shell variable, especially one holding user input, quote it! Write "{}", not {}.

djeikyb
  • 951