I was trying to collect all of the Message-ID: headers (lines) in a directory with 200K .eml (plain text) files. A bit naively, I said:
find -type f -exec grep -Fi "message-id:" {} \; > messageids.txt
I let it run overnight, since I figured it would take a while to grep through that many files. To my surprise, this morning messageids.txt was 1.7TB and my partition was full. I realize that what must have happened is that grep's own output was being picked up as input, but I wouldn't have expected (and intuitively still don't expect) it to repeat endlessly, which means my understanding of the forces at play isn't as strong as it should be.
Can anyone provide a detailed explanation of how the command above works and why this infinite loop should (I assume) be expected? Thanks.
Update: The way I'd expect it to work is that find builds a list of files and grep is called on each one of them, so at some point grep is called on messageids.txt. If I were to do this with, say, a sort command, messageids.txt would be created as soon as the command executes (possibly truncating it, if it already existed), but it wouldn't be populated until the command completes. In this case, for the loop to be infinite, the file must be getting populated before the whole command has finished, and in such a way that grep, while reading messageids.txt as input, never reaches the end of it because new output keeps being appended. That's the bit that doesn't behave like I'd expect, and I was hoping for a detailed explanation of how this process chain executes so I can firm up my Linux fundamentals.
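For reference, here is a minimal sketch of the setup in a scratch directory. It only checks the first part of my mental model, namely that the shell creates the redirect target before find starts, so find ends up listing its own output file. The file names a.eml, b.eml and list.txt are made up for illustration, and the last command is just one possible way to keep the output file out of the search, not necessarily the best one.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Two tiny stand-ins for the .eml files (hypothetical contents).
printf 'Message-ID: <1@example.com>\n' > a.eml
printf 'Message-ID: <2@example.com>\n' > b.eml

# The shell opens (and truncates) list.txt before find is started,
# so find's traversal of "." includes the redirect target itself.
find . -type f > list.txt
cat list.txt

# Same search as in the question, but skipping the output file by name
# so grep is never run on the file it is writing to.
find . -type f ! -name messageids.txt -exec grep -Fi "message-id:" {} \; > messageids.txt
wc -l < messageids.txt
```

If list.txt shows up in its own listing, that at least confirms that messageids.txt was among the files handed to grep in the original command.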