I have the files file1 and file2, where file2 is a subset of file1. That means, if I iterate over file1, there are some lines that are in file2, and some that aren't, but there is no line in file2 that is not in file1. There may be several lines with the same content in a file. Now I want to get the difference between them, that is, all lines of file1 that aren't in file2.
According to this well received answer
diff(1) isn't the answer, comm(1) is.
(For whatever reason)
But as I understand, for comm the files need to be sorted first. The problem: Both files are ordered (not sorted!), and this order needs to be kept. So what I really want is to iterate over file1, and check for every line, if it is also in file2. If not, write it to file3. If the same content occurs more than once, it should be kept more than once!
Is there any way to do this with the command line?