Prelude:
Given a sorted list of paths/files, how can I find their common parent paths?
In technical terms: if the sorted input is fed through stdin, how do I pick out the shortest proper prefixes?
Here "prefix" has its usual meaning, e.g., the string 'abcde' has the prefix 'abc'. Here is my sample input:
$ echo -e '/home/dave\n/home/dave/file1\n/home/dave/sub2/file2'
/home/dave
/home/dave/file1
/home/dave/sub2/file2
Here is an example that removes successive proper prefixes from stdin, using sed:
$ echo -e '/home/dave\n/home/dave/file1\n/home/dave/sub2/file2' | sed "N; /^\(.*\)\n\1\//D; P; D"
/home/dave/file1
/home/dave/sub2/file2
Question:
My question is how to preserve the proper prefix instead, and remove all the lines that have that prefix. Since both /home/dave/file1 and /home/dave/sub2/file2 have the prefix /home/dave, /home/dave would be preserved while the other two would not. I.e., it should do the complete opposite of what the above sed command does.
More info:
- The input would be sorted already
- If I have /home/dave, /home/dave/file1, /home/phil and /home/phil/file2 (echo -e '/home/dave\n/home/dave/file1\n/home/dave/sub2/file2\n/home/phil\n/home/phil/file2'), I would expect /home/dave and /home/phil to be the answer.
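To make the expected behaviour concrete, here is a minimal sketch of the filtering I have in mind, written with awk rather than sed. The function name keep_top_prefixes is hypothetical, and the sketch assumes that every path's descendants follow it directly in the sorted input (a sibling such as /home/dave-x sorting between /home/dave and /home/dave/file1 would break it). It is an illustration of the desired result, not necessarily the best solution.

```shell
# Hypothetical helper: print a line only if it is not located under the
# most recently printed path. "prefix" holds that last printed path.
keep_top_prefixes() {
  # index(s, t) == 1 means s starts with t; the trailing "/" ensures that
  # /home/davex is not mistaken for something under /home/dave.
  awk 'NR == 1 || index($0, prefix "/") != 1 { prefix = $0; print }'
}

printf '%s\n' /home/dave /home/dave/file1 /home/dave/sub2/file2 \
  /home/phil /home/phil/file2 | keep_top_prefixes
```

For the five-line input above this should leave only /home/dave and /home/phil.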
Application:
I have two disk volumes containing similar content. I want to copy what's in v1 but missing from v2 into another disk volume, v3. Using find, sort, and comm, I am able to get a list of what to copy, but I need to further clean up that list. I.e., as long as I have /home/dave in the list, I don't need the other two.
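For context, a list like that could be produced roughly as follows. This is only a sketch: the function name missing_from and the mount points in the usage comment are assumptions, and the exact commands may differ. LC_ALL=C is set so that sort and comm agree on the ordering.

```shell
# Hypothetical helper: list paths that exist under $1 but not under $2,
# relative to each volume's root.
missing_from() {
  a=$(mktemp) && b=$(mktemp) || return 1
  (cd "$1" && find . | LC_ALL=C sort) > "$a"
  (cd "$2" && find . | LC_ALL=C sort) > "$b"
  # comm -23 keeps only the lines unique to the first file
  LC_ALL=C comm -23 "$a" "$b"
  rm -f "$a" "$b"
}

# Usage (assumed mount points):
#   missing_from /mnt/v1 /mnt/v2
```

The output is sorted, and it contains both a missing directory and everything under it, which is why the list needs the extra clean-up.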
Thanks!