I have a large binary file. I want to extract certain strings from it and copy them to a new text file.
For example, in:
D-wM-^?^@^@^@^@^@^@^@^Y^@^@^@^@^@^@^@M-lM-FM-MM-[o@^B^@M-lM-FM MM-[o@^B^@^@^@^@^@E7cacscKLrrok9bwC3Z64NTnZM-^G
I want to take the number '7' (after the @^@^@E) and every character after it stopping at the Z ('ignoring the M-^G).
I want to copy this 7cacscKLrrok9bwC3Z64NTnZ to a new file.
There will be multiple such strings in one file. The end will always be denoted by the M- (which I don't want copied). The start will always be denoted by a 7 (which I do want copied).
Unfortunately, my knowledge of grep, sed, etc, does not extend to this level. Can someone please suggest a viable way to achieve this?
cat -v filename | grep [7][A-Z,a-z] will show all strings with a '7' followed by a letter but that's not much.
Thank you.
I've noticed that my requirements are rather more complicated.
(I've performed the correct - I hope - formatting this time). Thanks to 'tshiono' for his (?) answer to the earlier submission.
I want to check the ending of a string and, if it ends in M-, grep another string that follows it (with junk in between). If the string does not end in M-, then I don't want it copied (let alone any other strings).
So what I would like is:
grep -a -Po "7[[:alnum:]]+(?=M-)" file_name and if the ending is M- then grep -a -Po "5x[[:alnum:]]+(?=\^)" file_name to copy the string that starts with 5x and ends with a ^.
In this example:
D-wM-^?^@^@^@^@^@^@^@^Y^@^@^@^@^@^@^@M-lM-FM-MM-[o@^B^@M-lM-FM MM-[o@^B^@^@^@^@^@E7cacscKLrrok9bwC3Z64NTnZM-^GwM-^?^@^@^@^@^@^@^@^Y^@^@^@^@^@^@^@M-lM-FM-MM-[o@^B^@M-lM5x8w09qewqlkcklwnlkewflewfiewjfoewnflwenfwlkfwelk^89038432nowefe
The outcome would be:
7cacscKLrrok9bwC3Z64NTnZ
5x8w09qewqlkcklwnlkewflewfiewjfoewnflwenfwlkfwelk
However, if the ending is not M- (more precisely, if the ending is ^S), then do not try the second grep and do not record anything at all.
In this example:
D-wM-^?^@^@^@^@^@^@^@^Y^@^@^@^@^@^@^@M-lM-FM-MM-[o@^B^@M-lM-FM MM-[o@^B^@^@^@^@^@E7cacscKLrrok9bwC3Z64NTnZ^SGwM-^?^@^@^@^@^@^@^@^Y^@^@^@^@^@^@^@M-lM-FM-MM-[o@^B^@M-lM5x8w09qewqlkcklwnlkewflewfiewjfoewnflwenfwlkfwelk^89038432nowefe
The outcome would be null (nothing copied) as the 7cacs... string ends in ^S.
Is grep the correct tool? Grep a file and if the condition in the grep command is 'yes' then issue a different grep command but if the condition is 'no' then do nothing.
Thanks again.
I have noticed one addition modification.
Can one add an OR command to the second part? Grep if the second string starts with 5x OR 6x? 
In the example below, grep -aPo "7[[:alnum:]]+M-.*?5x[[:alnum:]]+\^" filename | grep -aPo "7[[:alnum:]]+(?=M-)|5x[[:alnum:]]+(?=\^)" will extract the strings starting with 7 and the strings starting with 5x.
How can one change the 5x to 5x or 6x?
D-wM-^?^@^@^@^@^@^@^@^Y^@^@^@^@^@^@^@M-lM-FM-MM-[o@^B^@M-lM-FM MM-[o@^B^@^@^@^@^@E7cacscKLrrok9bwC3Z64NTnZM-^GwM-^?^@^@^@^@^@^@^@^Y^@^@^@^@^@^@^@M-lM-FM-MM-[o@^B^@M-lM5x8w09qewqlkcklwnlkewflewfiewjfoewnflwenfwlkfwelk^89038432nowefe
D-wM-^?^@^@^@^@^@^@^@^Y^@^@^@^@^@^@^@M-lM-FM-MM-[o@^B^@M-lM-FM MM-[o@^B^@^@^@^@^@E7AAAAAscKLrrok9bwC3Z64NTnZM-^GwM-^?^@^@^@^@^@^@^@^Y^@^@^@^@^@^@^@M-lM-FM-MM-[o@^B^@M-lM6x8w09qewqlkcklwnlkewflewfiewjfoewnflwenfwlkfwelk^89038432nowefe
In this example, the desired outcome would be:
7cacscKLrrok9bwC3Z64NTnZ
5x8w09qewqlkcklwnlkewflewfiewjfoewnflwenfwlkfwelk
7AAAAAscKLrrok9bwC3Z64NTnZ
6x8w09qewqlkcklwnlkewflewfiewjfoewnflwenfwlkfwelk
UPDATE MARCH 09:
I need to create a series of complex grep (or perl) commands to extract strings from a series of binary files.
I need two strings from the binary file.
The first string will always start with a 1. 
The first string will end with a letter or number. The next letter will always be a lower case k. I do not want this k character.
The difficulty is that the ending k will not always be the first k in the string. It might be the first k but it might not. 
After the k, there is a second string. The second string will always start with an A or a B.
The ending of the second string will be in one of two forms:
a) it will end with a space then display the first three characters from the first string in lower case followed by a )
b) it will end with a ^K then display the first three characters from the first string in lower case.
For example:
1pppsx9YPar8Rvs75tJYWZq3eo8PgwbckB4m4zT7Yg042KIDYUE82e893hY ppp)
Should be:
1pppsx9YPar8Rvs75tJYWZq3eo8Pgwbc and B4m4zT7Yg042KIDYUE82e893hY - delete the k and the space then ppp.
For example:
1zzzsx9YPkr8Rvs75tJYWZq3eo8PgwbckA2m4zT7Yg042KIDYUE82e893hY^Kzzz
Should be:
1zzzsx9YPkar8Rvs75tJYWZq3eo8Pgwbc and A4m4zT7Yg042KIDYUE82e893hY - delete the second k and the ^Kzzz.
In the second example, we see that the first k is part of the first string. It is the k before the A that breaks up the first and second strings. 
I hope there is a super grep expert who can help! Many thanks!
 
     
     
    