I'm working with a couple of binary files and I want to parse the UTF-8 strings stored in them.
I currently have a function that takes a starting offset in the file and returns the string found there:
```python
def str_extract(file, start, size, delimiter=None, index=None):
    file.seek(start)
    if delimiter is not None and index is not None:
        # incorrect: bytes has no explode() method; this is the part I need help with
        return file.read(size).explode('0x00000000')[index]
    else:
        return file.read(size)
```
Some strings in the file are separated by the four bytes `00 00 00 00`. Is it possible to split on that delimiter, like PHP's `explode()`? I'm new to Python, so any pointers on improving the code are welcome.
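In Python, the closest analogue to PHP's `explode()` is the `split()` method, which exists on both `bytes` and `str` and returns a list of pieces. Because `file.read()` on a file opened in binary mode (`"rb"`) returns `bytes`, the separator also has to be a `bytes` literal such as `b"\x00\x00\x00\x00"`. The integer `0x00000000` is just `0`, and the text `'0x00000000'` is ten literal characters, so neither will match. A minimal sketch on made-up ASCII data (the `data` value is hypothetical):

```python
# Hypothetical ASCII data: two strings joined by a four-byte NUL separator.
data = b"abc\x00\x00\x00\x00xyz"

# bytes.split() plays the role of PHP's explode(): it returns a list of pieces,
# and indexing the list picks one of them.
parts = data.split(b"\x00\x00\x00\x00")
print(parts)     # [b'abc', b'xyz']
print(parts[1])  # b'xyz'
```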
Sample file (hex), with the `00 00 00 00` separator marked off by `|` bars:

```
48 00 65 00 6C 00 6C 00 6F 00 20 00 57 00 6F 00 72 00 6C 00 64 00 | 00 00 00 00 | 31 00 32 00 33 00
```

This contains `Hello World` followed by `123`.
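The sample bytes look like UTF-16 little-endian rather than UTF-8: every ASCII character is followed by a `00` byte. That matters for the split. The `00` belonging to `d` runs straight into the separator, giving five NUL bytes in a row, so splitting the raw bytes on four NULs matches one byte too early and leaves both halves misaligned. Decoding first and then splitting on two NUL characters (four bytes in UTF-16LE) avoids this. A short sketch using the bytes from the question, assuming the encoding really is UTF-16LE:

```python
# The sample bytes from the question, with the | markers removed.
raw = bytes.fromhex(
    "48 00 65 00 6C 00 6C 00 6F 00 20 00 57 00 6F 00 72 00 6C 00 64 00"
    " 00 00 00 00"
    " 31 00 32 00 33 00"
)

# Splitting the raw bytes: the 00 that belongs to 'd' joins the separator,
# giving five NULs in a row, so the match starts one byte early.
pieces = raw.split(b"\x00" * 4)
print(len(pieces[0]))  # 21 (odd length: the trailing 00 of 'd' was cut off)
print(pieces[1][:1])   # b'\x00' (the second piece starts with a stray NUL)

# Decoding as UTF-16LE first keeps characters aligned; the four-byte
# separator becomes two NUL characters.
text = raw.decode("utf-16-le")
print(text.split("\x00\x00"))  # ['Hello World', '123']
```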
With that file, I'd expect:

```
str_extract(file, 0x00, 0x20, 0x00000000, 0)  # => 'Hello World'
str_extract(file, 0x00, 0x20, 0x00000000, 1)  # => '123'
```
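One possible rework of `str_extract`, offered as a sketch rather than a definitive fix. It assumes the strings are UTF-16LE (the `encoding` parameter is a hypothetical addition, so you can switch to `"utf-8"` if other files differ) and that the delimiter is passed as `bytes` instead of the integer `0x00000000`. It decodes the whole region first and then splits on the decoded delimiter, so pieces stay on character boundaries. With UTF-16, `size` has to be an even number of bytes or `decode()` raises `UnicodeDecodeError`.

```python
import io


def str_extract(file, start, size, delimiter=None, index=None, encoding="utf-16-le"):
    """Read `size` bytes at offset `start` and decode them as text.

    If both `delimiter` (a bytes object) and `index` are given, split the
    decoded text on the decoded delimiter and return piece number `index`.
    """
    file.seek(start)
    text = file.read(size).decode(encoding)
    if delimiter is not None and index is not None:
        # Decode the delimiter with the same encoding so the split happens on
        # character boundaries rather than raw byte offsets.
        return text.split(delimiter.decode(encoding))[index]
    return text


# Usage with the sample bytes; io.BytesIO stands in for open("file.bin", "rb").
sample = bytes.fromhex(
    "48 00 65 00 6C 00 6C 00 6F 00 20 00 57 00 6F 00 72 00 6C 00 64 00"
    " 00 00 00 00 31 00 32 00 33 00"
)
f = io.BytesIO(sample)
print(str_extract(f, 0x00, 0x20, b"\x00\x00\x00\x00", 0))  # Hello World
print(str_extract(f, 0x00, 0x20, b"\x00\x00\x00\x00", 1))  # 123
```

For a real file, opening it with `with open(path, "rb") as f:` closes it automatically when the block ends.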