I have a dataset containing 485k strings (1.1 GB).
Each string is about 700 characters long and holds about 250 variables (1-16 characters per variable), but there are no delimiters between them. The length of each variable is known. What is the best way to split the data and insert a comma (,) between the variables?
For example, I have strings like:
0123456789012...
1234567890123...
and an array of lengths:
5,3,1,4,...
then I should get this:
01234,567,8,9012,...
12345,678,9,0123,...
Could anyone help me with this? I would prefer a solution in Python or R.
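
To make the intended transformation concrete, here is a minimal Python sketch of one possible approach. It assumes one record per line and that the lengths are already available as a list of integers. The names make_splitter and convert_file, and the file paths passed to them, are hypothetical and only serve the illustration. The file is processed line by line, so the whole 1.1 GB never has to sit in memory.

```python
from itertools import accumulate


def make_splitter(lengths, sep=","):
    """Return a function that cuts a string into fields of the given lengths."""
    # Precompute (start, end) offsets once, so each line only needs slicing.
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    spans = list(zip(starts, ends))

    def split_line(line):
        # Characters beyond sum(lengths) are silently dropped.
        return sep.join(line[s:e] for s, e in spans)

    return split_line


def convert_file(src_path, dst_path, lengths, sep=","):
    """Stream src_path line by line and write the delimited lines to dst_path."""
    split_line = make_splitter(lengths, sep)
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(split_line(line.rstrip("\n")) + "\n")


# The example from the question, using only the first four lengths.
split_line = make_splitter([5, 3, 1, 4])
print(split_line("0123456789012"))  # 01234,567,8,9012
print(split_line("1234567890123"))  # 12345,678,9,0123
```

If the fields themselves can contain commas, a different separator or proper CSV quoting would be needed. As an alternative, pandas.read_fwf accepts a widths argument for fixed-width files, though it parses everything into a DataFrame rather than streaming.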