I have a genome sequencing file in the following format:
chromosome name (string) | location (int) | readings (int)
Data for all chromosomes are stored in a single file, and I wish to:
- split the file into individual per-chromosome data files;
- convert chromosome names, e.g. 'chr1', 'x', to integers.
How can I do that with Pandas?
import pandas as pd
df = pd.read_csv('sample.txt', delimiter='\t', header=None)
The data look like this (the first column is the DataFrame index):
0   chr1    3000573     0   
1   chr1    3000574     3   
2   chr2    3000725     1   
3   chr2    3000726     4   
4   chr3    3000900     1   
5   chr3    3000901     0   
I can also reindex the data frame by the chromosome labels chr1, chr2, ...
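To make the goal concrete, here is a minimal sketch of what I have in mind, not a definitive solution. It assumes the three columns can be named `chrom`, `location`, and `readings` (my own labels, since the file has no header), and that non-numeric names map to integers through a lookup table. The values in `SPECIAL` (e.g. 'x' to 23) and the helper names `chrom_to_int` and `write_per_chromosome` are hypothetical and would need adjusting for the organism in question. An inline string stands in for 'sample.txt' so the example is self-contained.

```python
import io
import os

import pandas as pd

# Hypothetical inline sample standing in for 'sample.txt' (tab-separated, no header).
sample = (
    "chr1\t3000573\t0\n"
    "chr1\t3000574\t3\n"
    "chr2\t3000725\t1\n"
    "chr2\t3000726\t4\n"
    "chrX\t3000900\t1\n"
    "chrX\t3000901\t0\n"
)

# Column names are assumptions; the original file has no header row.
df = pd.read_csv(io.StringIO(sample), sep="\t", header=None,
                 names=["chrom", "location", "readings"])

# Assumed mapping for non-numeric chromosome names; adjust per organism.
SPECIAL = {"x": 23, "y": 24, "m": 25, "mt": 25}


def chrom_to_int(name, special=SPECIAL):
    """Convert 'chr1' -> 1, and names like 'chrX' or 'x' via the lookup table."""
    label = name.lower()
    if label.startswith("chr"):
        label = label[3:]
    if label.isdigit():
        return int(label)
    return special[label]  # raises KeyError for names not in the table


df["chrom_id"] = df["chrom"].map(chrom_to_int)

# Split into one DataFrame per chromosome, keyed by the integer id.
per_chrom = {cid: part.drop(columns="chrom")
             for cid, part in df.groupby("chrom_id")}


def write_per_chromosome(frames, out_dir):
    """Write each per-chromosome frame to '<out_dir>/chr<id>.txt' (tab-separated)."""
    for cid, part in frames.items():
        path = os.path.join(out_dir, f"chr{cid}.txt")
        part.to_csv(path, sep="\t", header=False, index=False)
```

Calling `write_per_chromosome(per_chrom, "some_output_dir")` would then produce one file per chromosome. Using `groupby` avoids filtering the full frame once per chromosome, which I expect matters for a genome-sized file.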
 
     
    