I'm doing a bioinformatics study in which I process some data and write the outputs to a set of folders. The folder/file structure looks like this for two of the folders:
binned/90-20-09-2018/bins/90-20-09-2018.001, 90-20-09-2018.002, 90-20-09-2018.003 and so forth
binned/90-25-04-2018/bins/90-25-04-2018.001, 90-25-04-2018.002, 90-25-04-2018.003 and so forth
I know the number of folders, but the number of files in each folder is unknown and will vary.
In another file called taxonomy.txt (e.g. binned/90-20-09-2018/bins/quality/taxonomy.txt) there is a table of bacterial names for each of the bins (the files named 90-20-09-2018.001, 90-20-09-2018.002, etc.). As you can see, each bin ID has a corresponding taxonomy.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Bin Id              # unique markers (of 43)   # multi-copy   Taxonomy                                                                                              
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
  90-20-09-2018.001              25                   15        k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus          
  90-20-09-2018.003              24                   0         k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus          
  90-20-09-2018.002              15                   0         k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae_2;g__Lactobacillus_2      
  90-20-09-2018.005              14                   11        k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae                           
  90-20-09-2018.004              12                   0         k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Actinomycetaceae;g__Mobiluncus  
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
What I need is to rename each of the bin files (90-20-09-2018.001, 90-20-09-2018.002, etc.) to its corresponding taxonomy (genus) name. The genus name is the name that comes after "g__", so for bin 001 it would be "Lactobacillus".
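To make the bin-to-genus lookup concrete, here is a minimal sketch of how the table might be read. It assumes every data row starts with the bin ID and ends with a taxonomy string that begins with "k__" and contains no spaces, as in the excerpt above. The function name parse_genus_by_bin is made up for illustration, and rows without a "g__" entry (such as bin 005) are simply left out.

```python
import re

def parse_genus_by_bin(table_text):
    """Return a dict mapping bin ID -> genus for rows that have a g__ entry."""
    genus_by_bin = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Skip separator lines and the header row: a data row has at least
        # four whitespace-separated fields and its last field is the taxonomy
        # string (assumed here to start with "k__").
        if len(fields) < 4 or not fields[-1].startswith("k__"):
            continue
        # The genus is whatever follows "g__" up to the next ";" or the end.
        match = re.search(r"g__([^;\s]+)", fields[-1])
        if match:
            genus_by_bin[fields[0]] = match.group(1)
    return genus_by_bin

# Three rows copied from the table above, plus a header and a separator.
sample = """
-----------------------------------------------
  Bin Id   # unique markers (of 43)   # multi-copy   Taxonomy
-----------------------------------------------
  90-20-09-2018.001   25   15   k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus
  90-20-09-2018.002   15   0    k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae_2;g__Lactobacillus_2
  90-20-09-2018.005   14   11   k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae
"""
genus_by_bin = parse_genus_by_bin(sample)
```

For the real file, the text could be read with open(path).read() and passed to the same function.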
So the final result would look like this (for the first folder):
binned/90-20-09-2018/bins/Lactobacillus, Lactobacillus_2, Streptococcus and so forth
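The renaming step itself could be sketched roughly as below, assuming a dict that maps each bin ID to its genus is already available and that the bin files are named exactly like the bin IDs, with no extra extension. The function name rename_bins and the dry_run flag are hypothetical. The sketch skips a bin whose target name is already taken instead of overwriting it, and it runs against a temporary directory so no real data is touched.

```python
import tempfile
from pathlib import Path

def rename_bins(bins_dir, genus_by_bin, dry_run=True):
    """Rename each bin file to its genus; return the (old, new) name pairs."""
    bins_dir = Path(bins_dir)
    planned = []
    claimed = set()  # target names already used in this run
    for bin_id, genus in sorted(genus_by_bin.items()):
        source = bins_dir / bin_id
        target = bins_dir / genus
        if not source.is_file():
            continue  # the table lists a bin that is not in this folder
        if genus in claimed or target.exists():
            continue  # never overwrite: two bins may share a genus name
        claimed.add(genus)
        planned.append((source.name, target.name))
        if not dry_run:
            source.rename(target)
    return planned

# Demo in a throwaway directory with three empty stand-in bin files.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    for name in ("90-20-09-2018.001", "90-20-09-2018.002", "90-20-09-2018.003"):
        (demo_dir / name).touch()
    mapping = {
        "90-20-09-2018.001": "Lactobacillus",
        "90-20-09-2018.002": "Lactobacillus_2",
        "90-20-09-2018.003": "Streptococcus",
    }
    renamed = rename_bins(demo_dir, mapping, dry_run=False)
    remaining = sorted(p.name for p in demo_dir.iterdir())
```

Running it first with dry_run=True returns the planned renames without changing anything, which seems like a sensible check before touching the real bins. For several sample folders, the same call could be repeated for each binned/<sample>/bins directory.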
I imagine this being done with Python (the only programming language I'm familiar with). Feel free to ask questions if I've been unclear.
Thanks!
 
    
