I am trying to merge hundreds of sample files that contain species names and proportions into one long-format file using a bash script. I would like to know how to add some characters at the beginning of each line of awk output.
Each sample has an ID, which I save in the variable $STEM. I use awk to get the species names and proportions from each file. The files are tab-separated: the proportion is the first field of each line, and the species name is the last (6th) field. I also want to add the sample ID ($STEM) to the beginning of each line in the output file. Here is my code:
for file in $input_dir/*_species_abundance.txt
do
        STEM=$(basename "$file" _species_abundance.txt )
        echo "processing sample $STEM"
        awk '{print "$STEM," $1,$6}' FS='\t' $file >> $input_dir/merged_species_abundance.txt
done
The "$STEM," part doesn't work as expected: the output contains the literal text $STEM instead of the sample ID.
Do you have any suggestions on how I can modify my code? Thank you in advance!
Here is some sample input:
  0.45  124078  0       S       148633                s__Faecalibacterium prausnitzii_D
  0.35  95476   0       S       145938                s__Faecalibacterium prausnitzii_C
  0.21  57002   0       S       158191                s__Faecalibacterium prausnitzii_I
  0.18  49503   0       S       224832                s__Faecalibacterium sp900539945
  0.07  18991   0       S       157095                s__Faecalibacterium prausnitzii_G
  0.04  12007   0       S       187396                s__Faecalibacterium prausnitzii_F
...
... 
The first number is the proportion, and the last word is the species name.
The sampleID is something like 1001, 1002, 1003, ...
My desired output would be (comma-separated):
1001,0.45,s__Faecalibacterium prausnitzii_D
1001,0.35,s__Faecalibacterium prausnitzii_C
1001,0.21,s__Faecalibacterium prausnitzii_I
...
1002,0.28,s__Faecalibacterium prausnitzii_D
1002,0.00,s__Faecalibacterium prausnitzii_C
1002,0.01,s__Faecalibacterium prausnitzii_I
...
1003,0.60,s__Faecalibacterium prausnitzii_D
1003,0.02,s__Faecalibacterium prausnitzii_C
1003,0.39,s__Faecalibacterium prausnitzii_I
...
...
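For reference, here is a minimal sketch of one common way to do this, not necessarily the only or best one. The shell does not expand $STEM inside the single-quoted awk program, so the sketch passes the sample ID in with awk's -v option and sets OFS to a comma. The function name merge_species_abundance, the awk variable name id, and the output file name are made up for illustration. It also assumes that the proportion and species fields may carry padding spaces (as the sample input suggests) and trims them, and it truncates the output file on each run rather than appending to it.

```bash
#!/usr/bin/env bash
# Sketch: prefix every awk output line with the sample ID, comma-separated.
# Assumes tab-separated input with the proportion in field 1 and the
# species name in field 6. Function and variable names are illustrative.

merge_species_abundance() {
    input_dir=$1
    out=$2

    : > "$out"    # start from an empty output file (assumption: no appending across runs)

    for file in "$input_dir"/*_species_abundance.txt; do
        [ -e "$file" ] || continue    # skip the unexpanded pattern if nothing matches
        stem=$(basename "$file" _species_abundance.txt)
        echo "processing sample $stem"

        # -v id="$stem" hands the shell variable to awk as the awk variable id;
        # OFS="," makes print join its arguments with commas.
        awk -F'\t' -v OFS=',' -v id="$stem" '
            NF >= 6 {
                prop = $1
                species = $6
                gsub(/^[ \t]+|[ \t]+$/, "", prop)       # trim padding around the proportion
                gsub(/^[ \t]+|[ \t]+$/, "", species)    # trim indentation before the name
                print id, prop, species
            }' "$file" >> "$out"
    done
}

# Example call (paths are placeholders):
# merge_species_abundance /path/to/input_dir /path/to/merged.csv
```

One thing to watch for: an output file named merged_species_abundance.txt inside $input_dir also matches the *_species_abundance.txt pattern, so a second run of the original loop would pick it up as if it were a sample. Writing the merged file under a different name or in a different directory avoids that.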
 
    