I have a table with 12 columns and want to select items from the first column (qseqid) based on the second column (sseqid). Each sseqid value repeats across several rows, with different values in the 11th and 12th columns, which are evalue and bitscore, respectively.
For each sseqid, I would like to get the qseqid with the lowest evalue and, when evalues are the same, the highest bitscore. The rest of the columns can be ignored; the data is shown below.
So, I have written a short script that uses the second column as the key of a dictionary. With it I can get the five different items from the second column, each with lists of qseqid + evalue and qseqid + bitscore pairs.
Here is the code:
#!/usr/bin/python
filename = "data.txt"
d = dict()
with open(filename, "r") as readfile:
    next(readfile)  # skip the header row
    for line in readfile:
        fields = line.strip().split("\t")
        # key: sseqid; values: [qseqid, evalue] and [qseqid, bitscore] pairs
        d.setdefault(fields[1], []).append([fields[0], fields[10]])
        d.setdefault(fields[1], []).append([fields[0], fields[11]])
for x in d:
    print(x, d[x])
But, I am struggling to get the qseqid with the lowest evalue and the highest bitscore for each sseqid.
Is there any good logic to solve the problem?
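One possible direction, as a minimal sketch rather than a definitive solution: keep only the best row seen so far for each sseqid, ranking rows by the tuple (evalue, -bitscore) so that a lower evalue wins and a higher bitscore breaks ties. The function names `best_hits` and `read_rows` are hypothetical, and the sketch assumes the file is tab-separated with a single header row and that columns 11 and 12 always parse as numbers.

```python
import csv


def best_hits(rows):
    """Map each sseqid to the qseqid with the lowest evalue.

    Ties on evalue are broken by the highest bitscore.
    `rows` is any iterable of 12-field sequences (header already removed).
    """
    best = {}
    for row in rows:
        qseqid, sseqid = row[0], row[1]
        # Convert to floats so 2.00E-76 is compared as a number, not as text.
        evalue, bitscore = float(row[10]), float(row[11])
        # Smaller rank is better: low evalue first, then high bitscore.
        rank = (evalue, -bitscore)
        if sseqid not in best or rank < best[sseqid][0]:
            best[sseqid] = (rank, qseqid)
    return {sseqid: qseqid for sseqid, (_rank, qseqid) in best.items()}


def read_rows(path):
    """Yield the data rows of a tab-separated file, skipping the header."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # assumption: the first line is the header row
        yield from reader
```

With the data shown below, `best_hits(read_rows("data.txt"))` would be expected to pick ACLA_024600 for TBB, ACLA_028450 for EF1A, ACLA_096170 for CALM, ACLA_065630 for RPB2 and ACLA_095800 for ACT.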
The data.txt file (including the header row, with » representing tab characters):
qseqid»sseqid»pident»length»mismatch»gapopen»qstart»qend»sstart»send»evalue»bitscore
ACLA_022040»TBB»32.71»431»258»8»39»468»24»423»2.00E-76»240
ACLA_024600»TBB»80»435»87»0»1»435»1»435»0»729
ACLA_031860»TBB»39.74»453»251»3»1»447»1»437»1.00E-121»357
ACLA_046030»TBB»75.81»434»105»0»1»434»1»434»0»704
ACLA_072490»TBB»41.7»446»245»3»4»447»3»435»2.00E-120»353
ACLA_010400»EF1A»27.31»249»127»8»69»286»9»234»3.00E-13»61.6
ACLA_015630»EF1A»22»491»255»17»186»602»3»439»8.00E-19»78.2
ACLA_016510»EF1A»26.23»122»61»4»21»127»9»116»2.00E-08»46.2
ACLA_023300»EF1A»29.31»447»249»12»48»437»3»439»2.00E-45»155
ACLA_028450»EF1A»85.55»443»63»1»1»443»1»442»0»801
ACLA_074730»CALM»23.13»147»101»4»6»143»2»145»7.00E-08»41.2
ACLA_096170»CALM»29.33»150»96»4»34»179»2»145»1.00E-13»55.1
ACLA_016630»CALM»23.9»159»106»5»58»216»4»147»5.00E-12»51.2
ACLA_031930»RPB2»36.87»1226»633»24»121»1237»26»1219»0»734
ACLA_065630»RPB2»65.79»1257»386»14»1»1252»4»1221»0»1691
ACLA_082370»RPB2»27.69»1228»667»37»31»1132»35»1167»7.00E-110»365
ACLA_061960»ACT»28.57»147»95»5»146»284»69»213»3.00E-12»57.4
ACLA_068200»ACT»28.73»463»231»13»16»471»4»374»1.00E-53»176
ACLA_069960»ACT»24.11»141»97»4»581»718»242»375»9.00E-09»46.2
ACLA_095800»ACT»91.73»375»31»0»1»375»1»375»0»732
And here's a little more readable version of the table's contents:
0            1           2      3        4        5      6    7      8    9        10       11
qseqid       sseqid pident length mismatch  gapopen qstart qend sstart send    evalue bitscore
ACLA_022040  TBB     32.71    431      258        8    39   468     24  423  2.00E-76      240
ACLA_024600  TBB        80    435       87        0     1   435      1  435         0      729
ACLA_031860  TBB     39.74    453      251        3     1   447      1  437 1.00E-121      357
ACLA_046030  TBB     75.81    434      105        0     1   434      1  434         0      704
ACLA_072490  TBB      41.7    446      245        3     4   447      3  435 2.00E-120      353
ACLA_010400  EF1A    27.31    249      127        8    69   286      9  234  3.00E-13     61.6
ACLA_015630  EF1A       22    491      255       17   186   602      3  439  8.00E-19     78.2
ACLA_016510  EF1A    26.23    122       61        4    21   127      9  116  2.00E-08     46.2
ACLA_023300  EF1A    29.31    447      249       12    48   437      3  439  2.00E-45      155
ACLA_028450  EF1A    85.55    443       63        1     1   443      1  442         0      801
ACLA_074730  CALM    23.13    147      101        4     6   143      2  145  7.00E-08     41.2
ACLA_096170  CALM    29.33    150       96        4    34   179      2  145  1.00E-13     55.1
ACLA_016630  CALM     23.9    159      106        5    58   216      4  147  5.00E-12     51.2
ACLA_031930  RPB2    36.87   1226      633       24   121  1237     26 1219         0      734
ACLA_065630  RPB2    65.79   1257      386       14     1  1252      4 1221         0     1691
ACLA_082370  RPB2    27.69   1228      667       37    31  1132     35 1167 7.00E-110      365
ACLA_061960  ACT     28.57    147       95        5   146   284     69  213  3.00E-12     57.4
ACLA_068200  ACT     28.73    463      231       13    16   471      4  374  1.00E-53      176
ACLA_069960  ACT     24.11    141       97        4   581   718    242  375  9.00E-09     46.2
ACLA_095800  ACT     91.73    375       31        0     1   375      1  375         0      732
 
     
     
     
     
    