Can anyone help me improve this pandas code?
import pandas as pd
df = pd.DataFrame(
    [
        ['chr1', 222],
        ['chr1', 233],
        ['chr1', 2123],
        ['chr2', 244],
    ],
    columns=['chrom', 'pos'],
)
df2 = pd.DataFrame(
    [
        ['chr1', 221, 223],
        ['chr1', 230, 240],
    ],
    columns=['chrom', 'start', 'end'],
)
This gives me two DataFrames with genomic coordinates. The first one holds exact positions:
    chrom   pos
0   chr1    222
1   chr1    233
2   chr1    2123
3   chr2    244
and the second is a range:
    chrom   start   end
0   chr1    221     223
1   chr1    230     240
I need to count the exact positions that fall inside one of the ranges on the same chrom.
This works but is slow:
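c = 0
for chrom, data in df.groupby('chrom'):
    # ranges on the same chromosome as this group of positions
    tmp = df2.query(f'chrom == "{chrom}"')
    for p in data.pos:
        for s, e in zip(tmp.start, tmp.end):
            # strict comparison: a position equal to start or end is not counted
            if s < p < e:
                c += 1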
After this, c = 2.
I think I can use agg to do this without iteration (and hopefully faster), but I can't get it working. Can anyone show me how?
PS: I have also asked this on the bioinformatics Stack Exchange beta site.
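
For reference, here is a minimal sketch of one possible vectorized approach. It uses merge plus a boolean filter rather than agg, so it is a swapped-in technique and not necessarily the fastest option. It assumes the two frames above, and the names pairs, inside, c_pairs and c_positions are only illustrative. It also assumes the intermediate table (every position paired with every range on the same chrom) fits in memory, which may not hold for large inputs.

```python
import pandas as pd

df = pd.DataFrame(
    {'chrom': ['chr1', 'chr1', 'chr1', 'chr2'], 'pos': [222, 233, 2123, 244]}
)
df2 = pd.DataFrame(
    {'chrom': ['chr1', 'chr1'], 'start': [221, 230], 'end': [223, 240]}
)

# Pair every position with every range on the same chrom.
# reset_index() keeps the original row label of df in a column named 'index'.
pairs = df.reset_index().merge(df2, on='chrom', how='inner')

# Keep pairs where the position lies strictly inside the range,
# mirroring the `s < p < e` test in the loop.
inside = pairs[(pairs['start'] < pairs['pos']) & (pairs['pos'] < pairs['end'])]

# Number of (position, range) matches: what the nested loop counts.
c_pairs = len(inside)

# Number of distinct rows of df that fall in at least one range.
# This differs from c_pairs only when ranges overlap.
c_positions = inside['index'].nunique()

print(c_pairs, c_positions)
```

If the ranges never overlap, the two counts agree. If they can overlap, which of the two is wanted depends on whether a position inside two ranges should count once or twice.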
 
    