SQL style inner join in Python?

Question

I have two arrays like this:

[('a', 'beta'), ('b', 'alpha'), ('c', 'beta'), .. ]

[('b', 37), ('c', 22), ('j', 93), .. ]

I want to produce something like:

[('b', 'alpha', 37), ('c', 'beta', 22), .. ]

Is there an easy way to do this?

@depperm I considered a for loop that checks for matches and pushes them to a new array, but I thought there might be some built-in functions that could make it easier. — Philip Kirkbride, May 09 '17 at 12:56
Check this thread: http://stackoverflow.com/questions/7776907/sql-join-or-rs-merge-function-in-numpy — Cleared, May 09 '17 at 12:57
http://stackoverflow.com/questions/17682721/combine-two-arrays-data-using-inner-join — Serge, May 09 '17 at 13:13

score 1 · Answer 1 · answered May 09 '17 at 13:09

There is no built-in method for this. I assume a package like numpy would provide extra functionality.

But if you want to solve it without any extra packages, you can use a one-liner like this:

ar1 = [('a', 'beta'), ('b', 'alpha'), ('c', 'beta')]
ar2 = [('b', 37), ('c', 22), ('j', 93)]
# Compare every pair of rows and keep those whose first elements (keys) match.
final_ar = [i + (j[1],) for i in ar1 for j in ar2 if i[0] == j[0]]
print(final_ar)

Output:

[('b', 'alpha', 37), ('c', 'beta', 22)]
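
If every key appears at most once in the second list (an assumption, not something the question states), the inner loop can be replaced by a dictionary lookup. A minimal sketch under that assumption; the name `lookup` is only illustrative:

```python
ar1 = [('a', 'beta'), ('b', 'alpha'), ('c', 'beta')]
ar2 = [('b', 37), ('c', 22), ('j', 93)]

# Assumption: keys in ar2 are unique. With duplicates, dict() would
# silently keep only the last value for each key.
lookup = dict(ar2)

# One pass over ar1; each membership test and lookup is a hash operation.
final_ar = [row + (lookup[row[0]],) for row in ar1 if row[0] in lookup]
print(final_ar)
```

This keeps the order of ar1 and drops rows whose key is missing from ar2, which is what an inner join is expected to do.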

Dan D. · Accepted Answer · 2017-05-09T13:24:27.073

I would suggest a method similar to a hash (discriminator) join:

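from itertools import product

l = [('a', 'beta'), ('b', 'alpha'), ('c', 'beta')]
r = [('b', 37), ('c', 22), ('j', 93)]
# Map each key to a pair of buckets: ([rest of left rows], [rest of right rows]).
d = {}
for t in l:
    d.setdefault(t[0], ([], []))[0].append(t[1:])
for t in r:
    d.setdefault(t[0], ([], []))[1].append(t[1:])
# For each key, pair every left row with every right row. A key missing
# from one side has an empty bucket there and so contributes nothing.
ans = [(k,) + lv + rv for k, v in d.items() for lv, rv in product(*v)]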

results in (row order follows dict iteration order, so it may differ between Python versions):

[('c', 'beta', 22), ('b', 'alpha', 37)]

This has lower complexity, closer to O(n+m) than O(nm) (where n and m are the lengths of l and r), because it avoids computing the full product(l, r) and then filtering it, as the naive method would.
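
To make the difference concrete, here is a rough sketch that counts how many rows or pairs each strategy touches. The helper names `naive_join` and `hash_join`, the counters, and the generated test data are all hypothetical and only serve as an illustration:

```python
from itertools import product

def naive_join(left, right):
    """Nested-loop join; returns (rows, number of pairs compared)."""
    rows, compared = [], 0
    for l_row in left:
        for r_row in right:
            compared += 1
            if l_row[0] == r_row[0]:
                rows.append(l_row + r_row[1:])
    return rows, compared

def hash_join(left, right):
    """Bucket both sides by key; returns (rows, rows and pairs touched)."""
    buckets, touched = {}, 0
    for side, rows_in in enumerate((left, right)):
        for row in rows_in:
            touched += 1
            buckets.setdefault(row[0], ([], []))[side].append(row[1:])
    rows = []
    for key, (ls, rs) in buckets.items():
        # Only rows that share a key are ever paired up.
        for l_rest, r_rest in product(ls, rs):
            touched += 1
            rows.append((key,) + l_rest + r_rest)
    return rows, touched

# 200 rows per side, with keys 100..199 present on both sides.
left = [(i, 'L%d' % i) for i in range(200)]
right = [(i, 'R%d' % i) for i in range(100, 300)]

naive_rows, naive_work = naive_join(left, right)
hash_rows, hash_work = hash_join(left, right)
print(naive_work, hash_work)
```

On this input the nested loop inspects 200 × 200 = 40,000 pairs, while the bucketed version touches the 400 input rows plus the 100 matching pairs. The counts ignore hashing overhead, so treat them as an intuition aid rather than a benchmark.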

The bucketing approach in this answer comes mostly from Fritz Henglein's "Relational algebra with discriminative joins and lazy products".

It can also be written as:

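from itertools import product

def accumulate(it):
    # Group the remaining fields of each tuple under its first field (the key).
    d = {}
    for e in it:
        d.setdefault(e[0], []).append(e[1:])
    return d

l = accumulate([('a', 'beta'), ('b', 'alpha'), ('c', 'beta')])
r = accumulate([('b', 37), ('c', 22), ('j', 93)])
# Dicts do not support & directly, so intersect their key sets.
# The loop variables are named lv and rv so they do not shadow the dicts l and r.
ans = [(k,) + lv + rv for k in set(l) & set(r) for lv, rv in product(l[k], r[k])]

This accumulates each list separately (turning [(a,b,...)] into {a:[(b,...)]}) and then iterates over the intersection of their sets of keys, which looks cleaner. Because l&r is not supported between dictionaries, the keys are intersected with set(l)&set(r). The order of the result follows the iteration order of that set and is therefore not guaranteed.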
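
Both versions pair every left row with every right row that shares its key, which matches SQL inner join behaviour when keys repeat. A minimal sketch of that case, wrapped in a reusable function: the name `inner_join` and the sample rows are hypothetical, and `sorted()` is there only to make the output order deterministic (it assumes the keys are mutually comparable).

```python
from itertools import product

def inner_join(left, right):
    """Hypothetical helper: join two lists of tuples on their first element."""
    def bucket(rows):
        grouped = {}
        for row in rows:
            grouped.setdefault(row[0], []).append(row[1:])
        return grouped

    lb, rb = bucket(left), bucket(right)
    # Only keys present on both sides survive; sorting fixes the output order.
    return [(k,) + lv + rv
            for k in sorted(set(lb) & set(rb))
            for lv, rv in product(lb[k], rb[k])]

left = [('a', 'beta'), ('b', 'alpha'), ('b', 'gamma')]
right = [('b', 37), ('b', 41), ('j', 93)]
result = inner_join(left, right)
print(result)
# [('b', 'alpha', 37), ('b', 'alpha', 41), ('b', 'gamma', 37), ('b', 'gamma', 41)]
```

The key 'b' occurs twice on each side, so it yields 2 × 2 = 4 rows, while 'a' and 'j' appear on one side only and are dropped.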
