I have a data set like the following.
Input file:
id addr
301 1
301 2
301 3
301 4
302 6
302 7
302 8
302 9
302 1
303 14
303 15
303 2
304 16
304 17
304 1
I need Python code that prints every possible pair of addr values that share a common id. The real test file contains millions of id and addr records, so the code must read the two columns from a text file. The expected output is as follows (shown in full only for 301 and 302; the rest continues the pattern):
1 2
1 3
1 4
2 3
2 4
3 4
6 7
6 8
6 9
7 8
7 9
8 9
1 6
1 7
1 8
1 9
2 6
2 7
2 8
2 9
3 6
3 7
3 8
3 9
4 6
4 7
4 8
4 9
1 15
2 15
3 15
......
1 16
2 16
......
15 16
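A note on the sample output above: it includes pairs such as 2 6 and 15 16, whose addr values never appear under the same id. They are linked only indirectly, through a shared addr (1 appears under 301, 302 and 304; 2 appears under 301 and 303). If that is intended, one possible reading is that ids sharing an addr are merged into one group before pairs are formed. The following is a rough sketch under that assumption, using a union-find (disjoint set) structure. The names `merged_groups`, `find` and `union` are illustrative only, and the rows are the sample data typed in by hand.

```python
from collections import defaultdict
from itertools import combinations


def merged_groups(rows):
    """Group addrs that are linked directly or indirectly by shared ids."""
    parent = {}

    def find(x):
        # Walk up to the root, shortening the path along the way
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_addr = {}  # first addr seen for each id
    for txid, addr in rows:
        parent.setdefault(addr, addr)
        if txid in first_addr:
            # Same id as an earlier row: put both addrs in one group
            union(addr, first_addr[txid])
        else:
            first_addr[txid] = addr

    groups = defaultdict(list)
    for addr in parent:
        groups[find(addr)].append(addr)
    return list(groups.values())


rows = [(301, 1), (301, 2), (301, 3), (301, 4),
        (302, 6), (302, 7), (302, 8), (302, 9), (302, 1),
        (303, 14), (303, 15), (303, 2),
        (304, 16), (304, 17), (304, 1)]

for group in merged_groups(rows):
    for a, b in combinations(sorted(group), 2):
        print(a, b)
```

With the sample data every addr ends up in a single group, so the loop prints all pairs among the twelve addr values, in sorted order rather than the order listed above. This sketch holds every addr in memory, and the number of pairs grows quadratically with group size, which may matter with millions of records.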
So far I have written the following, but I have no idea how to code the pair-combination part. I am new to Python, so I would appreciate help with the code along with a little explanation.
# coding: utf-8
# sample tested in python 3.6
import sys

if len(sys.argv) < 2:
    sys.stderr.write("Usage: {0} filename\n".format(sys.argv[0]))
    sys.exit(1)

fn = sys.argv[1]
sys.stderr.write("reading " + fn + "...\n")

# addr values collected for the current id
s = []
txid_prev = None

with open(fn, "r") as fin:
    for line in fin:
        f = line.split()  # columns: id, addr
        if len(f) < 2 or f[0] == "id":
            continue  # skip blank lines and the header line
        txid, addr = f[0], f[1]
        if txid_prev is not None and txid != txid_prev:
            # connect all pairs in s
            # print all pairs as edges
            s = []  # start collecting addr values for the new id
        s.append(addr)
        txid_prev = txid

if s:
    # connect and print all edges
    pass
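
One way the missing "connect all pairs" step could be filled in, as a minimal sketch. It assumes rows with the same id are adjacent in the file (as in the sample), that the columns are separated by whitespace, and that a header line starting with `id` may be present. The function names `read_rows`, `pairs_per_id` and `print_pairs` are illustrative and not part of the code above. `itertools.groupby` collects consecutive rows with the same id, and `itertools.combinations(addrs, 2)` yields each unordered pair exactly once.

```python
from itertools import combinations, groupby


def read_rows(lines):
    """Yield (id, addr) tuples from whitespace-separated lines."""
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[0] == "id":
            continue  # skip blank lines and the header
        yield fields[0], fields[1]


def pairs_per_id(rows):
    """Yield (addr_a, addr_b) for every pair of addrs sharing an id.

    Assumes rows with the same id are consecutive.
    """
    for _, group in groupby(rows, key=lambda row: row[0]):
        addrs = [addr for _, addr in group]
        # combinations(..., 2) yields each unordered pair exactly once
        yield from combinations(addrs, 2)


def print_pairs(path):
    """Read the file at `path` and print one pair per line."""
    with open(path) as fin:
        for a, b in pairs_per_id(read_rows(fin)):
            print(a, b)
```

Calling `print_pairs(sys.argv[1])` after `import sys` would run it on the file named on the command line. Because the file is streamed and only one id's addr values are held at a time, memory use should stay small even with millions of records. Note that this produces only pairs within a single id, so a pair such as 2 6 from the sample output would not appear.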