I have a large CSV file with the following structure:
doc_id, inclusion, id
1, TRUE, 842
1, FALSE, 768
1, FALSE, 292
1, FALSE, 393
2, TRUE, 191
2, TRUE, 389
2, TRUE, 171
...
The id is the primary key. doc_id is a foreign key and represents the document that each id is linked to. Many ids are linked to each document. Each id is classified as included (i.e. inclusion == True) or excluded (i.e. inclusion == False).
I need to create a summary table that shows, for each doc_id, the number of ids included and excluded. I can compute these counts across all doc_ids together, but I am not sure how to compute them separately for each doc_id. For example, for the data above I want:
doc_id included excluded
1 1 3
2 3 0
...
This is my current code:
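# Counters for the whole file (not yet split by doc_id)
inc = 0
exc = 0
for index, row in citationData.iterrows():
    # The column is named "inclusion" in the CSV shown above
    if row.inclusion == True:
        inc = inc + 1
    else:
        exc = exc + 1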
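For reference, here is a minimal sketch of one way the per-doc_id counts could be computed with a pandas groupby instead of a row loop. It assumes citationData is a pandas DataFrame read from the CSV above; the inline sample text, the csv_text name, and the explicit normalisation of the inclusion column to booleans are illustrative assumptions, not part of my actual code.

```python
import io
import pandas as pd

# Sample rows copied from the question; a real file would be loaded
# with pd.read_csv on its path instead of this inline text (assumption).
csv_text = """doc_id, inclusion, id
1, TRUE, 842
1, FALSE, 768
1, FALSE, 292
1, FALSE, 393
2, TRUE, 191
2, TRUE, 389
2, TRUE, 171
"""

# skipinitialspace drops the blank after each comma in headers and values.
citationData = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Normalise the flag to a boolean Series, whether pandas parsed the column
# as bool or left it as text such as "TRUE"/"FALSE".
included = citationData["inclusion"].astype(str).str.strip().str.upper() == "TRUE"

# Summing booleans within each doc_id group counts the True values.
summary = (
    citationData.assign(included=included, excluded=~included)
    .groupby("doc_id")[["included", "excluded"]]
    .sum()
    .reset_index()
)

print(summary)
```

Summing a boolean column per group avoids iterrows entirely, and documents with no excluded ids still get a 0 in the excluded column because both columns are computed for every row.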