Given a dataframe, I want to get the duplicated indexes, which do not have duplicate values in the columns, and see which values are different.
Specifically, I have this dataframe:
import pandas as pd
wget https://www.dropbox.com/s/vmimze2g4lt4ud3/alt_exon_repeatmasker_intersect.bed
alt_exon_repeatmasker = pd.read_table('alt_exon_repeatmasker_intersect.bed', header=None, index_col=3)
In [74]: alt_exon_repeatmasker.index.is_unique
Out[74]: False
And some of the indexes have duplicate values in the 9th column (the type of DNA repetitive element in this location), and I want to know what are the different types of repetitive elements for individual locations (each index = a genome location).
I'm guessing this will require some kind of groupby and hopefully some groupby ninja can help me out.
To simplify even further, if we only have the index and the repeat type,
genome_location1    MIR3
genome_location1    AluJb
genome_location2    Tigger1
genome_location3    AT_rich
So the output I'd like to see all duplicate indexes and their repeat types, as such:
genome_location1    MIR3
genome_location1    AluJb
EDIT: added toy example