I have a dataframe with two columns, 'text' and 'lang', and I need to extract the groups of unique 'text' values that share the same N languages.
For example, given the following dataframe:
text     lang
--------------
text_a   en
text_b   es
text_a   es
text_a   it
text_c   de
text_c   pt
text_d   no
...
I can extract the list of languages per unique text:
df.groupby('text').lang.apply(list)
and that gives me a result like this one:
text_a -> [es, en, it, fr]
text_b -> [es, it, de]
text_c -> [es, nl, it]
text_d -> [fr, no, de, pt]
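To make this reproducible, here is a minimal self-contained version that uses only the sample rows shown above. The sample is truncated, so its output has fewer languages per text than the listing above. The variable name langs_per_text is just mine for illustration.

```python
import pandas as pd

# Only the sample rows shown above; the full dataframe has more rows.
df = pd.DataFrame({
    "text": ["text_a", "text_b", "text_a", "text_a", "text_c", "text_c", "text_d"],
    "lang": ["en", "es", "es", "it", "de", "pt", "no"],
})

# One list of languages per unique text, in order of appearance.
langs_per_text = df.groupby("text")["lang"].apply(list)
print(langs_per_text)
```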
Now, from this result, how can I filter all the texts that appear in the same N languages? For example, for Spanish and French, the desired result would be all the rows from the initial dataframe whose text value has both 'es' and 'fr' in the lang column:
text     lang
--------------
text_a   fr
text_b   es
text_a   es
text_b   es
text_b   fr
text_c   fr
text_d   es
...
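The output contains only the texts that have both a row with 'es' and a row with 'fr', and only the rows for those two languages appear in it. The isin() function alone will not work here, since it keeps every row whose lang is 'es' or 'fr', even when the text is missing the other language.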
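To pin down what I am after, here is a minimal sketch of one possible approach. It may not be the idiomatic one. The toy data and the names wanted, sub, counts and result are made up for illustration and are not my real dataframe. The idea is to keep only the rows in the wanted languages with isin(), then keep the texts whose number of distinct languages among those rows equals the number of wanted languages.

```python
import pandas as pd

# Hypothetical toy data, made up for illustration only.
df = pd.DataFrame({
    "text": ["text_a", "text_a", "text_a", "text_b", "text_b", "text_c", "text_d", "text_d"],
    "lang": ["es", "en", "fr", "es", "it", "fr", "fr", "es"],
})

wanted = {"es", "fr"}  # the N languages every selected text must have

# Step 1: keep only the rows whose language is one of the wanted ones.
sub = df[df["lang"].isin(wanted)]

# Step 2: for each row, count how many distinct wanted languages its text covers.
counts = sub.groupby("text")["lang"].transform("nunique")

# Step 3: keep the rows of texts that cover every wanted language.
result = sub[counts == len(wanted)]
print(result)
```

With this toy data only text_a and text_d have both 'es' and 'fr', so only their 'es' and 'fr' rows are kept. I would still like to know whether there is a cleaner way.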
Thanks in advance.
 
    