I have a pandas DataFrame with three columns: key1, key2, and document.  All three are text fields, with document ranging from 50 to 5000 characters.  For each (key1, key2) group I build a vocabulary from that group's documents based on a minimum frequency, using scikit-learn's CountVectorizer with min_df set.  I am able to do this using df.groupby(['key1','key2'])['document'].apply(vocab).reset_index(), where vocab is a function that computes and returns the vocabulary (as defined above) as a set.
Now I would like to use these vocabularies (one set per (key1, key2) pair) to filter the corresponding documents, so that each document keeps only the words in its group's vocabulary.  I would appreciate any help with this part.
Sample data
Input
key1 | key2 | document
 aa  | bb   | He went home that evening. Then he had soup for dinner.
 aa  | bb   | We want to sit down and eat dinner
 cc  | mm   | Sometimes people eat in a restaurant
 aa  | bb   | The culinary skills of that chef are terrible.  Let us not go there.
 cc  | mm   | People go home after dinner and try to sleep.
Vocabulary (counts ignored for the purpose of this example)
key1 | key2 | vocab
 aa  | bb   | {went, evening, sit, down, culinary, chef, dinner}
 cc  | mm   | {people, restaurant, home, dinner, sleep}
Result - only use words from corresponding vocab in document
key1 | key2 | document
 aa  | bb   | went evening dinner
 aa  | bb   | sit down dinner
 cc  | mm   | people restaurant
 aa  | bb   | culinary chef
 cc  | mm   | people home dinner sleep
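One possible sketch of the filtering step, using the hand-picked vocab sets from the example above (in practice the vocab column would come from the groupby/CountVectorizer step): merge the per-group vocabularies back onto the rows, then keep only in-vocab tokens.  Reusing CountVectorizer's own analyzer keeps tokenisation consistent with the step that built the vocabulary.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

df = pd.DataFrame({
    "key1": ["aa", "aa", "cc", "aa", "cc"],
    "key2": ["bb", "bb", "mm", "bb", "mm"],
    "document": [
        "He went home that evening. Then he had soup for dinner.",
        "We want to sit down and eat dinner",
        "Sometimes people eat in a restaurant",
        "The culinary skills of that chef are terrible.  Let us not go there.",
        "People go home after dinner and try to sleep.",
    ],
})

# Hand-picked vocabularies from the example; normally the output of
# df.groupby(['key1','key2'])['document'].apply(vocab).reset_index()
vocab_df = pd.DataFrame({
    "key1": ["aa", "cc"],
    "key2": ["bb", "mm"],
    "vocab": [
        {"went", "evening", "sit", "down", "culinary", "chef", "dinner"},
        {"people", "restaurant", "home", "dinner", "sleep"},
    ],
})

# Attach each row's group vocabulary, then drop out-of-vocab tokens.
merged = df.merge(vocab_df, on=["key1", "key2"], how="left")
analyzer = CountVectorizer().build_analyzer()
merged["document"] = [
    " ".join(tok for tok in analyzer(doc) if tok in voc)
    for doc, voc in zip(merged["document"], merged["vocab"])
]
print(merged[["key1", "key2", "document"]])
```

Note that the default analyzer lowercases and drops one-character tokens, so the filtered text is lowercase even where the original was not.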