I'm trying to retrieve the list of bigrams that occur with a specific frequency (i).
I've come up with two ways to do it and am wondering which is more efficient. In both cases I first create the bigrams, bg1, and then build a frequency distribution from them with nltk.FreqDist:
import nltk
from nltk import bigrams
# set up the data
from nltk.book import text1
# keep only alphabetic words (drops punctuation and numbers)
alphlist = [w for w in list(text1) if w.isalpha()]
# create the bigrams
bg1 = bigrams(alphlist)
# create the FreqDist object (maps each bigram to its count)
fdist1 = nltk.FreqDist(bg1)
# target frequency; 2 is only an example value
i = 2
Approach one sorts the entries with most_common() first, then filters:
for obj in fdist1.most_common():
  if obj[1] == i:
    print(obj)
Approach two iterates over fdist1 directly:
for obj in fdist1:
  if fdist1[obj] == i:
    print(obj, fdist1[obj]) 
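For anyone who wants to try this without downloading the NLTK corpora, here is a minimal sketch of the same two approaches on a toy word list. It assumes collections.Counter as a stand-in for nltk.FreqDist (FreqDist subclasses Counter, so most_common() and item lookup behave the same way); the sample sentence, the value of i, and the variable names are made up for illustration.

```python
from collections import Counter

# Hypothetical toy data standing in for the filtered text1 tokens.
words = "the cat sat on the mat and the cat sat on the rug".split()

# Adjacent word pairs, similar in spirit to nltk.bigrams(words).
pairs = list(zip(words, words[1:]))

# Counter is used here as a stand-in for nltk.FreqDist (assumption).
fdist = Counter(pairs)

i = 2  # target frequency, chosen only for this example

# Approach one: most_common() sorts every entry by count, then we filter.
via_most_common = [bg for bg, count in fdist.most_common() if count == i]

# Approach two: iterate over the mapping directly, with no sort.
via_direct = [bg for bg in fdist if fdist[bg] == i]

# Both approaches select the same bigrams; only the order may differ.
same_result = set(via_most_common) == set(via_direct)
print(same_result, sorted(via_direct))
```

Both loops pick out the same bigrams here. The visible difference is that most_common() has to sort all entries before the filter runs, while the direct loop makes a single pass over the keys.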
Which approach is better and why?
 
    