I'm writing a text classification system in Python. This is what I'm doing to canonicalize each token:
from nltk.stem import WordNetLemmatizer, PorterStemmer

lem, stem = WordNetLemmatizer(), PorterStemmer()
for doc in corpus:
    for word in doc:
        # lemmatize first, then stem the lemma
        token = stem.stem(lem.lemmatize(word))
The reason I don't want to just lemmatize is that I noticed WordNetLemmatizer wasn't handling some common inflections. With verb forms, for example, lem.lemmatize('walking') returns walking, not walk.
Is it wise to perform both stemming and lemmatization, or is it redundant? Do researchers typically do one or the other, rather than both?