This question explains how to add your own words to the built-in English stop words of CountVectorizer. I'm interested in seeing the effects on a classifier of eliminating any numbers as tokens.
ENGLISH_STOP_WORDS is stored as a frozenset, so unless there's a method I don't know about, I guess my question boils down to this: is it possible to add an arbitrary number representation to a frozen set?
My feeling is that it's not possible: stop_words has to be a finite collection of strings, and there are infinitely many possible number representations, so no finite list can cover them all.
I suppose one way to accomplish the same thing would be to loop through the test corpus and collect every word for which word.isdigit() is true into a set/list that I can then union with ENGLISH_STOP_WORDS (see previous answer), but I'd rather be lazy and pass something simpler to the stop_words parameter.
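To make the workaround concrete, here is a minimal sketch of what I have in mind. The corpus is a made-up toy example, and I'm assuming that splitting on whitespace is good enough to find the numeric tokens. I convert the result to a list before passing it, since as far as I can tell newer scikit-learn versions reject a set for stop_words.

```python
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS

# Hypothetical toy corpus, only for illustration.
corpus = [
    "the model scored 95 on 3 of the 10 folds",
    "we trained for 200 epochs in 2019",
]

# Collect every whitespace-separated token made up only of digits.
numeric_tokens = {word for doc in corpus for word in doc.split() if word.isdigit()}

# frozenset.union returns a new frozenset; ENGLISH_STOP_WORDS itself is unchanged.
stop_words = ENGLISH_STOP_WORDS.union(numeric_tokens)

# Pass a list, which stop_words accepts across scikit-learn versions.
vectorizer = CountVectorizer(stop_words=list(stop_words))
vectorizer.fit(corpus)
vocabulary = set(vectorizer.vocabulary_)
```

This only removes the numbers that actually occur in the corpus it was built from, which is exactly why I'd prefer something simpler that handles numbers in general.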