I need to perform stemming on Portuguese strings. To do so, I'm tokenizing the string with the nltk.word_tokenize() function and then stemming each word individually. After that, I rebuild the string. It works, but it doesn't perform well. How can I make it faster? The string is about 2 million words long.
    tokenAux=""
    tokens = nltk.word_tokenize(portugueseString)
        for token in tokens:
            tokenAux = token
            tokenAux = stemmer.stem(token)    
            textAux = textAux + " "+ tokenAux
    print(textAux)
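I suspect the repeated concatenation (textAux = textAux + ...) is part of the problem, since it copies the whole string on every iteration. Below is a sketch of the same pipeline that builds the result with a single " ".join() and caches stems so each distinct word is only stemmed once. The stemmer choice (SnowballStemmer('portuguese')) and the cached_stem helper are assumptions of mine, not from my original code, and I haven't benchmarked this:

    # Sketch: stem each distinct word once (via a cache) and join once at
    # the end, avoiding the quadratic cost of repeated string concatenation.
    import nltk
    from nltk.stem import SnowballStemmer  # assuming a Portuguese Snowball stemmer

    stemmer = SnowballStemmer('portuguese')
    stem_cache = {}  # hypothetical cache: word -> stem

    def cached_stem(word):
        # Compute the stem on the first miss, then reuse it.
        if word not in stem_cache:
            stem_cache[word] = stemmer.stem(word)
        return stem_cache[word]

    tokens = nltk.word_tokenize(portugueseString, language='portuguese')
    textAux = " ".join(cached_stem(token) for token in tokens)
    print(textAux)

Since natural-language text repeats many words, the cache should cut the number of stemmer.stem() calls down to the number of distinct words, and the single join avoids rebuilding the output string on every iteration.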
Sorry for my bad English, and thanks!