I am using the tidytext package in R to do n-gram analysis.
Since I analyze tweets, I would like to preserve @ and # to capture mentions, retweets, and hashtags. However, the unnest_tokens function automatically removes all punctuation and converts the text to lowercase.
I found that unnest_tokens can split on a regular expression with token='regex', which lets me customize how it cleans the text. But that only works for unigram analysis. It doesn't work for n-grams, because there I need to set token='ngrams' instead.
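
Here is a minimal sketch of the problem. The one-row `tweets` tibble, its `text` column, and the whitespace pattern are made up for illustration and are not my real data or exact code:

```r
library(dplyr)
library(tidytext)

# Hypothetical example tweet (illustrative only)
tweets <- tibble(text = "RT @Alice: Loving #rstats today")

# Unigrams: splitting on whitespace with token = "regex" keeps @ and #
unigrams <- tweets %>%
  unnest_tokens(word, text, token = "regex", pattern = "\\s+")

# Bigrams: token = "ngrams" uses its own word tokenizer,
# so the custom pattern above cannot be applied here
bigrams <- tweets %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)

print(unigrams)
print(bigrams)
```

In the first result the mention and hashtag symbols are still attached to the tokens. In the second they are gone and everything is lowercased.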
Is there any way to prevent unnest_tokens from converting text into lowercase in n-gram analysis?