I have the following problem: I am given a text file containing a huge number of words, with repetitions allowed. I need to write an algorithm that outputs the 1000 most frequent words, together with their frequencies, in decreasing order of frequency. Here is an example:
**input.txt**
aa aa bb cc bb bb bb dd dd
**output.txt** (note: several words can have the same frequency)
bb 4
aa 2
dd 2
cc 1
And here is my way of solving this problem. First, I read all the words and store them in a `HashMap` with the word as the key and its count as the value. The end result is that I have every distinct word with its frequency. Next, I iterate over the `HashMap`, create a {word, frequency} object for each key-value pair, and insert it into a `SortedSet` (I wrote a comparator for that). Finally, I iterate over the first 1000 elements of the `SortedSet` to get the result.
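For reference, here is a minimal Java sketch of the approach described above. It is not my exact code: the class name `TopWords` and the method `topWords` are hypothetical names chosen for illustration, the input is assumed to be whitespace-separated, and breaking ties on the word itself is an assumption (a comparator that compares frequency only would make the `SortedSet` treat words with equal counts as duplicates and drop them).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class TopWords {

    // Returns up to k (word, frequency) entries, most frequent first.
    // Hypothetical helper; assumes words are separated by whitespace.
    static List<Map.Entry<String, Integer>> topWords(String text, int k) {
        // Step 1: count every word in a HashMap (word -> frequency).
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }

        // Step 2: order entries by descending frequency, then by word.
        // The tie-break keeps words with equal counts distinct in the set.
        Comparator<Map.Entry<String, Integer>> byFrequencyThenWord = (a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        };
        SortedSet<Map.Entry<String, Integer>> sorted = new TreeSet<>(byFrequencyThenWord);
        sorted.addAll(counts.entrySet());

        // Step 3: take the first k entries of the sorted set.
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : sorted) {
            if (result.size() == k) {
                break;
            }
            result.add(entry);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample input from the question; a real run would read input.txt instead.
        for (Map.Entry<String, Integer> e : topWords("aa aa bb cc bb bb bb dd dd", 1000)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

With n distinct words, this sorts all n entries (roughly O(n log n) for the `TreeSet` insertions) even though only 1000 of them are needed, which is the part I suspect can be improved.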
I would like to know a better way of solving this problem.
Thanks