I have a file (lets say corpus.txt) of around 700 lines, each line containing numbers separated by -. For example:
86-55-267-99-121-72-336-89-211
59-127-245-343-75-245-245
First I need to read the data from the file, find the frequency of each number, measure the Zipf distribution of these numbers and then plot the distribution. I have done the first two parts of the task. I am stuck in drawing the Zipf distribution.
I know that numpy.random.zipf(a, size=None) should be used for this. But I am finding it extremely hard to use it. Any pointers or code snippet would be extremely helpful.
Code:
# Counts frequency as per given n
def calculateFrequency(fileDir):
  frequency = {}
  for line in fileDir:
    line = line.strip().split('-')
    for i in line:
      frequency.setdefault(i, 0)
      frequency[i] += 1
  return frequency
fileDir = open("corpus.txt")
frequency = calculateFrequency(fileDir)
fileDir.close()
print(frequency)
## TODO: Measure and draw zipf distribution
 
     
    
