I have a dataset which looks like this:
   val
   1
   1
   3
   4
   6
   6
   9
   ...
I can't load it into a pandas DataFrame because of its huge size, so I aggregate the data using Spark to form:
   val   occurrences
   1     2
   3     1
   4     1
   6     2
   9     1
   ...
and load that into a pandas DataFrame. The "val" column never exceeds 100, so the aggregated table doesn't take much memory.
My problem is that I can't easily operate on such a structure, e.g. find the mean or median using pandas, or plot a boxplot with seaborn. I can only do it with explicit formulas I write myself, not with ready-made builtin methods. Is there a pandas structure, or any other way, to work with data in this value/count form?
For example:
1,1,3,4,6,6,9
would be:
df = pd.DataFrame({'val': [1,3,4,6,9], "occurrences" : [2,1,1,2,1]})
The median is 4. I'm looking for a method to extract the median directly from the given df.
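For illustration, a hand-written version could look like the sketch below. It assumes the frame has exactly the two columns shown above, with "occurrences" holding positive integer counts. The helper names weighted_mean and weighted_median are made up for this example and are not pandas builtins; the goal is to replace this kind of code with something ready-made.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"val": [1, 3, 4, 6, 9], "occurrences": [2, 1, 1, 2, 1]})


def weighted_mean(df):
    # Mean of the expanded data: sum(val * occurrences) / sum(occurrences).
    return np.average(df["val"], weights=df["occurrences"])


def weighted_median(df):
    # Hypothetical helper: median of the expanded data without expanding it.
    d = df.sort_values("val")
    vals = d["val"].to_numpy()
    cum = d["occurrences"].cumsum().to_numpy()  # running count of rows
    n = cum[-1]                                 # total number of raw rows
    # 1-based ranks of the middle element(s); they coincide when n is odd.
    lo_rank = (n - 1) // 2 + 1
    hi_rank = n // 2 + 1
    # First position whose cumulative count reaches each rank.
    lo = vals[np.searchsorted(cum, lo_rank)]
    hi = vals[np.searchsorted(cum, hi_rank)]
    return (lo + hi) / 2


print(weighted_mean(df))
print(weighted_median(df))
```

On the small example this agrees with np.median(np.repeat(df["val"], df["occurrences"])), but the repeat-based version rebuilds the full dataset in memory, which is exactly what I'm trying to avoid.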