I'm trying to construct a co-occurrence matrix of my dataframe on Databricks using the pyspark.pandas API.
I tried the method from the question "Constructing a co-occurrence matrix in python pandas" to construct the matrix.
The code works fine in pandas but throws an error with pyspark.pandas:
coocc = psdf.T.dot(psdf)
coocc
I'm getting this error:
TypeError: Unsupported type DataFrame
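For comparison, here is a minimal sketch of the plain pandas version that works. The toy dataframe `pdf`, its column names, and its values are made up for illustration and are not my real data; I'm assuming binary 0/1 columns.

```python
import pandas as pd

# Hypothetical toy data: each row is an observation, each column a 0/1 flag.
pdf = pd.DataFrame({"a": [1, 0, 1], "b": [1, 1, 0], "c": [0, 1, 1]})

# Entry (i, j) counts the rows in which both column i and column j are 1;
# the diagonal holds the per-column totals.
coocc = pdf.T.dot(pdf)
print(coocc)
```

The same expression with a pyspark.pandas dataframe in place of `pdf` is what raises the TypeError above.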
I checked the documentation: https://spark.apache.org/docs/latest/api/python/reference/pyspark.pandas/api/pyspark.pandas.DataFrame.dot.html
According to it, pyspark.pandas.DataFrame.dot() takes a Series as input.
I tried converting the dataframe to a series using psdf.squeeze(), but it does not return a series because my dataframe has multiple columns.
Is there any way to convert a pyspark.pandas.DataFrame to a pyspark.pandas.Series?
Or is there a different method in pyspark.pandas to construct a co-occurrence matrix?