I'm trying to find the missing (NaN) and null values in my DataFrame, but I'm getting an exception. I have included only the first few fields of the schema below:
root
|-- created_at: string (nullable = true)
|-- id: long (nullable = true)
|-- id_str: string (nullable = true)
|-- text: string (nullable = true)
|-- display_text_range: string (nullable = true)
|-- source: string (nullable = true)
|-- truncated: boolean (nullable = true)
|-- in_reply_to_status_id: double (nullable = true)
|-- in_reply_to_status_id_str: string (nullable = true)
|-- in_reply_to_user_id: double (nullable = true)
|-- in_reply_to_user_id_str: string (nullable = true)
|-- in_reply_to_screen_name: string (nullable = true)
|-- geo: double (nullable = true)
|-- coordinates: double (nullable = true)
|-- place: double (nullable = true)
|-- contributors: string (nullable = true)
Here is the code that throws the exception; it counts the missing (NaN) and null values in every column:
# imported in an earlier cell:
from pyspark.sql.functions import col, count, isnan, when

# one count(...) per column: count rows where the value is NaN or null
df_mis = df.select([count(when(isnan(c) | col(c).isNull(), c)).alias(c) for c in df.columns])
df_mis.show()
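For reference, this is the pattern I'm trying to apply, shown as a minimal self-contained sketch on a toy DataFrame (the column names, data, and the extra float/double check are illustrative, not from my real data); isnan() is only applied to float/double columns, since those are the only types that can actually hold NaN:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, isnan, when
from pyspark.sql.types import DoubleType, FloatType

spark = SparkSession.builder.getOrCreate()

# toy data: one long, one double (with a NaN), one string column
toy = spark.createDataFrame(
    [(1, float("nan"), None), (2, 3.0, "a"), (None, None, "b")],
    schema="id long, score double, text string",
)

# apply isnan() only to float/double columns; for every other type a plain null check is enough
floaty = {f.name for f in toy.schema.fields if isinstance(f.dataType, (FloatType, DoubleType))}
counts = toy.select([
    count(when(isnan(c) | col(c).isNull(), c) if c in floaty
          else when(col(c).isNull(), c)).alias(c)
    for c in toy.columns
])
counts.show()

On this toy data it should report 1, 2 and 1 missing values for id, score and text respectively.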
Here are the AnalysisException details:
---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
<ipython-input-20-6ccaacbbcc7f> in <module>()
----> 1 df_mis = df.select([count(when(isnan(c) | col(c).isNull(), c)).alias(c) for c in df.columns])
      2 df_mis.show()
/content/spark-3.2.0-bin-hadoop3.2/python/pyspark/sql/dataframe.py in select(self, *cols)
   1683         [Row(name='Alice', age=12), Row(name='Bob', age=15)]
   1684         """
-> 1685         jdf = self._jdf.select(self._jcols(*cols))
   1686         return DataFrame(jdf, self.sql_ctx)
   1687 
/content/spark-3.2.0-bin-hadoop3.2/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1308         answer = self.gateway_client.send_command(command)
   1309         return_value = get_return_value(
-> 1310             answer, self.gateway_client, self.target_id, self.name)
   1311 
   1312         for temp_arg in temp_args:
/content/spark-3.2.0-bin-hadoop3.2/python/pyspark/sql/utils.py in deco(*a, **kw)
    115                 # Hide where the exception came from that shows a non-Pythonic
    116                 # JVM exception message.
--> 117                 raise converted from None
    118             else:
    119                 raise
AnalysisException: Can't extract value from place#14: need struct type but got double