I am trying to calculate age from the DOB column in my DataFrame. DOB is a string column in MM/DD/YYYY format. This is my code:
from pyspark.sql.functions import to_date, datediff, floor, current_date
from pyspark.sql import functions as F
from pyspark.sql.functions import col
RawData_Combined = RawData_Combined.select(col("DOB"),to_date(col("DOB"),"MM-dd-yyyy").alias("DOBFINAL"))
RawData_Combined = RawData_Combined.withColumn('AgeDOBFinal', (F.months_between(current_date(), F.col('DOBFINAL')) / 12).cast('int'))
But when I call RawData_Combined.show(), both DOBFINAL and AgeDOBFinal come back null:
+----------+--------+-----------+
|       DOB|DOBFINAL|AgeDOBFinal|
+----------+--------+-----------+
| 4/17/1989|    null|       null|
| 3/16/1964|    null|       null|
|  1/1/1970|    null|       null|
| 3/30/1967|    null|       null|
|  2/1/1989|    null|       null|
|  1/1/1995|    null|       null|
|      null|    null|       null|
|  1/1/1976|    null|       null|
|      null|    null|       null|
|  1/1/1958|    null|       null|
|  1/1/1960|    null|       null|
|  1/1/1973|    null|       null|
| 5/18/1988|    null|       null|
|      null|    null|       null|
|  3/3/1980|    null|       null|
|  7/3/1988|    null|       null|
|  1/1/1997|    null|       null|
|  1/1/1961|    null|       null|
|10/16/1955|    null|       null|
|  5/5/1982|    null|       null|
+----------+--------+-----------+
only showing top 20 rows