How do I split a PySpark DataFrame column on a dot (.) separator? Calling split with a dot as the separator doesn't seem to work for me.
E.g. a column with the value abcd.efgh should be split into two columns with the values abcd and efgh.
This is the df based on your example.
from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('abcd.efgh',)], ['c1'])
df.show()
#+---------+
#| c1|
#+---------+
#|abcd.efgh|
#+---------+
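split treats its pattern as a regular expression, and an unescaped dot matches any character, which is why splitting on '.' does not work. Make the dot literal, for example with the character class '[.]'. The third argument (limit, available from Spark 3.0) caps the result at two elements: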
splitCol = F.split('c1', '[.]', 2)
df = df.select(
    splitCol[0].alias('c1_0'),
    splitCol[1].alias('c1_1'),
)
df.show()
#+----+----+
#|c1_0|c1_1|
#+----+----+
#|abcd|efgh|
#+----+----+