In pyspark, one can get the local time from the UTC time by passing the timestamp and the timezone to the function from_utc_timestamp
>>> df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
>>> df.select(from_utc_timestamp(df.t, "PST").alias('t')).collect()
[Row(t=datetime.datetime(1997, 2, 28, 2, 30))]
The timezone here is provided as a string literal ("PST"). If one were to have the following data structure:
+--------------------------+---------+
| utc_time                 |timezone |
+--------------------------+---------+
|  2018-08-03T23:27:30.000Z|  PST    |
|  2018-08-03T23:27:30.000Z|  GMT    |
|  2018-08-03T23:27:30.000Z|  SGT    |
+--------------------------+---------+
How could one achieve the following new column (preferably without a UDF)?
+--------------------------+-----------------------------------+
| utc_time                 |timezone | local_time              |
+--------------------------+-----------------------------------+
|  2018-08-03T23:27:30.000Z|  PST    | 2018-08-03T15:27:30.000 |
|  2018-08-03T23:27:30.000Z|  GMT    | 2018-08-04T00:27:30.000 |
|  2018-08-03T23:27:30.000Z|  SGT    | 2018-08-04T07:27:30.000 |
+--------------------------+-----------------------------------+
 
    