I have a PySpark DataFrame with two columns: filename and year. I want to replace the year embedded in filename with the value from the year column.
The third column in the table below shows the required result:
+----------------------------+------+----------------------------+
| filename                   | year | reqd_filename              |
+----------------------------+------+----------------------------+
| blah_2020_v1_blah_blah.csv | 1975 | blah_1975_v1_blah_blah.csv |
+----------------------------+------+----------------------------+
| blah_2019_v1_blah_blah.csv | 1984 | blah_1984_v1_blah_blah.csv |
+----------------------------+------+----------------------------+
My code currently looks like this:
from pyspark.sql import functions as F

df = df.withColumn('filename', F.regexp_replace(F.col('filename'), '(blah_)(.*)(_v1.*)', <Nothing I put here works>))
In short, I want to replace the second capture group with the value of the year column from the same row.
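To pin down the intended per-row behaviour, here is a minimal plain-Python sketch (standard re module, not Spark) of the same rule: keep groups 1 and 3 of the pattern and swap group 2 for that row's year. The function name replace_year and the sample rows are illustrative only. It is just a reference for checking whatever goes into the placeholder above; note that Spark's regexp_replace uses Java-style $1 group references rather than Python's \g<1>.

```python
import re

# Same pattern as in the question: group 1 = prefix, group 2 = old year, group 3 = suffix.
PATTERN = re.compile(r'(blah_)(.*)(_v1.*)')


def replace_year(filename: str, year: int) -> str:
    """Keep groups 1 and 3 and put this row's year in place of group 2 (plain Python, not Spark)."""
    # \g<1> instead of \1 so the group number cannot run into the year's digits.
    return PATTERN.sub(rf"\g<1>{year}\g<3>", filename)


# The two rows from the table above.
rows = [
    ("blah_2020_v1_blah_blah.csv", 1975),
    ("blah_2019_v1_blah_blah.csv", 1984),
]

for filename, year in rows:
    print(filename, year, replace_year(filename, year))
```

Each printed line should match the corresponding row of the table, with reqd_filename in the last position.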