I have log files whose entries look more or less like the examples below. I want to clean them up so that each entry is restored to the real link it came from.
Does anyone know how to write a regex in Python (PySpark) to get the desired output?
1: 
https%3A%2F%2Fwww.btv.com%2Fnews%2Ffinland%2Fartikel%2F5174938%2Fzwemmer-zoekactie-julianadorp-kinderen-gered
Desired Output 
https://www.btv.com/news/finland/artikel/5174938/zwemmer-zoekactie-julianadorp-kinderen-gered
2: 
https%3A%2F%2Fwww.weather.com%2F
Desired Output 
https://www.weather.com
3:
https%3A%2F%2Fwww.weather.com%2Ffinland%2Fneerslag%2Fweather%2F3uurs
Desired Output 
https://www.weather.com/finland/neerslag/weather/3uurs
I have tried a couple of solutions, but without really understanding them:
    \b\w+\b(?!\/)
    from pyspark.sql.functions import regexp_extract, col
    regexp_extract(column_name, regex, group_number)
    regex('(.)(by)(\s+)(\w+)')
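
The entries look like percent-encoded URLs (`%3A` is `:` and `%2F` is `/`), so a regex may not be needed at all. Here is a minimal sketch in plain Python, assuming the standard-library `urllib.parse.unquote` is acceptable. The helper name `clean_url` is hypothetical, and stripping the trailing slash is only my reading of example 2:

```python
from urllib.parse import unquote


def clean_url(encoded: str) -> str:
    """Percent-decode a logged URL and drop any trailing slash."""
    decoded = unquote(encoded)   # turns %3A into ":" and %2F into "/"
    return decoded.rstrip("/")   # assumption: example 2 wants no trailing "/"


# The three sample entries from the log
samples = [
    "https%3A%2F%2Fwww.btv.com%2Fnews%2Ffinland%2Fartikel%2F5174938%2Fzwemmer-zoekactie-julianadorp-kinderen-gered",
    "https%3A%2F%2Fwww.weather.com%2F",
    "https%3A%2F%2Fwww.weather.com%2Ffinland%2Fneerslag%2Fweather%2F3uurs",
]

for entry in samples:
    print(clean_url(entry))
```

In PySpark the same function could presumably be wrapped with `pyspark.sql.functions.udf` and applied to the column. Spark 3.5 and later also ship a built-in `url_decode` in `pyspark.sql.functions`, which would avoid the UDF where that version is available.
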
Thanks in advance
 
    