I need to extract the integers from the URLs in the "Page URL" column and put those extracted integers into a new column with PySpark.
Here is my code:
def url_val(raw_url):
    params = raw_url.split('.com/')
    params = params[1].split('/')
    print(params)
    print("first_scenario")
    url_int = ''.join(x for x in raw_url if x.isdigit())
    return int(url_int)
url_val('https://www.crfashionbook.com/beauty/g28326016/crs-beauty-skincare-product-of-the-day')
The output is 28326016, which is exactly what I want. Now I need to apply this to every URL in the "Page URL" column and store the extracted integers in a new column. How would I do that? I have tried the following:
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType
url_udf = udf(lambda x: url_val(x), IntegerType())
final_url_df = spark_df_url.filter(url_udf("Page URL"))
That raised a Py4JJavaError.
I have also tried:
(
    spark_df_url.select('Page URL',
              url_udf('Page URL').alias('new_column'))
    .show()
) 
That gave me an error as well.
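For reference, here is a minimal sketch of the kind of thing I am after. It rests on two assumptions I have not confirmed: that the errors come from rows where the URL is null or contains no digits (in which case `int('')` raises a ValueError inside the UDF), and that only the first run of digits after `.com/` matters. That second point differs from `url_val`, which joins every digit in the URL. The names `extract_url_id`, `add_url_id_column`, and the output column `url_id` are hypothetical.

```python
import re


def extract_url_id(raw_url):
    """Hypothetical helper: first run of digits after '.com/', or None."""
    # Null cells would otherwise crash the UDF, so pass them through as None.
    if raw_url is None:
        return None
    # Only look at the part after '.com/', mirroring the split in url_val.
    _, sep, path = raw_url.partition('.com/')
    if not sep:
        return None
    match = re.search(r'\d+', path)
    # Return None instead of calling int('') when no digits are present.
    return int(match.group()) if match else None


def add_url_id_column(df, source_col='Page URL', target_col='url_id'):
    """Hypothetical wrapper: append the extracted ID as a new column."""
    # PySpark is imported here so the helper above can be tried without Spark.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import LongType

    # LongType leaves room for IDs that do not fit in a 32-bit IntegerType.
    url_id_udf = udf(extract_url_id, LongType())
    # withColumn appends a column; filter() expects a boolean condition instead.
    return df.withColumn(target_col, url_id_udf(col(source_col)))
```

With this, the call would presumably look like `final_url_df = add_url_id_column(spark_df_url)`, followed by `final_url_df.show()` to check the new column.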
