Timeout error: Error with 400 StatusCode: "requirement failed: Session isn't active."

Question

I'm using Zeppelin v0.7.3 notebook to run Pyspark scripts. In one paragraph, I am running script to write data from dataframe to a parquet file in a Blob folder. File is partitioned per country. Number of rows of dataframe is 99,452,829. When the script reaches 1 hour, an error is encountered -

Error with 400 StatusCode: "requirement failed: Session isn't active.

My default interpreter for the notebook is jdbc. I have read about timeoutlifecyclemanager and added in the interpreter setting zeppelin.interpreter.lifecyclemanager.timeout.threshold and set it to 7200000 but still encountered the error after it reaches 1 hour runtime at 33% processing completion.

I checked the Blob folder after the 1 hr timeout and parquet files were successfully written to Blob which are indeed partitioned per country.

The script I am running to write DF to parquet Blob is below:

trdpn_cntry_fct_denom_df.write.format("parquet").partitionBy("CNTRY_ID").mode("overwrite").save("wasbs://tradepanelpoc@blobasbackupx2066561.blob.core.windows.net/cbls/hdi/trdpn_cntry_fct_denom_df.parquet")

Is this Zeppelin timeout issue? How can it be extended for more than 1 hour runtime? Thanks for the help.

score 1 · Answer 1 · answered Dec 23 '19 at 18:56

From This stack overflow question's answer which worked for me

Judging by the output, if your application is not finishing with a FAILED status, that sounds like a Livy timeout error: your application is likely taking longer than the defined timeout for a Livy session (which defaults to 1h), so even despite the Spark app succeeds your notebook will receive this error if the app takes longer than the Livy session's timeout.

If that's the case, here's how to address it:

1. edit the /etc/livy/conf/livy.conf file (in the cluster's master node)
2. set the livy.server.session.timeout to a higher value, like 8h (or larger, depending on your app)
3. restart Livy to update the setting: sudo restart livy-server in the cluster's master
4. test your code again

score 0 · Answer 2 · answered Nov 14 '18 at 18:48

0

The timeout lifecycle manager is available since version 0.8.

Seems there is problem with pyspark. Try this solution Pyspark socket timeout exception after application running for a while

answered Nov 14 '18 at 18:48

Max Belousov

353
4
13

Timeout error: Error with 400 StatusCode: "requirement failed: Session isn't active."

2 Answers2

Linked