I would like to use Spyder with pyspark (spark-2.1.1), but I cannot get past a rather frustrating Java error. I launch Spyder from the command line in Windows 10 after activating a conda environment (the Python version is 3.5.3). This is my code:
import pyspark
sc = pyspark.SparkContext("local")
file = sc.textFile("C:/test.log")
words = file.flatMap(lambda line : line.split(" "))
words.count()
When I try to define sc, I get the following error:
  File "D:\spark-2.1.1-bin-hadoop2.7\python\pyspark\java_gateway.py", line 95, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
For the sake of completeness:
- If I run pyspark from the command line after activating the conda environment, it works and correctly performs the word-count task.
- If I launch the Spyder desktop app from the Start Menu in Windows 10, everything works (but I think I cannot load the right Python modules from my conda environment in this case).
- The related environment variables seem to be OK:
  echo %SPARK_HOME%
  D:\spark-2.1.1-bin-hadoop2.7
  echo %JAVA_HOME%
  C:\Java\jdk1.8.0_121
  echo %PYTHONPATH%
  D:\spark-2.1.1-bin-hadoop2.7\python;D:\spark-2.1.1-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip;D:\spark-2.1.1-bin-hadoop2.7\python\lib;C:\Users\user\Anaconda3
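Since this error generally means the JVM launched by launch_gateway died or never started (often because the interpreter running inside Spyder does not see the same environment as the activated shell), here is a minimal diagnostic sketch I can run in the Spyder console; it only assumes the three variable names listed above:

import os

# Print the Spark-related variables as the interpreter inside Spyder sees them,
# to compare against the values echoed in the activated shell.
for name in ("SPARK_HOME", "JAVA_HOME", "PYTHONPATH"):
    print(name, "=", os.environ.get(name, "<not set>"))

If any of these comes back as <not set>, Spyder's console is not inheriting the activated environment.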
I have already tried the solutions proposed here, but nothing worked for me. Any suggestions are greatly appreciated!