I've been struggling to get Spark running on my Windows 10 machine, so far without success. I merely want to try out Spark and follow some tutorials locally; I don't currently have access to a cluster to connect to. To install Spark, I completed the following steps, based on this tutorial:
- I installed the Java JDK and placed it in C:\jdk. The folder has bin, conf, include, jmods, legal, and lib folders inside.
- I installed the Java Runtime Environment and placed it in C:\jre. This one has bin, legal, and lib folders inside.
- I downloaded this folder and placed winutils.exe into C:\winutils\bin.
- I created a HADOOP_HOME user environment variable and set it to C:\winutils.
- I opened the Anaconda Prompt and installed PySpark into my base environment with conda install pyspark.
- Upon successful installation, I opened a new prompt and typed pyspark to verify the installation.
This should give the Spark welcome screen. Instead, I got the following long error message:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/12/05 12:22:47 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/12/05 12:22:47 WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext should be running in this JVM (see SPARK-2243). The other SparkContext was created at:
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
py4j.Gateway.invoke(Gateway.java:238)
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
py4j.ClientServerConnection.run(ClientServerConnection.java:106)
java.base/java.lang.Thread.run(Thread.java:833)
C:\Spark\spark-3.2.0-bin-hadoop3.2\python\pyspark\shell.py:42: UserWarning: Failed to initialize Spark session.
  warnings.warn("Failed to initialize Spark session.")
Traceback (most recent call last):
  File "C:\Spark\spark-3.2.0-bin-hadoop3.2\python\pyspark\shell.py", line 38, in <module>
    spark = SparkSession._create_shell_session()  # type: ignore
  File "C:\Spark\spark-3.2.0-bin-hadoop3.2\python\pyspark\sql\session.py", line 553, in _create_shell_session
    return SparkSession.builder.getOrCreate()
  File "C:\Spark\spark-3.2.0-bin-hadoop3.2\python\pyspark\sql\session.py", line 228, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "C:\Spark\spark-3.2.0-bin-hadoop3.2\python\pyspark\context.py", line 392, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "C:\Spark\spark-3.2.0-bin-hadoop3.2\python\pyspark\context.py", line 146, in __init__
    self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
  File "C:\Spark\spark-3.2.0-bin-hadoop3.2\python\pyspark\context.py", line 209, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "C:\Spark\spark-3.2.0-bin-hadoop3.2\python\pyspark\context.py", line 329, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "C:\Users\lazarea\Anaconda3\lib\site-packages\py4j\java_gateway.py", line 1573, in __call__
    return_value = get_return_value(
  File "C:\Users\lazarea\Anaconda3\lib\site-packages\py4j\protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.storage.StorageUtils$
        at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
        at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
        at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
        at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
        at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
        at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:460)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.base/java.lang.Thread.run(Thread.java:833)
I looked around on Stack Overflow for similar issues and came across this question, which has a similar error message. However, the solution provided there, setting the SPARK_LOCAL_IP user environment variable to localhost, didn't solve the issue: the same error message persists when I type pyspark in the Anaconda Prompt.
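For reference, here is a quick sanity check I can run from Python to see how the environment looks from that side (just a diagnostic sketch; HADOOP_HOME and SPARK_LOCAL_IP are the variables I set as described above, and SPARK_HOME is included only because the traceback points at C:\Spark\spark-3.2.0-bin-hadoop3.2):

import os
from pathlib import Path

# Print the environment variables relevant to the setup described above
# (None means the variable is not set for this session)
for var in ("HADOOP_HOME", "SPARK_LOCAL_IP", "SPARK_HOME"):
    print(var, "=", os.environ.get(var))

# Confirm that winutils.exe is actually in the location HADOOP_HOME points to
print("winutils.exe exists:", Path(r"C:\winutils\bin\winutils.exe").exists())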
Note #1, this might be relevant: when I type pyspark in the command line, no output is produced at all; instead, Windows opens the Microsoft Store by default.
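To make Note #1 more concrete, a small check like the one below (purely diagnostic, using shutil.which) would show which executables, if any, the command line actually resolves these commands to:

import shutil

# Show which executable each command resolves to on PATH;
# None means the command is not found on PATH at all
for cmd in ("pyspark", "spark-shell", "python"):
    print(cmd, "->", shutil.which(cmd))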
Note #2: I tried running the code directly in Python to see whether I would get any more hints from that side. I ran the following snippet:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('sampleApp').getOrCreate()
which returned a similar error message to the one above, with some additional, potentially useful information:
An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$
(in unnamed module @0x776b83cc) cannot access class sun.nio.ch.DirectBuffer
(in module java.base) because module java.base does not export sun.nio.ch
to unnamed module @0x776b83cc
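Since this error message mentions module java.base, here is a small, purely diagnostic snippet that prints which Java installation and version the session picks up, in case that turns out to be relevant (it assumes java is on PATH, otherwise the subprocess call fails):

import os
import subprocess

# JAVA_HOME as seen by the current session (None means it is not set)
print("JAVA_HOME =", os.environ.get("JAVA_HOME"))

# 'java -version' writes its output to stderr, so print that
result = subprocess.run(["java", "-version"], capture_output=True, text=True)
print(result.stderr.strip())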
Note #3: when I open the command line and type spark-shell, I get the following error:
java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x3c947bc5) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x3c947bc5
  at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:213)
  at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
  at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
  at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
  at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
  at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
  at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:460)
  at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2690)
  at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:949)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:943)
  at org.apache.spark.repl.Main$.createSparkSession(Main.scala:106)
  ... 55 elided
<console>:14: error: not found: value spark
       import spark.implicits._
              ^
<console>:14: error: not found: value spark
       import spark.sql
              ^
Please help me get Spark up and running; at this point I don't understand what I might be missing.