I'm using GraphX on Spark for an experiment, and the current step is to get a subgraph of a generated graph. I've checked that the original graph was generated successfully: not only does the lazy lineage evaluate fine, but graph.vertices.first() also returns the correct result. My subgraph code is:
val reg = "(\\d*)11".r
val graphUSA_subgraph = graphUSA.subgraph(
  vpred = (id, user) => {
    id.toString() match {
      case reg(x) => true
      case _      => false
    }
  }
)
graphUSA_subgraph.vertices.first()
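For reference, the predicate logic itself can be checked in plain Scala, without Spark:

```scala
// A Regex used as a match pattern is anchored: reg(x) succeeds only when
// the whole string matches "(\\d*)11", i.e. when the id ends in "11".
val reg = "(\\d*)11".r

def check(s: String): Boolean = s match {
  case reg(_) => true
  case _      => false
}
```

So check("4511") is true while check("1145") and check("4500") are false, which matches the intent described below.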
I meant to get a subgraph containing only the nodes whose id ends with "11". I've checked the Boolean block in vpred = (id, user) => Boolean and the logic is correct. What confuses me is that when I ran the code in the Spark shell it raised an error, with the following log:
Exception in task * in stage *...
java.io.InvalidClassException:...
unable to create instance
at java.io.ObjectInputStream. ...
...
Caused by: org.apache.spark.SparkException: Only one SparkContext may be running in this JVM ... The currently running SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:123)
The error is not caused by Graph.subgraph() itself, because everything went fine when I ran a simpler version:
val graph_subgraph_1 = graph.subgraph(
  vpred = (id, user) => id.toString.endsWith("00")
)
graph_subgraph_1.vertices.first()
Everything went fine.
And then I tried another version which doesn't refer to the reg val defined outside the closure:
val graphUSA_subgraph_0 = graphUSA.subgraph(
  vpred = (id, user) => {
    id.toString().drop(id.toString().length() - 2) match {
      case "11" => true
      case _    => false
    }
  }
)
graphUSA_subgraph_0.vertices.first()
Everything went fine too.
I'm wondering at which step a new SparkContext is implicitly created in the pipeline. It seems quite possible that referring to a val (reg) defined outside the function has caused it.
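If that is the cause, a workaround worth trying (an untested sketch; VertexPredicates is a name I made up) would be to keep the regex out of the shell's enclosing wrapper object, either by declaring it inside the lambda body or in a small serializable object:

```scala
// Hypothetical workaround: the regex lives in a standalone serializable
// object, so serializing the vpred closure should not need to pull in
// the spark-shell wrapper that holds the SparkContext reference.
object VertexPredicates extends Serializable {
  val reg = "(\\d*)11".r

  def endsIn11(id: Long): Boolean = id.toString match {
    case reg(_) => true
    case _      => false
  }
}

// In the shell this would then be used as (untested):
// val graphUSA_subgraph = graphUSA.subgraph(
//   vpred = (id, user) => VertexPredicates.endsIn11(id)
// )
```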
I've been struggling with this block for quite some time, and would be grateful if anyone could shed some light on it. Thanks in advance!