Hi, I'm trying to join two DataFrames in Spark, and I'm getting the following error:
org.apache.spark.sql.AnalysisException: Reference 'Adapazari' is ambiguous, 
could be: Adapazari#100064, Adapazari#100065.;
According to several sources, this error can occur when you join two DataFrames that both have a column with the same name (1, 2, 3). However, in my case that is not the source of the error, because (1) all of my columns have different names, and (2) the reference indicated in the error is a value contained within the join column, not a column name.
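For comparison, here's a minimal sketch of the situation those sources describe (assuming a SparkSession with its implicits imported; the DataFrame names and the shared id column are invented for illustration):

val left = Seq((1, "a")).toDF("id", "x")
val right = Seq((2, "b")).toDF("id", "y")
// left.join(right, $"id" === $"id")            // fails: Reference 'id' is ambiguous
left.join(right, left("id") === right("id"))    // qualifying each side by its DataFrame resolves it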
My code:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
val spark = SparkSession
  .builder()
  .master("local")
  .appName("Spark SQL basic example")
  .getOrCreate()
import spark.implicits._
val people = spark.read.json("/path/to/people.jsonl")
  .select($"city", $"gender")
  .groupBy($"city")
  .pivot("gender")
  .agg(count("*").alias("total"))
  .drop("0")
  .withColumnRenamed("1", "female")
  .withColumnRenamed("2", "male")
  .na.fill(0)
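// (As I understand it, pivot with a single aggregation names the output
// columns after the pivot values themselves, so before the drop/rename
// the columns here are city, "0", "1", "2", and afterwards just
// city, female, male.)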
val cities = spark.read.json("/path/to/cities.jsonl")
  .select($"name", $"longitude", $"latitude")
cities.join(people, $"name" === $"city", "inner")
  .count()
Everything works great until I hit the join line, and then I get the aforementioned error.
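For reference, here is the shape I expect each side to have just before the join (the column types are my assumption, based on count producing longs and the JSON coordinates being read as doubles):

people.printSchema()
// root
//  |-- city: string
//  |-- female: long
//  |-- male: long

cities.printSchema()
// root
//  |-- name: string
//  |-- longitude: double
//  |-- latitude: double

So the two sides share no column names, which is why the usual explanation doesn't seem to fit.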
The relevant lines in build.sbt are:
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.10" % "2.1.0",
  "org.apache.spark" % "spark-sql_2.10" % "2.1.0",
  "com.databricks" % "spark-csv_2.10" % "1.5.0",
  "org.apache.spark" % "spark-mllib_2.10" % "2.1.0"
)