// assumes a SparkSession named spark is in scope and spark.implicits._ is imported (needed for .toDS)
case class Foo1(codes: Seq[String], name: String)
case class Foo2(code: String, description: String)

val ds1 = Seq(
  Foo1(Seq("A"), "foo1"),
  Foo1(Seq("A", "B"), "foo2"),
  Foo1(Seq("B", "C", "D"), "foo3"),
  Foo1(Seq("C"), "foo4"),
  Foo1(Seq("C", "D"), "foo5")
).toDS

val ds2 = Seq(
  Foo2("A", "product A"),
  Foo2("B", "product B"),
  Foo2("C", "product C"),
  Foo2("D", "product D"),
  Foo2("E", "product E")
).toDS
val j = ds1.join(ds2, ds2("code") isin (ds1("codes")))
Hopefully this Scala code fragment makes it clear what I'm trying to accomplish. Our data is structured so that one dataset has a column containing an array of values, and I want to join the values within that array to another dataset. For example, Seq("A", "B") in ds1 should join with "A" and "B" in ds2.
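To make the expected result concrete, for the "foo2" row I would want something like the following joined rows (the column layout here is just illustrative):

(Seq("A", "B"), "foo2", "A", "product A")
(Seq("A", "B"), "foo2", "B", "product B")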
The "isin" operator on Column seems to be exactly what I want, and this builds and runs, but when I run it I get the following error:
org.apache.spark.sql.AnalysisException: cannot resolve '(code IN (codes))' due to data type mismatch: Arguments must be same type;;
Reading further, I see that isin() expects varargs ("splatted" args) and seems more suited to a filter(). So my question is: is this the intended use of this operator, or is there some other way to perform this type of join?
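For reference, a workaround along these lines should be possible by exploding the array into one row per code and then doing an ordinary equi-join (a sketch only; I'd prefer a direct join on the array column if one exists):

import org.apache.spark.sql.functions.explode

// one row per element of "codes", with the element in a new "code" column
val exploded = ds1.withColumn("code", explode($"codes"))
// plain equi-join on the shared "code" column
val j2 = exploded.join(ds2, Seq("code"))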