I'm implementing a Spark (1.5.2) SQL RelationProvider for a custom data source (properties files).
Can someone please explain how the automatic schema inference algorithm should be implemented?
In general, you need to create a StructType that represents your schema. A StructType contains an Array[StructField], where each element of the array corresponds to a column in your schema. Each StructField can hold any supported DataType -- including another StructType for nested schemas.
Creating a schema can be as simple as:
import org.apache.spark.sql.types._

val schema = StructType(Array(
  StructField("col1", StringType),
  StructField("col2", LongType)
))
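Since a StructField's DataType can itself be a StructType, nesting is just a matter of composition. For example (the field names here are purely illustrative):

import org.apache.spark.sql.types._

// "address" is a nested record: its DataType is itself a StructType.
val nestedSchema = StructType(Array(
  StructField("name", StringType),
  StructField("address", StructType(Array(
    StructField("street", StringType),
    StructField("zip", IntegerType)
  )))
))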
If you want to generate a schema from a complex dataset -- one that includes nested StructTypes -- then you will most likely need to write a recursive function. A good example of such a function can be found in the spark-avro integration library: its toSqlType function takes an Avro schema and converts it into a Spark StructType.
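As a rough illustration only (this is a minimal sketch, not the spark-avro code -- inferType and the Map[String, Any] input are hypothetical, assuming your properties source parses into key/value pairs), such a recursive function might look like:

import org.apache.spark.sql.types._
import scala.util.Try

// Minimal sketch: map a sample value to a Spark DataType,
// recursing into Maps to build nested StructTypes.
def inferType(value: Any): DataType = value match {
  case _: Int | _: Long                       => LongType  // widen Int to Long
  case _: Double                              => DoubleType
  case _: Boolean                             => BooleanType
  case m: Map[_, _]                           => // nested record -> nested StructType
    StructType(m.map { case (k, v) => StructField(k.toString, inferType(v)) }.toArray)
  case s: String if Try(s.toLong).isSuccess   => LongType   // numeric-looking strings
  case s: String if Try(s.toDouble).isSuccess => DoubleType
  case _                                      => StringType // fall back to string
}

// Hypothetical usage for a parsed properties file:
val props: Map[String, Any] = Map("name" -> "spark", "port" -> "8080")
val schema = inferType(props).asInstanceOf[StructType]

Real inference code usually also reconciles conflicting types across multiple sample records (e.g. widening to the most general type), which this sketch omits for brevity.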