I would like to use gapply according to https://spark.apache.org/docs/latest/sparkr.html#gapply
The problem is I am returning a list of 2 dataframes.
return(list(df1, df2))
How do I declare the output schema in this case?
You cannot use a function that returns an arbitrary list. Per the gapply documentation (emphasis mine):
The function func takes as argument a key - grouping columns and a data frame - a local R
data.frame. The output of func is a local R data.frame.
You might be able to make it work by treating each data.frame as a single Row whose type is equivalent to something like struct<col1:array<typeofcol1>, col2:array<typeofcol2>, ..., coln:array<typeofcoln>>, but only as long as both output data.frames have an identical schema.
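If both data.frames really do share the same columns, a simpler variation (not the struct/array encoding described above) is to stack them into one data.frame with a tag column, so that func returns a single local data.frame and the schema stays flat. The sketch below is only an illustration under assumptions: the function name combine_outputs, the columns group, part and value, and the contents of df1 and df2 are all hypothetical placeholders, not taken from the question.

```r
# Hypothetical per-group function. gapply passes the grouping key and a
# local R data.frame holding the rows of that group.
combine_outputs <- function(key, x) {
  # Placeholder results that would otherwise be returned as list(df1, df2).
  df1 <- data.frame(value = mean(x$value))
  df2 <- data.frame(value = max(x$value))

  # Tag each result so the two parts can be separated again later.
  df1$part <- "df1"
  df2$part <- "df2"

  # Stack both parts into the single data.frame that gapply expects.
  out <- rbind(df1, df2)
  out$group <- key[[1]]

  # Column order must match the order of the fields in the declared schema.
  out[, c("group", "part", "value")]
}

# Local check on a plain data.frame; no Spark session is needed for this part.
local_result <- combine_outputs(list("a"), data.frame(value = c(1, 2, 3)))
print(local_result)

# With SparkR attached and a SparkDataFrame `sdf` (assumed to have a string
# column `group` and a double column `value`), the call would look like:
# schema <- structType(
#   structField("group", "string"),
#   structField("part", "string"),
#   structField("value", "double")
# )
# result <- gapply(sdf, "group", combine_outputs, schema)
```

After the gapply call, the two logical outputs can be recovered with filter(result, result$part == "df1") and the same for "df2". If the two data.frames have different columns, this approach does not apply directly.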