I have this Spark table:
xydata
y: num 11.00 22.00 33.00 ...
x0: num 1.00 2.00 3.00 ...
x1: num 2.00 3.00 4.00 ...
...
x788: num 2.00 3.00 4.00 ...
And this dataframe in my R environment:
penalty
p: num 1.23 2.34 3.45 ...
Both the table and the dataframe have the same number of rows.
I want to subtract p in penalty from y in xydata, i.e. something like y = y - p.
Is there any way to do this? I know I can use mutate to update y, but mutate only works with columns of the same table.
I'm thinking about combining both tables into a new Spark table:
xydata_new
y: num 11.00 22.00 33.00 ...
x0: num 1.00 2.00 3.00 ...
x1: num 2.00 3.00 4.00 ...
...
x788: num 2.00 3.00 4.00 ...
p: num 1.23 2.34 3.45 ...
so that I can use mutate(y = y - p), but again I cannot find a good way to combine the two tables. I tried dplyr::combine in my other question, but the result was not satisfying.
The data is big, it can reach 40GB and maybe even more in the future, so collect-ing all tables into the R environment to manipulate them there (cbind, then export back as a Spark table with tbl) is not an option.
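To show what I have in mind, here is a rough sketch of the combine-then-mutate approach in sparklyr, done entirely on the Spark side. I'm assuming `sc` is my open Spark connection and `xydata` is already a Spark tbl; `sdf_with_sequential_id` is used here to fabricate a join key, which I believe only lines rows up correctly if neither table has been shuffled:

```r
library(sparklyr)
library(dplyr)

# penalty is small (one column), so copying it into Spark is cheap
# and avoids collect()-ing the 40GB xydata table into R.
penalty_tbl <- copy_to(sc, penalty, "penalty", overwrite = TRUE)

# Give both tables a matching sequential row id to join on.
# Caveat: this assumes Spark has preserved the original row order.
xydata_id  <- sdf_with_sequential_id(xydata, id = "rid")
penalty_id <- sdf_with_sequential_id(penalty_tbl, id = "rid")

# Join on the id, apply y = y - p, then drop the helper columns.
xydata_new <- xydata_id %>%
  inner_join(penalty_id, by = "rid") %>%
  mutate(y = y - p) %>%
  select(-rid, -p)
```

Is something along these lines the right idea, or is there a safer way to align the rows than a fabricated sequential id?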