Similar unanswered question: Row by row processing of a Dask DataFrame
I'm working with dataframes that are millions of rows long, so I'm trying to have all dataframe operations performed in parallel. One such operation I need to convert to Dask is:
    for row in df.itertuples():
        ratio = row.ratio
        tmpratio = row.tmpratio
        tmplabel = row.tmplabel
        if tmpratio > ratio:
            df.loc[row.Index, 'ratio'] = tmpratio
            df.loc[row.Index, 'label'] = tmplabel
What is the appropriate way to set a value by index in Dask, or to conditionally set values in rows? .loc doesn't support item assignment in Dask, and there does not appear to be a set_value, at[], or iat[] in Dask either.
I have attempted to use map_partitions with assign, but I am not seeing any ability to perform conditional assignment at the row-level.
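To make the goal concrete, here is a minimal sketch of the column-wise form the loop above would probably need to take, since each partition passed to map_partitions is a plain pandas DataFrame. It assumes the columns ratio, tmpratio, label and tmplabel all exist already; the function names apply_updates and apply_updates_dask and the npartitions value are hypothetical and only for illustration.

```python
import pandas as pd


def apply_updates(part):
    """Vectorised version of the itertuples loop for one pandas partition.

    Assumes columns 'ratio', 'tmpratio', 'label' and 'tmplabel' exist.
    """
    part = part.copy()  # avoid mutating the partition passed in
    # Rows where the temporary ratio beats the current one.
    replace = part["tmpratio"] > part["ratio"]
    # Series.mask swaps in the second argument wherever `replace` is True.
    part["label"] = part["label"].mask(replace, part["tmplabel"])
    part["ratio"] = part["ratio"].mask(replace, part["tmpratio"])
    return part


def apply_updates_dask(pdf, npartitions=2):
    """Hypothetical helper: run apply_updates on each Dask partition."""
    import dask.dataframe as dd  # imported here so the pandas part works alone

    ddf = dd.from_pandas(pdf, npartitions=npartitions)
    return ddf.map_partitions(apply_updates).compute()
```

Because the comparison is evaluated once per partition before either column is overwritten, the order of the two assignments does not matter, and no row-level indexing is needed.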