
I have a problem with a large object (~400 MB pickled) that I need to use in a UDF.

The pickled object is already on every worker, but I don't know how to load it on each worker outside the UDF, so it currently gets reloaded for every row.

Broadcasting hasn't really helped either: the overhead of loading the object for every task crashes everything in my dev environment.
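For context, the pattern I'm trying to get to is roughly this: a module-level lazy load, so the object is unpickled at most once per executor Python process instead of once per row (this is a minimal sketch; the path and the `predict()` call are placeholders):

```python
import pickle

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

MODEL_PATH = "/data/big_model.pkl"  # placeholder; the pickle is already on every worker

# Module-level cache: lives in the executor's Python process, so the
# object is unpickled at most once per process instead of once per row.
_model = None

def get_model():
    global _model
    if _model is None:
        with open(MODEL_PATH, "rb") as f:
            _model = pickle.load(f)
    return _model

@udf(StringType())
def score(value):
    # After the first call on an executor, get_model() is a cheap cache hit.
    return str(get_model().predict(value))  # predict() is a placeholder method
```

As far as I understand, this relies on Python worker reuse (`spark.python.worker.reuse`, on by default), so the cached object survives across tasks on the same executor.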

asked by mvryan
    [How to run a function on all Spark workers before processing data in PySpark?](https://stackoverflow.com/q/37343437/8371915) and [How can I load data that can't be pickled in each Spark executor?](https://stackoverflow.com/q/35500196/8371915) should be useful for you. – Alper t. Turker Jun 23 '18 at 16:00
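A related fallback along the lines of those links is loading once per partition with `mapPartitions`, which at least amortizes the cost over a whole partition instead of paying it per row (again a sketch; the columns and method are hypothetical):

```python
import pickle

MODEL_PATH = "/data/big_model.pkl"  # same placeholder path as above

def score_partition(rows):
    # One load per partition (i.e. per task), amortized over all its rows.
    with open(MODEL_PATH, "rb") as f:
        model = pickle.load(f)
    for row in rows:
        yield (row.id, model.predict(row.value))  # hypothetical columns/method

scored = df.rdd.mapPartitions(score_partition)
```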

0 Answers