I'm using a targets workflow pipeline. Part of this pipeline monitors a directory of csv files for updates. The directory holds more than 10,000 csv files, and new files are added weekly. I want to identify the newly added files and append them to an existing set of *.rds files. The easy approach would be to re-run the process that creates the 5 subsets of *.rds files each week, but that takes time. The efficient approach would be to identify only the newly added files and simply bind_rows them onto the proper rds file.
I can do this easily enough with typical programming using dir() and setdiff(), where I store a snapshot of csv filepaths from the previous day. But I'm struggling to accomplish this within the targets framework.
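For reference, the plain-R version I have in mind looks roughly like the sketch below. The helper name find_new_csvs and the snapshot file are placeholders made up for illustration, and the demo uses temporary paths rather than my real /csv/ directory. The append step (reading the new files and calling bind_rows onto the right rds file) is left out.

```r
# Hypothetical helper (not part of targets): returns the csv paths that are
# not in the saved snapshot, then overwrites the snapshot with the current
# listing so the next call only reports files added after this one.
find_new_csvs <- function(csv_dir, snapshot_path) {
  current <- dir(csv_dir, pattern = "\\.csv$", full.names = TRUE)
  previous <- if (file.exists(snapshot_path)) {
    readRDS(snapshot_path)
  } else {
    character(0)  # first run: every file counts as new
  }
  new_files <- setdiff(current, previous)
  saveRDS(current, snapshot_path)
  new_files
}

# Demo with throwaway paths (assumed for illustration only)
csv_dir <- tempfile("csv_demo")
dir.create(csv_dir)
snapshot_path <- tempfile(fileext = ".rds")

write.csv(data.frame(x = 1), file.path(csv_dir, "a.csv"), row.names = FALSE)
first_run <- find_new_csvs(csv_dir, snapshot_path)

write.csv(data.frame(x = 2), file.path(csv_dir, "b.csv"), row.names = FALSE)
second_run <- find_new_csvs(csv_dir, snapshot_path)
```

After the second call, only the file added since the first snapshot is reported. That is the behavior I would like to reproduce inside the pipeline.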
Here is an attempt that doesn't seem to work. I think I want to monitor the temporary results in the /_targets directory, but I'm not sure how to go about doing that. Also, the targets documentation recommends against using tar_load inside the target configuration itself.
tar_script({
  list(
    tar_target(csv_directory, "/csv/"),
    tar_target(csv_snapshot, dir(csv_directory)),
    tar_target(
      append_action,
      if (length(setdiff(dir(csv_directory), dir(csv_snapshot))) > 0) {
        ...
      }
    )
  )
})