I have hundreds of CSV files that I want to process in the same way. For simplicity, assume they are all in ./data/01_raw/ (e.g. ./data/01_raw/1.csv, ./data/01_raw/2.csv, etc.). I would rather not give each file its own catalog entry and track them individually when building my pipeline. Is there any way to read all of them in bulk by specifying something in the catalog.yml file?
1 Answer
You are looking for PartitionedDataSet. In your example, the catalog.yml might look like this:
my_partitioned_dataset:
  type: "PartitionedDataSet"
  path: "data/01_raw"
  dataset: "pandas.CSVDataSet"
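With that entry in place, Kedro passes the dataset to a node as a dictionary mapping each partition id (the file name relative to `path`) to a zero-argument load callable. A common pattern is to load every partition and concatenate the results. Below is a minimal sketch of such a node; the function name `concat_partitions` and the demo data are illustrative, and the demo uses plain lambdas in place of Kedro's real loaders so it runs without a Kedro project:

```python
import pandas as pd


def concat_partitions(partitions: dict) -> pd.DataFrame:
    """Combine all CSV partitions into a single DataFrame.

    `partitions` maps partition id (e.g. "1.csv") to a callable
    that loads that partition when invoked.
    """
    frames = [load() for _, load in sorted(partitions.items())]
    return pd.concat(frames, ignore_index=True)


# Standalone demo with fake partitions standing in for Kedro's loaders:
fake_partitions = {
    "1.csv": lambda: pd.DataFrame({"x": [1, 2]}),
    "2.csv": lambda: pd.DataFrame({"x": [3]}),
}
combined = concat_partitions(fake_partitions)
print(len(combined))  # 3 rows total
```

In a pipeline, you would wire `my_partitioned_dataset` as the node's input, and Kedro supplies the dictionary automatically; partitions are loaded lazily, only when their callable is invoked.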