I have a large dataset of daily files located at /some/data/{YYYYMMDD}.parquet (the layout could also be something like /some/data/{YYYY}/{MM}/{YYYYMMDD}.parquet).
I describe the data source in a mycat.yaml file as follows:
sources:
  source_partitioned:
    args:
      engine: pyarrow
      urlpath: "/some/data/*.parquet"
    description: ''
    driver: intake_parquet.source.ParquetSource
I want to be able to read a subset of files (partitions) into memory. If I run `source = intake.open_catalog('mycat.yaml').source_partitioned; print(source.npartitions)` I get 0, probably because the partition information is not yet initialized. After `source.discover()`, `source.npartitions` is updated to 1726, which is exactly the number of individual files on disk.
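For reference, here is the minimal snippet I ran (paths as in the catalog above):

```python
import intake

source = intake.open_catalog('mycat.yaml').source_partitioned
print(source.npartitions)   # prints 0: partitions not yet discovered

source.discover()           # scans the files matched by urlpath
print(source.npartitions)   # now 1726, one partition per file on disk
```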
How would I load data:
- only for a given day (e.g. 20180101)?
- for a period between two days (e.g. between 20170601 and 20190223)? (A sketch of the kind of thing I'm imagining is below.)
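To clarify what I mean, here is a workaround sketch that bypasses intake entirely and builds the file list by hand from the known naming scheme. The `files_for_range` helper is hypothetical, just for illustration; it assumes one file per calendar day under the flat layout and skips days whose file is missing:

```python
import os
from datetime import date, timedelta

import dask.dataframe as dd

def files_for_range(start, end, root="/some/data"):
    """Hypothetical helper: expand a YYYYMMDD range into existing file paths."""
    d = date(int(start[:4]), int(start[4:6]), int(start[6:8]))
    stop = date(int(end[:4]), int(end[4:6]), int(end[6:8]))
    paths = []
    while d <= stop:
        p = os.path.join(root, d.strftime("%Y%m%d") + ".parquet")
        if os.path.exists(p):  # not every day necessarily has a file
            paths.append(p)
        d += timedelta(days=1)
    return paths

# only a given day
ddf_day = dd.read_parquet(files_for_range("20180101", "20180101"), engine="pyarrow")

# a period between two days
ddf_range = dd.read_parquet(files_for_range("20170601", "20190223"), engine="pyarrow")
```

This works, but it ignores the catalog entirely, which is what I'd like to avoid.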
 
If this is described somewhere on the wiki, feel free to point me to the appropriate section.
Note: after thinking about it a little more, I realized this might be related to dask functionality, and my goal can probably be achieved by converting the source to a dask dataframe with the .to_dask method. I'm therefore adding the dask label to this question.
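For example, something along these lines, under the (unverified) assumption that after discover() the partitions line up with the sorted file names, so that a day or a date range maps to a contiguous slice of partition indices (the indices below are placeholders):

```python
import intake

source = intake.open_catalog('mycat.yaml').source_partitioned
source.discover()          # populates the 1726 partitions

ddf = source.to_dask()     # lazy dask dataframe, one partition per file

# assumption (unverified): partitions come back in sorted-filename order,
# so a single day or a contiguous date range maps to partition indices
i, j = 150, 780            # placeholder indices for 20170601 and 20190223

one_day = ddf.partitions[i].compute()        # a single day's file
period = ddf.partitions[i:j + 1].compute()   # files between the two days
```

Is mapping dates to partition indices by hand really the intended approach, or is there a built-in way to select partitions by the values they correspond to?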