I have a big pipeline that takes a few hours to run. A small part of it needs to run quite often. How do I run that part without triggering the entire pipeline?
2 Answers
There are multiple ways to specify which nodes or parts of your pipeline to run.
Use kedro run parameters like --to-nodes / --from-nodes / --node to explicitly define what needs to be run.
In kedro>=0.15.2 you can define multiple pipelines, and then run only one of them with kedro run --pipeline <name>. If no --pipeline parameter is specified, the default pipeline is run. The default pipeline might combine several other pipelines. More information about using modular pipelines: https://kedro.readthedocs.io/en/latest/04_user_guide/06_pipelines.html#modular-pipelines
Use tags. Tag a small portion of your pipeline with something like "small", and then do kedro run --tag small. Read more here: https://kedro.readthedocs.io/en/latest/04_user_guide/05_nodes.html#tagging-nodes
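If the small part has to run often, it can help to wrap the command in a script. The following is a minimal sketch, assuming the flags named above are accepted one value at a time by your Kedro version (check kedro run --help); kedro_run_command is a hypothetical helper, not part of Kedro.

```python
import shlex


def kedro_run_command(tag=None, node=None, from_node=None, to_node=None, pipeline=None):
    """Build the argument list for a partial `kedro run` (hypothetical helper)."""
    args = ["kedro", "run"]
    options = [
        ("--pipeline", pipeline),
        ("--tag", tag),
        ("--node", node),
        ("--from-nodes", from_node),
        ("--to-nodes", to_node),
    ]
    # Only add the flags that were actually requested.
    for flag, value in options:
        if value is not None:
            args += [flag, value]
    return args


command = kedro_run_command(tag="small")
print(shlex.join(command))
# To execute it from the project root: subprocess.run(command, check=True)
```

Building the command as a list keeps it safe to hand to subprocess.run without shell quoting issues.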
+1 We use tags most often for this type of work. Thanks for sharing the modular-pipelines link. This is a feature that we have yet to really explore. – Waylon Walker Dec 02 '19 at 04:36
 
I would recommend getting your tags or pipelines set up to run correctly from the CLI, as @idanov suggested. It will be much easier for you in the long run when moving to production. I would also add that you can do quite a bit of ad hoc pipeline trimming and running inside of Python. Here are some examples.
filter by tags
nodes = pipeline.only_nodes_with_tags('cars')
filter by node
nodes = pipeline.only_nodes('b_int_cars')
filter nodes whose name contains a string
query_string = 'cars'
node_names = [
    node.name
    for node in pipeline.nodes
    if query_string in node.name
]
nodes = pipeline.only_nodes(*node_names)
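The substring check can be swapped for shell-style patterns when node names follow a prefix convention. This is a small sketch using only the standard library; match_node_names and the node names in the list are hypothetical, and in a real project the names would come from pipeline.nodes.

```python
from fnmatch import fnmatchcase


def match_node_names(names, pattern):
    """Return the names matching a shell-style pattern, preserving order."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical node names; in a real project use [n.name for n in pipeline.nodes].
names = ["a_raw_cars", "b_int_cars", "b_int_trains", "c_pri_cars"]
selected = match_node_names(names, "b_int_*")
print(selected)
# The result can then be passed on as pipeline.only_nodes(*selected)
```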
only nodes with any of the tags (or)
nodes = pipeline.only_nodes_with_tags('cars', 'trains')
only nodes with all of the tags (and)
raw_nodes = pipeline.only_nodes_with_tags('raw')
car_nodes = pipeline.only_nodes_with_tags('cars')
raw_car_nodes = raw_nodes & car_nodes
or, equivalently, chain the filters
raw_car_nodes = (
    pipeline
    .only_nodes_with_tags('raw')
    .only_nodes_with_tags('cars')
)
add pipelines
car_nodes = pipeline.only_nodes_with_tags('cars')
train_nodes = pipeline.only_nodes_with_tags('trains')
transportation_nodes = car_nodes + train_nodes
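A rough way to picture the "or", "and" and "add" cases is as set operations on node names. The sketch below uses plain Python sets as a stand-in for pipelines; it does not call Kedro, and the node names, tags and the with_any_tag helper are all hypothetical.

```python
# Hypothetical node names and their tags; not taken from a real project.
node_tags = {
    "a_raw_cars": {"raw", "cars"},
    "a_raw_trains": {"raw", "trains"},
    "b_int_cars": {"int", "cars"},
    "b_int_boats": {"int", "boats"},
}


def with_any_tag(*tags):
    """Toy model of a tag filter: keep nodes carrying at least one of the tags."""
    return {name for name, owned in node_tags.items() if owned & set(tags)}


# "or": one filter call with several tags.
cars_or_trains = with_any_tag("cars", "trains")
# "and": intersect two single-tag selections.
raw_and_cars = with_any_tag("raw") & with_any_tag("cars")
# "add": combine two selections; shared nodes appear once.
cars_plus_trains = with_any_tag("cars") | with_any_tag("trains")

print(sorted(cars_or_trains))
print(sorted(raw_and_cars))
```

In this model, adding the two single-tag selections gives the same nodes as one filter call with both tags.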
The above was a snippet from my personal kedro notes.