Say I have a DAG in which one task depends on 4 upstream tasks. Each of those 4 tasks should only be triggered if the same task in the DAG's previous run succeeded, so they all have `depends_on_past=True`. However, the last task of the DAG is a clean-up task that should always run. In the case where the previous day's task failed, the current day's task is not triggered, which means the last task is not triggered either. What would be the way to address this?
1 Answer
Set `trigger_rule=TriggerRule.ALL_DONE` on your cleanup tasks (those tasks that must always run, irrespective of the status of upstream tasks).

And while you seem to have figured it out already, it also makes sense to set `depends_on_past=False` on your cleanup tasks.
– y2k-shubham
- `all_done` only makes sense if the upstream task was actually triggered, right? If a task was triggered then, irrespective of its result, a downstream task with `all_done` will be triggered. However, with `depends_on_past` the task is never triggered at all, so `all_done` does not help. – PiyushC Aug 07 '20 at 13:37
- Understood. **[1]** But if the upstream tasks were not triggered at all, then what is the state of the DAG? (Is it still running?) **[2]** A very untidy solution would be to implement this custom behaviour yourself (rather than using `depends_on_past`): have your task skipped (or failed, if you want) manually by [raising `AirflowSkipException` or `AirflowFailException`](https://airflow.apache.org/docs/stable/concepts.html#exceptions) in case the previous task instance hasn't finished yet (you will have to query the meta-db for that, or leverage SQLAlchemy's `TaskInstance` model). – y2k-shubham Aug 07 '20 at 14:29
- [1] Yes, it stays in the running state. [2] Yeah, I can do that manually; there are some other ways too. But it seemed weird for Airflow to have this feature yet not support this scenario. – PiyushC Aug 07 '20 at 15:14
- But if the DAG stays in the running state, then the cleanup task will eventually run anyway, right? I think what you are trying to do here (run the cleanup task as soon as some upstream task gets stuck because of `depends_on_past`) goes against the principle of Airflow's task dependencies, which is why I don't see any easy way to achieve it. – y2k-shubham Aug 07 '20 at 15:58
- I was expecting that, because the previous run's task failed, the current run's task would go into a state that could be handled by some trigger rule, so that the downstream task could proceed. But yeah, I can understand why something like this would conflict with the intended behaviour of `depends_on_past` (you fix the first failed task, and then all the subsequent tasks start automatically). I was still hoping this scenario could be handled. – PiyushC Aug 07 '20 at 16:09
- @PiyushC Have you figured out how to solve this? – Bhanu Prakash Mar 31 '22 at 06:34
- @BhanuPrakash No, I have not. Though reading it again now, I think the question is ill-formed. Closing it. – PiyushC Jul 07 '22 at 21:05