Note: This is NOT a duplicate of
I have to trigger certain tasks on remote systems from my Airflow DAG. The straightforward way to achieve this is SSHHook.
The problem is that the remote system is an EMR cluster which is itself created at runtime (by an upstream task) using EmrCreateJobFlowOperator. So while I can get hold of the job_flow_id of the launched EMR cluster (using XCom), what I need is an ssh_conn_id to be passed to each downstream task.
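For context, the relevant part of my DAG looks roughly like this (a trimmed sketch; the dag id, task ids like `create_emr_cluster`, and the `JOB_FLOW_OVERRIDES` dict are placeholders for my real config):

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.emr_create_job_flow_operator import EmrCreateJobFlowOperator
from airflow.operators.python_operator import PythonOperator

# placeholder for my actual cluster configuration
JOB_FLOW_OVERRIDES = {"Name": "my-transient-cluster"}

dag = DAG("emr_ssh_example", start_date=datetime(2018, 1, 1), schedule_interval=None)

create_emr_cluster = EmrCreateJobFlowOperator(
    task_id="create_emr_cluster",
    aws_conn_id="aws_default",
    emr_conn_id="emr_default",
    job_flow_overrides=JOB_FLOW_OVERRIDES,
    dag=dag,
)

def use_job_flow_id(**context):
    # the operator pushes the JobFlowId as its return value, so it can be pulled via XCom
    job_flow_id = context["ti"].xcom_pull(task_ids="create_emr_cluster")
    print("launched cluster: %s" % job_flow_id)
    # ...but a downstream SSHOperator still wants an ssh_conn_id, not a job_flow_id

use_cluster = PythonOperator(
    task_id="use_job_flow_id",
    python_callable=use_job_flow_id,
    provide_context=True,
    dag=dag,
)

create_emr_cluster >> use_cluster
```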
Looking at the docs and code, it is evident that Airflow tries to look up this connection (by conn_id) in the db and in environment variables, so now the problem boils down to being able to set either of these two things at runtime (from within an operator).
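To make it concrete, these are the two lookup paths I mean (the conn_id `emr_ssh` and the host are made up; as I understand it, the environment variable has to follow the `AIRFLOW_CONN_<CONN_ID>` naming convention and hold a connection URI):

```python
# Path 1: an environment variable named AIRFLOW_CONN_<CONN_ID> holding a connection URI,
# e.g. (in the environment of the worker that runs the task):
#   export AIRFLOW_CONN_EMR_SSH='ssh://hadoop@ec2-xx-xx-xx-xx.compute.amazonaws.com'
#
# Path 2: a row in the `connection` table of Airflow's metadata db
# (what the web UI or `airflow connections --add` would create).

from airflow.hooks.base_hook import BaseHook

# SSHHook (and hence SSHOperator) ends up here: the env var is checked first,
# then the db; it raises if "emr_ssh" exists in neither, which is exactly the
# case for a cluster that doesn't exist until the upstream task has run.
conn = BaseHook.get_connection("emr_ssh")
print(conn.host, conn.login)
```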
This seems like a rather common problem, because if it isn't achievable then the utility of EmrCreateJobFlowOperator would be severely hampered; yet I haven't come across any examples demonstrating it.
- Is it possible to create (and also destroy) either of these from within an Airflow operator?
  - a Connection (persisted in Airflow's db)? A rough sketch of what I have in mind follows this list.
  - an Environment Variable (which would need to be accessible to all downstream tasks, and not just the current task as told here)?
- If not, what are my options?
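For the first bullet (a Connection persisted in the db), this untested sketch is what I have in mind; I don't know whether mutating the metadata db from inside a task like this is supported or safe. The conn_id `emr_ssh`, the `hadoop` login, and the use of EmrHook/boto3 to resolve the master node's DNS are all my own assumptions:

```python
from airflow import settings
from airflow.contrib.hooks.emr_hook import EmrHook
from airflow.models import Connection


def register_emr_ssh_conn(**context):
    # job_flow_id pushed by EmrCreateJobFlowOperator via XCom
    job_flow_id = context["ti"].xcom_pull(task_ids="create_emr_cluster")

    # EmrHook.get_conn() returns a boto3 EMR client; use it to resolve the
    # master node's public DNS for the freshly launched cluster
    emr_client = EmrHook(aws_conn_id="aws_default").get_conn()
    cluster = emr_client.describe_cluster(ClusterId=job_flow_id)["Cluster"]
    master_dns = cluster["MasterPublicDnsName"]

    # persist a Connection row in Airflow's metadata db so that downstream
    # tasks can simply reference ssh_conn_id="emr_ssh"
    session = settings.Session()
    session.add(Connection(conn_id="emr_ssh",
                           conn_type="ssh",
                           host=master_dns,
                           login="hadoop",
                           port=22))
    session.commit()
    session.close()
```

A mirror task at the end of the DAG could delete that row again, which is what I mean by "destroy" above.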
I'm on

- Airflow v1.10
- Python 3.6.6
- emr-5.15

(can upgrade if required)