Unify `run_sql` tasks across DAGs #4883
Comments
Can I try this? I've never contributed to this project before.
Hi @soysaucewaso, thank you for your interest in contributing to Openverse! I've assigned this issue to you. If you have any questions, you may leave them here. Please check out our welcome and general setup documentation pages for getting started with setting up your local environment.
It looks as though Airflow doesn't support tasks calling other tasks. Just to clarify, should I change the DAG logic so it first runs the 1st task and then the 2nd? Having run_sql as a helper method abstracted the run_sql call away from the DAG into the task, which keeps the DAG logic less complex.
Then tasks could call run_sql_helper. What do you think?
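The split suggested above could look something like this minimal sketch. The names (`run_sql_helper`, the parameters) are hypothetical, and a stand-in decorator replaces `airflow.decorators.task` so the snippet is self-contained:

```python
# Sketch of the proposed split: the SQL logic lives in a plain helper
# that any task (or other helper) can call, and a thin task wrapper
# exposes it to DAGs. All names here are hypothetical illustrations.

def run_sql_helper(sql_template: str, **params) -> str:
    """Plain function: safe to call from other tasks or helpers."""
    query = sql_template.format(**params)
    # The real implementation would execute `query` via a Postgres hook;
    # this sketch just returns the rendered statement.
    return query


def task(fn):
    """Stand-in for airflow.decorators.task, to keep the sketch runnable."""
    return fn


@task
def run_sql(sql_template: str, **params) -> str:
    """Task wrapper: DAGs call this; tasks call run_sql_helper directly."""
    return run_sql_helper(sql_template, **params)
```

With this shape, DAG files only ever see the `run_sql` task, while tasks like `create_deleted_records` could reuse `run_sql_helper` without triggering a task-from-task call.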
@soysaucewaso in what cases would the task be called from another task? Ideally we'd want to unify all of the cases listed into a single task.
Thanks for the response. Currently the alternative run_sql functions are used by various tasks such as create_deleted_records and delete_records_from_media_table. Here's the create_deleted_records task:
Are you suggesting we delete tasks like create_deleted_records which call run_sql and instead implement this logic in the DAGs? Maybe we should keep these tasks, but move them so they return the kwargs for run_sql and then make the DAGs call both tasks?
@AetherUnbound I forgot to tag you in my previous question.
Is this issue open for contribution? I would like to contribute. |
@mihirahuja1 Hey, feel free to take over. One of the problems I found was that the run_sql implementation in common/sql.py is defined as a task. From my understanding, the Airflow workflow is supposed to be defined declaratively, so Airflow will throw an error when one task calls another directly.
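To illustrate the point above without importing airflow: in a declarative DAG, "calling" a decorated task wires up a dependency and returns a placeholder (similar in spirit to an `XComArg`), it does not execute the function. The stand-ins below only mimic that behavior and are not Airflow's actual internals:

```python
# Toy model of declarative task wiring: invoking a decorated task
# records the call instead of running it. These classes are stand-ins
# for illustration only, not Airflow's real implementation.

class DeferredCall:
    """Placeholder returned instead of a real result."""
    def __init__(self, fn, args, kwargs):
        self.fn, self.args, self.kwargs = fn, args, kwargs


def task(fn):
    """Stand-in decorator: calling the task only records the call."""
    def wire(*args, **kwargs):
        return DeferredCall(fn, args, kwargs)
    return wire


@task
def run_sql(query: str) -> str:
    return f"executed: {query}"


# Code that tries to use run_sql's result at runtime gets the
# placeholder, not a string -- which is why shared logic needs to live
# in a plain helper function instead of inside another task call.
result = run_sql("SELECT 1")
```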
@soysaucewaso I have some questions to ensure I understand the problem correctly.
This would help me understand the best direction for implementing the solution. Thank you!
Hey, I'm not a major contributor on this project so I'm not sure. You could make run_sql_helper take special parameters, while having the run_sql task pass defaults. I last pinged a contributor for this project over a month ago and there's been no response, so I think you can feel free to take whatever action you think would work. |
Description
We have several unique implementations of a common `run_sql` task across our DAGs. This task was pulled out into `common/sql.py` in #4836:

- `openverse/catalog/dags/common/sql.py`, lines 179 to 202 (at 4147944)

We have several DAGs which can now use this `run_sql` function directly, rather than re-implementing their own:

- `delete_records`: `openverse/catalog/dags/database/delete_records/delete_records.py`, lines 21 to 38 (at 9b4f727)
- `batched_update` (this one may require some additional work on either the base function or the call to accommodate the `dry_run` variable): `openverse/catalog/dags/database/batched_update/batched_update.py`, lines 52 to 82 (at cea8050)
- `add_license_url`: `openverse/catalog/dags/maintenance/add_license_url.py`, lines 35 to 54 (at 3e3799a)
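As a rough illustration of the `dry_run` accommodation mentioned for `batched_update`, the shared function could accept a flag that logs the rendered query instead of executing it. The signature and behavior here are assumptions, not the actual `common/sql.py` API:

```python
# Hypothetical sketch of a dry_run flag on a shared run_sql: when set,
# the rendered query is logged rather than executed. Everything here is
# an assumption for illustration, not the real common/sql.py function.
import logging

logger = logging.getLogger(__name__)


def run_sql(sql_template: str, dry_run: bool = False, **params):
    query = sql_template.format(**params)
    if dry_run:
        logger.info("DRY RUN, would execute: %s", query)
        return None
    # The real implementation would execute via a Postgres hook here;
    # this sketch returns the rendered statement instead.
    return query
```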
Additional context
This also came up in the discussion of #4572