DAGs disappearing after some idle time #47294
Comments
The dag processor has an issue running for a long time (#46048). This might be related to the issue where the dag processor exits after some time and no DAGs are parsed, which clears the serialized DAGs table depending on the …
Yes, I have had to restart my dag processor sometimes while running breeze locally.
It turns out that the SocketIO object you get from `sock.makefile` doesn't _actually_ close the socket when you close the IO object. Normally that is fine, since the socket won't be around and will clean up nicely, but due to forking, and because we _actually_ want to close the socket, we need to be a bit more careful about how we do this. It also turns out I misunderstood when `set_inheritable` applies: it is not about forking, but specifically only about exec'ing, so I've removed that setting as we don't need it. Closes #46048, and relates to #47294 (it might close it, it might not, unsure at this point).
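For illustration, a minimal standalone sketch (not Airflow code) of the behaviour described above: closing the file object returned by `socket.makefile()` does not close the underlying socket, so when you really want the descriptor gone (e.g. in a forked child) the socket object itself has to be closed explicitly.

```python
import socket

# Create a connected pair of sockets and wrap one in a file-like object.
a, b = socket.socketpair()
f = a.makefile("rwb")

# Closing the file object only closes the wrapper, not the socket itself.
f.close()
print(a.fileno())   # still a valid descriptor (>= 0): the socket is open

# The socket must be closed explicitly to actually release the descriptor.
a.close()
print(a.fileno())   # -1: now the socket is really closed
b.close()
```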
Fixed by #47304
I am still noticing this issue.
I will add scheduler and DAG processor logs.
#47574 should fix the issue.
@vatsrahul1001 #47574 is merged now. Can you please test the change on the main branch to see if you are still facing dag processor issues? Thanks
@tirkarthi Thanks for the fix. I have verified this and it works well in main now.
Great work @tirkarthi !!
I am able to reproduce this again. I see the below error in the DAG processor:
@ephraimbuddy will look into this tomorrow
This method has been replaced by the `handle_removed_files` and `deactivate_deleted_dags` methods. Its presence now causes issues, as it deactivates DAGs incorrectly. `handle_removed_files` is a better method, more suited to dag bundles, as the file's processor is also terminated. Also removed an unused config variable in the scheduler and config.yml. Closes: apache#47294
We currently use psutil's create_time for the process start time, which gives a different time than the one we use to check process duration (which comes from time.monotonic). This leads to the processor timing out after a while due to the large (false) difference in the recorded times, especially when the laptop is hibernated. Process time should not depend on the system clock, and I guess that's what happened here. Closes: apache#47937, Closes: apache#47294
We currently use psutil's create_time for the process start time, which gives the time the process started according to the system clock. We can use time.time to track when the process started processing the files instead. In breeze, once the laptop hibernates, you would have to restart the dag processor, but this fixes it. Since this does not happen in other deployments, we suspect that this issue is peculiar to breeze. Closes: apache#47937, Closes: apache#47294
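As a rough illustration of the clock mismatch described above (a standalone sketch, not the actual Airflow code): `psutil.Process().create_time()` returns an epoch timestamp from the system clock, while `time.monotonic()` counts from an arbitrary origin, so comparing values taken from the two different clocks does not give a meaningful duration.

```python
import os
import time

import psutil  # third-party: pip install psutil

proc = psutil.Process(os.getpid())

start_wall = proc.create_time()   # seconds since the epoch (system clock)
start_mono = time.monotonic()     # seconds since an arbitrary origin

time.sleep(1)

# Mixing the two clocks yields a meaningless number: the values come from
# clocks with completely different origins.
bogus = time.monotonic() - start_wall
print(f"mixed-clock 'duration': {bogus:.0f}s")

# Sticking to a single clock gives the real elapsed time, and the monotonic
# clock is not affected by changes to the system clock.
real = time.monotonic() - start_mono
print(f"elapsed: {real:.1f}s")
```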
Apache Airflow version
3.0.0beta
If "Other Airflow 2 version" selected, which one?
No response
What happened?
While using Breeze for testing, I noticed that after some idle time, no DAGs appear in the UI or the get_dags endpoint.
After restarting the DAG processor, it works again.
What you think should happen instead?
DAGs should not disappear
How to reproduce
Operating System
linux
Versions of Apache Airflow Providers
No response
Deployment
Astronomer
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
Code of Conduct