DAGs disappearing after some idle time #47294

Open
1 of 2 tasks
vatsrahul1001 opened this issue Mar 3, 2025 · 11 comments · May be fixed by #48004
Labels
affected_version:3.0.0beta (For all 3.0.0 beta releases), area:core, area:DAG-processing, area:UI (Related to UI/UX. For Frontend Developers.), kind:bug (This is a clearly a bug), priority:critical (Showstopper bug that should be patched immediately)

Comments

@vatsrahul1001
Collaborator

vatsrahul1001 commented Mar 3, 2025

Apache Airflow version

3.0.0beta

If "Other Airflow 2 version" selected, which one?

No response

What happened?

While using Breeze for testing, I noticed that after some idle time, no DAGs appear in the UI or in the get_dags endpoint response.
After restarting the DAG processor, it works again.

What you think should happen instead?

DAGs should not disappear

How to reproduce

  1. Start Airflow using breeze start-airflow
  2. You should be able to see DAGs
  3. After some time of inactivity, try checking DAGs in the UI or with the get dags endpoint
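
The check in step 3 can be automated while waiting for the idle period. A minimal sketch, assuming a hypothetical `fetch_dag_count` callable that returns the number of DAGs reported by the get dags endpoint; how it reaches the API (URL, authentication) depends on the deployment and is deliberately left out:

```python
import time
from typing import Callable, Optional


def wait_for_dags_to_disappear(
    fetch_dag_count: Callable[[], int],
    interval_seconds: float = 60.0,
    max_checks: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Poll until the reported DAG count drops to zero.

    Returns the 0-based index of the first check that saw zero DAGs,
    or None if DAGs were still listed after max_checks checks.
    fetch_dag_count is a hypothetical helper, not an Airflow API.
    """
    for check in range(max_checks):
        if fetch_dag_count() == 0:
            return check
        sleep(interval_seconds)
    return None
```

Multiplying the returned index by `interval_seconds` gives a rough estimate of how long the environment was idle before the DAGs vanished.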

Operating System

linux

Versions of Apache Airflow Providers

No response

Deployment

Astronomer

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@vatsrahul1001 vatsrahul1001 added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Mar 3, 2025
@vatsrahul1001 vatsrahul1001 added affected_version:3.0.0beta For all 3.0.0 beta releases area:DAG-processing priority:critical Showstopper bug that should be patched immediately and removed kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Mar 3, 2025
@dosubot dosubot bot added area:UI Related to UI/UX. For Frontend Developers. kind:bug This is a clearly a bug labels Mar 3, 2025
@tirkarthi
Contributor

The dag processor has an issue running for a long time: #46048. This might be related: the dag-processor exits after some time and no dags are parsed, which clears the serialized dags table depending on an airflow.cfg value whose name I forget at the moment.

@phanikumv phanikumv changed the title from "DAGs disappearing after some ideal time." to "DAGs disappearing after some idle time" Mar 4, 2025
@bbovenzi
Contributor

bbovenzi commented Mar 5, 2025

Yes, I have had to restart my dag processor sometimes while running breeze locally

ashb added a commit that referenced this issue Mar 6, 2025
It turns out that SocketIO (the class you get from `sock.makefile`) doesn't
_actually_ close the socket when you close the IO object. Normally that is fine,
as the socket won't be around and will clean up nicely, but due to forking and
us _actually_ wanting to close the socket, we need to be a bit more careful
about how we do this.
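
The behaviour described above can be reproduced with the standard library alone. This is a small sketch, not the code from the commit; the function name and the use of `socket.socketpair` are choices made for the illustration:

```python
import socket


def makefile_close_demo() -> dict:
    """Show that closing a makefile() object leaves the socket itself open."""
    left, right = socket.socketpair()
    try:
        reader = left.makefile("rb")
        reader.close()  # closes only the file object wrapping the socket

        # The underlying socket still has a valid descriptor and still works.
        open_after_file_close = left.fileno() != -1
        left.sendall(b"ping")
        echoed = right.recv(4)

        left.close()  # explicitly closing the socket releases the descriptor
        closed_after_socket_close = left.fileno() == -1
    finally:
        left.close()  # closing twice is harmless
        right.close()
    return {
        "open_after_file_close": open_after_file_close,
        "echoed": echoed,
        "closed_after_socket_close": closed_after_socket_close,
    }
```

In a forked process this matters because a descriptor that is never explicitly closed stays open in that process for as long as it runs.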

It also turns out I misunderstood when `set_inheritable` applies. It is not
about when forking, but specifically only when execing. So I've removed that
setting as we don't need it.
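
The fork-versus-exec distinction can be checked directly. A minimal POSIX-only sketch (it relies on `os.fork`), written for illustration and not taken from the commit: a pipe descriptor that Python reports as non-inheritable is still usable in a forked child, because the inheritable flag only takes effect when a new program is exec'd.

```python
import os


def fork_shares_non_inheritable_fd() -> tuple:
    """Return (inheritable flag of the write end, bytes the parent read)."""
    read_fd, write_fd = os.pipe()
    # Descriptors created by Python are non-inheritable by default.
    inheritable = os.get_inheritable(write_fd)

    pid = os.fork()
    if pid == 0:
        # Child: the descriptor is still open here despite the flag.
        try:
            os.close(read_fd)
            os.write(write_fd, b"hi")
        finally:
            os._exit(0)

    # Parent: close the write end, read what the child wrote, reap the child.
    os.close(write_fd)
    data = os.read(read_fd, 2)
    os.waitpid(pid, 0)
    os.close(read_fd)
    return inheritable, data
```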

Closes #46048, and relates to #47294 (it might close it, it might not, unsure
at this point)
@ashb
Member

ashb commented Mar 6, 2025

Fixed by #47304

@ashb ashb closed this as completed Mar 6, 2025
@vatsrahul1001
Collaborator Author

I am still noticing this issue.

@vatsrahul1001 vatsrahul1001 reopened this Mar 11, 2025
@vatsrahul1001
Collaborator Author

I will add scheduler and Dag processor logs

@tirkarthi
Contributor

#47574 should fix the issue.

@tirkarthi
Contributor

@vatsrahul1001 #47574 is merged now. Can you please test the change in main branch to see if you are still facing dag processor issues? Thanks

@kaxil kaxil added this to the Airflow 3.0.0 milestone Mar 12, 2025
@vatsrahul1001
Collaborator Author

@tirkarthi Thanks for the fix. I have verified this and it works well in main now.

@phanikumv
Contributor

Great work @tirkarthi !!

@vatsrahul1001 vatsrahul1001 reopened this Mar 14, 2025
@vatsrahul1001
Collaborator Author

I am able to reproduce this again. I see the error below in the DAG processor:

[2025-03-14T14:21:02.057+0000] {manager.py:970} ERROR - Processor for DagFileInfo(rel_path=PosixPath('metadata_and_inlets/fetch_extra_info.py'), bundle_name='dags-folder', bundle_path=PosixPath('/files/dags'), bundle_version=None) with PID 14418 started 432 ago killing it.
2025-03-14 14:21:02 [debug ] Workload process exited [supervisor] exit_code=<Negsignal.SIGKILL: -9>
2025-03-14 14:21:02 [info ] Process exited [supervisor] exit_code=<Negsignal.SIGKILL: -9> pid=14418 signal=SIGKILL
[2025-03-14T14:21:07.071+0000] {manager.py:500} INFO - Not time to refresh bundle dags-folder
[2025-03-14T14:21:12.135+0000] {manager.py:500} INFO - Not time to refresh bundle dags-folder

@phanikumv
Contributor

@ephraimbuddy will look into this tomorrow

ephraimbuddy added a commit to astronomer/airflow that referenced this issue Mar 18, 2025
This method has been replaced by the `handle_removed_files` and `deactivate_deleted_dags`
methods. Its presence now causes issues as it deactivates DAGs wrongly. `handle_removed_files`
is a better method, more suited to dag bundles, as the file's processor is also terminated.

Also removed unused config variable in scheduler
Closes: apache#47294
ephraimbuddy added a commit to astronomer/airflow that referenced this issue Mar 18, 2025
This method has been replaced by the `handle_removed_files` and `deactivate_deleted_dags`
methods. Its presence now causes issues as it deactivates DAGs incorrectly. `handle_removed_files`
is a better method, more suited to dag bundles, as the file's processor is also terminated.

Also removed unused config variable in scheduler
Closes: apache#47294
ephraimbuddy added a commit to astronomer/airflow that referenced this issue Mar 18, 2025
This method has been replaced by the `handle_removed_files` and `deactivate_deleted_dags`
methods. Its presence now causes issues as it deactivates DAGs incorrectly. `handle_removed_files`
is a better method, more suited to dag bundles, as the file's processor is also terminated.

Also removed unused config variable in scheduler and config.yml
Closes: apache#47294
ephraimbuddy added a commit to astronomer/airflow that referenced this issue Mar 20, 2025
We currently use psutil's create_time for process start time,
which gives a different time than the one we use to check process
duration (which comes from time.monotonic). This leads to the
processor timing out after a while due to the large (false) difference
in time recorded, especially when the laptop is hibernated.

Process time should not depend on the system clock, and I guess that's what happened here.

Closes: apache#47937, Closes: apache#47294
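
One way the two clocks can drift apart is sketched below with simulated values rather than real timers. It assumes that the wall clock jumps forward after a suspend while the monotonic clock does not count the suspended time, which is platform dependent. `SimulatedClocks`, `elapsed_by_clock`, and all numbers are hypothetical and are not Airflow code:

```python
from dataclasses import dataclass


@dataclass
class SimulatedClocks:
    """Stand-ins for time.time() (wall) and time.monotonic() (monotonic)."""

    wall: float = 1_000_000.0
    monotonic: float = 500.0

    def run(self, seconds: float) -> None:
        # While the machine is awake both clocks advance together.
        self.wall += seconds
        self.monotonic += seconds

    def suspend(self, seconds: float) -> None:
        # Assumption: only the wall clock reflects time spent suspended.
        self.wall += seconds


def elapsed_by_clock(run_before: float, suspended: float, run_after: float) -> dict:
    """Measure the same interval once per clock."""
    clocks = SimulatedClocks()
    wall_start, monotonic_start = clocks.wall, clocks.monotonic
    clocks.run(run_before)
    clocks.suspend(suspended)
    clocks.run(run_after)
    return {
        "wall": clocks.wall - wall_start,
        "monotonic": clocks.monotonic - monotonic_start,
    }
```

With `elapsed_by_clock(1.0, 400.0, 1.0)` the wall-clock measurement is 402 seconds and the monotonic one is 2 seconds, so a timeout check that takes its start time from one clock and its current time from the other can report a duration that never happened.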
ephraimbuddy added a commit to astronomer/airflow that referenced this issue Mar 20, 2025
We currently use psutil's create_time for process start time,
which gives the time the process started using the system clock. We can use
time.time to track when the process started processing the files instead.

In breeze, once the laptop hibernates, you would have to restart the dag
processor, but this fixes it. Since this does not happen in other deployments,
we suspect that this issue is peculiar to breeze.

Closes: apache#47937, Closes: apache#47294
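
The approach described in this commit message, recording the start time when processing begins and checking the duration against the same clock, might look roughly like the sketch below. `ProcessorTimer` and its parameters are hypothetical names for illustration, not the classes used in Airflow:

```python
import time
from typing import Callable


class ProcessorTimer:
    """Track how long a file has been processing, using a single clock."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.time):
        self._clock = clock
        self.timeout = timeout
        # Recorded when processing starts, not read from the OS process table.
        self.start_time = clock()

    def elapsed(self) -> float:
        # Start and "now" come from the same clock, so they are comparable.
        return self._clock() - self.start_time

    def timed_out(self) -> bool:
        return self.elapsed() > self.timeout
```

Passing the clock in as a parameter keeps the timeout logic testable without sleeping and makes it explicit which clock both readings come from.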
7 participants