PFT timeouts #1743

Closed
harrislapiroff opened this issue Sep 18, 2023 · 9 comments

@harrislapiroff
Contributor

harrislapiroff commented Sep 18, 2023

We've seen a number of timeouts in our alerts recently. @maeve-fpf will investigate whether bumping up resources stops those.

[update 11/15] Next steps here are for the web team to investigate the performance of the export endpoint. (Personally, I'd like to see the export endpoint and the API unified so that neither their features nor their performance diverge. Export links are functionally API calls with a CSV output format.)
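
As a rough illustration of the unification idea above (a sketch only, not the project's actual code), one serializer could feed both the API and the export link, with the output format as the only difference. The names `serialize`, `_csv_chunks`, and the `title`/`city` fields are hypothetical and not taken from the codebase. Emitting the CSV row by row is an assumption about what might help with long-running exports, not a confirmed cause of the timeouts.

```python
import csv
import io
import itertools
import json
from typing import Iterable, Iterator

# Hypothetical record shape; the real field names are not given in this thread.
Record = dict[str, str]


def _csv_chunks(records: Iterable[Record], fields: list[str]) -> Iterator[str]:
    """Yield one CSV line at a time: the header first, then one line per record."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    rows = itertools.chain(
        [fields],
        ([record.get(field, "") for field in fields] for record in records),
    )
    for row in rows:
        writer.writerow(row)
        yield buffer.getvalue()
        # Reset the buffer so each chunk holds only the newest line.
        buffer.seek(0)
        buffer.truncate(0)


def serialize(records: Iterable[Record], fmt: str, fields: list[str]) -> Iterator[str]:
    """Return the response body as text chunks for the requested format."""
    if fmt == "csv":
        return _csv_chunks(records, fields)
    if fmt == "json":
        # JSON is built in one piece here; only the CSV path is incremental.
        payload = [{field: record.get(field, "") for field in fields} for record in records]
        return iter([json.dumps(payload)])
    raise ValueError(f"unsupported format: {fmt}")
```

In a Django view, an iterator like this could be handed to `StreamingHttpResponse` so that a large export starts sending data before the whole file has been built.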

@chigby
Contributor

chigby commented Oct 2, 2023

The last time I looked into this it was part of #1573

@chigby
Contributor

chigby commented Oct 2, 2023

Example of a segment of logs leading up to the worker timeout event:

image

(screenshot in case it's easier to see).

@soleilera soleilera added someday and removed someday labels Oct 4, 2023
@soleilera

@maeve-fpf will continue investigation on resource allocation to improve performance. We also created a web team card to timebox a performance investigation on the web app side.

@maeve-fpf
Contributor

We've pushed memory request/limit increases for the Django pods, and will continue to monitor for another 2 weeks then close.

@soleilera

It's been 2 weeks and I believe PFT timeouts are no longer an issue, so I think we can close this, but I defer to @maeve-fpf.

@harrislapiroff
Contributor Author

We're still seeing occasional timeouts, most recently on 11/10. It may be that the next step is for the web team to investigate the performance of specific endpoints. I recall we had a conversation about this at web sync recently. Does anyone remember the conclusion there? (attn @chigby @SaptakS)

@chigby
Contributor

chigby commented Nov 15, 2023

I think we said we were going to investigate the "export" function?

@harrislapiroff
Contributor Author

That sounds right to me. Maybe the next steps here belong to the web team, @soleilera. I'll bounce this to our high-priority list.

@chigby
Contributor

chigby commented Nov 27, 2023

We haven't seen timeouts in several weeks, closing in favor of #1802

@chigby chigby closed this as completed Nov 27, 2023