Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/article checker workers #443

Closed
wants to merge 12 commits into from
Closed

Conversation

Kseymur
Copy link
Contributor

@Kseymur Kseymur commented May 24, 2024

This pull request resolves #425.
It involves three worker scripts designed to process GitHub webhook events, perform web searches, and interact with the OpenAI API for text analysis and fact-checking.
The scripts run on the Cloudflare Workers platform and fit within the time limits for the Paid plan.
Here is an example.

Improvements that can be made:

  • Add full article parsing by links in the RAG part.
  • Add a worker for plagiarism detection.

@Kseymur Kseymur requested a review from evgenydmitriev May 24, 2024 20:04
@evgenydmitriev
Copy link
Contributor

What was the logic for splitting the tool in 3 workers connected with queues? Why not make it one worker? Or use HTTP for communication?

@Kseymur
Copy link
Contributor Author

Kseymur commented May 27, 2024

What was the logic for splitting the tool in 3 workers connected with queues? Why not make it one worker? Or use HTTP for communication?

I tried different options: a single-worker system, a chain of workers interacting via HTTP, and here are the conclusions I came to:

  • This RAG system utilizes multiple requests to the OpenAI API and the search engine API. Because of this, it exceeds the time limits within a single worker (even on the Paid plan).
  • Regarding HTTP communication, the issue again lies in the time limits. It appears that each worker has to wait for a response from the subsequent one. If the subsequent worker does not meet the time limits, the entire system fails.

@evgenydmitriev
Copy link
Contributor

The only time limit you have is CPU time, 30 seconds. Waiting for fetch promises (from other workers or external APIs) doesn't count towards CPU time.

@Kseymur
Copy link
Contributor Author

Kseymur commented May 27, 2024

The only time limit you have is CPU time, 30 seconds. Waiting for fetch promises (from other workers or external APIs) doesn't count towards CPU time.

I thought the same, but based on my experience, I've noticed some practical limitations.
Also, this is what ChatGPT replies to me:
"Based on the information from Cloudflare's documentation and other sources, fetch requests themselves do not count towards the CPU time limit. The CPU time limit for Cloudflare Workers is 30 seconds, and this limit applies to the actual execution time of the code, not the time spent waiting for fetch requests to complete​​​​.

However, while the waiting time for fetch requests is not counted towards the CPU time, the total execution time, including the time spent waiting for these responses, can still lead to practical limitations. If a Worker makes multiple fetch requests sequentially, it may experience delays due to the overall duration required to handle the requests. This can cause the Worker to timeout if the responses are slow or if there are many sequential fetch operations, potentially leading to the Worker exceeding the overall allowed time for request handling​​​​."

I can also make a pr to show you the single-worker code that doesn't complete execution. Maybe I missed smth.

@evgenydmitriev
Copy link
Contributor

We have workers that stay alive for more than a day waiting for different things. If you handle fetch promises correctly, your worker won't time out. Sequential fetch requests sound like a use case for one worker. You usually want to split things into multiple workers if you need extra isolation, parallelization, or caching. Queues introduce another level of complexity and should probably be avoided where possible.

Try asking Perplexity about all this - it's usually more helpful when you are not sure how to correctly formulate your questions. Claude Opus is also more helpful when advising on Cloudflare due to a more recent cutoff date.

@NyanKir
Copy link

NyanKir commented May 30, 2024

@Kseymur Hi, there's an issue with the current implementation due to GitHub's 10-second timeout for webhooks. If we can't process the request within that timeframe, GitHub cancels it, which in turn crashes the worker. This is why a queue is needed here. With a queue, we can pass the task to a second worker and send a successful response to GitHub before the timeout occurs.

@Kseymur
Copy link
Contributor Author

Kseymur commented May 30, 2024

Hi, there's an issue with the current implementation due to GitHub's 10-second timeout for webhooks

Hi! Yes, I have already figured out the issue and I am currently considering the best implementation. Thank you for your help!

@Kseymur Kseymur closed this May 31, 2024
@Kseymur Kseymur deleted the feature/article-checker-workers branch May 31, 2024 19:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate GitHub Actions bot to Cloudflare Workers
3 participants