Missing timeout for the CAPABILITY command #278
Comments
Not responding to
I'm going to leave this ticket open for the first point (above), at least until we have another timeout-related issue or PR to replace it. I would accept a PR to add read timeouts, although I suspect we may need to iterate on the design before it's safe to merge. Since we'll want to iterate on it anyway, a simple naive version could be a good first draft. It might be something to add but leave turned off for 0.5.x and wait until 0.6.x before we enable read timeouts by default.
Thanks @nevans, I can prepare a PoC PR for this issue. It will take approx. 6 weeks until I can start with this task (just for a rough timeline).
@denzelem That would be great! I probably won't be working on it any sooner myself. But I should have an updated |
We have an application that lets users sync emails from different email providers back to our app. Yesterday, Office 365 had some server issues, and we had lots of background jobs running for 30+ minutes, blocking the whole queue. While researching a solution, I stumbled upon this issue. We used log statements to track down where the jobs got stuck, and realized that they got stuck while authenticating:
Given your second point, we added a specific timeout for the authentication part:
Our current problem is that those timeouts are not working. The jobs still run for 30+ minutes, stuck waiting for a response during authentication. Any ideas on how to proceed from here? Thanks in advance for any input!
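For readers following along, a timeout around a single blocking step might look roughly like the sketch below. This is only an illustration, not the code from the comment above (which is not shown here): `with_step_timeout` is a hypothetical helper, and `slow_step` is a stand-in for a blocking call such as authentication. It also does not explain why the timeout in the comment above fails to fire; `Timeout` works by raising an exception in the waiting thread, so it can only interrupt code that is at an interruptible point.

```ruby
require "timeout"

# Hypothetical helper (not part of net-imap): runs the block and raises
# Timeout::Error with a descriptive message if it takes longer than `seconds`.
def with_step_timeout(seconds, step_name)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  raise Timeout::Error, "#{step_name} did not finish within #{seconds}s"
end

# Stand-in for a blocking network call; here it only sleeps.
slow_step = -> { sleep 5; :authenticated }

begin
  with_step_timeout(0.2, "authenticate") { slow_step.call }
rescue Timeout::Error => e
  # A background job could log this and release its worker here.
  puts e.message
end
```

If a wrapper like this never fires, logging just before and after the wrapped call can at least confirm whether the job is blocked inside the block or somewhere else.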
@wnm I don't know why Timeout isn't working there... it should. You could try setting Now that net-imap has a client config (#288), I'm also open to PRs:
@nevans I need to admit that I cannot make a PR for this issue anymore, sorry. Since we have other issues with the IMAP endpoint of Microsoft Exchange, we are trying to use the Microsoft Graph API instead. Just a few examples of the issues, in case someone else reads this issue:
@denzelem No prob. Thanks for documenting the issue! It's good you have some built-in rate limiting. In my experience, rate limiting and exponential backoff are eventually a necessity for any communication with remote servers (especially when you retry after failure). I'm curious about your "mails are send twice". Is this because you are attempting to append to the "Sent" folder after sending via SMTP?

The demographics of my company's customer base have never been enough to warrant adding special-cased API access (except for Google, but Gmail's IMAP is mostly well behaved... compared to the other big email providers). Of course, that's a feedback loop problem: if there aren't enough customers to warrant special treatment, you don't implement the workarounds that would get you more of those customers!

Also, while I do manage some simple background jobs that make quick connections for small tasks, most of my deployed IMAP connections are long-lived and still use EventMachine(!!!). That has some significant pain points of its own, but it does mean that I'm not usually affected much by one connection screwing up any of the others (...except when we've accidentally retried a failed task without rate limiting and backoff and it gets into an infinite loop, thus hogging CPU and spewing GBs of logs 😉).

I long ago just relegated issues like this to the category of "those goofy servers are being goofy again!". 😉 And I have a small list in my head (and sometimes encoded in workarounds) of "yahoo is weird this way, outlook.com is weird that way, MS365 does that wack thing, GoDaddy hosted IMAP is weird the other way, iCloud is crazy for that thing, etc, etc". But my memory is very faulty, and I would like a slightly more methodical classification of which servers are goofy and in what ways they are goofy. I haven't seen a wiki anywhere with these sorts of issues on it... should we start one? (here?)
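As a rough illustration of the "retry with exponential backoff" point in the comment above, here is a minimal sketch. `with_backoff` and all of its parameters are hypothetical names chosen for this example; they are not part of net-imap, Sidekiq, or any other library mentioned in this thread.

```ruby
# Hypothetical helper: retries the block, doubling the delay after each
# failure (capped at max_delay) and adding a little random jitter so that
# many jobs do not retry at exactly the same moment.
def with_backoff(max_attempts: 5, base_delay: 1.0, max_delay: 60.0,
                 sleeper: Kernel.method(:sleep))
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue StandardError
    raise if attempt >= max_attempts # give up and surface the last error
    delay = [base_delay * (2**(attempt - 1)), max_delay].min
    sleeper.call(delay + rand * delay * 0.1)
    retry
  end
end

# Usage with a simulated flaky operation that fails twice, then succeeds.
with_backoff(max_attempts: 4, base_delay: 0.01) do |attempt|
  raise Errno::ECONNRESET if attempt < 3
  puts "succeeded on attempt #{attempt}"
end
```

Capping the number of attempts is what prevents the "infinite loop, hogging CPU and spewing logs" failure mode described above; the `sleeper` parameter is only there so the delay can be replaced in tests.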
I've been slowly working towards moving our per-server workarounds into a basic strategy-pattern set of classes, but it hasn't been a high priority. We can certainly do more in |
@nevans This is an example backtrace for the timeout:
Net::ReadTimeout: Net::ReadTimeout with #<TCPSocket:(closed)> (Net::ReadTimeout)
from net-protocol (0.2.2) lib/net/protocol.rb:229:in `rbuf_fill'
from net-protocol (0.2.2) lib/net/protocol.rb:199:in `readuntil'
from net-protocol (0.2.2) lib/net/protocol.rb:209:in `readline'
from net-smtp (0.4.0) lib/net/smtp.rb:992:in `recv_response'
from net-smtp (0.4.0) lib/net/smtp.rb:954:in `block in data'
from net-smtp (0.4.0) lib/net/smtp.rb:1002:in `critical'
from net-smtp (0.4.0) lib/net/smtp.rb:940:in `data'
from net-smtp (0.4.0) lib/net/smtp.rb:768:in `block in send_message'
from net-smtp (0.4.0) lib/net/smtp.rb:901:in `rcptto_list'
from net-smtp (0.4.0) lib/net/smtp.rb:768:in `send_message'
from mail (2.8.1) lib/mail/network/delivery_methods/smtp_connection.rb:53:in `deliver!'
from mail (2.8.1) lib/mail/network/delivery_methods/smtp.rb:101:in `block in deliver!'
from net-smtp (0.4.0) lib/net/smtp.rb:611:in `start'
from mail (2.8.1) lib/mail/network/delivery_methods/smtp.rb:109:in `start_smtp_session'
from mail (2.8.1) lib/mail/network/delivery_methods/smtp.rb:100:in `deliver!'
from mail (2.8.1) lib/mail/message.rb:2145:in `do_delivery'
from mail (2.8.1) lib/mail/message.rb:253:in `block in deliver'
from actionmailer (7.1.3.2) lib/action_mailer/base.rb:600:in `block in deliver_mail'
It looks to me like this only sends the message and is not related to moving the message to a specific folder, but that is an assumption rather than something I know for certain. The actual exception is raised in https://github.com/ruby/net-protocol/blob/master/lib/net/protocol.rb#L229. For us, those mails may be sent twice (or even more often) because we simply retry on that error (in our case with Sidekiq). I can't say whether the issue lies in our server infrastructure or whether it's just Outlook. I recently watched https://www.youtube.com/watch?v=_MXpC-EoQQk && net-ssh/net-ssh#797 (comment), where I had some nice learnings on how to implement timeouts without e.g. issues due to daylight saving time.
I like this approach, since in many cases (when your own or the remote infrastructure is somewhat unreliable) timeouts give some guarantee about when resources become available again. The underlying issues themselves remain complicated and might not allow any assumption about the state in which the current operation was left.
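One common way to keep a timeout independent of wall-clock jumps (daylight saving time, NTP corrections) is to measure against a monotonic clock. The sketch below assumes that technique; it may or may not match what the linked talk and comment recommend, and `Deadline` is a hypothetical class written for this example.

```ruby
# Hypothetical class: tracks a deadline on the monotonic clock, which is
# not affected when the system's wall-clock time is adjusted.
class Deadline
  def self.now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end

  def initialize(seconds)
    @expires_at = self.class.now + seconds
  end

  # Seconds left before the deadline, never negative.
  def remaining
    [@expires_at - self.class.now, 0.0].max
  end

  def expired?
    remaining.zero?
  end
end

# Usage: share one overall budget across several steps of a job.
deadline = Deadline.new(0.2)
until deadline.expired?
  sleep 0.05 # stand-in for one unit of work
end
puts "budget used up, stopping"
```

`remaining` can be passed as the timeout of each individual blocking call, so the total time of a multi-step operation stays bounded.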
Version:
net-imap (0.4.8)
Ruby:
3.2.2
Description
I'm using net-imap for fetching emails in some background jobs. There are rare cases where the email provider Microsoft Exchange returns no or an unexpected answer for the CAPABILITY command. This blocks my background jobs for a long time, sometimes more than 30 minutes. At some point an Errno::ECONNRESET (Connection reset by peer) exception is thrown, I assume when Microsoft Exchange closes the connection.
Backtrace
Feature request
Would it make sense to make a global timeout, or a timeout for individual actions (e.g. https://github.com/ruby/net-imap/blob/v0.4.8/lib/net/imap.rb#L2689), configurable within this gem? Or do you recommend e.g. wrapping gem code in our own Timeout blocks?
Example
Dropping the line
client.puts "RUBY0001 OK CAPABILITY completed.#{CRLF}"
will block the script indefinitely.
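The original example script is not reproduced here. The following is a standard-library-only sketch of the same situation, under stated assumptions: a fake server that greets the client but never sends the tagged completion line for CAPABILITY, and a raw socket client (not net-imap) that uses a hypothetical `read_line_with_timeout` helper to stop waiting. The greeting text and the hand-written `RUBY0001` tag are illustrative only.

```ruby
require "socket"
require "timeout"

CRLF = "\r\n"

# Fake server: sends a greeting, reads one command, then stays silent.
server = TCPServer.new("127.0.0.1", 0) # port 0 lets the OS pick a free port
port = server.addr[1]
server_thread = Thread.new do
  client = server.accept
  client.write("* OK fake server ready#{CRLF}")
  client.gets # consume the CAPABILITY command
  # The "RUBY0001 OK CAPABILITY completed." line is deliberately never sent.
  sleep
end

# Hypothetical helper: waits up to `seconds` for the socket to become
# readable, then reads one line. Limitation: if only part of a line has
# arrived, the `gets` call itself can still block.
def read_line_with_timeout(io, seconds)
  unless IO.select([io], nil, nil, seconds)
    raise Timeout::Error, "no response within #{seconds}s"
  end
  io.gets
end

socket = TCPSocket.new("127.0.0.1", port)
puts read_line_with_timeout(socket, 1) # the greeting arrives normally
socket.write("RUBY0001 CAPABILITY#{CRLF}")

outcome =
  begin
    read_line_with_timeout(socket, 0.5)
  rescue Timeout::Error => e
    "gave up: #{e.message}"
  ensure
    socket.close
    server_thread.kill
    server.close
  end
puts outcome
```

Without the `IO.select` guard, the second read would wait until the peer closes the connection, which matches the behaviour described in this issue.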