BlockingIOError and File descriptor xx is used by transport #10617
Comments
I don't think we will be able to do anything without a reproducer here as I've already been over the aiohttp and aiohappyeyeballs code line by line for many days looking for potential leaks and have not been able to find any. The only thing that hasn't been audited is CPython |
What other issues might be causing this exception? When this error occurs, all aiohttp-related code requests fail despite using independent client sessions. I've reviewed the surrounding logs and found no unusual activity. Do you have suggestions for reproducing this issue? Any leads would be greatly appreciated. |
Anything that creates an asyncio.Transport and then potentially reuses the fd.
I don't know how you are using
If you don't care about IPv6/IPv4 failover, you could try turning off the
Can you try Python 3.12+ and see if you can still reproduce the issue? We haven't had any reports with newer Python versions.
|
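For context on the "File descriptor xx is used by transport" half of the title, here is a minimal sketch of how a selector event loop produces that message. It is not a reproducer for this issue: it deliberately registers a reader on an fd that a live transport already owns, which is the state the loop would see if an fd number were reused while an older transport was still registered. The function names `provoke_fd_in_use` and `run_on_selector_loop` are hypothetical, and the sketch assumes CPython's `SelectorEventLoop`.

```python
import asyncio
import socket


async def provoke_fd_in_use() -> str:
    """Return the error raised when an fd owned by a transport is registered again."""
    loop = asyncio.get_running_loop()
    ours, peer = socket.socketpair()
    ours.setblocking(False)
    # Wrapping the socket in a transport registers its fd with the loop.
    transport, _ = await loop.create_connection(asyncio.Protocol, sock=ours)
    try:
        # The selector loop refuses to watch an fd that a live transport owns.
        loop.add_reader(ours.fileno(), lambda: None)
    except RuntimeError as exc:
        return str(exc)
    finally:
        transport.close()
        peer.close()
    return ""  # not expected on a selector loop


def run_on_selector_loop() -> str:
    # Use SelectorEventLoop explicitly; other loop implementations may differ.
    loop = asyncio.SelectorEventLoop()
    try:
        return loop.run_until_complete(provoke_fd_in_use())
    finally:
        loop.close()


if __name__ == "__main__":
    print(run_on_selector_loop())
```

In the real failure the second registration is presumably not deliberate; the sketch only shows which check emits the message.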
I've been seeing this issue in our CI systems as well for the last 2-3 weeks. As described, it is hard to reproduce and happens intermittently. To mitigate this issue we locked our package versions as per #10506 (comment) and #10561. (Python 3.10.12)
Unfortunately, we are still seeing the issue. |
It's got to be at https://github.com/python/cpython/blob/b8b4b713c5f8ec0958c7ef8d29d6711889bc94ab/Lib/asyncio/selector_events.py#L510 |
Also are you running a Python version with python/cpython#88863 fixed? |
I got around to checking the cpython code for races: So there is a race in cpython if
But |
@xiaoxiper Can you try #10624 ? |
That's correct. The traceback (with proprietary parts redacted) is:
|
At least for me, as I am running Python |
Looks like we can work around this in |
The race in cpython can happen even if the happyeyeballs algorithm isn't being used. Since @xiaoxiper @olindho1 Can you try this PR: |
I wasn't able to make a reproducer for python/cpython#131728 I think the issue is actually with |
It seems #10624 would probably work around that issue |
Hello! I really appreciate you taking the time to look into this, and the frequent updates on your findings. But I will be honest, I'm rather confused whether you'd like me to test aio-libs/aiohappyeyeballs#157 or #10624. Regardless, the issue is infrequent and not immediately reproducible, as you know. So in order to know if the PR solves the issue, I reckon we'd just have to run with it for a while, and then be satisfied that we haven't seen it occur in X amount of time. I could try to run any of the PRs in our systems, but it's rather cumbersome without a downloadable version from PyPI. Would it be possible to deploy it as a dev version or so?
PS: I've also been thinking about why you're unable to reproduce it; it should be possible, somehow. So I looked into our code stack, and maybe there is something we're missing here (I am kind of clueless here, just throwing ideas). Our Python application runs async, but also starts up additional When I looked into the logs, I've seen occurrences of the error from both the main process and the subprocess. |
Since I can't find an actual bug in aiohttp, aiohappyeyeballs, or CPython for this case I can only guess which one might solve the issue for you so it's best to try both independently for an extended time and report back if either one solves the issue. I'll keep looking for a race bug when I have some more free cycles. |
any workaround suggestions? I am seeing this error a lot with python v3.13.2, and aiohttp v3.11.14 |
No workaround available at this time. We still need a reproducer |
import asyncio
import socket

async def repro_sock_connect_race():
    tasks = [
        asyncio.create_task(socket_runner(port))
        for port in range(200)
    ]
    await asyncio.gather(*tasks)

async def socket_runner(port: int) -> None:
    loop = asyncio.get_running_loop()
    while True:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setblocking(False)
        task = asyncio.current_task()
        loop.call_soon(task.cancel)
        try:
            await loop.create_connection(asyncio.Protocol, None, None, sock=sock)
        except asyncio.CancelledError:
            sock.close()

Failed attempt at reproducing |
My guess is that it's only reproducible when sockets are also being opened/closed in another thread |
Failed attempt at making a reproducer with a thread involved:

import asyncio
import socket
import time
from threading import Thread

class MyThread(Thread):
    def run(*args):
        while True:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            print(["thread sock", sock.fileno()])
            sock.setblocking(False)
            time.sleep(0.0001)
            sock.close()

async def repro_sock_connect_race():
    tasks = [asyncio.create_task(socket_runner(port)) for port in range(200)]
    await asyncio.gather(*tasks)

async def socket_runner(port: int) -> None:
    loop = asyncio.get_running_loop()
    while True:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        print(["asyncio sock", sock.fileno()])
        sock.setblocking(False)
        task = asyncio.current_task()
        loop.call_soon(task.cancel)
        try:
            await loop.create_connection(asyncio.Protocol, None, None, sock=sock)
        except asyncio.CancelledError:
            sock.close()

print("Starting threads")
for _ in range(10):
    t = MyThread()
    t.start()
    print("Thread started")
print("Starting asyncio")
asyncio.run(repro_sock_connect_race())
|
3.11.13 and 3.11.14 were yanked. ref: aio-libs/aiohttp#10617
Absent a solution, I'm going to proceed with the revert |
…10464 fixes #10617 alternative fix is MagicStack/uvloop#646
Tried again to make a reproducer. All it shows is that it's cleaned up correctly even with the revert:

import asyncio
import pprint
import socket
from types import TracebackType
from typing import Optional
from asyncio.selector_events import _SelectorSocketTransport
import gc

readers = []

class FakeTimeout:
    async def __aenter__(self) -> "FakeTimeout":
        return self

    async def __aexit__(
        self,
        exc_type: Optional[type[BaseException]],
        exc_val: Optional[BaseException],
        exc_tb: Optional[TracebackType],
    ) -> Optional[bool]:
        if exc_type is asyncio.CancelledError:
            raise TimeoutError
        return None

def _set_result_and_call_cancel(fut):
    """Helper setting the result only if the future was not cancelled."""
    if not fut.cancelled():
        fut.set_result(None)

def _add_reader(sock):
    """Helper to add a reader callback to the socket."""
    readers.append(sock)

async def _create_connection_transport(sock):
    print("_create_connection_transport")
    loop = asyncio.get_event_loop()
    future = loop.create_future()
    loop.call_soon(_add_reader, sock)
    # loop.call_soon(_set_result_and_call_cancel, future)
    try:
        await future
    except:
        # The real create_connection_transport will
        # call transport.close()
        readers.remove(sock)
        print("Cancelled, socket will be closed")
        raise
    return "transport", "protocol"

async def create_connection(sock):
    print("create_connection")
    # The real create_connection will not await
    # anything here because it's passed a sock, but will call
    # _create_connection_transport()
    transport, protocol = await _create_connection_transport(sock)
    return transport, protocol

async def happyeyeballs_start():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    return sock

async def socket_runner() -> None:
    sock: Optional[socket.socket] = None
    connection = None
    try:
        async with FakeTimeout():
            sock = await happyeyeballs_start()
            asyncio.current_task().cancel()  # simulate timeout
            connection = await create_connection(sock)
            sock = None
            return (
                f"Finished socket_runner: connection={connection}, sock={sock} readers={readers}",
                sock.fileno(),
            )
    except TimeoutError:
        print("TimeoutError")
    finally:
        assert sock is not None
        return (
            f"Failed socket_runner: connection={connection}, sock={sock}, readers={readers}",
            sock.fileno(),
        )

async def run():
    result, fileno = await socket_runner()
    print(f"Fileno: {fileno} Result: {result}")
    await asyncio.sleep(0)  # give time for socket to close
    gc.collect()
    # Open a new socket, should get the same fileno
    # as the one that was closed in socket_runner
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    assert sock.fileno() == fileno
    print("New socket fileno: ", sock.fileno())

asyncio.run(run())
|
…10464 (#10656) Reverts #10464 While this change improved the situation for uvloop users, it caused a regression with `SelectorEventLoop` (issue #10617) The alternative fix is MagicStack/uvloop#646 (not merged at the time of this PR) issue #10617 appears to be very similar to python/cpython@d5aeccf If someone can come up with a working reproducer for #10617 we can revisit this. cc @top-oai Minimal implementation that shows on cancellation the socket is cleaned up without the explicit `close` #10617 (comment) so this should be unneeded unless I've missed something (very possible with all the moving parts here) ## Related issue number fixes #10617 (cherry picked from commit 06db052)
…'s a failure in start_connection() #10464 (#10657) **This is a backport of PR #10656 as merged into master (06db052).** Reverts #10464 While this change improved the situation for uvloop users, it caused a regression with `SelectorEventLoop` (issue #10617) The alternative fix is MagicStack/uvloop#646 (not merged at the time of this PR) issue #10617 appears to be very similar to python/cpython@d5aeccf If someone can come up with a working reproducer for #10617 we can revisit this. cc @top-oai Minimal implementation that shows on cancellation the socket is cleaned up without the explicit `close` #10617 (comment) so this should be unneeded unless I've missed something (very possible with all the moving parts here) ## Related issue number fixes #10617 Co-authored-by: J. Nick Koston <[email protected]>
home-assistant/core#141855 Are related |
3.11.15 #10659 |
We yanked 3.11.13 and 3.11.14 and reverted #10464 because of #10617, so we are doing another release to make sure nobody has to go without the other fixes in .13 and .14.
In particular, upgrade to a non-yanked aiohttp. See: aio-libs/aiohttp#10617
This seems to have resolved the issue. We have not seen the issue since 3.11.15. Great job, thanks :) |
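For anyone checking whether an environment is still on one of the yanked releases, a small sketch along these lines may help. It only compares the installed version string against the two releases named in this thread (3.11.13 and 3.11.14); the helper names `is_yanked` and `installed_aiohttp_is_yanked` are made up for illustration and are not part of aiohttp.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Releases reported as yanked in this thread.
YANKED_AIOHTTP = frozenset({"3.11.13", "3.11.14"})


def is_yanked(release: str) -> bool:
    """True if the given aiohttp release string is one of the yanked ones."""
    return release in YANKED_AIOHTTP


def installed_aiohttp_is_yanked() -> Optional[bool]:
    """Check the installed aiohttp; None means aiohttp is not installed."""
    try:
        return is_yanked(version("aiohttp"))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_aiohttp_is_yanked())
```

If this prints `True`, upgrading to 3.11.15 or later is what the comments above report as resolving the errors.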
`aiohttp` version 3.11.13 was yanked per this issue: aio-libs/aiohttp#10617. I've seen errors similar to those described in the issue for this connector, so I'm updating the `aiohttp` version to hopefully address those errors.
Describe the bug
I encountered a similar issue while using the latest version of aiohttp
aiohappyeyeballs: 2.6.1
#10506 (comment)
To Reproduce
It's difficult to reproduce directly - currently unable to reproduce with script-based stress testing; it can only be reproduced in the production environment and occurs after a period of testing
Expected behavior
Logs/tracebacks
Python Version
aiohttp Version
multidict Version
propcache Version
yarl Version
OS
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
Related component
Client
Additional context
No response
Code of Conduct