
channels_redis keeps stale connection to redis #393

Open
rythm-of-the-red-man opened this issue Jun 21, 2024 · 1 comment

Stack

  • Redis instance hosted on Azure (aka Azure Cache for Redis)
redis = "^5.0.0"
django-redis = "^5.2.0"
channels-redis = "^4.2.0"
channels = { extras = ["daphne"], version = "^4.0.0" }
Django = "~4.2"

All hosted on Azure Kubernetes Service, behind ingress-nginx and a load balancer.

Traceback

ERROR 2024-06-20 19:41:09,956 daphne.server Exception inside application: Error UNKNOWN while writing to socket. Connection lost.
Traceback (most recent call last):
  File '/usr/local/lib/python3.11/site-packages/redis/asyncio/connection.py', line 473, in send_packed_command
    await self._writer.drain()
  File '/usr/local/lib/python3.11/asyncio/streams.py', line 392, in drain
    await self._protocol._drain_helper()
  File '/usr/local/lib/python3.11/asyncio/streams.py', line 166, in _drain_helper
    raise ConnectionResetError('Connection lost')
ConnectionResetError: Connection lost

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File '/usr/local/lib/python3.11/site-packages/channels/routing.py', line 62, in __call__
    return await application(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File '/app/myapp/websockets/middleware.py', line 25, in __call__
    return await super().__call__(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File '/usr/local/lib/python3.11/site-packages/channels/middleware.py', line 24, in __call__
    return await self.inner(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File '/usr/local/lib/python3.11/site-packages/channels/routing.py', line 132, in __call__
    return await application(
           ^^^^^^^^^^^^^^^^^^
  File '/usr/local/lib/python3.11/site-packages/channels/consumer.py', line 94, in app
    return await consumer(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File '/usr/local/lib/python3.11/site-packages/channels/consumer.py', line 58, in __call__
    await await_many_dispatch(
  File '/usr/local/lib/python3.11/site-packages/channels/utils.py', line 50, in await_many_dispatch
    await dispatch(result)
  File '/usr/local/lib/python3.11/site-packages/channels/consumer.py', line 73, in dispatch
    await handler(message)
  File '/usr/local/lib/python3.11/site-packages/channels/generic/websocket.py', line 249, in websocket_disconnect
    await self.disconnect(message['code'])
  File '/app/myapp/websockets/consumers.py', line 24, in disconnect
    await self.channel_layer.group_discard(self.group_name, self.channel_name)
  File '/usr/local/lib/python3.11/site-packages/channels_redis/core.py', line 518, in group_discard
    await connection.zrem(key, channel)
  File '/usr/local/lib/python3.11/site-packages/sentry_sdk/integrations/redis/asyncio.py', line 66, in _sentry_execute_command
    return await old_execute_command(self, name, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File '/usr/local/lib/python3.11/site-packages/redis/asyncio/client.py', line 612, in execute_command
    return await conn.retry.call_with_retry(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File '/usr/local/lib/python3.11/site-packages/redis/asyncio/retry.py', line 62, in call_with_retry
    await fail(error)
  File '/usr/local/lib/python3.11/site-packages/redis/asyncio/client.py', line 599, in _disconnect_raise
    raise error
  File '/usr/local/lib/python3.11/site-packages/redis/asyncio/retry.py', line 59, in call_with_retry
    return await do()
           ^^^^^^^^^^
  File '/usr/local/lib/python3.11/site-packages/redis/asyncio/client.py', line 585, in _send_command_parse_response
    await conn.send_command(*args)
  File '/usr/local/lib/python3.11/site-packages/redis/asyncio/connection.py', line 497, in send_command
    await self.send_packed_command(
  File '/usr/local/lib/python3.11/site-packages/redis/asyncio/connection.py', line 484, in send_packed_command
    raise ConnectionError(
redis.exceptions.ConnectionError: Error UNKNOWN while writing to socket. Connection lost.

Description

This issue keeps happening on prod and on the test env (more or less a clone of prod) when we couple channels with Redis.

I suspect that the managed Redis instance times out idle connections and channels_redis does not attempt to reconnect (the idea might be dumb tho; if so, I'm sorry, I don't know much about the internals of channels_redis or Redis in general). I think this might be the case because the pattern looks as follows: the issue occurs when I open the client app, wait ~10 minutes, then try any action related to channels, like re-establishing the websocket connection by refreshing the page.

I assumed it might be a channels_redis bug, which is why I wrote about it here. I'd love any feedback, thanks in advance.
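
A minimal way to sanity-check the idle-timeout suspicion outside of channels would be something like the sketch below (assumptions: a plain redis-py async client, a placeholder URL standing in for our Azure Cache for Redis address, and a ~10 minute server-side idle timeout):

    import asyncio
    from redis.asyncio import Redis

    async def main():
        # Placeholder URL; in our setup this would point at Azure Cache for Redis.
        client = Redis.from_url("redis://localhost:6379")
        await client.ping()            # fresh connection, works
        await asyncio.sleep(11 * 60)   # sit idle past the suspected ~10 minute idle timeout
        await client.ping()            # if the suspicion is right, this raises
                                       # redis.exceptions.ConnectionError, since no
                                       # keepalive/health check is configured

    asyncio.run(main())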

Strange part

Well, that's kinda odd, but since it happened I decided to include it here. When I run the daphne instance from the Dockerfile like this:

ENTRYPOINT ["/app/etc/entrypoint.sh"]
CMD ["web-prod"]

#########     entrypoint.sh calls this script
#!/bin/bash
# only now do we have access to environment variables, so we can call collectstatic
python manage.py collectstatic --noinput -v 0 > /dev/null 2>&1
# run app
daphne -b 0.0.0.0 -p 8000  medishout.asgi:application  -v 3

the issue appears, but if I run another server after connecting to the working pod, like

daphne -b 0.0.0.0 -p 8001  medishout.asgi:application  -v 3

and I connect to the 2nd one, the issue doesn't seem to appear (or I didn't manage to catch it).

rythm-of-the-red-man commented Jun 21, 2024

Well, the answer came faster than I thought. This issue was helpful. Azure Cache for Redis is not a liberal one, and the health check is off by default (and, I have to admit, poorly described in the docs). If you don't mind, I'll open a PR with amendments to the docs that better communicate that you can actually pass additional kwargs to the redis-py client. In my opinion this might be helpful for users with managed Redis instances.

Example of a valid config:

    CHANNEL_LAYERS = {
        "default": {
            "BACKEND": "channels_redis.core.RedisChannelLayer",
            "CONFIG": {
                "hosts": [
                    {
                        "address": CHANNELS_REDIS_URL,
                        "retry_on_timeout": True,
                        "health_check_interval": 1,
                        "socket_keepalive": True,
                    }
                ],
                "capacity": 1500,
                "expiry": 5,
            },
        },
    }
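
For reference (as I understand the redis-py docs): health_check_interval=1 makes redis-py issue a PING on any connection that has been idle for more than 1 second before reusing it, socket_keepalive=True enables TCP keepalives on the underlying socket, and retry_on_timeout=True retries a command that fails with a timeout. The health check is the part that catches connections the managed instance has silently dropped.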
