RabbitMQ docker image takes more time to start when using home dir in an EFS mount #471
Replies: 8 comments 5 replies
-
The bitnami chart/image isn't derived from this image: https://github.com/bitnami/bitnami-docker-rabbitmq/blob/master/3.8/debian-10/Dockerfile
The discussion in that thread is pretty informative for troubleshooting the issue, notably bitnami/charts#4936 (comment) and michaelklishin's comment about using
-
@michaelklishin @wglambert I used debug logging and strace, and here are my findings. Please have a look when you have some time.
Container output while it is stuck for 7 minutes:
[root@ip-172-31-27-174 fs1]# docker run --cap-add SYS_PTRACE -v /mnt/efs/fs1:/var/lib/rabbitmq -v /mnt/efs/fs1/rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf --hostname my-rabbit --name some-rabbit rabbitmq:3.8.11
Unable to find image 'rabbitmq:3.8.11' locally
3.8.11: Pulling from library/rabbitmq
d519e2592276: Pull complete
d22d2dfcfa9c: Pull complete
b3afe92c540b: Pull complete
cd4e41ce9500: Pull complete
e2741828ce46: Extracting [=======================================> ] 25.56MB/32.73MB
e2741828ce46: Pull complete
6cf1935b659a: Pull complete
3df71d67553c: Pull complete
ac4f52d15541: Pull complete
0af823fd61c8: Pull complete
85579530757b: Pull complete
Digest: sha256:52e73c649b3ef628fb2b0dafd5b043c0b397bd188a0326a6514d37662d84b425
Status: Downloaded newer image for rabbitmq:3.8.11
Configuring logger redirection
Strace outputs
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
rabbitmq 1 0.2 0.0 4636 888 ? Ss 03:55 0:00 /bin/sh /opt/rabbitmq/sbin/rabbitmq-server
rabbitmq 16 2.7 4.9 1686864 50052 ? Sl 03:55 0:02 /usr/local/lib/erlang/erts-11.1.7/bin/beam.smp -W w -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 512 -MMmcs 30 -P 1048576
rabbitmq 23 0.0 0.0 4528 880 ? Ss 03:55 0:00 erl_child_setup 1024
rabbitmq 48 0.0 0.0 8280 88 ? S 03:55 0:00 /usr/local/lib/erlang/erts-11.1.7/bin/epmd -daemon
rabbitmq 68 0.0 0.1 8272 1180 ? Ss 03:55 0:00 inet_gethost 4
rabbitmq 69 0.0 0.1 10392 1716 ? S 03:55 0:00 inet_gethost 4
root 70 0.5 0.3 20264 3840 pts/0 Ss 03:56 0:00 bash
root 330 0.0 0.3 36160 3284 pts/0 R+ 03:57 0:00 ps -aux
root@my-rabbit:/# strace -p 1
strace: Process 1 attached
rt_sigsuspend([], 8
root@my-rabbit:/# strace -p 16
strace: Process 16 attached
select(0, NULL, NULL, NULL, NULL
root@my-rabbit:/# strace -p 23
strace: Process 23 attached
select(5, [3 4], NULL, NULL, NULL
root@my-rabbit:/# strace -p 48
strace: Process 48 attached
select(7, [3 4 5], NULL, NULL, {tv_sec=0, tv_usec=147862}) = 0 (Timeout)
select(7, [3 4 5], NULL, NULL, {tv_sec=5, tv_usec=0}) = 0 (Timeout)
select(7, [3 4 5], NULL, NULL, {tv_sec=5, tv_usec=0}) = 0 (Timeout)
select(7, [3 4 5], NULL, NULL, {tv_sec=5, tv_usec=0}) = 0 (Timeout)
select(7, [3 4 5], NULL, NULL, {tv_sec=5, tv_usec=0}) = 0 (Timeout)
select(7, [3 4 5], NULL, NULL, {tv_sec=5, tv_usec=0}) = 0 (Timeout)
select(7, [3 4 5], NULL, NULL, {tv_sec=5, tv_usec=0}) = 0 (Timeout)
select(7, [3 4 5], NULL, NULL, {tv_sec=5, tv_usec=0}) = 0 (Timeout)
select(7, [3 4 5], NULL, NULL, {tv_sec=5, tv_usec=0}) = 0 (Timeout)
select(7, [3 4 5], NULL, NULL, {tv_sec=5, tv_usec=0}) = 0 (Timeout)
select(7, [3 4 5], NULL, NULL, {tv_sec=5, tv_usec=0}) = 0 (Timeout)
select(7, [3 4 5], NULL, NULL, {tv_sec=5, tv_usec=0}) = 0 (Timeout)
select(7, [3 4 5], NULL, NULL, {tv_sec=5, tv_usec=0}) = 0 (Timeout)
select(7, [3 4 5], NULL, NULL, {tv_sec=5, tv_usec=0}) = 0 (Timeout)
select(7, [3 4 5], NULL, NULL, {tv_sec=5, tv_usec=0}) = 0 (Timeout)
select(7, [3 4 5], NULL, NULL, {tv_sec=5, tv_usec=0}) = 0 (Timeout)
select(7, [3 4 5], NULL, NULL, {tv_sec=5, tv_usec=0}) = 0 (Timeout)
select(7, [3 4 5], NULL, NULL, {tv_sec=5, tv_usec=0}
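The trace of PID 48 (epmd, according to the process listing) contains only select() calls that return on timeout, which suggests that process is idling rather than doing file I/O. A minimal sketch for totalling that idle time from a saved strace capture follows. The function name `summarize_select_timeouts` and the regular expression are illustrative assumptions, not part of strace or RabbitMQ, and the sample text is copied from the trace above.

```python
import re

# Matches completed select() calls that returned 0 (Timeout) and captures the
# timeout value printed by strace, e.g. "{tv_sec=5, tv_usec=0}) = 0 (Timeout)".
# Assumption: the line format matches the capture shown in this thread.
TIMEOUT_RE = re.compile(
    r"select\(.*\{tv_sec=(\d+), tv_usec=(\d+)\}\)\s*=\s*0 \(Timeout\)"
)


def summarize_select_timeouts(strace_text):
    """Return (count, total_seconds) for select() calls that timed out."""
    count = 0
    total = 0.0
    for line in strace_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            count += 1
            total += int(match.group(1)) + int(match.group(2)) / 1_000_000
    return count, total


if __name__ == "__main__":
    # Three lines from the trace above; the last call is still blocked,
    # so it has no return value and is not counted.
    sample = (
        "select(7, [3 4 5], NULL, NULL, {tv_sec=0, tv_usec=147862}) = 0 (Timeout)\n"
        "select(7, [3 4 5], NULL, NULL, {tv_sec=5, tv_usec=0}) = 0 (Timeout)\n"
        "select(7, [3 4 5], NULL, NULL, {tv_sec=5, tv_usec=0}\n"
    )
    count, total = summarize_select_timeouts(sample)
    print(f"{count} timed-out select() calls, {total:.3f} s spent waiting")
```

If nearly all of the traced wall-clock time is accounted for this way, the process being traced is probably not the one blocked on the EFS mount.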
-
Sounds similar to helm/charts#1711 (so EFS is just NFS behind the scenes?). Although about elasticsearch, this post seems relevant since RabbitMQ likely also cares about filesystem performance:
-
I have the same issue with rabbitmq:latest
-
I'm able to run RabbitMQ 3.8.11 just fine persisting to our internal NFS, but when EFS gets involved startup times turn nasty. Interestingly, runtime performance is fine. @michaelklishin would you have any ideas? Just a note: I spent a significant amount of time with AWS support before we found this issue, and we can confirm the EFS is functioning just fine; for some reason RabbitMQ just won't write to it very quickly. Watching the EFS during startup, it seems to be writing 400 MB of quorum queue data very slowly. We're not using this feature yet, so I'm not too familiar with it, but this happens on every startup: it deletes the data and rewrites it.
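One way to compare the EFS mount with the internal NFS or a local disk is to measure the latency of small synchronous writes on each. The sketch below is a generic probe, not anything RabbitMQ ships; the function name `fsync_write_latencies`, the 4 KiB block size, and the write count are arbitrary assumptions. Whether RabbitMQ's startup writes are actually bound by fsync latency is also an assumption that this thread does not confirm.

```python
import os
import sys
import tempfile
import time


def fsync_write_latencies(directory, writes=50, size=4096):
    """Write `writes` blocks of `size` bytes into a temp file in `directory`,
    calling fsync after each write, and return the seconds taken per write."""
    payload = b"\0" * size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory, prefix="fsync_probe_")
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data to storage before timing stops
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)  # clean up the probe file
    return latencies


if __name__ == "__main__":
    # Pass the directory to probe, e.g. the EFS mount point from this thread.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    samples = sorted(fsync_write_latencies(target))
    median_ms = samples[len(samples) // 2] * 1000
    worst_ms = samples[-1] * 1000
    print(f"{target}: median {median_ms:.2f} ms, worst {worst_ms:.2f} ms per fsynced write")
```

Running it once against /mnt/efs/fs1 and once against a local directory gives a rough per-write comparison. A large gap would be consistent with slow startup but fine steady-state throughput.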
-
Hi team,
We will test further with higher throughput and more operations per second on the EFS side.
-
Reducing the
-
Was this issue ever fixed? I'm seeing the same problem but with EBS / ext4 storage, not EFS.
-
Describe the bug
The RabbitMQ Docker image takes a long time to start (5-10 minutes) when an EFS mount is used as the home directory location (/var/lib/rabbitmq).
To Reproduce
Steps to reproduce the behavior:
Create an EFS file system
Create an EC2 instance that mounts the EFS file system at a path (/mnt/efs/fs1)
Run without using an EFS mount
docker run --hostname my-rabbit --name some-rabbit rabbitmq:3.8.11
Starts normally. Logs attached, captured with
log.file.level = debug
without_efs.log
Run using an EFS mount for the home dir
docker run -v /mnt/efs/fs1:/var/lib/rabbitmq --hostname my-rabbit --name some-rabbit rabbitmq:3.8.11
Takes a much longer time to start. Logs attached, captured with
log.file.level = debug
efs.log
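To put numbers on the two runs above, the time from launching the container to a given log line can be measured. This is a minimal sketch under stated assumptions: `seconds_until_marker` is a hypothetical helper, and the marker string "Server startup complete" is assumed to be what the server prints once it is ready, so adjust it to whatever your logs show. The demo uses a stand-in command so it runs without Docker.

```python
import subprocess
import sys
import time


def seconds_until_marker(command, marker, timeout=900):
    """Run `command` and return the seconds elapsed until a line of its
    combined stdout/stderr contains `marker`. Returns None if the output
    ends first or the timeout has passed when a line arrives."""
    start = time.monotonic()
    proc = subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    try:
        for line in proc.stdout:
            elapsed = time.monotonic() - start
            if marker in line:
                return elapsed
            # The timeout is only checked when a new line arrives.
            if elapsed > timeout:
                return None
        return None
    finally:
        proc.kill()  # stops the client process only
        proc.wait()


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. To time the real
    # container, pass the command from the steps above as a list, e.g.
    # ["docker", "run", "-v", "/mnt/efs/fs1:/var/lib/rabbitmq",
    #  "--hostname", "my-rabbit", "--name", "some-rabbit", "rabbitmq:3.8.11"]
    demo = [sys.executable, "-c", "print('booting'); print('Server startup complete')"]
    print(seconds_until_marker(demo, "Server startup complete"))
```

Killing the `docker run` client does not remove the container, so clean up with `docker rm -f some-rabbit` between runs before comparing the EFS and non-EFS timings.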
Additional Information
This is also reproducible in Kubernetes (using the bitnami rabbitmq chart).
bitnami/charts#4936