docker push retry - EOF -- Randomly getting 401 errors #70
Comments
Yep, I checked the commits before using it, as it was not present in the README, and saw that it is indeed a new feature.
Dec 04 10:20:05 test-instance dockerd[30991]: time="2020-12-04T10:20:04.970405366Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=151
Dec 04 10:20:05 test-instance dockerd[30991]: time="2020-12-04T10:20:04.970428398Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=152
Dec 04 10:20:05 test-instance dockerd[30991]: time="2020-12-04T10:20:04.970454513Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=153
Dec 04 10:20:05 test-instance dockerd[30991]: time="2020-12-04T10:20:04.970478241Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=154
Dec 04 10:20:05 test-instance dockerd[30991]: time="2020-12-04T10:20:04.970504699Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=155
Dec 04 10:20:05 test-instance dockerd[30991]: time="2020-12-04T10:20:04.970546552Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=156
Dec 04 10:20:05 test-instance dockerd[30991]: time="2020-12-04T10:20:04.970581228Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=157
Dec 04 10:20:05 test-instance dockerd[30991]: time="2020-12-04T10:20:05.113662779Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=0
Dec 04 10:20:06 test-instance dockerd[30991]: time="2020-12-04T10:20:06.945688409Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=0 |
@SezBulent thanks for raising this issue. It is very interesting that you get the 401 on a GET request, which is the only cached method when |
Hello again. I tried different configurations on nginx, increasing timeouts and also increasing worker_connections and worker_rlimit_nofile to 65535 (I also increased ulimit -Sn and ulimit -Hn on the host), and it didn't seem to help at all. Since the builds and pushes are run inside Docker (using GitLab runners), I also added "max-concurrent-uploads": 1 to the Docker daemon.json, thinking it might be a network issue, but again with no results. What I did not mention, and it is pretty important, is that this "EOF" issue occurs when running multiple builds/pushes (more than 5) through the proxy. |
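For reference, a minimal sketch of the nginx tuning described above, using the values mentioned in the comment (placement follows standard nginx contexts; this is not a claim about what the project's nginx.conf actually contains):

worker_rlimit_nofile 65535;     # main context: raise the open-file limit per worker process
events {
    worker_connections 65535;   # allow more simultaneous connections per worker
}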
I've been able to reproduce this by trying to push an image with several big layers (above 500 MB); it starts retrying them and I finally get an EOF. I get these messages in the proxy log before each retry:
Eventually I also get:
I've tried to bump @SezBulent can you confirm if you get these errors pushing big layers? |
Hm, indeed the problem is more frequent on images with bigger layers, but I did not find any error similar to the ones you found. |
Looks like in my case the problem is related to this location directive: https://github.com/rpardini/docker-registry-proxy/blob/master/nginx.conf#L279-L283. With it in place, the |
If I comment out https://github.com/rpardini/docker-registry-proxy/blob/master/nginx.conf#L279-L283, all goes fine with |
Thanks for the details @fgimenez. I commented out the specific block you mentioned, and also added proxy_read_timeout 900s; in each location block of the cache "server" block, but it didn't resolve the timeout. I measured as well, and I get the timeout after 60s. I did the test by pushing an image myself, not through the GitLab runner, and found the same logs as you did:
2020/12/16 00:05:29 [warn] 60#60: *181 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000000010, client: 127.0.0.1, server: _, request: "PATCH /v2/test/bulent-test/ubuntu-layer-test/blobs/uploads/04759132-8c45-4dc9-b9d7-10b79bc373d1?_state=rioY6lcS0gvIT8UgLsxhQx8PzIMmJc03Qm5aKgfyAER7Ik5hbWUiOiJvbWZlL2J1bGVudC10ZXN0L3VidW50dS1iaWctbGF5ZXItdGVzdCIsIlVVSUQiOiIwNDc1OTEzMi04YzQ1LTRkYzktYjlkNy0xMGI3OWJjMzczZDEiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMjAtMTItMTZUMDA6MDU6MjkuODQ2MjIzNDM5WiJ9 HTTP/1.1", host: "test-registry.test.com"
2020/12/16 00:06:29 [error] 59#59: *179 proxy_connect: upstream read timed out (peer:127.0.0.1:443) while connecting to upstream, client: 10.3.249.100, server: _, request: "CONNECT test-registry.test.com:443 HTTP/1.1", host: "test-registry.test.com:443"
It seems that there is indeed a default timeout of 60s; I will check the nginx docs to see what is missing, and also check the network. |
@SezBulent you are right, commenting out https://github.com/rpardini/docker-registry-proxy/blob/master/nginx.conf#L279-L283 in fact didn't help; the successful executions I got were just because the uploads happened to last less than 60s. Will keep trying to understand where this timeout comes from. |
Something weird is going on; most certainly it is something that I am missing. I added the following timeouts to the nginx config, and I still get my connection cut after 60s (to be sure that nothing is replaced at Docker startup, I modified the entrypoint and removed most of the nginx config from there):
and this one in the http block: client_header_timeout 1800s; I even included them in every location block to be sure nothing is overwriting my settings (except the client_header_timeout directive, as that one can only be specified in the http or server block). Also, by renaming the server_name of each block based on the comments near each server block (I changed the server name of the director layer to _director, and the one of the caching layer to _caching), it seems that the error is generated on the _director server:
2020/12/17 01:14:14 [error] 58#58: *86 proxy_connect: upstream read timed out (peer:127.0.0.1:443) while connecting to upstream, client: 10.3.249.100, server: _directory, request: "CONNECT test-registry.test.com:443 HTTP/1.1", host: "test-registry.test.com:443"
2020/12/17 01:14:20 [info] 60#60: *137 client prematurely closed connection, client: 127.0.0.1, server: _caching, request: "PATCH /v2/test/bulent-test/ubuntu-layer-test/blobs/uploads/08ffc7f3-48bc-4a8c-b9f5-f2d363d0bfd0?_state=0pAPdqLcDu2_cgAJc0lS0xBK9CCsfgMn1m8DZ-lzcgF7Ik5hbWUiOiJvbWZlL2J1bGVudC10ZXN0L3VidW50dS1iaWctbGF5ZXItdGVzdCIsIlVVSUQiOiIwOGZmYzdmMy00OGJjLTRhOGMtYjlmNS1mMmQzNjNkMGJmZDAiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMjAtMTItMTdUMDE6MTQ6MjAuNjMwNjExMjMxWiJ9 HTTP/1.1", host: "test-registry.test.com"
And as a side note, the client request body buffer warning can be resolved by adding client_body_buffer_size 1000M; (or whatever value seems appropriate). |
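As a small illustration of the tweak just mentioned, the buffer directive could sit in the caching server block (the server name follows the renaming described above; 1000M is simply the value the comment uses, not a recommendation):

server {
    server_name _caching;            # the caching layer
    client_body_buffer_size 1000M;   # keep large PATCH request bodies in memory instead of spilling them to temp files
    # ... rest of the caching server configuration ...
}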
Awesome, thanks a lot. For the record, I also tried increasing @rpardini wdyt, any idea about what might be causing the error? |
@fgimenez I think I finally figured it out. I did a bit more digging and saw that the current Dockerfile gets built on top of this one: Analyzing its content a bit, and also checking the current nginx, I saw that it was compiled with the http_proxy_connect_module, which is where the "issue" appears. This module has three more timeout directives, and apparently the read and send timeouts default to 60 seconds.
Here's the documentation: Increasing these to a higher value seems to resolve the problem. I managed to push a 2.2 GB image without any timeout occurring. @rpardini I think it would be a good idea if we could also specify some of the nginx configs, timeouts for example, as environment variables. If you agree, I can put together a pull request in a few days. |
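For illustration, a minimal sketch of the kind of override being discussed, using the three timeout directives that the http_proxy_connect_module documents (the values are arbitrary examples, and exactly where they belong in this project's nginx.conf is an assumption):

# in the "director" server block, where the CONNECT tunnelling is handled
proxy_connect_connect_timeout 1800s;   # time allowed to establish the tunnel to the upstream
proxy_connect_read_timeout    1800s;   # time allowed between reads from the upstream (60s by default, per the comment above)
proxy_connect_send_timeout    1800s;   # time allowed between writes to the upstream (60s by default, per the comment above)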
@SezBulent awesome! Looking forward to your PR. I've tried adding
to the _director server block, and I get a consistent 95s connect timeout when pushing two 1 GB layers. |
Hm, I think you must also add them in the caching server layer; I have personally added them to the http block (it works, even though the docs specify "server" as the context). Also, alongside those, I think it's a good idea to specify these as well, with any value you find appropriate:
proxy_read_timeout 1800s;
proxy_connect_timeout 1800s;
proxy_send_timeout 1800s;
send_timeout 1800s;
client_body_timeout 1800s;
client_header_timeout 1800s;
keepalive_timeout 900s;
What I have noticed, though, is that when the completion bar is full and the layers seem to be pushed, docker still hangs for a few minutes (between 5 and 10), after which it outputs the message that the image was pushed. It's as if it's still doing something in the background, but the docker debug log didn't generate anything during this time, and neither did the proxy logs. Also, as a side note, I'm still getting the random 401 status error, yet at first glance it doesn't seem to affect anything. |
Hmm, interesting investigations. Do go ahead with the PR; just keep in mind that push support is new and experimental, and as such any changes to those timeouts should be hidden behind ENV vars (maybe the same ALLOW_PUSH we already have). The timeouts protect clients (and the proxy itself) from misbehaving servers/clients, and disabling them for all users would be bad. Also, I think the 401 issue has to be addressed as well, since it happens early and can change the behaviour of subsequent requests... |
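One possible shape for the ENV-driven approach (purely a sketch; the ${...} variable names below are made-up placeholders rather than anything the project defines) is to template the directives and let the container entrypoint substitute values, e.g. with envsubst, before nginx starts:

# hypothetical nginx.conf fragment before substitution
proxy_connect_read_timeout ${PROXY_CONNECT_READ_TIMEOUT};
proxy_connect_send_timeout ${PROXY_CONNECT_SEND_TIMEOUT};
proxy_read_timeout         ${PROXY_READ_TIMEOUT};
send_timeout               ${SEND_TIMEOUT};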
Hello, sorry for the delay; between the holidays and work popping up, I couldn't find the time to add the changes. |
I'm still getting the same timeouts with 0.6.2 trying to push big layers, for instance with an image like this:
I'll try to dig deeper to understand what's going on and maybe report a different issue for tracking. |
Setting higher values for the timeouts through the env vars, everything is working fine 👍 |
I also had trouble pushing... long story short, request buffering was the cause here. #125 fixed it for me. |
Try as I might, I cannot get push to work. I have tried setting all timeout values to 900s and disabling buffering using the fix in #125, with rpardini/docker-registry-proxy@sha256:9c3df0f61f4fe377328a2111ee72f291513f7723ceb6fa943e534a62ba1d0d7f (the latest at this time, since there is no recent tag or release). I always get a chain of retries and failure as described here. Another way to fix this would be for docker itself to make it easy to bypass the proxy for push, but its proxy support is honestly rather minimal and awkward, and I don't expect them to enhance anything in this area. |
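For context, the #125 change referenced above revolves around nginx request buffering; a minimal sketch of what disabling it looks like in a proxying location follows (the location path shown is illustrative, and whether this matches #125 exactly is an assumption):

location /v2/ {
    proxy_request_buffering off;   # stream the client's upload to the upstream registry instead of buffering the whole layer first
    proxy_http_version      1.1;   # unbuffered upstream requests generally rely on HTTP/1.1 chunked transfer
    # ... existing proxy_pass and header configuration ...
}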
Hello,
I recently encountered a problem that occurs only randomly when pushing images. Some layers keep retrying, then suddenly end with EOF. Note that these errors were not encountered in the past, when the proxy was not used.
Error example:
I also found that some other users had this issue when running behind a proxy. I ran the registry proxy in debug mode, but didn't find any obvious warnings or errors in the proxy logs, besides the fact that I sometimes see a 401 response to the following GET request, which I do not know whether it is related to the pushing issue:
Request:
GET https://127.0.0.1:444/v2/ HTTP/1.1
Response:
HTTP/1.1 401 Unauthorized
For reference, I ran the debug with the following arguments:
At first glance it seems that the authorization header is not sent, but I also wonder why these random requests towards /v2/ are happening.
I also looked through the whole nginx config and added proxy_set_header X-Forwarded-Proto: https; in both server configs, but it didn't seem to help.
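Worth noting as an aside: standard nginx syntax for that directive takes the header name without a trailing colon, so the tweak would more commonly be written as below (whether the colon has any bearing on the 401s is not established here):

proxy_set_header X-Forwarded-Proto https;   # forwarded-protocol header for requests proxied to the upstream registry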
I'm still investigating the issue, but meanwhile I opened it because there certainly seems to be something wrong.
Thanks.