Terraform with S3 backend sporadically fails with "Error: RequestError: send request failed caused by: Post "https://sts.amazonaws.com/": net/http: TLS handshake timeout" #28714
Comments
Maybe relevant: manually restarting the same CI job (which means no change in config!) normally fixes the issue.
I get this 99% of the time, but then once in a while it just works. Same version: 0.15.3. I will have to reconsider using Terraform unless someone has a solution.
After further investigation I think I have found the issue on my side: it is probably the firewall in my router, since everything works fine when I try it over a 4G connection.
Hi @Cajga. The error that you're seeing may be caused by service issues at AWS, a poor network connection between your CI server and AWS, or problems with the TLS configuration.
Hi @gdavison, thanks for looking into this. As you correctly said, it could be that the STS API is down or slow during these failing requests. At the same time, this seems to happen ONLY when we touch the S3 backend and NEVER when we use the AWS provider to configure resources on AWS. Would it be possible to make the backend connection more reliable from the Terraform client side? It seems that when these issues happen with the Go net/http package, people either increase the TLS handshake timeout (here you can find an example of this) or, as a best practice, retry if the first connection fails. P.S.: If you look at the second log that I included in the description, you can see that an init worked fine but a plan failed a few seconds later. This rules out a wrong TLS configuration (at least on our side).
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
Terraform Version
Terraform Configuration Files
As the issue happens even during a terraform init, here is how the remote backend is configured:
Debug Output
We are running in a CI environment that is configured to run terraform in different directories (sometimes at the same time, but inside containers so they do not conflict). In every CI job, we do a terraform init and a terraform plan or apply. The issue happens sporadically with all kinds of calls (roughly 1 in every 30 terraform calls).
Here you can see two trace outputs from two different runs:
Expected Behavior
Terraform calls work as expected
Actual Behavior
Terraform calls (init/plan and, on a few occasions, apply) fail
Steps to Reproduce
As mentioned, the issue happens randomly with all types of terraform calls
Additional Context
As mentioned above, terraform runs in a CI environment and configures multiple directories. The issue happens sporadically and with different types of calls.
References
Searching for the phrase "net/http: TLS handshake timeout", I could find some very old tickets that were closed as "unable to reproduce", but I am not sure whether they are relevant (in some of them the issue was permanent).