Skip to content

Partial upgrade of AWS SDK Go packages from V1 to V2 #856

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

gargipanatula
Copy link
Contributor

What this PR does / why we need it:
Upgrades some packages of the AWS SDK Go from V1 to V2. The AWS SDK Go V1 is being deprecated on July 31, 2025: https://aws.amazon.com/blogs/developer/announcing-end-of-support-for-aws-sdk-for-go-v1-on-july-31-2025/.

This PR upgrades all packages except the github.com/aws/aws-sdk-go/aws/endpoints package, which is used in partition verification. Given that upgrading that particular functionality will require larger changes, its changes will be pushed out in a separate PR.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Addresses #736, but does not fully close it.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 28, 2025
@k8s-ci-robot
Copy link
Contributor

Welcome @gargipanatula!

It looks like this is your first PR to kubernetes-sigs/aws-iam-authenticator 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/aws-iam-authenticator has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 28, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @gargipanatula. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gargipanatula
Once this PR has been reviewed and has the lgtm label, please assign wongma7 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 28, 2025
@@ -226,20 +227,19 @@ func (g generator) GetWithOptions(options *GetTokenOptions) (Token, error) {

// create a session with the "base" credentials available
// (from environment variable, profile files, EC2 metadata, etc)
sess, err := session.NewSessionWithOptions(session.Options{
AssumeRoleTokenProvider: StdinStderrTokenProvider,
SharedConfigState: session.SharedConfigEnable,
Copy link
Contributor Author

@gargipanatula gargipanatula May 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer setting SharedConfigState, as its behavior when set to SharedConfigEnable is now default in V2:

In V1, SharedConfigEnable sets AWS_SDK_LOAD_CONFIG to a truthy value, so that it loads from the shared config (/.aws/config) and shared credentials (/.aws/credentials) files. (V1 docs)

In V2, using LoadDefaultConfig will load from all supported sources including /.aws/config and /.aws/credentials. (V2 docs).

req.Header.Set(headerSourceArn, reqHeaders[headerSourceArn])
}
return next.HandleBuild(ctx, in)
}), smithymiddleware.Before)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: V2 now uses middleware to add headers, added a test for this.

@bryantbiggs
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 28, 2025
Fn: request.MakeAddToUserAgentHandler(
"aws-iam-authenticator", pkg.Version),
})
if options.Region != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not handled correctly as we are overriding the default region if region is provided in the options right ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yep, this should've been changed to

if options.Region != "" {
	cfg.Region = options.Region
}

thank you

AssumeRoleTokenProvider: StdinStderrTokenProvider,
SharedConfigState: session.SharedConfigEnable,
})
cfg, err := config.LoadDefaultConfig(context.Background(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have context passed to the function ?

instanceRegion, err := ec2metadata.Region()
cfg, err := awsconfig.LoadDefaultConfig(context.Background())
if err != nil {
logrus.WithError(err).Errorln("Could not load config")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think session.Must would panic if error, this would just log the error and continue right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes ty - will fix so it also panics.

logrus.WithError(err).Errorln("Could not load config")
}
imdsClient := imds.NewFromConfig(cfg)
instanceRegionOutput, err := imdsClient.GetRegion(context.TODO(), &imds.GetRegionInput{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we pass context to this function since we are using it multiple places?

@@ -203,23 +202,23 @@ func NewFileCacheProvider(clusterID, profile, roleARN string, provider aws.Crede
// Retrieve() implements the Provider interface, returning the cached credential if is not expired,
// otherwise fetching the credential from the underlying Provider and caching the results on disk
// with an expiration time.
func (f *FileCacheProvider) Retrieve() (credentials.Value, error) {
func (f *FileCacheProvider) Retrieve(ctx context.Context) (aws.Credentials, error) {
return f.RetrieveWithContext(context.Background())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this function if we are adding the ctx parameter as there is a function RetrieveWithContext?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you - that ctx was supposed to be passed into RetrieveWithContext().

FileCacheProvider implements aws.CredentialsProvider, so it needs to implement Retrieve(ctx context.Context) (Credentials, error).


var _ aws.CredentialsProvider = &v2{}

func (p *v2) Retrieve(ctx context.Context) (aws.Credentials, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this logic moved to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was used to convert between V1 and V2 credentials, and now that we've fully switched to V2, we can get rid of it.

Initial commit that added this file: 9cdf38d

@gargipanatula
Copy link
Contributor Author

/retest

config.WithRegion("us-west-2"),
)
if err != nil {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we fail the test in this case ?

@@ -204,23 +203,23 @@ func NewFileCacheProvider(clusterID, profile, roleARN string, provider aws.Crede
// Retrieve() implements the Provider interface, returning the cached credential if is not expired,
// otherwise fetching the credential from the underlying Provider and caching the results on disk
// with an expiration time.
func (f *FileCacheProvider) Retrieve() (credentials.Value, error) {
return f.RetrieveWithContext(context.Background())
func (f *FileCacheProvider) Retrieve(ctx context.Context) (aws.Credentials, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the difference between this and RetrieveWithContext function, can we remove this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not totally sure why both exist. Looks like it just used to be Retrieve, but the RetrieveWithContext was introduced when some packages were upgraded a few months ago (9cdf38d)

Retrieve() is necessary for the aws.CredentialsProvider interface, maybe we can just merge the functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you get a chance to test the cache file provider feature?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just relying on the existing tests since there weren't major changes to the functionality. I might be missing something though, so if there's something we should test then I can take a look and add that in.

sess = sess.Copy(aws.NewConfig().WithRegion(options.Region).WithSTSRegionalEndpoint(endpoints.RegionalSTSEndpoint))
cfg.Region = options.Region
}
if cfg.Region == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this additional logic?

Copy link
Contributor Author

@gargipanatula gargipanatula Jun 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If AWS_REGION is not set, then we have to get the region from the instance metadata. Without this, calls will fail because the client won't be configured with a region (sample error from previous e2e test runs: could not get token: operation error STS: GetCallerIdentity, get identity: get credentials: operation error STS: AssumeRole, failed to resolve service endpoint, endpoint rule error, Invalid Configuration: Missing Region)

V2 documentation link: https://docs.aws.amazon.com/sdk-for-go/v2/developer-guide/configure-gosdk.html#specifying-the-aws-region

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't the behavior same before in v1 that we set it as empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After digging a bit, it looks like this actually stems from STS. In AWS SDK Go V1, if a region was not configured, it would fallback to the global endpoint. In V2, it causes the request to fail (documentation here)

Given this, I'm thinking we should panic if we can't find a region through imds, since it means that GetCallerIdentity will fail.

@@ -280,65 +289,85 @@ func (g generator) GetWithOptions(options *GetTokenOptions) (Token, error) {
// If the current session is already a federated identity, carry through
// this session name onto the new session to provide better debugging
// capabilities
resp, err := stsAPI.GetCallerIdentity(&sts.GetCallerIdentityInput{})
resp, err := stsClient.GetCallerIdentity(context.Background(), &sts.GetCallerIdentityInput{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the function has ctx param, can we use it?

// Add our custom cluster ID header
smithyhttp.SetHeaderValue(clusterIDHeader, clusterID),
// For backwards compatability, add the unused X-Amz-Expires header
smithyhttp.SetHeaderValue("X-Amz-Expires", "60"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use the constant requestPresignParam instead of 60?

@@ -204,23 +203,23 @@ func NewFileCacheProvider(clusterID, profile, roleARN string, provider aws.Crede
// Retrieve() implements the Provider interface, returning the cached credential if is not expired,
// otherwise fetching the credential from the underlying Provider and caching the results on disk
// with an expiration time.
func (f *FileCacheProvider) Retrieve() (credentials.Value, error) {
return f.RetrieveWithContext(context.Background())
func (f *FileCacheProvider) Retrieve(ctx context.Context) (aws.Credentials, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this func signature is same as RetrieveWithContext right?

if aws.StringValue(sess.Config.Region) == "" {
sess.Config.Region = aws.String(region)
func newEC2Client(roleARN, sourceARN, region string, qps int, burst int) EC2API {
ctx := context.Background()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you get this as input param as the caller stack has it

RoleARN: roleARN,
Duration: time.Duration(60) * time.Minute,
stsCfg, err := config.LoadDefaultConfig(ctx,
config.WithRegion(region),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be cfg.region right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also can we copy/duplicate the cfg and override the http client instead of LoadDefaultConfig again as if something is changed in the original cfg, it has to be done here also?

@@ -197,17 +206,17 @@ func (p *ec2ProviderImpl) GetPrivateDNSName(id string) (string, error) {
logrus.Infof("Calling ec2:DescribeInstances for the InstanceId = %s ", id)
metrics.Get().EC2DescribeInstanceCallCount.Inc()
// Look up instance from EC2 API
output, err := p.ec2.DescribeInstances(&ec2.DescribeInstancesInput{
InstanceIds: aws.StringSlice([]string{id}),
output, err := p.ec2.DescribeInstances(context.Background(), &ec2.DescribeInstancesInput{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we get ctx as param?

{
name: "header with source arn",
args: map[string]string{
headerSourceAccount: "123456789012",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't see headerSourceAccount is used, can we remove it?

@@ -482,7 +489,7 @@ func (h *handler) renderTemplate(template string, identity *token.Identity) (str
if !instanceIDPattern.MatchString(identity.SessionName) {
return "", fmt.Errorf("SessionName did not contain an instance id")
}
privateDNSName, err := h.ec2Provider.GetPrivateDNSName(identity.SessionName)
privateDNSName, err := h.ec2Provider.GetPrivateDNSName(context.Background(), identity.SessionName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we get the ctx passed as a param since it called from the handler which already has it?

sess = sess.Copy(aws.NewConfig().WithRegion(options.Region).WithSTSRegionalEndpoint(endpoints.RegionalSTSEndpoint))
cfg.Region = options.Region
}
if cfg.Region == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't the behavior same before in v1 that we set it as empty?

@@ -204,23 +203,23 @@ func NewFileCacheProvider(clusterID, profile, roleARN string, provider aws.Crede
// Retrieve() implements the Provider interface, returning the cached credential if is not expired,
// otherwise fetching the credential from the underlying Provider and caching the results on disk
// with an expiration time.
func (f *FileCacheProvider) Retrieve() (credentials.Value, error) {
return f.RetrieveWithContext(context.Background())
func (f *FileCacheProvider) Retrieve(ctx context.Context) (aws.Credentials, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you get a chance to test the cache file provider feature?

headerSourceArn: "arn:aws:eks:us-east-1:123456789012:MyCluster/res1",
},
want: map[string]string{
headerSourceArn: "arn:aws:eks:us-east-1:123456789012:MyCluster/res1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should have the headerSourceAccount also right?

}{
{
name: "header with source arn",
args: map[string]string{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can just be a string right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're working with multiple values, I personally find it easier to parse/use them if they're in a data structure. However, if it's more status quo to use a string for the args then I can take a look into that.

@@ -258,8 +266,8 @@ func (p *ec2ProviderImpl) getPrivateDnsAndPublishToCache(instanceIdList []string
// Look up instance from EC2 API
logrus.Infof("Making Batch Query to DescribeInstances for %v instances ", len(instanceIdList))
metrics.Get().EC2DescribeInstanceCallCount.Inc()
output, err := p.ec2.DescribeInstances(&ec2.DescribeInstancesInput{
InstanceIds: aws.StringSlice(instanceIdList),
output, err := p.ec2.DescribeInstances(context.Background(), &ec2.DescribeInstancesInput{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we pass the context as the function param

stsCfg := cfg
stsCfg.HTTPClient = rateLimitedClient

if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this err check is not needed right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants