
awsx.ecr.RegistryImage deletion fails after credential refresh #1537

@rjhuijsman

Describe what happened

I'm using https://www.pulumi.com/registry/packages/awsx/api-docs/ecr/registryimage/ to manage Docker images stored in ECR. Normally, I can create and delete these resources correctly by running pulumi up. However, if I've been away long enough to need to refresh my AWS credentials, then awsx.ecr.RegistryImage deletions are (and remain) broken.

Here's an example timeline:

  1. On Monday:
    • I run aws sso login.
    • I run pulumi up. A pulumi_awsx.ecr.RegistryImage is created.
  2. Later on Monday:
    • I edit my code to change the RegistryImage out for a different one (different Pulumi resource name, different image to push).
    • I run pulumi up again. This works: it creates a new pulumi_awsx.ecr.RegistryImage, and at the end of the run deletes the old RegistryImage.
  3. On Tuesday:
    • I run aws sso login again, because my credentials have expired.
    • I edit my code to change the RegistryImage out again, just like on Monday, again creating a whole new Pulumi resource and deleting the old one.
    • I run pulumi up again, just like on Monday. All AWS resources are updated correctly, except deleting the old pulumi_awsx.ecr.RegistryImage fails with a 403.

So to be explicit:

  • I am correctly authenticated on both days: I run aws sso login on both days, and all other AWS resources work.
  • If I run pulumi up twice on the same day, then there's no problem with pulumi_awsx.ecr.RegistryImage: creation and deletion both work. So my Pulumi code seems correct.
  • But on day 2, despite being logged in correctly and having correct Pulumi code, (only) deletion fails for (only) pulumi_awsx.ecr.RegistryImage.

My requirements.txt has up-to-date versions of the Pulumi SDKs:

pulumi>=3.149.0,<4.0.0
pulumi-aws>=6.23.0,<7.0.0
pulumi-awsx>=2.21.0,<3.0.0
pulumi-command>=1.0.1,<2.0.0

This issue is a blocker for our ability to use awsx.ecr.RegistryImage, and thereby makes it really hard for us to use AWS ECR via Pulumi.

As a workaround, we set keep_remotely=True on awsx.ecr.RegistryImage. That avoids the 403, but it leaves the image in the registry, which is not feasible for us in the long term.

Sample program

Here is an example of Python code that exhibits this issue:

       repository = pulumi_aws.ecr.Repository(
           f"ecr-repository-{name}",
           name=repository_name,
           image_tag_mutability="IMMUTABLE",
           # Don't block deletion of the repository when the stack is deleted
           # just because we've pushed images to it.
           force_delete=True,
           opts=pulumi.ResourceOptions(
               parent=parent,
               provider=self._account_structure.provider,
           ),
       )

       # Load the image from its tar file into our local Docker daemon; this is
       # where Pulumi expects to find the image when it pushes it to the
       # registry.
       tar_content_hash = content_hash(source_tar_path)
       load_command = pulumi_command.local.Command(
           # Using the digest in the name ensures that if the image changes, a
           # new version of it will be loaded.
           f"{name}-{tar_content_hash}-load",
           create=pulumi.Output.concat(
               "docker load --input ", source_tar_path
           ),
           opts=pulumi.ResourceOptions(parent=parent),
       )
       # The `loaded_image_name` is the same every time, e.g. `mycoolimage:latest`.
       loaded_image_name = load_command.stdout.apply(
           lambda stdout:
           # Remove the last newline, if any.
           stdout.strip()
           # The last line of the output contains the image name.
           .split("\n")[-1]
           # It is the last word on that line.
           .split(" ")[-1]
       )
       registry_image = pulumi_awsx.ecr.RegistryImage(
           # Using a unique name for this resource means that every version of
           # the image will be a new `RegistryImage` resource, which we prefer
           # over replacing an existing `RegistryImage`: replacing first
           # deletes the old image, then creates the new one, which means that
           # there is a time period where the image is not available in ECR. If
           # the deployment were to fail during that time, the image would be
           # permanently unavailable. By creating a new resource for each
           # image, each image gets pushed, and only at the end of the Pulumi
           # run are the old images deleted (when Pulumi knows that their
           # resources are no longer being created).
           f"ecr-image-{name}-{tar_content_hash}",
           repository_url=repository.repository_url,
           source_image=loaded_image_name,
           # A unique tag that ensures we don't try to push the same tag twice
           # with different content.
           tag=tar_content_hash,
           opts=pulumi.ResourceOptions(
               parent=parent,
               provider=self._account_structure.provider,
           ),
       )
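
For anyone trying to reproduce this, the two pieces of the sample that are not Pulumi resources can be exercised on their own. This is only a sketch: `content_hash` is not shown in the excerpt above, so the version here assumes it is a SHA-256 hex digest of the tar file (a guess based on the 64-character hash in the resource name in the log below), and `parse_loaded_image_name` is a hypothetical name for the logic inside the `loaded_image_name` lambda.

```python
import hashlib


def content_hash(path: str) -> str:
    """Hypothetical stand-in for the helper used above.

    Assumption: the real helper returns a SHA-256 hex digest of the file.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large image tarballs are not held in memory at once.
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_loaded_image_name(stdout: str) -> str:
    """Same steps as the lambda above: last word of the last line of the output."""
    return stdout.strip().split("\n")[-1].split(" ")[-1]


if __name__ == "__main__":
    # `docker load` reports the image it loaded as "Loaded image: <name>:<tag>".
    print(parse_loaded_image_name("Loaded image: mycoolimage:latest\n"))
```

Because the hash goes into both the resource name and the tag, any change to the tar file yields a new `RegistryImage` resource and leaves the old one to be deleted at the end of the run, which is the step that fails.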

Log output

Diagnostics:
  docker:index:RegistryImage (ecr-image-myimage-267884495a165f874573dbf10f0a55d1ad2036099a03505cac7c6e4d4cf1aa9f):
    error:   sdk-v2/provider2.go:515: sdk.helper_schema: Got error deleting registry image: Got bad response from registry: 403 Forbidden: [email protected]
    error: deleting urn:pulumi:aws-test1::reboot-cloud-awsx:ecr:RegistryImage$docker:index/registryImage:RegistryImage::ecr-image-myimage-267884495a165f874573dbf10f0a55d1ad2036099a03505cac7c6e4d4cf1aa9f: 1 error occurred:
    	* Got error deleting registry image: Got bad response from registry: 403 Forbidden

Affected Resource(s)

awsx.ecr.RegistryImage, deletion only

Output of pulumi about

$ pulumi about
CLI
Version 3.100.0
Go Version go1.21.5
Go Compiler gc

Plugins
NAME VERSION
python unknown

Host
OS ubuntu
Version 20.04
Arch x86_64

This project is written in python: executable='/home/vscode/.rye/shims/python3' version='3.10.16'

Current Stack: [REDACTED]

TYPE URN
[REDACTED]

Found no pending operations associated with reboot-dev/aws-test1

Backend
Name pulumi.com
URL https://app.pulumi.com/[REDACTED]
User [REDACTED]
Organizations [REDACTED]
Token type personal

Dependencies:
NAME VERSION
build 1.0.3
certifi 2019.11.28
chardet 3.0.4
dbus-python 1.2.16
idna 2.8.0
isort 5.12.0
mypy 1.2.0
pip 23.0.1
Pygments 2.3.1
PyGObject 3.36.0
python-apt 2.0.1+ubuntu0.20.4.1
PyYAML 5.3.1
requests 2.22.0
requests-unixsocket 0.2.0
ruff 0.1.14
setuptools 65.5.0
six 1.14.0
urllib3 1.25.8
yapf 0.40.2

Pulumi locates its logs in /tmp by default

Additional context

From my naive external view, it looks like awsx.ecr.RegistryImage deletion is still using the old credentials, while e.g. creation uses new credentials. Could that be?
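
One way to probe that hypothesis, offered as a sketch rather than a diagnosis: ECR registry logins use an authorization token that is valid for 12 hours, independently of the SSO session. If the delete step reuses a token captured when the resource was created (an assumption; nothing in this report confirms it), a 403 a day later would fit. The helper names below are hypothetical. `get_authorization_token` is the boto3 ECR client call, and `fetch_token_expiry` needs boto3 and valid AWS credentials to run.

```python
import base64
from datetime import datetime, timezone
from typing import Optional, Tuple


def decode_ecr_token(authorization_token: str) -> Tuple[str, str]:
    """Split a base64-encoded ECR authorization token into (username, password)."""
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


def is_expired(expires_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if a token with the given expiry is no longer valid at `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires_at


def fetch_token_expiry() -> datetime:
    """Ask ECR for a fresh token and return its expiry time.

    Requires boto3 and working AWS credentials; imported lazily so the
    helpers above can be used without it.
    """
    import boto3

    client = boto3.client("ecr")
    data = client.get_authorization_token()["authorizationData"][0]
    return data["expiresAt"]
```

Comparing the expiry of a fresh token with the time the old image was pushed would show whether a token from Monday could still be valid on Tuesday.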

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).


Labels

awaiting-upstream: The issue cannot be resolved without action in another repository (may be owned by Pulumi).
kind/bug: Some behavior is incorrect or out of spec.
