Skip to content

Conversation

@cchristous
Copy link

@cchristous cchristous commented Nov 14, 2025

This attempts to implement #18882.

The automated tests are all passing, but I don't have confidence this actually works because I don't know how to test the actual integration. I have access to and familiarity with Semaphore, and I could test using the 2 SaaS services, if the warehouse were deployed. Though, I am hoping there is an easier way to test this, so I could use some guidance on how to proceed.

@cchristous cchristous marked this pull request as ready for review November 14, 2025 05:15
@cchristous cchristous requested a review from a team as a code owner November 14, 2025 05:15
Comment on lines 44 to 59
# Extract the repo portion from the sub claim
if ":repo:" not in signed_claim:
return False

parts = signed_claim.split(":repo:")
if len(parts) < 2: # pragma: no cover
# This should be unreachable since we already checked ":repo:" is in the string
return False

# The repo value is between :repo: and the next :
repo_and_rest = parts[1]
repo_parts = repo_and_rest.split(":", 1)
repo_in_sub = repo_parts[0]

if not repo_in_sub:
return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be simplified a bit:

Suggested change
# Extract the repo portion from the sub claim
if ":repo:" not in signed_claim:
return False
parts = signed_claim.split(":repo:")
if len(parts) < 2: # pragma: no cover
# This should be unreachable since we already checked ":repo:" is in the string
return False
# The repo value is between :repo: and the next :
repo_and_rest = parts[1]
repo_parts = repo_and_rest.split(":", 1)
repo_in_sub = repo_parts[0]
if not repo_in_sub:
return False
# There should only be one ':repo:' in the sub:
if signed_claim.count(":repo:") != 1:
return False
_, repo_and_rest = signed_claim.split(":repo:", 1)
repo_in_sub, _ = repo_and_rest.split(":", 1)
if not repo_in_sub:
return False

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified

Comment on lines 80 to 83
"org",
"org_id",
"prj",
"prj_id",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with SemaphoreCI but I'm surprised we wouldn't want to restrict the token based on these claims? (Even if they are duplicated in the subject?) Do you have any more details about them?

Specifically, I would assume that the _id claims could help us prevent resurrection attacks, unless this is not possible with SemaphoreCI?

At the very least, it would be good to check that these match what's configured in addition to checking the subject and repo_slug.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added org_id and prj_id to the checked claims to prevent resurrection attacks

Comment on lines 101 to 112
repo_slug = signed_claims.get("repo_slug")
if not repo_slug:
raise InvalidPublisherError("Missing 'repo_slug' claim")

# repo_slug format: owner/repository
if "/" not in repo_slug:
raise InvalidPublisherError(
f"Invalid 'repo_slug' claim format: {repo_slug!r}, "
"expected 'owner/repository'"
)

organization, project = repo_slug.split("/", 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to use repo_slug here rather than the org/prj claims?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't correct/sufficient. I've updated the logic. Previously, it used git organization + repo name. The new logic uses Semaphore organization name, Semaphore project name, and repo slug (which contains Git organization + repo name)

from sqlalchemy.orm import Session


SEMAPHORE_OIDC_ISSUER_URL_SUFFIX = ".semaphoreci.com"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting! Generally our model has been either to support a specific issuer URL or to support custom issuer URLs on a per-organization basis.

Does this mean that there is a different issuer for every SemaphoreCI project? Or generally, what are the expected values here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that there is a different issuer for every SemaphoreCI project? Or generally, what are the expected values here?

Yes, the expected values are of the format https://<org-name>.semaphoreci.com where org-name is the name of the Semaphore organization (not the GitHub organization). Ref https://docs.semaphore.io/reference/openid#reference.

@di
Copy link
Member

di commented Nov 14, 2025

The automated tests are all passing, but I don't have confidence this actually works because I don't know how to test the actual integration. I have access to and familiarity with Semaphore, and I could test using the 2 SaaS services, if the warehouse were deployed. Though, I am hoping there is an easier way to test this, so I could use some guidance on how to proceed.

I think the best way to test this would be to put it behind a feature flag and enable it only for test.pypi.org first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants