Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dockerhub analyzer #3861

Merged

Conversation

kashifkhan0771
Copy link
Contributor

@kashifkhan0771 kashifkhan0771 commented Jan 31, 2025

Description:

This PR adds a new analyzer for dockerhub.
Note: We only detect dockerhub PAT(personal access token), and the scope for dockrehub PAT is only for dockerhub repositories.
Screenshot from 2025-01-31 16-25-28

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

@kashifkhan0771 kashifkhan0771 changed the title dockerhub analyzer dockerhub analyzer [WIP] Jan 31, 2025
@kashifkhan0771 kashifkhan0771 marked this pull request as ready for review January 31, 2025 11:27
@kashifkhan0771 kashifkhan0771 requested review from a team as code owners January 31, 2025 11:27
@kashifkhan0771 kashifkhan0771 changed the title dockerhub analyzer [WIP] dockerhub analyzer Jan 31, 2025
Copy link
Collaborator

@ahrav ahrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, nice job!

@@ -116,7 +116,7 @@ func TestDockerhub_FromChunk(t *testing.T) {
got[i].Raw = nil
got[i].RawV2 = nil
got[i].ExtraData = nil
}
},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this slipped in.

}

// assignHighestPermission selects the highest available permission
func assignHighestPermission(permissions []string) string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (optional): Depending on the size of the permissions list, we can improve performance by avoiding map creation within this function for each permission. Instead, we can leverage the predefined ranking by creating a package-level ranking map computed once. This way, the function only iterates over the input list once. This approach adds some complexity, so I’ve left it as optional. Use your judgment, as you're likely more familiar with this code.

Ex:

// Precompute a ranking map for the ranking approach.
// Lower index means higher permission.
var permissionRank = func() map[string]int {
	rank := make(map[string]int, len(permissionHierarchy))
	for i, perm := range permissionHierarchy {
		rank[perm] = i
	}
	return rank
}()


func assignHighestPermissionRank(permissions []string) string {
	bestRank := len(permissionHierarchy)
	bestPerm := ""
	for _, perm := range permissions {
		if rank, ok := permissionRank[perm]; ok {
			// Early exit if highest permission is found.
			if rank == 0 {
				return perm
			}
			if rank < bestRank {
				bestRank = rank
				bestPerm = perm
			}
		}
	}
	return bestPerm
}

Benchmark Results for assignHighestPermission

These benchmarks compare the old (set-based) and new (ranking with early exit) implementations under two scenarios:

  • Best-case: The highest permission (repo:admin) appears at the beginning.
  • Worst-case: The highest permission is absent until the very end (or missing altogether).

The tests were run on:

  • OS: Darwin
  • Arch: arm64
  • CPU: Apple M1 Pro
  • Package: github.com/trufflesecurity/trufflehog/v3/pkg/analyzer/analyzers/dockerhub

Runtime (sec/op)

Test Case Old (sec/op) New (sec/op) Change
size=10, best-case 197.550n ± 1% 3.371n ± 0% -98.29% (p=0.000 n=10)
size=10, worst-case 223.40n ± 0% 66.80n ± 1% -70.10% (p=0.000 n=10)
size=100, best-case 1495.000n ± 1% 3.358n ± 0% -99.78% (p=0.000 n=10)
size=100, worst-case 1558.5n ± 7% 548.9n ± 0% -64.78% (p=0.000 n=10)
size=1000, best-case 16555.500n ± 6% 3.371n ± 0% -99.98% (p=0.000 n=10)
size=1000, worst-case 16.284µ ± 1% 5.386µ ± 6% -66.92% (p=0.000 n=10)
size=10000, best-case 120032.500n ± 1% 3.402n ± 3% -100.00% (p=0.000 n=10)
size=10000, worst-case 120.23µ ± 3% 55.33µ ± 3% -53.98% (p=0.000 n=10)
size=100000, best-case 949148.500n ± 5% 3.372n ± 3% -100.00% (p=0.000 n=10)
size=100000, worst-case 943.6µ ± 0% 546.3µ ± 2% -42.10% (p=0.000 n=10)
Geomean 14.30µ 138.6n -99.03%

Memory Usage (B/op)

Test Case Old (B/op) New (B/op) Change
size=10, best-case 320.0 ± 0% 0.0 ± 0% -100.00% (p=0.000 n=10)
size=10, worst-case 320.0 ± 0% 0.0 ± 0% -100.00% (p=0.000 n=10)
size=100, best-case 2.648Ki ± 0% 0.000Ki ± 0% -100.00% (p=0.000 n=10)
size=100, worst-case 2.648Ki ± 0% 0.000Ki ± 0% -100.00% (p=0.000 n=10)
size=1000, best-case 48.02Ki ± 0% 0.00Ki ± 0% -100.00% (p=0.000 n=10)
size=1000, worst-case 48.02Ki ± 0% 0.00Ki ± 0% -100.00% (p=0.000 n=10)
size=10000, best-case 328.0Ki ± 0% 0.0Ki ± 0% -100.00% (p=0.000 n=10)
size=10000, worst-case 328.0Ki ± 0% 0.0Ki ± 0% -100.00% (p=0.000 n=10)
size=100000, best-case 2.523Mi ± 0% 0.000Mi ± 0% -100.00% (p=0.000 n=10)
size=100000, worst-case 2.523Mi ± 0% 0.000Mi ± 0% -100.00% (p=0.000 n=10)
Geomean 32.03Ki ? (Not computed)

Allocations (allocs/op)

Test Case Old (allocs/op) New (allocs/op) Change
size=10, best-case 1.000 ± 0% 0.000 ± 0% -100.00% (p=0.000 n=10)
size=10, worst-case 1.000 ± 0% 0.000 ± 0% -100.00% (p=0.000 n=10)
size=100, best-case 2.000 ± 0% 0.000 ± 0% -100.00% (p=0.000 n=10)
size=100, worst-case 2.000 ± 0% 0.000 ± 0% -100.00% (p=0.000 n=10)
size=1000, best-case 2.000 ± 0% 0.000 ± 0% -100.00% (p=0.000 n=10)
size=1000, worst-case 2.000 ± 0% 0.000 ± 0% -100.00% (p=0.000 n=10)
size=10000, best-case 2.000 ± 0% 0.000 ± 0% -100.00% (p=0.000 n=10)
size=10000, worst-case 2.000 ± 0% 0.000 ± 0% -100.00% (p=0.000 n=10)
size=100000, best-case 2.000 ± 0% 0.000 ± 0% -100.00% (p=0.000 n=10)
size=100000, worst-case 2.000 ± 0% 0.000 ± 0% -100.00% (p=0.000 n=10)
Geomean 1.741 ? * (Not computed)*

Here are the benchmarks if you were curious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your approach looks Solid! 🚀
I added more comments for explanation and applied it.

}

if errorLogin.Login2FAToken != "" {
// TODO: handle it more appropriately
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Just confirming this TODO is valid, and not necessary for this initial implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, it is working. If 2FA is enabled for account we return error. Added TODO just in case we figure out a more better approach in future to handle these scenarios.

// create the http client
client := analyzers.NewAnalyzeClientUnrestricted(cfg) // /user/login is a non-safe request

var secretInfo = &SecretInfo{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional: This is purely a personal preference, so feel free to disregard. When passing a pointer to functions responsible for populating a struct, I find it clearer to declare the variable without &. This forces the reviewer to recognize that the pointer is being passed explicitly, indicating that we're modifying a pre-existing struct. Again, this is entirely optional.

Ex:

var secretInfo SecretInfo
...
if err := decodeTokenToSecretInfo(token, &secretInfo); err != nil {
  ...
}

if err := fetchRepositories(client, username, token, &secretInfo); err != nil {
  ...
}


result := analyzers.AnalyzerResult{
AnalyzerType: analyzers.AnalyzerTypeDockerHub,
Metadata: map[string]any{},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We can probably just set the Metadata field directly in here.

Ex:

Metadata: map[string]any{"Valid_Key": info.Valid}

result := analyzers.AnalyzerResult{
AnalyzerType: analyzers.AnalyzerTypeDockerHub,
Metadata: map[string]any{},
Bindings: make([]analyzers.Binding, 0),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We can pre-allocate here.

Ex:

Bindings: make([]analyzers.Binding, 0, len(info.Repositories))

@kashifkhan0771 kashifkhan0771 force-pushed the feat/oss-96-dockerhub-analyzer branch from 5e72fa5 to d711389 Compare February 10, 2025 06:45
@ahrav ahrav merged commit b10342e into trufflesecurity:main Feb 11, 2025
13 checks passed
@kashifkhan0771 kashifkhan0771 deleted the feat/oss-96-dockerhub-analyzer branch February 12, 2025 04:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants