Community Note
Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment
Description
It would be nice if trufflehog could smartly scan nested .tar files, as seen in e.g. docker containers.
Problem to be Addressed
When scanning a docker image tarball (such as one saved with docker save ...), trufflehog currently just prints the top-level .tar filename for every hit. This gives little insight into which component inside the image, or which file path inside a container launched from the image, contains the hit.
Description of the Preferred Solution
Best-case, trufflehog would keep track of where it is while looking inside tar archives, and support doing so in a nested fashion, because docker images are typically nested .tar files made up of multiple layers. It would then print out that context on a hit, maybe something like:
Maybe this would be something generalized, that makes trufflehog filesystem smarter. Or, it might have to be a dedicated mode, trufflehog archive or something. Uncompressed .tar is one thing; I expect compressed archives would be more painful.
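To make the idea concrete, here is a minimal Go sketch of a recursive walk over uncompressed, nested tar archives that keeps track of the chain of enclosing archives. It is not trufflehog code: the names walkTar, listPaths, entry and makeTar are made up for illustration, nested archives are detected purely by a ".tar" suffix (an assumption), and each file is read fully into memory, which a real scanner would probably avoid.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// walkTar reads an uncompressed tar stream and calls visit for every regular
// file. Entries whose names end in ".tar" are treated as nested archives and
// walked recursively, so the reported path includes every enclosing archive.
func walkTar(r io.Reader, prefix string, visit func(path string, data []byte)) error {
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return nil // end of this archive
		}
		if err != nil {
			return err
		}
		if hdr.Typeflag != tar.TypeReg {
			continue // skip directories, symlinks, and so on
		}
		path := prefix + "/" + hdr.Name
		if strings.HasSuffix(hdr.Name, ".tar") {
			// tr reads the current entry's bytes, so it can feed a nested reader.
			if err := walkTar(tr, path, visit); err != nil {
				return err
			}
			continue
		}
		data, err := io.ReadAll(tr)
		if err != nil {
			return err
		}
		visit(path, data)
	}
}

// listPaths collects the nested path of every regular file in an archive.
func listPaths(archive []byte, name string) ([]string, error) {
	var paths []string
	err := walkTar(bytes.NewReader(archive), name, func(path string, _ []byte) {
		paths = append(paths, path)
	})
	return paths, err
}

// entry and makeTar exist only to build small in-memory archives for the demo.
type entry struct {
	name string
	data []byte
}

func makeTar(entries ...entry) []byte {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, e := range entries {
		hdr := &tar.Header{Name: e.name, Mode: 0o644, Size: int64(len(e.data)), Typeflag: tar.TypeReg}
		if err := tw.WriteHeader(hdr); err != nil {
			panic(err)
		}
		if _, err := tw.Write(e.data); err != nil {
			panic(err)
		}
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	// A fake "image" holding one layer tarball plus a manifest.
	layer := makeTar(entry{"etc/app.conf", []byte("token=example")})
	image := makeTar(entry{"layer1/layer.tar", layer}, entry{"manifest.json", []byte("[]")})

	paths, err := listPaths(image, "image.tar")
	if err != nil {
		panic(err)
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

Run against the in-memory demo archive, this prints image.tar/layer1/layer.tar/etc/app.conf followed by image.tar/manifest.json, which is the kind of context the request asks for on each hit. Compressed archives would need a decompression step before tar.NewReader and are deliberately left out.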
Additional Context
There is a FUSE filesystem for mounting archives, https://github.com/mxmlnkn/ratarmount, which also supports recursive/nested archives and transparently turns archive files into subdirectories.
So for example:
mkdir -p some_container
ratarmount -c -r -o ro,allow_other some_container.tar some_container
trufflehog filesystem --directory=some_container 2>&1 | tee "trufflehog_some_container.out"
Found unverified result 🐷🔑❓
Detector Type: URI
Raw result: http://user:host@foo:3128
File: some_container/e60a0dfc08a94dabb221d8a28c6fdbeaa7cab0c146d35e8eff8e50bc2e4c194b/layer.tar/usr/lib/python2.7/site-packages/urlgrabber/grabber.py
Found unverified result 🐷🔑❓
Detector Type: URI
Raw result: http://username:password@host.com:80/path
File: some_container/96e436883f4940841fc9f1f7e935bada3859d2ffb0e5455952438d844f8e9c26/layer.tar/usr/lib/python2.7/site-packages/pip/_vendor/urllib3/util/url.py
Found unverified result 🐷🔑❓
Detector Type: PrivateKey
Raw result: -----BEGIN PRIVATE KEY-----
MIICd[snip]
-----END PRIVATE KEY-----
File: some_container/b0d4d7051229875a2bfd9809c631c9899748f0e1fc6f408a446048dc6b60ca20/layer.tar/usr/share/doc/perl-IO-Socket-SSL/example/simulate_proxy.pl
...
Or for a large collection of them:
# Mount every tarball on a directory named after it (run as root)
for A in *.tar ; do
  D="${A%.tar}"
  mkdir -p "$D"
  ratarmount -r -o ro,allow_other "$A" "$D"
done
# Scan each mounted directory, skipping any that already has non-empty output
for A in *.tar ; do
  D="${A%.tar}"
  test -s "trufflehog_${D}.out" && continue
  echo "$D"
  trufflehog filesystem --directory="$D" 2>"trufflehog_${D}.err" | tee "trufflehog_${D}.out"
done
If adding native nested-archive support does not seem worth it/desirable, then perhaps just polish/improve this example and document it somewhere.
I was going to look into this, but realized I probably don't have enough time to untangle it right now, since it's tied to multiple things. Instead, I'll leave some notes that might be helpful for anyone else looking into it.
Right now the Handler interface has FromFile(context.Context, io.Reader) chan ([]byte). For the archive handler, we might instead want the returned values to be (path string, []byte). Then we could update some field on the chunk.SourceMetadata to represent any sub-archive paths.
The problems that I see with it:
Right now, the path can be used to link to the file (e.g., provide a direct link to the file in GitHub), but the sub-archive (archive in archive) can't be linked to in the sources, so this is one factor that would indicate a new field is needed.
Different Source types have unique fields for setting paths. For example, in filesystem, it would be .Filesystem.File, S3 would be .S3.File, GitHub's is .Github.File. Even File itself is not guaranteed, as in the case of Circleci, which might be .Circleci.Link (not sure).
One suggestion would be to add something like ArchivePath to SourceMetadata directly, where you can set full paths, like some_container/b0d4d7051229875a2bfd9809c631c9899748f0e1fc6f408a446048dc6b60ca20/layer.tar:/usr/share/doc/perl-IO-Socket-SSL/example/simulate_proxy.pl. More generally it could look like PATH_TO_FILE_IN_ARCHIVE[:PATH_TO_FILE_IN_SUB_ARCHIVE]...
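As a rough illustration of these notes, the sketch below pairs each chunk of data with a path and formats that path using the colon-separated layout suggested above. Everything here is hypothetical: archiveChunk, joinArchivePath and splitArchivePath are invented names, not existing trufflehog types or functions, and the layout assumes that no path segment itself contains a colon.

```go
package main

import (
	"fmt"
	"strings"
)

// archiveChunk is a hypothetical stand-in for what an archive handler could
// send on its channel instead of a bare []byte: the data plus where it was found.
type archiveChunk struct {
	Path string // e.g. "layer.tar:/usr/share/doc/example.pl"
	Data []byte
}

// joinArchivePath builds PATH_TO_FILE_IN_ARCHIVE[:PATH_TO_FILE_IN_SUB_ARCHIVE]...
// from one segment per archive level.
func joinArchivePath(segments ...string) string {
	return strings.Join(segments, ":")
}

// splitArchivePath reverses joinArchivePath. It assumes no segment contains ":".
func splitArchivePath(p string) []string {
	return strings.Split(p, ":")
}

func main() {
	// A buffered channel standing in for the handler's output.
	chunks := make(chan archiveChunk, 1)
	chunks <- archiveChunk{
		Path: joinArchivePath("some_container/layer1/layer.tar", "/usr/share/doc/example.pl"),
		Data: []byte("-----BEGIN PRIVATE KEY-----"),
	}
	close(chunks)

	for c := range chunks {
		fmt.Printf("ArchivePath: %s (%d bytes)\n", c.Path, len(c.Data))
		fmt.Printf("Levels: %d\n", len(splitArchivePath(c.Path)))
	}
}
```

Keeping the archive path in a separate field like this would leave the existing per-source File fields untouched, so links built from them (for example, to a file on GitHub) would keep pointing at the outer archive.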