Migrate from go-yara to yara-x; improve performance and readability #734
Open: egibs wants to merge 19 commits into chainguard-dev:main from egibs:use-yara-x-take-2
egibs force-pushed the use-yara-x-take-2 branch 29 times, most recently from d4bf8dc to c7aaf0d (December 23, 2024 20:20)
egibs force-pushed the use-yara-x-take-2 branch 3 times, most recently from e8041d8 to 659cd56 (December 23, 2024 20:31)
Signed-off-by: egibs <[email protected]>
egibs force-pushed the use-yara-x-take-2 branch from 659cd56 to dcac602 (December 23, 2024 20:32)
Signed-off-by: Evan Gibler <[email protected]>
egibs changed the title from "Swap over to yara-x; improve performance and readability" to "Migrate from go-yara to yara-x; improve performance and readability" (Dec 24, 2024)
Closes: #227
Closes: #497
After hacking around with `yara-x` locally last night, the performance gains over Yara are definitely noticeable (usually at least 2x faster). Previous versions seemed to be slower (at least `0.10.0`), but my earlier testing may have been bugged/flawed. That said, I wanted to get this rather large refactor up to test the CI experience again since we have to build the API from scratch, but when refreshing the sample data locally I was down to about 24-25 seconds, with the integration tests taking about 27 seconds (on my M1 Pro MBP).

The main limitation with `yara-x` is that it does not expose any functionality to turn off problematic rules. To handle this, I added functionality to remove rules prior to compilation so that they are not evaluated by the scanner.

To help with CPU overhead, the PR also adds a pool of scanners that can be re-used across files. Previously, we always GC'd the active scanner after each scan, which had a sizable impact, especially for scan paths containing many files.
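For readers unfamiliar with the scanner-pool idea, here is a minimal sketch of a fixed-size pool built on a buffered channel. It is not the code from this PR: `Scanner`, `ScannerPool`, `NewScannerPool`, `Get`, and `Put` are hypothetical names, and `Scanner` is a stand-in type rather than the yara-x Go API.

```go
package main

import (
	"fmt"
	"sync"
)

// Scanner is a hypothetical stand-in for a scanner that is expensive to
// create. It is NOT the yara-x Go API.
type Scanner struct {
	id int
}

// Scan pretends to scan data and reports which scanner handled it.
func (s *Scanner) Scan(data []byte) string {
	return fmt.Sprintf("scanner %d scanned %d bytes", s.id, len(data))
}

// ScannerPool holds a fixed number of scanners that are created once and
// reused, instead of being created and discarded for every file.
type ScannerPool struct {
	scanners chan *Scanner
}

// NewScannerPool creates size scanners up front (at least one).
func NewScannerPool(size int) *ScannerPool {
	if size < 1 {
		size = 1
	}
	p := &ScannerPool{scanners: make(chan *Scanner, size)}
	for i := 0; i < size; i++ {
		p.scanners <- &Scanner{id: i}
	}
	return p
}

// Get blocks until a scanner is available.
func (p *ScannerPool) Get() *Scanner { return <-p.scanners }

// Put returns a scanner to the pool so another caller can reuse it.
func (p *ScannerPool) Put(s *Scanner) { p.scanners <- s }

func main() {
	pool := NewScannerPool(2)
	files := [][]byte{[]byte("a"), []byte("bb"), []byte("ccc"), []byte("dddd")}

	var wg sync.WaitGroup
	for _, f := range files {
		wg.Add(1)
		go func(data []byte) {
			defer wg.Done()
			s := pool.Get()
			defer pool.Put(s)
			fmt.Println(s.Scan(data))
		}(f)
	}
	wg.Wait()
}
```

Because the pool size is fixed, it also caps how many scans run at once; a real implementation would additionally need to release each scanner's native resources on shutdown.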
This PR also cleans up `recursiveScan`, fixes the behavior of `longestUnique`, and splits out path-related functions into a new `path.go` file. Additionally, `processArchive` will now concurrently scan extracted files, which was previously serial. With this change, I can scan the OpenJDK package, which extracts roughly 136,000 files, in ~70-90 seconds.

Outside of the `longestUnique` changes, the final, rendered output is essentially 1:1 with the current implementation.

Edit: this PR is about 3-4x faster in GHA with 8-core runners (even when running in a container).

The tests and `golangci-lint` jobs now run in a Wolfi container to avoid compiling the yara-x C API each time the Workflows run; this more than halves the runtime of each job (5+ minutes down to ~2 minutes).
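The move from serial to concurrent scanning of extracted files can be illustrated with a small bounded-concurrency sketch. This is an assumption about the general shape, not the PR's `processArchive` code: `scanFile` and `scanAll` are hypothetical names, and the "scan" is a placeholder.

```go
package main

import (
	"fmt"
	"sync"
)

// scanFile is a hypothetical placeholder for scanning one extracted file.
func scanFile(path string) string {
	return "scanned " + path
}

// scanAll scans paths concurrently with at most limit scans in flight and
// returns the results in the same order as the input paths.
func scanAll(paths []string, limit int) []string {
	if limit < 1 {
		limit = 1
	}
	results := make([]string, len(paths))
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup

	for i, p := range paths {
		wg.Add(1)
		sem <- struct{}{} // wait for a free slot before starting a scan
		go func(i int, p string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when done
			results[i] = scanFile(p) // each goroutine writes its own index
		}(i, p)
	}
	wg.Wait()
	return results
}

func main() {
	for _, r := range scanAll([]string{"a.class", "b.class", "c.class"}, 2) {
		fmt.Println(r)
	}
}
```

Bounding the number of in-flight scans matters when an archive extracts to many thousands of files, since one goroutine per file with no limit could exhaust file descriptors or memory.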