fix: fix incorrect skip result evaluation causing false positives in PyPI malware reporting #1031

Open
wants to merge 7 commits into staging from art1f1c3R/malware-bug-1027

art1f1c3R
Member

Addresses the issue identified in #1027, where skipped heuristics were being evaluated as false. This PR adds two wrappers, passed() and failed(), to the ProbLog model, both built on try_call(). Skipped heuristics are no longer defined in the ProbLog model, which is why try_call() is used. As a result, failed(heuristic) is false if the heuristic passed or if it was not defined (i.e. was skipped). Similarly, passed(heuristic) is false if the heuristic failed or if it was not defined. A skipped heuristic should therefore no longer cause a rule it is part of to trigger. This approach was the easiest way to keep as much of the ProbLog model as possible in a static string, without extensive string manipulation.
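
The intended three-valued behaviour can be sketched in plain Python. This is only an illustration of the semantics described above, not the ProbLog code in this PR: the `HeuristicResult` enum here is a stand-in, and `build_facts` and the heuristic names are hypothetical.

```python
from enum import Enum


class HeuristicResult(Enum):
    """Stand-in for the project's result type (values are assumptions)."""

    PASS = "PASS"
    FAIL = "FAIL"
    SKIP = "SKIP"


def build_facts(results):
    """Keep only heuristics that ran; skipped ones are left undefined."""
    return {
        name: result is HeuristicResult.PASS
        for name, result in results.items()
        if result is not HeuristicResult.SKIP
    }


def passed(facts, name):
    # An undefined (skipped) heuristic does not hold, mirroring a
    # try_call() on a predicate that was never defined.
    return facts.get(name, False) is True


def failed(facts, name):
    # True only if the heuristic ran and did not pass.
    return name in facts and facts[name] is False


# Hypothetical heuristic names, for illustration only.
results = {
    "one_release": HeuristicResult.FAIL,
    "wheel_absence": HeuristicResult.PASS,
    "suspicious_setup": HeuristicResult.SKIP,
}
facts = build_facts(results)

print(passed(facts, "suspicious_setup"), failed(facts, "suspicious_setup"))
```

For the skipped heuristic both wrappers are false, so a rule that requires either `passed` or `failed` on it cannot trigger. Writing `not passed(...)` instead of `failed(...)` would be true for a skip, which is the kind of false positive this PR is meant to avoid.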

Rule IDs have also been added for debugging, along with a method to extract them, so that it is evident which rule was triggered.

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified label (All contributors have signed the Oracle Contributor Agreement.) Mar 27, 2025
@art1f1c3R art1f1c3R closed this Mar 27, 2025
@art1f1c3R art1f1c3R reopened this Mar 27, 2025
@art1f1c3R art1f1c3R force-pushed the art1f1c3R/malware-bug-1027 branch from 8af93d3 to 97bb593 Compare March 27, 2025 05:33
@art1f1c3R art1f1c3R marked this pull request as ready for review March 27, 2025 05:41
{Confidence.MEDIUM.value}::result("medium_confidence_2") :-
quickUndetailed,
failed({Heuristics.ONE_RELEASE.value}),
passed({Heuristics.WHEEL_ABSENCE.value}),
Member

Why is Heuristics.WHEEL_ABSENCE.value passed here?

Member Author

That's a result of the translation from these combinations:

    (
        HeuristicResult.FAIL,  # Empty Project
        HeuristicResult.SKIP,  # Source Code Repo
        HeuristicResult.FAIL,  # One Release
        HeuristicResult.SKIP,  # High Release Frequency
        HeuristicResult.SKIP,  # Unchanged Release
        HeuristicResult.FAIL,  # Closer Release Join Date
        HeuristicResult.PASS,  # Suspicious Setup
        HeuristicResult.PASS,  # Wheel Absence
        HeuristicResult.FAIL,  # Anomalous Version
        # No project link, only one release, and the maintainer released it shortly
        # after account registration.
        # The setup.py file has no effect and .whl file is present.
        # The version number is anomalous.
    ): Confidence.MEDIUM,
    (
        HeuristicResult.FAIL,  # Empty Project
        HeuristicResult.SKIP,  # Source Code Repo
        HeuristicResult.FAIL,  # One Release
        HeuristicResult.SKIP,  # High Release Frequency
        HeuristicResult.SKIP,  # Unchanged Release
        HeuristicResult.FAIL,  # Closer Release Join Date
        HeuristicResult.FAIL,  # Suspicious Setup
        HeuristicResult.PASS,  # Wheel Absence
        HeuristicResult.FAIL,  # Anomalous Version
        # No project link, only one release, and the maintainer released it shortly
        # after account registration.
        # The setup.py file has no effect and .whl file is present.
        # The version number is anomalous.
    ): Confidence.MEDIUM,
    (
        HeuristicResult.FAIL,  # Empty Project
        HeuristicResult.SKIP,  # Source Code Repo
        HeuristicResult.FAIL,  # One Release
        HeuristicResult.SKIP,  # High Release Frequency
        HeuristicResult.SKIP,  # Unchanged Release
        HeuristicResult.FAIL,  # Closer Release Join Date
        HeuristicResult.SKIP,  # Suspicious Setup
        HeuristicResult.PASS,  # Wheel Absence
        HeuristicResult.FAIL,  # Anomalous Version
        # No project link, only one release, and the maintainer released it shortly
        # after account registration.
        # The setup.py file has no effect and .whl file is present.
        # The version number is anomalous.
    ): Confidence.MEDIUM,
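
One way to see how these three rows translate into a single rule is to keep only the columns whose value is the same in every row. The sketch below does that; the string constants and snake_case heuristic names are stand-ins for illustration, and treating a uniformly skipped column as "no constraint" is an assumption rather than something stated in this thread.

```python
# Abbreviated stand-ins for the HeuristicResult values in the rows above.
FAIL, PASS, SKIP = "FAIL", "PASS", "SKIP"

# Column order follows the comments in the rows above.
HEURISTICS = [
    "empty_project",
    "source_code_repo",
    "one_release",
    "high_release_frequency",
    "unchanged_release",
    "closer_release_join_date",
    "suspicious_setup",
    "wheel_absence",
    "anomalous_version",
]

ROWS = [
    (FAIL, SKIP, FAIL, SKIP, SKIP, FAIL, PASS, PASS, FAIL),
    (FAIL, SKIP, FAIL, SKIP, SKIP, FAIL, FAIL, PASS, FAIL),
    (FAIL, SKIP, FAIL, SKIP, SKIP, FAIL, SKIP, PASS, FAIL),
]


def collapse(rows):
    """Return the constraints shared by every row.

    A column is kept only if all rows agree on it and the shared value
    is not SKIP (assumption: a skipped heuristic constrains nothing).
    """
    constraints = {}
    for name, column in zip(HEURISTICS, zip(*rows)):
        values = set(column)
        if len(values) == 1 and values != {SKIP}:
            constraints[name] = column[0]
    return constraints


for name, value in collapse(ROWS).items():
    print(f"{name}: {value}")
```

Suspicious Setup takes all three values across the rows, so it drops out, while Wheel Absence is PASS in every row and survives as a `passed(...)` requirement. How the remaining FAIL columns map onto `quickUndetailed` is not shown here.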

Member

I think in the previous implementation, we wanted to make sure that if wheel absence does not fail, we still report a malicious pattern as long as anomalous version fails, there is one release, and quickUndetailed matches. With the new declarative approach, do we still need to explicitly require passed({Heuristics.WHEEL_ABSENCE.value})?

passed({Heuristics.WHEEL_ABSENCE.value}),
failed({Heuristics.ANOMALOUS_VERSION.value}).

% ----- Evaluation -----
Member

Can you please add a comment in the code to explain what happens here? I.e., which rules will be selected and how they are combined, e.g., is it enough if one of them is True? What if more than one rule is evaluated to True?

Member Author

@art1f1c3R art1f1c3R Mar 31, 2025

Commit 12b2c65 should now handle this. The ProbLog model aggregates the results of all of the rules, and a separate query gets the triggered rule IDs.
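
As a rough illustration of what aggregating several triggered rules could look like, the sketch below uses noisy-OR, which is how ProbLog combines multiple probabilistic clauses that share a head. Whether commit 12b2c65 aggregates this way is not shown in this thread, so treat it as an assumption; the confidence values and all rule IDs other than `medium_confidence_2` are hypothetical.

```python
from math import prod


def aggregate(triggered):
    """Noisy-OR over the confidences of triggered rules: 1 - prod(1 - p)."""
    return 1.0 - prod(1.0 - p for p in triggered.values())


# rule_id -> (confidence, did the rule body hold?); values are hypothetical.
rules = {
    "medium_confidence_1": (0.5, False),
    "medium_confidence_2": (0.5, True),
    "high_confidence_1": (0.9, True),
}

# The "separate query" for rule IDs is modelled here as a simple filter.
triggered = {rule_id: p for rule_id, (p, fired) in rules.items() if fired}

print(sorted(triggered))
print(round(aggregate(triggered), 4))
```

Under this assumption a single triggered rule is enough to produce a non-zero result, and each additional triggered rule can only raise the combined confidence. With no triggered rules the aggregate is 0.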

@art1f1c3R art1f1c3R force-pushed the art1f1c3R/malware-bug-1027 branch 2 times, most recently from 326e88f to 362d0b4 Compare April 2, 2025 23:44
@art1f1c3R art1f1c3R force-pushed the art1f1c3R/malware-bug-1027 branch from 362d0b4 to d46f7ed Compare April 3, 2025 06:13
@behnazh-w behnazh-w changed the title fix: pypi malware reporting false positives due to incorrect skip result evaluation fix: fix incorrect skip result evaluation causing false positives in PyPI malware reporting Apr 3, 2025
Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement.
Development

Successfully merging this pull request may close these issues.

pypi malware reporting false positives due to incorrect skip result evaluation
2 participants