Skip to content

Conversation

@johnmhoran
Copy link
Member

  • Also replaced PURL normalization option with default deduplication.

Reference: #365

- Also replaced PURL normalization option with default deduplication.

Reference: #365

Signed-off-by: John M. Horan <[email protected]>
@johnmhoran
Copy link
Member Author

@pombredanne This set of changes includes a portion of your detailed feedback from last week -- will address your remaining comments next.

@TG1999 @keshav-space I will also address your remaining comments on the cocoapods support I've added to packageurl-python/purl2url.py and fetchcode/package.py once I've finished updating the purlclci.py and related code and tests.

@johnmhoran
Copy link
Member Author

@pombredanne @AyanSinhaMahapatra I've just committed and pushed my latest PURLCLI-related changes, including adding details to the RTD re the use of the four current PURLCLI tools. I stuck with the existing RTD structure and thus made all my RTD changes in /purldb/purldb-toolkit/README.rst.

This latest push also includes numerous detailed changes to purlcli.py and related test code, including what I think are all of Philippe's recent detailed comments (thanks again for these). One point re the structure and content of the output for the urls command (like the other three commands, the output is illustrated in some detail in my updated RTD entry):

  • Sometimes repository_download_url and repo_download_url_by_package_type are null, and sometimes both are not null. They are always the same as each other, but not always the same as download_url. When they do have a URL value, in my exploration they always have the same URL value as download_url. get_repo_download_url_by_package_type() is only used once -- as a fallback value in get_repo_download_url(). See https://github.com/package-url/packageurl-python/blob/18672be8c5a528ae69143206f66a875ed4043de7/src/packageurl/contrib/purl2url.py#L388

  • Thus my revised approach:

    • Keep download_url as a separate key.
    • Combine repository_download_url and repo_download_url_by_package_type into repository_download_url -- no need to keep repo_download_url_by_package_type as a separate key.

I look forward to your comments.

@johnmhoran
Copy link
Member Author

For easy reference the RTD update is here (although the new TOC doesn't show up in this simple .rst file).

https://github.com/nexB/purldb/blob/f49eff87ae9a4226584096498836de7d5015f6a2/purldb-toolkit/README.rst

Reference: #365

Signed-off-by: John M. Horan <[email protected]>
@pombredanne pombredanne changed the title Update data structure and logging, add cocoapods support Update purlcli documentation, data structure and logging, add cocoapods support May 30, 2024
johnmhoran and others added 2 commits May 30, 2024 11:27
Reference: #365
Signed-off-by: John M. Horan <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
@johnmhoran
Copy link
Member Author

@pombredanne Now that I'm removing all logging/messaging, how do you want to handle a command run on a PURL whose type is not supported? Supported types are different for each command (some overlap some not) and changes over time as we add more support. I used to provide a warning like "'pkg:nginx/[email protected]' not supported with 'metadata' command",, but this now shows up as

TypeError: 'NoneType' object is not iterable

with all of the details included in the console. I can avoid this by first testing whether the particular query returns a value, e.g., for metadata that could be if info(purl):, and that simply returns 0 packages for that PURL.

@johnmhoran
Copy link
Member Author

@pombredanne Given our recent discussions as well as the feedback you just provided re the fetchcode cocoapods work (thank you!), I think the answer to my question is: let the exception be raised and displayed in the console -- do not use if info(purl): to avoid raising such an exception. Put another way, if the queried PURL type is not supported, let the existing underlying code raise an exception if that's how it's been written. Correct? ;-)

@pombredanne
Copy link
Member

@pombredanne Given our recent discussions as well as the feedback you just provided re the fetchcode cocoapods work (thank you!), I think the answer to my question is: let the exception be raised and displayed in the console -- do not use if info(purl): to avoid raising such an exception. Put another way, if the queried PURL type is not supported, let the existing underlying code raise an exception if that's how it's been written. Correct? ;-)

Not exactly:

  • fetchcode is a low level library.

  • purlcli is both a library with callable functions and a CLI.

  • Library functions should let the exception propagate, but also could catch and reraise with a better structured exception subclass and message. "TypeError: 'NoneType' object is not iterable" is not a good exception and message. Instead, if you know that a type is not supported, just raise Exception(f"Unsupported package type: {purl_type}")
    Or do this:

class UnsupportedPackageType(Exception):
    pass

# later in a call:
raise UnsupportedPackageType(purl_type
  • In the CLI, which is an applicationa nd not a library, we could catch these exceptions (possibly design a better set of Exception subclasses), and raise proper error messages and other constructs for proper display and feedback to the user in the CLI "UI"

@johnmhoran
Copy link
Member Author

Thanks @pombredanne . Adding a new exception is an interesting option. For testing, in purlcli.py, I've added

class UnsupportedPackageType(Exception):
    pass

and for the metadata command, inside collect_metadata(), I've revised to read:

    collected_metadata = []
    try:
        for release in list(info(purl)):
            if release is None:
                continue
            release_detail = release.to_dict()
            release_detail.move_to_end("purl", last=False)
            collected_metadata.append(release_detail)
    except:
        raise UnsupportedPackageType(purl)
    return collected_metadata

metadata calls the info() function in fetchcode/package.py, which returns None when the NoRouteAvailable exception is triggered -- I interpret this as meaning NoRouteAvailable == unsupported PURL type. NoRouteAvailable is also used in package_versions.py, which is called by our new versions command. Perhaps NoRouteAvailable could instead be updated to behave like the test UnsupportedPackageType exception, or we could add the new UnsupportedPackageType exception to fetchcode and use it directly in info().

For testing right now, when I run an unsupported PURL type with metadata, the new UnsupportedPackageType exception is raised and identifies the problematic PURL:

. . .

  File "/home/jmh/dev/nexb/purldb/purldb-toolkit/src/purldb_toolkit/purlcli.py", line 160, in collect_metadata
    raise UnsupportedPackageType(purl)
purldb_toolkit.purlcli.UnsupportedPackageType: pkg:nginx/[email protected]

I also added a simple test that seems to work:

    def test_collect_metadata_unsupported_type(self):
        purl = "pkg:nginx/[email protected]"
        with pytest.raises(purlcli.UnsupportedPackageType, match=purl):
            metadata_collection = purlcli.collect_metadata(purl)

@johnmhoran
Copy link
Member Author

@pombredanne I'm cleaning up purlcli.py, and then the 2 test files, and will commit and push when done.

I realize you've had no time to read and reply to my question yesterday about the implementation of your example class UnsupportedPackageType(Exception). I'd implemented one approach of that in the metadata command to handle PURL types that are not supported by an underlying command (e.g., metadata and versions, which will encounter a NoneType error for unsupported PURL types if not handled).

However, since this might not be what you have in mind, I am going to remove that for metadata (and the test I added as well), and also remove my own way of handling that for versions -- for now this will allow the NoneType error to bubble up for unsupported PURL types for the metadata and versions commands.

As discussed in this morning's huddle, purl2url.py has no such exception handling and thus for the urls command returns empty values for PURLs like pkg:/johnmhoran/hello rather than raising an exception. metadata calls info() in fetchcode's package.py, and versions calls versions() in fetchcode's package_versions.py. Each handles non-supported types indirectly in the same way, responding to our command queries with a NoneType exception when the PURL type is not supported.

def info(url):
    """
    Return data according to the `url` string
    `url` string can be purl too
    """
    if url:
        try:
            return router.process(url)
        except NoRouteAvailable:
            return

and

def versions(purl):
    """Return all version for a PURL."""
    if purl:
        try:
            return router.process(purl)
        except NoRouteAvailable:
            return

@pombredanne
Copy link
Member

@johnmhoran re: #436 (comment)

I realize you've had no time to read and reply to my question yesterday about the implementation of your example class UnsupportedPackageType(Exception). I'd implemented one approach of that in the metadata command to handle PURL types that are not supported by an underlying command (e.g., metadata and versions, which will encounter a NoneType error for unsupported PURL types if not handled).

This works, but a simpler and better option in the CLI is to implement a callback instead that will parse the purl and raise an exception IFF the type is not in a supported list of types. See https://github.com/nexB/purldb/blob/4a9ba34901ac494d063d786daf3a5e002abcf9a9/purldb-toolkit/src/purldb_toolkit/purlcli.py#L1045 for an example of such a callback.

However, since this might not be what you have in mind, I am going to remove that for metadata (and the test I added as well), and also remove my own way of handling that for versions -- for now this will allow the NoneType error to bubble up for unsupported PURL types for the metadata and versions commands.

As discussed in this morning's huddle, purl2url.py has no such exception handling and thus for the urls command returns empty values for PURLs like pkg:/johnmhoran/hello rather than raising an exception. metadata calls info() in fetchcode's package.py, and versions calls versions() in fetchcode's package_versions.py. Each handles non-supported types indirectly in the same way, responding to our command queries with a NoneType exception when the PURL type is not supported.

Using a callback enable an early validation and is a better option for the command line.

def info(url):
    """
    Return data according to the `url` string
    `url` string can be purl too
    """
    if url:
        try:
            return router.process(url)
        except NoRouteAvailable:
            return

and

def versions(purl):
    """Return all version for a PURL."""
    if purl:
        try:
            return router.process(purl)
        except NoRouteAvailable:
            return

So the behavior is to return None in these cases. This looks fine to me. You could test if you have a value if needed. But for now, keep things simple, let exceptions bubble up.

@johnmhoran
Copy link
Member Author

@pombredanne I did not follow all that you wrote but for now I am not making any more changes along these lines until we can discuss and I can ask questions real time. At this point my goal is to clean up the logging/messaging structure and related tests I added in purldb-toolkit and then fix the fecthcode cocoapods logging/messaging -- then we can discuss improvements going forward and I'll be eager to learn and implement.

@johnmhoran
Copy link
Member Author

@pombredanne I've finished removing the PURL CLI logging/messaging structure I'd added, merged the most recent main as well as your latest commit to this PR, resolved conflicts, and otherwise updated/fixed the tests. Please take a look when you have the chance.

Reference: #365

Signed-off-by: John M. Horan <[email protected]>
@johnmhoran
Copy link
Member Author

I've just committed and pushed my remaining updates for the metadata command in purlcli.py. Waiting for the GH checks to run.

@johnmhoran johnmhoran requested a review from pombredanne July 19, 2024 20:19
@johnmhoran
Copy link
Member Author

@pombredanne @keshav-space @TG1999 All GitHub checks have passed. This PR is ready for review at your convenience.

Please note that the purlcli code relies in part on updates I've recently pushed to PRs for fetchcode (aboutcode-org/fetchcode#119) and packageurl-python (package-url/packageurl-python#153), which also need review when time permits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants