In future trim down custom metrics from payload? #269

Open
tunetheweb opened this issue Jun 20, 2024 · 3 comments

Comments

@tunetheweb
Member

tunetheweb commented Jun 20, 2024

As part of the effort to reduce the payload size by deduplicating data, we remove the `_custom` field (amongst others) when saving this data to the `all` tables:

https://github.com/HTTPArchive/wptagent/blob/e4546673d3b658022afb3885885e696290da53c5/HTTPArchive/httparchive.py#L425-L427

    # Remove the fields that are parsed out into separate columns
    page.pop("_parsed_css", None)
    page.pop("_custom", None)
    page.pop("_lighthouse", None)
    ...

However, `_custom` is only a list of the custom metric names:

    "_custom": [
        "00_reset",
        "Colordepth",
        "Dpi",
        ...
        "usertiming",
        "valid-head",
        "well-known",
        "wpt_bodies"
    ],
    "_00_reset": null,
    "_Colordepth": 24,
    ...etc

The weightier parts are the actual custom metric values beneath this (`_00_reset`, `_Colordepth`, etc.), some of which are quite large.

So we should extend this to remove those too, which would save a lot of weight.

However, keeping them in is currently useful for the legacy tables (since there is no legacy custom metrics table), so let's leave them for now. I'm filing this issue so we don't forget once we move off legacy.
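A minimal sketch of that enhancement, assuming the page payload is a plain `dict` shaped like the excerpt above, where each name listed in `_custom` has a matching top-level key prefixed with an underscore. The helper name `strip_custom_metrics` is hypothetical and is not part of `httparchive.py`.

```python
from typing import Any


def strip_custom_metrics(page: dict[str, Any]) -> dict[str, Any]:
    """Remove the `_custom` name list and every per-metric `_<name>` key.

    Hypothetical helper: assumes each entry in page["_custom"] maps to a
    top-level key named "_" + entry. Returns the removed values keyed by
    metric name so they can still be stored elsewhere if needed.
    """
    # Drop the name list itself; tolerate a missing or null `_custom`
    names = page.pop("_custom", None) or []
    removed: dict[str, Any] = {}
    for name in names:
        key = "_" + name
        if key in page:
            # pop() so the (possibly large) value leaves the payload
            removed[name] = page.pop(key)
    return removed
```

Returning the removed values, rather than discarding them, keeps the option open of writing them to a separate column while still trimming the main payload.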

@tunetheweb tunetheweb changed the title In future trim down custom metrics from payload In future trim down custom metrics from payload? Jun 20, 2024
@pmeenan
Member

pmeenan commented Jun 20, 2024

Code has been added but commented out. Just need to remove the comment block when we're ready to remove them.

@pmeenan
Member

pmeenan commented Jun 20, 2024

FWIW, it's probably a fairly substantial size. The `rendered_html` metric in particular (as well as the CSS ones) can be multiple megabytes.
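To check which metrics dominate, a rough sketch could rank the custom metrics in a single page payload by serialized JSON size. It assumes the same payload shape as in the first comment; `custom_metric_sizes` is a hypothetical name, and the UTF-8 length of compact `json.dumps` output is only an approximation of the stored size.

```python
import json
from typing import Any


def custom_metric_sizes(page: dict[str, Any]) -> list[tuple[str, int]]:
    """Return (metric name, approximate size in bytes), largest first.

    Assumes each name in page["_custom"] has a top-level "_<name>" key.
    Size is the UTF-8 length of the compact JSON encoding of the value,
    which approximates, but may not equal, the bytes actually stored.
    """
    sizes: list[tuple[str, int]] = []
    for name in page.get("_custom") or []:
        key = "_" + name
        if key in page:
            encoded = json.dumps(page[key], separators=(",", ":"))
            sizes.append((name, len(encoded.encode("utf-8"))))
    # Heaviest metrics first
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes
```

Running this over a sample of pages and summing per metric would give a rough sense of how much the trim would save.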

@tunetheweb
Member Author

Yeah exactly!

We could still inject them into the payload when populating the pages table for legacy. Or just live without them as part of forcing people over?
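For the first option, one possible sketch: keep the trimmed payload for the `all` tables and rebuild the legacy layout on a copy by merging the custom metrics back in. The helper name and the `custom_metrics` dict (metric name to value) are assumptions for illustration, not existing wptagent code.

```python
from typing import Any


def build_legacy_payload(
    trimmed_page: dict[str, Any], custom_metrics: dict[str, Any]
) -> dict[str, Any]:
    """Re-inject custom metrics into a copy of a trimmed page payload.

    Hypothetical helper: restores the `_custom` name list and one
    "_<name>" key per metric, matching the legacy layout shown in the
    issue. The trimmed payload passed in is left unmodified.
    """
    # A shallow copy is enough because only top-level keys are added
    legacy = dict(trimmed_page)
    legacy["_custom"] = list(custom_metrics)
    for name, value in custom_metrics.items():
        legacy["_" + name] = value
    return legacy
```

Copying instead of mutating means the same trimmed payload can be written to the new tables unchanged while the legacy pages table still receives the full set.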

I’ve updated all the httparchive queries not to look at custom metrics in the legacy pages table, so we're good to go from our end. The Web Almanac queries would need to be migrated, but I'm hoping this year's analysts will handle a lot of those.
