Assess secondary page quality #127

Open
rviscomi opened this issue Aug 16, 2022 · 1 comment

Comments

@rviscomi
Member

Now that we have a few months of data for secondary pages, let's analyze it to see if we should make any changes to the selection algorithm. We can pick a few popular home pages and see if the secondary pages are what we'd expect.

The custom metric that generates the candidate list of secondary pages is crawl_links.js.

Other questions to look into:

  • How often does a site not have any secondary pages? What's causing that?
  • How much does the inclusion of secondary pages change basic web health stats?
    • technology adoption
    • page weight trends
    • CWV performance/availability
    • Lighthouse scores
  • Is the algorithm suitable for both desktop and mobile? Do we need a mobile-specific one?
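
As a starting point for the first question, here is a rough sketch of a query that counts how many sites have no secondary page at all, per client. It assumes `httparchive.all.pages` exposes an `is_root_page` flag and a `root_page` column that ties each secondary page back to its home page; neither column is mentioned in this issue, so treat both names as assumptions to check against the table schema.

```sql
-- Sketch only: `is_root_page` and `root_page` are assumed column names.
WITH per_site AS (
  SELECT
    client,
    root_page,
    -- Number of non-home pages crawled for this site.
    COUNTIF(NOT is_root_page) AS secondary_pages
  FROM
    `httparchive.all.pages`
  WHERE
    date = '2022-07-01'
  GROUP BY
    client,
    root_page
)

SELECT
  client,
  COUNT(0) AS sites,
  COUNTIF(secondary_pages = 0) AS sites_without_secondary,
  SAFE_DIVIDE(COUNTIF(secondary_pages = 0), COUNT(0)) AS pct_without_secondary
FROM
  per_site
GROUP BY
  client
ORDER BY
  client
```

Sampling a few `root_page` values where `secondary_pages = 0` and looking at their test results would be one way to dig into what is causing the gaps.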
@rviscomi rviscomi added the question Further information is requested label Aug 16, 2022
@rviscomi rviscomi added this to the M2: Utilizing capacity milestone Aug 16, 2022
@tunetheweb
Member

tunetheweb commented Aug 16, 2022

OK, so I had a look at a few pages using this query, changing the core URL each time:

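SELECT
  client,
  page,
  wptid  -- test ID, used to look up the test result
FROM
  `httparchive.all.pages`
WHERE
  -- Match the home page and any other crawled page on the same origin.
  page LIKE 'https://www.amazon.com/%' AND
  date = '2022-07-01' AND
  rank = 1000
ORDER BY
  client,
  page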
| URL | Client | Test Result | Commentary |
| --- | --- | --- | --- |
| Amazon.com | Mobile | Home, Secondary | Amazon Prime Video is the largest content, and the crawl correctly loads that video. |
| Amazon.com | Desktop | Home, Secondary | The Amazon Alexa category is the largest content, and the crawl correctly loads that category page. |
| nytimes.com | Mobile | Home, Secondary | An article is the largest content and is correctly loaded as the secondary page, but it is paywalled. Checking the waterfall, though, it looks like the full article loaded, so hopefully it is still useful. |
| nytimes.com | Desktop | Home, Secondary | An article is the largest content and is correctly loaded as the secondary page, but it is paywalled. Checking the waterfall, though, it looks like the full article loaded, so hopefully it is still useful. |
| bbc.com | Mobile | Home, Secondary | A news article is the LCP element. It was correctly loaded as the secondary page, though again hidden behind a signup banner in the preview. |
| bbc.com | Desktop | Home, Secondary | A news article is the LCP element. It was correctly loaded as the secondary page, though again hidden behind a signup banner in the preview. |

So from these spot checks, it seems to be working as intended.

However, the variability could make things interesting. We kind of guessed Amazon would load a product page, but because it's a jack of all trades, it sometimes loads an Amazon Prime video, sometimes a category, and sometimes whatever else they are advertising in the banner. It's also interesting that Amazon, unlike the others, has a different LCP link on mobile and desktop. That could just be down to the crawl time differences, though the two crawls were only 4 hours apart and on the same day this month.
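
To see whether the desktop/mobile difference is specific to Amazon, something like the following could list secondary pages that were crawled on only one client. This is a sketch, not a tested query: `is_root_page` and `root_page` are assumed column names on `httparchive.all.pages`, and the `rank = 1000` filter is simply carried over from the spot-check query.

```sql
-- Sketch only: `is_root_page` and `root_page` are assumed column names.
SELECT
  root_page,
  page,
  -- Exactly one client per group after the HAVING filter, so this is unambiguous.
  ANY_VALUE(client) AS only_client
FROM
  `httparchive.all.pages`
WHERE
  date = '2022-07-01' AND
  rank = 1000 AND
  NOT is_root_page
GROUP BY
  root_page,
  page
HAVING
  COUNT(DISTINCT client) = 1
ORDER BY
  root_page,
  page
```

A site whose secondary page is the same on both clients returns no rows; a site like Amazon in the checks above would return one row per client.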

Similarly, paywalls and content blockers could be a lot more prevalent on secondary pages, as shown here. While the resources all seem to load in the background, these overlays could affect LCP elements, CLS, and the like.

Still, from these brief checks it seems to be doing what we wanted it to do. So the next thing would be to look at the whole dataset for interesting insights into how secondary pages differ from home pages!
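
One possible first cut at that whole-dataset comparison is median page weight for home pages versus secondary pages. The sketch below assumes an `is_root_page` column and a `summary` column holding JSON with a `bytesTotal` field; neither is confirmed in this issue, so the names should be checked against the schema before running. The `summary` column may be large, so it is worth checking the bytes-processed estimate first.

```sql
-- Sketch only: `is_root_page`, `summary`, and `$.bytesTotal` are assumptions.
SELECT
  client,
  is_root_page,
  COUNT(0) AS pages,
  -- Median total transfer size in KB; rows without a parsable value are ignored.
  APPROX_QUANTILES(
    SAFE_CAST(JSON_VALUE(summary, '$.bytesTotal') AS FLOAT64), 1000
  )[OFFSET(500)] / 1024 AS median_kbytes
FROM
  `httparchive.all.pages`
WHERE
  date = '2022-07-01'
GROUP BY
  client,
  is_root_page
ORDER BY
  client,
  is_root_page
```

The same shape should work for the other stats in the original list by swapping the aggregated expression.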


4 participants