[Feature]: Add captureTDH for Automated TDH Structure Validation to Enhance SEO #34325

Closed
koji-koji opened this issue Jan 15, 2025 · 6 comments

@koji-koji

🚀 Feature Request

Feature Name: captureTDH Function

The proposed captureTDH function captures a page's Title, Description, and Heading structure (hereafter referred to as TDH), which is essential for SEO. It enables automatic verification in a CI pipeline to ensure that this structure is not inadvertently broken. The feature helps detect and prevent problems in the hierarchy of heading tags, which are common when using component-based frameworks (e.g., React).

  • Role of the captureTDH Function
    • Extracts the Title, Description, and Heading tags from a page and retrieves their structure as an object.
    • Compares the obtained TDH structure with a predefined baseline file (e.g., target-page-name-TDH.ts). If there is a mismatch, the test fails.
  • toMatchTDH Matcher
    • Similar to a Jest custom matcher, it compares the captured TDH structure with the structure defined in the baseline file.
    • The comparison results are reflected in the automated test outcomes within the CI pipeline.
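
Retrieving the heading structure "as an object" implies turning the flat, document-order list of h1-h6 elements on a page into the nested form used in the verification file example below. Here is a minimal sketch, assuming a heading becomes a child of the closest preceding heading with a lower level. FlatHeading and buildHeadingTree are hypothetical names, not part of Playwright.

```typescript
// Hypothetical input: one entry per h1..h6 element, in document order.
interface FlatHeading {
  level: number; // 1..6, taken from the tag name
  text: string;
}

// Same shape as the heading entries in the verification file example.
interface HeadingNode {
  headingLevel: number;
  headingText: string;
  children: HeadingNode[];
}

// Builds a forest: each heading is attached to the closest earlier heading
// whose level is strictly lower; headings with no such ancestor become roots.
function buildHeadingTree(flat: FlatHeading[]): HeadingNode[] {
  const roots: HeadingNode[] = [];
  const stack: HeadingNode[] = []; // open ancestors, levels strictly increasing
  for (const h of flat) {
    const node: HeadingNode = { headingLevel: h.level, headingText: h.text, children: [] };
    // Close every open heading at the same or a deeper level.
    while (stack.length > 0 && stack[stack.length - 1].headingLevel >= h.level) {
      stack.pop();
    }
    if (stack.length === 0) {
      roots.push(node);
    } else {
      stack[stack.length - 1].children.push(node);
    }
    stack.push(node);
  }
  return roots;
}
```

Because this nests by heading level alone, a skipped level (an h3 directly after an h1) simply becomes a child of the h1, and the sketch never sees any section or div wrappers around the headings.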

Example

Test File Example

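import { test, expect } from '@playwright/test';

// Proposed API: page.captureTDH() and the toMatchTDH matcher do not exist in Playwright today.
test('TDH structure is correct', async ({ page }) => {
  await page.goto('https://example.com');
  // Capture the title, meta description, and h1-h6 hierarchy of the loaded page.
  const tdhStructure = await page.captureTDH({ pageName: 'target-page-name-TDH' });
  // Compare against the baseline file; the test fails on any mismatch.
  expect(tdhStructure).toMatchTDH('target-page-name-TDH.ts');
});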

Verification File Example

export const tdh = {
  title: 'page title',
  description: 'page description',
  heading: [{
    headingLevel: 1,
    headingText: 'heading 1 text',
    children: [
      {
        headingLevel: 2,
        headingText: 'heading 2 text',
        children: []
      }
    ]
  }]
}
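
Comparing a captured structure against a baseline like this one could, as a rough sketch, reduce to a recursive diff that reports where the two disagree. The types and the diffTDH function below are hypothetical and only mirror the field names above. A toMatchTDH matcher built on it could, for instance, be registered through Playwright's expect.extend and fail whenever the returned list is non-empty.

```typescript
// Hypothetical types mirroring the fields of the baseline file.
interface HeadingNode {
  headingLevel: number;
  headingText: string;
  children: HeadingNode[];
}

interface TDH {
  title: string;
  description: string;
  heading: HeadingNode[];
}

// Walks two heading forests in parallel and records each mismatch with a
// path such as "heading[0].children[1]".
function diffHeadings(actual: HeadingNode[], expected: HeadingNode[], path: string, out: string[]): void {
  if (actual.length !== expected.length) {
    out.push(`${path}: expected ${expected.length} heading(s), got ${actual.length}`);
  }
  const n = Math.min(actual.length, expected.length);
  for (let i = 0; i < n; i++) {
    const a = actual[i];
    const e = expected[i];
    const here = `${path}[${i}]`;
    if (a.headingLevel !== e.headingLevel) {
      out.push(`${here}: expected level ${e.headingLevel}, got ${a.headingLevel}`);
    }
    if (a.headingText !== e.headingText) {
      out.push(`${here}: expected text "${e.headingText}", got "${a.headingText}"`);
    }
    diffHeadings(a.children, e.children, `${here}.children`, out);
  }
}

// Returns an empty array when the captured structure matches the baseline.
function diffTDH(actual: TDH, expected: TDH): string[] {
  const out: string[] = [];
  if (actual.title !== expected.title) {
    out.push(`title: expected "${expected.title}", got "${actual.title}"`);
  }
  if (actual.description !== expected.description) {
    out.push(`description: expected "${expected.description}", got "${actual.description}"`);
  }
  diffHeadings(actual.heading, expected.heading, "heading", out);
  return out;
}
```

Reporting a path per mismatch, rather than a single pass/fail, makes it clear in CI output whether the text changed or a heading moved to a different level of the hierarchy.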

Motivation

  • Importance of SEO
    • Title and Description are fundamental elements of SEO. Properly setting them directly impacts search rankings and click-through rates.
    • The structure of Heading tags is also crucial for SEO. Maintaining the correct hierarchical structure makes it easier for search engines to accurately understand the content.
  • Challenges with Component-Based Frameworks
    • In component-based frameworks like React, even if individual components function correctly, the overall hierarchical structure of Heading tags on a page can unintentionally become disrupted.
    • Manually reviewing this issue is time-consuming and labor-intensive, creating a need for an automated validation method.
  • Leveraging Playwright’s Strengths for Automation
    • Playwright excels in E2E testing, allowing efficient page-level testing.
    • Integrating the captureTDH feature into Playwright enables continuous monitoring and maintenance of SEO and accessibility quality.
@yury-s
Member

yury-s commented Jan 15, 2025

Would toMatchAriaSnapshot work for your scenario?

@koji-koji
Author

@yury-s
Thank you for your comment!

In my scenario, toMatchAriaSnapshot also works!
I find it to be a very user-friendly tool for checking accessibility.
You can also perform checks on the levels of Heading tags, which is great.

However, I have not yet fully verified whether it can check detailed nesting.
I would like to better understand this by actually running it.
For SEO optimization, I aim to test the nesting structure of H1 - H6 tags.

There is another reason for proposing captureTDH.
I believe that by implementing captureTDH, you can retrieve and test the details of a page's H1 - H6 tags without needing to know them in advance.
For example, if you want to test 20 pages, writing tests for each page's H1 - H6 structure could be time-consuming.
On the other hand, I believe that with captureTDH, you can simply specify the page name and run the tests, making it much easier for many people to use.

@yury-s
Member

yury-s commented Jan 16, 2025

> However, I have not yet fully verified whether it can check detailed nesting.

Feel free to file a separate bug report or feature request if something is not working in toMatchAriaSnapshot.

> I believe that by implementing captureTDH, you can retrieve and test the details of a page's H1 - H6 tags without needing to know them in advance.
> For example, if you want to test 20 pages, writing tests for each page's H1 - H6 structure could be time-consuming.
> On the other hand, I believe that with captureTDH, you can simply specify the page name and run the tests, making it much easier for many people to use.

What would the test verify if you don't know expected text values ahead of time? If you just want to match the structure of the document ignoring actual text, you can write toMatchAriaSnapshot expectation with .* regex patterns.
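
As an illustration of this suggestion, a text-agnostic template keeps each heading's role and [level=N] attribute and replaces the accessible name with /.*/. The helper below is a hypothetical sketch that only builds such a template string from a list of levels. toMatchAriaSnapshot is the real Playwright assertion, but the exact template syntax should be checked against Playwright's aria snapshot documentation.

```typescript
// Hypothetical helper: builds an aria snapshot template that checks heading
// roles and levels under one landmark while accepting any heading text.
function headingTemplate(landmark: string, levels: number[]): string {
  const lines = [`- ${landmark}:`];
  for (const level of levels) {
    if (Math.floor(level) !== level || level < 1 || level > 6) {
      throw new RangeError(`invalid heading level: ${level}`);
    }
    // /.*/ matches any accessible name; [level=N] pins the heading level.
    lines.push(`  - heading /.*/ [level=${level}]`);
  }
  return lines.join("\n");
}
```

Assuming a page with a main landmark, the string would then be used inside a Playwright test as `await expect(page.locator('main')).toMatchAriaSnapshot(headingTemplate('main', [1, 2, 3]));`.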

@koji-koji
Author

@yury-s
Thank you for recommending toMatchAriaSnapshot. I've tried it out in various ways!
However, I ran into a few issues when attempting to capture the detailed structure of heading tags. Let me share what I found.

Test Results

I tested it using the following markup:

<main>
  <h1>h1 text</h1>
  <section>
    <h2>h2 text</h2>
    <h3>h3 text</h3>
  </section>
  {/* <section> */}
  <h2>h2 text</h2>
  <div>
    <h3>h3 text</h3>
    <h4>h4 text</h4>
  </div>
  {/* </section> */}
  <h2>h2 text</h2>
</main>

And the output from toMatchAriaSnapshot was as follows:

- main:
  - heading "h1 text" [level=1]
  - heading "h2 text" [level=2]
  - heading "h3 text" [level=3]
  - heading "h2 text" [level=2]
  - heading "h3 text" [level=3]
  - heading "h4 text" [level=4]
  - heading "h2 text" [level=2]

From an ARIA perspective, it correctly recognizes the heading levels.
However, it does not reflect the nested DOM structure (parent-child relationships) of elements like section or div, so we found that we can’t actually test the true hierarchical structure of the heading tags.
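
To make the distinction concrete, the sketch below keeps the chain of wrapping elements for each heading. It models the markup above as a plain object tree instead of a real DOM (a real version would walk the document inside page.evaluate). ElementNode, headingsWithContainers, and containerPath are hypothetical names.

```typescript
// Hypothetical minimal element model standing in for the DOM.
interface ElementNode {
  tag: string;
  text?: string;
  children?: ElementNode[];
}

interface LocatedHeading {
  level: number;
  text: string;
  containerPath: string[]; // tags of ancestor elements, outermost first
}

const HEADING_TAG = /^h([1-6])$/;

// Depth-first, document-order walk that records every h1..h6 together with
// the chain of elements that wrap it.
function headingsWithContainers(root: ElementNode, path: string[] = []): LocatedHeading[] {
  const match = HEADING_TAG.exec(root.tag);
  if (match) {
    return [{ level: Number(match[1]), text: root.text ?? "", containerPath: path }];
  }
  const out: LocatedHeading[] = [];
  for (const child of root.children ?? []) {
    out.push(...headingsWithContainers(child, [...path, root.tag]));
  }
  return out;
}

// The markup from the test above; the JSX comments add no elements.
const markup: ElementNode = {
  tag: "main",
  children: [
    { tag: "h1", text: "h1 text" },
    { tag: "section", children: [{ tag: "h2", text: "h2 text" }, { tag: "h3", text: "h3 text" }] },
    { tag: "h2", text: "h2 text" },
    { tag: "div", children: [{ tag: "h3", text: "h3 text" }, { tag: "h4", text: "h4 text" }] },
    { tag: "h2", text: "h2 text" },
  ],
};

for (const h of headingsWithContainers(markup)) {
  console.log(`h${h.level} "${h.text}" in ${h.containerPath.join(" > ")}`);
}
```

In the printed result the two h3 headings differ only in their container path (main > section versus main > div), which is exactly the information the flat ARIA output above does not carry.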

Why Is the Detailed Hierarchy of Heading Tags Necessary?

Even if it’s not an issue from an accessibility standpoint, properly structured heading tags are often considered important for SEO because they help search engines understand the content hierarchy of the page.

Revisiting the Need for a “captureTDH (tdhSnapshot)” Feature

The captureTDH feature I initially proposed in this issue (which might be better named tdhSnapshot) would enable capturing and comparing the entire structure of Title, Description, and Heading tags. If we could track the actual DOM-level hierarchy (which the ARIA snapshot does not capture), we could add tests that ensure the SEO-critical heading structure remains intact.


If you’re open to considering such a feature, I’d be happy to prepare a draft PR or a proof of concept.

I’d greatly appreciate your thoughts or any concerns you might have. Thank you!


> What would the test verify if you don't know expected text values ahead of time? If you just want to match the structure of the document ignoring actual text, you can write toMatchAriaSnapshot expectation with .* regex patterns.

Here is how I envision it working, which I assume is the same approach as toMatchSnapshot and toMatchAriaSnapshot:

  1. First Run (Update Snapshot)
    • We capture the Heading structure (and optionally text) from the page.
    • We store this structure in a snapshot file, which can then be reviewed (and committed to source control).
  2. Subsequent Runs
    • The test captures the current page’s Heading structure again.
    • It compares the current structure to the previously saved snapshot.
    • If there’s any difference — either in the hierarchy or the text — the test fails, indicating a potential unintentional change.
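
The two phases above can be sketched with an in-memory record standing in for snapshot files. SnapshotStore, checkSnapshot, and the update flag are hypothetical names; a real implementation would persist the serialized value to a file and tie the flag to something like Playwright's --update-snapshots option.

```typescript
// Hypothetical stand-in for snapshot files: snapshot name -> serialized value.
type SnapshotStore = Record<string, string>;

type SnapshotResult =
  | { status: "written" }
  | { status: "matched" }
  | { status: "mismatch"; expected: string; actual: string };

function checkSnapshot(store: SnapshotStore, name: string, captured: unknown, update = false): SnapshotResult {
  // Serialize to reviewable text; output depends on the object's key order.
  const actual = JSON.stringify(captured, null, 2);
  const expected = Object.prototype.hasOwnProperty.call(store, name) ? store[name] : undefined;
  // Step 1: first run (no baseline yet) or an explicit update writes the baseline.
  if (update || expected === undefined) {
    store[name] = actual;
    return { status: "written" };
  }
  // Step 2: later runs compare against the saved baseline.
  if (expected === actual) {
    return { status: "matched" };
  }
  return { status: "mismatch", expected, actual };
}
```

Calling it once with update set to true corresponds to step 1; every later call without it corresponds to step 2, and a mismatch result carries both serialized values for the failure message.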

@pavelfeldman
Member

I'd recommend using a user-land solution and publishing it as a library. This is outside the scope of Playwright.

@koji-koji
Author

Thank you for your comments!

I understand that this feature can't be incorporated into Playwright. It's a bit disappointing, but I appreciate both of you taking the time to address this issue despite being busy.

In the future, I would like to create and publish my own library.
