[Feature]: Add captureTDH for Automated TDH Structure Validation to Enhance SEO #34325

koji-koji · 2025-01-15T00:13:07Z

🚀 Feature Request

Feature Name: `captureTDH` Function

The proposed captureTDH function captures the structure of Title, Description, and Heading (hereafter referred to as TDH) essential for SEO. It enables automatic verification through a CI pipeline to ensure that these structures are not inadvertently disrupted. This feature effectively detects and prevents issues related to the hierarchical structure of Heading tags, which are common when using component-based frameworks (e.g., React).

Role of the captureTDH Function
- Extracts the Title, Description, and Heading tags from a page and retrieves their structure as an object.
- Compares the obtained TDH structure with a predefined baseline file (e.g., target-page-name-TDH.ts). If there is a mismatch, the test fails.

toMatchTDH Matcher
- Similar to a Jest custom matcher, it compares the captured TDH structure with the structure defined in the baseline file.
- The comparison results are reflected in the automated test outcomes within the CI pipeline.

Example

Test File Example

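import { test, expect } from '@playwright/test';

// page.captureTDH and toMatchTDH are the APIs proposed in this issue; they do not exist in Playwright today.
test('TDH structure is correct', async ({ page }) => {
  await page.goto('https://example.com');
  const tdhStructure = await page.captureTDH({ pageName: 'target-page-name-TDH' });
  expect(tdhStructure).toMatchTDH('target-page-name-TDH.ts');
});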

Verification File Example

export const tdh = {
  title: 'page title',
  description: 'page description',
  heading: [{
    headingLevel: 1,
    headingText: 'heading 1 text',
    children: [
      {
        headingLevel: 2,
        headingText: 'heading 2 text',
        children: []
      }
    ]
  }]
}
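
As a rough illustration of how the nested `heading` shape above could be derived, here is a minimal sketch that nests a flat, document-order list of headings by level. The names `FlatHeading`, `HeadingNode`, and `buildHeadingTree` are hypothetical and not part of Playwright or of the proposal; the sketch also assumes nesting is determined by heading level alone, not by the surrounding DOM elements.

```typescript
// Hypothetical types mirroring the baseline shape shown above.
interface FlatHeading {
  headingLevel: number;
  headingText: string;
}

interface HeadingNode extends FlatHeading {
  children: HeadingNode[];
}

// Nest a document-order list of headings by level: each heading becomes a
// child of the nearest preceding heading with a smaller level.
function buildHeadingTree(flat: FlatHeading[]): HeadingNode[] {
  const roots: HeadingNode[] = [];
  const stack: HeadingNode[] = []; // currently open ancestors, outermost first

  for (const { headingLevel, headingText } of flat) {
    const node: HeadingNode = { headingLevel, headingText, children: [] };
    // Close every open heading at the same or a deeper level.
    while (stack.length > 0 && stack[stack.length - 1].headingLevel >= headingLevel) {
      stack.pop();
    }
    const parent = stack[stack.length - 1];
    if (parent) {
      parent.children.push(node);
    } else {
      roots.push(node);
    }
    stack.push(node);
  }
  return roots;
}

const heading = buildHeadingTree([
  { headingLevel: 1, headingText: 'heading 1 text' },
  { headingLevel: 2, headingText: 'heading 2 text' },
]);
console.log(JSON.stringify(heading, null, 2));
```

The flat input list would have to come from the page itself, for example by collecting all h1 to h6 elements in document order.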

Motivation

Importance of SEO
- Title and Description are fundamental elements of SEO. Properly setting them directly impacts search rankings and click-through rates.
- The structure of Heading tags is also crucial for SEO. Maintaining the correct hierarchical structure makes it easier for search engines to accurately understand the content.

Challenges with Component-Based Frameworks
- In component-based frameworks like React, even if individual components function correctly, the overall hierarchical structure of Heading tags on a page can unintentionally become disrupted.
- Manually reviewing this issue is time-consuming and labor-intensive, creating a need for an automated validation method.

Leveraging Playwright’s Strengths for Automation
- Playwright excels in E2E testing, allowing efficient page-level testing.
- Integrating the captureTDH feature into Playwright enables continuous monitoring and maintenance of SEO and accessibility quality.

yury-s · 2025-01-15T01:07:59Z

Would toMatchAriaSnapshot work for your scenario?

koji-koji · 2025-01-16T00:52:34Z

@yury-s
Thank you for your comment!

In my scenario, toMatchAriaSnapshot also works!
I find it to be a very user-friendly tool for checking accessibility.
You can also perform checks on the levels of Heading tags, which is great.

However, I have not yet fully verified whether it can check detailed nesting.
I would like to better understand this by actually running it.
For SEO optimization, I aim to test the nesting structure of H1 - H6 tags.

There is another reason for proposing captureTDH.
I believe that by implementing captureTDH, you can retrieve and test the details of a page's H1 - H6 tags without needing to know them in advance.
For example, if you want to test 20 pages, writing tests for each page's H1 - H6 structure could be time-consuming.
On the other hand, I believe that with captureTDH, you can simply specify the page name and run the tests, making it much easier for many people to use.

yury-s · 2025-01-16T00:59:49Z

> However, I have not yet fully verified whether it can check detailed nesting.

Feel free to file separate bug/feature request if something is not working in toMatchAriaSnapshot.

> I believe that by implementing captureTDH, you can retrieve and test the details of a page's H1 - H6 tags without needing to know them in advance.
> For example, if you want to test 20 pages, writing tests for each page's H1 - H6 structure could be time-consuming.
> On the other hand, I believe that with captureTDH, you can simply specify the page name and run the tests, making it much easier for many people to use.

What would the test verify if you don't know expected text values ahead of time? If you just want to match the structure of the document ignoring actual text, you can write toMatchAriaSnapshot expectation with .* regex patterns.
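
To make the regex suggestion concrete, here is a small sketch that builds an aria snapshot template matching heading levels while ignoring heading text. `headingLevelTemplate` is a hypothetical helper, not a Playwright API; the sketch assumes that the `/.*/` name pattern is accepted by the aria snapshot syntax in the Playwright version in use.

```typescript
// Hypothetical helper: build an aria snapshot template that pins heading
// levels under a root landmark but accepts any heading text via /.*/.
function headingLevelTemplate(root: string, levels: number[]): string {
  const lines = [`- ${root}:`];
  for (const level of levels) {
    lines.push(`  - heading /.*/ [level=${level}]`);
  }
  return lines.join('\n');
}

const template = headingLevelTemplate('main', [1, 2, 3]);
console.log(template);
```

Inside a test, the resulting string could be passed to `await expect(page.locator('main')).toMatchAriaSnapshot(template)`.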

koji-koji · 2025-01-20T01:16:08Z

@yury-s
Thank you for recommending toMatchAriaSnapshot. I've tried it out in various ways!
However, I ran into a few issues when attempting to capture the detailed structure of heading tags. Let me share what I found.

Test Results

I tested it using the following markup:

<main>
  <h1>h1 text</h1>
  <section>
    <h2>h2 text</h2>
    <h3>h3 text</h3>
  </section>
  {/* <section> */}
  <h2>h2 text</h2>
  <div>
    <h3>h3 text</h3>
    <h4>h4 text</h4>
  </div>
  {/* </section> */}
  <h2>h2 text</h2>
</main>

And the output from toMatchAriaSnapshot was as follows:

- main:
  - heading "h1 text" [level=1]
  - heading "h2 text" [level=2]
  - heading "h3 text" [level=3]
  - heading "h2 text" [level=2]
  - heading "h3 text" [level=3]
  - heading "h4 text" [level=4]
  - heading "h2 text" [level=2]

From an ARIA perspective, it correctly recognizes the heading levels.
However, it does not reflect the nested DOM structure (parent-child relationships) of elements like section or div, so we found that we can’t actually test the true hierarchical structure of the heading tags.
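
For reference, the flat output does still carry each heading's level, so level-based checks remain possible even without the DOM nesting. A rough sketch, assuming the snapshot text format shown above; `parseHeadingLines` and `findSkippedLevels` are hypothetical helper names, and the parser does not handle escaped quotes in heading text.

```typescript
interface HeadingLine {
  level: number;
  text: string;
}

// Extract heading entries from aria snapshot text; other lines are ignored.
function parseHeadingLines(snapshot: string): HeadingLine[] {
  const pattern = /^\s*- heading "(.*)" \[level=(\d)\]\s*$/;
  const result: HeadingLine[] = [];
  for (const line of snapshot.split('\n')) {
    const match = pattern.exec(line);
    if (match) {
      result.push({ text: match[1], level: Number(match[2]) });
    }
  }
  return result;
}

// Report headings that jump more than one level deeper than the heading
// before them (e.g. h2 followed directly by h4). The first heading is
// compared against level 0, so anything other than h1 is reported.
function findSkippedLevels(headings: HeadingLine[]): HeadingLine[] {
  const skipped: HeadingLine[] = [];
  let previousLevel = 0;
  for (const heading of headings) {
    if (heading.level > previousLevel + 1) {
      skipped.push(heading);
    }
    previousLevel = heading.level;
  }
  return skipped;
}
```

This only validates the level sequence; it says nothing about which section or div a heading sits in, which is the gap described above.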

Why Is the Detailed Hierarchy of Heading Tags Necessary?

Even if it’s not an issue from an accessibility standpoint, properly structured heading tags are often considered important for SEO because they help search engines understand the content hierarchy of the page.

Revisiting the Need for a “captureTDH (tdhSnapshot)” Feature

The captureTDH feature I initially proposed in the Issue (which might be better named tdhSnapshot)
would enable capturing and comparing the entire structure of Title, Description, and Heading tags.
If we could track the accurate DOM-level hierarchy (something ARIA doesn’t handle),
we could incorporate tests to ensure the SEO-critical heading structure remains intact.

If you’re open to considering such a feature, I’d be happy to prepare a draft PR or a proof of concept.

I’d greatly appreciate your thoughts or any concerns you might have. Thank you!

> What would the test verify if you don't know expected text values ahead of time? If you just want to match the structure of the document ignoring actual text, you can write toMatchAriaSnapshot expectation with .* regex patterns.

Here is the way I envision it, which I assume is the same way as toMatchSnapshot and toMatchAriaSnapshot.

First Run (Update Snapshot)
- We capture the Heading structure (and optionally text) from the page.
- We store this structure in a snapshot file, which can then be reviewed (and committed to source control).

Subsequent Runs
- The test captures the current page’s Heading structure again.
- It compares the current structure to the previously saved snapshot.
- If there’s any difference — either in the hierarchy or the text — the test fails, indicating a potential unintentional change.
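
The two phases above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SnapshotStore`, `MemoryStore`, and `checkSnapshot` are hypothetical names, the in-memory store stands in for snapshot files on disk, and the comparison is a plain string comparison of the serialized structure.

```typescript
// Hypothetical storage abstraction; a real helper would likely read and
// write snapshot files instead of using an in-memory map.
interface SnapshotStore {
  read(name: string): string | undefined;
  write(name: string, content: string): void;
}

class MemoryStore implements SnapshotStore {
  private files = new Map<string, string>();
  read(name: string): string | undefined {
    return this.files.get(name);
  }
  write(name: string, content: string): void {
    this.files.set(name, content);
  }
}

type SnapshotResult =
  | { status: 'written' }
  | { status: 'match' }
  | { status: 'mismatch'; expected: string; actual: string };

// First run (or explicit update): store the captured structure.
// Subsequent runs: compare the captured structure with the stored one.
function checkSnapshot(
  store: SnapshotStore,
  name: string,
  structure: object,
  update = false,
): SnapshotResult {
  const actual = JSON.stringify(structure, null, 2);
  const expected = store.read(name);
  if (expected === undefined || update) {
    store.write(name, actual);
    return { status: 'written' };
  }
  return expected === actual
    ? { status: 'match' }
    : { status: 'mismatch', expected, actual };
}
```

A test would fail on `mismatch` and could print `expected` and `actual` so that the change can be reviewed.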

pavelfeldman · 2025-01-21T20:11:33Z

I'd recommend using a user-land solution and publishing it as a library. This goes outside of the scope for Playwright.

koji-koji · 2025-01-24T02:39:48Z

Thank you for your comments!

I understand that this feature can't be incorporated into Playwright. It's a bit disappointing, but I appreciate both of you taking the time to address this issue despite being busy.

In the future, I would like to create and publish my own library.

pavelfeldman closed this as completed Jan 21, 2025
