Improve SEO description extracted from documents. #705

@buuhuu

Is your feature request related to a problem? Please describe.

Currently, if the description is not included in a page's metadata, it is taken from the first paragraph that either has at least 10 words or contains a word longer than 25 characters that does not start with http:

/**
 * Extracts the description from the document. note, that the selectAll('div > p') used in
 * jsdom doesn't work as expected in hast
 * @param {Root} hast
 * @see https://github.com/syntax-tree/unist/discussions/66
 */
function extractDescription(hast) {
  let desc = '';
  visit(hast, (node, idx, parent) => {
    if (parent?.tagName === 'div' && node.tagName === 'p') {
      const words = toString(node).trim().split(/\s+/);
      if (words.length >= 10 || words.some((w) => w.length > 25 && !w.startsWith('http'))) {
        desc = `${words.slice(0, 25).join(' ')}${words.length > 25 ? ' ...' : ''}`;
        return EXIT;
      }
    }
    return CONTINUE;
  });
  return desc;
}

That condition is not precise enough. For example, it

  • accepts domain-relative links (starting with /)
  • does not consider the concatenated content of multiple paragraphs
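
To illustrate both points, here is a minimal sketch that applies the same predicate to plain strings. The helper name `matchesCurrentCondition` is hypothetical and only mirrors the condition inside `extractDescription`; whether the link paragraph is actually the first match on a real page depends on the shape of the hast tree, which is assumed here.

```javascript
// Hypothetical helper: the same condition as in extractDescription, on plain strings.
function matchesCurrentCondition(text) {
  const words = text.trim().split(/\s+/);
  return words.length >= 10 || words.some((w) => w.length > 25 && !w.startsWith('http'));
}

const samples = [
  // A real sentence with fewer than 10 words.
  'Odaberite onaj koji najbolje odgovara vašim potrebama.',
  // A domain-relative path: a single "word" longer than 25 characters.
  '/content/dam/metafox/rs/sr/disclaimer-pool/stocklocator/stocklocator-info-icon',
];

for (const sample of samples) {
  console.log(matchesCurrentCondition(sample), sample);
}
```

The short sentence is rejected, while the domain-relative path is accepted because only words starting with `http` are excluded from the long-word rule.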

Describe the solution you'd like

I propose considering multiple paragraphs, and including headings, if the 10-word criterion is not met. For example, a page starting with

## Pronađite raspoložive BMW automobile sa lagera.

Odaberite onaj koji najbolje odgovara vašim potrebama.

+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| Stock Locator Model Overview                                                                                                                          |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| ## Pogledajte detalje                                                                                                                                 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| [/content/dam/metafox/rs/sr/disclaimer-pool/stocklocator/stocklocator-info-icon](/assets/rs/sr/disclaimer-pool/stocklocator/stocklocator-info-icon)   |
|                                                                                                                                                       |
| ### {count} od {count} vozila                                                                                                                         |
|                                                                                                                                                       |
| [/content/dam/metafox/rs/sr/disclaimer-pool/stocklocator/stocklocator-disclaimer](/assets/rs/sr/disclaimer-pool/stocklocator/stocklocator-disclaimer) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| NEMA PRONAĐENIH VOZILA                                                                                                                                |
|                                                                                                                                                       |
| Nažalost, nisu pronađena vozila koja odgovaraju Vašim kriterijumima. Molimo Vas da resetujete filtere i napravite drugačiji izbor.                    |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+

should have the description "Pronađite raspoložive BMW automobile sa lagera. Odaberite onaj koji najbolje odgovara vašim potrebama."

In any case, an SEO description should probably start with an alphanumeric character.
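
A minimal sketch of how this could work, under several assumptions: it walks a plain hast-like object tree with its own helpers instead of `unist-util-visit` and `hast-util-to-string`, and all names (`extractDescriptionSketch`, `isLinkOnly`, `textOf`, `el`, `txt`) are hypothetical. It collects headings and paragraphs in document order until at least 10 words are gathered, skips blocks that consist of a single URL or domain-relative path, and strips leading non-alphanumeric characters. Unlike the current function, it returns whatever was collected if the page has fewer than 10 words in total.

```javascript
const MIN_WORDS = 10; // same threshold as the current condition
const MAX_WORDS = 25; // same truncation limit as the current function
const TEXT_BLOCKS = new Set(['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6']);

// Concatenates the text of a hast-like node and its descendants.
function textOf(node) {
  if (node.type === 'text') return node.value;
  return (node.children || []).map(textOf).join('');
}

// True if the block consists of a single URL or domain-relative path.
function isLinkOnly(words) {
  return words.length === 1 && /^(https?:\/\/|\/)/.test(words[0]);
}

function extractDescriptionSketch(root) {
  const words = [];
  // Depth-first walk in document order; returns true once enough words are collected.
  const walk = (node, parent) => {
    if (parent && parent.tagName === 'div' && TEXT_BLOCKS.has(node.tagName)) {
      const blockWords = textOf(node).trim().split(/\s+/).filter(Boolean);
      if (!isLinkOnly(blockWords)) words.push(...blockWords);
      return words.length >= MIN_WORDS;
    }
    return (node.children || []).some((child) => walk(child, node));
  };
  walk(root, null);
  // Drop leading characters that are neither letters nor digits.
  const text = words.slice(0, MAX_WORDS).join(' ').replace(/^[^\p{L}\p{N}]+/u, '');
  return words.length > MAX_WORDS ? `${text} ...` : text;
}

// Tiny builders for a hast-like tree, used only for this example.
const el = (tagName, ...children) => ({ type: 'element', tagName, children });
const txt = (value) => ({ type: 'text', value });

const page = {
  type: 'root',
  children: [
    el(
      'div',
      el('h2', txt('Pronađite raspoložive BMW automobile sa lagera.')),
      el('p', txt('Odaberite onaj koji najbolje odgovara vašim potrebama.')),
      el('p', txt('/content/dam/metafox/rs/sr/disclaimer-pool/stocklocator/stocklocator-info-icon')),
    ),
  ],
};

console.log(extractDescriptionSketch(page));
```

For the example page, the heading (6 words) and the first paragraph (7 words) together pass the threshold, so the walk stops before reaching the link.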

Describe alternatives you've considered

Updating the descriptions, but that requires the content team and takes time.

Additional context

Slack conversation ff

Labels: enhancement (New feature or request)
