Description
Is your feature request related to a problem? Please describe.
Currently, if a page's metadata does not include a description, the description is taken from the first paragraph that has at least 10 words, or that contains a word longer than 25 characters that does not start with http:
helix-html-pipeline/src/steps/extract-metadata.js
Lines 129 to 148 in 3d3e5dc
/**
 * Extracts the description from the document. note, that the selectAll('div > p') used in
 * jsdom doesn't work as expected in hast
 * @param {Root} hast
 * @see https://github.com/syntax-tree/unist/discussions/66
 */
function extractDescription(hast) {
  let desc = '';
  visit(hast, (node, idx, parent) => {
    if (parent?.tagName === 'div' && node.tagName === 'p') {
      const words = toString(node).trim().split(/\s+/);
      if (words.length >= 10 || words.some((w) => w.length > 25 && !w.startsWith('http'))) {
        desc = `${words.slice(0, 25).join(' ')}${words.length > 25 ? ' ...' : ''}`;
        return EXIT;
      }
    }
    return CONTINUE;
  });
  return desc;
}
That condition is not precise enough. For example, it
- accepts domain-relative links (starting with /), because only words starting with http are excluded
- does not consider the concatenated content of multiple paragraphs
Describe the solution you'd like
I propose considering multiple paragraphs, and including headings, when the 10-word criterion is not met. For example, a page starting with
## Pronađite raspoložive BMW automobile sa lagera.
Odaberite onaj koji najbolje odgovara vašim potrebama.
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| Stock Locator Model Overview |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| ## Pogledajte detalje |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| [/content/dam/metafox/rs/sr/disclaimer-pool/stocklocator/stocklocator-info-icon](/assets/rs/sr/disclaimer-pool/stocklocator/stocklocator-info-icon) |
| |
| ### {count} od {count} vozila |
| |
| [/content/dam/metafox/rs/sr/disclaimer-pool/stocklocator/stocklocator-disclaimer](/assets/rs/sr/disclaimer-pool/stocklocator/stocklocator-disclaimer) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| NEMA PRONAĐENIH VOZILA |
| |
| Nažalost, nisu pronađena vozila koja odgovaraju Vašim kriterijumima. Molimo Vas da resetujete filtere i napravite drugačiji izbor. |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
should have the description: Pronađite raspoložive BMW automobile sa lagera. Odaberite onaj koji najbolje odgovara vašim potrebama.
In any case, an SEO description should probably start with an alphanumeric character.
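The proposal could be sketched roughly as follows. This is only an illustration, not the helix-html-pipeline implementation: `extractDescriptionSketch`, `textOf`, `isLinkOnly` and the `el`/`text` helpers are hypothetical names, the tree is walked by hand instead of with `visit`, and the rule for skipping link-only blocks (a single token starting with `/` or `http`) is an assumption about what "not a link" should mean.

```javascript
// Hypothetical sketch of the proposed behaviour; not the actual pipeline code.
const MIN_WORDS = 10;
const MAX_WORDS = 25;
const TEXT_TAGS = new Set(['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6']);

// Concatenates all descendant text nodes (stand-in for hast-util-to-string).
function textOf(node) {
  if (node.type === 'text') return node.value;
  return (node.children || []).map(textOf).join('');
}

// Assumption: a block is "just a link" if it is one token that looks like a URL or path.
function isLinkOnly(words) {
  return words.length === 1 && /^(https?:\/\/|\/)/.test(words[0]);
}

function extractDescriptionSketch(hast) {
  const words = [];
  const walk = (node, parent) => {
    if (words.length >= MIN_WORDS) return; // enough text collected, stop descending
    if (parent && parent.tagName === 'div' && TEXT_TAGS.has(node.tagName)) {
      const blockWords = textOf(node).trim().split(/\s+/).filter(Boolean);
      if (!isLinkOnly(blockWords)) words.push(...blockWords);
      return;
    }
    (node.children || []).forEach((child) => walk(child, node));
  };
  walk(hast, null);
  const desc = `${words.slice(0, MAX_WORDS).join(' ')}${words.length > MAX_WORDS ? ' ...' : ''}`;
  // Make sure the description starts with a letter or digit (Unicode-aware).
  return desc.replace(/^[^\p{L}\p{N}]+/u, '');
}

// Usage with a minimal hand-built hast-like tree.
const text = (value) => ({ type: 'text', value });
const el = (tagName, ...children) => ({ type: 'element', tagName, children });

const page = el(
  'div',
  el('h2', text('Pronađite raspoložive BMW automobile sa lagera.')),
  el('p', text('Odaberite onaj koji najbolje odgovara vašim potrebama.')),
);
console.log(extractDescriptionSketch({ type: 'root', children: [page] }));
```

Unlike the current function, this sketch returns whatever text it has collected even if the page never reaches 10 words, and it drops the "word longer than 25 characters" rule; whether either of those is desirable would need to be decided separately.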
Describe alternatives you've considered
Updating the descriptions, but that requires the content team and takes time.
Additional context
slack conversation ff