Description
Background
In php/phd#154, we resolved the issue of missing pages in the search index. However, now that these pages are visible in search results, a long-standing bug in result grouping has become apparent.
Issue
Some search results are incorrectly categorized between the "Extensions" and "Other Matches" groups.
Example, as shown in the screenshot:
- "Security (PHP Manual)" appears in the "Extensions" group, although it is not a PHP extension.
- "Security consideration" (from the win32service extension) is incorrectly placed in the "Other Matches" group.
Cause
The client-side search code groups results based on types, including Function, Variable, Class, Exception, Extension, and Other Matches (general). These types are assigned according to the XML element tags in the manual's source.
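The grouping described above can be sketched roughly as follows. This is a minimal illustration, not the actual search code: `groupForElement` and the two tag sets are hypothetical names, and only the two branches discussed in this issue are modelled. The Function, Variable, Class, and Exception branches are left out because their element tags are not listed here.

```javascript
// Sketch only: hypothetical names, not the real search code.
const EXTENSION_TAGS = new Set(["book", "set", "reference"]);
const GENERAL_TAGS = new Set(["section", "chapter", "appendix", "article"]);

// Returns the result group for an index entry, looking at nothing
// but the entry's element tag.
function groupForElement(element) {
  if (EXTENSION_TAGS.has(element)) return "Extensions";
  if (GENERAL_TAGS.has(element)) return "Other Matches";
  return null; // branches not covered by this sketch
}

console.log(groupForElement("book"));    // "Extensions"
console.log(groupForElement("section")); // "Other Matches"
```

Because the element tag is the only input, two pages with the same tag always land in the same group, whether or not they belong to an extension.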
Issue 1: Incorrect grouping in "Extensions"
The first issue occurs in this section of the code:
Lines 130 to 134 in 27fbef1
The code assumes that any entry whose element tag is <book>, <set>, or <reference> belongs to an extension, which is inaccurate: many entries that use these elements are not part of any extension.
Example data:
id | ldesc | element |
---|---|---|
getting-started | Getting Started | book |
install | Installation and Configuration | book |
... | ... | ... |
reserved.variables | Predefined Variables | reference |
wrappers | Supported Protocols and Wrappers | reference |
... | ... | ... |
Query used to select these rows:
SELECT "docbook_id", "ldesc", "element"
FROM "ids"
WHERE "element" IN ('book','set','reference')
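The effect on the sample rows can be reproduced with a small sketch. `isTreatedAsExtension` is a hypothetical stand-in for the branch referenced above (lines 130 to 134), not the code itself, and the rows are copied from the example table.

```javascript
// Sample rows copied from the example table in this issue.
const issue1Rows = [
  { id: "getting-started", ldesc: "Getting Started", element: "book" },
  { id: "install", ldesc: "Installation and Configuration", element: "book" },
  { id: "reserved.variables", ldesc: "Predefined Variables", element: "reference" },
  { id: "wrappers", ldesc: "Supported Protocols and Wrappers", element: "reference" },
];

// Hypothetical stand-in for the Extension branch: the tag alone decides.
const isTreatedAsExtension = (row) =>
  ["book", "set", "reference"].includes(row.element);

// Titles that would be listed under "Extensions" in the search results.
const shownAsExtensions = issue1Rows
  .filter(isTreatedAsExtension)
  .map((row) => row.ldesc);

console.log(shownAsExtensions.length); // 4
```

All four sample rows end up under "Extensions", although none of them documents an extension.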
Issue 2: Incorrect grouping in "Other Matches"
The second issue is due to an assumption in the following code:
Lines 136 to 141 in 27fbef1
The code assumes that entries with the tags <section>, <chapter>, <appendix>, or <article> do not belong to an extension. While this is less severe, many pages that are part of an extension are currently placed in the "Other Matches" group:
id | ldesc | element |
---|---|---|
... | ... | ... |
apcu.installation | Installation | section |
apcu.configuration | Runtime Configuration | section |
... | ... | ... |
pdo.setup | Installing/Configuring | chapter |
pdo.constants | Predefined Constants | appendix |
pdo.connections | Connections and Connection management | chapter |
... | ... | ... |
Query used to select these rows:
SELECT "docbook_id", "ldesc", "element"
FROM "ids"
WHERE "element" IN ('section','chapter','appendix','article')
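As a rough illustration of the second assumption, the sketch below tallies the sample rows by element tag. `isTreatedAsGeneral` is a hypothetical stand-in for the branch referenced above (lines 136 to 141), and the rows are copied from the table in this section.

```javascript
// Sample rows copied from the table in this section.
const issue2Rows = [
  { id: "apcu.installation", ldesc: "Installation", element: "section" },
  { id: "apcu.configuration", ldesc: "Runtime Configuration", element: "section" },
  { id: "pdo.setup", ldesc: "Installing/Configuring", element: "chapter" },
  { id: "pdo.constants", ldesc: "Predefined Constants", element: "appendix" },
  { id: "pdo.connections", ldesc: "Connections and Connection management", element: "chapter" },
];

// Hypothetical stand-in for the "Other Matches" branch: the tag alone decides.
const isTreatedAsGeneral = (row) =>
  ["section", "chapter", "appendix", "article"].includes(row.element);

// Count how many sample rows fall into "Other Matches", per element tag.
const countsByElement = {};
for (const row of issue2Rows.filter(isTreatedAsGeneral)) {
  countsByElement[row.element] = (countsByElement[row.element] || 0) + 1;
}

console.log(countsByElement); // { section: 2, chapter: 2, appendix: 1 }
```

Every sample row is sent to "Other Matches", even though each one belongs to the apcu or pdo documentation.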
PHP Manual index dump
For convenience, here is the dump from the PHD SQLite index for the PHP Manual: php-manual-index_2024-10-08.sql.gz
Notes
- This will continue to be relevant even after "Update navbar design and improve search UI" (#1084) is merged, as that change uses the same logic for displaying the result type.
- The screenshot has the upcoming fix for "Duplicated titles and descriptions in search index for chunks without parent book" (php/phd#159) applied.