Incorrect grouping of search results between "Extensions" and "Other Matches" #1088

Open
@lhsazevedo

Background

In php/phd#154, we resolved the issue of missing pages in the search index. However, now that these pages are visible in search results, a long-standing bug in result grouping has become apparent.

Issue

Some search results are incorrectly categorized between the "Extensions" and "Other Matches" groups.

Example:

[Screenshot: search results page]
Query: security

As shown:

  1. "Security (PHP Manual)" appears in the "Extensions" group, although it is not a PHP extension.
  2. "Security consideration" (from the win32service extension) is incorrectly placed in the "Other Matches" group.

Cause

The client-side search code groups results based on types, including Function, Variable, Class, Exception, Extension, and Other Matches (general). These types are assigned according to the XML element tags in the manual's source.
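The mapping from element tag to result group can be sketched roughly as follows. This is a minimal illustration, not the actual `js/search.js` code: the function names `typeForElement` and `groupByType` are hypothetical, and the tag names used for the function, variable, class, and exception types are assumptions, since this issue only quotes the cases for "extension" and "general".

```javascript
// Hypothetical helper: maps a manual element tag to a search result type.
function typeForElement(element) {
  switch (element) {
    // Assumed tag names; they are not quoted in this issue.
    case "refentry":
      return "function";
    case "phpdoc:varentry":
      return "variable";
    case "phpdoc:classref":
      return "class";
    case "phpdoc:exceptionref":
      return "exception";
    // Case groups quoted from js/search.js in this issue.
    case "set":
    case "book":
    case "reference":
      return "extension";
    case "section":
    case "chapter":
    case "appendix":
    case "article":
    default:
      return "general";
  }
}

// Hypothetical helper: buckets index entries by their derived type.
function groupByType(entries) {
  const groups = {};
  for (const entry of entries) {
    const type = typeForElement(entry.element);
    if (!groups[type]) {
      groups[type] = [];
    }
    groups[type].push(entry);
  }
  return groups;
}

const groups = groupByType([
  { id: "install", ldesc: "Installation and Configuration", element: "book" },
  { id: "pdo.connections", ldesc: "Connections and Connection management", element: "chapter" },
]);
console.log(groups);
```

Because the type depends only on the element tag, nothing in this scheme looks at whether the entry actually belongs to an extension.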

Issue 1: Incorrect grouping in "Extensions"

The first issue occurs in this section of the code:

web-php/js/search.js

Lines 130 to 134 in 27fbef1

case "set":
case "book":
case "reference":
    type = "extension";
    break;

The code assumes that any entry whose element tag is <book>, <set>, or <reference> belongs to an extension, which is inaccurate: many entries use these elements without being part of any extension.

Example data:

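| id                 | ldesc                            | element   |
|--------------------|----------------------------------|-----------|
| getting-started    | Getting Started                  | book      |
| install            | Installation and Configuration   | book      |
| ...                | ...                              | ...       |
| reserved.variables | Predefined Variables             | reference |
| wrappers           | Supported Protocols and Wrappers | reference |
| ...                | ...                              | ...       |

Query used to list these entries:

SELECT "docbook_id", "ldesc", "element"
FROM "ids"
WHERE "element" IN ('book','set','reference')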

Issue 2: Incorrect grouping in "Other Matches"

The second issue is due to an assumption in the following code:

web-php/js/search.js

Lines 136 to 141 in 27fbef1

case "section":
case "chapter":
case "appendix":
case "article":
default:
    type = "general";

The code assumes that entries with the tags <section>, <chapter>, <appendix>, or <article> do not belong to an extension. This is less severe than the first issue, but many pages that are part of an extension currently end up in the "Other Matches" group:

| id                 | ldesc                                 | element  |
|--------------------|---------------------------------------|----------|
| ...                | ...                                   | ...      |
| apcu.installation  | Installation                          | section  |
| apcu.configuration | Runtime Configuration                 | section  |
| ...                | ...                                   | ...      |
| pdo.setup          | Installing/Configuring                | chapter  |
| pdo.constants      | Predefined Constants                  | appendix |
| pdo.connections    | Connections and Connection management | chapter  |
| ...                | ...                                   | ...      |

Query used to list these entries:

SELECT "docbook_id", "ldesc", "element"
FROM "ids"
WHERE "element" IN ('section','chapter','appendix','article')
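Both misclassifications can be reproduced with a small sketch that applies only the two quoted case groups to rows copied from the tables in this issue. The function name `quotedType` and the `rows` array are illustrative, not part of `js/search.js`, and every tag other than `set`, `book`, and `reference` is collapsed into "general" here for brevity.

```javascript
// Reproduces only the two case groups quoted from js/search.js.
// All other tags fall through to "general" in this sketch.
function quotedType(element) {
  switch (element) {
    case "set":
    case "book":
    case "reference":
      return "extension";
    default:
      return "general";
  }
}

// Rows copied from the example tables in this issue.
const rows = [
  { id: "getting-started", ldesc: "Getting Started", element: "book" },
  { id: "reserved.variables", ldesc: "Predefined Variables", element: "reference" },
  { id: "apcu.installation", ldesc: "Installation", element: "section" },
  { id: "pdo.constants", ldesc: "Predefined Constants", element: "appendix" },
];

// One line per row: the id and the group the row would land in.
const report = rows.map((row) => `${row.id} -> ${quotedType(row.element)}`);
console.log(report.join("\n"));
```

The first two rows come out as "extension" although neither page documents an extension, while the apcu and pdo rows come out as "general" although both belong to one.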

PHP Manual index dump

For convenience, here is the dump from the PHD SQLite index for the PHP Manual: php-manual-index_2024-10-08.sql.gz
