Search: customize behavior with hooks #4980
Comments
Thanks for suggesting. Could you please explain your desired outcome? "Customizing to fit different use cases" is too broad to be actionable. Also, customizing the search tokenizer for different languages can be implemented by using different separators for two different sites. In the discussion you linked, you mentioned:
This was reported and fixed in #4884 and released in 9.0.7. The functionality for search transformation was removed because the current approach was not general enough and did not scale well. I'm happy to add a new extension or transformation hook that allows intercepting the query (and maybe the results) and altering them before they are returned, but we need to collect some use cases before tackling that.
@squidfunk sure: be able to modify the lunr.js pipeline to fit other use cases not covered by the default search pipeline - like this one: Two stage tokenization to add full strings to the index.
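For readers unfamiliar with the idea, here is a minimal sketch of what two-stage tokenization could look like. It is not taken from Lunr.js or Material for MkDocs: `twoStageTokenize` and the separator regex are hypothetical names chosen for illustration. Stage one splits on whitespace only, so the full string is kept; stage two splits each of those strings on the separator and adds the parts as well.

```typescript
// Hypothetical helper, not part of Lunr.js or Material for MkDocs.
function twoStageTokenize(text: string, separator: RegExp): string[] {
  const tokens: string[] = []

  // Stage 1: split on whitespace only, so "foo.bar" survives as one token.
  for (const full of text.toLowerCase().split(/\s+/).filter(Boolean)) {
    tokens.push(full)

    // Stage 2: split the same string on the separator.
    const parts = full.split(separator).filter(Boolean)

    // Add the parts only if they differ from the full token itself.
    if (parts.length !== 1 || parts[0] !== full) {
      tokens.push(...parts)
    }
  }
  return tokens
}

// The separator here is an assumption; a site would use its own setting.
const twoStageTokens = twoStageTokenize("Call lunr.Pipeline.add here", /[.\-_]+/)
console.log(twoStageTokens)
```

With this input, both the full string `lunr.pipeline.add` and its parts `lunr`, `pipeline`, and `add` end up in the token list, which is the behavior a single separator regex cannot produce on its own.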
@squidfunk I don't think you can add tokens found that are separated by spaces and tokens found that follow some separator regex (two-stage tokenization) using the current
@squidfunk I'm confused by this statement. This is the "old way" of doing it. The "new way" would be to take advantage of the features that lunr.js provides (pipelines)...right? Not sure of the effort involved to enable us to customize our own pipelines, but that is what this issue is asking for.
I may have phrased that badly. Yes, it's the old way, but I was talking about rethinking that process, as it only allowed for query transformation and nothing else. Transformation is now done as part of the worker, so maybe we could provide hooks into different parts of the search index, possibly exposing one hook to alter Lunr.js before it starts indexing documents. This would effectively allow you to implement your own pipeline functions, with which you should be able to achieve what you were aiming for when we talked about the two-stage tokenization approach.
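To make the idea of a "before indexing" hook concrete, here is a rough sketch. It deliberately does not import Lunr.js: `Pipeline` below is a simplified stand-in that works on plain strings, and `BeforeIndexHook`, `buildPipeline`, and `expandDotted` are hypothetical names, not an existing Material for MkDocs API. In Lunr.js itself, custom functions are registered with `lunr.Pipeline.registerFunction` and attached to the builder's pipeline.

```typescript
// Simplified stand-in for a Lunr.js-style pipeline: each function receives a
// token and returns one token, several tokens, or null to drop the token.
type PipelineFunction = (token: string) => string | string[] | null

class Pipeline {
  private stack: PipelineFunction[] = []

  add(...fns: PipelineFunction[]): void {
    this.stack.push(...fns)
  }

  run(tokens: string[]): string[] {
    let current = tokens
    for (const fn of this.stack) {
      const next: string[] = []
      for (const token of current) {
        const result = fn(token)
        if (result === null) continue
        next.push(...(Array.isArray(result) ? result : [result]))
      }
      current = next
    }
    return current
  }
}

// Hypothetical hook: called once with the pipeline before indexing starts.
type BeforeIndexHook = (pipeline: Pipeline) => void

// Built-in stage (assumed): strip non-word characters from both ends.
const trimmer: PipelineFunction = token =>
  token.replace(/^\W+|\W+$/g, "") || null

// User-defined stage: keep the dotted string and add its parts.
const expandDotted: PipelineFunction = token =>
  token.includes(".") ? [token, ...token.split(".").filter(Boolean)] : token

function buildPipeline(hooks: BeforeIndexHook[]): Pipeline {
  const pipeline = new Pipeline()
  pipeline.add(trimmer)
  for (const hook of hooks) hook(pipeline)
  return pipeline
}

const customPipeline = buildPipeline([p => p.add(expandDotted)])
const indexedTokens = customPipeline.run(["(search.config)", "--", "hooks"])
console.log(indexedTokens)
```

The point of the sketch is only the shape of the extension point: the worker owns the pipeline, and a hook gets one chance to modify it before any documents are indexed.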
To expand on that: we moved transformation into the worker, so the worker is completely self-contained, i.e., it defines all behavior. This makes integration of third-party search solutions simpler, as the application itself applies no processing to the query before sending it to the worker. Before, query transformation was done in the application, and the result was then sent to the worker. All of this made the current approach infeasible, since it involves defining a function in the global scope that is called by the application if defined. We need a new approach for search transformation / extension, but before starting work on that, I wanted to verify that this is still something that is needed 😊 We'll add it back shortly. If you have other ideas that we should consider and requirements we need to fulfill, please share them here. So far we collected:
@squidfunk so according to the lunr.js docs I think pipelines do this as well, no? From the lunr.js docs:
So I think only point 2 and 3 would need to be implemented:
And for point 3 you might just be able to link to the Lunr.js docs like you do for the
I'm not sure whether pipelines allow changing the entirety of the syntax, that is, Lunr.js field references and operators for boosting, as well as inclusion and exclusion. I think pipelines only allow removing, replacing, expanding or adding tokens. Thus, I believe that in the following query, only the terms in brackets are moved through the pipeline:
This would not allow splitting or replacing meta characters, or introducing additional prefix or suffix wildcards. However, more research is needed. If you wish to dig into this, it would be awesome to get some intel. Otherwise, I'll do that later.
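The distinction between terms and query syntax can be illustrated with a small sketch. It does not use the Lunr.js parser: `transformQuery` is a hypothetical function with a hand-written regular expression that only covers presence (`+`/`-`), field references (`field:`), boosts (`^N`) and edit distances (`~N`). It models the assumption stated above, namely that only the term inside each clause is passed through the transformation while the surrounding syntax is left alone. It is not verified Lunr.js behavior.

```typescript
// Hypothetical helper: applies `fn` to the term of each clause only.
function transformQuery(query: string, fn: (term: string) => string): string {
  // presence, optional field, term, trailing boost / edit-distance modifiers
  const pattern = /^([+-]?)(?:(\w+):)?([^\^~]*)((?:[\^~]\d+)*)$/

  return query
    .split(/\s+/)
    .filter(Boolean)
    .map(clause => {
      const match = pattern.exec(clause)
      if (!match) return clause // leave anything unrecognized untouched

      const presence = match[1] ?? ""
      const field = match[2] ? `${match[2]}:` : ""
      const term = match[3] ?? ""
      const modifier = match[4] ?? ""

      // Only the term is transformed; the syntax around it is reassembled.
      return `${presence}${field}${fn(term)}${modifier}`
    })
    .join(" ")
}

const rewrittenQuery = transformQuery(
  "+title:Search^10 -Draft hooks~1",
  term => term.toLowerCase()
)
console.log(rewrittenQuery)
```

Under this model, a transformation can lowercase, replace or expand `Search`, `Draft` and `hooks`, but it never sees `+`, `title:`, `^10` or `~1`, which is why changing operators or adding wildcards would need a hook at the query level rather than a pipeline function.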
Please see the announcement in #6307.
I've reopened #6632 which specifically requests to make
Context
Since v9.0.0, customizing the search tokenization has been limited to modifying the `separator` keyword in the config (support for custom search transform functions was removed).
Also since v9.0.0, we are able to choose from 3 different lunr.js pipeline functions (`stemmer`, `stopWordFilter`, and `trimmer`) to modify the search pipeline.
Description
Instead of limiting us to only those 3 options (`stemmer`, `stopWordFilter`, and `trimmer`), we should be able to specify our own pipeline function (ideally just like how we do so with emojis).
So something like this is what I had in mind:
Not sure if that is allowed or if we would be required to write it in ts or js, but you get the idea.
Related links
Use Cases
Visuals
No response