This repository was archived by the owner on Nov 7, 2025. It is now read-only.
🚧 [WIP] Fixing "surrounding documents" view with new _id implementation #1446
The challenge here is to fix the "surrounding documents" view in Kibana given the new unique ID (_id field) implementation (ref: #1435). See the Related screenshots section.
At the moment Kibana just says "No documents newer/older than the anchor could be found".
Present situation
Our current implementation of the _id field is based on rendering this value dynamically, after fetching all the data from ClickHouse. It looks like this:
While at the query parsing/execution phase we can of course access the timestamp field, the "hash of the document" part is computed during JSON response rendering.
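To make the split between the two phases concrete, here is a minimal sketch of an ID made of a timestamp part and a document-hash part. The layout (hex timestamp, a "-" separator, a truncated SHA-256) and the names makeID and timestampFromID are assumptions for illustration only, not the format or functions this PR actually uses.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// idSeparator and the "<hex timestamp><sep><hash>" layout are assumptions
// made for this sketch, not the project's actual _id format.
const idSeparator = "-"

// makeID builds a hypothetical _id from a millisecond timestamp and the
// rendered document body. The hash part needs the full rendered document,
// so it can only be computed after the rows have been fetched.
func makeID(tsMillis int64, renderedDoc string) string {
	sum := sha256.Sum256([]byte(renderedDoc))
	return strconv.FormatInt(tsMillis, 16) + idSeparator + hex.EncodeToString(sum[:8])
}

// timestampFromID recovers the timestamp part of an _id. This is the only
// part that can be turned into a SQL condition at query-parsing time.
func timestampFromID(id string) (int64, error) {
	tsHex, _, found := strings.Cut(id, idSeparator)
	if !found {
		return 0, fmt.Errorf("malformed _id %q", id)
	}
	return strconv.ParseInt(tsHex, 16, 64)
}

func main() {
	id := makeID(1700000000000, `{"message":"hello"}`)
	ts, err := timestampFromID(id)
	fmt.Println(id, ts, err)
}
```

Under these assumptions, two documents sharing a timestamp get IDs that differ only in the hash part, which is exactly the part the SQL layer cannot see.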
The current implementation stores a list of IDs in ClickhouseQueryTranslator (UniqueIDs) - therefore we know that this query was using the _id field and that we have to apply extra logic on JSON response rendering. The situation is quite clear for a simple filtering query:
When parsing the SQL we simply take the first (timestamp) part of the ID in the query, build a relevant WHERE clause which filters out all the non-matching timestamps, and then compare the doc hashes during JSON response rendering (see platform/parsers/elastic_query_dsl/query_translator.go). Of course we have to make sure that we don't fall back to the default LIMIT 10 in our SQL, because we might not have enough documents to filter from (see (cw *ClickhouseQueryTranslator) parseSize).
Problem
When fetching "surrounding documents", Kibana sends the following query:
And the must_not query becomes quite problematic. It's pretty obvious that when fetching the next/previous N documents, we don't want to include the anchor document.
So at the SQL query level we cannot filter out all the documents with a matching timestamp, because our next document might have exactly the same timestamp. One approach would be to add a schema transformer that handles this case.
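The difference between excluding by timestamp and excluding by full ID can be illustrated with a small sketch. The hit type and both helper functions are hypothetical and exist only to show the effect; they are not taken from the codebase.

```go
package main

import "fmt"

// hit is a hypothetical, simplified search hit.
type hit struct {
	ID        string // full _id, only known after response rendering
	Timestamp int64  // the part visible at the SQL level
}

// excludeByTimestamp mimics a SQL-level "timestamp != anchor" condition.
// It also drops every neighbour that shares the anchor's timestamp.
func excludeByTimestamp(hits []hit, anchorTs int64) []hit {
	var out []hit
	for _, h := range hits {
		if h.Timestamp != anchorTs {
			out = append(out, h)
		}
	}
	return out
}

// excludeByID drops only the hits whose full _id is in the excluded set,
// which is what a must_not on ids asks for.
func excludeByID(hits []hit, excluded map[string]struct{}) []hit {
	var out []hit
	for _, h := range hits {
		if _, skip := excluded[h.ID]; !skip {
			out = append(out, h)
		}
	}
	return out
}

func main() {
	hits := []hit{
		{ID: "anchor", Timestamp: 10},
		{ID: "same-ts-neighbour", Timestamp: 10},
		{ID: "later", Timestamp: 11},
	}
	fmt.Println(len(excludeByTimestamp(hits, 10)))
	fmt.Println(len(excludeByID(hits, map[string]struct{}{"anchor": {}})))
}
```

In this sketch the timestamp-based filter loses the neighbour that shares the anchor's timestamp, while the ID-based filter removes the anchor alone. The ID-based filter can only run once the full IDs are known, that is, after rendering.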
On response rendering, we cannot rely on the current logic, which keeps only the matching IDs, because here we want the opposite effect. However, this is the post-query phase, and at that level we're unaware of the query - we just have hits as the result, but we don't know whether ids was inside must_not (or any other logical clause).
There are a few gotchas here:
- size passed in the query lands in the SQL's LIMIT clause - we have to ignore it
- _id in any aggregation might produce completely absurd results
Possible solution
TODO: check this PR
Related screenshots