bf: search with aggregate instead of find #1213
base: development/8.1
Conversation
Hello vrancurel, my role is to assist you with the merge of this pull request. Status report is not available.
Waiting for approval: the following approvals are needed before I can proceed with the merge.
Peer approvals must include a mandatory approval from @jonathan-gramain.
Force-pushed from f6de7b0 to cecd214
LGTM
Were you able to measure the impact (cpu, elapsed_ms, index usage etc) of using this aggregate?
{ $match: searchOptions }, // user query
{ $match: query }, // for paging
I was wondering if the pipeline would benefit from first filtering by the indexed _id, followed by the { $match: searchOptions }, which can possibly involve a full collection scan.
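As a sketch of this suggestion (function and variable names here are hypothetical, not from the PR), the cheap indexed paging filter would come first so the planner can use the _id index, narrowing the set the user query has to scan:

```javascript
// Hypothetical sketch: run the indexed paging filter before the
// user-defined search filter, so only the narrowed key range is scanned.
function buildSearchPipeline(searchOptions, pagingQuery) {
    return [
        { $match: pagingQuery },    // paging filter on indexed _id (fast)
        { $match: searchOptions },  // user query, applied to the narrowed set
    ];
}
```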
What we want is to first run the user-defined query (searchOptions), then save the result on disk (the result set can be larger than 100MB), then sort this result set.
But I think we need to add a $out stage into a temporary collection and obtain a cursor on this temp collection instead.
Later on, a sweeper shall delete the temp collections.
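A sketch of what that pipeline could look like, with a hypothetical temp-collection naming scheme (the `__search_` prefix and `searchId` parameter are assumptions for illustration, not the PR's actual names):

```javascript
// Hypothetical sketch: materialize the (possibly >100MB) result set into a
// temporary collection via $out instead of returning it inline; a periodic
// sweeper can later drop collections carrying this prefix.
const TMP_PREFIX = '__search_';

function buildOutPipeline(searchOptions, searchId) {
    return [
        { $match: searchOptions },            // user-defined query
        { $sort: { _id: 1 } },                // stateless paging order
        { $out: `${TMP_PREFIX}${searchId}` }, // stage the results on disk
    ];
}
```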
Force-pushed from 6401dc7 to c5ede0a
I did not find a way to $out in a separate namespace. So we will need to modify the oplog to filter out __search collections.
Force-pushed from c5ede0a to 47c18d6
We should also include the date of the query when computing the cache hash.
allowDiskUse: true, // stage large queries on disk
},
null);
_cursor.toArray(err => {
This can possibly lead to out-of-memory if the aggregate result becomes too large.
No, the result is empty because there is a $out.
But one problem with this approach is that it returns nothing until the query is done, which is weird from the API client's perspective. Any suggestions?
If it's streaming the output, maybe use a combined approach of streaming 1000 documents back to the client while starting a background async job that writes to the temporary collection:
https://mongodb.github.io/node-mongodb-native/3.6/reference/cursors/#stream-api
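This combined approach could be sketched roughly as follows (the function and its parameters are hypothetical; the writer callback stands in for the background job that would persist overflow documents to the temporary collection):

```javascript
// Hypothetical sketch: forward the first `firstPageSize` documents to the
// API client immediately, and hand the remainder of the stream to a
// background writer targeting the temporary collection.
async function streamWithSpill(cursor, firstPageSize, writeToTemp) {
    const firstPage = [];
    for await (const doc of cursor) {
        if (firstPage.length < firstPageSize) {
            firstPage.push(doc);      // returned to the client right away
        } else {
            await writeToTemp(doc);   // spilled for later pages
        }
    }
    return firstPage;
}
```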
It sounds complex and dirty; ideally I would delegate the task to some other worker, but I can't see how.
Yes, we could basically run the user query in an aggregate AND in a find() at the same time. But that would execute the query twice... just to avoid sessions, that's a bit too much...
Actually the $sort here is also killing the query plan, and the good news is... there is no need for the $sort, because in the $out collection the keys will already be sorted...
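Following that observation, the pipeline could drop the explicit $sort and rely on reading the $out collection back in key order through its default _id index. A sketch, with hypothetical names:

```javascript
// Hypothetical sketch: no $sort stage in the aggregate (it defeats the
// query plan); pages are instead read from the temp collection sorted by
// the default _id index.
function buildUnsortedOutPipeline(searchOptions, tmpName) {
    return [
        { $match: searchOptions }, // user-defined query only
        { $out: tmpName },         // staged result, indexed on _id
    ];
}

function pageQuery(afterKey) {
    // read a page from the temp collection in _id order
    return {
        filter: afterKey ? { _id: { $gt: afterKey } } : {},
        sort: { _id: 1 },
    };
}
```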
Force-pushed from 47c18d6 to 5132cdb
// fallthrough
// eslint-disable-next-line
params.searchOptions = searchOptions;
return this.internalListObject(
For the first page it reverts to regular search (as before).
We also need to limit it.
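A sketch of what limiting the first (regular find) page could look like; the helper name and the maxKeys + 1 truncation-detection convention are assumptions for illustration, not the PR's actual code:

```javascript
// Hypothetical sketch: cap the first page at maxKeys + 1 documents, the
// extra key telling the listing whether a next page exists and paging via
// the cached aggregate results should kick in.
function firstPageFindParams(searchOptions, maxKeys) {
    return {
        filter: searchOptions,
        sort: { _id: 1 },    // stateless paging order
        limit: maxKeys + 1,  // one extra key to detect truncation
    };
}
```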
Force-pushed from 5132cdb to 48ef858
The search was using find().sort(), which was disrupting user-defined search queries and custom indexes. The sort() is needed to implement a stateless paging system. The combination of user-defined query and sort is now implemented with a 2-stage aggregate on the server side. We always limit the execution time (maxTimeMs) to 5 minutes (tunable by an environment variable). The result is staged in a temporary bucket and cached for paging. We rely on an external job to clean up the searches (e.g. daily).
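The tunable time limit described above could be read roughly as follows (the environment variable name and helper are assumptions for illustration; only the 5-minute default comes from the description):

```javascript
// Hypothetical sketch: cap aggregate execution time at 5 minutes by
// default, tunable through an environment variable (name assumed here).
const DEFAULT_MAX_TIME_MS = 5 * 60 * 1000; // 5 minutes

function getMaxTimeMs(env) {
    const v = parseInt(env.SEARCH_MAX_TIME_MS, 10);
    return Number.isNaN(v) ? DEFAULT_MAX_TIME_MS : v;
}
```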
Force-pushed from 48ef858 to d8e51a6
Note that the number of concurrent search queries is for now not limited (and it should be). We know that the aggregate will put an additional burden on the primary, so micro-sharding shall be implemented to divide and conquer. But that is out of scope for this PR.