-
-
Notifications
You must be signed in to change notification settings - Fork 218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make freetext search view indexes "offline only" #9544
Comments
Here are my conclusions after taking a deep dive into possible implementations. The main approach options seem to be:
Alternative rejected:I also took some time to look into any Pouch-specific alternative approaches to freetext searching (since we will no longer be tied to solutions that also work in Couch). Unfortunately, there is absolutely nothing viable here. My original hope was that Mango Beyond Mango, any discussions of freetext searching in Pouch seem to mention pouchdb-quick-search which has not been updated in |
Also here is my conclusion and next steps based on the above analysis:
So, I am currently doing further investigation and prototyping for |
I also agree that Right now, when the app gets an update, there are two things that need to be updated: 1. the medic-client ddoc and 2. the actual app code. We do tell people that they should reload their app to use the new code, but it's entirely possible some app somewhere runs and old webapp code with the new ddoc. I remember we even tracked this through telemetry on some instance that was reporting errors at some point. There's an issue: #7146 |
As far as I know pouchdb doesn't do background indexing, and ignores the Another option would be to store the client only ddoc as an attachment on the I think I'd go for |
I have done some more research here both into the Nouveau api and into the existing functionality of
My current proposal is that we implement the freetext searching for contacts/reports in cht-datasource. Then, we update
I do not think we should implement multi-view searches in cht-datasource (unless we can come up with a feasible way to do paging). Instead we can start by implementing just the single-view freetext searches in cht-datasource for now. Later we can add support for the other single-view searches. (Ultimately, I think it would be best to refactor any code that does not need multi-view searches to just call cht-datasource directly, instead of calling through @m5r @sugat009 I would love your feedback on this proposed cht-datasource API. Thanks!: // ContactTypeQualifier already exists
type FreetextQualifier = Readonly<{ freetext: string }>;
// New REST api: /api/v1/contact/id
namespace Contact {
const getIdsPage = (context: DataContext) => (qualifier: ContactTypeQualifier | FreetextQualifier, cursor: Nullable<string>, limit: number) => Promise<Page<string>>
const getIdsAll = (context: DataContext) => (qualifier: ContactTypeQualifier | FreetextQualifier) => AsyncGenerator<string, null>
}
// New REST api: /api/v1/report/id
namespace Report {
const getIdsPage = (context: DataContext) => (qualifier: FreetextQualifier, cursor: Nullable<string>, limit: number) => Promise<Page<string>>
const getIdsAll = (context: DataContext) => (qualifier: FreetextQualifier) => AsyncGenerator<string, null>
} |
One additional design note with regards to the cht-datasource implementation. We are still going to need a check in the local adapter code to see if the new client-side index exists. Basically here is the logic breakdown:
|
This step raised a red flag for me because it feels unnecessarily complex. I'm guessing this is forced from the pattern of using cht-datasource on both the server and client and trying to reuse code which is valiant. In other code the views you call on the client and the server are the same so it makes sense to share the code. But I think this is a special case, because here there is no shared code - the call to the Nouveau server will only ever be made on the server. Therefore I think the implementation of the REST API should be custom server code and not part of cht-datasource. I think this will make it easier to follow what's going on in the datasource code. Also a question about the index itself - I think we should be aiming for a single Nouveau index for all contacts and reports with one of the indexed fields being "type". My gut feeling is this will be more efficient (disk, memory, build speed) but that could be completely wrong. One way it would almost certainly be worse is if we change the way we index reports it will have to reindex all contacts as well even though nothing has changed, however it's quite likely that lucene reindexes so much more quickly that we shouldn't worry too much about that. If we go with the "one index" approach then it would make sense for the REST API to be more generic so you could theoretically search for either reports or contacts, eg: |
@jkuester On the API design. Let's be more explicit with the endpoint naming. If only one endpoint as Gareth suggested is opted then, |
I 100% agree that this is a bit smelly since the Nouveau code is only used on the server side (and the offline-only views will only be used on the client side). The problem, though, is with regards to the multi-view searches. The api I just have not been able to come up with any approach yet that would be cleaner then encapsulating the Nouveau logic (and the view check) in cht-datasource and letting everything else just flow through.... The ideal scenario, IMHO, would be to make a breaking change to API (CHT 5.0?) where consumers are no longer allowed to do multi-view searches via the IDK... maybe it makes the most sense to have it in cht-datasource for now, but remove it later if we update the
Yeah, I think we have a lot of unanswered questions currently about how the Nouveau index(es) should be configured. That is all more of an investigation for #9542, though it could definitely influence how we structure the REST/cht-datasource apis. I do want to push back a bit on the idea of having a Based on our current search implementation in |
Ahh I see the problem. Thanks for walking me through it! There is potentially another solution if we're willing to double down on Nouveau to replace the entire server side search with the lucene search by indexing all the fields we can currently filter on. This makes the
Your naming is better so let's stick with the convention of resources not actions. I agree with your pragmatic approach of providing the specific APIs first, and adding the generic one later if and when needed. From my understanding lucene works best when dealing with one big heap of docs and loads of keys, rather than separate index silos. This also gives us a lot of flexibility to write future server side queries without having to reindex. So even if we don't expose it via an endpoint just yet I'd like to build a little future proofing into the index. Caveat: these assertions need to be tested for our use case! |
FTR, I have split the cht-datasource changes off into their own sub-issue: #9586 This sub-issue will just be focused on adding offline views to webapp (and a bit of logic to actually use those views in |
As originally discussed on Slack, there is a complication with removing the
The relevant one for this case is The simple conclusion here is that we have to keep the
IDK, I still cannot come up with any way to avoid the |
@jkuester I can't think of a good reason to run validation during muting on the client side. Firstly these validations are rare for xforms because it's better handled through xform validations. Secondly as you say it's flawed because if validation passes on client side that's no guarantee it'll pass on the server with all docs available. The validation like something that was added by accident when we added muting to the client and was no intentional. So in general I would seriously look at removing it in order to improve client side performance by reducing the size of the views. However the original driver for this effort is server side TCO so this is out of scope. Assuming it's straightforward to keep the existing views in the current state let's just do that and raise an issue to optimise client side views at a later date. |
@garethbowen Agreed! The only extra consideration is that I have not found any way to move the current freetext views from one design doc to another (e.g. to the new offline doc that only exists on the clients) that does not require a full rebuilding of the views. This means that we are going to trigger a full rebuild of these indexes once when switching to the offline index and then again when we go do to this It is a bit tricky to profile the exact performance implications this will have since it is highly dependent on the device specs and the number of docs associated with the user. To just give us some reference numbers, I created a test user with These numbers are not the end of the world, but they will be incurred by every device for all offline users. But, this technically is the same price that is always paid when logging in as an existing user on a new device, so maybe I should not worry so much... 🤷 |
After further investigation, this might not be true! I did a deep dive into the Pouch caching functionality and it appears the cache keys are only based on the contents of the If this is true, though, then it gives us a clear benefit to just copying the existing freetext views into our offline design doc (and avoiding forcing a re-indexing). 👍 |
Hi @garethbowen , @jkuester , Related to this, SMS workflows use different CHT validation rules directly to do various validations that query freetext search views. These rules use query |
@1yuv the plan is to keep a way to search through reports and contacts on the server, but to do it in a more storage-effective way with CouchDB Nouveau. The effort to research this new search implementation is tracked in #9542. |
@1yuv, out of curiosity, what are the validations doing with the
These freetext searches are a very blunt tool and it would be good to know if there are other specific use-cases that could be properly optimized with dedicated indexes. |
Hi @jkuester , we use some functions like unique which uses exists in cht-core. We use |
Okay, so the final conclusion here regarding if it is possible to "move" a view to an offline-only ddoc without triggering a re-index is, NO, it is not possible. 😞 As detailed in pouchdb/pouchdb#9006, the new offline index would share the cached view index from the original @garethbowen this brings us back to the question of if we want to just pay this re-indexing cost twice (once to make the index offline and again later to remove the |
I was wondering about the manual view move hack too but it seems too risky. It feels like there's another option where instead of having two different client side ddocs with different names, what about one ddoc with different views? I think it's only slightly more confusing than the other option and means we can keep the views. Another option, which is potentially a complete diversion, is to switch to pouchdb 9 and use the idbnext adapter which builds views in a completely different way, may be much faster, and at least there's a gain to come from the forced client side rebuild. We're risking turning this feature into a giant ball of improvements but it might make it palatable. Otherwise, yes, if we're paying the view build cost let's re-re-review the index definition to be optimal (ie: remove unused keys). |
If by "different views" you mean "the same views", then this might work. Ultimately, the plan is to remove the
Ohh the temptation! This is a good point about the full re-index coming from the switch to How do you feel about the long-term maintenance of offline-only views being inserted into the |
Oh it's definitely sketchy. The main issue I see is IIRC we block modifying the ddocs on the client so you'd have to change that, and then find a way to stop them trying to replicate back to the server. This is similar to having a separate ddoc that you don't want to replicate, but it's just a little sketchier because getting this wrong would be more destructive than having separate ddocs. |
I did some quick prototyping in the browser and I am able to modify the ddoc just fine (and add a view view). And, we are already blocking the ddocs from replicating back to the server, so that should not be an issue. However, I think I found the deal-breaker and that is the fact that when we modify the ddoc on the client but do not sync the modification back to the server, things get unpredictable for what happens when future updates to that ddoc are replicated from the server to the client. In my testing, if I only made one change to the ddoc on the server, then Pouch on the client would choose the local version of the ddoc as the winner and it would not reflect the server-side changes. OTOH, if I made multiple revisions to the ddoc on the server and then tried to replicate them all at once, Pouch on the client would choose the server version as the winner and replace my local changes with the server version. This kind of thing is a hard pass for me and I think we should instead move forwards with the plan to stick to a separate offline ddoc and try to align these changes with |
I have logged #9652 to cover the removal of validations during the client-side muting transition. If we move forwards with that issue, it should be completed before this issue here (so that we can safely not include the Either way, we want to try to align this issue with #9208 (switching to |
Possibly related: #9003 |
Is your feature request related to a problem? Please describe.
If we move to Nouveau for freetext searching on the Couch server (aka "online" use), then we still need a solution for freetext searching "offline" (what offline users would experience via their local Pouch instance).
Describe the solution you'd like
The most straightforward approach should be to just have a design doc that is "client-only" similar to how we have docs that are currently "server-only". This design doc will be indexed by PouchDB clients, but will not be indexed by the Couch server.
Since we are likely going to require a complete re-indexing of these views on the client devices one way or the other, we should take this opportunity to remove the unused
key:value
emissions from the freetext views as originally proposed for the "freetext-lite" views.Describe alternatives you've considered
One interesting alternative would come after our upgrade to Pouch
9
and theindexeddb
adapter. IIRC, Mango queries should be more performance on Pouch withindexeddb
. If that is the case, then we could evaluate possibly using Mango queries for freetext searching. This one blog post suggests it works well... 😅The text was updated successfully, but these errors were encountered: