Added datasets-taking-too-long warning #309

Closed

Conversation

yankovs
Contributor

@yankovs commented Dec 21, 2022

Your checklist for this pull request

  • I've read the contributing guideline.
  • I've tested my changes by building and running mquery, and testing changed functionality (if applicable)
  • I've added automated tests for my change (if applicable, optional)
  • I've updated documentation to reflect my change (if applicable)

What is the current behaviour?
Errors in ErrorBoundary are conditionally rendered, so it's either them or the actual content. Changed it so errors are visible alongside the content. Also, the code is missing some hinting at errors, as described in #307.
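
A minimal sketch of the render change described above, assuming a React class component (this is an illustration, not mquery's actual ErrorBoundary code):

    import React from "react";

    // Hypothetical sketch only; the real component's state and markup differ.
    class ErrorBoundary extends React.Component {
        constructor(props) {
            super(props);
            this.state = { error: null };
        }

        render() {
            // Before: either the error or the wrapped content was rendered.
            // After: the error (if any) is rendered alongside the content.
            return (
                <div>
                    {this.state.error && (
                        <div className="alert alert-danger">{this.state.error}</div>
                    )}
                    {this.props.children}
                </div>
            );
        }
    }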

Some points to think about:

  • what if there are actual errors (not this warning) that come from the API call? It seems like a "race condition" of errors may happen; I don't feel like I've covered all the test cases
  • it's currently not possible to distinguish between real errors (the API call fails because the server is down, for example) and warnings

What is the new behaviour?
Shows a warning (should it maybe be recolored to yellow?) that compaction should be considered when fetching datasets from the API takes more than 5 seconds.
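
For illustration, a minimal sketch of a withTimeout helper that would match the call quoted in the review below; the name and signature are my assumption, not necessarily what the diff contains:

    // Hypothetical helper: fire a warning callback if the request has not
    // resolved within `timeoutMs`, and cancel the warning once it resolves.
    function withTimeout(request, url, onTimeout, timeoutMs) {
        const timer = setTimeout(onTimeout, timeoutMs);
        return request(url).finally(() => clearTimeout(timer));
    }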

Test plan

Closing issues

(partially) fixes #307

@msm-code
Contributor

msm-code commented Dec 22, 2022

[I'm a bit drunk, so take these comments with a grain of salt]

Thanks for the PR! This is still a draft, so my comments are not a serious (code) review, just initial ideas:

        withTimeout(
            api.get,
            "/backend/datasets",
            () => this.setState({ error: "Loading topology is taking too long. Consider compaction." }),
            5000
        )

A hardcoded timeout is not very reliable; it may change on every reload, I guess. Maybe we should use some absolute numbers instead? For example, we could use the number of datasets and the average dataset size:

  • if number of datasets is < 10: no warning
  • if number of datasets is < 100: warn if average size of a dataset is smaller than 10k files
  • Otherwise warn if number of datasets is > 400 and average size of dataset is smaller than 100k files

(These are just random numbers off the top of my head. It's hard to give a good number for every machine, but 5 seconds == I guess 1000+ datasets is always bad.) The point of this change is to warn someone with an obviously bad database, not necessarily to catch all cases.
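
A sketch of that decision tree, assuming the datasets response exposes a per-dataset file count (the file_count field name is a guess, and the thresholds are the placeholder numbers above):

    // Hypothetical heuristic; thresholds and the field name are placeholders.
    function shouldWarnAboutCompaction(datasets) {
        const count = datasets.length;
        if (count < 10) {
            return false;
        }
        const averageFiles =
            datasets.reduce((sum, ds) => sum + ds.file_count, 0) / count;
        if (count < 100) {
            return averageFiles < 10000;
        }
        return count > 400 && averageFiles < 100000;
    }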

For example, to give one random data point: I have a server with a slow (but big) HDD RAID array. I'm setting up my small mquery instance, and I aim for datasets with up to 1 MM files, to save unnecessary reads. So if I index 40 MM files I'll have roughly 50 datasets. Of course for faster disks smaller datasets may be OK, but 5 seconds for topology still sounds slow :).

@yankovs
Contributor Author

yankovs commented Dec 22, 2022


Yeah, I can see how taking too long to load datasets can be caused by different circumstances which aren't necessarily too many datasets. Well, I treated the compaction warning as more of a general guideline, but if you think a more fine-grained message should be displayed then I'm all in for it :).

What do you think about introducing a notification system into mquery? I've done some testing and introduced a new dependency into mquery (which isn't preferred, right?) and it looks pretty nice:
[screenshot: notifications]
They're even stackable, which is something we can't achieve with the current ErrorBoundary design, where only one error can be displayed at a time:
[screenshot: notifications2]

I don't like the idea of adding another dependency instead of making it from scratch, but... it's fast 😄
It should be noted that installing it as-is caused errors with npm; it requires --legacy-peer-deps.

@msm-code
Contributor

I've done some testing and introduced a new dependency into mquery (which isn't preferred, right?) and it looks pretty nice:

I'm not a fan of dependencies, but also not a hater. If they bring useful features let's use them.

It does look pretty nice. Personally I'm not a fan of making this time-dependent. I'd rather we keep the notifications until the user dismisses them, at least for errors. On the other hand, dismissing things may be a bit annoying if it has to be done manually every time the status page is opened.

Well, I treated the compaction warning as more of a general guideline, but if you think a more fine-grained message should be displayed then I'm all in for it :).

I agree that it should be a general guideline, I just meant that maybe we could use the number of datasets to decide when to show the message, instead of using time. If my "decision tree" looks too complicated, we can just say that we show the warning when the number of datasets is > 400 (or so - that's quite a lot of datasets IMO).

They're even stackable, which is something we can't achieve with the current ErrorBoundary design, where only one error can be displayed at a time

I agree ErrorBoundary is probably not a great tool for this. We can use stackable notifications as you've said.
On the other hand, what do you think about using Bootstrap alerts? (https://getbootstrap.com/docs/5.0/components/alerts/) I think they would look good, but you would have to create a list of "notifications" yourself somewhere, and later show Bootstrap alerts from that list. Not sure if that's easy or very hard to do with the current code.
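
A rough sketch of that idea, assuming the status page keeps a plain list of {level, message} objects in its state and renders them as Bootstrap alerts (all names below are hypothetical):

    import React from "react";

    // Hypothetical component: render each notification as a dismissible
    // Bootstrap alert; the notification list itself lives in the page state.
    function Notifications({ notifications, onDismiss }) {
        return (
            <div>
                {notifications.map((n, index) => (
                    <div
                        key={index}
                        className={`alert alert-${n.level} alert-dismissible`}
                        role="alert"
                    >
                        {n.message}
                        <button
                            type="button"
                            className="btn-close"
                            aria-label="Close"
                            onClick={() => onDismiss(index)}
                        />
                    </div>
                ))}
            </div>
        );
    }

Warnings like the compaction hint could then simply be appended to that list instead of going through ErrorBoundary.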

@msm-cert
Member

Hi,

that was fixed a bit differently in #418 - are you OK with that solution? I'm closing this one, since I think the goal of this PR was achieved, but feel free to reopen if I'm wrong.

@msm-cert msm-cert closed this Oct 22, 2024
Development

Successfully merging this pull request may close these issues.

Implement some "hint" banners in the admin panel