Add multi-key sorting of statuses #571

Open
1 of 9 tasks
jzohrab opened this issue Jan 17, 2025 · 2 comments
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@jzohrab
Collaborator

jzohrab commented Jan 17, 2025

From discord link

Sorting by status works great when the book has unknown words, but if books have none, or the same number of unknown words, the sort doesn't fall back to the next-lowest status tier. For example, I can't sort this table since the books have no unknowns:

Image

I guess it should sort by unknowns, then by status 1 through 5.

Notes

The book listing is in lute/templates/book/tablelisting.html. The graph column first gets the UnknownPercent from the datatables query in lute/book/datatables.py, so the unknown percent is the value used for sorting. The graph is then rendered "on top" of that, with ajax_in_book_stats in the html.

The UnknownPercent is pulled from the bookstats table:

CREATE TABLE IF NOT EXISTS "bookstats" (
	"BkID" INTEGER NOT NULL  ,
	"distinctterms" INTEGER NULL  ,
	"distinctunknowns" INTEGER NULL  ,
	"unknownpercent" INTEGER NULL  ,
	status_distribution VARCHAR(100) NULL,
	PRIMARY KEY ("BkID"),
	FOREIGN KEY("BkID") REFERENCES "books" ("BkID") ON UPDATE NO ACTION ON DELETE CASCADE
);

with values like this:

sqlite> select * from bookstats;
4|378|220|58|{"0": 220, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "98": 0, "99": 158}

"58" here is the percent of unknown terms.

To add this feature correctly, the code needs to replace the single value "58" with a composite key (in this case something like {"0": "048", "1": "023", "2": "011", "3": ... etc }), using three digits for each element so that a slot can hold 100.
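A rough sketch of how such a composite key could be built from the status_distribution JSON. The function name composite_sort_key is hypothetical, and the sketch assumes that only statuses 0 through 5 go into the key, that each percentage is taken relative to the sum of all counts (including 98 and 99), and that values are rounded to the nearest integer. None of that is settled in this issue.

```python
import json

# Assumption: only unknowns (0) and statuses 1-5 take part in the sort.
SORT_STATUSES = ["0", "1", "2", "3", "4", "5"]


def composite_sort_key(status_distribution):
    """Build a fixed-width, string-sortable key from a distribution JSON string.

    Hypothetical helper, not existing Lute code.
    """
    counts = json.loads(status_distribution)
    total = sum(counts.values())
    key = {}
    for status in SORT_STATUSES:
        count = counts.get(status, 0)
        # A book with no terms gets all-zero slots instead of dividing by zero.
        pct = round(100 * count / total) if total > 0 else 0
        # Three digits per slot so that "100" still sorts after "099".
        key[status] = f"{pct:03d}"
    return json.dumps(key)


row = '{"0": 220, "1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "98": 0, "99": 158}'
print(composite_sort_key(row))
# {"0": "058", "1": "000", "2": "000", "3": "000", "4": "000", "5": "000"}
```

Because every key has the same fields in the same order and every value has the same width, comparing two keys as plain strings compares the unknown percent first, then status 1, and so on.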

Doing all of this as part of a DB migration would probably require running a python script that loads the status_distribution json string and does the calculations, and the migration tool currently can't do that. Instead, a data clean-up job could run at startup, check the content of the sort string, and do some fast processing if it doesn't match the required pattern.

Updated spec after discord discussion.

Probably the best way to do this is to add a calc_sort_keys method to lute/book/stats.py that calculates a new status_distribution_percent field when that field is null and status_distribution is not null. Call it on every call to the datatables function so the key is always up to date.
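One possible shape for calc_sort_keys, as a sketch only. It uses the standard-library sqlite3 module to stay self-contained, which may not match the database layer Lute actually uses. The _sort_key helper and the key format are assumptions, and rows with unusable JSON are simply left null here.

```python
import json
import sqlite3

SORT_STATUSES = ["0", "1", "2", "3", "4", "5"]  # assumption: statuses in the key


def _sort_key(distribution_json):
    """Return a fixed-width key string, or None if the input is unusable."""
    try:
        counts = json.loads(distribution_json)
        total = sum(counts.values())
    except (ValueError, TypeError, AttributeError):
        # Empty string, bad JSON, or JSON that is not a dict of numbers.
        return None
    key = {}
    for status in SORT_STATUSES:
        pct = round(100 * counts.get(status, 0) / total) if total > 0 else 0
        key[status] = f"{pct:03d}"
    return json.dumps(key)


def calc_sort_keys(conn):
    """Fill status_distribution_percent where it is null; return rows updated."""
    rows = conn.execute(
        "SELECT BkID, status_distribution FROM bookstats "
        "WHERE status_distribution_percent IS NULL "
        "AND status_distribution IS NOT NULL"
    ).fetchall()
    updated = 0
    for bk_id, distribution in rows:
        key = _sort_key(distribution)
        if key is None:
            continue  # leave the field null rather than store a bad key
        conn.execute(
            "UPDATE bookstats SET status_distribution_percent = ? WHERE BkID = ?",
            (key, bk_id),
        )
        updated += 1
    conn.commit()
    return updated
```

Since only rows with a null key are selected, calling this on every datatables request should be cheap once the keys are filled in, apart from rows with bad JSON, which get re-read each time.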

todos I can think of

  • consider whether this can be linked somehow to A better way to sort by difficulty #453 -- no, don't bother, stick with the current state
  • table migration, add new status_distribution_percent, varchar(100). No need to drop the "unknownpercent" column, it's ok to leave that for later.
  • lute/book/stats.py, add calc_sort_keys method to load status_distribution_percent. Unit tests: nulls, bad json in the distribution field, empty string, valid distribution, distribution with 100% and 0%
  • add unit test for book with no words (e.g. english book, text = "123")
  • call calc_sort_keys in the book datatables python
  • change the book stats calculation to call calc_sort_keys
  • change the datatables to use the new field, template to use sort_key
  • graph uses the new field, so the old complicated JS code can be removed
  • once launched, maybe create a separate task to delete the unknownpercent column as it's no longer used
@jzohrab jzohrab added the bug Something isn't working label Jan 17, 2025
@jzohrab jzohrab added this to Lute-v3 Jan 17, 2025
@jzohrab
Collaborator Author

jzohrab commented Jan 17, 2025

The above is my feeling about what needs to happen for this to work; but I could be wrong.

I don't believe it would be possible to calculate the sort index dynamically when calling the datatables method. Reason: datatables needs to get the sort index to do its server-side sorting, so it would really need all of those values present. Having that data cached in the table is the only way to do it, afaict.
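To illustrate why a cached key helps, here is a small sketch with an in-memory sqlite database. The table is cut down to two columns and the key values are made up. With the key stored in the row, the server-side sort is a plain ORDER BY, and books with no unknowns still get ordered by their status 1 slot.

```python
import json
import sqlite3


def make_key(*percents):
    """Hypothetical helper: fixed-width key for statuses 0..5."""
    return json.dumps({str(i): f"{p:03d}" for i, p in enumerate(percents)})


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookstats ("
    "BkID INTEGER PRIMARY KEY, status_distribution_percent VARCHAR(100))"
)
conn.executemany(
    "INSERT INTO bookstats VALUES (?, ?)",
    [
        (1, make_key(0, 10, 90, 0, 0, 0)),  # no unknowns, 10% status 1
        (2, make_key(0, 40, 60, 0, 0, 0)),  # no unknowns, 40% status 1
        (3, make_key(5, 0, 95, 0, 0, 0)),   # 5% unknowns
    ],
)

# The database compares the cached strings directly, so books 1 and 2,
# which tie on unknowns, are separated by their status 1 percentage.
ordered_ids = [
    row[0]
    for row in conn.execute(
        "SELECT BkID FROM bookstats ORDER BY status_distribution_percent DESC"
    )
]
print(ordered_ids)  # [3, 2, 1]
```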

@jzohrab
Collaborator Author

jzohrab commented Jan 17, 2025

While this change is not simple, it's pretty easy, so I think it could be tackled by any motivated dev who wants to give it a shot.

@jzohrab jzohrab added the good first issue Good for newcomers label Jan 17, 2025