Consider add Term-Text mapping #532

jzohrab · 2024-12-07T03:20:50Z

Currently, Lute doesn't build an 'index' of its terms; it only sorts through things dynamically at runtime.

Adding an always-accurate term index (textterms, fields TtID, TtTxID, TtWoID, TtCount) could be useful:

- gives potentially useful stats to learners to see what words are common
- lets us build stats graphs for books (for the index page) really quickly. (Note this wouldn't be useful when evaluating new texts to see how difficult they would be, b/c those texts wouldn't be indexed, unless they're also fully imported.)
- simplifies finding term references, as they'll now be pre-calculated and cached
- lets users do book-level filters for terms (e.g. show me status 1 terms in this book)
- lets users list terms by descending count frequency (potentially useful when importing a new book, can do some pre-processing of data)
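
A rough sketch of what the table and the book-level queries could look like, using Python's stdlib `sqlite3`. The `textterms` fields are the ones listed above; the cut-down `texts` and `words` tables, their columns (`TxBkID`, `WoText`, `WoStatus`), the unique constraint, and the helper names are assumptions for illustration only, not the real Lute schema or code.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a simplified, assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Assumed, cut-down stand-ins for the existing tables.
        CREATE TABLE texts (TxID INTEGER PRIMARY KEY, TxBkID INTEGER NOT NULL);
        CREATE TABLE words (
            WoID INTEGER PRIMARY KEY,
            WoText TEXT NOT NULL,
            WoStatus INTEGER NOT NULL
        );
        -- The proposed index: one row per (text, term) pair with its count.
        CREATE TABLE textterms (
            TtID INTEGER PRIMARY KEY,
            TtTxID INTEGER NOT NULL REFERENCES texts (TxID),
            TtWoID INTEGER NOT NULL REFERENCES words (WoID),
            TtCount INTEGER NOT NULL,
            UNIQUE (TtTxID, TtWoID)
        );
        """
    )
    return conn


def terms_by_frequency(conn, book_id, status=None):
    """Return (term text, total count) for one book, most frequent first.

    If status is given, only terms with that status are returned.
    """
    sql = """
        SELECT w.WoText, SUM(tt.TtCount) AS total
        FROM textterms tt
        JOIN texts t ON t.TxID = tt.TtTxID
        JOIN words w ON w.WoID = tt.TtWoID
        WHERE t.TxBkID = ?
    """
    params = [book_id]
    if status is not None:
        sql += " AND w.WoStatus = ?"
        params.append(status)
    sql += " GROUP BY w.WoID, w.WoText ORDER BY total DESC, w.WoText"
    return conn.execute(sql, params).fetchall()
```

With something like this, "show me status 1 terms in this book" is just `terms_by_frequency(conn, book_id, status=1)`, and the frequency list is the same call without the status.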

To do:

- Figure out how to handle text deletion? If a user deletes a text, does that mean that term's "counts" will decrease? Yes, but that should be fine ... if they're deleting texts, sure it might skew some small things but nothing to be too worried about. Lute does the best it can with the data it has, it can't be perfect.
- wait for "Searching for references should take sentence casing into account" (#531) to be completed, as doing multiterm searches requires casing to be correct
- see if this really is useful. I'm not sure that it is ... it's 'interesting' data, but interesting can just mean more numbers and crap, and not really help encourage reading.
- note that common words ('the', 'a', etc) will be found on every single page, so indexing these will be overkill -- might just end up with a ton of data. e.g. 1000 pages x 100 unique words each = 100K records
- figure out how to backfill existing texts -- it's a big processing job. Could be piecemeal, or could be a startup job that is only run once, depending on efficiency
- remove all textterms rows where TtTxID = TxID when a text is edited
- remove all textterms rows where TtTxID = TxID when a text is rendered -- just rebuild the whole thing during render again
- cascade delete textterms rows: where TtTxID = TxID when a text is deleted, and where TtWoID matches the deleted word's ID when a word is deleted
- on save of a new multiword term, check all existing text sentences to see if it should be added to the index for that page (~~requires "Searching for references should take sentence casing into account" (#531) to be done~~)
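
The delete-and-rebuild and cascade-delete items could be sketched roughly as below, again with stdlib `sqlite3`. The cut-down `texts` and `words` tables and the `rebuild_text_index` helper are hypothetical, and this assumes the renderer can hand over the list of term IDs it found on the page (one entry per occurrence). Using `ON DELETE CASCADE` is one possible way to do the cascade, not necessarily how Lute would do it.

```python
import sqlite3
from collections import Counter


def make_db():
    """In-memory database with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless this is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE texts (TxID INTEGER PRIMARY KEY);
        CREATE TABLE words (WoID INTEGER PRIMARY KEY);
        CREATE TABLE textterms (
            TtID INTEGER PRIMARY KEY,
            TtTxID INTEGER NOT NULL REFERENCES texts (TxID) ON DELETE CASCADE,
            TtWoID INTEGER NOT NULL REFERENCES words (WoID) ON DELETE CASCADE,
            TtCount INTEGER NOT NULL
        );
        """
    )
    return conn


def rebuild_text_index(conn, text_id, word_ids):
    """Drop and recreate the textterms rows for one text.

    word_ids: term IDs found on the page, one entry per occurrence
    (assumed to come from the renderer).
    """
    counts = Counter(word_ids)
    with conn:  # delete + insert happen in a single transaction
        conn.execute("DELETE FROM textterms WHERE TtTxID = ?", (text_id,))
        conn.executemany(
            "INSERT INTO textterms (TtTxID, TtWoID, TtCount) VALUES (?, ?, ?)",
            [(text_id, word_id, n) for word_id, n in counts.items()],
        )
```

Calling `rebuild_text_index` on every edit or render keeps the rows for that text in sync, and with foreign keys enabled, deleting a text or a word removes its `textterms` rows without extra code.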

jzohrab added the enhancement label Dec 7, 2024

jzohrab added this to Lute-v3 Dec 7, 2024

jzohrab mentioned this issue Dec 7, 2024

Track Term last date read #535

Open

4 tasks

jzohrab changed the title ~~Consider add Term-Text mapping (BLOCKED by #531)~~ Consider add Term-Text mapping - BLOCKED by #531 Dec 7, 2024

jzohrab changed the title ~~Consider add Term-Text mapping - BLOCKED by #531~~ Consider add Term-Text mapping - BLOCKED by 531 Dec 7, 2024

jzohrab changed the title ~~Consider add Term-Text mapping - BLOCKED by 531~~ Consider add Term-Text mapping Dec 14, 2024

jzohrab added the architecture fundamental architecture redesign label Jan 8, 2025

jzohrab mentioned this issue Jan 27, 2025

Add frequency statistic to Terms #578

Open

jzohrab removed the enhancement label Jan 28, 2025
