Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify FuzzySearch test (avoid dependency on /usr/share/dict/words) #4531

Merged
merged 6 commits into from
Mar 31, 2025

Conversation

jhrcek
Copy link
Collaborator

@jhrcek jhrcek commented Mar 27, 2025

Trying to address #4530
Saw this in Haskell Weekly newsletter 😄

@jhrcek jhrcek added the CI Continuous integration label Mar 27, 2025
@fendor
Copy link
Collaborator

fendor commented Mar 27, 2025

@jhrcek Thanks for taking care of this! You should only have to add this in test.yml. The bindist and release.yaml jobs are only relevant for the release.

@jhrcek
Copy link
Collaborator Author

jhrcek commented Mar 27, 2025

Ok, /usr/share/dict/words file is now present on ubuntu in CI.
What remains now is to fix (stabilize?) those tests 😄
image

@fendor
Copy link
Collaborator

fendor commented Mar 27, 2025

Didn't expect that 😯 Welp, so sounds like the reference impl is not the same as the FuzzySearch impl...

@jhrcek
Copy link
Collaborator Author

jhrcek commented Mar 27, 2025

I can reproduce that failure locally
cabal test ghcide-tests --test-options='-p"/match works as expected on the english dictionary/"'

Seems to fail quite often and every time there seems to be uppercase letters involved.

I don't know how it's supposed to work, but the main implementation seems to expect that the first parameter consists of lowercase letters (with possible exception of the first letter):

match :: T.Text -- ^ Pattern in lowercase except for first character

but there are many words in the words file that have uppercase letters..

@jhrcek
Copy link
Collaborator Author

jhrcek commented Mar 27, 2025

Now it's clear that reference impl. doesn't agree with the parallel impl.
Sometimes they assign different scores (reference results on the left, parallel on the right):

Input:  "Elys\233e"
[("Elys\233e",246),("Elys\233e's",246)] /= [("Elys\233e",501),("Elys\233e's",501)]

Sometimes reference doesn't return any matches:

Input: "TelePrompte"
          [] /= [("TelePrompter",8177),("TelePrompter's",8177)]
          
Input:  "HR"
          [] /= [("HR",10),("HRH",10)]

@Bodigrim
Copy link
Contributor

Sometimes they assign different scores (reference results on the left, parallel on the right):

That's probably expected, because the "parallel" implementation cuts corners with regards to non-ASCII characters.

Sometimes reference doesn't return any matches:

These violate a precondition that only the first character might be upper case.

@jhrcek
Copy link
Collaborator Author

jhrcek commented Mar 28, 2025

@Bodigrim you seem to have some context on this 😸 (EDIT: I see you implemented the parallel yourself in #2639). Would you say it would make more sense to

  1. filter out the words from /usr/share/dict/words that don't satisfy the precondition or that contain non-ascii characters (or alternatively map non-first chars of each word to lowercase / filter out the non-ascii chars from each word)?
  2. Invest time into making the reference impl. to be have more closely to the parallel impl?
  3. Disable this flaky unit test, because it doesn't add too much value / adds too much time to CI?

@Bodigrim
Copy link
Contributor

I'd put a few unit tests to check that HLS fuzzy search can order a few alternatives in a reasonable way, but that's it. The reference implementation is apparently fragile as are additional system dependencies. Ditch them, there is bigger fish to fry.

@jhrcek jhrcek force-pushed the jhrcek/words-test-fix branch from 2e89ac8 to cb52c79 Compare March 31, 2025 13:45
@jhrcek jhrcek changed the title Ensure /usr/share/dict/words is installed in CI Simplify FuzzySearch test to avoid dependency on /usr/share/dict/words Mar 31, 2025
@jhrcek jhrcek changed the title Simplify FuzzySearch test to avoid dependency on /usr/share/dict/words Simplify FuzzySearch test (avoid dependency on /usr/share/dict/words) Mar 31, 2025
@jhrcek jhrcek marked this pull request as ready for review March 31, 2025 14:01
@jhrcek
Copy link
Collaborator Author

jhrcek commented Mar 31, 2025

I removed all the stuff using words and replaced if with few unit tests + a small property test.
Let me know if you think some more complicated scenarios should be tested.

Copy link
Collaborator

@fendor fendor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@fendor fendor enabled auto-merge (squash) March 31, 2025 14:47
@fendor fendor merged commit a154cd6 into master Mar 31, 2025
41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous integration
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants