How to improve spell correction accuracy #58

Closed
hardiksanchawat opened this issue Sep 12, 2019 · 9 comments
Comments

@hardiksanchawat

Hello Symspellpy team,

How would I improve spell correction performance?

Incorrect Input Sentence: The World Econemic Forum is the Intarnational Organizetion for Public-Private Cooperation.

Correct Output Sentence: The World Economic Forum is the International Organization for Public- Private Cooperation.

Output Duration: 2.49 sec

Correcting a single sentence takes 2 to 3 seconds, which is too slow. Please suggest any solutions that are available.

Regards,
Hardik

@mammothb
Owner

The majority of the time is spent creating the dictionary. If your dictionary remains unchanged, consider calling save_pickle once after creating it, then loading that pickle with load_pickle on later runs to improve load time.

Some performance comparison : #31 (comment)

@hardiksanchawat
Author

@mammothb Thanks a lot for the good suggestion.

The example above now takes 0.9 sec, down from 2.49 sec.

@hardiksanchawat
Author

hardiksanchawat commented Sep 16, 2019

Hi @mammothb

Is it possible to bring the spell correction time down to 0.2 to 0.3 sec?
Please suggest any solutions that are available.

Regards,
Hardik

@hardiksanchawat
Author

Hi @mammothb,

Which parameters should I change to improve spell correction accuracy?

Kindly provide your suggestions.

Regards,
Hardik

@mammothb
Owner

You can try adjusting max_edit_distance in lookups, or create a dictionary whose count values are more context-appropriate, e.g., giving biology-related terms higher frequencies if you're spell checking biology research articles.

You can also look at the main project page to check if they have any tips to improve accuracy there.
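
One possible way to produce such context-weighted counts, sketched with the standard library only. The helper names `boost_domain_terms` and `to_dictionary_lines`, the boost factor of 1000, and the sample counts are all hypothetical. The output uses one `term count` pair per line, which is the layout load_dictionary reads with term_index=0 and count_index=1.

```python
import re
from collections import Counter


def boost_domain_terms(general_counts, domain_text, boost=1000):
    """Return general counts plus boosted counts for words seen in the domain text."""
    merged = Counter(general_counts)
    domain_counts = Counter(re.findall(r"[a-z]+", domain_text.lower()))
    for term, count in domain_counts.items():
        # Each domain occurrence is worth `boost` occurrences in the general corpus.
        merged[term] += count * boost
    return merged


def to_dictionary_lines(counts):
    """Format counts as 'term count' lines, highest count first."""
    return [f"{term} {count}" for term, count in counts.most_common()]


# Hypothetical general-purpose counts: "sell" outranks "cell" here.
general = {"cell": 500, "sell": 900, "mitosis": 3}
domain_text = "Mitosis divides one cell into two. Each cell copies its DNA."

merged = boost_domain_terms(general, domain_text)
lines = to_dictionary_lines(merged)
print("\n".join(lines[:3]))
```

After the merge, "cell" outranks "sell", so a lookup that has to choose between two equally distant candidates would be expected to favor the biology term. Writing `lines` to a text file gives a dictionary that can be loaded in place of the general one.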

@hardiksanchawat
Author

@mammothb I am facing a performance issue: it takes 1 sec to correct the sentence. How can I reduce the correction time? Do you have any suggestions?

Awaiting your response. Thanks in advance!

Regards,
Hardik

@mammothb
Owner

Increasing prefix_length when creating the SymSpell object could potentially cut down some time during lookups, though I'm not sure it will cut lookup time by as much as a factor of 5.

You can also use a smaller max_edit_distance when calling lookups, but you might get poorer accuracy.

If these still cannot get you the speed you want, I'm not sure there are other ways.

@hardiksanchawat
Author

@mammothb Thanks for your suggestions.

I tried the parameters below and got a slight improvement in performance:
max_edit_distance_dictionary = 3
prefix_length = 8

Can I use multiprocessing for lookups?

Regards,
Hardik

@mammothb
Owner

I did not implement this port with multiprocessing in mind. You can use multiprocessing with lookups, but I was not able to get any improvement in performance.
