Skip to content

Conversation

llimllib
Copy link

@llimllib llimllib commented Aug 14, 2024

Initial checklist

  • I read the support docs
  • I read the contributing guide
  • I agree to follow the code of conduct
  • I searched issues and couldn’t find anything (or linked relevant results below)
  • If applicable, I’ve added docs and tests

Description of changes

This is a quick fix for the problem described in #8, where a long line causes the findEmail regular expression to exhibit pathological behavior.

The solution presented here is to replace every + in the regular expression with {1,255}; it will still be super-linear on long lines but the buffer is now small enough that the function will complete in an acceptable period of time.

The maximum length of a valid email address is 255 according to the IETF here, but I've left room in the regex for 64 before and 255 after the @.

This only improves matters, doesn't fix the problem entirely.

Before this PR, line lenghts of up to about 50,000 cause recursion failure:

image

With this PR, I can run the regex on strings of length up to 10 megabytes, which seems like a very comfortable line length, certainly a big improvement:

image

but it still fails with a recursion limit above that

closes #8

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Aug 14, 2024
@llimllib llimllib mentioned this pull request Aug 14, 2024
4 tasks
@llimllib llimllib force-pushed the fix-pathological-regex branch from 588c484 to e3c3d38 Compare August 14, 2024 20:52
@llimllib llimllib closed this Aug 14, 2024

This comment has been minimized.

wooorm added a commit that referenced this pull request Aug 19, 2024
@wooorm wooorm added the 🤷 no/invalid This cannot be acted upon label Aug 19, 2024
@github-actions github-actions bot added 👎 phase/no Post cannot or will not be acted on and removed 🤞 phase/open Post is being triaged manually labels Aug 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤷 no/invalid This cannot be acted upon 👎 phase/no Post cannot or will not be acted on
Development

Successfully merging this pull request may close these issues.

Performance issue with long lines
2 participants