
Possibility to exclude certain strings from parsing #11

Open
ges1227 opened this issue Oct 19, 2017 · 5 comments

Comments

@ges1227

ges1227 commented Oct 19, 2017

Hey @simonpoole,
I tried to parse some of the following examples:

daily 05.00 am - 09.00 pm
and also
06.00 a.m. - 07.00 p.m..

Unfortunately, neither passed the parser in non-strict mode, because of the fragments 'daily' and 'a.m.'. Is support for these planned?

In the meantime, would it be possible to add a feature to help us out? Perhaps one that lets the user exclude certain strings such as 'a.m.', 'daily' and others by defining them in advance?

@simonpoole
Owner

simonpoole commented Oct 19, 2017

In general there are two ways this could be done:

  • restarting parsing after it fails for unknown tokens (however this wouldn't result in a valid spec in many cases)
  • skipping certain predefined strings in lexical analysis, disadvantage: they have to be predefined so would need to be fairly common for this to make sense
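A minimal sketch of the second option, assuming a plain whitespace tokenizer rather than the project's actual JavaCC-generated lexer. The class name `SkippingTokenizer` and the skip list are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not the OpeningHoursParser lexer. Requires Java 9+ (Set.of).
public class SkippingTokenizer {

    // Hypothetical skip list; a real one would have to be curated from observed data.
    private static final Set<String> SKIP_WORDS = Set.of("daily", "everyday");

    // Splits on whitespace and drops every token found in SKIP_WORDS (case-insensitive).
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (String raw : input.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // input was empty or consisted only of whitespace
            }
            if (SKIP_WORDS.contains(raw.toLowerCase(Locale.ROOT))) {
                continue; // predefined string: skip it instead of failing
            }
            tokens.add(raw);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("daily 05.00 am - 09.00 pm"));
    }
}
```

With this skip list, `daily 05.00 am - 09.00 pm` is reduced to the tokens `[05.00, am, -, 09.00, pm]`. The drawback named above is visible here: every skippable word has to be listed in advance.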

As for "a.m." and "p.m.", the same applies: there needs to be a non-negligible amount of usage for adding these to make sense. We already have a bad case of diminishing returns with many of the special cases we handle in non-strict mode.
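For illustration, a.m./p.m. times could also be normalized before the input reaches the parser. This is a hypothetical preprocessing sketch, not how the parser itself handles them, assuming the target is the 24-hour `HH:mm` format used by the opening_hours syntax; `TwelveHourNormalizer` is an invented name:

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical preprocessing helper; requires Java 9+ (appendReplacement with StringBuilder).
public class TwelveHourNormalizer {

    // Matches e.g. "05.00 am", "7:30pm", "06.00 a.m."; the lookahead rejects "am" inside longer words.
    private static final Pattern TIME_12H = Pattern.compile(
            "\\b(\\d{1,2})[.:](\\d{2})\\s*([ap])\\.?m\\.?(?!\\w)", Pattern.CASE_INSENSITIVE);

    // Rewrites each 12-hour time as "HH:mm": 12 am -> 00, 12 pm -> 12, other pm hours + 12.
    public static String normalize(String input) {
        Matcher m = TIME_12H.matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            int hour = Integer.parseInt(m.group(1));
            String minutes = m.group(2);
            boolean pm = m.group(3).equalsIgnoreCase("p");
            int hour24 = hour % 12 + (pm ? 12 : 0);
            m.appendReplacement(sb, String.format(Locale.ROOT, "%02d:%s", hour24, minutes));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("daily 05.00 am - 09.00 pm"));
    }
}
```

For example, `06.00 a.m. - 07.00 p.m.` becomes `06:00 - 19:00`. The `hour % 12` step is what maps 12 a.m. to `00` and 12 p.m. to `12`.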

@ypid

ypid commented Oct 19, 2017

> skipping certain predefined strings in lexical analysis, disadvantage: they have to be predefined so would need to be fairly common for this to make sense

Maybe that helps: https://github.com/opening-hours/opening_hours.js/blob/master/locales/word_error_correction.yaml

@simonpoole
Owner

@ges1227 I've done some work on this by skipping such tokens during lexical analysis. This works well from a purely functional point of view; unfortunately, it makes the implementation of the strict/non-strict modes of the parser rather messy and forces us to throw a Java Error instead of an Exception if we detect such a token in strict mode, which will cause validators to complain endlessly (this is due to an architectural wart of JavaCC). So I'm not quite sure whether the code should really be included; I need to think about it a bit.

@ges1227
Author

ges1227 commented Oct 26, 2017

@simonpoole, @ypid Thanks for your support, it really helped me move forward!
I have been experimenting with the YAML file and arrived at a rather satisfying solution. Basically, my input strings are filtered before they reach the OpeningHoursParser, replacing unwanted words according to the definitions in the YAML file (a small example is attached).
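The filtering step described above could look roughly like this. It is a sketch under assumptions: the correction entries are hard-coded instead of being read from the YAML file, and `InputCorrector` is a hypothetical name, not the attached example:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical pre-parser filter; entries would normally be loaded from a YAML file.
public class InputCorrector {

    // Insertion order is preserved so corrections run in the order they were defined.
    private final Map<Pattern, String> corrections = new LinkedHashMap<>();

    public InputCorrector add(String regex, String replacement) {
        corrections.put(Pattern.compile(regex, Pattern.CASE_INSENSITIVE), replacement);
        return this;
    }

    // Applies every correction in order, then trims and collapses leftover whitespace.
    public String correct(String input) {
        String result = input;
        for (Map.Entry<Pattern, String> e : corrections.entrySet()) {
            result = e.getKey().matcher(result).replaceAll(e.getValue());
        }
        return result.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        InputCorrector corrector = new InputCorrector()
                .add("\\bdaily\\b", "")                  // hypothetical: drop the word entirely
                .add("(\\d{1,2})\\.(\\d{2})", "$1:$2");  // hypothetical: 05.00 -> 05:00
        System.out.println(corrector.correct("daily 05.00 - 21.00"));
    }
}
```

The cleaned string, here `05:00 - 21:00`, would then be handed to the parser unchanged, which keeps the parser itself free of special cases.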

So there is no need to worry about a messy parser anymore; your tips helped me solve the problem.
As a side note, the code is far from perfect. Maybe two YAML files (one for regexes, one for plain words) would make it easier to decide whether a regex or a plain replacement should be applied to the string.
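A rough sketch of the two-table idea, with hypothetical names. Plain entries are applied with `String.replace`, which treats its arguments literally, and regex entries with `String.replaceAll`:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical corrector mirroring the "two YAML files" idea: one table per replacement kind.
public class TwoTableCorrector {

    private final Map<String, String> literalTable = new LinkedHashMap<>();
    private final Map<String, String> regexTable = new LinkedHashMap<>();

    public TwoTableCorrector literal(String from, String to) {
        literalTable.put(from, to);
        return this;
    }

    public TwoTableCorrector regex(String from, String to) {
        regexTable.put(from, to);
        return this;
    }

    public String correct(String input) {
        String result = input;
        // Plain entries: "." in "a.m." is just a dot, no escaping needed.
        for (Map.Entry<String, String> e : literalTable.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        // Regex entries: the key is a pattern, the value may use group references such as $1.
        for (Map.Entry<String, String> e : regexTable.entrySet()) {
            result = result.replaceAll(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        TwoTableCorrector corrector = new TwoTableCorrector()
                .literal("a.m.", "am")
                .literal("p.m.", "pm")
                .regex("(\\d{1,2})\\.(\\d{2})", "$1:$2");
        System.out.println(corrector.correct("06.00 a.m. - 07.00 p.m."));
    }
}
```

Keeping the tables apart avoids escaping problems: used as a regex, `a.m.` would also match `arm.`, so `charm.` would turn into `cham`, while the literal table leaves it untouched.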

@simonpoole
Owner

a.m. and p.m. are now supported via commit 0acfaa6.


3 participants