-
-
Notifications
You must be signed in to change notification settings - Fork 2
feat: asturian + shared RomanceNumberExtractor class #38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
WalkthroughAdds Asturian support and refactors number parsing into a vocabulary-driven RomanceNumberExtractor with new vocabularies (AST, GL, MWL, PT_PT, PT_BR), updated util dataclasses/tokenization, package dispatch/signature changes, README language table update, and new/updated tests. Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant Package as ovos_number_parser.__init__
participant Extractor as RomanceNumberExtractor (AST / PT_PT / PT_BR / MWL / GL)
participant Vocab as NumberVocabulary
Caller->>Package: call public API (e.g., pronounce_number(lang, ...))
Package->>Extractor: select extractor by lang/variant
Extractor->>Vocab: consult vocabulary (numbers, ordinals, fractions, scales, gender rules)
Vocab-->>Extractor: provide mappings & join/decimal rules
Extractor->>Extractor: run parsing/pronunciation algorithms (tokenize, extract, pronounce)
Extractor-->>Package: return computed result
Package-->>Caller: deliver final result
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ovos_number_parser/__init__.py (1)
86-121: Fixpronounce_numberdispatch for Asturian to honor ordinals, scale, digit style, and gender.The AST branch currently calls:
if lang.startswith("ast"): return pronounce_number_ast(number, places)This ignores:
short_scale(and the derivedscaleenum),ordinals,digits,gender.As a result,
pronounce_number(..., lang="ast", ordinals=True, digits=DigitPronunciation.DIGIT_BY_DIGIT, gender=GrammaticalGender.FEMININE, short_scale=False)will silently behave like a simple cardinal with LONG scale and masculine gender.
pronounce_number_astalready supportsscale,ordinals,digits, andgender, so we should thread these through:@@ def pronounce_number(...): - if lang.startswith("ca"): - return pronounce_number_ca(number, places) - if lang.startswith("ast"): - return pronounce_number_ast(number, places) + if lang.startswith("ca"): + return pronounce_number_ca(number, places) + if lang.startswith("ast"): + return pronounce_number_ast( + number, + places, + scale=scale, + ordinals=ordinals, + digits=digits, + gender=gender, + )This brings the AST branch in line with PT/MWL behavior and the documented API semantics.
🧹 Nitpick comments (3)
ovos_number_parser/numbers_ast.py (2)
123-136: Simplify_swap_gender_astfeminine branch; theendswith("imu")check is redundant.Any word ending with
"imu"also ends with"u", so the second condition is unreachable. You can drop it for clarity without changing behavior.def _swap_gender_ast(word: str, gender: GrammaticalGender) -> str: @@ - if gender == GrammaticalGender.FEMININE: - if word.endswith('u'): - return word[:-1] + 'a' - if word.endswith('imu'): - return word[:-1] + 'a' + if gender == GrammaticalGender.FEMININE: + if word.endswith('u'): + return word[:-1] + 'a'
316-334: Address Ruff B007: rename unused loop variablesvinis_ordinal_ast.The loop over
_ORDINAL_SCALES_AST[Scale.SHORT]doesn’t use the numeric scale value, only the name. Renaming the unused variable to_makes that intent explicit and silences B007.- for sv, sname in _ORDINAL_SCALES_AST[Scale.SHORT]: - if tok == sname or tok == _swap_gender_ast(sname, GrammaticalGender.FEMININE): + for _, sname in _ORDINAL_SCALES_AST[Scale.SHORT]: + if tok == sname or tok == _swap_gender_ast(sname, GrammaticalGender.FEMININE): return Trueovos_number_parser/__init__.py (1)
38-56: Propagatescaleintonumbers_to_digits_astfor consistency and future-proofing.
numbers_to_digitsaccepts ascaleargument, but the Asturian branch currently ignores it and always uses the default LONG scale, even thoughnumbers_to_digits_astexposes ascaleparameter.While SHORT and LONG scales share the same definitions today, passing the caller’s
scalekeeps behavior consistent with PT/MWL and avoids surprises if Asturian scales diverge later.if lang.startswith("az"): return numbers_to_digits_az(utterance) if lang.startswith("ast"): - return numbers_to_digits_ast(utterance) + return numbers_to_digits_ast(utterance, scale=scale)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
README.md(1 hunks)ovos_number_parser/__init__.py(8 hunks)ovos_number_parser/numbers_ast.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
ovos_number_parser/numbers_ast.py (1)
ovos_number_parser/util.py (4)
Scale(12-19)GrammaticalGender(22-28)DigitPronunciation(31-33)text(62-66)
🪛 Ruff (0.14.7)
ovos_number_parser/numbers_ast.py
150-150: Avoid specifying long messages outside the exception class
(TRY003)
189-189: Avoid specifying long messages outside the exception class
(TRY003)
230-230: Avoid specifying long messages outside the exception class
(TRY003)
331-331: Loop control variable sv not used within loop body
Rename unused sv to _sv
(B007)
480-480: Avoid specifying long messages outside the exception class
(TRY003)
517-517: Loop control variable s_name not used within loop body
(B007)
517-517: Loop control variable p_name not used within loop body
(B007)
🔇 Additional comments (6)
README.md (1)
41-41: Asturian support row is consistent with implementation.The
astentry correctly advertises pronounce number/ordinal, extraction, and numbers_to_digits support, matching the newnumbers_astmodule and wiring.ovos_number_parser/__init__.py (5)
15-16: Asturian module import wiring looks correct.The
numbers_astimport exposes all the expected entry points (numbers_to_digits, pronounce number/ordinal, extract, fraction/ordinal checks) and aligns with how other language modules are wired here.
172-195: Asturian fraction pronunciation dispatch is correctly wired.The
pronounce_fractionfunction cleanly adds anastbranch that delegates topronounce_fraction_ast, mirroring the existing PT and MWL handling and respecting thescaleargument.No issues here.
197-223: Asturian ordinal pronunciation dispatch is correctly integrated.The
pronounce_ordinalbranch forlang.startswith("ast")callspronounce_ordinal_ast(number, scale=scale, gender=gender), correctly passing both scale and gender in line with PT/MWL behavior.Once the ordinal hundreds issue in
numbers_ast.pyis addressed, this wiring will behave as expected.
242-295: Asturianextract_numberwiring matches the new implementation.The
extract_numberdispatch:if lang.startswith("ast"): return extract_number_ast(text, scale=scale, ordinals=ordinals)correctly passes both
scaleandordinalsby keyword toextract_number_ast(text, ordinals=False, scale=Scale.LONG). After fixing the aggregation/no-number behavior innumbers_ast.extract_number_ast, this integration should be solid.
305-353: Asturianis_fractionalandis_ordinaldispatches are consistent with other languages.
is_fractionalforastsimply callsis_fractional_ast(input_str), which is appropriate since the Asturian implementation doesn’t use scale.is_ordinalforastcallsis_ordinal_ast(input_str)analogously to PT/MWL.These branches correctly extend the public API without affecting existing language behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ovos_number_parser/numbers_mwl.py (1)
237-356: Fix docstrings: references to "Portuguese" should be "Mirandese".All wrapper function docstrings incorrectly reference "Portuguese" (and "Brazilian Portuguese" in some cases) instead of "Mirandese". This appears to be a copy-paste oversight.
Examples to fix:
- Line 243: "Portuguese" → "Mirandese"
- Line 251: "Portuguese" → "Mirandese"
- Line 263: "Portuguese" → "Mirandese"
- Line 273, 276: "Portuguese" → "Mirandese"
- Line 287: "Portuguese" → "Mirandese"
- Line 309: Remove "Brazilian and European Portuguese variants" reference
- Lines 329, 331, 334: "Portuguese" → "Mirandese"
- Lines 346, 348, 354: "Portuguese" → "Mirandese"
♻️ Duplicate comments (1)
ovos_number_parser/numbers_ast.py (1)
163-166: Ordinals for 200–900 are incomplete.The
ORDINAL_HUNDREDSonly defines100: 'centésimu', leaving ordinals like 200th–900th unsupported. This was previously flagged and the TODO comment acknowledges the gap.
🧹 Nitpick comments (6)
ovos_number_parser/util.py (1)
586-598: Minor: Rename unused loop variable.The loop variable
ion line 588 is unused. Consider using_to indicate an intentionally unused variable.- for i in range(n_zeros): + for _ in range(n_zeros): decimal_pronunciation_parts.append(self.vocab.UNITS[0])tests/test_number_parser_ast.py (2)
28-32: Use hyphen-minus instead of en-dash in comment.Line 29 uses an en-dash (–) which static analysis flags. Use a regular hyphen-minus (-) for consistency.
- # 21–29 must exist + # 21-29 must exist
87-95: Multiple skipped tests indicate incomplete AST functionality.Several tests are skipped with
@unittest.skip("TODO - fix me"):
test_fraction_variants(line 87)test_compound_fraction_phrase(line 92)test_fraction_phrase(line 129)test_negative(line 133)test_large_numbers(line 201)test_negativein integration (line 212)These represent known gaps in the Asturian implementation.
Would you like me to open issues to track these incomplete features, or would you prefer to address them in this PR?
Also applies to: 129-136, 201-216
ovos_number_parser/numbers_pt.py (1)
330-340:is_fractional_ptandis_ordinal_ptdo not accept a variant parameter.These functions always delegate to
PT_PT, while other public APIs accept avariantparameter. This is likely intentional since the vocabulary'sALT_SPELLINGShandles cross-variant recognition, but it creates an API inconsistency.Consider documenting this design choice or adding a
variantparameter for API uniformity.ovos_number_parser/numbers_ast.py (2)
167-176: Ordinal scale coverage is limited.The ordinal scales only define up to
10^6, with TODO comments indicating the need for larger scales. Consider completing these for full feature parity with other languages, or document the supported range limitations.Would you like me to generate the missing ordinal scale entries for Asturian?
107-119: Fraction vocabulary has limited coverage compared to Portuguese.Only fractions 2–12 are defined, while Portuguese includes entries up to 1000. Additionally, entries 11 and 12 already include the "avos" suffix, which may cause duplication issues similar to Portuguese if
DENOMINATOR_PARTICLEis appended for plurals.Consider expanding coverage or documenting the supported range.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
ovos_number_parser/numbers_ast.py(1 hunks)ovos_number_parser/numbers_mwl.py(5 hunks)ovos_number_parser/numbers_pt.py(15 hunks)ovos_number_parser/util.py(10 hunks)tests/test_number_parser_ast.py(1 hunks)tests/test_number_parser_pt.py(6 hunks)tests/test_util.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
tests/test_util.py (1)
ovos_number_parser/util.py (1)
word_tokenize(115-135)
ovos_number_parser/numbers_ast.py (2)
ovos_number_parser/util.py (11)
Scale(38-45)GrammaticalGender(48-54)DigitPronunciation(57-59)pronounce_ordinal(694-752)is_fractional(380-395)is_ordinal(369-378)text(87-91)extract_number(397-475)pronounce_number(532-656)numbers_to_digits(477-530)pronounce_fraction(658-692)ovos_number_parser/__init__.py (7)
pronounce_ordinal(197-239)is_fractional(305-359)is_ordinal(362-386)extract_number(242-302)pronounce_number(86-169)numbers_to_digits(38-83)pronounce_fraction(172-194)
tests/test_number_parser_ast.py (2)
ovos_number_parser/util.py (3)
Scale(38-45)GrammaticalGender(48-54)text(87-91)ovos_number_parser/numbers_ast.py (6)
pronounce_number_ast(246-267)pronounce_ordinal_ast(183-202)is_fractional_ast(205-214)pronounce_fraction_ast(290-302)extract_number_ast(227-243)numbers_to_digits_ast(270-286)
ovos_number_parser/numbers_pt.py (2)
ovos_number_parser/util.py (13)
Scale(38-45)GrammaticalGender(48-54)DigitPronunciation(57-59)NumberVocabulary(264-360)RomanceNumberExtractor(363-854)pronounce_ordinal(694-752)is_fractional(380-395)is_ordinal(369-378)extract_number(397-475)text(87-91)pronounce_number(532-656)numbers_to_digits(477-530)pronounce_fraction(658-692)ovos_number_parser/__init__.py (7)
pronounce_ordinal(197-239)is_fractional(305-359)is_ordinal(362-386)extract_number(242-302)pronounce_number(86-169)numbers_to_digits(38-83)pronounce_fraction(172-194)
ovos_number_parser/util.py (1)
ovos_number_parser/__init__.py (6)
is_ordinal(362-386)is_fractional(305-359)extract_number(242-302)numbers_to_digits(38-83)pronounce_number(86-169)pronounce_ordinal(197-239)
ovos_number_parser/numbers_mwl.py (2)
ovos_number_parser/util.py (14)
Scale(38-45)GrammaticalGender(48-54)DigitPronunciation(57-59)word_tokenize(115-135)NumberVocabulary(264-360)RomanceNumberExtractor(363-854)pronounce_ordinal(694-752)is_fractional(380-395)is_ordinal(369-378)text(87-91)extract_number(397-475)pronounce_number(532-656)numbers_to_digits(477-530)pronounce_fraction(658-692)ovos_number_parser/__init__.py (7)
pronounce_ordinal(197-239)is_fractional(305-359)is_ordinal(362-386)extract_number(242-302)pronounce_number(86-169)numbers_to_digits(38-83)pronounce_fraction(172-194)
🪛 Ruff (0.14.7)
tests/test_number_parser_ast.py
29-29: Comment contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF003)
ovos_number_parser/util.py
231-231: Do not perform function call range in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
304-304: Unused lambda argument: gender
(ARG005)
554-554: Avoid specifying long messages outside the exception class
(TRY003)
588-588: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
714-714: Avoid specifying long messages outside the exception class
(TRY003)
756-756: Unused method argument: gender
(ARG002)
771-771: Avoid specifying long messages outside the exception class
(TRY003)
822-822: Avoid specifying long messages outside the exception class
(TRY003)
ovos_number_parser/numbers_mwl.py
6-6: Redefinition of unused Scale from line 3
Remove definition: Scale
(F811)
6-6: Redefinition of unused GrammaticalGender from line 3
Remove definition: GrammaticalGender
(F811)
6-6: Redefinition of unused DigitPronunciation from line 3
Remove definition: DigitPronunciation
(F811)
8-8: Redefinition of unused Union from line 1
Remove definition: Union
(F811)
🔇 Additional comments (22)
ovos_number_parser/numbers_mwl.py (2)
11-43: Verify Mirandese linguistic rules.The TODO comments at lines 21-22 and 38 indicate uncertainty about gender and plural transformations. Consider consulting Mirandese language documentation to confirm these rules are accurate.
46-231: LGTM!The
NumberVocabularyconfiguration for Mirandese is comprehensive and well-structured. The vocabulary covers units, tens, hundreds, fractions, ordinals, and scale mappings appropriately.ovos_number_parser/util.py (5)
7-35: LGTM!The
Tokendataclass properly implements tuple-like access via__iter__and__getitem__, maintaining backward compatibility. The immutability enforcement via__setattr__is correctly implemented.
62-112: LGTM!The
ReplaceableNumberdataclass correctly implements properties for accessing token indices and text, with proper immutability enforcement matching theTokenpattern.
231-260: LGTM!The static analysis warning (B008) about
range(1, 21)as a default argument is a false positive. In Python 3,range()returns an immutable sequence object, so it's safe as a default parameter value.
263-360: LGTM!The
NumberVocabularydataclass is well-designed with sensible defaults. The unusedgenderparameter in the defaultswap_genderlambda (line 304) is intentional - it provides a no-op default that languages can override with gender-aware implementations.
658-692: LGTM!The fraction pronunciation logic correctly handles:
- Division by zero with a special message
- Known fraction words from the vocabulary with pluralization
- Unknown denominators using the denominator particle pattern
tests/test_util.py (2)
6-6: LGTM!The
word_tokenizefunction is correctly added to the imports, aligning with its new public API exposure.
827-878: LGTM!Comprehensive test coverage for
word_tokenizeincluding:
- Basic tokenization
- Special character handling (%, #)
- Hyphen behavior between words vs. in numbers
- Edge cases (empty string, whitespace, trailing hyphens)
- Complex mixed inputs
The tests properly exercise the documented behavior from the implementation.
tests/test_number_parser_ast.py (1)
1-219: Overall: Good test structure for new AST language support.The test file provides solid coverage for the Asturian number parser including vocabulary validation, pronunciation, extraction, ordinals, and number-to-digit conversion. The skipped tests clearly document known limitations.
tests/test_number_parser_pt.py (5)
10-10: LGTM!The
PT_PTandPT_BRwrapper imports correctly expose the vocabulary-based Portuguese variant implementations.
34-86: LGTM!Dictionary tests properly updated to use the new vocabulary-based API (
PT_PT.vocab,PT_BR.vocab) instead of private dictionaries. This aligns with the refactored implementation and provides better encapsulation.
350-375: Excellent test additions for decimal handling.These new tests for
test_significant_digitsandtest_leading_zerosprovide valuable coverage for edge cases in decimal pronunciation:
- Rounding at various decimal places
- Leading zero preservation
- Complex cases like
10.00000056This addresses potential regressions in the refactored decimal handling logic.
377-386: Good use of skip with documentation.The skipped test at line 377 includes a clear TODO comment asking whether the behavior should be implemented. This is a good practice for documenting design decisions that need discussion.
491-495: LGTM!Good addition of zero-division test case to verify the
DIVIDED_BY_ZEROvocabulary entry is used correctly.ovos_number_parser/numbers_pt.py (5)
37-38: Verify the'ma'→'m'masculine conversion logic.The rule that converts words ending in
'ma'to remove the final'a'for masculine may not be appropriate for Portuguese. Words ending in-ma(like "problema", "tema") are typically invariant and don't follow this pattern. This could produce incorrect results for certain inputs.Please verify this transformation is intentional and add a comment explaining which word forms it targets.
50-55: Simplified pluralization is acceptable for number vocabulary.The pluralization logic correctly handles the common
-ão→-õespattern and the default-ssuffix, which covers the number-related vocabulary. More complex Portuguese pluralization rules (e.g.,-l→-is) are not needed for the current number word set.
229-292: Good vocabulary reuse for Brazilian Portuguese variant.The
_PT_BRvocabulary correctly inherits shared properties from_PT_PTwhile overriding variant-specific differences (tens spellings, scale names, default scale). The cross-variantALT_SPELLINGSenables recognition of both regional forms.
437-511: Comprehensive test coverage in__main__block.The test block covers pronunciation, extraction, ordinals, fractions, and edge cases including division by zero. This serves as useful development/debugging reference.
132-145: No issue found with fraction pronunciation.The FRACTION entries for 11–19 (e.g.,
'onze avos') are intentionally pre-suffixed with "avos" and already end in "s". Thepluralize_ptfunction checks if a word ends in "s" and returns it unchanged if it does, so no duplication occurs. Whenpronounce_fractionpluralizes these entries for numerators other than 1, it correctly returns the same string without appending additional particles. Additionally,DENOMINATOR_PARTICLEis only appended to denominators not in theFRACTIONdictionary (line 30 in util.py), so there is no risk of duplication with pre-suffixed fraction names.ovos_number_parser/numbers_ast.py (2)
306-329: Test block provides good development coverage.The
__main__block covers cardinal pronunciation, ordinals, extraction, and digit conversion. Since the implementation now delegates toRomanceNumberExtractor, verify that the shared extractor correctly handles cases like'dos millones trescientos'(which should extract as2000300).
78-87: No action needed. The number 20 is correctly mapped to'venti'in Asturian. The_pronounce_up_to_999method (line 774–775) explicitly checks if the number exists in theTENSdictionary and returns it directly, sopronounce_number_ast(20)correctly returns'venti', which is the standard Asturian spelling for the standalone number 20.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (4)
tests/test_number_parser_ast.py (1)
88-95: Fragile test assertion always passes.The assertion on line 94 will always pass because it constructs a string that's guaranteed to contain "cuartus". This doesn't actually verify the expected output.
Apply this fix:
- self.assertIn("cuartus", result2 if "cuartus" in result2 else "cuartus?") + self.assertIn("cuartus", result2)ovos_number_parser/numbers_ast.py (3)
9-17: Unreachable condition: 'imu' check is redundant.Line 12 checks for words ending in 'imu', but this is unreachable because line 10 already catches all words ending in 'u' (which includes 'imu').
if gender == GrammaticalGender.FEMININE: if word.endswith('u'): return word[:-1] + 'a' - if word.endswith('imu'): - return word[:-1] + 'a'
20-28: Pluralization logic produces incorrect results.The TODO comment indicates uncertainty, and the logic is indeed problematic:
- Lines 24-25: Words not ending in 'u' have their last character replaced with 'os' (e.g., "mil" → "mis")
- Lines 26-27: Only adds 's' if word ends in 'u' and doesn't already end in 's'
Asturian pluralization typically follows patterns like
-u→-os,-ón→-ones.def pluralize_ast(word: str): - # TODO - is this accurate? if word.endswith("ón"): return word[:-2] + "ones" - if not word.endswith("u"): - return word[:-1] + "os" - if not word.endswith("s"): + if word.endswith("u"): + return word[:-1] + "os" + if not word.endswith("s"): return word + "s" return word
161-164: Ordinals 200-999 are incomplete.
ORDINAL_HUNDREDSonly defines 100, missing entries for 200-900. This causes ordinals like 256 to drop the hundreds component and pronounce only the last two digits.Consider either:
- Adding explicit mappings for 200, 300, ..., 900
- Adding a fallback strategy to construct ordinals from cardinals
- Explicitly documenting the supported ordinal range
🧹 Nitpick comments (2)
tests/test_number_parser_ast.py (1)
21-21: Replace EN DASH with HYPHEN-MINUS in comment.The comment contains an EN DASH (–) character instead of a standard HYPHEN-MINUS (-). While this doesn't affect functionality, it's better to use standard ASCII characters in code for consistency.
- # 21–29 must exist + # 21-29 must existovos_number_parser/numbers_mwl.py (1)
387-390: Remove unnecessary f-string prefixes.Lines 387-390 use f-string syntax but contain no placeholders. The
fprefix should be removed for clarity.- print(f"'bint'i un' ->", MWL.extract_number("bint'i un")) - print(f"'bint'i ũa' ->", MWL.extract_number("bint'i ũa")) - print(f"'bint'i dous' ->", MWL.extract_number("bint'i dous")) - print(f"'bint'i dues' ->", MWL.extract_number("bint'i dues")) + print("'bint'i un' ->", MWL.extract_number("bint'i un")) + print("'bint'i ũa' ->", MWL.extract_number("bint'i ũa")) + print("'bint'i dous' ->", MWL.extract_number("bint'i dous")) + print("'bint'i dues' ->", MWL.extract_number("bint'i dues"))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
ovos_number_parser/__init__.py(10 hunks)ovos_number_parser/numbers_ast.py(1 hunks)ovos_number_parser/numbers_mwl.py(2 hunks)ovos_number_parser/numbers_pt.py(12 hunks)tests/test_number_parser_ast.py(1 hunks)tests/test_number_parser_pt.py(10 hunks)
🧰 Additional context used
🪛 Ruff (0.14.7)
ovos_number_parser/numbers_mwl.py
387-387: f-string without any placeholders
Remove extraneous f prefix
(F541)
388-388: f-string without any placeholders
Remove extraneous f prefix
(F541)
389-389: f-string without any placeholders
Remove extraneous f prefix
(F541)
390-390: f-string without any placeholders
Remove extraneous f prefix
(F541)
ovos_number_parser/__init__.py
53-53: Undefined name numbers_to_digits_ast
(F821)
tests/test_number_parser_ast.py
21-21: Comment contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF003)
🔇 Additional comments (3)
ovos_number_parser/numbers_pt.py (1)
58-292: LGTM: Clean refactoring to vocabulary-driven approach.The refactoring from hardcoded dictionaries to
NumberVocabularyandRomanceNumberExtractoris well-executed. The clear separation between private vocabularies (_PT_PT,_PT_BR) and public extractors (PT_PT,PT_BR) follows good design patterns and maintains backward compatibility through deprecated wrappers.tests/test_number_parser_pt.py (1)
1-616: LGTM: Comprehensive test coverage for new API.The test suite has been thoroughly updated to exercise the public
PT_PTandPT_BRinterfaces, with excellent coverage of edge cases including decimal handling, leading zeros, fractions, and integration scenarios. The migration from internal structures to public vocab attributes is complete and consistent.ovos_number_parser/numbers_mwl.py (1)
17-18: TODO comments indicate uncertain gender/plural logic.The TODO comments on lines 17 and 34 indicate uncertainty about the correctness of gender swapping and pluralization rules for Mirandese. Consider verifying these rules against Mirandese grammar references or consulting with a native speaker to ensure correctness.
If you need assistance researching Mirandese grammar rules for gender swapping and pluralization, I can help generate a verification approach.
Also applies to: 34-34
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
ovos_number_parser/util.py (1)
771-820: Unusedgenderparameter in_pronounce_up_to_999.As noted in a past review, the
genderparameter is declared but not used. For Romance languages, cardinal numbers like 1 and 2 have gendered forms. This appears to be intentional deferral of gender handling for cardinals.Consider adding a TODO comment if gender support for cardinals is planned, or document why it's not needed here:
def _pronounce_up_to_999(self, n: int, gender: GrammaticalGender = GrammaticalGender.MASCULINE ) -> str: + # Note: gender parameter reserved for future use with gendered cardinal forms (um/uma, dois/duas)
🧹 Nitpick comments (7)
tests/test_number_parser_ast.py (1)
20-23: Minor: EN DASH in comment.The comment on line 21 uses an EN DASH (
–) instead of a standard hyphen-minus (-). This is a cosmetic issue but could cause confusion.def test_composite_20s(self): - # 21–29 must exist + # 21-29 must exist for i in range(21, 30): self.assertIn(i, AST.vocab.TENS)ovos_number_parser/__init__.py (2)
180-186: Double space in line 182.Minor formatting issue with double space before
PT_PT.pronounce_fraction.if lang.startswith("pt"): return PT_BR.pronounce_fraction(fraction_word, scale=scale) if "br" in lang.lower() \ - else PT_PT.pronounce_fraction(fraction_word, scale=scale) + else PT_PT.pronounce_fraction(fraction_word, scale=scale)
210-216: Double space in line 212.Same formatting issue with double space.
if lang.startswith("pt"): return PT_BR.pronounce_ordinal(number, scale=scale, gender=gender) if "br" in lang.lower() \ - else PT_PT.pronounce_ordinal(number, scale=scale, gender=gender) + else PT_PT.pronounce_ordinal(number, scale=scale, gender=gender)ovos_number_parser/util.py (4)
62-113: Consider usingAttributeErrorfor consistency.
ReplaceableNumber.__setattr__raises a genericExceptionwhileToken.__setattr__raisesAttributeError. Both should use the same exception type for consistency.try: getattr(self, key) except AttributeError: super().__setattr__(key, value) else: - raise Exception("Immutable!") + raise AttributeError("Immutable!")
231-260: Avoid mutable default argument withrange().Using
range(1, 21)as a default argument is evaluated once at function definition time. Whilerangeobjects are immutable so this won't cause mutation bugs, it's cleaner to useNoneand set the default inside the function.-def convert_to_mixed_fraction(number: Union[int, float], denominators=range(1, 21)) -> Optional[Tuple[int, int, int]]: +def convert_to_mixed_fraction(number: Union[int, float], denominators=None) -> Optional[Tuple[int, int, int]]: """ Convert floats to components of a mixed fraction representation ... """ int_number = int(number) if int_number == number: return int_number, 0, 1 # whole number, no fraction frac_number = abs(number - int_number) if not denominators: denominators = range(1, 21)
304-306: Unusedgenderparameter in default lambda.The default
swap_genderlambda accepts agenderparameter but doesn't use it. While this is intentional (as the default behavior is identity), consider adding a comment or using_prefix.- swap_gender: Callable[[str, GrammaticalGender], str] = lambda word, gender: word + swap_gender: Callable[[str, GrammaticalGender], str] = lambda word, _gender: word
602-615: Rename unused loop variableito_i.The loop variable
iis unused in the loop body.- for i in range(n_zeros): + for _ in range(n_zeros): decimal_pronunciation_parts.append(self.vocab.UNITS[0])
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
ovos_number_parser/__init__.py(11 hunks)ovos_number_parser/util.py(10 hunks)tests/test_number_parser_ast.py(1 hunks)tests/test_number_parser_pt.py(10 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/test_number_parser_ast.py (2)
ovos_number_parser/util.py (8)
GrammaticalGender(48-54)pronounce_number(549-673)is_fractional(380-395)pronounce_fraction(675-709)extract_number(397-492)pronounce_ordinal(711-769)numbers_to_digits(494-547)text(87-91)ovos_number_parser/__init__.py (6)
pronounce_number(83-163)is_fractional(299-353)pronounce_fraction(166-188)extract_number(236-296)pronounce_ordinal(191-233)numbers_to_digits(35-80)
ovos_number_parser/__init__.py (2)
ovos_number_parser/numbers_pt.py (1)
PortugueseVariant(9-16)ovos_number_parser/util.py (6)
numbers_to_digits(494-547)pronounce_number(549-673)pronounce_fraction(675-709)pronounce_ordinal(711-769)extract_number(397-492)text(87-91)
ovos_number_parser/util.py (1)
ovos_number_parser/__init__.py (7)
is_ordinal(356-380)is_fractional(299-353)extract_number(236-296)numbers_to_digits(35-80)pronounce_number(83-163)pronounce_ordinal(191-233)pronounce_fraction(166-188)
🪛 Ruff (0.14.7)
tests/test_number_parser_ast.py
21-21: Comment contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF003)
tests/test_number_parser_pt.py
611-611: Do not catch blind exception: Exception
(BLE001)
ovos_number_parser/util.py
231-231: Do not perform function call range in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
304-304: Unused lambda argument: gender
(ARG005)
571-571: Avoid specifying long messages outside the exception class
(TRY003)
605-605: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
731-731: Avoid specifying long messages outside the exception class
(TRY003)
773-773: Unused method argument: gender
(ARG002)
788-788: Avoid specifying long messages outside the exception class
(TRY003)
839-839: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (32)
tests/test_number_parser_ast.py (8)
1-5: LGTM!The imports are clean and appropriate for the test module.
10-33: LGTM! Comprehensive vocabulary tests.These tests properly validate the completeness of the AST vocabulary dictionaries for units, tens, hundreds, and fractions.
39-67: LGTM! Good coverage of pronunciation tests.Tests cover zero, units, teens, composite 20s, tens with units, and hundreds with various configurations.
73-90: LGTM! Fraction tests now properly validate output.The previous fragile assertion issue has been addressed. The tests now correctly verify both singular and plural fraction forms.
96-126: LGTM! Extraction tests cover key scenarios.Tests include simple numbers, hundreds, thousands, millions, negatives, and the no-number case. The skipped fraction phrase test is appropriately marked with a TODO.
132-149: LGTM! Ordinal tests cover basic through composed cases.Good coverage of basic ordinals with gender variants, teens, tens, composed ordinals, and hundredths.
155-172: LGTM! Numbers-to-digits tests are well structured.Tests cover simple conversions, phrase context, mixed text, and the no-number fallback case.
178-202: LGTM! Integration tests provide good round-trip validation.The round-trip test for multiple number values is valuable. Skipped large number test is appropriately marked. Decimal and negative tests ensure end-to-end correctness.
tests/test_number_parser_pt.py (7)
3-6: LGTM! Clean import migration to variant-based API.The imports now correctly reference
PT_PTandPT_BRinstead of internal dictionary mappings.
27-82: LGTM! Dictionary tests properly migrated to vocab-based API.Tests now access vocabulary through
PT_PT.vocab.*andPT_BR.vocab.*paths, validating the new public API surface.
345-370: LGTM! Excellent coverage of decimal edge cases.The new tests for significant digits rounding and leading zeros handling are thorough and validate important pronunciation behavior.
372-381: Good decision to skip uncertain behavior test.The skipped test for decimal edge cases (1.0, 1.00, 1.000) appropriately documents uncertain requirements with a TODO comment.
486-491: LGTM! Zero division handling test.Good addition to verify the behavior when dividing by zero in fractions.
603-612: Acceptable use of broad exception catch in test context.While the static analyzer flags catching
Exception, this is appropriate for a robustness test that verifies the function doesn't raise any exception for large numbers.
614-619: LGTM! Negative number round-trip test.Good integration test validating negative number pronunciation and extraction consistency.
ovos_number_parser/__init__.py (7)
15-15: LGTM! Clean imports for new language extractors.The imports properly bring in
AST,PT_PT,PT_BR, andMWLfor the variant-based routing.Also applies to: 25-26
52-53: LGTM! Correctly addressed previous review comment.The routing now correctly uses
AST.numbers_to_digits(utterance)instead of the previously undefinednumbers_to_digits_ast().
117-118: LGTM! AST pronunciation routing is correct.Properly passes all required parameters to
AST.pronounce_number.
143-148: LGTM! PT variant routing for pronounce_number.Correctly distinguishes between Brazilian and European Portuguese using
"br" in lang.lower().
283-289: LGTM! Extract number routing is consistent.Properly routes to PT_BR or PT_PT based on language variant detection.
341-346: LGTM! Fractional routing added for AST.Properly adds AST support to
is_fractionaldispatch.
368-374: LGTM! Ordinal routing added for AST.Properly adds AST support to
is_ordinaldispatch.ovos_number_parser/util.py (10)
7-35: LGTM! Token dataclass with immutability support.The
Tokenclass properly implements iteration and indexing for backwards compatibility while enforcing immutability through__setattr__.
115-136: LGTM! Clean tokenization function.The
word_tokenizefunction handles special cases like percentages, hashtags, and hyphenated words appropriately.
263-306: LGTM! Comprehensive vocabulary dataclass.
NumberVocabularyencapsulates all language-specific vocabulary and configuration in a clean, typed structure. The callable fields forswap_gender,pluralize, andsingularizeprovide good extensibility.
308-360: LGTM! Well-structured vocabulary helper methods.The
get_number_strings,get_ordinal_strings, andget_fraction_stringsmethods properly combine masculine, feminine, and plural forms for comprehensive word recognition.
397-492: LGTM! Robust number extraction logic.The
extract_numbermethod handles cardinals, ordinals, fractions, scales, decimals, and negatives. The tokenization and lookahead logic is sound.
494-547: LGTM! Clean numbers-to-digits implementation.The span-based approach for identifying and replacing number words is well-designed and handles edge cases like joiners appropriately.
549-673: LGTM! Comprehensive number pronunciation.The
pronounce_numbermethod properly handles negatives, decimals with leading zeros, integer pronunciation, and scale-based large numbers.
675-709: LGTM! Fraction pronunciation with proper pluralization.Handles known fraction words, pluralization, and unknown denominators with the denominator particle.
711-769: LGTM! Ordinal pronunciation with gender and scale support.Properly handles zero, negatives, and large scales with correct gender application.
822-870: LGTM! Ordinal pronunciation for 0-999 range.Properly handles hundreds, tens, and units with gender swapping for all ordinal forms.
| if lang.startswith("pt"): | ||
| return numbers_to_digits_pt(utterance, scale=scale) | ||
| return PT_PT.numbers_to_digits(utterance, scale=scale) | ||
| if lang.startswith("mwl"): | ||
| return numbers_to_digits_mwl(utterance, scale=scale) | ||
| return MWL.numbers_to_digits(utterance, scale=scale) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check all instances of language detection for Portuguese variants
rg -n 'def (numbers_to_digits|pronounce_number|.*_to_\w+)' ovos_number_parser/__init__.py | head -20Repository: OpenVoiceOS/ovos-number-parser
Length of output: 222
🏁 Script executed:
#!/bin/bash
# Search for all Portuguese-related dispatch logic
rg -n '(PT_PT|"pt"|"br"|lang\.)' ovos_number_parser/__init__.py -B 2 -A 2Repository: OpenVoiceOS/ovos-number-parser
Length of output: 11500
Fix Portuguese variant handling in numbers_to_digits, is_fractional, and is_ordinal.
The dispatch logic is inconsistent across functions. pronounce_number, pronounce_fraction, pronounce_ordinal, and extract_number all check "br" in lang.lower() to route Brazilian Portuguese separately, but numbers_to_digits (line 72), is_fractional (line 341), and is_ordinal (line 368) always route to PT_PT regardless of whether the language is pt-BR or similar. This means Brazilian Portuguese users receive Portugal-specific behavior for these three functions.
🤖 Prompt for AI Agents
In ovos_number_parser/__init__.py around lines 72 to 75, the Portuguese dispatch
always uses PT_PT causing pt-BR variants to get Portugal-specific behavior;
update the routing so that if "br" is in lang.lower() it returns
PT_BR.numbers_to_digits(utterance, scale=scale) otherwise return
PT_PT.numbers_to_digits(...). Apply the same pattern to is_fractional and
is_ordinal (check "br" in lang.lower() and route to PT_BR for Brazilian
Portuguese, PT_PT otherwise) to make the dispatch consistent with
pronounce_number/pronounce_fraction/pronounce_ordinal/extract_number.
|
Note Docstrings generation - SUCCESS |
Docstrings generation was requested by @JarbasAl. * #38 (comment) The following files were modified: * `ovos_number_parser/__init__.py` * `ovos_number_parser/numbers_ast.py` * `ovos_number_parser/numbers_mwl.py` * `ovos_number_parser/numbers_pt.py` * `ovos_number_parser/util.py` * `tests/test_number_parser_ast.py` * `tests/test_number_parser_pt.py`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
ovos_number_parser/numbers_ast.py (1)
38-220: Asturian NumberVocabulary wiring looks solid; consider unifying zero/cero wording.The
_ASTconfiguration hangs together well: units/tens/hundreds cover the expected ranges, fractions and ordinal hundreds are complete, and hooking it up viaAST = RomanceNumberExtractor(_AST)matches the pattern used for other Romance languages.One tiny nit:
DIVIDED_BY_ZERO="a dividir por zero"uses"zero", whileUNITS[0]is"cero". If"cero"is the canonical Asturian form you want everywhere, you may want to standardize this string; otherwise feel free to leave as-is if the variation is intentional.tests/test_number_parser_ast.py (3)
21-23: Replace EN DASH in comment with a plain ASCII hyphen to satisfy Ruff.The comment
# 21–29 must existuses an EN DASH (–), which Ruff flags as ambiguous. Switching it to a standard-(i.e.,21-29) will avoid the warning and keep the source ASCII‑clean.
96-126: Consider adding explicit zero‑extraction coverage alongside “no number” cases.
test_no_numbernicely guards theFalsecontract for inputs without numerals, but there’s no explicit test that"cero"(or any zero form) is parsed as0rather thanFalse. Given past issues around “no number” behavior, it would be useful to add something like:def test_zero_extraction(self): self.assertEqual(AST.extract_number("cero"), 0)to lock in the intended semantics for zero.
132-148: Add a couple of ordinals ≥ 200 to regression‑test the hundreds logic.Since
ORDINAL_HUNDREDSwas recently completed for 200–900, it would be good to add one or two tests here for larger ordinals (e.g.,AST.pronounce_ordinal(256)andAST.pronounce_ordinal(900)) to ensure the hundreds component is preserved correctly and to prevent regressions in that range.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
ovos_number_parser/numbers_ast.py(1 hunks)tests/test_number_parser_ast.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
ovos_number_parser/numbers_ast.py (2)
ovos_number_parser/util.py (8)
Scale(38-45)GrammaticalGender(48-54)NumberVocabulary(264-360)RomanceNumberExtractor(363-871)pronounce_number(549-673)pronounce_ordinal(711-769)extract_number(397-492)numbers_to_digits(494-547)ovos_number_parser/__init__.py (4)
pronounce_number(83-163)pronounce_ordinal(191-233)extract_number(236-296)numbers_to_digits(35-80)
🪛 Ruff (0.14.7)
tests/test_number_parser_ast.py
21-21: Comment contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF003)
🔇 Additional comments (1)
tests/test_number_parser_ast.py (1)
138-140: Clarify expected 12th ordinal form vs vocab entry.
test_teens_ordinalasserts thatAST.pronounce_ordinal(12)contains"décimu", but the Asturian vocab currently definesORDINAL_TENS[12] = "docenu"inovos_number_parser/numbers_ast.py. Depending on howRomanceNumberExtractor.pronounce_ordinalusesORDINAL_TENS, the test and vocab may not be aligned.Please double‑check what the correct Asturian form for “12th” should be and then either:
- Update the vocab entry for
12to match, or- Adjust the test to assert the actual canonical form (e.g.,
"docenu"or another agreed form).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ovos_number_parser/numbers_pt.py (1)
367-395: Avoid positional argument mismatch when delegating topronounce_ordinalHere you delegate to
RomanceNumberExtractor.pronounce_ordinalusing positional arguments:if variant == PortugueseVariant.PT: return PT_PT.pronounce_ordinal(number, gender, scale) return PT_BR.pronounce_ordinal(number, gender, scale)Elsewhere (e.g. in
ovos_number_parser/__init__.py),pronounce_ordinalis always called withscale=andgender=keywords, which strongly suggests the method signature is(number, scale=..., gender=...). In that case, these calls would swapscaleandgender, yielding incorrect behavior for non-default arguments.Switching to keyword arguments here makes this wrapper robust regardless of the exact parameter order:
- if variant == PortugueseVariant.PT: - return PT_PT.pronounce_ordinal(number, gender, scale) - return PT_BR.pronounce_ordinal(number, gender, scale) + if variant == PortugueseVariant.PT: + return PT_PT.pronounce_ordinal(number, scale=scale, gender=gender) + return PT_BR.pronounce_ordinal(number, scale=scale, gender=gender)
🧹 Nitpick comments (4)
ovos_number_parser/numbers_pt.py (4)
19-47: Tightenswap_gender_ptheuristics for...ma/...mendingsThe current rules can generate malformed forms for numerals:
elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'):will turn"décima"into"décim"instead of"décimo", and will fire before the more general"a" -> "o"rule.elif gender == GrammaticalGender.FEMININE and word.endswith('m'):works for"um" -> "uma"but will produce incorrect forms like"bom" -> "boma"if ever used outside strict numeral vocabulary.Given this helper is primarily for numeral words, you can safely narrow the rules and avoid the truncation:
- elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'): - return word[:-1] - elif gender == GrammaticalGender.MASCULINE and word.endswith('a'): + elif gender == GrammaticalGender.MASCULINE and word.endswith('a'): return word[:-1] + 'o' @@ - elif gender == GrammaticalGender.FEMININE and word.endswith('m'): - return word + 'a' + elif gender == GrammaticalGender.FEMININE and word == "um": + return "uma"This keeps the special-cases (
dois/duas,um/uma) and generico/a/os/asswaps, while avoiding truncated stems and over-broad"m" -> "ma"changes.
50-55: Minor typing nit: add explicit return annotation topluralize_ptFor consistency with
swap_gender_ptand the rest of the typed surface, consider annotating the return type:-def pluralize_pt(word: str): +def pluralize_pt(word: str) -> str:Behavior stays the same; this just tightens the public API.
398-497: Deprecation shims are consistent; consider scale default semanticsThe deprecated helpers (
is_fractional_pt,is_ordinal_pt,extract_number_pt,pronounce_number_pt,numbers_to_digits_pt,pronounce_fraction_pt) all:
- Emit
DeprecationWarningwithstacklevel=2, which is good practice.- Delegate straight through to
PT_PT/PT_BRdepending onvariant.One nuance to be aware of: these shims default
scaletoScale.LONG(where present), which means they override the vocab’s ownDEFAULT_SCALE(e.g.,Scale.SHORTfor_PT_BR). If backward-compat with the legacy API depended on “short vs long” defaults, you may want these wrappers to passscaleonly when explicitly provided, lettingRomanceNumberExtractorfall back tovocab.DEFAULT_SCALE. If the semantics are intentionally shifting and the functions are deprecated anyway, current behavior is acceptable.No immediate changes required, but worth double‑checking against historical expectations.
500-574:__main__tests: BR/PT variant usage fornumbers_to_digits_ptIn the “BR” block you still rely on the default
variant:print("\n--- Testing numbers_to_digits_pt (BR) ---") print(f"'quinhentos e cinquenta' -> '{numbers_to_digits_pt('quinhentos e cinquenta')}'") ...But
numbers_to_digits_ptdefaultsvarianttoPortugueseVariant.PT, so these examples are actually exercising the PT vocab, not BR. Since the “PT” block below correctly passesvariant=PortugueseVariant.PT, you probably intended the BR block to passPortugueseVariant.BRexplicitly:- print(f"'quinhentos e cinquenta' -> '{numbers_to_digits_pt('quinhentos e cinquenta')}'") - print(f"'um milhão' -> '{numbers_to_digits_pt('um milhão')}'") - print(f"'dezesseis' -> '{numbers_to_digits_pt('dezesseis')}'") - print(f"'há duzentos e cinquenta carros' -> '{numbers_to_digits_pt('há duzentos e cinquenta carros')}'") + print(f"'quinhentos e cinquenta' -> '{numbers_to_digits_pt('quinhentos e cinquenta', variant=PortugueseVariant.BR)}'") + print(f"'um milhão' -> '{numbers_to_digits_pt('um milhão', variant=PortugueseVariant.BR)}'") + print(f"'dezesseis' -> '{numbers_to_digits_pt('dezesseis', variant=PortugueseVariant.BR)}'") + print(f"'há duzentos e cinquenta carros' -> '{numbers_to_digits_pt('há duzentos e cinquenta carros', variant=PortugueseVariant.BR)}'")This is only in the manual test harness, but tightening it will make manual checks reflect the intended variant behavior.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
ovos_number_parser/numbers_pt.py(12 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
ovos_number_parser/numbers_pt.py (2)
ovos_number_parser/util.py (12)
Scale(38-45)GrammaticalGender(48-54)DigitPronunciation(57-59)NumberVocabulary(264-360)RomanceNumberExtractor(363-871)pronounce_ordinal(711-769)is_fractional(380-395)is_ordinal(369-378)extract_number(397-492)text(87-91)pronounce_number(549-673)pronounce_fraction(675-709)ovos_number_parser/__init__.py (6)
pronounce_ordinal(191-233)is_fractional(299-353)is_ordinal(356-380)extract_number(236-296)pronounce_number(83-163)pronounce_fraction(166-188)
🔇 Additional comments (1)
ovos_number_parser/numbers_pt.py (1)
58-266: Vocabulary configuration and PT/PT‑BR wrappers look consistentThe
_PT_PT/_PT_BRNumberVocabularydefinitions and thePT_PT/PT_BRRomanceNumberExtractorwrappers are wired together cleanly:
- Shared pieces (
DIVIDED_BY_ZERO,DENOMINATOR_PARTICLE,HUNDRED_PARTICLE,NO_PREV_UNIT,NO_PLURAL, ordinal maps, etc.) are sensibly reused between variants.- Variant-specific pieces (TENS forms, short/long scale names,
DIGIT_SPELLINGSfor"meia", cross-variantALT_SPELLINGS) are localized to the correct vocab.DEFAULT_SCALEdiffers appropriately (LONGfor PT,SHORTfor BR).No structural or obvious correctness issues stand out in these mappings given the
NumberVocabulary/RomanceNumberExtractorcontracts.Also applies to: 268-358, 360-361
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
ovos_number_parser/util.py (1)
777-826: Unusedgenderparameter may indicate missing functionality.The
genderparameter is declared but never used. For Romance languages, cardinal numbers like 1 ("um"/"uma") and 2 ("dois"/"duas") have gendered forms stored inGENDERED_SPELLINGS. Consider applying gender-specific spellings when the number matches a gendered entry.ovos_number_parser/__init__.py (2)
73-76: Inconsistent Portuguese variant handling innumbers_to_digits.Unlike
pronounce_number,extract_number, andpronounce_fractionwhich check"br" in lang.lower()to route Brazilian Portuguese separately,numbers_to_digitsalways routes toPT_PT. For consistency, apply the same pattern.if lang.startswith("pt"): - return PT_PT.numbers_to_digits(utterance, scale=scale) + return PT_BR.numbers_to_digits(utterance, scale=scale) if "br" in lang.lower() \ + else PT_PT.numbers_to_digits(utterance, scale=scale)
367-372: Inconsistent Portuguese variant handling inis_fractionalandis_ordinal.Similar to
numbers_to_digits, these functions always route Portuguese toPT_PTwithout checking for Brazilian Portuguese variant. Apply the same pattern used inpronounce_numberandextract_numberfor consistency.if lang.startswith("pt"): - return PT_PT.is_fractional(input_str) + return PT_BR.is_fractional(input_str) if "br" in lang.lower() \ + else PT_PT.is_fractional(input_str)And for
is_ordinal:if lang.startswith("pt"): - return PT_PT.is_ordinal(input_str) + return PT_BR.is_ordinal(input_str) if "br" in lang.lower() \ + else PT_PT.is_ordinal(input_str)
🧹 Nitpick comments (5)
ovos_number_parser/util.py (3)
7-35: Consider usingfrozen=Truefor dataclass immutability.The custom
__setattr__implementation provides pseudo-immutability, but Python's dataclass decorator supports this natively withfrozen=True. This also provides better IDE support and clearer intent.-@dataclass +@dataclass(frozen=True) class Token: word: str index: int def __iter__(self): yield self.word yield self.index def __getitem__(self, item): if item == 0: return self.word elif item == 1: return self.index raise IndexError - - def __setattr__(self, key, value): - """ - Prevent modification of existing attributes, allowing only new attributes to be set. - - Raises: - Exception: If attempting to modify an attribute that already exists. - """ - try: - getattr(self, key) - except AttributeError: - super().__setattr__(key, value) - else: - raise AttributeError("Immutable!")
93-105: Inconsistent exception type for immutability violation.
Token.__setattr__raisesAttributeError("Immutable!")whileReplaceableNumber.__setattr__raisesException("Immutable!"). Use consistent exception types, preferablyAttributeErrorwhich is more semantically correct for attribute access violations.else: - raise Exception("Immutable!") + raise AttributeError("Immutable!")
609-610: Unused loop variablei.The loop variable
iis not used within the loop body. Rename it to_to indicate it's intentionally unused.- for i in range(n_zeros): + for _ in range(n_zeros): decimal_pronunciation_parts.append(self.vocab.UNITS[0])ovos_number_parser/numbers_mwl.py (1)
392-395: Remove extraneousfprefix from non-interpolated strings.These lines use f-string syntax but contain no placeholders. The print statements use comma separation for output, so the
fprefix is unnecessary.- print(f"'bint'i un' ->", MWL.extract_number("bint'i un")) - print(f"'bint'i ũa' ->", MWL.extract_number("bint'i ũa")) - print(f"'bint'i dous' ->", MWL.extract_number("bint'i dous")) - print(f"'bint'i dues' ->", MWL.extract_number("bint'i dues")) + print("'bint'i un' ->", MWL.extract_number("bint'i un")) + print("'bint'i ũa' ->", MWL.extract_number("bint'i ũa")) + print("'bint'i dous' ->", MWL.extract_number("bint'i dous")) + print("'bint'i dues' ->", MWL.extract_number("bint'i dues"))ovos_number_parser/__init__.py (1)
113-117: Consider emittingDeprecationWarningforshort_scaleparameter.The
short_scaleparameter is marked as deprecated with a TODO to log a warning, but no warning is actually emitted. For consistent deprecation behavior (similar to the MWL/PT wrapper functions), consider adding a warning.if short_scale is not None: - # TODO log warning - pass + import warnings + warnings.warn( + "short_scale is deprecated, use scale=Scale.SHORT or scale=Scale.LONG instead", + DeprecationWarning, + stacklevel=2, + ) + scale = Scale.SHORT if short_scale else Scale.LONG
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
ovos_number_parser/__init__.py(17 hunks)ovos_number_parser/numbers_mwl.py(1 hunks)ovos_number_parser/numbers_pt.py(9 hunks)ovos_number_parser/util.py(10 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
ovos_number_parser/numbers_mwl.py (2)
ovos_number_parser/util.py (13)
Scale(38-45)GrammaticalGender(48-54)DigitPronunciation(57-59)NumberVocabulary(264-362)RomanceNumberExtractor(365-877)pronounce_ordinal(716-775)is_fractional(382-397)is_ordinal(371-380)text(87-91)extract_number(399-494)pronounce_number(552-677)numbers_to_digits(496-550)pronounce_fraction(679-714)ovos_number_parser/__init__.py (7)
pronounce_ordinal(198-245)is_fractional(318-379)is_ordinal(382-406)extract_number(248-315)pronounce_number(84-170)numbers_to_digits(36-81)pronounce_fraction(173-195)
ovos_number_parser/util.py (1)
ovos_number_parser/__init__.py (7)
is_ordinal(382-406)is_fractional(318-379)extract_number(248-315)numbers_to_digits(36-81)pronounce_number(84-170)pronounce_ordinal(198-245)pronounce_fraction(173-195)
🪛 Ruff (0.14.7)
ovos_number_parser/numbers_mwl.py
392-392: f-string without any placeholders
Remove extraneous f prefix
(F541)
393-393: f-string without any placeholders
Remove extraneous f prefix
(F541)
394-394: f-string without any placeholders
Remove extraneous f prefix
(F541)
395-395: f-string without any placeholders
Remove extraneous f prefix
(F541)
ovos_number_parser/util.py
231-231: Do not perform function call range in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
304-304: Unused lambda argument: gender
(ARG005)
575-575: Avoid specifying long messages outside the exception class
(TRY003)
609-609: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
737-737: Avoid specifying long messages outside the exception class
(TRY003)
779-779: Unused method argument: gender
(ARG002)
794-794: Avoid specifying long messages outside the exception class
(TRY003)
845-845: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (12)
ovos_number_parser/util.py (2)
231-261: LGTM with minor note on default argument.The static analysis flags
range(1, 21)in the default argument. Whilerangeis immutable (unlike lists), evaluating it at function definition time is unconventional. The existingif not denominatorsguard handles this correctly. No change required.
263-307: Well-structured vocabulary configuration class.The
NumberVocabularydataclass provides a clean, declarative way to define language-specific number parsing rules. The static analysis warning about the unusedgenderparameter in the defaultswap_genderlambda (line 304) is expected—it's a no-op placeholder that language-specific implementations override.ovos_number_parser/numbers_mwl.py (3)
17-21: TODO indicates uncertainty about gender transformation rules.The TODO comment suggests the suffix-based gender transformation rules may need linguistic verification for Mirandese. Consider validating these rules with a Mirandese language reference or native speaker.
42-225: Well-structured Mirandese vocabulary configuration.The
NumberVocabularydefinition for Mirandese is comprehensive, covering units, tens, hundreds, fractions, ordinals, and scale words for both short and long scales. The use ofswap_gender_mwlandpluralize_mwlfor morphological transformations aligns with the vocabulary-driven architecture.
233-340: Good deprecation pattern for backward compatibility.The deprecated wrapper functions properly emit
DeprecationWarningwithstacklevel=2(pointing to the caller) and delegate to the newMWLinstance methods. This maintains backward compatibility while guiding users toward the new API.ovos_number_parser/__init__.py (2)
1-33: Clean import structure for new language support.The imports are well-organized, bringing in the new
AST,MWL,PT_PT, andPT_BRextractors alongside the existing language-specific modules.
53-54: AST (Asturian) language routing follows established patterns.The new Asturian language support is correctly integrated across all dispatch functions (
numbers_to_digits,pronounce_number,pronounce_fraction,pronounce_ordinal,extract_number,is_fractional,is_ordinal), following the same delegation pattern used for MWL and PT variants.Also applies to: 124-125
ovos_number_parser/numbers_pt.py (5)
19-47: Well-implemented gender transformation for Portuguese.The function handles the common irregular case (
dois/duas) explicitly and applies general suffix rules for other words. The pattern is consistent with the Mirandese implementation.
58-266: Comprehensive European Portuguese vocabulary configuration.The
_PT_PTvocabulary is well-structured with appropriate defaults (DEFAULT_SCALE=Scale.LONGfor European convention). The Wikipedia reference in the comment provides good documentation provenance.
268-358: Clean Brazilian Portuguese variant with appropriate overrides.The
_PT_BRvocabulary correctly inherits from_PT_PTand overrides only the differences: spelling variations (dezesseis vs dezasseis), scale word spellings (-lhão vs -lião), andDEFAULT_SCALE=Scale.SHORT(reflecting Brazilian usage). TheDIGIT_SPELLINGSentry for "meia" (colloquial for 6) is a nice regional touch.
363-507: Proper deprecation wrappers maintain backward compatibility.The deprecated functions correctly emit
DeprecationWarningwithstacklevel=2and delegate to the appropriatePT_PTorPT_BRextractor based on thevariantparameter. This allows existing code to continue working while guiding users toward the new API.
575-584: Good addition of edge case tests.The test block now includes edge cases like
1/0and0/0for fraction pronunciation, which helps verify theDIVIDED_BY_ZEROhandling works correctly.
| for scale_val, w in scales_map.items(): | ||
| if token == w or token == self.vocab.pluralize(w): | ||
| current_number = current_number or 0 | ||
| result = result or 0 | ||
| if current_number is None: | ||
| current_number = 1 | ||
| result += current_number * scale_val | ||
| current_number = None | ||
| found_scale = True | ||
| number_consumed = True | ||
| break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dead code: if current_number is None is unreachable.
On line 463, current_number = current_number or 0 ensures current_number is never None at this point. Therefore, the condition if current_number is None on line 465 can never be True, making current_number = 1 unreachable.
If the intent is to default to 1 when no preceding number is specified (e.g., "million" → 1,000,000), the logic should be:
for scale_val, w in scales_map.items():
if token == w or token == self.vocab.pluralize(w):
- current_number = current_number or 0
- result = result or 0
if current_number is None:
current_number = 1
+ result = result or 0
result += current_number * scale_val📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for scale_val, w in scales_map.items(): | |
| if token == w or token == self.vocab.pluralize(w): | |
| current_number = current_number or 0 | |
| result = result or 0 | |
| if current_number is None: | |
| current_number = 1 | |
| result += current_number * scale_val | |
| current_number = None | |
| found_scale = True | |
| number_consumed = True | |
| break | |
| for scale_val, w in scales_map.items(): | |
| if token == w or token == self.vocab.pluralize(w): | |
| if current_number is None: | |
| current_number = 1 | |
| result = result or 0 | |
| result += current_number * scale_val | |
| current_number = None | |
| found_scale = True | |
| number_consumed = True | |
| break |
🤖 Prompt for AI Agents
In ovos_number_parser/util.py around lines 461-471, the code sets current_number
= current_number or 0 which makes the subsequent check if current_number is None
unreachable; to fix, stop forcing current_number to 0 before the scale check
(remove or move the "current_number = current_number or 0" assignment) and
instead only ensure result is initialized (result = result or 0), then keep the
if current_number is None branch to set current_number = 1 when a scale word
appears without a preceding number; ensure number_consumed/found_scale handling
remains the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
tests/test_number_parser_mwl.py (2)
28-33: Consider removing or enabling the commented-out composite number tests.The commented-out assertions for composite numbers (21, 35, 99) suggest these were originally intended but not completed. Since composite numbers are formed dynamically by
RomanceNumberExtractorrather than stored inTENS, these tests may not be applicable to the vocabulary dictionary. Consider removing the dead code to avoid confusion.
127-135: Clean up commented-out ordinal assertions.The commented-out tests for
décimo(10th ordinal) with the@unittest.skipdecorator syntax in comments are confusing. Either enable them with proper@unittest.skipdecorators or remove them. The current mixed approach (decorator syntax inside comments) doesn't serve any purpose.def test_basic_ordinals_masculine(self): self.assertEqual(MWL.pronounce_ordinal(1), "purmerio") self.assertEqual(MWL.pronounce_ordinal(2), "segundo") self.assertEqual(MWL.pronounce_ordinal(3), "terceiro") - # @unittest.skip("TODO - fix me") - #self.assertEqual(MWL.pronounce_ordinal(10), "décimo") def test_basic_ordinals_feminine(self): self.assertEqual(MWL.pronounce_ordinal(1, GrammaticalGender.FEMININE), "purmeria") self.assertEqual(MWL.pronounce_ordinal(2, GrammaticalGender.FEMININE), "segunda") self.assertEqual(MWL.pronounce_ordinal(3, GrammaticalGender.FEMININE), "terceira") - #@unittest.skip("TODO - fix me") - #self.assertEqual(MWL.pronounce_ordinal(10, GrammaticalGender.FEMININE), "décima")ovos_number_parser/numbers_mwl.py (1)
392-395: Remove extraneous f-string prefixes.These print statements use f-string prefixes but the values are passed via comma rather than interpolation, making the
fprefix unnecessary and potentially confusing.- print(f"'bint'i un' ->", MWL.extract_number("bint'i un")) - print(f"'bint'i ũa' ->", MWL.extract_number("bint'i ũa")) - print(f"'bint'i dous' ->", MWL.extract_number("bint'i dous")) - print(f"'bint'i dues' ->", MWL.extract_number("bint'i dues")) + print("'bint'i un' ->", MWL.extract_number("bint'i un")) + print("'bint'i ũa' ->", MWL.extract_number("bint'i ũa")) + print("'bint'i dous' ->", MWL.extract_number("bint'i dous")) + print("'bint'i dues' ->", MWL.extract_number("bint'i dues"))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
ovos_number_parser/numbers_mwl.py(1 hunks)tests/test_number_parser_mwl.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/test_number_parser_mwl.py (1)
ovos_number_parser/util.py (2)
GrammaticalGender(48-54)text(87-91)
ovos_number_parser/numbers_mwl.py (2)
ovos_number_parser/util.py (13)
Scale(38-45)GrammaticalGender(48-54)DigitPronunciation(57-59)NumberVocabulary(264-362)RomanceNumberExtractor(365-877)pronounce_ordinal(716-775)is_fractional(382-397)is_ordinal(371-380)text(87-91)extract_number(399-494)pronounce_number(552-677)numbers_to_digits(496-550)pronounce_fraction(679-714)ovos_number_parser/__init__.py (7)
pronounce_ordinal(198-245)is_fractional(318-379)is_ordinal(382-406)extract_number(248-315)pronounce_number(84-170)numbers_to_digits(36-81)pronounce_fraction(173-195)
🪛 Ruff (0.14.7)
ovos_number_parser/numbers_mwl.py
392-392: f-string without any placeholders
Remove extraneous f prefix
(F541)
393-393: f-string without any placeholders
Remove extraneous f prefix
(F541)
394-394: f-string without any placeholders
Remove extraneous f prefix
(F541)
395-395: f-string without any placeholders
Remove extraneous f prefix
(F541)
🔇 Additional comments (10)
tests/test_number_parser_mwl.py (5)
1-5: LGTM! Clean imports and appropriate test dependencies.The imports are minimal and correctly target the MWL extractor and GrammaticalGender enum needed for the tests.
53-114: Good test coverage for cardinal and fraction pronunciation.The tests comprehensively cover basic cardinals, hundreds, thousands, large numbers, decimals, negatives, and fractions. The skipped gendered cardinals test with
@unittest.skip("TODO - fix me")appropriately marks known limitations for future work.
154-186: LGTM! Comprehensive extraction tests with good edge case coverage.The extraction tests appropriately cover cardinals (including gendered variants like
ũa/dues), decimals (including negatives), ordinals (both genders), phrase extraction, and failure cases. TheassertFalseusage for unrecognized input is correct.
192-209: LGTM! Basic digits conversion tests with appropriate skip markers.The basic conversion tests cover essential cases. The skipped decimal and negative conversion tests are appropriately marked for future implementation.
226-240: Verify the expected pronunciation for the large number test.The expected text at line 231 for
1_234_567_899may have an inconsistency. The pronunciation "mil duzientos i trinta i quatro milhones quinhentos i sessenta i siete mil uitocientos i nobenta i nuobe" appears to describe a number structure that may not match exactly. In long scale, 1,234,567,899 would be interpreted differently than in short scale. Ensure the expected output aligns with theScale.LONGdefault of MWL.ovos_number_parser/numbers_mwl.py (5)
1-4: LGTM! Imports are clean and consolidated.The duplicate import issue from the previous review has been addressed.
33-39: LGTM! Basic pluralization logic with acknowledged TODO.The pluralization rules handle
-on→-onesand default-ssuffixing. The TODO comment appropriately flags uncertainty for future refinement.
42-225: Well-structured vocabulary configuration for Mirandese.The
_MWLvocabulary is comprehensive, covering units, tens, hundreds, fractions, ordinals, and scale-specific number words. The configuration correctly setsDEFAULT_SCALE=Scale.LONGfor Mirandese and includes alternative spellings for extraction flexibility.
227-227: LGTM! Clean extractor instantiation.The
MWLobject is correctly instantiated as aRomanceNumberExtractorwith the_MWLvocabulary configuration.
229-340: LGTM! Well-implemented deprecation wrappers.The deprecated functions correctly:
- Emit
DeprecationWarningwithstacklevel=2to point to the caller- Provide clear migration guidance
- Fall back to
MWL.vocab.DEFAULT_SCALEwhen scale isNone- Delegate to the new
MWL.*methods
| elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'): | ||
| return word[:-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential bug in masculine gender swap for words ending in -ma.
The logic for converting feminine -ma endings to masculine removes only the final character (word[:-1]), resulting in a word ending in m rather than the expected -o ending. For example, "décima" would become "décim" instead of "décimo".
Apply this diff to fix the gender swap:
elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'):
- return word[:-1]
+ return word[:-2] + 'o'📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'): | |
| return word[:-1] | |
| elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'): | |
| return word[:-2] + 'o' |
🤖 Prompt for AI Agents
In ovos_number_parser/numbers_mwl.py around lines 20-21, the masculine gender
branch for words ending in 'ma' currently strips only the final character
(word[:-1]) which produces a trailing 'm' (e.g., "décima" -> "décim"); change
the transformation to replace the final 'a' with 'o' (for example return
word[:-1] + 'o') so "décima" becomes "décimo".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
♻️ Duplicate comments (1)
ovos_number_parser/util.py (1)
433-498: Scale words mis-parsed; exact-scale phrases and large numbers are wrong.There are several coupled issues in
extract_number:
Scale words are handled as plain cardinals before the scale logic.
Becausenumbers_mapincludes entries for scale words (e.g.,"millón","mil"), tokens like"millón"or"mil"hit theval is not Nonebranch and are added directly tocurrent_number, so"un millón"becomes 1 000 001 instead of 1 000 000, and"dous millóns cincocentos"doesn’t aggregate as expected.The
current_number is Nonebranch is unreachable.
Inside the scale loop, you do:current_number = current_number or 0 ... if current_number is None: current_number = 1After
current_number = current_number or 0,current_numberis neverNone, so thatifnever fires. Bare scale words like"millón"can’t default to 1.Trailing cardinals after a scale are dropped once
number_consumedis set.
After hitting a scale,number_consumedis set toTrue, and the final fold:if not number_consumed and current_number: result = result or 0 result += current_numberwill skip any remaining
current_number. This is the classic “scaled part + leftover hundreds” problem.Given how central this is to GL and AST (and other Romance languages), this is serious for correctness.
A more robust structure would be:
- Detect scale words before
val is not Nonehandling.- Allow
current_numberto beNoneto mean “implicit 1”.- Always fold any leftover
current_numberintoresultat the end if any numeric content was seen.Sketch (condensed for clarity):
- val = numbers_map.get(token) - if val is not None: + val = numbers_map.get(token) + + # 1) Handle scale words first + scale_val = None + for s_val, w in scales_map.items(): + if token == w or token == self.vocab.pluralize(w): + scale_val = s_val + break + + if scale_val is not None: + # Implicit "one million" when no prefix + if current_number is None: + current_number = 1 + result = (result or 0) + current_number * scale_val + current_number = None + number_consumed = True + + # 2) Plain cardinal tokens + elif val is not None: current_number = current_number or 0 result = result or 0 if prev_token in self.vocab.NEGATIVE_SIGN: is_negative = True @@ - else: - # Handle large scales like milhão, bilhão - found_scale = False - for scale_val, w in scales_map.items(): - if token == w or token == self.vocab.pluralize(w): - current_number = current_number or 0 - result = result or 0 - if current_number is None: - current_number = 1 - result += current_number * scale_val - current_number = None - found_scale = True - number_consumed = True - break - - # Handle decimal numbers - if not found_scale: + else: + # Handle decimal numbersAnd at the end:
- if not number_consumed and current_number: - result = result or 0 - result += current_number + if current_number is not None: + # Fold any trailing cardinal/ordinal component + result = (result or 0) + current_number + number_consumed = TrueCombined with the fraction fix, this should give sensible results for phrases like
"un millón","dous millóns cincocentos", and"tres cuartos".
🧹 Nitpick comments (5)
ovos_number_parser/util.py (3)
231-261: Avoid callable defaultrange(...)in function signature.
convert_to_mixed_fractionusesdenominators=range(1, 21)as a default. Linters flag this (B008) because defaults are evaluated at import time; whilerangeis immutable and safe here, it’s still better to follow the pattern.Consider:
-def convert_to_mixed_fraction(number: Union[int, float], denominators=range(1, 21)) -> Optional[Tuple[int, int, int]]: +def convert_to_mixed_fraction(number: Union[int, float], denominators=None) -> Optional[Tuple[int, int, int]]: @@ - if not denominators: - denominators = range(1, 21) + if denominators is None: + denominators = range(1, 21)
304-306: Unusedgenderargument in defaultswap_genderlambda.The default
swap_genderignores itsgenderparameter, which triggers ARG005 and can confuse readers.If you want a truly no-op default but still keep the signature, you can either:
- swap_gender: Callable[[str, GrammaticalGender], str] = lambda word, gender: word + swap_gender: Callable[[str, GrammaticalGender], str] = staticmethod(lambda word, _gender: word)or define a small named function above and use
_genderin its parameter list.
610-611: Rename unused loop variable to_for clarity.In the decimal pronunciation loop:
for i in range(n_zeros): decimal_pronunciation_parts.append(self.vocab.UNITS[0])
iis unused and triggers B007. Renaming it to_better conveys intent and quiets linters:- for i in range(n_zeros): + for _ in range(n_zeros): decimal_pronunciation_parts.append(self.vocab.UNITS[0])ovos_number_parser/numbers_mwl.py (1)
401-404:__main__demo: f-strings without placeholders.Lines like:
print(f"'bint'i un' ->", MWL.extract_number("bint'i un"))don’t interpolate anything; linters (F541) suggest dropping the
fprefix:-print(f"'bint'i un' ->", MWL.extract_number("bint'i un")) +print("'bint'i un' ->", MWL.extract_number("bint'i un"))Same for the following similar prints.
ovos_number_parser/numbers_gl.py (1)
104-114: Hundreds vocabulary needs feminine forms once_pronounce_up_to_999usesswap_gender.With the proposed change in
_pronounce_up_to_999(wrapping exact hundreds withswap_gender),HUNDREDSwill be interpreted as the masculine base. For GL:HUNDREDS={ ... }and
GENDERED_SPELLINGSonly overrides units1and2.After
_pronounce_up_to_999starts callingswap_genderfor exact hundreds, you should ensureswap_gender_gl(for single words) correctly turns"douscentos"into"douscentas", etc. If that’s hard to express generically, you can also add explicit entries toGENDERED_SPELLINGS[GrammaticalGender.FEMININE]for 200, 300, … 900 so they bypass the generic path entirely.As is, tests such as:
GL.pronounce_number(200, gender=GrammaticalGender.FEMININE)will continue to return
"douscentos"until eitherswap_gender_glorGENDERED_SPELLINGSencode the correct feminine hundreds.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
ovos_number_parser/numbers_ast.py(1 hunks)ovos_number_parser/numbers_gl.py(1 hunks)ovos_number_parser/numbers_mwl.py(1 hunks)ovos_number_parser/util.py(10 hunks)tests/test_number_parser_ast.py(1 hunks)tests/test_number_parser_gl.py(1 hunks)tests/test_number_parser_mwl.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_number_parser_mwl.py
🧰 Additional context used
🧬 Code graph analysis (3)
ovos_number_parser/util.py (1)
ovos_number_parser/__init__.py (6)
is_ordinal(382-406)is_fractional(318-379)extract_number(248-315)numbers_to_digits(36-81)pronounce_number(84-170)pronounce_ordinal(198-245)
ovos_number_parser/numbers_mwl.py (2)
ovos_number_parser/util.py (13)
Scale(38-45)GrammaticalGender(48-54)DigitPronunciation(57-59)NumberVocabulary(264-366)RomanceNumberExtractor(369-879)pronounce_ordinal(716-775)is_fractional(386-401)is_ordinal(375-384)text(87-91)extract_number(403-498)pronounce_number(556-677)numbers_to_digits(500-554)pronounce_fraction(679-714)ovos_number_parser/__init__.py (7)
pronounce_ordinal(198-245)is_fractional(318-379)is_ordinal(382-406)extract_number(248-315)pronounce_number(84-170)numbers_to_digits(36-81)pronounce_fraction(173-195)
tests/test_number_parser_ast.py (2)
ovos_number_parser/util.py (2)
GrammaticalGender(48-54)pronounce_number(556-677)ovos_number_parser/__init__.py (1)
pronounce_number(84-170)
🪛 GitHub Actions: Run Unit Tests
tests/test_number_parser_gl.py
[error] 23-23: GL composite numbers gender feminine: expected 'vinte e dúas' but got 'vinte e dous'.
[error] 34-34: GL hundreds: feminine form 'douscentas' expected, but got 'douscentos'.
[error] 55-55: GL long scale: expected 'mil millóns' but got 'mil mil mil'.
[error] 47-47: GL millions: expected 'un millón' but got 'mil mil'.
[error] 80-80: GL ordinals: expected 'millonésimo' for 1,000,000 but got 'mil milésimo'.
tests/test_number_parser_ast.py
[error] 117-117: AST fraction extraction failed: expected 0.75 but got 3.25.
[error] 189-189: Integration: large numbers extraction mismatch: 1234567 vs 1004797.
🪛 Ruff (0.14.7)
ovos_number_parser/util.py
231-231: Do not perform function call range in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
304-304: Unused lambda argument: gender
(ARG005)
579-579: Avoid specifying long messages outside the exception class
(TRY003)
610-610: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
737-737: Avoid specifying long messages outside the exception class
(TRY003)
794-794: Avoid specifying long messages outside the exception class
(TRY003)
847-847: Avoid specifying long messages outside the exception class
(TRY003)
ovos_number_parser/numbers_mwl.py
401-401: f-string without any placeholders
Remove extraneous f prefix
(F541)
402-402: f-string without any placeholders
Remove extraneous f prefix
(F541)
403-403: f-string without any placeholders
Remove extraneous f prefix
(F541)
404-404: f-string without any placeholders
Remove extraneous f prefix
(F541)
tests/test_number_parser_ast.py
21-21: Comment contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF003)
🔇 Additional comments (7)
tests/test_number_parser_gl.py (2)
19-82: GL tests clearly expose current backend bugs; keep them as-is.The failing expectations here:
"vinte e dúas"vs"vinte e dous"(Line 23),"douscentas"vs"douscentos"(Line 34),"un millón"/"mil millóns"/"millonésimo"vs various"mil …"outputs,are consistent with the issues in
RomanceNumberExtractor(scale selection and lack of gender handling for exact hundreds) andswap_gender_gl. The test data looks correct and should be treated as the source of truth; the fixes belong inovos_number_parser/util.pyandovos_number_parser/numbers_gl.py, not in these tests.
116-147: Extraction and numbers-to-digits tests for GL align with intended semantics.The extraction expectations for:
"dous millóns cincocentos"→2_000_500,"mil vinte e tres"→1023,"trinta e cinco coma catro"→35.4,and
numbers_to_digitsphrases (e.g.,'atopamos dous millóns cincocentos insectos' -> 'atopamos 2000500 insectos') are exactly what the generic extractor is supposed to support. Current failures in these areas are due to extractor logic, not the tests.No changes requested here; once the
extract_numberscale/fraction fixes are applied, these tests should start passing.ovos_number_parser/numbers_ast.py (1)
4-221: AST vocabulary and helpers look consistent; main issues are in shared extractor.The Asturian-specific pieces here—
swap_gender_ast,pluralize_ast,_AST’s UNITS/TENS/HUNDREDS/FRACTION/ORDINAL maps, and theAST = RomanceNumberExtractor(_AST)wiring—are coherent and match what the AST tests exercise.The pipeline failures (fraction extraction of
"tres cuartos"and large-number integration) stem from the sharedRomanceNumberExtractorimplementation (scale and fraction handling), which is already flagged inutil.py. No changes needed in this file once the shared logic is fixed.tests/test_number_parser_ast.py (2)
73-90: Fraction extraction test correctly exposes “integer + fraction” vs “numerator/denominator” bug.
TestFractionsAst.test_fraction_pronounceandTestExtractAST.test_fraction_phraseexpect:
AST.is_fractional("mediu")→0.5,AST.extract_number("tres cuartos")→0.75.Current implementation returns
3.25for the latter because it always doescurrent_number + fraction. This test is doing the right thing by enforcing3/4semantics for “tres cuartos”; the fix belongs inRomanceNumberExtractor.extract_number(see earlier comment), not in the test.
179-189: Large-number round-trip test is valid and highlights scale-selection bug.
TestIntegrationAST.test_large_numbersexpectsAST.extract_number(AST.pronounce_number(1_234_567)) == 1_234_567. The current mismatch (e.g.,1_004_797) is consistent with the sharedpronounce_numberchoosing the smallest scale instead of the largest plusextract_number’s imperfect scale handling.The test is appropriate; fixes should be applied in
RomanceNumberExtractor.pronounce_numberandextract_numberas already outlined.ovos_number_parser/numbers_gl.py (2)
322-429: Deprecated GL wrappers correctly delegate but share the same scale/gender issues.The deprecated functions (
pronounce_ordinal_gl,is_fractional_gl,is_ordinal_gl,extract_number_gl,pronounce_number_gl,numbers_to_digits_gl,pronounce_fraction_gl) are thin adapters overGLwith a deprecation warning, which is good.Just be aware:
- They inherit all behavior from
RomanceNumberExtractorand_GL, including the scale and gender bugs already discussed.- Once you fix the shared logic in
util.pyand the GL-specificswap_gender_gl/ vocab, these wrappers should start passing old tests too.No structural changes recommended here beyond those shared fixes.
432-542:__main__GL demo matches intended semantics but will currently show wrong outputs until core fixes land.The examples printed here (e.g.,
1_234_567→"un millón douscentos trinta e catro mil cincocentos sesenta e sete",1_000_000_000→"mil millóns",- ordinals like
1_000_000→"millonésimo","dous millóns cincocentos"extraction →2_000_500,) align with the expectations encoded in
tests/test_number_parser_gl.py.Right now, users running this script will see the same mispronunciations/extraction errors flagged in the tests (due to the shared scale/gender issues). Once
RomanceNumberExtractorandswap_gender_glare corrected, this demo script should output the intended strings; there’s nothing to change here structurally.
| def swap_gender_gl(word: str, gender: GrammaticalGender) -> str: | ||
| """ | ||
| This function takes the given text and checks if it is a fraction. | ||
| Convert a Portuguese word between masculine and feminine grammatical gender by adjusting its ending. | ||
| Args: | ||
| text (str): the string to check if fractional | ||
| Parameters: | ||
| word (str): The word to convert. | ||
| gender (GrammaticalGender): The target grammatical gender. | ||
| short_scale (bool): use short scale if True, long scale if False | ||
| Returns: | ||
| (bool) or (float): False if not a fraction, otherwise the fraction | ||
| str: The word with its ending swapped to match the specified gender, if applicable; otherwise, the original word. | ||
| """ | ||
| if input_str.endswith('s', -1): | ||
| input_str = input_str[:len(input_str) - 1] # e.g. "fifths" | ||
|
|
||
| aFrac = {"medio": 2, "media": 2, "terzo": 3, "cuarto": 4, | ||
| "cuarta": 4, "quinto": 5, "quinta": 5, "sexto": 6, "sexta": 6, | ||
| "séptimo": 7, "séptima": 7, "oitavo": 8, "oitava": 8, | ||
| "noveno": 9, "novena": 9, "décimo": 10, "décima": 10, | ||
| "onceavo": 11, "onceava": 11, "doceavo": 12, "doceava": 12} | ||
|
|
||
| if input_str.lower() in aFrac: | ||
| return 1.0 / aFrac[input_str] | ||
| if (input_str == "vixésimo" or input_str == "vixésima"): | ||
| return 1.0 / 20 | ||
| if (input_str == "trixésimo" or input_str == "trixésima"): | ||
| return 1.0 / 30 | ||
| if (input_str == "centésimo" or input_str == "centésima"): | ||
| return 1.0 / 100 | ||
| if (input_str == "milésimo" or input_str == "milésima"): | ||
| return 1.0 / 1000 | ||
| return False | ||
|
|
||
|
|
||
| def extract_number_gl(text, short_scale=True, ordinals=False): | ||
| if word == "dois" and gender == GrammaticalGender.FEMININE: | ||
| return "duas" | ||
| elif word == "duas" and gender == GrammaticalGender.MASCULINE: | ||
| return "dois" | ||
|
|
||
| elif gender == GrammaticalGender.FEMININE and word.endswith('o'): | ||
| return word[:-1] + 'a' | ||
| elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'): | ||
| return word[:-1] | ||
| elif gender == GrammaticalGender.MASCULINE and word.endswith('a'): | ||
| return word[:-1] + 'o' | ||
| elif gender == GrammaticalGender.FEMININE and word.endswith('os'): | ||
| return word[:-2] + 'as' | ||
| elif gender == GrammaticalGender.MASCULINE and word.endswith('as'): | ||
| return word[:-2] + 'os' | ||
| elif gender == GrammaticalGender.FEMININE and word.endswith('m'): | ||
| return word + 'a' | ||
| return word |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
swap_gender_gl is still Portuguese-flavored and doesn’t handle Galician composites.
A few points here:
- The docstring still says “Convert a Portuguese word…”, but this module is Galician-specific.
- Special-casing
"dois"/"duas"is Portuguese; Galician UNITS use"dous"/"dúas", so that block never triggers. swap_gender_gloperates on the entire phrase ("vinte e dous"), but only checks suffixes like'o','os','a','as','m'. For"vinte e dous"(ending in'us'), none of the conditions fire, so feminineGL.pronounce_number(22, gender=FEMININE)stays"vinte e dous"instead of"vinte e dúas"as the tests expect.
Given the shared _pronounce_up_to_999 already builds tens+unit phrases and then calls swap_gender, a pragmatic fix here is:
- Make
swap_gender_glrecognise"dous"/"dúas". - When given a multi-word phrase, apply gender swap only to the last token, not the whole string.
For example:
def swap_gender_gl(word: str, gender: GrammaticalGender) -> str:
- if word == "dois" and gender == GrammaticalGender.FEMININE:
- return "duas"
- elif word == "duas" and gender == GrammaticalGender.MASCULINE:
- return "dois"
+ # Handle multi-word phrases by swapping only the last token
+ if " " in word:
+ parts = word.split()
+ parts[-1] = swap_gender_gl(parts[-1], gender)
+ return " ".join(parts)
+
+ if word == "dous" and gender == GrammaticalGender.FEMININE:
+ return "dúas"
+ elif word == "dúas" and gender == GrammaticalGender.MASCULINE:
+ return "dous"
@@
- elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'):
- return word[:-1]
+ # …keep or adjust the generic suffix rules as appropriate for GL…and update the docstring to say “Galician” instead of “Portuguese”.
Combined with the _pronounce_up_to_999 hundreds fix in util.py, this should give you the expected "vinte e dúas" and other gendered forms.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In ovos_number_parser/numbers_gl.py around lines 11 to 39, the function
swap_gender_gl still refers to Portuguese in the docstring and special-cases
"dois"/"duas", and it tries to swap gender on the whole phrase instead of only
the final token; this misses Galician forms like "dous"/"dúas" and multi-word
phrases like "vinte e dous". Update the docstring to say "Galician", add
recognition for Galician unit pair "dous" -> "dúas" (and the reverse), and
change the logic so when the input contains multiple words you split by
whitespace, apply the gender-swapping rules only to the last token (handling
endings 'o','a','os','as','m' and the new "dous"/"dúas" case), then rejoin and
return the full phrase.
| def swap_gender_mwl(word: str, gender: GrammaticalGender) -> str: | ||
| if word.endswith("un") and gender == GrammaticalGender.FEMININE: | ||
| return word[:-2] + "ũa" | ||
| elif word.endswith("ũa") and gender == GrammaticalGender.MASCULINE: | ||
| return word[:-2] + "ũa" | ||
| elif word == "dous" and gender == GrammaticalGender.FEMININE: | ||
| return "dues" | ||
| elif word == "dues" and gender == GrammaticalGender.MASCULINE: | ||
| return "dous" | ||
|
|
||
| # TODO - is this correct? | ||
| elif gender == GrammaticalGender.FEMININE and word.endswith('o'): | ||
| return word[:-1] + 'a' | ||
| elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'): | ||
| return word[:-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Masculine conversion for words ending in ũa looks wrong.
Current logic:
if word.endswith("un") and gender == GrammaticalGender.FEMININE:
return word[:-2] + "ũa"
elif word.endswith("ũa") and gender == GrammaticalGender.MASCULINE:
return word[:-2] + "ũa"The second branch effectively returns the same "…ũa" ending for masculine by chopping two chars and re‑adding "ũa". For example, "dũa" → "dũa". That doesn’t actually switch gender and is likely a copy‑paste mistake.
You probably meant to restore the masculine "un" form:
- elif word.endswith("ũa") and gender == GrammaticalGender.MASCULINE:
- return word[:-2] + "ũa"
+ elif word.endswith("ũa") and gender == GrammaticalGender.MASCULINE:
+ return word[:-2] + "un"Please double-check with MWL examples, but as written this branch can’t change the word’s gender.
| def is_ordinal(self, input_str: str, scale: Optional[Scale] = None) -> bool: | ||
| """ | ||
| Determine if a string is a Portuguese ordinal word. | ||
| Returns: | ||
| bool: True if the input string is recognized as a Portuguese ordinal, otherwise False. | ||
| """ | ||
| scale = scale or self.vocab.DEFAULT_SCALE | ||
| ordinals_map = self.vocab.get_ordinal_strings(scale) | ||
| return input_str in ordinals_map | ||
|
|
||
| def is_fractional(self, input_str: str) -> Union[float, bool]: | ||
| """ | ||
| Checks if the input string corresponds to a recognized Portuguese fractional word. | ||
| Returns: | ||
| The fractional value as a float if recognized; otherwise, False. | ||
| """ | ||
| fractions_map = self.vocab.get_fraction_strings() | ||
| input_str = input_str.lower().strip() | ||
|
|
||
| # handle fraction denominator strings | ||
| for word, den in fractions_map.items(): | ||
| if input_str == word: | ||
| return 1.0 / den | ||
|
|
||
| return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fraction aggregation treats “three quarters” as 3.25 instead of 0.75.
In extract_number, the fractional branch always does:
current_number = current_number or 0
result = result or 0
fraction = self.is_fractional(token)
result += current_number + fractionFor AST, "tres cuartos" becomes 3 + 1/4 = 3.25 instead of 3/4 = 0.75, matching the failing test in tests/test_number_parser_ast.py (Line 117). Similar patterns will show up for “tres terzos”, etc.
You likely want to treat a bare “ ” phrase as N * (1/den) when there is no prior integer part, while still supporting “integer + fraction” for mixed forms.
One possible approach:
- elif self.is_fractional(token):
- current_number = current_number or 0
- result = result or 0
- fraction = self.is_fractional(token)
- result += current_number + fraction
- current_number = None
- number_consumed = True
+ elif self.is_fractional(token):
+ fraction = self.is_fractional(token)
+ # Initialize result if needed
+ result = result or 0
+
+ # If we've already consumed a scale/decimal/fraction before,
+ # treat current_number as integer part.
+ if number_consumed or not current_number:
+ result += (current_number or 0) + fraction
+ else:
+ # Bare "tres cuartos" style: interpret as numerator/denominator
+ denom = round(1 / fraction)
+ result += current_number / denom
+
+ current_number = None
+ number_consumed = TrueYou might want to refine the conditions (e.g., based on token count or language specifics), but the current behavior is clearly wrong for the AST test case.
| if "." in str(number): | ||
| integer_part = int(number) | ||
| decimal_part_str = str(number).split('.')[1].rstrip("0") | ||
|
|
||
| # Handle cases where the decimal part rounds to zero | ||
| if decimal_part_str and int(decimal_part_str) == 0: | ||
| return self.pronounce_number(integer_part, places, | ||
| scale=scale, | ||
| digits=digits, gender=gender) | ||
|
|
||
| int_pronunciation = self.pronounce_number(integer_part, places, | ||
| scale=scale, | ||
| digits=digits, gender=gender) | ||
|
|
||
| decimal_pronunciation_parts = [] | ||
| # pronounce decimals either as a whole number or digit by digit | ||
| if decimal_part_str: | ||
| if digits == DigitPronunciation.FULL_NUMBER: | ||
|
|
||
| # handle leading zeros | ||
| no_z = decimal_part_str.lstrip("0") # without zeros | ||
| n_zeros = len(decimal_part_str) - len(no_z) | ||
| for i in range(n_zeros): | ||
| decimal_pronunciation_parts.append(self.vocab.UNITS[0]) | ||
|
|
||
| if n_zeros >= places: | ||
| # read all zeros | ||
| last_digit = int(decimal_part_str[n_zeros]) | ||
| after_last_digit = int(decimal_part_str[n_zeros + 1]) if len(decimal_part_str) > n_zeros + 1 else 0 | ||
| # round up last digit if needed | ||
| if after_last_digit >= 5: | ||
| last_digit += 1 | ||
| decimal_pronunciation_parts.append(self._pronounce_up_to_999(last_digit, gender=gender)) | ||
| else: | ||
| if len(decimal_part_str) > places: | ||
| last_digit = int(decimal_part_str[places - 1]) | ||
| after_last_digit = int(decimal_part_str[places]) | ||
| # round up last digit if needed | ||
| if after_last_digit >= 5: | ||
| last_digit += 1 | ||
| decimal_part_str = decimal_part_str[:places - 1] + str(last_digit) | ||
| decimal_pronunciation_parts.append(self.pronounce_number(int(decimal_part_str), gender=gender)) | ||
| else: | ||
| for digit in decimal_part_str: | ||
| decimal_pronunciation_parts.append(self.pronounce_number(int(digit), gender=gender)) | ||
|
|
||
| decimal_pronunciation = " ".join(decimal_pronunciation_parts) or self.vocab.UNITS[0] | ||
| decimal_word = self.vocab.DECIMAL_MARKER[0] | ||
| return f"{int_pronunciation} {decimal_word} {decimal_pronunciation}" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decimal pronunciation can crash on .0 values and has minor rounding edge cases.
In the decimal branch of pronounce_number:
- For numbers like
1.0or2.00,decimal_part_strbecomes""after.rstrip("0"). The subsequent logic doesn’t guard against an empty string and will eventually callint(decimal_part_str), which raisesValueError. - The
if n_zeros >= places:block indexesdecimal_part_str[n_zeros]without ensuringn_zeros < len(decimal_part_str).
A safer pattern:
if "." in str(number):
integer_part = int(number)
decimal_part_str = str(number).split('.')[1].rstrip("0")
- # Handle cases where the decimal part rounds to zero
- if decimal_part_str and int(decimal_part_str) == 0:
- return self.pronounce_number(integer_part, places,
- scale=scale,
- digits=digits, gender=gender)
+ # If nothing remains after stripping zeros, it's effectively an integer
+ if not decimal_part_str:
+ return self.pronounce_number(
+ integer_part, places, scale=scale, digits=digits, gender=gender
+ )and ensure any indexing into decimal_part_str[...] is only done when len(decimal_part_str) > index.
🧰 Tools
🪛 Ruff (0.14.7)
610-610: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
| # Find the smallest scale that fits the number | ||
| scale_candidates = [(scale_val, scale_str) | ||
| for scale_val, scale_str in scales_map.items() | ||
| if n >= scale_val] | ||
| scale_val: Tuple[int, str] = scale_candidates[0] | ||
|
|
||
| count = n // scale_val[0] | ||
| remainder = n % scale_val[0] | ||
|
|
||
| # Pronounce the 'count' part of the number | ||
| is_singular = count == 1 or scale_val[0] in self.vocab.NO_PLURAL | ||
| scale_word = scale_val[1] if is_singular else self.vocab.pluralize(scale_val[1]) | ||
| if count == 1 and scale_val[0] in self.vocab.NO_PREV_UNIT: | ||
| count_str = scale_word | ||
| else: | ||
| count_pronunciation = self.pronounce_number(count, places, scale) | ||
| count_str = f"{count_pronunciation} {scale_word}" | ||
|
|
||
| # If there's no remainder, we're done | ||
| if remainder == 0: | ||
| return count_str | ||
|
|
||
| # Pronounce the remainder and join with the correct conjunction | ||
| remainder_str = self.pronounce_number(remainder, places, scale) | ||
| # Conjunction logic: add JOIN_WORD if the remainder is the last group and is | ||
| # less than 100 or a multiple of 100. | ||
| join_word = self.vocab.JOIN_WORD[0] if len(self.vocab.JOIN_WORD) else "" | ||
| if remainder < 100 or (remainder < 1000 and remainder % 100 == 0): | ||
| return f"{count_str} {join_word} {remainder_str}" | ||
| else: | ||
| return f"{count_str} {remainder_str}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pronunciation of large numbers picks the smallest scale instead of the largest.
In pronounce_number, you compute:
scales_map = ...
scale_candidates = [(scale_val, scale_str)
for scale_val, scale_str in scales_map.items()
if n >= scale_val]
scale_val: Tuple[int, str] = scale_candidates[0]Because dictionaries are insertion-ordered, this picks the first matching scale, i.e., the smallest value ≥ 1 000. For GL, where LONG_SCALE starts with 1000: "mil" then 10**6: "millón", 1_000_000 is pronounced as "mil mil"; 1_000_000_000 becomes "mil mil mil". This exactly matches the GL test failures.
You want the largest scale ≤ n. Suggestion:
- scale_candidates = [(scale_val, scale_str)
- for scale_val, scale_str in scales_map.items()
- if n >= scale_val]
- scale_val: Tuple[int, str] = scale_candidates[0]
+ scale_candidates = [
+ (scale_val, scale_str)
+ for scale_val, scale_str in scales_map.items()
+ if n >= scale_val
+ ]
+ if not scale_candidates:
+ return self._pronounce_up_to_999(n, gender)
+ # choose the largest applicable scale
+ scale_val, scale_str = max(scale_candidates, key=lambda item: item[0])and then use scale_str instead of re-indexing scale_val[1] below.
The same bug exists in pronounce_ordinal (see comment there); both need to be fixed to get correct “un millón”, “mil millóns”, and “millonésimo”.
🤖 Prompt for AI Agents
In ovos_number_parser/util.py around lines 647 to 677, the code picks the first
matching scale (smallest) from scale_candidates causing large numbers to use
repeated smaller scales; change the logic to select the largest scale <= n by
using max(scale_candidates, key=lambda t: t[0]) (or sort and take last) and
unpack it as (scale_val, scale_str) so subsequent code uses scale_str rather
than re-indexing scale_val[1]; apply the exact same fix to the corresponding
block in pronounce_ordinal so both functions choose the largest applicable scale
and use the proper scale_str variable.
| is_singular = count == 1 or scale_val[0] in self.vocab.NO_PLURAL | ||
| scale_word = scale_val[1] if is_singular else self.vocab.pluralize(scale_val[1]) | ||
| if count == 1 and scale_val[0] in self.vocab.NO_PREV_UNIT: | ||
| count_str = scale_word | ||
| else: | ||
| count_pronunciation = self.pronounce_number(count, places, scale) | ||
| count_str = f"{count_pronunciation} {scale_word}" | ||
|
|
||
| # If there's no remainder, we're done | ||
| if remainder == 0: | ||
| return count_str | ||
|
|
||
| # Pronounce the remainder and join with the correct conjunction | ||
| remainder_str = self.pronounce_number(remainder, places, scale) | ||
| # Conjunction logic: add JOIN_WORD if the remainder is the last group and is | ||
| # less than 100 or a multiple of 100. | ||
| join_word = self.vocab.JOIN_WORD[0] if len(self.vocab.JOIN_WORD) else "" | ||
| if remainder < 100 or (remainder < 1000 and remainder % 100 == 0): | ||
| return f"{count_str} {join_word} {remainder_str}" | ||
| else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hundreds ignore grammatical gender for exact multiples (e.g., 200 feminine).
_pronounce_up_to_999 applies swap_gender only in the composite case and returns self.vocab.HUNDREDS[n] directly for exact multiples of 100:
if n in self.vocab.HUNDREDS:
return self.vocab.HUNDREDS[n]
...
return self.vocab.swap_gender(" ".join(parts), gender)This means GL.pronounce_number(200, gender=FEMININE) always yields "douscentos", never "douscentas", which is exactly what the GL tests report.
A generic fix that works for all Romance languages:
- if n in self.vocab.HUNDREDS:
- return self.vocab.HUNDREDS[n]
+ if n in self.vocab.HUNDREDS:
+ base = self.vocab.HUNDREDS[n]
+ return self.vocab.swap_gender(base, gender)Given other languages either have no gender variation for hundreds (in which case swap_gender will be a no-op) or can encode their rules in swap_gender, this change centralizes the behavior correctly.
🤖 Prompt for AI Agents
In ovos_number_parser/util.py around lines 657 to 676, the function handling
hundreds returns self.vocab.HUNDREDS[n] directly for exact multiples of 100,
causing grammatical gender to be ignored (e.g., 200 always yields masculine
form); change the logic so the returned hundred word is passed through
self.vocab.swap_gender(..., gender) (i.e., apply swap_gender to the hundred-word
in the exact-multiple branch), and ensure the same swap_gender usage remains for
the composite case so all hundreds respect the requested gender across
languages.
| scales_map = self.vocab.ORDINAL_SHORT_SCALE if scale == Scale.SHORT else self.vocab.ORDINAL_LONG_SCALE | ||
|
|
||
| # Find the smallest scale that fits the number | ||
| scale_candidates = [(scale_val, scale_str) | ||
| for scale_val, scale_str in scales_map.items() | ||
| if n >= scale_val] | ||
| scale_val: Tuple[int, str] = scale_candidates[0] | ||
|
|
||
| count = n // scale_val[0] | ||
| remainder = n % scale_val[0] | ||
|
|
||
| # Special case for "milésimo" and other large scales where 'um' is not needed | ||
| if count == 1: | ||
| count_str = self.vocab.swap_gender(scale_val[1], gender) | ||
| else: | ||
| # Pronounce the 'count' part of the number and the scale word | ||
| count_pronunciation = self.pronounce_number(count, scale=scale) | ||
| scale_word = self.vocab.swap_gender(scale_val[1], gender) | ||
| count_str = f"{count_pronunciation} {scale_word}" | ||
|
|
||
| # If there's no remainder, we're done | ||
| if remainder == 0: | ||
| return count_str | ||
|
|
||
| # Pronounce the remainder and join | ||
| remainder_str = self.pronounce_ordinal(remainder, gender, scale) | ||
|
|
||
| return f"{count_str} {remainder_str}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ordinal pronunciation has the same “smallest scale” bug as cardinal pronunciation.
pronounce_ordinal mirrors pronounce_number in its scale handling:
scale_candidates = [(scale_val, scale_str)
for scale_val, scale_str in scales_map.items()
if n >= scale_val]
scale_val: Tuple[int, str] = scale_candidates[0]For GL, 1_000_000 ordinal becomes "mil milésimo" instead of "millonésimo", exactly as seen in the GL tests.
Apply the same fix as for pronounce_number:
- scale_candidates = [(scale_val, scale_str)
- for scale_val, scale_str in scales_map.items()
- if n >= scale_val]
- scale_val: Tuple[int, str] = scale_candidates[0]
+ scale_candidates = [
+ (scale_val, scale_str)
+ for scale_val, scale_str in scales_map.items()
+ if n >= scale_val
+ ]
+ if not scale_candidates:
+ # 0–999 should already have been handled
+ raise ValueError("No ordinal scale found for number")
+ scale_val, scale_str = max(scale_candidates, key=lambda item: item[0])
@@
- if count == 1:
- count_str = self.vocab.swap_gender(scale_val[1], gender)
+ if count == 1:
+ count_str = self.vocab.swap_gender(scale_str, gender)
@@
- scale_word = self.vocab.swap_gender(scale_val[1], gender)
+ scale_word = self.vocab.swap_gender(scale_str, gender)🤖 Prompt for AI Agents
In ovos_number_parser/util.py around lines 748 to 775, the ordinal pronunciation
picks the first matching scale candidate resulting in wrong outputs (e.g., "mil
milésimo" for 1_000_000); change the selection logic to choose the largest
scale_val <= n (same fix as pronounce_number): build scale_candidates as you
already do, then set scale_val = max(scale_candidates, key=lambda t: t[0]) (or
sort and take the last) so the biggest fitting scale is used; keep the rest of
the logic (count, remainder, gender swap, recursion) unchanged.
Summary by CodeRabbit
New Features
Documentation
Compatibility / Deprecations
Tests
✏️ Tip: You can customize this high-level summary in your review settings.