feat: asturian + shared RomanceNumberExtractor class #38

JarbasAl · 2025-12-03T00:38:46Z

Summary by CodeRabbit

New Features
- Full Asturian support plus dedicated Portuguese variants, Mirandese and Galician backends; richer vocabulary-driven parsing/pronunciation and exposed word tokenizer.
Documentation
- Supported languages table updated to include Asturian.
Compatibility / Deprecations
- Backwards-compatible wrappers with deprecation notices while APIs migrate to the new extractor/vocabulary model.
Tests
- Expanded test coverage for Asturian, Portuguese variants, Mirandese, Galician, tokenizer, pronunciation, extraction and digits conversion.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2025-12-03T00:38:54Z

Walkthrough

Adds Asturian support and refactors number parsing into a vocabulary-driven RomanceNumberExtractor with new vocabularies (AST, GL, MWL, PT_PT, PT_BR), updated util dataclasses/tokenization, package dispatch/signature changes, README language table update, and new/updated tests.

Changes

Cohort / File(s)	Summary
Documentation `README.md`	Adds `ast` (Asturian) to Supported Languages table with all capability columns marked ✅.
Package dispatch `ovos_number_parser/__init__.py`	Exports extended (`AST`, `PT_PT`, `PT_BR`, `MWL`); public APIs (numbers_to_digits, pronounce_number, pronounce_fraction, pronounce_ordinal, extract_number, is_fractional, is_ordinal) accept `scale: Optional[Scale]` (deprecate `short_scale`) and route to variant-specific RomanceNumberExtractor instances based on `lang`.
Asturian implementation `ovos_number_parser/numbers_ast.py`	New Asturian helpers (`swap_gender_ast`, `pluralize_ast`), `_AST` NumberVocabulary and `AST = RomanceNumberExtractor(_AST)` exposed for parsing/pronunciation/digits; demo `__main__` included.
Mirandese (MWL) `ovos_number_parser/numbers_mwl.py`	Migrated to `MWL = RomanceNumberExtractor(_MWL)` with `swap_gender_mwl`/`pluralize_mwl`; legacy top-level functions retained as deprecated wrappers delegating to `MWL`; scale params made Optional where applicable.
Portuguese refactor `ovos_number_parser/numbers_pt.py`	Replaced inline mappings with `NumberVocabulary` configs; added `PT_PT`/`PT_BR` extractors, `swap_gender_pt`/`pluralize_pt`; legacy functions now deprecation wrappers delegating to extractor methods; tests updated to use public vocab.
Galician implementation `ovos_number_parser/numbers_gl.py`	Added `swap_gender_gl`, `pluralize_gl`, `_GL` NumberVocabulary and `GL = RomanceNumberExtractor(_GL)` plus deprecated wrappers delegating to `GL`.
Core utilities refactor `ovos_number_parser/util.py`	Major redesign: `Token` → immutable dataclass, `ReplaceableNumber` dataclass, new `word_tokenize`, updated `tokenize`, typed helpers, `NumberVocabulary` data model, and `RomanceNumberExtractor` class centralizing is_ordinal/is_fractional, extract_number, numbers_to_digits, pronounce_* logic; public API surface expanded.
Tests — Asturian `tests/test_number_parser_ast.py`	New tests for AST vocab, pronunciations, ordinals, fractions, extraction, numbers-to-digits, and integration (some tests skipped/TODO).
Tests — Portuguese `tests/test_number_parser_pt.py`	Tests migrated to public vocab-based interfaces (`PT_PT`, `PT_BR`, `.vocab.*`) and extractor methods instead of private internals; updated expectations for variant and decimal behavior.
Tests — Mirandese `tests/test_number_parser_mwl.py`	New comprehensive tests covering MWL vocab, pronunciations, fractions, ordinals, extraction, digits, and integration (some skipped).
Tests — Galician `tests/test_number_parser_gl.py`	New comprehensive tests for GL pronunciation, fractions, extraction, digits, decimals, ordinals, and long-scale behavior.
Tests — Utilities `tests/test_util.py`	Adds/updates tests for `word_tokenize` exposure and tokenization behaviors (punctuation, hyphens, whitespace, edge cases).

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant Package as ovos_number_parser.__init__
    participant Extractor as RomanceNumberExtractor (AST / PT_PT / PT_BR / MWL / GL)
    participant Vocab as NumberVocabulary

    Caller->>Package: call public API (e.g., pronounce_number(lang, ...))
    Package->>Extractor: select extractor by lang/variant
    Extractor->>Vocab: consult vocabulary (numbers, ordinals, fractions, scales, gender rules)
    Vocab-->>Extractor: provide mappings & join/decimal rules
    Extractor->>Extractor: run parsing/pronunciation algorithms (tokenize, extract, pronounce)
    Extractor-->>Package: return computed result
    Package-->>Caller: deliver final result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas needing extra attention:
- ovos_number_parser/util.py — NumberVocabulary, RomanceNumberExtractor logic, tokenization changes, immutability enforcement.
- ovos_number_parser/numbers_ast.py — language rules, gender/plural/ordinal correctness.
- ovos_number_parser/numbers_pt.py, numbers_mwl.py, numbers_gl.py — variant routing, deprecation wrappers, compatibility behavior.
- ovos_number_parser/init.py — dispatch logic and updated function signatures (scale/short_scale handling).
- Updated tests — ensure test expectations align with refactored public APIs and skipped/TODO markings.

"I hopped through words and rules today,
Asturian counts now find their way —
Fractions and ordinals softly sung,
Digits and genders neatly strung,
A rabbit's cheer: hop, parse, hooray! 🥕"

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 62.45% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main changes: adding Asturian language support and introducing a shared RomanceNumberExtractor class architecture used across multiple language implementations.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch ast

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

ovos_number_parser/__init__.py (1)
86-121: Fix pronounce_number dispatch for Asturian to honor ordinals, scale, digit style, and gender.

The AST branch currently calls:
if lang.startswith("ast"):
    return pronounce_number_ast(number, places)
This ignores:

short_scale (and the derived scale enum),

ordinals,

digits,

gender.

As a result, pronounce_number(..., lang="ast", ordinals=True, digits=DigitPronunciation.DIGIT_BY_DIGIT, gender=GrammaticalGender.FEMININE, short_scale=False) will silently behave like a simple cardinal with LONG scale and masculine gender.

pronounce_number_ast already supports scale, ordinals, digits, and gender, so we should thread these through:
@@ def pronounce_number(...):
-    if lang.startswith("ca"):
-        return pronounce_number_ca(number, places)
-    if lang.startswith("ast"):
-        return pronounce_number_ast(number, places)
+    if lang.startswith("ca"):
+        return pronounce_number_ca(number, places)
+    if lang.startswith("ast"):
+        return pronounce_number_ast(
+            number,
+            places,
+            scale=scale,
+            ordinals=ordinals,
+            digits=digits,
+            gender=gender,
+        )
This brings the AST branch in line with PT/MWL behavior and the documented API semantics.

🧹 Nitpick comments (3)

ovos_number_parser/numbers_ast.py (2)
123-136: Simplify _swap_gender_ast feminine branch; the endswith("imu") check is redundant.

Any word ending with "imu" also ends with "u", so the second condition is unreachable. You can drop it for clarity without changing behavior.
 def _swap_gender_ast(word: str, gender: GrammaticalGender) -> str:
@@
-    if gender == GrammaticalGender.FEMININE:
-        if word.endswith('u'):
-            return word[:-1] + 'a'
-        if word.endswith('imu'):
-            return word[:-1] + 'a'
+    if gender == GrammaticalGender.FEMININE:
+        if word.endswith('u'):
+            return word[:-1] + 'a'
316-334: Address Ruff B007: rename unused loop variable sv in is_ordinal_ast.

The loop over _ORDINAL_SCALES_AST[Scale.SHORT] doesn’t use the numeric scale value, only the name. Renaming the unused variable to _ makes that intent explicit and silences B007.
-    for sv, sname in _ORDINAL_SCALES_AST[Scale.SHORT]:
-        if tok == sname or tok == _swap_gender_ast(sname, GrammaticalGender.FEMININE):
+    for _, sname in _ORDINAL_SCALES_AST[Scale.SHORT]:
+        if tok == sname or tok == _swap_gender_ast(sname, GrammaticalGender.FEMININE):
             return True
ovos_number_parser/__init__.py (1)
38-56: Propagate scale into numbers_to_digits_ast for consistency and future-proofing.

numbers_to_digits accepts a scale argument, but the Asturian branch currently ignores it and always uses the default LONG scale, even though numbers_to_digits_ast exposes a scale parameter.

While SHORT and LONG scales share the same definitions today, passing the caller’s scale keeps behavior consistent with PT/MWL and avoids surprises if Asturian scales diverge later.
     if lang.startswith("az"):
         return numbers_to_digits_az(utterance)
     if lang.startswith("ast"):
-        return numbers_to_digits_ast(utterance)
+        return numbers_to_digits_ast(utterance, scale=scale)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e450831 and e89e084.

📒 Files selected for processing (3)

README.md (1 hunks)
ovos_number_parser/__init__.py (8 hunks)
ovos_number_parser/numbers_ast.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

ovos_number_parser/numbers_ast.py (1)

ovos_number_parser/util.py (4)

Scale (12-19)

GrammaticalGender (22-28)

DigitPronunciation (31-33)

text (62-66)

🪛 Ruff (0.14.7)

ovos_number_parser/numbers_ast.py

150-150: Avoid specifying long messages outside the exception class

(TRY003)

189-189: Avoid specifying long messages outside the exception class

(TRY003)

230-230: Avoid specifying long messages outside the exception class

(TRY003)

331-331: Loop control variable sv not used within loop body

Rename unused sv to _sv

(B007)

480-480: Avoid specifying long messages outside the exception class

(TRY003)

517-517: Loop control variable s_name not used within loop body

(B007)

517-517: Loop control variable p_name not used within loop body

(B007)

🔇 Additional comments (6)

README.md (1)

41-41: Asturian support row is consistent with implementation.

The ast entry correctly advertises pronounce number/ordinal, extraction, and numbers_to_digits support, matching the new numbers_ast module and wiring.
ovos_number_parser/__init__.py (5)
15-16: Asturian module import wiring looks correct.

The numbers_ast import exposes all the expected entry points (numbers_to_digits, pronounce number/ordinal, extract, fraction/ordinal checks) and aligns with how other language modules are wired here.

172-195: Asturian fraction pronunciation dispatch is correctly wired.

The pronounce_fraction function cleanly adds an ast branch that delegates to pronounce_fraction_ast, mirroring the existing PT and MWL handling and respecting the scale argument.

No issues here.

197-223: Asturian ordinal pronunciation dispatch is correctly integrated.

The pronounce_ordinal branch for lang.startswith("ast") calls pronounce_ordinal_ast(number, scale=scale, gender=gender), correctly passing both scale and gender in line with PT/MWL behavior.

Once the ordinal hundreds issue in numbers_ast.py is addressed, this wiring will behave as expected.

242-295: Asturian extract_number wiring matches the new implementation.

The extract_number dispatch:
if lang.startswith("ast"):
    return extract_number_ast(text, scale=scale, ordinals=ordinals)
correctly passes both scale and ordinals by keyword to extract_number_ast(text, ordinals=False, scale=Scale.LONG). After fixing the aggregation/no-number behavior in numbers_ast.extract_number_ast, this integration should be solid.

305-353: Asturian is_fractional and is_ordinal dispatches are consistent with other languages.

is_fractional for ast simply calls is_fractional_ast(input_str), which is appropriate since the Asturian implementation doesn’t use scale.

is_ordinal for ast calls is_ordinal_ast(input_str) analogously to PT/MWL.

These branches correctly extend the public API without affecting existing language behavior.

ovos_number_parser/numbers_ast.py

coderabbitai

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

ovos_number_parser/numbers_mwl.py (1)

237-356: Fix docstrings: references to "Portuguese" should be "Mirandese".

All wrapper function docstrings incorrectly reference "Portuguese" (and "Brazilian Portuguese" in some cases) instead of "Mirandese". This appears to be a copy-paste oversight.

Examples to fix:

Line 243: "Portuguese" → "Mirandese"

Line 251: "Portuguese" → "Mirandese"

Line 263: "Portuguese" → "Mirandese"

Line 273, 276: "Portuguese" → "Mirandese"

Line 287: "Portuguese" → "Mirandese"

Line 309: Remove "Brazilian and European Portuguese variants" reference

Lines 329, 331, 334: "Portuguese" → "Mirandese"

Lines 346, 348, 354: "Portuguese" → "Mirandese"

♻️ Duplicate comments (1)

ovos_number_parser/numbers_ast.py (1)

163-166: Ordinals for 200–900 are incomplete.

The ORDINAL_HUNDREDS only defines 100: 'centésimu', leaving ordinals like 200th–900th unsupported. This was previously flagged and the TODO comment acknowledges the gap.

🧹 Nitpick comments (6)

ovos_number_parser/util.py (1)
586-598: Minor: Rename unused loop variable.

The loop variable i on line 588 is unused. Consider using _ to indicate an intentionally unused variable.
-                    for i in range(n_zeros):
+                    for _ in range(n_zeros):
                         decimal_pronunciation_parts.append(self.vocab.UNITS[0])
tests/test_number_parser_ast.py (2)
28-32: Use hyphen-minus instead of en-dash in comment.

Line 29 uses an en-dash (–) which static analysis flags. Use a regular hyphen-minus (-) for consistency.
-        # 21–29 must exist
+        # 21-29 must exist
87-95: Multiple skipped tests indicate incomplete AST functionality.

Several tests are skipped with @unittest.skip("TODO - fix me"):

test_fraction_variants (line 87)

test_compound_fraction_phrase (line 92)

test_fraction_phrase (line 129)

test_negative (line 133)

test_large_numbers (line 201)

test_negative in integration (line 212)

These represent known gaps in the Asturian implementation.

Would you like me to open issues to track these incomplete features, or would you prefer to address them in this PR?

Also applies to: 129-136, 201-216
ovos_number_parser/numbers_pt.py (1)

330-340: is_fractional_pt and is_ordinal_pt do not accept a variant parameter.

These functions always delegate to PT_PT, while other public APIs accept a variant parameter. This is likely intentional since the vocabulary's ALT_SPELLINGS handles cross-variant recognition, but it creates an API inconsistency.

Consider documenting this design choice or adding a variant parameter for API uniformity.

ovos_number_parser/numbers_ast.py (2)

167-176: Ordinal scale coverage is limited.

The ordinal scales only define up to 10^6, with TODO comments indicating the need for larger scales. Consider completing these for full feature parity with other languages, or document the supported range limitations.

Would you like me to generate the missing ordinal scale entries for Asturian?

107-119: Fraction vocabulary has limited coverage compared to Portuguese.

Only fractions 2–12 are defined, while Portuguese includes entries up to 1000. Additionally, entries 11 and 12 already include the "avos" suffix, which may cause duplication issues similar to Portuguese if DENOMINATOR_PARTICLE is appended for plurals.

Consider expanding coverage or documenting the supported range.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e89e084 and ec9e39d.

📒 Files selected for processing (7)

ovos_number_parser/numbers_ast.py (1 hunks)
ovos_number_parser/numbers_mwl.py (5 hunks)
ovos_number_parser/numbers_pt.py (15 hunks)
ovos_number_parser/util.py (10 hunks)
tests/test_number_parser_ast.py (1 hunks)
tests/test_number_parser_pt.py (6 hunks)
tests/test_util.py (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (6)

tests/test_util.py (1)

ovos_number_parser/util.py (1)

word_tokenize (115-135)

ovos_number_parser/numbers_ast.py (2)

ovos_number_parser/util.py (11)

Scale (38-45)

GrammaticalGender (48-54)

DigitPronunciation (57-59)

pronounce_ordinal (694-752)

is_fractional (380-395)

is_ordinal (369-378)

text (87-91)

extract_number (397-475)

pronounce_number (532-656)

numbers_to_digits (477-530)

pronounce_fraction (658-692)

ovos_number_parser/__init__.py (7)

pronounce_ordinal (197-239)

is_fractional (305-359)

is_ordinal (362-386)

extract_number (242-302)

pronounce_number (86-169)

numbers_to_digits (38-83)

pronounce_fraction (172-194)

tests/test_number_parser_ast.py (2)

ovos_number_parser/util.py (3)

Scale (38-45)

GrammaticalGender (48-54)

text (87-91)

ovos_number_parser/numbers_ast.py (6)

pronounce_number_ast (246-267)

pronounce_ordinal_ast (183-202)

is_fractional_ast (205-214)

pronounce_fraction_ast (290-302)

extract_number_ast (227-243)

numbers_to_digits_ast (270-286)

ovos_number_parser/numbers_pt.py (2)

ovos_number_parser/util.py (13)

Scale (38-45)

GrammaticalGender (48-54)

DigitPronunciation (57-59)

NumberVocabulary (264-360)

RomanceNumberExtractor (363-854)

pronounce_ordinal (694-752)

is_fractional (380-395)

is_ordinal (369-378)

extract_number (397-475)

text (87-91)

pronounce_number (532-656)

numbers_to_digits (477-530)

pronounce_fraction (658-692)

ovos_number_parser/__init__.py (7)

pronounce_ordinal (197-239)

is_fractional (305-359)

is_ordinal (362-386)

extract_number (242-302)

pronounce_number (86-169)

numbers_to_digits (38-83)

pronounce_fraction (172-194)

ovos_number_parser/util.py (1)

ovos_number_parser/__init__.py (6)

is_ordinal (362-386)

is_fractional (305-359)

extract_number (242-302)

numbers_to_digits (38-83)

pronounce_number (86-169)

pronounce_ordinal (197-239)

ovos_number_parser/numbers_mwl.py (2)

ovos_number_parser/util.py (14)

Scale (38-45)

GrammaticalGender (48-54)

DigitPronunciation (57-59)

word_tokenize (115-135)

NumberVocabulary (264-360)

RomanceNumberExtractor (363-854)

pronounce_ordinal (694-752)

is_fractional (380-395)

is_ordinal (369-378)

text (87-91)

extract_number (397-475)

pronounce_number (532-656)

numbers_to_digits (477-530)

pronounce_fraction (658-692)

ovos_number_parser/__init__.py (7)

pronounce_ordinal (197-239)

is_fractional (305-359)

is_ordinal (362-386)

extract_number (242-302)

pronounce_number (86-169)

numbers_to_digits (38-83)

pronounce_fraction (172-194)

🪛 Ruff (0.14.7)

tests/test_number_parser_ast.py

29-29: Comment contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF003)

ovos_number_parser/util.py

231-231: Do not perform function call range in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

304-304: Unused lambda argument: gender

(ARG005)

554-554: Avoid specifying long messages outside the exception class

(TRY003)

588-588: Loop control variable i not used within loop body

Rename unused i to _i

(B007)

714-714: Avoid specifying long messages outside the exception class

(TRY003)

756-756: Unused method argument: gender

(ARG002)

771-771: Avoid specifying long messages outside the exception class

(TRY003)

822-822: Avoid specifying long messages outside the exception class

(TRY003)

ovos_number_parser/numbers_mwl.py

6-6: Redefinition of unused Scale from line 3

Remove definition: Scale

(F811)

6-6: Redefinition of unused GrammaticalGender from line 3

Remove definition: GrammaticalGender

(F811)

6-6: Redefinition of unused DigitPronunciation from line 3

Remove definition: DigitPronunciation

(F811)

8-8: Redefinition of unused Union from line 1

Remove definition: Union

(F811)

🔇 Additional comments (22)

ovos_number_parser/numbers_mwl.py (2)

11-43: Verify Mirandese linguistic rules.

The TODO comments at lines 21-22 and 38 indicate uncertainty about gender and plural transformations. Consider consulting Mirandese language documentation to confirm these rules are accurate.

46-231: LGTM!

The NumberVocabulary configuration for Mirandese is comprehensive and well-structured. The vocabulary covers units, tens, hundreds, fractions, ordinals, and scale mappings appropriately.

ovos_number_parser/util.py (5)

7-35: LGTM!

The Token dataclass properly implements tuple-like access via __iter__ and __getitem__, maintaining backward compatibility. The immutability enforcement via __setattr__ is correctly implemented.

62-112: LGTM!

The ReplaceableNumber dataclass correctly implements properties for accessing token indices and text, with proper immutability enforcement matching the Token pattern.

231-260: LGTM!

The static analysis warning (B008) about range(1, 21) as a default argument is a false positive. In Python 3, range() returns an immutable sequence object, so it's safe as a default parameter value.

263-360: LGTM!

The NumberVocabulary dataclass is well-designed with sensible defaults. The unused gender parameter in the default swap_gender lambda (line 304) is intentional - it provides a no-op default that languages can override with gender-aware implementations.

658-692: LGTM!

The fraction pronunciation logic correctly handles:

Division by zero with a special message

Known fraction words from the vocabulary with pluralization

Unknown denominators using the denominator particle pattern

tests/test_util.py (2)

6-6: LGTM!

The word_tokenize function is correctly added to the imports, aligning with its new public API exposure.

827-878: LGTM!

Comprehensive test coverage for word_tokenize including:

Basic tokenization

Special character handling (%, #)

Hyphen behavior between words vs. in numbers

Edge cases (empty string, whitespace, trailing hyphens)

Complex mixed inputs

The tests properly exercise the documented behavior from the implementation.

tests/test_number_parser_ast.py (1)

1-219: Overall: Good test structure for new AST language support.

The test file provides solid coverage for the Asturian number parser including vocabulary validation, pronunciation, extraction, ordinals, and number-to-digit conversion. The skipped tests clearly document known limitations.

tests/test_number_parser_pt.py (5)

10-10: LGTM!

The PT_PT and PT_BR wrapper imports correctly expose the vocabulary-based Portuguese variant implementations.

34-86: LGTM!

Dictionary tests properly updated to use the new vocabulary-based API (PT_PT.vocab, PT_BR.vocab) instead of private dictionaries. This aligns with the refactored implementation and provides better encapsulation.

350-375: Excellent test additions for decimal handling.

These new tests for test_significant_digits and test_leading_zeros provide valuable coverage for edge cases in decimal pronunciation:

Rounding at various decimal places

Leading zero preservation

Complex cases like 10.00000056

This addresses potential regressions in the refactored decimal handling logic.

377-386: Good use of skip with documentation.

The skipped test at line 377 includes a clear TODO comment asking whether the behavior should be implemented. This is a good practice for documenting design decisions that need discussion.

491-495: LGTM!

Good addition of zero-division test case to verify the DIVIDED_BY_ZERO vocabulary entry is used correctly.

ovos_number_parser/numbers_pt.py (5)

37-38: Verify the 'ma' → 'm' masculine conversion logic.

The rule that converts words ending in 'ma' to remove the final 'a' for masculine may not be appropriate for Portuguese. Words ending in -ma (like "problema", "tema") are typically invariant and don't follow this pattern. This could produce incorrect results for certain inputs.

Please verify this transformation is intentional and add a comment explaining which word forms it targets.

50-55: Simplified pluralization is acceptable for number vocabulary.

The pluralization logic correctly handles the common -ão → -ões pattern and the default -s suffix, which covers the number-related vocabulary. More complex Portuguese pluralization rules (e.g., -l → -is) are not needed for the current number word set.

229-292: Good vocabulary reuse for Brazilian Portuguese variant.

The _PT_BR vocabulary correctly inherits shared properties from _PT_PT while overriding variant-specific differences (tens spellings, scale names, default scale). The cross-variant ALT_SPELLINGS enables recognition of both regional forms.

437-511: Comprehensive test coverage in __main__ block.

The test block covers pronunciation, extraction, ordinals, fractions, and edge cases including division by zero. This serves as useful development/debugging reference.

132-145: No issue found with fraction pronunciation.

The FRACTION entries for 11–19 (e.g., 'onze avos') are intentionally pre-suffixed with "avos" and already end in "s". The pluralize_pt function checks if a word ends in "s" and returns it unchanged if it does, so no duplication occurs. When pronounce_fraction pluralizes these entries for numerators other than 1, it correctly returns the same string without appending additional particles. Additionally, DENOMINATOR_PARTICLE is only appended to denominators not in the FRACTION dictionary (line 30 in util.py), so there is no risk of duplication with pre-suffixed fraction names.

ovos_number_parser/numbers_ast.py (2)

306-329: Test block provides good development coverage.

The __main__ block covers cardinal pronunciation, ordinals, extraction, and digit conversion. Since the implementation now delegates to RomanceNumberExtractor, verify that the shared extractor correctly handles cases like 'dos millones trescientos' (which should extract as 2000300).

78-87: No action needed. The number 20 is correctly mapped to 'venti' in Asturian. The _pronounce_up_to_999 method (line 774–775) explicitly checks if the number exists in the TENS dictionary and returns it directly, so pronounce_number_ast(20) correctly returns 'venti', which is the standard Asturian spelling for the standalone number 20.

ovos_number_parser/numbers_ast.py

ovos_number_parser/numbers_mwl.py

ovos_number_parser/util.py

tests/test_number_parser_ast.py

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (4)

tests/test_number_parser_ast.py (1)
88-95: Fragile test assertion always passes.

The assertion on line 94 will always pass because it constructs a string that's guaranteed to contain "cuartus". This doesn't actually verify the expected output.

Apply this fix:
-        self.assertIn("cuartus", result2 if "cuartus" in result2 else "cuartus?")
+        self.assertIn("cuartus", result2)
ovos_number_parser/numbers_ast.py (3)
9-17: Unreachable condition: 'imu' check is redundant.

Line 12 checks for words ending in 'imu', but this is unreachable because line 10 already catches all words ending in 'u' (which includes 'imu').
 if gender == GrammaticalGender.FEMININE:
     if word.endswith('u'):
         return word[:-1] + 'a'
-    if word.endswith('imu'):
-        return word[:-1] + 'a'
20-28: Pluralization logic produces incorrect results.

The TODO comment indicates uncertainty, and the logic is indeed problematic:

Lines 24-25: Words not ending in 'u' have their last character replaced with 'os' (e.g., "mil" → "mis")

Lines 26-27: Only adds 's' if word ends in 'u' and doesn't already end in 's'

Asturian pluralization typically follows patterns like -u → -os, -ón → -ones.
 def pluralize_ast(word: str):
-    # TODO - is this accurate?
     if word.endswith("ón"):
         return word[:-2] + "ones"
-    if not word.endswith("u"):
-        return word[:-1] + "os"
-    if not word.endswith("s"):
+    if word.endswith("u"):
+        return word[:-1] + "os"
+    if not word.endswith("s"):
         return word + "s"
     return word
161-164: Ordinals 200-999 are incomplete.

ORDINAL_HUNDREDS only defines 100, missing entries for 200-900. This causes ordinals like 256 to drop the hundreds component and pronounce only the last two digits.

Consider either:

Adding explicit mappings for 200, 300, ..., 900

Adding a fallback strategy to construct ordinals from cardinals

Explicitly documenting the supported ordinal range

🧹 Nitpick comments (2)

tests/test_number_parser_ast.py (1)
21-21: Replace EN DASH with HYPHEN-MINUS in comment.

The comment contains an EN DASH (–) character instead of a standard HYPHEN-MINUS (-). While this doesn't affect functionality, it's better to use standard ASCII characters in code for consistency.
-        # 21–29 must exist
+        # 21-29 must exist
ovos_number_parser/numbers_mwl.py (1)
387-390: Remove unnecessary f-string prefixes.

Lines 387-390 use f-string syntax but contain no placeholders. The f prefix should be removed for clarity.
-    print(f"'bint'i un' ->", MWL.extract_number("bint'i un"))
-    print(f"'bint'i ũa' ->", MWL.extract_number("bint'i ũa"))
-    print(f"'bint'i dous' ->", MWL.extract_number("bint'i dous"))
-    print(f"'bint'i dues' ->", MWL.extract_number("bint'i dues"))
+    print("'bint'i un' ->", MWL.extract_number("bint'i un"))
+    print("'bint'i ũa' ->", MWL.extract_number("bint'i ũa"))
+    print("'bint'i dous' ->", MWL.extract_number("bint'i dous"))
+    print("'bint'i dues' ->", MWL.extract_number("bint'i dues"))

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ec9e39d and 11a6e54.

📒 Files selected for processing (6)

ovos_number_parser/__init__.py (10 hunks)
ovos_number_parser/numbers_ast.py (1 hunks)
ovos_number_parser/numbers_mwl.py (2 hunks)
ovos_number_parser/numbers_pt.py (12 hunks)
tests/test_number_parser_ast.py (1 hunks)
tests/test_number_parser_pt.py (10 hunks)

🧰 Additional context used

🪛 Ruff (0.14.7)

ovos_number_parser/numbers_mwl.py

387-387: f-string without any placeholders

Remove extraneous f prefix

(F541)

388-388: f-string without any placeholders

Remove extraneous f prefix

(F541)

389-389: f-string without any placeholders

Remove extraneous f prefix

(F541)

390-390: f-string without any placeholders

Remove extraneous f prefix

(F541)

ovos_number_parser/__init__.py

53-53: Undefined name numbers_to_digits_ast

(F821)

tests/test_number_parser_ast.py

21-21: Comment contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF003)

🔇 Additional comments (3)

ovos_number_parser/numbers_pt.py (1)

58-292: LGTM: Clean refactoring to vocabulary-driven approach.

The refactoring from hardcoded dictionaries to NumberVocabulary and RomanceNumberExtractor is well-executed. The clear separation between private vocabularies (_PT_PT, _PT_BR) and public extractors (PT_PT, PT_BR) follows good design patterns and maintains backward compatibility through deprecated wrappers.

tests/test_number_parser_pt.py (1)

1-616: LGTM: Comprehensive test coverage for new API.

The test suite has been thoroughly updated to exercise the public PT_PT and PT_BR interfaces, with excellent coverage of edge cases including decimal handling, leading zeros, fractions, and integration scenarios. The migration from internal structures to public vocab attributes is complete and consistent.

ovos_number_parser/numbers_mwl.py (1)

17-18: TODO comments indicate uncertain gender/plural logic.

The TODO comments on lines 17 and 34 indicate uncertainty about the correctness of gender swapping and pluralization rules for Mirandese. Consider verifying these rules against Mirandese grammar references or consulting with a native speaker to ensure correctness.

If you need assistance researching Mirandese grammar rules for gender swapping and pluralization, I can help generate a verification approach.

Also applies to: 34-34

ovos_number_parser/__init__.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

ovos_number_parser/util.py (1)
771-820: Unused gender parameter in _pronounce_up_to_999.

As noted in a past review, the gender parameter is declared but not used. For Romance languages, cardinal numbers like 1 and 2 have gendered forms. This appears to be intentional deferral of gender handling for cardinals.

Consider adding a TODO comment if gender support for cardinals is planned, or document why it's not needed here:
     def _pronounce_up_to_999(self,
                              n: int,
                              gender: GrammaticalGender = GrammaticalGender.MASCULINE
                              ) -> str:
+        # Note: gender parameter reserved for future use with gendered cardinal forms (um/uma, dois/duas)

🧹 Nitpick comments (7)

tests/test_number_parser_ast.py (1)
20-23: Minor: EN DASH in comment.

The comment on line 21 uses an EN DASH (–) instead of a standard hyphen-minus (-). This is a cosmetic issue but could cause confusion.
     def test_composite_20s(self):
-        # 21–29 must exist
+        # 21-29 must exist
         for i in range(21, 30):
             self.assertIn(i, AST.vocab.TENS)
ovos_number_parser/__init__.py (2)
180-186: Double space in line 182.

Minor formatting issue with double space before PT_PT.pronounce_fraction.
     if lang.startswith("pt"):
         return PT_BR.pronounce_fraction(fraction_word, scale=scale) if "br" in lang.lower() \
-            else  PT_PT.pronounce_fraction(fraction_word, scale=scale)
+            else PT_PT.pronounce_fraction(fraction_word, scale=scale)
210-216: Double space in line 212.

Same formatting issue with double space.
     if lang.startswith("pt"):
         return PT_BR.pronounce_ordinal(number, scale=scale, gender=gender) if "br" in lang.lower() \
-            else  PT_PT.pronounce_ordinal(number, scale=scale, gender=gender)
+            else PT_PT.pronounce_ordinal(number, scale=scale, gender=gender)
ovos_number_parser/util.py (4)
62-113: Consider using AttributeError for consistency.

ReplaceableNumber.__setattr__ raises a generic Exception while Token.__setattr__ raises AttributeError. Both should use the same exception type for consistency.
         try:
             getattr(self, key)
         except AttributeError:
             super().__setattr__(key, value)
         else:
-            raise Exception("Immutable!")
+            raise AttributeError("Immutable!")
231-260: Avoid mutable default argument with range().

Using range(1, 21) as a default argument is evaluated once at function definition time. While range objects are immutable so this won't cause mutation bugs, it's cleaner to use None and set the default inside the function.
-def convert_to_mixed_fraction(number: Union[int, float], denominators=range(1, 21)) -> Optional[Tuple[int, int, int]]:
+def convert_to_mixed_fraction(number: Union[int, float], denominators=None) -> Optional[Tuple[int, int, int]]:
     """
     Convert floats to components of a mixed fraction representation
     ...
     """
     int_number = int(number)
     if int_number == number:
         return int_number, 0, 1  # whole number, no fraction

     frac_number = abs(number - int_number)
     if not denominators:
         denominators = range(1, 21)
304-306: Unused gender parameter in default lambda.

The default swap_gender lambda accepts a gender parameter but doesn't use it. While this is intentional (as the default behavior is identity), consider adding a comment or using _ prefix.
-    swap_gender: Callable[[str, GrammaticalGender], str] = lambda word, gender: word
+    swap_gender: Callable[[str, GrammaticalGender], str] = lambda word, _gender: word
602-615: Rename unused loop variable i to _i.

The loop variable i is unused in the loop body.
-                    for i in range(n_zeros):
+                    for _ in range(n_zeros):
                         decimal_pronunciation_parts.append(self.vocab.UNITS[0])

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 11a6e54 and 2714095.

📒 Files selected for processing (4)

ovos_number_parser/__init__.py (11 hunks)
ovos_number_parser/util.py (10 hunks)
tests/test_number_parser_ast.py (1 hunks)
tests/test_number_parser_pt.py (10 hunks)

🧰 Additional context used

🧬 Code graph analysis (3)

tests/test_number_parser_ast.py (2)

ovos_number_parser/util.py (8)

GrammaticalGender (48-54)

pronounce_number (549-673)

is_fractional (380-395)

pronounce_fraction (675-709)

extract_number (397-492)

pronounce_ordinal (711-769)

numbers_to_digits (494-547)

text (87-91)

ovos_number_parser/__init__.py (6)

pronounce_number (83-163)

is_fractional (299-353)

pronounce_fraction (166-188)

extract_number (236-296)

pronounce_ordinal (191-233)

numbers_to_digits (35-80)

ovos_number_parser/__init__.py (2)

ovos_number_parser/numbers_pt.py (1)

PortugueseVariant (9-16)

ovos_number_parser/util.py (6)

numbers_to_digits (494-547)

pronounce_number (549-673)

pronounce_fraction (675-709)

pronounce_ordinal (711-769)

extract_number (397-492)

text (87-91)

ovos_number_parser/util.py (1)

ovos_number_parser/__init__.py (7)

is_ordinal (356-380)

is_fractional (299-353)

extract_number (236-296)

numbers_to_digits (35-80)

pronounce_number (83-163)

pronounce_ordinal (191-233)

pronounce_fraction (166-188)

🪛 Ruff (0.14.7)

tests/test_number_parser_ast.py

21-21: Comment contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF003)

tests/test_number_parser_pt.py

611-611: Do not catch blind exception: Exception

(BLE001)

ovos_number_parser/util.py

231-231: Do not perform function call range in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

304-304: Unused lambda argument: gender

(ARG005)

571-571: Avoid specifying long messages outside the exception class

(TRY003)

605-605: Loop control variable i not used within loop body

Rename unused i to _i

(B007)

731-731: Avoid specifying long messages outside the exception class

(TRY003)

773-773: Unused method argument: gender

(ARG002)

788-788: Avoid specifying long messages outside the exception class

(TRY003)

839-839: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (32)

tests/test_number_parser_ast.py (8)

1-5: LGTM!

The imports are clean and appropriate for the test module.

10-33: LGTM! Comprehensive vocabulary tests.

These tests properly validate the completeness of the AST vocabulary dictionaries for units, tens, hundreds, and fractions.

39-67: LGTM! Good coverage of pronunciation tests.

Tests cover zero, units, teens, composite 20s, tens with units, and hundreds with various configurations.

73-90: LGTM! Fraction tests now properly validate output.

The previous fragile assertion issue has been addressed. The tests now correctly verify both singular and plural fraction forms.

96-126: LGTM! Extraction tests cover key scenarios.

Tests include simple numbers, hundreds, thousands, millions, negatives, and the no-number case. The skipped fraction phrase test is appropriately marked with a TODO.

132-149: LGTM! Ordinal tests cover basic through composed cases.

Good coverage of basic ordinals with gender variants, teens, tens, composed ordinals, and hundredths.

155-172: LGTM! Numbers-to-digits tests are well structured.

Tests cover simple conversions, phrase context, mixed text, and the no-number fallback case.

178-202: LGTM! Integration tests provide good round-trip validation.

The round-trip test for multiple number values is valuable. Skipped large number test is appropriately marked. Decimal and negative tests ensure end-to-end correctness.

tests/test_number_parser_pt.py (7)

3-6: LGTM! Clean import migration to variant-based API.

The imports now correctly reference PT_PT and PT_BR instead of internal dictionary mappings.

27-82: LGTM! Dictionary tests properly migrated to vocab-based API.

Tests now access vocabulary through PT_PT.vocab.* and PT_BR.vocab.* paths, validating the new public API surface.

345-370: LGTM! Excellent coverage of decimal edge cases.

The new tests for significant digits rounding and leading zeros handling are thorough and validate important pronunciation behavior.

372-381: Good decision to skip uncertain behavior test.

The skipped test for decimal edge cases (1.0, 1.00, 1.000) appropriately documents uncertain requirements with a TODO comment.

486-491: LGTM! Zero division handling test.

Good addition to verify the behavior when dividing by zero in fractions.

603-612: Acceptable use of broad exception catch in test context.

While the static analyzer flags catching Exception, this is appropriate for a robustness test that verifies the function doesn't raise any exception for large numbers.

614-619: LGTM! Negative number round-trip test.

Good integration test validating negative number pronunciation and extraction consistency.

ovos_number_parser/__init__.py (7)

15-15: LGTM! Clean imports for new language extractors.

The imports properly bring in AST, PT_PT, PT_BR, and MWL for the variant-based routing.

Also applies to: 25-26

52-53: LGTM! Correctly addressed previous review comment.

The routing now correctly uses AST.numbers_to_digits(utterance) instead of the previously undefined numbers_to_digits_ast().

117-118: LGTM! AST pronunciation routing is correct.

Properly passes all required parameters to AST.pronounce_number.

143-148: LGTM! PT variant routing for pronounce_number.

Correctly distinguishes between Brazilian and European Portuguese using "br" in lang.lower().

283-289: LGTM! Extract number routing is consistent.

Properly routes to PT_BR or PT_PT based on language variant detection.

341-346: LGTM! Fractional routing added for AST.

Properly adds AST support to is_fractional dispatch.

368-374: LGTM! Ordinal routing added for AST.

Properly adds AST support to is_ordinal dispatch.

ovos_number_parser/util.py (10)

7-35: LGTM! Token dataclass with immutability support.

The Token class properly implements iteration and indexing for backwards compatibility while enforcing immutability through __setattr__.

115-136: LGTM! Clean tokenization function.

The word_tokenize function handles special cases like percentages, hashtags, and hyphenated words appropriately.

263-306: LGTM! Comprehensive vocabulary dataclass.

NumberVocabulary encapsulates all language-specific vocabulary and configuration in a clean, typed structure. The callable fields for swap_gender, pluralize, and singularize provide good extensibility.

308-360: LGTM! Well-structured vocabulary helper methods.

The get_number_strings, get_ordinal_strings, and get_fraction_strings methods properly combine masculine, feminine, and plural forms for comprehensive word recognition.

397-492: LGTM! Robust number extraction logic.

The extract_number method handles cardinals, ordinals, fractions, scales, decimals, and negatives. The tokenization and lookahead logic is sound.

494-547: LGTM! Clean numbers-to-digits implementation.

The span-based approach for identifying and replacing number words is well-designed and handles edge cases like joiners appropriately.

549-673: LGTM! Comprehensive number pronunciation.

The pronounce_number method properly handles negatives, decimals with leading zeros, integer pronunciation, and scale-based large numbers.

675-709: LGTM! Fraction pronunciation with proper pluralization.

Handles known fraction words, pluralization, and unknown denominators with the denominator particle.

711-769: LGTM! Ordinal pronunciation with gender and scale support.

Properly handles zero, negatives, and large scales with correct gender application.

822-870: LGTM! Ordinal pronunciation for 0-999 range.

Properly handles hundreds, tens, and units with gender swapping for all ordinal forms.

coderabbitai · 2025-12-04T05:35:39Z

ovos_number_parser/__init__.py

    if lang.startswith("pt"):
-        return numbers_to_digits_pt(utterance, scale=scale)
+        return PT_PT.numbers_to_digits(utterance, scale=scale)
    if lang.startswith("mwl"):
-        return numbers_to_digits_mwl(utterance, scale=scale)
+        return MWL.numbers_to_digits(utterance, scale=scale)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check all instances of language detection for Portuguese variants rg -n 'def (numbers_to_digits|pronounce_number|.*_to_\w+)' ovos_number_parser/__init__.py | head -20

Repository: OpenVoiceOS/ovos-number-parser

Length of output: 222

🏁 Script executed:

#!/bin/bash # Search for all Portuguese-related dispatch logic rg -n '(PT_PT|"pt"|"br"|lang\.)' ovos_number_parser/__init__.py -B 2 -A 2

Repository: OpenVoiceOS/ovos-number-parser

Length of output: 11500

Fix Portuguese variant handling in numbers_to_digits, is_fractional, and is_ordinal.

The dispatch logic is inconsistent across functions. pronounce_number, pronounce_fraction, pronounce_ordinal, and extract_number all check "br" in lang.lower() to route Brazilian Portuguese separately, but numbers_to_digits (line 72), is_fractional (line 341), and is_ordinal (line 368) always route to PT_PT regardless of whether the language is pt-BR or similar. This means Brazilian Portuguese users receive Portugal-specific behavior for these three functions.

🤖 Prompt for AI Agents

In ovos_number_parser/__init__.py around lines 72 to 75, the Portuguese dispatch always uses PT_PT causing pt-BR variants to get Portugal-specific behavior; update the routing so that if "br" is in lang.lower() it returns PT_BR.numbers_to_digits(utterance, scale=scale) otherwise return PT_PT.numbers_to_digits(...). Apply the same pattern to is_fractional and is_ordinal (check "br" in lang.lower() and route to PT_BR for Brazilian Portuguese, PT_PT otherwise) to make the dispatch consistent with pronounce_number/pronounce_fraction/pronounce_ordinal/extract_number.

coderabbitai · 2025-12-04T13:31:56Z

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #39

@JarbasAl

Docstrings generation was requested by @JarbasAl. * #38 (comment) The following files were modified: * `ovos_number_parser/__init__.py` * `ovos_number_parser/numbers_ast.py` * `ovos_number_parser/numbers_mwl.py` * `ovos_number_parser/numbers_pt.py` * `ovos_number_parser/util.py` * `tests/test_number_parser_ast.py` * `tests/test_number_parser_pt.py`

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

ovos_number_parser/numbers_ast.py (1)

38-220: Asturian NumberVocabulary wiring looks solid; consider unifying zero/cero wording.

The _AST configuration hangs together well: units/tens/hundreds cover the expected ranges, fractions and ordinal hundreds are complete, and hooking it up via AST = RomanceNumberExtractor(_AST) matches the pattern used for other Romance languages.

One tiny nit: DIVIDED_BY_ZERO="a dividir por zero" uses "zero", while UNITS[0] is "cero". If "cero" is the canonical Asturian form you want everywhere, you may want to standardize this string; otherwise feel free to leave as-is if the variation is intentional.
tests/test_number_parser_ast.py (3)
21-23: Replace EN DASH in comment with a plain ASCII hyphen to satisfy Ruff.

The comment # 21–29 must exist uses an EN DASH (–), which Ruff flags as ambiguous. Switching it to a standard - (i.e., 21-29) will avoid the warning and keep the source ASCII‑clean.

96-126: Consider adding explicit zero‑extraction coverage alongside “no number” cases.

test_no_number nicely guards the False contract for inputs without numerals, but there’s no explicit test that "cero" (or any zero form) is parsed as 0 rather than False. Given past issues around “no number” behavior, it would be useful to add something like:
def test_zero_extraction(self):
    self.assertEqual(AST.extract_number("cero"), 0)
to lock in the intended semantics for zero.

132-148: Add a couple of ordinals ≥ 200 to regression‑test the hundreds logic.

Since ORDINAL_HUNDREDS was recently completed for 200–900, it would be good to add one or two tests here for larger ordinals (e.g., AST.pronounce_ordinal(256) and AST.pronounce_ordinal(900)) to ensure the hundreds component is preserved correctly and to prevent regressions in that range.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 971d773 and 69f2f64.

📒 Files selected for processing (2)

ovos_number_parser/numbers_ast.py (1 hunks)
tests/test_number_parser_ast.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

ovos_number_parser/numbers_ast.py (2)

ovos_number_parser/util.py (8)

Scale (38-45)

GrammaticalGender (48-54)

NumberVocabulary (264-360)

RomanceNumberExtractor (363-871)

pronounce_number (549-673)

pronounce_ordinal (711-769)

extract_number (397-492)

numbers_to_digits (494-547)

ovos_number_parser/__init__.py (4)

pronounce_number (83-163)

pronounce_ordinal (191-233)

extract_number (236-296)

numbers_to_digits (35-80)

🪛 Ruff (0.14.7)

tests/test_number_parser_ast.py

21-21: Comment contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF003)

🔇 Additional comments (1)

tests/test_number_parser_ast.py (1)

138-140: Clarify expected 12th ordinal form vs vocab entry.

test_teens_ordinal asserts that AST.pronounce_ordinal(12) contains "décimu", but the Asturian vocab currently defines ORDINAL_TENS[12] = "docenu" in ovos_number_parser/numbers_ast.py. Depending on how RomanceNumberExtractor.pronounce_ordinal uses ORDINAL_TENS, the test and vocab may not be aligned.

Please double‑check what the correct Asturian form for “12th” should be and then either:

Update the vocab entry for 12 to match, or

Adjust the test to assert the actual canonical form (e.g., "docenu" or another agreed form).

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

ovos_number_parser/numbers_pt.py (1)
367-395: Avoid positional argument mismatch when delegating to pronounce_ordinal

Here you delegate to RomanceNumberExtractor.pronounce_ordinal using positional arguments:
if variant == PortugueseVariant.PT:
    return PT_PT.pronounce_ordinal(number, gender, scale)
return PT_BR.pronounce_ordinal(number, gender, scale)
Elsewhere (e.g. in ovos_number_parser/__init__.py), pronounce_ordinal is always called with scale= and gender= keywords, which strongly suggests the method signature is (number, scale=..., gender=...). In that case, these calls would swap scale and gender, yielding incorrect behavior for non-default arguments.

Switching to keyword arguments here makes this wrapper robust regardless of the exact parameter order:
-    if variant == PortugueseVariant.PT:
-        return PT_PT.pronounce_ordinal(number, gender, scale)
-    return PT_BR.pronounce_ordinal(number, gender, scale)
+    if variant == PortugueseVariant.PT:
+        return PT_PT.pronounce_ordinal(number, scale=scale, gender=gender)
+    return PT_BR.pronounce_ordinal(number, scale=scale, gender=gender)

🧹 Nitpick comments (4)

ovos_number_parser/numbers_pt.py (4)
19-47: Tighten swap_gender_pt heuristics for ...ma / ...m endings

The current rules can generate malformed forms for numerals:

elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'): will turn "décima" into "décim" instead of "décimo", and will fire before the more general "a" -> "o" rule.

elif gender == GrammaticalGender.FEMININE and word.endswith('m'): works for "um" -> "uma" but will produce incorrect forms like "bom" -> "boma" if ever used outside strict numeral vocabulary.

Given this helper is primarily for numeral words, you can safely narrow the rules and avoid the truncation:
-    elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'):
-        return word[:-1]
-    elif gender == GrammaticalGender.MASCULINE and word.endswith('a'):
+    elif gender == GrammaticalGender.MASCULINE and word.endswith('a'):
         return word[:-1] + 'o'
@@
-    elif gender == GrammaticalGender.FEMININE and word.endswith('m'):
-        return word + 'a'
+    elif gender == GrammaticalGender.FEMININE and word == "um":
+        return "uma"
This keeps the special-cases (dois/duas, um/uma) and generic o/a / os/as swaps, while avoiding truncated stems and over-broad "m" -> "ma" changes.

50-55: Minor typing nit: add explicit return annotation to pluralize_pt

For consistency with swap_gender_pt and the rest of the typed surface, consider annotating the return type:
-def pluralize_pt(word: str):
+def pluralize_pt(word: str) -> str:
Behavior stays the same; this just tightens the public API.

398-497: Deprecation shims are consistent; consider scale default semantics

The deprecated helpers (is_fractional_pt, is_ordinal_pt, extract_number_pt, pronounce_number_pt, numbers_to_digits_pt, pronounce_fraction_pt) all:

Emit DeprecationWarning with stacklevel=2, which is good practice.

Delegate straight through to PT_PT / PT_BR depending on variant.

One nuance to be aware of: these shims default scale to Scale.LONG (where present), which means they override the vocab’s own DEFAULT_SCALE (e.g., Scale.SHORT for _PT_BR). If backward-compat with the legacy API depended on “short vs long” defaults, you may want these wrappers to pass scale only when explicitly provided, letting RomanceNumberExtractor fall back to vocab.DEFAULT_SCALE. If the semantics are intentionally shifting and the functions are deprecated anyway, current behavior is acceptable.

No immediate changes required, but worth double‑checking against historical expectations.

500-574: __main__ tests: BR/PT variant usage for numbers_to_digits_pt

In the “BR” block you still rely on the default variant:
print("\n--- Testing numbers_to_digits_pt (BR) ---")
print(f"'quinhentos e cinquenta' -> '{numbers_to_digits_pt('quinhentos e cinquenta')}'")
...
But numbers_to_digits_pt defaults variant to PortugueseVariant.PT, so these examples are actually exercising the PT vocab, not BR. Since the “PT” block below correctly passes variant=PortugueseVariant.PT, you probably intended the BR block to pass PortugueseVariant.BR explicitly:
-    print(f"'quinhentos e cinquenta' -> '{numbers_to_digits_pt('quinhentos e cinquenta')}'")
-    print(f"'um milhão' -> '{numbers_to_digits_pt('um milhão')}'")
-    print(f"'dezesseis' -> '{numbers_to_digits_pt('dezesseis')}'")
-    print(f"'há duzentos e cinquenta carros' -> '{numbers_to_digits_pt('há duzentos e cinquenta carros')}'")
+    print(f"'quinhentos e cinquenta' -> '{numbers_to_digits_pt('quinhentos e cinquenta', variant=PortugueseVariant.BR)}'")
+    print(f"'um milhão' -> '{numbers_to_digits_pt('um milhão', variant=PortugueseVariant.BR)}'")
+    print(f"'dezesseis' -> '{numbers_to_digits_pt('dezesseis', variant=PortugueseVariant.BR)}'")
+    print(f"'há duzentos e cinquenta carros' -> '{numbers_to_digits_pt('há duzentos e cinquenta carros', variant=PortugueseVariant.BR)}'")
This is only in the manual test harness, but tightening it will make manual checks reflect the intended variant behavior.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 69f2f64 and 4cd17ac.

📒 Files selected for processing (1)

ovos_number_parser/numbers_pt.py (12 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

ovos_number_parser/numbers_pt.py (2)

ovos_number_parser/util.py (12)

Scale (38-45)

GrammaticalGender (48-54)

DigitPronunciation (57-59)

NumberVocabulary (264-360)

RomanceNumberExtractor (363-871)

pronounce_ordinal (711-769)

is_fractional (380-395)

is_ordinal (369-378)

extract_number (397-492)

text (87-91)

pronounce_number (549-673)

pronounce_fraction (675-709)

ovos_number_parser/__init__.py (6)

pronounce_ordinal (191-233)

is_fractional (299-353)

is_ordinal (356-380)

extract_number (236-296)

pronounce_number (83-163)

pronounce_fraction (166-188)

🔇 Additional comments (1)

ovos_number_parser/numbers_pt.py (1)

58-266: Vocabulary configuration and PT/PT‑BR wrappers look consistent

The _PT_PT / _PT_BR NumberVocabulary definitions and the PT_PT / PT_BR RomanceNumberExtractor wrappers are wired together cleanly:

Shared pieces (DIVIDED_BY_ZERO, DENOMINATOR_PARTICLE, HUNDRED_PARTICLE, NO_PREV_UNIT, NO_PLURAL, ordinal maps, etc.) are sensibly reused between variants.

Variant-specific pieces (TENS forms, short/long scale names, DIGIT_SPELLINGS for "meia", cross-variant ALT_SPELLINGS) are localized to the correct vocab.

DEFAULT_SCALE differs appropriately (LONG for PT, SHORT for BR).

No structural or obvious correctness issues stand out in these mappings given the NumberVocabulary / RomanceNumberExtractor contracts.

Also applies to: 268-358, 360-361

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (3)

ovos_number_parser/util.py (1)

777-826: Unused gender parameter may indicate missing functionality.

The gender parameter is declared but never used. For Romance languages, cardinal numbers like 1 ("um"/"uma") and 2 ("dois"/"duas") have gendered forms stored in GENDERED_SPELLINGS. Consider applying gender-specific spellings when the number matches a gendered entry.
ovos_number_parser/__init__.py (2)
73-76: Inconsistent Portuguese variant handling in numbers_to_digits.

Unlike pronounce_number, extract_number, and pronounce_fraction which check "br" in lang.lower() to route Brazilian Portuguese separately, numbers_to_digits always routes to PT_PT. For consistency, apply the same pattern.
     if lang.startswith("pt"):
-        return PT_PT.numbers_to_digits(utterance, scale=scale)
+        return PT_BR.numbers_to_digits(utterance, scale=scale) if "br" in lang.lower() \
+            else PT_PT.numbers_to_digits(utterance, scale=scale)
367-372: Inconsistent Portuguese variant handling in is_fractional and is_ordinal.

Similar to numbers_to_digits, these functions always route Portuguese to PT_PT without checking for Brazilian Portuguese variant. Apply the same pattern used in pronounce_number and extract_number for consistency.
     if lang.startswith("pt"):
-        return PT_PT.is_fractional(input_str)
+        return PT_BR.is_fractional(input_str) if "br" in lang.lower() \
+            else PT_PT.is_fractional(input_str)
And for is_ordinal:
     if lang.startswith("pt"):
-        return PT_PT.is_ordinal(input_str)
+        return PT_BR.is_ordinal(input_str) if "br" in lang.lower() \
+            else PT_PT.is_ordinal(input_str)

🧹 Nitpick comments (5)

ovos_number_parser/util.py (3)
7-35: Consider using frozen=True for dataclass immutability.

The custom __setattr__ implementation provides pseudo-immutability, but Python's dataclass decorator supports this natively with frozen=True. This also provides better IDE support and clearer intent.
-@dataclass
+@dataclass(frozen=True)
 class Token:
     word: str
     index: int

     def __iter__(self):
         yield self.word
         yield self.index

     def __getitem__(self, item):
         if item == 0:
             return self.word
         elif item == 1:
             return self.index
         raise IndexError
-
-    def __setattr__(self, key, value):
-        """
-        Prevent modification of existing attributes, allowing only new attributes to be set.
-
-        Raises:
-            Exception: If attempting to modify an attribute that already exists.
-        """
-        try:
-            getattr(self, key)
-        except AttributeError:
-            super().__setattr__(key, value)
-        else:
-            raise AttributeError("Immutable!")
93-105: Inconsistent exception type for immutability violation.

Token.__setattr__ raises AttributeError("Immutable!") while ReplaceableNumber.__setattr__ raises Exception("Immutable!"). Use consistent exception types, preferably AttributeError which is more semantically correct for attribute access violations.
         else:
-            raise Exception("Immutable!")
+            raise AttributeError("Immutable!")
609-610: Unused loop variable i.

The loop variable i is not used within the loop body. Rename it to _ to indicate it's intentionally unused.
-                    for i in range(n_zeros):
+                    for _ in range(n_zeros):
                         decimal_pronunciation_parts.append(self.vocab.UNITS[0])
ovos_number_parser/numbers_mwl.py (1)
392-395: Remove extraneous f prefix from non-interpolated strings.

These lines use f-string syntax but contain no placeholders. The print statements use comma separation for output, so the f prefix is unnecessary.
-    print(f"'bint'i un' ->", MWL.extract_number("bint'i un"))
-    print(f"'bint'i ũa' ->", MWL.extract_number("bint'i ũa"))
-    print(f"'bint'i dous' ->", MWL.extract_number("bint'i dous"))
-    print(f"'bint'i dues' ->", MWL.extract_number("bint'i dues"))
+    print("'bint'i un' ->", MWL.extract_number("bint'i un"))
+    print("'bint'i ũa' ->", MWL.extract_number("bint'i ũa"))
+    print("'bint'i dous' ->", MWL.extract_number("bint'i dous"))
+    print("'bint'i dues' ->", MWL.extract_number("bint'i dues"))
ovos_number_parser/__init__.py (1)
113-117: Consider emitting DeprecationWarning for short_scale parameter.

The short_scale parameter is marked as deprecated with a TODO to log a warning, but no warning is actually emitted. For consistent deprecation behavior (similar to the MWL/PT wrapper functions), consider adding a warning.
     if short_scale is not None:
-        # TODO log warning
-        pass
+        import warnings
+        warnings.warn(
+            "short_scale is deprecated, use scale=Scale.SHORT or scale=Scale.LONG instead",
+            DeprecationWarning,
+            stacklevel=2,
+        )
+        scale = Scale.SHORT if short_scale else Scale.LONG

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4cd17ac and 7c27d20.

📒 Files selected for processing (4)

ovos_number_parser/__init__.py (17 hunks)
ovos_number_parser/numbers_mwl.py (1 hunks)
ovos_number_parser/numbers_pt.py (9 hunks)
ovos_number_parser/util.py (10 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

ovos_number_parser/numbers_mwl.py (2)

ovos_number_parser/util.py (13)

Scale (38-45)

GrammaticalGender (48-54)

DigitPronunciation (57-59)

NumberVocabulary (264-362)

RomanceNumberExtractor (365-877)

pronounce_ordinal (716-775)

is_fractional (382-397)

is_ordinal (371-380)

text (87-91)

extract_number (399-494)

pronounce_number (552-677)

numbers_to_digits (496-550)

pronounce_fraction (679-714)

ovos_number_parser/__init__.py (7)

pronounce_ordinal (198-245)

is_fractional (318-379)

is_ordinal (382-406)

extract_number (248-315)

pronounce_number (84-170)

numbers_to_digits (36-81)

pronounce_fraction (173-195)

ovos_number_parser/util.py (1)

ovos_number_parser/__init__.py (7)

is_ordinal (382-406)

is_fractional (318-379)

extract_number (248-315)

numbers_to_digits (36-81)

pronounce_number (84-170)

pronounce_ordinal (198-245)

pronounce_fraction (173-195)

🪛 Ruff (0.14.7)

ovos_number_parser/numbers_mwl.py

392-392: f-string without any placeholders

Remove extraneous f prefix

(F541)

393-393: f-string without any placeholders

Remove extraneous f prefix

(F541)

394-394: f-string without any placeholders

Remove extraneous f prefix

(F541)

395-395: f-string without any placeholders

Remove extraneous f prefix

(F541)

ovos_number_parser/util.py

231-231: Do not perform function call range in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

304-304: Unused lambda argument: gender

(ARG005)

575-575: Avoid specifying long messages outside the exception class

(TRY003)

609-609: Loop control variable i not used within loop body

Rename unused i to _i

(B007)

737-737: Avoid specifying long messages outside the exception class

(TRY003)

779-779: Unused method argument: gender

(ARG002)

794-794: Avoid specifying long messages outside the exception class

(TRY003)

845-845: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (12)

ovos_number_parser/util.py (2)

231-261: LGTM with minor note on default argument.

The static analysis flags range(1, 21) in the default argument. While range is immutable (unlike lists), evaluating it at function definition time is unconventional. The existing if not denominators guard handles this correctly. No change required.

263-307: Well-structured vocabulary configuration class.

The NumberVocabulary dataclass provides a clean, declarative way to define language-specific number parsing rules. The static analysis warning about the unused gender parameter in the default swap_gender lambda (line 304) is expected—it's a no-op placeholder that language-specific implementations override.

ovos_number_parser/numbers_mwl.py (3)

17-21: TODO indicates uncertainty about gender transformation rules.

The TODO comment suggests the suffix-based gender transformation rules may need linguistic verification for Mirandese. Consider validating these rules with a Mirandese language reference or native speaker.

42-225: Well-structured Mirandese vocabulary configuration.

The NumberVocabulary definition for Mirandese is comprehensive, covering units, tens, hundreds, fractions, ordinals, and scale words for both short and long scales. The use of swap_gender_mwl and pluralize_mwl for morphological transformations aligns with the vocabulary-driven architecture.

233-340: Good deprecation pattern for backward compatibility.

The deprecated wrapper functions properly emit DeprecationWarning with stacklevel=2 (pointing to the caller) and delegate to the new MWL instance methods. This maintains backward compatibility while guiding users toward the new API.

ovos_number_parser/__init__.py (2)

1-33: Clean import structure for new language support.

The imports are well-organized, bringing in the new AST, MWL, PT_PT, and PT_BR extractors alongside the existing language-specific modules.

53-54: AST (Asturian) language routing follows established patterns.

The new Asturian language support is correctly integrated across all dispatch functions (numbers_to_digits, pronounce_number, pronounce_fraction, pronounce_ordinal, extract_number, is_fractional, is_ordinal), following the same delegation pattern used for MWL and PT variants.

Also applies to: 124-125

ovos_number_parser/numbers_pt.py (5)

19-47: Well-implemented gender transformation for Portuguese.

The function handles the common irregular case (dois/duas) explicitly and applies general suffix rules for other words. The pattern is consistent with the Mirandese implementation.

58-266: Comprehensive European Portuguese vocabulary configuration.

The _PT_PT vocabulary is well-structured with appropriate defaults (DEFAULT_SCALE=Scale.LONG for European convention). The Wikipedia reference in the comment provides good documentation provenance.

268-358: Clean Brazilian Portuguese variant with appropriate overrides.

The _PT_BR vocabulary correctly inherits from _PT_PT and overrides only the differences: spelling variations (dezesseis vs dezasseis), scale word spellings (-lhão vs -lião), and DEFAULT_SCALE=Scale.SHORT (reflecting Brazilian usage). The DIGIT_SPELLINGS entry for "meia" (colloquial for 6) is a nice regional touch.

363-507: Proper deprecation wrappers maintain backward compatibility.

The deprecated functions correctly emit DeprecationWarning with stacklevel=2 and delegate to the appropriate PT_PT or PT_BR extractor based on the variant parameter. This allows existing code to continue working while guiding users toward the new API.

575-584: Good addition of edge case tests.

The test block now includes edge cases like 1/0 and 0/0 for fraction pronunciation, which helps verify the DIVIDED_BY_ZERO handling works correctly.

coderabbitai · 2025-12-04T19:24:06Z

ovos_number_parser/util.py

+                for scale_val, w  in scales_map.items():
+                    if token == w or token == self.vocab.pluralize(w):
+                        current_number = current_number or 0
+                        result = result or 0
+                        if current_number is None:
+                            current_number = 1
+                        result += current_number * scale_val
+                        current_number = None
+                        found_scale = True
+                        number_consumed = True
+                        break


⚠️ Potential issue | 🟡 Minor

Dead code: if current_number is None is unreachable.

On line 463, current_number = current_number or 0 ensures current_number is never None at this point. Therefore, the condition if current_number is None on line 465 can never be True, making current_number = 1 unreachable.

If the intent is to default to 1 when no preceding number is specified (e.g., "million" → 1,000,000), the logic should be:

for scale_val, w in scales_map.items(): if token == w or token == self.vocab.pluralize(w): - current_number = current_number or 0 - result = result or 0 if current_number is None: current_number = 1 + result = result or 0 result += current_number * scale_val

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

for scale_val, w in scales_map.items():

if token == w or token == self.vocab.pluralize(w):

current_number = current_number or 0

result = result or 0

if current_number is None:

current_number = 1

result += current_number * scale_val

current_number = None

found_scale = True

number_consumed = True

break

for scale_val, w in scales_map.items():

if token == w or token == self.vocab.pluralize(w):

if current_number is None:

current_number = 1

result = result or 0

result += current_number * scale_val

current_number = None

found_scale = True

number_consumed = True

break

🤖 Prompt for AI Agents

In ovos_number_parser/util.py around lines 461-471, the code sets current_number = current_number or 0 which makes the subsequent check if current_number is None unreachable; to fix, stop forcing current_number to 0 before the scale check (remove or move the "current_number = current_number or 0" assignment) and instead only ensure result is initialized (result = result or 0), then keep the if current_number is None branch to set current_number = 1 when a scale word appears without a preceding number; ensure number_consumed/found_scale handling remains the same.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

tests/test_number_parser_mwl.py (2)
28-33: Consider removing or enabling the commented-out composite number tests.

The commented-out assertions for composite numbers (21, 35, 99) suggest these were originally intended but not completed. Since composite numbers are formed dynamically by RomanceNumberExtractor rather than stored in TENS, these tests may not be applicable to the vocabulary dictionary. Consider removing the dead code to avoid confusion.

127-135: Clean up commented-out ordinal assertions.

The commented-out tests for décimo (10th ordinal) with the @unittest.skip decorator syntax in comments are confusing. Either enable them with proper @unittest.skip decorators or remove them. The current mixed approach (decorator syntax inside comments) doesn't serve any purpose.
     def test_basic_ordinals_masculine(self):
         self.assertEqual(MWL.pronounce_ordinal(1), "purmerio")
         self.assertEqual(MWL.pronounce_ordinal(2), "segundo")
         self.assertEqual(MWL.pronounce_ordinal(3), "terceiro")
-        # @unittest.skip("TODO - fix me")
-        #self.assertEqual(MWL.pronounce_ordinal(10), "décimo")

     def test_basic_ordinals_feminine(self):
         self.assertEqual(MWL.pronounce_ordinal(1, GrammaticalGender.FEMININE), "purmeria")
         self.assertEqual(MWL.pronounce_ordinal(2, GrammaticalGender.FEMININE), "segunda")
         self.assertEqual(MWL.pronounce_ordinal(3, GrammaticalGender.FEMININE), "terceira")
-        #@unittest.skip("TODO - fix me")
-        #self.assertEqual(MWL.pronounce_ordinal(10, GrammaticalGender.FEMININE), "décima")
ovos_number_parser/numbers_mwl.py (1)
392-395: Remove extraneous f-string prefixes.

These print statements use f-string prefixes but the values are passed via comma rather than interpolation, making the f prefix unnecessary and potentially confusing.
-    print(f"'bint'i un' ->", MWL.extract_number("bint'i un"))
-    print(f"'bint'i ũa' ->", MWL.extract_number("bint'i ũa"))
-    print(f"'bint'i dous' ->", MWL.extract_number("bint'i dous"))
-    print(f"'bint'i dues' ->", MWL.extract_number("bint'i dues"))
+    print("'bint'i un' ->", MWL.extract_number("bint'i un"))
+    print("'bint'i ũa' ->", MWL.extract_number("bint'i ũa"))
+    print("'bint'i dous' ->", MWL.extract_number("bint'i dous"))
+    print("'bint'i dues' ->", MWL.extract_number("bint'i dues"))

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7c27d20 and abb45b3.

📒 Files selected for processing (2)

ovos_number_parser/numbers_mwl.py (1 hunks)
tests/test_number_parser_mwl.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

tests/test_number_parser_mwl.py (1)

ovos_number_parser/util.py (2)

GrammaticalGender (48-54)

text (87-91)

ovos_number_parser/numbers_mwl.py (2)

ovos_number_parser/util.py (13)

Scale (38-45)

GrammaticalGender (48-54)

DigitPronunciation (57-59)

NumberVocabulary (264-362)

RomanceNumberExtractor (365-877)

pronounce_ordinal (716-775)

is_fractional (382-397)

is_ordinal (371-380)

text (87-91)

extract_number (399-494)

pronounce_number (552-677)

numbers_to_digits (496-550)

pronounce_fraction (679-714)

ovos_number_parser/__init__.py (7)

pronounce_ordinal (198-245)

is_fractional (318-379)

is_ordinal (382-406)

extract_number (248-315)

pronounce_number (84-170)

numbers_to_digits (36-81)

pronounce_fraction (173-195)

🪛 Ruff (0.14.7)

ovos_number_parser/numbers_mwl.py

392-392: f-string without any placeholders

Remove extraneous f prefix

(F541)

393-393: f-string without any placeholders

Remove extraneous f prefix

(F541)

394-394: f-string without any placeholders

Remove extraneous f prefix

(F541)

395-395: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (10)

tests/test_number_parser_mwl.py (5)

1-5: LGTM! Clean imports and appropriate test dependencies.

The imports are minimal and correctly target the MWL extractor and GrammaticalGender enum needed for the tests.

53-114: Good test coverage for cardinal and fraction pronunciation.

The tests comprehensively cover basic cardinals, hundreds, thousands, large numbers, decimals, negatives, and fractions. The skipped gendered cardinals test with @unittest.skip("TODO - fix me") appropriately marks known limitations for future work.

154-186: LGTM! Comprehensive extraction tests with good edge case coverage.

The extraction tests appropriately cover cardinals (including gendered variants like ũa/dues), decimals (including negatives), ordinals (both genders), phrase extraction, and failure cases. The assertFalse usage for unrecognized input is correct.

192-209: LGTM! Basic digits conversion tests with appropriate skip markers.

The basic conversion tests cover essential cases. The skipped decimal and negative conversion tests are appropriately marked for future implementation.

226-240: Verify the expected pronunciation for the large number test.

The expected text at line 231 for 1_234_567_899 may have an inconsistency. The pronunciation "mil duzientos i trinta i quatro milhones quinhentos i sessenta i siete mil uitocientos i nobenta i nuobe" appears to describe a number structure that may not match exactly. In long scale, 1,234,567,899 would be interpreted differently than in short scale. Ensure the expected output aligns with the Scale.LONG default of MWL.

ovos_number_parser/numbers_mwl.py (5)

1-4: LGTM! Imports are clean and consolidated.

The duplicate import issue from the previous review has been addressed.

33-39: LGTM! Basic pluralization logic with acknowledged TODO.

The pluralization rules handle -on → -ones and default -s suffixing. The TODO comment appropriately flags uncertainty for future refinement.

42-225: Well-structured vocabulary configuration for Mirandese.

The _MWL vocabulary is comprehensive, covering units, tens, hundreds, fractions, ordinals, and scale-specific number words. The configuration correctly sets DEFAULT_SCALE=Scale.LONG for Mirandese and includes alternative spellings for extraction flexibility.

227-227: LGTM! Clean extractor instantiation.

The MWL object is correctly instantiated as a RomanceNumberExtractor with the _MWL vocabulary configuration.

229-340: LGTM! Well-implemented deprecation wrappers.

The deprecated functions correctly:

Emit DeprecationWarning with stacklevel=2 to point to the caller

Provide clear migration guidance

Fall back to MWL.vocab.DEFAULT_SCALE when scale is None

Delegate to the new MWL.* methods

coderabbitai · 2025-12-04T19:47:00Z

ovos_number_parser/numbers_mwl.py

+    elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'):
+        return word[:-1]


⚠️ Potential issue | 🟠 Major

Potential bug in masculine gender swap for words ending in -ma.

The logic for converting feminine -ma endings to masculine removes only the final character (word[:-1]), resulting in a word ending in m rather than the expected -o ending. For example, "décima" would become "décim" instead of "décimo".

Apply this diff to fix the gender swap:

elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'): - return word[:-1] + return word[:-2] + 'o'

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'):

return word[:-1]

elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'):

return word[:-2] + 'o'

🤖 Prompt for AI Agents

In ovos_number_parser/numbers_mwl.py around lines 20-21, the masculine gender branch for words ending in 'ma' currently strips only the final character (word[:-1]) which produces a trailing 'm' (e.g., "décima" -> "décim"); change the transformation to replace the final 'a' with 'o' (for example return word[:-1] + 'o') so "décima" becomes "décimo".

coderabbitai

Actionable comments posted: 7

♻️ Duplicate comments (1)

ovos_number_parser/util.py (1)

433-498: Scale words mis-parsed; exact-scale phrases and large numbers are wrong.

There are several coupled issues in extract_number:

Scale words are handled as plain cardinals before the scale logic.
Because numbers_map includes entries for scale words (e.g., "millón", "mil"), tokens like "millón" or "mil" hit the val is not None branch and are added directly to current_number, so "un millón" becomes 1 000 001 instead of 1 000 000, and "dous millóns cincocentos" doesn’t aggregate as expected.
The current_number is None branch is unreachable.
Inside the scale loop, you do:
```
current_number = current_number or 0
...
if current_number is None:
    current_number = 1
```
After current_number = current_number or 0, current_number is never None, so that if never fires. Bare scale words like "millón" can’t default to 1.
Trailing cardinals after a scale are dropped once number_consumed is set.
After hitting a scale, number_consumed is set to True, and the final fold:
```
if not number_consumed and current_number:
    result = result or 0
    result += current_number
```
will skip any remaining current_number. This is the classic “scaled part + leftover hundreds” problem.

Given how central this is to GL and AST (and other Romance languages), this is serious for correctness.

A more robust structure would be:

Detect scale words before val is not None handling.
Allow current_number to be None to mean “implicit 1”.
Always fold any leftover current_number into result at the end if any numeric content was seen.

Sketch (condensed for clarity):

-        val = numbers_map.get(token)
-        if val is not None:
+        val = numbers_map.get(token)
+
+        # 1) Handle scale words first
+        scale_val = None
+        for s_val, w in scales_map.items():
+            if token == w or token == self.vocab.pluralize(w):
+                scale_val = s_val
+                break
+
+        if scale_val is not None:
+            # Implicit "one million" when no prefix
+            if current_number is None:
+                current_number = 1
+            result = (result or 0) + current_number * scale_val
+            current_number = None
+            number_consumed = True
+
+        # 2) Plain cardinal tokens
+        elif val is not None:
             current_number = current_number or 0
             result = result or 0
             if prev_token in self.vocab.NEGATIVE_SIGN:
                 is_negative = True
@@
-            else:
-                # Handle large scales like milhão, bilhão
-                found_scale = False
-                for scale_val, w  in scales_map.items():
-                    if token == w or token == self.vocab.pluralize(w):
-                        current_number = current_number or 0
-                        result = result or 0
-                        if current_number is None:
-                            current_number = 1
-                        result += current_number * scale_val
-                        current_number = None
-                        found_scale = True
-                        number_consumed = True
-                        break
-
-                # Handle decimal numbers
-                if not found_scale:
+            else:
+                # Handle decimal numbers

And at the end:

-        if not number_consumed and current_number:
-            result = result or 0
-            result += current_number
+        if current_number is not None:
+            # Fold any trailing cardinal/ordinal component
+            result = (result or 0) + current_number
+            number_consumed = True

Combined with the fraction fix, this should give sensible results for phrases like "un millón", "dous millóns cincocentos", and "tres cuartos".

🧹 Nitpick comments (5)

ovos_number_parser/util.py (3)
231-261: Avoid callable default range(...) in function signature.

convert_to_mixed_fraction uses denominators=range(1, 21) as a default. Linters flag this (B008) because defaults are evaluated at import time; while range is immutable and safe here, it’s still better to follow the pattern.

Consider:
-def convert_to_mixed_fraction(number: Union[int, float], denominators=range(1, 21)) -> Optional[Tuple[int, int, int]]:
+def convert_to_mixed_fraction(number: Union[int, float], denominators=None) -> Optional[Tuple[int, int, int]]:
@@
-    if not denominators:
-        denominators = range(1, 21)
+    if denominators is None:
+        denominators = range(1, 21)
304-306: Unused gender argument in default swap_gender lambda.

The default swap_gender ignores its gender parameter, which triggers ARG005 and can confuse readers.

If you want a truly no-op default but still keep the signature, you can either:
-    swap_gender: Callable[[str, GrammaticalGender], str] = lambda word, gender: word
+    swap_gender: Callable[[str, GrammaticalGender], str] = staticmethod(lambda word, _gender: word)
or define a small named function above and use _gender in its parameter list.

610-611: Rename unused loop variable to _ for clarity.

In the decimal pronunciation loop:
for i in range(n_zeros):
    decimal_pronunciation_parts.append(self.vocab.UNITS[0])
i is unused and triggers B007. Renaming it to _ better conveys intent and quiets linters:
-                    for i in range(n_zeros):
+                    for _ in range(n_zeros):
                         decimal_pronunciation_parts.append(self.vocab.UNITS[0])
ovos_number_parser/numbers_mwl.py (1)
401-404: __main__ demo: f-strings without placeholders.

Lines like:
print(f"'bint'i un' ->", MWL.extract_number("bint'i un"))
don’t interpolate anything; linters (F541) suggest dropping the f prefix:
-print(f"'bint'i un' ->", MWL.extract_number("bint'i un"))
+print("'bint'i un' ->", MWL.extract_number("bint'i un"))
Same for the following similar prints.
ovos_number_parser/numbers_gl.py (1)
104-114: Hundreds vocabulary needs feminine forms once _pronounce_up_to_999 uses swap_gender.

With the proposed change in _pronounce_up_to_999 (wrapping exact hundreds with swap_gender), HUNDREDS will be interpreted as the masculine base. For GL:
HUNDREDS={
    ...
}
and GENDERED_SPELLINGS only overrides units 1 and 2.

After _pronounce_up_to_999 starts calling swap_gender for exact hundreds, you should ensure swap_gender_gl (for single words) correctly turns "douscentos" into "douscentas", etc. If that’s hard to express generically, you can also add explicit entries to GENDERED_SPELLINGS[GrammaticalGender.FEMININE] for 200, 300, … 900 so they bypass the generic path entirely.

As is, tests such as:
GL.pronounce_number(200, gender=GrammaticalGender.FEMININE)
will continue to return "douscentos" until either swap_gender_gl or GENDERED_SPELLINGS encode the correct feminine hundreds.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between abb45b3 and c82b4ec.

📒 Files selected for processing (7)

ovos_number_parser/numbers_ast.py (1 hunks)
ovos_number_parser/numbers_gl.py (1 hunks)
ovos_number_parser/numbers_mwl.py (1 hunks)
ovos_number_parser/util.py (10 hunks)
tests/test_number_parser_ast.py (1 hunks)
tests/test_number_parser_gl.py (1 hunks)
tests/test_number_parser_mwl.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

tests/test_number_parser_mwl.py

🧰 Additional context used

🧬 Code graph analysis (3)

ovos_number_parser/util.py (1)

ovos_number_parser/__init__.py (6)

is_ordinal (382-406)

is_fractional (318-379)

extract_number (248-315)

numbers_to_digits (36-81)

pronounce_number (84-170)

pronounce_ordinal (198-245)

ovos_number_parser/numbers_mwl.py (2)

ovos_number_parser/util.py (13)

Scale (38-45)

GrammaticalGender (48-54)

DigitPronunciation (57-59)

NumberVocabulary (264-366)

RomanceNumberExtractor (369-879)

pronounce_ordinal (716-775)

is_fractional (386-401)

is_ordinal (375-384)

text (87-91)

extract_number (403-498)

pronounce_number (556-677)

numbers_to_digits (500-554)

pronounce_fraction (679-714)

ovos_number_parser/__init__.py (7)

pronounce_ordinal (198-245)

is_fractional (318-379)

is_ordinal (382-406)

extract_number (248-315)

pronounce_number (84-170)

numbers_to_digits (36-81)

pronounce_fraction (173-195)

tests/test_number_parser_ast.py (2)

ovos_number_parser/util.py (2)

GrammaticalGender (48-54)

pronounce_number (556-677)

ovos_number_parser/__init__.py (1)

pronounce_number (84-170)

🪛 GitHub Actions: Run Unit Tests

tests/test_number_parser_gl.py

[error] 23-23: GL composite numbers gender feminine: expected 'vinte e dúas' but got 'vinte e dous'.

[error] 34-34: GL hundreds: feminine form 'douscentas' expected, but got 'douscentos'.

[error] 55-55: GL long scale: expected 'mil millóns' but got 'mil mil mil'.

[error] 47-47: GL millions: expected 'un millón' but got 'mil mil'.

[error] 80-80: GL ordinals: expected 'millonésimo' for 1,000,000 but got 'mil milésimo'.

tests/test_number_parser_ast.py

[error] 117-117: AST fraction extraction failed: expected 0.75 but got 3.25.

[error] 189-189: Integration: large numbers extraction mismatch: 1234567 vs 1004797.

🪛 Ruff (0.14.7)

ovos_number_parser/util.py

231-231: Do not perform function call range in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

304-304: Unused lambda argument: gender

(ARG005)

579-579: Avoid specifying long messages outside the exception class

(TRY003)

610-610: Loop control variable i not used within loop body

Rename unused i to _i

(B007)

737-737: Avoid specifying long messages outside the exception class

(TRY003)

794-794: Avoid specifying long messages outside the exception class

(TRY003)

847-847: Avoid specifying long messages outside the exception class

(TRY003)

ovos_number_parser/numbers_mwl.py

401-401: f-string without any placeholders

Remove extraneous f prefix

(F541)

402-402: f-string without any placeholders

Remove extraneous f prefix

(F541)

403-403: f-string without any placeholders

Remove extraneous f prefix

(F541)

404-404: f-string without any placeholders

Remove extraneous f prefix

(F541)

tests/test_number_parser_ast.py

21-21: Comment contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF003)

🔇 Additional comments (7)

tests/test_number_parser_gl.py (2)

19-82: GL tests clearly expose current backend bugs; keep them as-is.

The failing expectations here:

"vinte e dúas" vs "vinte e dous" (Line 23),

"douscentas" vs "douscentos" (Line 34),

"un millón" / "mil millóns" / "millonésimo" vs various "mil …" outputs,

are consistent with the issues in RomanceNumberExtractor (scale selection and lack of gender handling for exact hundreds) and swap_gender_gl. The test data looks correct and should be treated as the source of truth; the fixes belong in ovos_number_parser/util.py and ovos_number_parser/numbers_gl.py, not in these tests.

116-147: Extraction and numbers-to-digits tests for GL align with intended semantics.

The extraction expectations for:

"dous millóns cincocentos" → 2_000_500,

"mil vinte e tres" → 1023,

"trinta e cinco coma catro" → 35.4,

and numbers_to_digits phrases (e.g., 'atopamos dous millóns cincocentos insectos' -> 'atopamos 2000500 insectos') are exactly what the generic extractor is supposed to support. Current failures in these areas are due to extractor logic, not the tests.

No changes requested here; once the extract_number scale/fraction fixes are applied, these tests should start passing.

ovos_number_parser/numbers_ast.py (1)

4-221: AST vocabulary and helpers look consistent; main issues are in shared extractor.

The Asturian-specific pieces here—swap_gender_ast, pluralize_ast, _AST’s UNITS/TENS/HUNDREDS/FRACTION/ORDINAL maps, and the AST = RomanceNumberExtractor(_AST) wiring—are coherent and match what the AST tests exercise.

The pipeline failures (fraction extraction of "tres cuartos" and large-number integration) stem from the shared RomanceNumberExtractor implementation (scale and fraction handling), which is already flagged in util.py. No changes needed in this file once the shared logic is fixed.

tests/test_number_parser_ast.py (2)

73-90: Fraction extraction test correctly exposes “integer + fraction” vs “numerator/denominator” bug.

TestFractionsAst.test_fraction_pronounce and TestExtractAST.test_fraction_phrase expect:

AST.is_fractional("mediu") → 0.5,

AST.extract_number("tres cuartos") → 0.75.

Current implementation returns 3.25 for the latter because it always does current_number + fraction. This test is doing the right thing by enforcing 3/4 semantics for “tres cuartos”; the fix belongs in RomanceNumberExtractor.extract_number (see earlier comment), not in the test.

179-189: Large-number round-trip test is valid and highlights scale-selection bug.

TestIntegrationAST.test_large_numbers expects AST.extract_number(AST.pronounce_number(1_234_567)) == 1_234_567. The current mismatch (e.g., 1_004_797) is consistent with the shared pronounce_number choosing the smallest scale instead of the largest plus extract_number’s imperfect scale handling.

The test is appropriate; fixes should be applied in RomanceNumberExtractor.pronounce_number and extract_number as already outlined.

ovos_number_parser/numbers_gl.py (2)

322-429: Deprecated GL wrappers correctly delegate but share the same scale/gender issues.

The deprecated functions (pronounce_ordinal_gl, is_fractional_gl, is_ordinal_gl, extract_number_gl, pronounce_number_gl, numbers_to_digits_gl, pronounce_fraction_gl) are thin adapters over GL with a deprecation warning, which is good.

Just be aware:

They inherit all behavior from RomanceNumberExtractor and _GL, including the scale and gender bugs already discussed.

Once you fix the shared logic in util.py and the GL-specific swap_gender_gl / vocab, these wrappers should start passing old tests too.

No structural changes recommended here beyond those shared fixes.

432-542: __main__ GL demo matches intended semantics but will currently show wrong outputs until core fixes land.

The examples printed here (e.g.,

1_234_567 → "un millón douscentos trinta e catro mil cincocentos sesenta e sete",

1_000_000_000 → "mil millóns",

ordinals like 1_000_000 → "millonésimo",

"dous millóns cincocentos" extraction → 2_000_500,

) align with the expectations encoded in tests/test_number_parser_gl.py.

Right now, users running this script will see the same mispronunciations/extraction errors flagged in the tests (due to the shared scale/gender issues). Once RomanceNumberExtractor and swap_gender_gl are corrected, this demo script should output the intended strings; there’s nothing to change here structurally.

coderabbitai · 2025-12-04T23:25:11Z

ovos_number_parser/numbers_gl.py

+def swap_gender_gl(word: str, gender: GrammaticalGender) -> str:
    """
-    This function takes the given text and checks if it is a fraction.
+    Convert a Portuguese word between masculine and feminine grammatical gender by adjusting its ending.

-    Args:
-        text (str): the string to check if fractional
+    Parameters:
+        word (str): The word to convert.
+        gender (GrammaticalGender): The target grammatical gender.

-        short_scale (bool): use short scale if True, long scale if False
    Returns:
-        (bool) or (float): False if not a fraction, otherwise the fraction
-
+        str: The word with its ending swapped to match the specified gender, if applicable; otherwise, the original word.
    """
-    if input_str.endswith('s', -1):
-        input_str = input_str[:len(input_str) - 1]  # e.g. "fifths"
-
-    aFrac = {"medio": 2, "media": 2, "terzo": 3, "cuarto": 4,
-             "cuarta": 4, "quinto": 5, "quinta": 5, "sexto": 6, "sexta": 6,
-             "séptimo": 7, "séptima": 7, "oitavo": 8, "oitava": 8,
-             "noveno": 9, "novena": 9, "décimo": 10, "décima": 10,
-             "onceavo": 11, "onceava": 11, "doceavo": 12, "doceava": 12}
-
-    if input_str.lower() in aFrac:
-        return 1.0 / aFrac[input_str]
-    if (input_str == "vixésimo" or input_str == "vixésima"):
-        return 1.0 / 20
-    if (input_str == "trixésimo" or input_str == "trixésima"):
-        return 1.0 / 30
-    if (input_str == "centésimo" or input_str == "centésima"):
-        return 1.0 / 100
-    if (input_str == "milésimo" or input_str == "milésima"):
-        return 1.0 / 1000
-    return False
-
-
-def extract_number_gl(text, short_scale=True, ordinals=False):
+    if word == "dois" and gender == GrammaticalGender.FEMININE:
+        return "duas"
+    elif word == "duas" and gender == GrammaticalGender.MASCULINE:
+        return "dois"
+
+    elif gender == GrammaticalGender.FEMININE and word.endswith('o'):
+        return word[:-1] + 'a'
+    elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'):
+        return word[:-1]
+    elif gender == GrammaticalGender.MASCULINE and word.endswith('a'):
+        return word[:-1] + 'o'
+    elif gender == GrammaticalGender.FEMININE and word.endswith('os'):
+        return word[:-2] + 'as'
+    elif gender == GrammaticalGender.MASCULINE and word.endswith('as'):
+        return word[:-2] + 'os'
+    elif gender == GrammaticalGender.FEMININE and word.endswith('m'):
+        return word + 'a'
+    return word


⚠️ Potential issue | 🟠 Major

swap_gender_gl is still Portuguese-flavored and doesn’t handle Galician composites.

A few points here:

The docstring still says “Convert a Portuguese word…”, but this module is Galician-specific.

Special-casing "dois"/"duas" is Portuguese; Galician UNITS use "dous"/"dúas", so that block never triggers.

swap_gender_gl operates on the entire phrase ("vinte e dous"), but only checks suffixes like 'o', 'os', 'a', 'as', 'm'. For "vinte e dous" (ending in 'us'), none of the conditions fire, so feminine GL.pronounce_number(22, gender=FEMININE) stays "vinte e dous" instead of "vinte e dúas" as the tests expect.

Given the shared _pronounce_up_to_999 already builds tens+unit phrases and then calls swap_gender, a pragmatic fix here is:

Make swap_gender_gl recognise "dous"/"dúas".

When given a multi-word phrase, apply gender swap only to the last token, not the whole string.

For example:

def swap_gender_gl(word: str, gender: GrammaticalGender) -> str: - if word == "dois" and gender == GrammaticalGender.FEMININE: - return "duas" - elif word == "duas" and gender == GrammaticalGender.MASCULINE: - return "dois" + # Handle multi-word phrases by swapping only the last token + if " " in word: + parts = word.split() + parts[-1] = swap_gender_gl(parts[-1], gender) + return " ".join(parts) + + if word == "dous" and gender == GrammaticalGender.FEMININE: + return "dúas" + elif word == "dúas" and gender == GrammaticalGender.MASCULINE: + return "dous" @@ - elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'): - return word[:-1] + # …keep or adjust the generic suffix rules as appropriate for GL…

and update the docstring to say “Galician” instead of “Portuguese”.

Combined with the _pronounce_up_to_999 hundreds fix in util.py, this should give you the expected "vinte e dúas" and other gendered forms.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In ovos_number_parser/numbers_gl.py around lines 11 to 39, the function swap_gender_gl still refers to Portuguese in the docstring and special-cases "dois"/"duas", and it tries to swap gender on the whole phrase instead of only the final token; this misses Galician forms like "dous"/"dúas" and multi-word phrases like "vinte e dous". Update the docstring to say "Galician", add recognition for Galician unit pair "dous" -> "dúas" (and the reverse), and change the logic so when the input contains multiple words you split by whitespace, apply the gender-swapping rules only to the last token (handling endings 'o','a','os','as','m' and the new "dous"/"dúas" case), then rejoin and return the full phrase.

coderabbitai · 2025-12-04T23:25:12Z

ovos_number_parser/numbers_mwl.py

+def swap_gender_mwl(word: str, gender: GrammaticalGender) -> str:
+    if word.endswith("un") and gender == GrammaticalGender.FEMININE:
+        return word[:-2] + "ũa"
+    elif word.endswith("ũa") and gender == GrammaticalGender.MASCULINE:
+        return word[:-2] + "ũa"
+    elif word == "dous" and gender == GrammaticalGender.FEMININE:
+        return "dues"
+    elif word == "dues" and gender == GrammaticalGender.MASCULINE:
+        return "dous"
+
+    # TODO - is this correct?
+    elif gender == GrammaticalGender.FEMININE and word.endswith('o'):
+        return word[:-1] + 'a'
+    elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'):
+        return word[:-1]


⚠️ Potential issue | 🟡 Minor

Masculine conversion for words ending in ũa looks wrong.

Current logic:

if word.endswith("un") and gender == GrammaticalGender.FEMININE: return word[:-2] + "ũa" elif word.endswith("ũa") and gender == GrammaticalGender.MASCULINE: return word[:-2] + "ũa"

The second branch effectively returns the same "…ũa" ending for masculine by chopping two chars and re‑adding "ũa". For example, "dũa" → "dũa". That doesn’t actually switch gender and is likely a copy‑paste mistake.

You probably meant to restore the masculine "un" form:

- elif word.endswith("ũa") and gender == GrammaticalGender.MASCULINE: - return word[:-2] + "ũa" + elif word.endswith("ũa") and gender == GrammaticalGender.MASCULINE: + return word[:-2] + "un"

Please double-check with MWL examples, but as written this branch can’t change the word’s gender.

coderabbitai · 2025-12-04T23:25:12Z

ovos_number_parser/util.py

+    def is_ordinal(self, input_str: str, scale: Optional[Scale] = None) -> bool:
+        """
+        Determine if a string is a Portuguese ordinal word.
+
+        Returns:
+            bool: True if the input string is recognized as a Portuguese ordinal, otherwise False.
+        """
+        scale = scale or self.vocab.DEFAULT_SCALE
+        ordinals_map = self.vocab.get_ordinal_strings(scale)
+        return input_str in ordinals_map
+
+    def is_fractional(self, input_str: str) -> Union[float, bool]:
+        """
+        Checks if the input string corresponds to a recognized Portuguese fractional word.
+
+        Returns:
+            The fractional value as a float if recognized; otherwise, False.
+        """
+        fractions_map = self.vocab.get_fraction_strings()
+        input_str = input_str.lower().strip()
+
+        # handle fraction denominator strings
+        for word, den in fractions_map.items():
+            if input_str == word:
+                return 1.0 / den
+
+        return False


⚠️ Potential issue | 🟠 Major

Fraction aggregation treats “three quarters” as 3.25 instead of 0.75.

In extract_number, the fractional branch always does:

current_number = current_number or 0 result = result or 0 fraction = self.is_fractional(token) result += current_number + fraction

For AST, "tres cuartos" becomes 3 + 1/4 = 3.25 instead of 3/4 = 0.75, matching the failing test in tests/test_number_parser_ast.py (Line 117). Similar patterns will show up for “tres terzos”, etc.

You likely want to treat a bare “ ” phrase as N * (1/den) when there is no prior integer part, while still supporting “integer + fraction” for mixed forms.

One possible approach:

- elif self.is_fractional(token): - current_number = current_number or 0 - result = result or 0 - fraction = self.is_fractional(token) - result += current_number + fraction - current_number = None - number_consumed = True + elif self.is_fractional(token): + fraction = self.is_fractional(token) + # Initialize result if needed + result = result or 0 + + # If we've already consumed a scale/decimal/fraction before, + # treat current_number as integer part. + if number_consumed or not current_number: + result += (current_number or 0) + fraction + else: + # Bare "tres cuartos" style: interpret as numerator/denominator + denom = round(1 / fraction) + result += current_number / denom + + current_number = None + number_consumed = True

You might want to refine the conditions (e.g., based on token count or language specifics), but the current behavior is clearly wrong for the AST test case.

coderabbitai · 2025-12-04T23:25:12Z

ovos_number_parser/util.py

+        if "." in str(number):
+            integer_part = int(number)
+            decimal_part_str = str(number).split('.')[1].rstrip("0")
+
+            # Handle cases where the decimal part rounds to zero
+            if decimal_part_str and int(decimal_part_str) == 0:
+                return self.pronounce_number(integer_part, places,
+                                           scale=scale,
+                                           digits=digits, gender=gender)
+
+            int_pronunciation = self.pronounce_number(integer_part, places,
+                                                    scale=scale,
+                                                    digits=digits, gender=gender)
+
+            decimal_pronunciation_parts = []
+            #  pronounce decimals either as a whole number or digit by digit
+            if decimal_part_str:
+                if digits == DigitPronunciation.FULL_NUMBER:
+
+                    # handle leading zeros
+                    no_z = decimal_part_str.lstrip("0") # without zeros
+                    n_zeros = len(decimal_part_str) - len(no_z)
+                    for i in range(n_zeros):
+                        decimal_pronunciation_parts.append(self.vocab.UNITS[0])
+
+                    if n_zeros >= places:
+                        # read all zeros
+                        last_digit = int(decimal_part_str[n_zeros])
+                        after_last_digit = int(decimal_part_str[n_zeros + 1]) if len(decimal_part_str) > n_zeros + 1 else 0
+                        # round up last digit if needed
+                        if after_last_digit >= 5:
+                            last_digit += 1
+                        decimal_pronunciation_parts.append(self._pronounce_up_to_999(last_digit, gender=gender))
+                    else:
+                        if len(decimal_part_str) > places:
+                            last_digit = int(decimal_part_str[places - 1])
+                            after_last_digit = int(decimal_part_str[places])
+                            # round up last digit if needed
+                            if after_last_digit >= 5:
+                                last_digit += 1
+                            decimal_part_str = decimal_part_str[:places - 1] + str(last_digit)
+                        decimal_pronunciation_parts.append(self.pronounce_number(int(decimal_part_str), gender=gender))
+                else:
+                    for digit in decimal_part_str:
+                        decimal_pronunciation_parts.append(self.pronounce_number(int(digit), gender=gender))
+
+            decimal_pronunciation = " ".join(decimal_pronunciation_parts) or self.vocab.UNITS[0]
+            decimal_word = self.vocab.DECIMAL_MARKER[0]
+            return f"{int_pronunciation} {decimal_word} {decimal_pronunciation}"
+


⚠️ Potential issue | 🟠 Major

Decimal pronunciation can crash on .0 values and has minor rounding edge cases.

In the decimal branch of pronounce_number:

For numbers like 1.0 or 2.00, decimal_part_str becomes "" after .rstrip("0"). The subsequent logic doesn’t guard against an empty string and will eventually call int(decimal_part_str), which raises ValueError.

The if n_zeros >= places: block indexes decimal_part_str[n_zeros] without ensuring n_zeros < len(decimal_part_str).

A safer pattern:

if "." in str(number): integer_part = int(number) decimal_part_str = str(number).split('.')[1].rstrip("0") - # Handle cases where the decimal part rounds to zero - if decimal_part_str and int(decimal_part_str) == 0: - return self.pronounce_number(integer_part, places, - scale=scale, - digits=digits, gender=gender) + # If nothing remains after stripping zeros, it's effectively an integer + if not decimal_part_str: + return self.pronounce_number( + integer_part, places, scale=scale, digits=digits, gender=gender + )

and ensure any indexing into decimal_part_str[...] is only done when len(decimal_part_str) > index.

🧰 Tools

🪛 Ruff (0.14.7)

610-610: Loop control variable i not used within loop body

Rename unused i to _i

(B007)

coderabbitai · 2025-12-04T23:25:12Z

ovos_number_parser/util.py

+        # Find the smallest scale that fits the number
+        scale_candidates = [(scale_val, scale_str)
+                            for scale_val, scale_str in scales_map.items()
+                            if n >= scale_val]
+        scale_val: Tuple[int, str] = scale_candidates[0]
+
+        count = n // scale_val[0]
+        remainder = n % scale_val[0]
+
+        # Pronounce the 'count' part of the number
+        is_singular = count == 1 or scale_val[0] in self.vocab.NO_PLURAL
+        scale_word = scale_val[1] if is_singular else self.vocab.pluralize(scale_val[1])
+        if count == 1 and scale_val[0] in self.vocab.NO_PREV_UNIT:
+            count_str = scale_word
+        else:
+            count_pronunciation = self.pronounce_number(count, places, scale)
+            count_str = f"{count_pronunciation} {scale_word}"
+
+        # If there's no remainder, we're done
+        if remainder == 0:
+            return count_str
+
+        # Pronounce the remainder and join with the correct conjunction
+        remainder_str = self.pronounce_number(remainder, places, scale)
+        # Conjunction logic: add JOIN_WORD if the remainder is the last group and is
+        # less than 100 or a multiple of 100.
+        join_word = self.vocab.JOIN_WORD[0] if len(self.vocab.JOIN_WORD) else ""
+        if remainder < 100 or (remainder < 1000 and remainder % 100 == 0):
+            return f"{count_str} {join_word} {remainder_str}"
+        else:
+            return f"{count_str} {remainder_str}"


⚠️ Potential issue | 🔴 Critical

Pronunciation of large numbers picks the smallest scale instead of the largest.

In pronounce_number, you compute:

scales_map = ... scale_candidates = [(scale_val, scale_str) for scale_val, scale_str in scales_map.items() if n >= scale_val] scale_val: Tuple[int, str] = scale_candidates[0]

Because dictionaries are insertion-ordered, this picks the first matching scale, i.e., the smallest value ≥ 1 000. For GL, where LONG_SCALE starts with 1000: "mil" then 10**6: "millón", 1_000_000 is pronounced as "mil mil"; 1_000_000_000 becomes "mil mil mil". This exactly matches the GL test failures.

You want the largest scale ≤ n. Suggestion:

- scale_candidates = [(scale_val, scale_str) - for scale_val, scale_str in scales_map.items() - if n >= scale_val] - scale_val: Tuple[int, str] = scale_candidates[0] + scale_candidates = [ + (scale_val, scale_str) + for scale_val, scale_str in scales_map.items() + if n >= scale_val + ] + if not scale_candidates: + return self._pronounce_up_to_999(n, gender) + # choose the largest applicable scale + scale_val, scale_str = max(scale_candidates, key=lambda item: item[0])

and then use scale_str instead of re-indexing scale_val[1] below.

The same bug exists in pronounce_ordinal (see comment there); both need to be fixed to get correct “un millón”, “mil millóns”, and “millonésimo”.

🤖 Prompt for AI Agents

In ovos_number_parser/util.py around lines 647 to 677, the code picks the first matching scale (smallest) from scale_candidates causing large numbers to use repeated smaller scales; change the logic to select the largest scale <= n by using max(scale_candidates, key=lambda t: t[0]) (or sort and take last) and unpack it as (scale_val, scale_str) so subsequent code uses scale_str rather than re-indexing scale_val[1]; apply the exact same fix to the corresponding block in pronounce_ordinal so both functions choose the largest applicable scale and use the proper scale_str variable.

coderabbitai · 2025-12-04T23:25:12Z

ovos_number_parser/util.py

+        is_singular = count == 1 or scale_val[0] in self.vocab.NO_PLURAL
+        scale_word = scale_val[1] if is_singular else self.vocab.pluralize(scale_val[1])
+        if count == 1 and scale_val[0] in self.vocab.NO_PREV_UNIT:
+            count_str = scale_word
+        else:
+            count_pronunciation = self.pronounce_number(count, places, scale)
+            count_str = f"{count_pronunciation} {scale_word}"
+
+        # If there's no remainder, we're done
+        if remainder == 0:
+            return count_str
+
+        # Pronounce the remainder and join with the correct conjunction
+        remainder_str = self.pronounce_number(remainder, places, scale)
+        # Conjunction logic: add JOIN_WORD if the remainder is the last group and is
+        # less than 100 or a multiple of 100.
+        join_word = self.vocab.JOIN_WORD[0] if len(self.vocab.JOIN_WORD) else ""
+        if remainder < 100 or (remainder < 1000 and remainder % 100 == 0):
+            return f"{count_str} {join_word} {remainder_str}"
+        else:


⚠️ Potential issue | 🟠 Major

Hundreds ignore grammatical gender for exact multiples (e.g., 200 feminine).

_pronounce_up_to_999 applies swap_gender only in the composite case and returns self.vocab.HUNDREDS[n] directly for exact multiples of 100:

if n in self.vocab.HUNDREDS: return self.vocab.HUNDREDS[n] ... return self.vocab.swap_gender(" ".join(parts), gender)

This means GL.pronounce_number(200, gender=FEMININE) always yields "douscentos", never "douscentas", which is exactly what the GL tests report.

A generic fix that works for all Romance languages:

- if n in self.vocab.HUNDREDS: - return self.vocab.HUNDREDS[n] + if n in self.vocab.HUNDREDS: + base = self.vocab.HUNDREDS[n] + return self.vocab.swap_gender(base, gender)

Given other languages either have no gender variation for hundreds (in which case swap_gender will be a no-op) or can encode their rules in swap_gender, this change centralizes the behavior correctly.

🤖 Prompt for AI Agents

In ovos_number_parser/util.py around lines 657 to 676, the function handling hundreds returns self.vocab.HUNDREDS[n] directly for exact multiples of 100, causing grammatical gender to be ignored (e.g., 200 always yields masculine form); change the logic so the returned hundred word is passed through self.vocab.swap_gender(..., gender) (i.e., apply swap_gender to the hundred-word in the exact-multiple branch), and ensure the same swap_gender usage remains for the composite case so all hundreds respect the requested gender across languages.

coderabbitai · 2025-12-04T23:25:12Z

ovos_number_parser/util.py

+        scales_map = self.vocab.ORDINAL_SHORT_SCALE if scale == Scale.SHORT else self.vocab.ORDINAL_LONG_SCALE
+
+        # Find the smallest scale that fits the number
+        scale_candidates = [(scale_val, scale_str)
+                            for scale_val, scale_str in scales_map.items()
+                            if n >= scale_val]
+        scale_val: Tuple[int, str] = scale_candidates[0]
+
+        count = n // scale_val[0]
+        remainder = n % scale_val[0]
+
+        # Special case for "milésimo" and other large scales where 'um' is not needed
+        if count == 1:
+            count_str = self.vocab.swap_gender(scale_val[1], gender)
+        else:
+            # Pronounce the 'count' part of the number and the scale word
+            count_pronunciation = self.pronounce_number(count, scale=scale)
+            scale_word = self.vocab.swap_gender(scale_val[1], gender)
+            count_str = f"{count_pronunciation} {scale_word}"
+
+        # If there's no remainder, we're done
+        if remainder == 0:
+            return count_str
+
+        # Pronounce the remainder and join
+        remainder_str = self.pronounce_ordinal(remainder, gender, scale)
+
+        return f"{count_str} {remainder_str}"


⚠️ Potential issue | 🔴 Critical

Ordinal pronunciation has the same “smallest scale” bug as cardinal pronunciation.

pronounce_ordinal mirrors pronounce_number in its scale handling:

scale_candidates = [(scale_val, scale_str) for scale_val, scale_str in scales_map.items() if n >= scale_val] scale_val: Tuple[int, str] = scale_candidates[0]

For GL, 1_000_000 ordinal becomes "mil milésimo" instead of "millonésimo", exactly as seen in the GL tests.

Apply the same fix as for pronounce_number:

- scale_candidates = [(scale_val, scale_str) - for scale_val, scale_str in scales_map.items() - if n >= scale_val] - scale_val: Tuple[int, str] = scale_candidates[0] + scale_candidates = [ + (scale_val, scale_str) + for scale_val, scale_str in scales_map.items() + if n >= scale_val + ] + if not scale_candidates: + # 0–999 should already have been handled + raise ValueError("No ordinal scale found for number") + scale_val, scale_str = max(scale_candidates, key=lambda item: item[0]) @@ - if count == 1: - count_str = self.vocab.swap_gender(scale_val[1], gender) + if count == 1: + count_str = self.vocab.swap_gender(scale_str, gender) @@ - scale_word = self.vocab.swap_gender(scale_val[1], gender) + scale_word = self.vocab.swap_gender(scale_str, gender)

🤖 Prompt for AI Agents

In ovos_number_parser/util.py around lines 748 to 775, the ordinal pronunciation picks the first matching scale candidate resulting in wrong outputs (e.g., "mil milésimo" for 1_000_000); change the selection logic to choose the largest scale_val <= n (same fix as pronounce_number): build scale_candidates as you already do, then set scale_val = max(scale_candidates, key=lambda t: t[0]) (or sort and take the last) so the biggest fitting scale is used; keep the rest of the logic (count, remainder, gender swap, recursion) unchanged.

feat: asturian

e89e084

github-actions bot added feature and removed feature labels Dec 3, 2025

coderabbitai bot reviewed Dec 3, 2025

View reviewed changes

ovos_number_parser/numbers_ast.py Outdated Show resolved Hide resolved

ovos_number_parser/numbers_ast.py Outdated Show resolved Hide resolved

JarbasAl marked this pull request as draft December 3, 2025 00:59

JarbasAl added 2 commits December 3, 2025 18:59

refactor: utils.py typing

65c15fa

feat: shared RomanceNumberExtractor class

ec9e39d

JarbasAl marked this pull request as ready for review December 4, 2025 04:33

JarbasAl changed the title ~~feat: asturian~~ feat: asturian + shared RomanceNumberExtractor class Dec 4, 2025