Skip to content

Conversation

@JarbasAl
Copy link
Member

@JarbasAl JarbasAl commented Dec 3, 2025

Summary by CodeRabbit

  • New Features

    • Full Asturian support plus dedicated Portuguese variants, Mirandese and Galician backends; richer vocabulary-driven parsing/pronunciation and exposed word tokenizer.
  • Documentation

    • Supported languages table updated to include Asturian.
  • Compatibility / Deprecations

    • Backwards-compatible wrappers with deprecation notices while APIs migrate to the new extractor/vocabulary model.
  • Tests

    • Expanded test coverage for Asturian, Portuguese variants, Mirandese, Galician, tokenizer, pronunciation, extraction and digits conversion.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 3, 2025

Walkthrough

Adds Asturian support and refactors number parsing into a vocabulary-driven RomanceNumberExtractor with new vocabularies (AST, GL, MWL, PT_PT, PT_BR), updated util dataclasses/tokenization, package dispatch/signature changes, README language table update, and new/updated tests.

Changes

Cohort / File(s) Summary
Documentation
README.md
Adds ast (Asturian) to Supported Languages table with all capability columns marked ✅.
Package dispatch
ovos_number_parser/__init__.py
Exports extended (AST, PT_PT, PT_BR, MWL); public APIs (numbers_to_digits, pronounce_number, pronounce_fraction, pronounce_ordinal, extract_number, is_fractional, is_ordinal) accept scale: Optional[Scale] (deprecate short_scale) and route to variant-specific RomanceNumberExtractor instances based on lang.
Asturian implementation
ovos_number_parser/numbers_ast.py
New Asturian helpers (swap_gender_ast, pluralize_ast), _AST NumberVocabulary and AST = RomanceNumberExtractor(_AST) exposed for parsing/pronunciation/digits; demo __main__ included.
Mirandese (MWL)
ovos_number_parser/numbers_mwl.py
Migrated to MWL = RomanceNumberExtractor(_MWL) with swap_gender_mwl/pluralize_mwl; legacy top-level functions retained as deprecated wrappers delegating to MWL; scale params made Optional where applicable.
Portuguese refactor
ovos_number_parser/numbers_pt.py
Replaced inline mappings with NumberVocabulary configs; added PT_PT/PT_BR extractors, swap_gender_pt/pluralize_pt; legacy functions now deprecation wrappers delegating to extractor methods; tests updated to use public vocab.
Galician implementation
ovos_number_parser/numbers_gl.py
Added swap_gender_gl, pluralize_gl, _GL NumberVocabulary and GL = RomanceNumberExtractor(_GL) plus deprecated wrappers delegating to GL.
Core utilities refactor
ovos_number_parser/util.py
Major redesign: Token → immutable dataclass, ReplaceableNumber dataclass, new word_tokenize, updated tokenize, typed helpers, NumberVocabulary data model, and RomanceNumberExtractor class centralizing is_ordinal/is_fractional, extract_number, numbers_to_digits, pronounce_* logic; public API surface expanded.
Tests — Asturian
tests/test_number_parser_ast.py
New tests for AST vocab, pronunciations, ordinals, fractions, extraction, numbers-to-digits, and integration (some tests skipped/TODO).
Tests — Portuguese
tests/test_number_parser_pt.py
Tests migrated to public vocab-based interfaces (PT_PT, PT_BR, .vocab.*) and extractor methods instead of private internals; updated expectations for variant and decimal behavior.
Tests — Mirandese
tests/test_number_parser_mwl.py
New comprehensive tests covering MWL vocab, pronunciations, fractions, ordinals, extraction, digits, and integration (some skipped).
Tests — Galician
tests/test_number_parser_gl.py
New comprehensive tests for GL pronunciation, fractions, extraction, digits, decimals, ordinals, and long-scale behavior.
Tests — Utilities
tests/test_util.py
Adds/updates tests for word_tokenize exposure and tokenization behaviors (punctuation, hyphens, whitespace, edge cases).

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant Package as ovos_number_parser.__init__
    participant Extractor as RomanceNumberExtractor (AST / PT_PT / PT_BR / MWL / GL)
    participant Vocab as NumberVocabulary

    Caller->>Package: call public API (e.g., pronounce_number(lang, ...))
    Package->>Extractor: select extractor by lang/variant
    Extractor->>Vocab: consult vocabulary (numbers, ordinals, fractions, scales, gender rules)
    Vocab-->>Extractor: provide mappings & join/decimal rules
    Extractor->>Extractor: run parsing/pronunciation algorithms (tokenize, extract, pronounce)
    Extractor-->>Package: return computed result
    Package-->>Caller: deliver final result
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Areas needing extra attention:
    • ovos_number_parser/util.py — NumberVocabulary, RomanceNumberExtractor logic, tokenization changes, immutability enforcement.
    • ovos_number_parser/numbers_ast.py — language rules, gender/plural/ordinal correctness.
    • ovos_number_parser/numbers_pt.py, numbers_mwl.py, numbers_gl.py — variant routing, deprecation wrappers, compatibility behavior.
    • ovos_number_parser/init.py — dispatch logic and updated function signatures (scale/short_scale handling).
    • Updated tests — ensure test expectations align with refactored public APIs and skipped/TODO markings.

"I hopped through words and rules today,
Asturian counts now find their way —
Fractions and ordinals softly sung,
Digits and genders neatly strung,
A rabbit's cheer: hop, parse, hooray! 🥕"

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 62.45% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main changes: adding Asturian language support and introducing a shared RomanceNumberExtractor class architecture used across multiple language implementations.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ast

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added feature and removed feature labels Dec 3, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ovos_number_parser/__init__.py (1)

86-121: Fix pronounce_number dispatch for Asturian to honor ordinals, scale, digit style, and gender.

The AST branch currently calls:

if lang.startswith("ast"):
    return pronounce_number_ast(number, places)

This ignores:

  • short_scale (and the derived scale enum),
  • ordinals,
  • digits,
  • gender.

As a result, pronounce_number(..., lang="ast", ordinals=True, digits=DigitPronunciation.DIGIT_BY_DIGIT, gender=GrammaticalGender.FEMININE, short_scale=False) will silently behave like a simple cardinal with LONG scale and masculine gender.

pronounce_number_ast already supports scale, ordinals, digits, and gender, so we should thread these through:

@@ def pronounce_number(...):
-    if lang.startswith("ca"):
-        return pronounce_number_ca(number, places)
-    if lang.startswith("ast"):
-        return pronounce_number_ast(number, places)
+    if lang.startswith("ca"):
+        return pronounce_number_ca(number, places)
+    if lang.startswith("ast"):
+        return pronounce_number_ast(
+            number,
+            places,
+            scale=scale,
+            ordinals=ordinals,
+            digits=digits,
+            gender=gender,
+        )

This brings the AST branch in line with PT/MWL behavior and the documented API semantics.

🧹 Nitpick comments (3)
ovos_number_parser/numbers_ast.py (2)

123-136: Simplify _swap_gender_ast feminine branch; the endswith("imu") check is redundant.

Any word ending with "imu" also ends with "u", so the second condition is unreachable. You can drop it for clarity without changing behavior.

 def _swap_gender_ast(word: str, gender: GrammaticalGender) -> str:
@@
-    if gender == GrammaticalGender.FEMININE:
-        if word.endswith('u'):
-            return word[:-1] + 'a'
-        if word.endswith('imu'):
-            return word[:-1] + 'a'
+    if gender == GrammaticalGender.FEMININE:
+        if word.endswith('u'):
+            return word[:-1] + 'a'

316-334: Address Ruff B007: rename unused loop variable sv in is_ordinal_ast.

The loop over _ORDINAL_SCALES_AST[Scale.SHORT] doesn’t use the numeric scale value, only the name. Renaming the unused variable to _ makes that intent explicit and silences B007.

-    for sv, sname in _ORDINAL_SCALES_AST[Scale.SHORT]:
-        if tok == sname or tok == _swap_gender_ast(sname, GrammaticalGender.FEMININE):
+    for _, sname in _ORDINAL_SCALES_AST[Scale.SHORT]:
+        if tok == sname or tok == _swap_gender_ast(sname, GrammaticalGender.FEMININE):
             return True
ovos_number_parser/__init__.py (1)

38-56: Propagate scale into numbers_to_digits_ast for consistency and future-proofing.

numbers_to_digits accepts a scale argument, but the Asturian branch currently ignores it and always uses the default LONG scale, even though numbers_to_digits_ast exposes a scale parameter.

While SHORT and LONG scales share the same definitions today, passing the caller’s scale keeps behavior consistent with PT/MWL and avoids surprises if Asturian scales diverge later.

     if lang.startswith("az"):
         return numbers_to_digits_az(utterance)
     if lang.startswith("ast"):
-        return numbers_to_digits_ast(utterance)
+        return numbers_to_digits_ast(utterance, scale=scale)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e450831 and e89e084.

📒 Files selected for processing (3)
  • README.md (1 hunks)
  • ovos_number_parser/__init__.py (8 hunks)
  • ovos_number_parser/numbers_ast.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
ovos_number_parser/numbers_ast.py (1)
ovos_number_parser/util.py (4)
  • Scale (12-19)
  • GrammaticalGender (22-28)
  • DigitPronunciation (31-33)
  • text (62-66)
🪛 Ruff (0.14.7)
ovos_number_parser/numbers_ast.py

150-150: Avoid specifying long messages outside the exception class

(TRY003)


189-189: Avoid specifying long messages outside the exception class

(TRY003)


230-230: Avoid specifying long messages outside the exception class

(TRY003)


331-331: Loop control variable sv not used within loop body

Rename unused sv to _sv

(B007)


480-480: Avoid specifying long messages outside the exception class

(TRY003)


517-517: Loop control variable s_name not used within loop body

(B007)


517-517: Loop control variable p_name not used within loop body

(B007)

🔇 Additional comments (6)
README.md (1)

41-41: Asturian support row is consistent with implementation.

The ast entry correctly advertises pronounce number/ordinal, extraction, and numbers_to_digits support, matching the new numbers_ast module and wiring.

ovos_number_parser/__init__.py (5)

15-16: Asturian module import wiring looks correct.

The numbers_ast import exposes all the expected entry points (numbers_to_digits, pronounce number/ordinal, extract, fraction/ordinal checks) and aligns with how other language modules are wired here.


172-195: Asturian fraction pronunciation dispatch is correctly wired.

The pronounce_fraction function cleanly adds an ast branch that delegates to pronounce_fraction_ast, mirroring the existing PT and MWL handling and respecting the scale argument.

No issues here.


197-223: Asturian ordinal pronunciation dispatch is correctly integrated.

The pronounce_ordinal branch for lang.startswith("ast") calls pronounce_ordinal_ast(number, scale=scale, gender=gender), correctly passing both scale and gender in line with PT/MWL behavior.

Once the ordinal hundreds issue in numbers_ast.py is addressed, this wiring will behave as expected.


242-295: Asturian extract_number wiring matches the new implementation.

The extract_number dispatch:

if lang.startswith("ast"):
    return extract_number_ast(text, scale=scale, ordinals=ordinals)

correctly passes both scale and ordinals by keyword to extract_number_ast(text, ordinals=False, scale=Scale.LONG). After fixing the aggregation/no-number behavior in numbers_ast.extract_number_ast, this integration should be solid.


305-353: Asturian is_fractional and is_ordinal dispatches are consistent with other languages.

  • is_fractional for ast simply calls is_fractional_ast(input_str), which is appropriate since the Asturian implementation doesn’t use scale.
  • is_ordinal for ast calls is_ordinal_ast(input_str) analogously to PT/MWL.

These branches correctly extend the public API without affecting existing language behavior.

@JarbasAl JarbasAl marked this pull request as draft December 3, 2025 00:59
@JarbasAl JarbasAl marked this pull request as ready for review December 4, 2025 04:33
@JarbasAl JarbasAl changed the title feat: asturian feat: asturian + shared RomanceNumberExtractor class Dec 4, 2025
@github-actions github-actions bot added feature and removed feature labels Dec 4, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ovos_number_parser/numbers_mwl.py (1)

237-356: Fix docstrings: references to "Portuguese" should be "Mirandese".

All wrapper function docstrings incorrectly reference "Portuguese" (and "Brazilian Portuguese" in some cases) instead of "Mirandese". This appears to be a copy-paste oversight.

Examples to fix:

  • Line 243: "Portuguese" → "Mirandese"
  • Line 251: "Portuguese" → "Mirandese"
  • Line 263: "Portuguese" → "Mirandese"
  • Line 273, 276: "Portuguese" → "Mirandese"
  • Line 287: "Portuguese" → "Mirandese"
  • Line 309: Remove "Brazilian and European Portuguese variants" reference
  • Lines 329, 331, 334: "Portuguese" → "Mirandese"
  • Lines 346, 348, 354: "Portuguese" → "Mirandese"
♻️ Duplicate comments (1)
ovos_number_parser/numbers_ast.py (1)

163-166: Ordinals for 200–900 are incomplete.

The ORDINAL_HUNDREDS only defines 100: 'centésimu', leaving ordinals like 200th–900th unsupported. This was previously flagged and the TODO comment acknowledges the gap.

🧹 Nitpick comments (6)
ovos_number_parser/util.py (1)

586-598: Minor: Rename unused loop variable.

The loop variable i on line 588 is unused. Consider using _ to indicate an intentionally unused variable.

-                    for i in range(n_zeros):
+                    for _ in range(n_zeros):
                         decimal_pronunciation_parts.append(self.vocab.UNITS[0])
tests/test_number_parser_ast.py (2)

28-32: Use hyphen-minus instead of en-dash in comment.

Line 29 uses an en-dash (–) which static analysis flags. Use a regular hyphen-minus (-) for consistency.

-        # 21–29 must exist
+        # 21-29 must exist

87-95: Multiple skipped tests indicate incomplete AST functionality.

Several tests are skipped with @unittest.skip("TODO - fix me"):

  • test_fraction_variants (line 87)
  • test_compound_fraction_phrase (line 92)
  • test_fraction_phrase (line 129)
  • test_negative (line 133)
  • test_large_numbers (line 201)
  • test_negative in integration (line 212)

These represent known gaps in the Asturian implementation.

Would you like me to open issues to track these incomplete features, or would you prefer to address them in this PR?

Also applies to: 129-136, 201-216

ovos_number_parser/numbers_pt.py (1)

330-340: is_fractional_pt and is_ordinal_pt do not accept a variant parameter.

These functions always delegate to PT_PT, while other public APIs accept a variant parameter. This is likely intentional since the vocabulary's ALT_SPELLINGS handles cross-variant recognition, but it creates an API inconsistency.

Consider documenting this design choice or adding a variant parameter for API uniformity.

ovos_number_parser/numbers_ast.py (2)

167-176: Ordinal scale coverage is limited.

The ordinal scales only define up to 10^6, with TODO comments indicating the need for larger scales. Consider completing these for full feature parity with other languages, or document the supported range limitations.

Would you like me to generate the missing ordinal scale entries for Asturian?


107-119: Fraction vocabulary has limited coverage compared to Portuguese.

Only fractions 2–12 are defined, while Portuguese includes entries up to 1000. Additionally, entries 11 and 12 already include the "avos" suffix, which may cause duplication issues similar to Portuguese if DENOMINATOR_PARTICLE is appended for plurals.

Consider expanding coverage or documenting the supported range.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e89e084 and ec9e39d.

📒 Files selected for processing (7)
  • ovos_number_parser/numbers_ast.py (1 hunks)
  • ovos_number_parser/numbers_mwl.py (5 hunks)
  • ovos_number_parser/numbers_pt.py (15 hunks)
  • ovos_number_parser/util.py (10 hunks)
  • tests/test_number_parser_ast.py (1 hunks)
  • tests/test_number_parser_pt.py (6 hunks)
  • tests/test_util.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
tests/test_util.py (1)
ovos_number_parser/util.py (1)
  • word_tokenize (115-135)
ovos_number_parser/numbers_ast.py (2)
ovos_number_parser/util.py (11)
  • Scale (38-45)
  • GrammaticalGender (48-54)
  • DigitPronunciation (57-59)
  • pronounce_ordinal (694-752)
  • is_fractional (380-395)
  • is_ordinal (369-378)
  • text (87-91)
  • extract_number (397-475)
  • pronounce_number (532-656)
  • numbers_to_digits (477-530)
  • pronounce_fraction (658-692)
ovos_number_parser/__init__.py (7)
  • pronounce_ordinal (197-239)
  • is_fractional (305-359)
  • is_ordinal (362-386)
  • extract_number (242-302)
  • pronounce_number (86-169)
  • numbers_to_digits (38-83)
  • pronounce_fraction (172-194)
tests/test_number_parser_ast.py (2)
ovos_number_parser/util.py (3)
  • Scale (38-45)
  • GrammaticalGender (48-54)
  • text (87-91)
ovos_number_parser/numbers_ast.py (6)
  • pronounce_number_ast (246-267)
  • pronounce_ordinal_ast (183-202)
  • is_fractional_ast (205-214)
  • pronounce_fraction_ast (290-302)
  • extract_number_ast (227-243)
  • numbers_to_digits_ast (270-286)
ovos_number_parser/numbers_pt.py (2)
ovos_number_parser/util.py (13)
  • Scale (38-45)
  • GrammaticalGender (48-54)
  • DigitPronunciation (57-59)
  • NumberVocabulary (264-360)
  • RomanceNumberExtractor (363-854)
  • pronounce_ordinal (694-752)
  • is_fractional (380-395)
  • is_ordinal (369-378)
  • extract_number (397-475)
  • text (87-91)
  • pronounce_number (532-656)
  • numbers_to_digits (477-530)
  • pronounce_fraction (658-692)
ovos_number_parser/__init__.py (7)
  • pronounce_ordinal (197-239)
  • is_fractional (305-359)
  • is_ordinal (362-386)
  • extract_number (242-302)
  • pronounce_number (86-169)
  • numbers_to_digits (38-83)
  • pronounce_fraction (172-194)
ovos_number_parser/util.py (1)
ovos_number_parser/__init__.py (6)
  • is_ordinal (362-386)
  • is_fractional (305-359)
  • extract_number (242-302)
  • numbers_to_digits (38-83)
  • pronounce_number (86-169)
  • pronounce_ordinal (197-239)
ovos_number_parser/numbers_mwl.py (2)
ovos_number_parser/util.py (14)
  • Scale (38-45)
  • GrammaticalGender (48-54)
  • DigitPronunciation (57-59)
  • word_tokenize (115-135)
  • NumberVocabulary (264-360)
  • RomanceNumberExtractor (363-854)
  • pronounce_ordinal (694-752)
  • is_fractional (380-395)
  • is_ordinal (369-378)
  • text (87-91)
  • extract_number (397-475)
  • pronounce_number (532-656)
  • numbers_to_digits (477-530)
  • pronounce_fraction (658-692)
ovos_number_parser/__init__.py (7)
  • pronounce_ordinal (197-239)
  • is_fractional (305-359)
  • is_ordinal (362-386)
  • extract_number (242-302)
  • pronounce_number (86-169)
  • numbers_to_digits (38-83)
  • pronounce_fraction (172-194)
🪛 Ruff (0.14.7)
tests/test_number_parser_ast.py

29-29: Comment contains ambiguous (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF003)

ovos_number_parser/util.py

231-231: Do not perform function call range in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


304-304: Unused lambda argument: gender

(ARG005)


554-554: Avoid specifying long messages outside the exception class

(TRY003)


588-588: Loop control variable i not used within loop body

Rename unused i to _i

(B007)


714-714: Avoid specifying long messages outside the exception class

(TRY003)


756-756: Unused method argument: gender

(ARG002)


771-771: Avoid specifying long messages outside the exception class

(TRY003)


822-822: Avoid specifying long messages outside the exception class

(TRY003)

ovos_number_parser/numbers_mwl.py

6-6: Redefinition of unused Scale from line 3

Remove definition: Scale

(F811)


6-6: Redefinition of unused GrammaticalGender from line 3

Remove definition: GrammaticalGender

(F811)


6-6: Redefinition of unused DigitPronunciation from line 3

Remove definition: DigitPronunciation

(F811)


8-8: Redefinition of unused Union from line 1

Remove definition: Union

(F811)

🔇 Additional comments (22)
ovos_number_parser/numbers_mwl.py (2)

11-43: Verify Mirandese linguistic rules.

The TODO comments at lines 21-22 and 38 indicate uncertainty about gender and plural transformations. Consider consulting Mirandese language documentation to confirm these rules are accurate.


46-231: LGTM!

The NumberVocabulary configuration for Mirandese is comprehensive and well-structured. The vocabulary covers units, tens, hundreds, fractions, ordinals, and scale mappings appropriately.

ovos_number_parser/util.py (5)

7-35: LGTM!

The Token dataclass properly implements tuple-like access via __iter__ and __getitem__, maintaining backward compatibility. The immutability enforcement via __setattr__ is correctly implemented.


62-112: LGTM!

The ReplaceableNumber dataclass correctly implements properties for accessing token indices and text, with proper immutability enforcement matching the Token pattern.


231-260: LGTM!

The static analysis warning (B008) about range(1, 21) as a default argument is a false positive. In Python 3, range() returns an immutable sequence object, so it's safe as a default parameter value.


263-360: LGTM!

The NumberVocabulary dataclass is well-designed with sensible defaults. The unused gender parameter in the default swap_gender lambda (line 304) is intentional - it provides a no-op default that languages can override with gender-aware implementations.


658-692: LGTM!

The fraction pronunciation logic correctly handles:

  • Division by zero with a special message
  • Known fraction words from the vocabulary with pluralization
  • Unknown denominators using the denominator particle pattern
tests/test_util.py (2)

6-6: LGTM!

The word_tokenize function is correctly added to the imports, aligning with its new public API exposure.


827-878: LGTM!

Comprehensive test coverage for word_tokenize including:

  • Basic tokenization
  • Special character handling (%, #)
  • Hyphen behavior between words vs. in numbers
  • Edge cases (empty string, whitespace, trailing hyphens)
  • Complex mixed inputs

The tests properly exercise the documented behavior from the implementation.

tests/test_number_parser_ast.py (1)

1-219: Overall: Good test structure for new AST language support.

The test file provides solid coverage for the Asturian number parser including vocabulary validation, pronunciation, extraction, ordinals, and number-to-digit conversion. The skipped tests clearly document known limitations.

tests/test_number_parser_pt.py (5)

10-10: LGTM!

The PT_PT and PT_BR wrapper imports correctly expose the vocabulary-based Portuguese variant implementations.


34-86: LGTM!

Dictionary tests properly updated to use the new vocabulary-based API (PT_PT.vocab, PT_BR.vocab) instead of private dictionaries. This aligns with the refactored implementation and provides better encapsulation.


350-375: Excellent test additions for decimal handling.

These new tests for test_significant_digits and test_leading_zeros provide valuable coverage for edge cases in decimal pronunciation:

  • Rounding at various decimal places
  • Leading zero preservation
  • Complex cases like 10.00000056

This addresses potential regressions in the refactored decimal handling logic.


377-386: Good use of skip with documentation.

The skipped test at line 377 includes a clear TODO comment asking whether the behavior should be implemented. This is a good practice for documenting design decisions that need discussion.


491-495: LGTM!

Good addition of zero-division test case to verify the DIVIDED_BY_ZERO vocabulary entry is used correctly.

ovos_number_parser/numbers_pt.py (5)

37-38: Verify the 'ma''m' masculine conversion logic.

The rule that converts words ending in 'ma' to remove the final 'a' for masculine may not be appropriate for Portuguese. Words ending in -ma (like "problema", "tema") are typically invariant and don't follow this pattern. This could produce incorrect results for certain inputs.

Please verify this transformation is intentional and add a comment explaining which word forms it targets.


50-55: Simplified pluralization is acceptable for number vocabulary.

The pluralization logic correctly handles the common -ão-ões pattern and the default -s suffix, which covers the number-related vocabulary. More complex Portuguese pluralization rules (e.g., -l-is) are not needed for the current number word set.


229-292: Good vocabulary reuse for Brazilian Portuguese variant.

The _PT_BR vocabulary correctly inherits shared properties from _PT_PT while overriding variant-specific differences (tens spellings, scale names, default scale). The cross-variant ALT_SPELLINGS enables recognition of both regional forms.


437-511: Comprehensive test coverage in __main__ block.

The test block covers pronunciation, extraction, ordinals, fractions, and edge cases including division by zero. This serves as useful development/debugging reference.


132-145: No issue found with fraction pronunciation.

The FRACTION entries for 11–19 (e.g., 'onze avos') are intentionally pre-suffixed with "avos" and already end in "s". The pluralize_pt function checks if a word ends in "s" and returns it unchanged if it does, so no duplication occurs. When pronounce_fraction pluralizes these entries for numerators other than 1, it correctly returns the same string without appending additional particles. Additionally, DENOMINATOR_PARTICLE is only appended to denominators not in the FRACTION dictionary (line 30 in util.py), so there is no risk of duplication with pre-suffixed fraction names.

ovos_number_parser/numbers_ast.py (2)

306-329: Test block provides good development coverage.

The __main__ block covers cardinal pronunciation, ordinals, extraction, and digit conversion. Since the implementation now delegates to RomanceNumberExtractor, verify that the shared extractor correctly handles cases like 'dos millones trescientos' (which should extract as 2000300).


78-87: No action needed. The number 20 is correctly mapped to 'venti' in Asturian. The _pronounce_up_to_999 method (line 774–775) explicitly checks if the number exists in the TENS dictionary and returns it directly, so pronounce_number_ast(20) correctly returns 'venti', which is the standard Asturian spelling for the standalone number 20.

@github-actions github-actions bot added feature and removed feature labels Dec 4, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
tests/test_number_parser_ast.py (1)

88-95: Fragile test assertion always passes.

The assertion on line 94 will always pass because it constructs a string that's guaranteed to contain "cuartus". This doesn't actually verify the expected output.

Apply this fix:

-        self.assertIn("cuartus", result2 if "cuartus" in result2 else "cuartus?")
+        self.assertIn("cuartus", result2)
ovos_number_parser/numbers_ast.py (3)

9-17: Unreachable condition: 'imu' check is redundant.

Line 12 checks for words ending in 'imu', but this is unreachable because line 10 already catches all words ending in 'u' (which includes 'imu').

 if gender == GrammaticalGender.FEMININE:
     if word.endswith('u'):
         return word[:-1] + 'a'
-    if word.endswith('imu'):
-        return word[:-1] + 'a'

20-28: Pluralization logic produces incorrect results.

The TODO comment indicates uncertainty, and the logic is indeed problematic:

  • Lines 24-25: Words not ending in 'u' have their last character replaced with 'os' (e.g., "mil" → "mis")
  • Lines 26-27: Only adds 's' if word ends in 'u' and doesn't already end in 's'

Asturian pluralization typically follows patterns like -u-os, -ón-ones.

 def pluralize_ast(word: str):
-    # TODO - is this accurate?
     if word.endswith("ón"):
         return word[:-2] + "ones"
-    if not word.endswith("u"):
-        return word[:-1] + "os"
-    if not word.endswith("s"):
+    if word.endswith("u"):
+        return word[:-1] + "os"
+    if not word.endswith("s"):
         return word + "s"
     return word

161-164: Ordinals 200-999 are incomplete.

ORDINAL_HUNDREDS only defines 100, missing entries for 200-900. This causes ordinals like 256 to drop the hundreds component and pronounce only the last two digits.

Consider either:

  • Adding explicit mappings for 200, 300, ..., 900
  • Adding a fallback strategy to construct ordinals from cardinals
  • Explicitly documenting the supported ordinal range
🧹 Nitpick comments (2)
tests/test_number_parser_ast.py (1)

21-21: Replace EN DASH with HYPHEN-MINUS in comment.

The comment contains an EN DASH (–) character instead of a standard HYPHEN-MINUS (-). While this doesn't affect functionality, it's better to use standard ASCII characters in code for consistency.

-        # 21–29 must exist
+        # 21-29 must exist
ovos_number_parser/numbers_mwl.py (1)

387-390: Remove unnecessary f-string prefixes.

Lines 387-390 use f-string syntax but contain no placeholders. The f prefix should be removed for clarity.

-    print(f"'bint'i un' ->", MWL.extract_number("bint'i un"))
-    print(f"'bint'i ũa' ->", MWL.extract_number("bint'i ũa"))
-    print(f"'bint'i dous' ->", MWL.extract_number("bint'i dous"))
-    print(f"'bint'i dues' ->", MWL.extract_number("bint'i dues"))
+    print("'bint'i un' ->", MWL.extract_number("bint'i un"))
+    print("'bint'i ũa' ->", MWL.extract_number("bint'i ũa"))
+    print("'bint'i dous' ->", MWL.extract_number("bint'i dous"))
+    print("'bint'i dues' ->", MWL.extract_number("bint'i dues"))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ec9e39d and 11a6e54.

📒 Files selected for processing (6)
  • ovos_number_parser/__init__.py (10 hunks)
  • ovos_number_parser/numbers_ast.py (1 hunks)
  • ovos_number_parser/numbers_mwl.py (2 hunks)
  • ovos_number_parser/numbers_pt.py (12 hunks)
  • tests/test_number_parser_ast.py (1 hunks)
  • tests/test_number_parser_pt.py (10 hunks)
🧰 Additional context used
🪛 Ruff (0.14.7)
ovos_number_parser/numbers_mwl.py

387-387: f-string without any placeholders

Remove extraneous f prefix

(F541)


388-388: f-string without any placeholders

Remove extraneous f prefix

(F541)


389-389: f-string without any placeholders

Remove extraneous f prefix

(F541)


390-390: f-string without any placeholders

Remove extraneous f prefix

(F541)

ovos_number_parser/__init__.py

53-53: Undefined name numbers_to_digits_ast

(F821)

tests/test_number_parser_ast.py

21-21: Comment contains ambiguous (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF003)

🔇 Additional comments (3)
ovos_number_parser/numbers_pt.py (1)

58-292: LGTM: Clean refactoring to vocabulary-driven approach.

The refactoring from hardcoded dictionaries to NumberVocabulary and RomanceNumberExtractor is well-executed. The clear separation between private vocabularies (_PT_PT, _PT_BR) and public extractors (PT_PT, PT_BR) follows good design patterns and maintains backward compatibility through deprecated wrappers.

tests/test_number_parser_pt.py (1)

1-616: LGTM: Comprehensive test coverage for new API.

The test suite has been thoroughly updated to exercise the public PT_PT and PT_BR interfaces, with excellent coverage of edge cases including decimal handling, leading zeros, fractions, and integration scenarios. The migration from internal structures to public vocab attributes is complete and consistent.

ovos_number_parser/numbers_mwl.py (1)

17-18: TODO comments indicate uncertain gender/plural logic.

The TODO comments on lines 17 and 34 indicate uncertainty about the correctness of gender swapping and pluralization rules for Mirandese. Consider verifying these rules against Mirandese grammar references or consulting with a native speaker to ensure correctness.

If you need assistance researching Mirandese grammar rules for gender swapping and pluralization, I can help generate a verification approach.

Also applies to: 34-34

@github-actions github-actions bot added feature and removed feature labels Dec 4, 2025
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
ovos_number_parser/util.py (1)

771-820: Unused gender parameter in _pronounce_up_to_999.

As noted in a past review, the gender parameter is declared but not used. For Romance languages, cardinal numbers like 1 and 2 have gendered forms. This appears to be intentional deferral of gender handling for cardinals.

Consider adding a TODO comment if gender support for cardinals is planned, or document why it's not needed here:

     def _pronounce_up_to_999(self,
                              n: int,
                              gender: GrammaticalGender = GrammaticalGender.MASCULINE
                              ) -> str:
+        # Note: gender parameter reserved for future use with gendered cardinal forms (um/uma, dois/duas)
🧹 Nitpick comments (7)
tests/test_number_parser_ast.py (1)

20-23: Minor: EN DASH in comment.

The comment on line 21 uses an EN DASH () instead of a standard hyphen-minus (-). This is a cosmetic issue but could cause confusion.

     def test_composite_20s(self):
-        # 21–29 must exist
+        # 21-29 must exist
         for i in range(21, 30):
             self.assertIn(i, AST.vocab.TENS)
ovos_number_parser/__init__.py (2)

180-186: Double space in line 182.

Minor formatting issue with double space before PT_PT.pronounce_fraction.

     if lang.startswith("pt"):
         return PT_BR.pronounce_fraction(fraction_word, scale=scale) if "br" in lang.lower() \
-            else  PT_PT.pronounce_fraction(fraction_word, scale=scale)
+            else PT_PT.pronounce_fraction(fraction_word, scale=scale)

210-216: Double space in line 212.

Same formatting issue with double space.

     if lang.startswith("pt"):
         return PT_BR.pronounce_ordinal(number, scale=scale, gender=gender) if "br" in lang.lower() \
-            else  PT_PT.pronounce_ordinal(number, scale=scale, gender=gender)
+            else PT_PT.pronounce_ordinal(number, scale=scale, gender=gender)
ovos_number_parser/util.py (4)

62-113: Consider using AttributeError for consistency.

ReplaceableNumber.__setattr__ raises a generic Exception while Token.__setattr__ raises AttributeError. Both should use the same exception type for consistency.

         try:
             getattr(self, key)
         except AttributeError:
             super().__setattr__(key, value)
         else:
-            raise Exception("Immutable!")
+            raise AttributeError("Immutable!")

231-260: Avoid mutable default argument with range().

Using range(1, 21) as a default argument is evaluated once at function definition time. While range objects are immutable so this won't cause mutation bugs, it's cleaner to use None and set the default inside the function.

-def convert_to_mixed_fraction(number: Union[int, float], denominators=range(1, 21)) -> Optional[Tuple[int, int, int]]:
+def convert_to_mixed_fraction(number: Union[int, float], denominators=None) -> Optional[Tuple[int, int, int]]:
     """
     Convert floats to components of a mixed fraction representation
     ...
     """
     int_number = int(number)
     if int_number == number:
         return int_number, 0, 1  # whole number, no fraction

     frac_number = abs(number - int_number)
     if not denominators:
         denominators = range(1, 21)

304-306: Unused gender parameter in default lambda.

The default swap_gender lambda accepts a gender parameter but doesn't use it. While this is intentional (as the default behavior is identity), consider adding a comment or using _ prefix.

-    swap_gender: Callable[[str, GrammaticalGender], str] = lambda word, gender: word
+    swap_gender: Callable[[str, GrammaticalGender], str] = lambda word, _gender: word

602-615: Rename unused loop variable i to _i.

The loop variable i is unused in the loop body.

-                    for i in range(n_zeros):
+                    for _ in range(n_zeros):
                         decimal_pronunciation_parts.append(self.vocab.UNITS[0])
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 11a6e54 and 2714095.

📒 Files selected for processing (4)
  • ovos_number_parser/__init__.py (11 hunks)
  • ovos_number_parser/util.py (10 hunks)
  • tests/test_number_parser_ast.py (1 hunks)
  • tests/test_number_parser_pt.py (10 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/test_number_parser_ast.py (2)
ovos_number_parser/util.py (8)
  • GrammaticalGender (48-54)
  • pronounce_number (549-673)
  • is_fractional (380-395)
  • pronounce_fraction (675-709)
  • extract_number (397-492)
  • pronounce_ordinal (711-769)
  • numbers_to_digits (494-547)
  • text (87-91)
ovos_number_parser/__init__.py (6)
  • pronounce_number (83-163)
  • is_fractional (299-353)
  • pronounce_fraction (166-188)
  • extract_number (236-296)
  • pronounce_ordinal (191-233)
  • numbers_to_digits (35-80)
ovos_number_parser/__init__.py (2)
ovos_number_parser/numbers_pt.py (1)
  • PortugueseVariant (9-16)
ovos_number_parser/util.py (6)
  • numbers_to_digits (494-547)
  • pronounce_number (549-673)
  • pronounce_fraction (675-709)
  • pronounce_ordinal (711-769)
  • extract_number (397-492)
  • text (87-91)
ovos_number_parser/util.py (1)
ovos_number_parser/__init__.py (7)
  • is_ordinal (356-380)
  • is_fractional (299-353)
  • extract_number (236-296)
  • numbers_to_digits (35-80)
  • pronounce_number (83-163)
  • pronounce_ordinal (191-233)
  • pronounce_fraction (166-188)
🪛 Ruff (0.14.7)
tests/test_number_parser_ast.py

21-21: Comment contains ambiguous (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF003)

tests/test_number_parser_pt.py

611-611: Do not catch blind exception: Exception

(BLE001)

ovos_number_parser/util.py

231-231: Do not perform function call range in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


304-304: Unused lambda argument: gender

(ARG005)


571-571: Avoid specifying long messages outside the exception class

(TRY003)


605-605: Loop control variable i not used within loop body

Rename unused i to _i

(B007)


731-731: Avoid specifying long messages outside the exception class

(TRY003)


773-773: Unused method argument: gender

(ARG002)


788-788: Avoid specifying long messages outside the exception class

(TRY003)


839-839: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (32)
tests/test_number_parser_ast.py (8)

1-5: LGTM!

The imports are clean and appropriate for the test module.


10-33: LGTM! Comprehensive vocabulary tests.

These tests properly validate the completeness of the AST vocabulary dictionaries for units, tens, hundreds, and fractions.


39-67: LGTM! Good coverage of pronunciation tests.

Tests cover zero, units, teens, composite 20s, tens with units, and hundreds with various configurations.


73-90: LGTM! Fraction tests now properly validate output.

The previous fragile assertion issue has been addressed. The tests now correctly verify both singular and plural fraction forms.


96-126: LGTM! Extraction tests cover key scenarios.

Tests include simple numbers, hundreds, thousands, millions, negatives, and the no-number case. The skipped fraction phrase test is appropriately marked with a TODO.


132-149: LGTM! Ordinal tests cover basic through composed cases.

Good coverage of basic ordinals with gender variants, teens, tens, composed ordinals, and hundredths.


155-172: LGTM! Numbers-to-digits tests are well structured.

Tests cover simple conversions, phrase context, mixed text, and the no-number fallback case.


178-202: LGTM! Integration tests provide good round-trip validation.

The round-trip test for multiple number values is valuable. Skipped large number test is appropriately marked. Decimal and negative tests ensure end-to-end correctness.

tests/test_number_parser_pt.py (7)

3-6: LGTM! Clean import migration to variant-based API.

The imports now correctly reference PT_PT and PT_BR instead of internal dictionary mappings.


27-82: LGTM! Dictionary tests properly migrated to vocab-based API.

Tests now access vocabulary through PT_PT.vocab.* and PT_BR.vocab.* paths, validating the new public API surface.


345-370: LGTM! Excellent coverage of decimal edge cases.

The new tests for significant digits rounding and leading zeros handling are thorough and validate important pronunciation behavior.


372-381: Good decision to skip uncertain behavior test.

The skipped test for decimal edge cases (1.0, 1.00, 1.000) appropriately documents uncertain requirements with a TODO comment.


486-491: LGTM! Zero division handling test.

Good addition to verify the behavior when dividing by zero in fractions.


603-612: Acceptable use of broad exception catch in test context.

While the static analyzer flags catching Exception, this is appropriate for a robustness test that verifies the function doesn't raise any exception for large numbers.


614-619: LGTM! Negative number round-trip test.

Good integration test validating negative number pronunciation and extraction consistency.

ovos_number_parser/__init__.py (7)

15-15: LGTM! Clean imports for new language extractors.

The imports properly bring in AST, PT_PT, PT_BR, and MWL for the variant-based routing.

Also applies to: 25-26


52-53: LGTM! Correctly addressed previous review comment.

The routing now correctly uses AST.numbers_to_digits(utterance) instead of the previously undefined numbers_to_digits_ast().


117-118: LGTM! AST pronunciation routing is correct.

Properly passes all required parameters to AST.pronounce_number.


143-148: LGTM! PT variant routing for pronounce_number.

Correctly distinguishes between Brazilian and European Portuguese using "br" in lang.lower().


283-289: LGTM! Extract number routing is consistent.

Properly routes to PT_BR or PT_PT based on language variant detection.


341-346: LGTM! Fractional routing added for AST.

Properly adds AST support to is_fractional dispatch.


368-374: LGTM! Ordinal routing added for AST.

Properly adds AST support to is_ordinal dispatch.

ovos_number_parser/util.py (10)

7-35: LGTM! Token dataclass with immutability support.

The Token class properly implements iteration and indexing for backwards compatibility while enforcing immutability through __setattr__.


115-136: LGTM! Clean tokenization function.

The word_tokenize function handles special cases like percentages, hashtags, and hyphenated words appropriately.


263-306: LGTM! Comprehensive vocabulary dataclass.

NumberVocabulary encapsulates all language-specific vocabulary and configuration in a clean, typed structure. The callable fields for swap_gender, pluralize, and singularize provide good extensibility.


308-360: LGTM! Well-structured vocabulary helper methods.

The get_number_strings, get_ordinal_strings, and get_fraction_strings methods properly combine masculine, feminine, and plural forms for comprehensive word recognition.


397-492: LGTM! Robust number extraction logic.

The extract_number method handles cardinals, ordinals, fractions, scales, decimals, and negatives. The tokenization and lookahead logic is sound.


494-547: LGTM! Clean numbers-to-digits implementation.

The span-based approach for identifying and replacing number words is well-designed and handles edge cases like joiners appropriately.


549-673: LGTM! Comprehensive number pronunciation.

The pronounce_number method properly handles negatives, decimals with leading zeros, integer pronunciation, and scale-based large numbers.


675-709: LGTM! Fraction pronunciation with proper pluralization.

Handles known fraction words, pluralization, and unknown denominators with the denominator particle.


711-769: LGTM! Ordinal pronunciation with gender and scale support.

Properly handles zero, negatives, and large scales with correct gender application.


822-870: LGTM! Ordinal pronunciation for 0-999 range.

Properly handles hundreds, tens, and units with gender swapping for all ordinal forms.

Comment on lines 71 to +75
if lang.startswith("pt"):
return numbers_to_digits_pt(utterance, scale=scale)
return PT_PT.numbers_to_digits(utterance, scale=scale)
if lang.startswith("mwl"):
return numbers_to_digits_mwl(utterance, scale=scale)
return MWL.numbers_to_digits(utterance, scale=scale)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check all instances of language detection for Portuguese variants
rg -n 'def (numbers_to_digits|pronounce_number|.*_to_\w+)' ovos_number_parser/__init__.py | head -20

Repository: OpenVoiceOS/ovos-number-parser

Length of output: 222


🏁 Script executed:

#!/bin/bash
# Search for all Portuguese-related dispatch logic
rg -n '(PT_PT|"pt"|"br"|lang\.)' ovos_number_parser/__init__.py -B 2 -A 2

Repository: OpenVoiceOS/ovos-number-parser

Length of output: 11500


Fix Portuguese variant handling in numbers_to_digits, is_fractional, and is_ordinal.

The dispatch logic is inconsistent across functions. pronounce_number, pronounce_fraction, pronounce_ordinal, and extract_number all check "br" in lang.lower() to route Brazilian Portuguese separately, but numbers_to_digits (line 72), is_fractional (line 341), and is_ordinal (line 368) always route to PT_PT regardless of whether the language is pt-BR or similar. This means Brazilian Portuguese users receive Portugal-specific behavior for these three functions.

🤖 Prompt for AI Agents
In ovos_number_parser/__init__.py around lines 72 to 75, the Portuguese dispatch
always uses PT_PT causing pt-BR variants to get Portugal-specific behavior;
update the routing so that if "br" is in lang.lower() it returns
PT_BR.numbers_to_digits(utterance, scale=scale) otherwise return
PT_PT.numbers_to_digits(...). Apply the same pattern to is_fractional and
is_ordinal (check "br" in lang.lower() and route to PT_BR for Brazilian
Portuguese, PT_PT otherwise) to make the dispatch consistent with
pronounce_number/pronounce_fraction/pronounce_ordinal/extract_number.

@github-actions github-actions bot added feature and removed feature labels Dec 4, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 4, 2025

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #39

coderabbitai bot added a commit that referenced this pull request Dec 4, 2025
Docstrings generation was requested by @JarbasAl.

* #38 (comment)

The following files were modified:

* `ovos_number_parser/__init__.py`
* `ovos_number_parser/numbers_ast.py`
* `ovos_number_parser/numbers_mwl.py`
* `ovos_number_parser/numbers_pt.py`
* `ovos_number_parser/util.py`
* `tests/test_number_parser_ast.py`
* `tests/test_number_parser_pt.py`
@coderabbitai coderabbitai bot mentioned this pull request Dec 4, 2025
@github-actions github-actions bot added feature and removed feature labels Dec 4, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
ovos_number_parser/numbers_ast.py (1)

38-220: Asturian NumberVocabulary wiring looks solid; consider unifying zero/cero wording.

The _AST configuration hangs together well: units/tens/hundreds cover the expected ranges, fractions and ordinal hundreds are complete, and hooking it up via AST = RomanceNumberExtractor(_AST) matches the pattern used for other Romance languages.

One tiny nit: DIVIDED_BY_ZERO="a dividir por zero" uses "zero", while UNITS[0] is "cero". If "cero" is the canonical Asturian form you want everywhere, you may want to standardize this string; otherwise feel free to leave as-is if the variation is intentional.

tests/test_number_parser_ast.py (3)

21-23: Replace EN DASH in comment with a plain ASCII hyphen to satisfy Ruff.

The comment # 21–29 must exist uses an EN DASH (), which Ruff flags as ambiguous. Switching it to a standard - (i.e., 21-29) will avoid the warning and keep the source ASCII‑clean.


96-126: Consider adding explicit zero‑extraction coverage alongside “no number” cases.

test_no_number nicely guards the False contract for inputs without numerals, but there’s no explicit test that "cero" (or any zero form) is parsed as 0 rather than False. Given past issues around “no number” behavior, it would be useful to add something like:

def test_zero_extraction(self):
    self.assertEqual(AST.extract_number("cero"), 0)

to lock in the intended semantics for zero.


132-148: Add a couple of ordinals ≥ 200 to regression‑test the hundreds logic.

Since ORDINAL_HUNDREDS was recently completed for 200–900, it would be good to add one or two tests here for larger ordinals (e.g., AST.pronounce_ordinal(256) and AST.pronounce_ordinal(900)) to ensure the hundreds component is preserved correctly and to prevent regressions in that range.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 971d773 and 69f2f64.

📒 Files selected for processing (2)
  • ovos_number_parser/numbers_ast.py (1 hunks)
  • tests/test_number_parser_ast.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
ovos_number_parser/numbers_ast.py (2)
ovos_number_parser/util.py (8)
  • Scale (38-45)
  • GrammaticalGender (48-54)
  • NumberVocabulary (264-360)
  • RomanceNumberExtractor (363-871)
  • pronounce_number (549-673)
  • pronounce_ordinal (711-769)
  • extract_number (397-492)
  • numbers_to_digits (494-547)
ovos_number_parser/__init__.py (4)
  • pronounce_number (83-163)
  • pronounce_ordinal (191-233)
  • extract_number (236-296)
  • numbers_to_digits (35-80)
🪛 Ruff (0.14.7)
tests/test_number_parser_ast.py

21-21: Comment contains ambiguous (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF003)

🔇 Additional comments (1)
tests/test_number_parser_ast.py (1)

138-140: Clarify expected 12th ordinal form vs vocab entry.

test_teens_ordinal asserts that AST.pronounce_ordinal(12) contains "décimu", but the Asturian vocab currently defines ORDINAL_TENS[12] = "docenu" in ovos_number_parser/numbers_ast.py. Depending on how RomanceNumberExtractor.pronounce_ordinal uses ORDINAL_TENS, the test and vocab may not be aligned.

Please double‑check what the correct Asturian form for “12th” should be and then either:

  • Update the vocab entry for 12 to match, or
  • Adjust the test to assert the actual canonical form (e.g., "docenu" or another agreed form).

@github-actions github-actions bot added feature and removed feature labels Dec 4, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ovos_number_parser/numbers_pt.py (1)

367-395: Avoid positional argument mismatch when delegating to pronounce_ordinal

Here you delegate to RomanceNumberExtractor.pronounce_ordinal using positional arguments:

if variant == PortugueseVariant.PT:
    return PT_PT.pronounce_ordinal(number, gender, scale)
return PT_BR.pronounce_ordinal(number, gender, scale)

Elsewhere (e.g. in ovos_number_parser/__init__.py), pronounce_ordinal is always called with scale= and gender= keywords, which strongly suggests the method signature is (number, scale=..., gender=...). In that case, these calls would swap scale and gender, yielding incorrect behavior for non-default arguments.

Switching to keyword arguments here makes this wrapper robust regardless of the exact parameter order:

-    if variant == PortugueseVariant.PT:
-        return PT_PT.pronounce_ordinal(number, gender, scale)
-    return PT_BR.pronounce_ordinal(number, gender, scale)
+    if variant == PortugueseVariant.PT:
+        return PT_PT.pronounce_ordinal(number, scale=scale, gender=gender)
+    return PT_BR.pronounce_ordinal(number, scale=scale, gender=gender)
🧹 Nitpick comments (4)
ovos_number_parser/numbers_pt.py (4)

19-47: Tighten swap_gender_pt heuristics for ...ma / ...m endings

The current rules can generate malformed forms for numerals:

  • elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'): will turn "décima" into "décim" instead of "décimo", and will fire before the more general "a" -> "o" rule.
  • elif gender == GrammaticalGender.FEMININE and word.endswith('m'): works for "um" -> "uma" but will produce incorrect forms like "bom" -> "boma" if ever used outside strict numeral vocabulary.

Given this helper is primarily for numeral words, you can safely narrow the rules and avoid the truncation:

-    elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'):
-        return word[:-1]
-    elif gender == GrammaticalGender.MASCULINE and word.endswith('a'):
+    elif gender == GrammaticalGender.MASCULINE and word.endswith('a'):
         return word[:-1] + 'o'
@@
-    elif gender == GrammaticalGender.FEMININE and word.endswith('m'):
-        return word + 'a'
+    elif gender == GrammaticalGender.FEMININE and word == "um":
+        return "uma"

This keeps the special-cases (dois/duas, um/uma) and generic o/a / os/as swaps, while avoiding truncated stems and over-broad "m" -> "ma" changes.


50-55: Minor typing nit: add explicit return annotation to pluralize_pt

For consistency with swap_gender_pt and the rest of the typed surface, consider annotating the return type:

-def pluralize_pt(word: str):
+def pluralize_pt(word: str) -> str:

Behavior stays the same; this just tightens the public API.


398-497: Deprecation shims are consistent; consider scale default semantics

The deprecated helpers (is_fractional_pt, is_ordinal_pt, extract_number_pt, pronounce_number_pt, numbers_to_digits_pt, pronounce_fraction_pt) all:

  • Emit DeprecationWarning with stacklevel=2, which is good practice.
  • Delegate straight through to PT_PT / PT_BR depending on variant.

One nuance to be aware of: these shims default scale to Scale.LONG (where present), which means they override the vocab’s own DEFAULT_SCALE (e.g., Scale.SHORT for _PT_BR). If backward-compat with the legacy API depended on “short vs long” defaults, you may want these wrappers to pass scale only when explicitly provided, letting RomanceNumberExtractor fall back to vocab.DEFAULT_SCALE. If the semantics are intentionally shifting and the functions are deprecated anyway, current behavior is acceptable.

No immediate changes required, but worth double‑checking against historical expectations.


500-574: __main__ tests: BR/PT variant usage for numbers_to_digits_pt

In the “BR” block you still rely on the default variant:

print("\n--- Testing numbers_to_digits_pt (BR) ---")
print(f"'quinhentos e cinquenta' -> '{numbers_to_digits_pt('quinhentos e cinquenta')}'")
...

But numbers_to_digits_pt defaults variant to PortugueseVariant.PT, so these examples are actually exercising the PT vocab, not BR. Since the “PT” block below correctly passes variant=PortugueseVariant.PT, you probably intended the BR block to pass PortugueseVariant.BR explicitly:

-    print(f"'quinhentos e cinquenta' -> '{numbers_to_digits_pt('quinhentos e cinquenta')}'")
-    print(f"'um milhão' -> '{numbers_to_digits_pt('um milhão')}'")
-    print(f"'dezesseis' -> '{numbers_to_digits_pt('dezesseis')}'")
-    print(f"'há duzentos e cinquenta carros' -> '{numbers_to_digits_pt('há duzentos e cinquenta carros')}'")
+    print(f"'quinhentos e cinquenta' -> '{numbers_to_digits_pt('quinhentos e cinquenta', variant=PortugueseVariant.BR)}'")
+    print(f"'um milhão' -> '{numbers_to_digits_pt('um milhão', variant=PortugueseVariant.BR)}'")
+    print(f"'dezesseis' -> '{numbers_to_digits_pt('dezesseis', variant=PortugueseVariant.BR)}'")
+    print(f"'há duzentos e cinquenta carros' -> '{numbers_to_digits_pt('há duzentos e cinquenta carros', variant=PortugueseVariant.BR)}'")

This is only in the manual test harness, but tightening it will make manual checks reflect the intended variant behavior.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 69f2f64 and 4cd17ac.

📒 Files selected for processing (1)
  • ovos_number_parser/numbers_pt.py (12 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
ovos_number_parser/numbers_pt.py (2)
ovos_number_parser/util.py (12)
  • Scale (38-45)
  • GrammaticalGender (48-54)
  • DigitPronunciation (57-59)
  • NumberVocabulary (264-360)
  • RomanceNumberExtractor (363-871)
  • pronounce_ordinal (711-769)
  • is_fractional (380-395)
  • is_ordinal (369-378)
  • extract_number (397-492)
  • text (87-91)
  • pronounce_number (549-673)
  • pronounce_fraction (675-709)
ovos_number_parser/__init__.py (6)
  • pronounce_ordinal (191-233)
  • is_fractional (299-353)
  • is_ordinal (356-380)
  • extract_number (236-296)
  • pronounce_number (83-163)
  • pronounce_fraction (166-188)
🔇 Additional comments (1)
ovos_number_parser/numbers_pt.py (1)

58-266: Vocabulary configuration and PT/PT‑BR wrappers look consistent

The _PT_PT / _PT_BR NumberVocabulary definitions and the PT_PT / PT_BR RomanceNumberExtractor wrappers are wired together cleanly:

  • Shared pieces (DIVIDED_BY_ZERO, DENOMINATOR_PARTICLE, HUNDRED_PARTICLE, NO_PREV_UNIT, NO_PLURAL, ordinal maps, etc.) are sensibly reused between variants.
  • Variant-specific pieces (TENS forms, short/long scale names, DIGIT_SPELLINGS for "meia", cross-variant ALT_SPELLINGS) are localized to the correct vocab.
  • DEFAULT_SCALE differs appropriately (LONG for PT, SHORT for BR).

No structural or obvious correctness issues stand out in these mappings given the NumberVocabulary / RomanceNumberExtractor contracts.

Also applies to: 268-358, 360-361

@github-actions github-actions bot added feature and removed feature labels Dec 4, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
ovos_number_parser/util.py (1)

777-826: Unused gender parameter may indicate missing functionality.

The gender parameter is declared but never used. For Romance languages, cardinal numbers like 1 ("um"/"uma") and 2 ("dois"/"duas") have gendered forms stored in GENDERED_SPELLINGS. Consider applying gender-specific spellings when the number matches a gendered entry.

ovos_number_parser/__init__.py (2)

73-76: Inconsistent Portuguese variant handling in numbers_to_digits.

Unlike pronounce_number, extract_number, and pronounce_fraction which check "br" in lang.lower() to route Brazilian Portuguese separately, numbers_to_digits always routes to PT_PT. For consistency, apply the same pattern.

     if lang.startswith("pt"):
-        return PT_PT.numbers_to_digits(utterance, scale=scale)
+        return PT_BR.numbers_to_digits(utterance, scale=scale) if "br" in lang.lower() \
+            else PT_PT.numbers_to_digits(utterance, scale=scale)

367-372: Inconsistent Portuguese variant handling in is_fractional and is_ordinal.

Similar to numbers_to_digits, these functions always route Portuguese to PT_PT without checking for Brazilian Portuguese variant. Apply the same pattern used in pronounce_number and extract_number for consistency.

     if lang.startswith("pt"):
-        return PT_PT.is_fractional(input_str)
+        return PT_BR.is_fractional(input_str) if "br" in lang.lower() \
+            else PT_PT.is_fractional(input_str)

And for is_ordinal:

     if lang.startswith("pt"):
-        return PT_PT.is_ordinal(input_str)
+        return PT_BR.is_ordinal(input_str) if "br" in lang.lower() \
+            else PT_PT.is_ordinal(input_str)
🧹 Nitpick comments (5)
ovos_number_parser/util.py (3)

7-35: Consider using frozen=True for dataclass immutability.

The custom __setattr__ implementation provides pseudo-immutability, but Python's dataclass decorator supports this natively with frozen=True. This also provides better IDE support and clearer intent.

-@dataclass
+@dataclass(frozen=True)
 class Token:
     word: str
     index: int

     def __iter__(self):
         yield self.word
         yield self.index

     def __getitem__(self, item):
         if item == 0:
             return self.word
         elif item == 1:
             return self.index
         raise IndexError
-
-    def __setattr__(self, key, value):
-        """
-        Prevent modification of existing attributes, allowing only new attributes to be set.
-
-        Raises:
-            Exception: If attempting to modify an attribute that already exists.
-        """
-        try:
-            getattr(self, key)
-        except AttributeError:
-            super().__setattr__(key, value)
-        else:
-            raise AttributeError("Immutable!")

93-105: Inconsistent exception type for immutability violation.

Token.__setattr__ raises AttributeError("Immutable!") while ReplaceableNumber.__setattr__ raises Exception("Immutable!"). Use consistent exception types, preferably AttributeError which is more semantically correct for attribute access violations.

         else:
-            raise Exception("Immutable!")
+            raise AttributeError("Immutable!")

609-610: Unused loop variable i.

The loop variable i is not used within the loop body. Rename it to _ to indicate it's intentionally unused.

-                    for i in range(n_zeros):
+                    for _ in range(n_zeros):
                         decimal_pronunciation_parts.append(self.vocab.UNITS[0])
ovos_number_parser/numbers_mwl.py (1)

392-395: Remove extraneous f prefix from non-interpolated strings.

These lines use f-string syntax but contain no placeholders. The print statements use comma separation for output, so the f prefix is unnecessary.

-    print(f"'bint'i un' ->", MWL.extract_number("bint'i un"))
-    print(f"'bint'i ũa' ->", MWL.extract_number("bint'i ũa"))
-    print(f"'bint'i dous' ->", MWL.extract_number("bint'i dous"))
-    print(f"'bint'i dues' ->", MWL.extract_number("bint'i dues"))
+    print("'bint'i un' ->", MWL.extract_number("bint'i un"))
+    print("'bint'i ũa' ->", MWL.extract_number("bint'i ũa"))
+    print("'bint'i dous' ->", MWL.extract_number("bint'i dous"))
+    print("'bint'i dues' ->", MWL.extract_number("bint'i dues"))
ovos_number_parser/__init__.py (1)

113-117: Consider emitting DeprecationWarning for short_scale parameter.

The short_scale parameter is marked as deprecated with a TODO to log a warning, but no warning is actually emitted. For consistent deprecation behavior (similar to the MWL/PT wrapper functions), consider adding a warning.

     if short_scale is not None:
-        # TODO log warning
-        pass
+        import warnings
+        warnings.warn(
+            "short_scale is deprecated, use scale=Scale.SHORT or scale=Scale.LONG instead",
+            DeprecationWarning,
+            stacklevel=2,
+        )
+        scale = Scale.SHORT if short_scale else Scale.LONG
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4cd17ac and 7c27d20.

📒 Files selected for processing (4)
  • ovos_number_parser/__init__.py (17 hunks)
  • ovos_number_parser/numbers_mwl.py (1 hunks)
  • ovos_number_parser/numbers_pt.py (9 hunks)
  • ovos_number_parser/util.py (10 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
ovos_number_parser/numbers_mwl.py (2)
ovos_number_parser/util.py (13)
  • Scale (38-45)
  • GrammaticalGender (48-54)
  • DigitPronunciation (57-59)
  • NumberVocabulary (264-362)
  • RomanceNumberExtractor (365-877)
  • pronounce_ordinal (716-775)
  • is_fractional (382-397)
  • is_ordinal (371-380)
  • text (87-91)
  • extract_number (399-494)
  • pronounce_number (552-677)
  • numbers_to_digits (496-550)
  • pronounce_fraction (679-714)
ovos_number_parser/__init__.py (7)
  • pronounce_ordinal (198-245)
  • is_fractional (318-379)
  • is_ordinal (382-406)
  • extract_number (248-315)
  • pronounce_number (84-170)
  • numbers_to_digits (36-81)
  • pronounce_fraction (173-195)
ovos_number_parser/util.py (1)
ovos_number_parser/__init__.py (7)
  • is_ordinal (382-406)
  • is_fractional (318-379)
  • extract_number (248-315)
  • numbers_to_digits (36-81)
  • pronounce_number (84-170)
  • pronounce_ordinal (198-245)
  • pronounce_fraction (173-195)
🪛 Ruff (0.14.7)
ovos_number_parser/numbers_mwl.py

392-392: f-string without any placeholders

Remove extraneous f prefix

(F541)


393-393: f-string without any placeholders

Remove extraneous f prefix

(F541)


394-394: f-string without any placeholders

Remove extraneous f prefix

(F541)


395-395: f-string without any placeholders

Remove extraneous f prefix

(F541)

ovos_number_parser/util.py

231-231: Do not perform function call range in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


304-304: Unused lambda argument: gender

(ARG005)


575-575: Avoid specifying long messages outside the exception class

(TRY003)


609-609: Loop control variable i not used within loop body

Rename unused i to _i

(B007)


737-737: Avoid specifying long messages outside the exception class

(TRY003)


779-779: Unused method argument: gender

(ARG002)


794-794: Avoid specifying long messages outside the exception class

(TRY003)


845-845: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (12)
ovos_number_parser/util.py (2)

231-261: LGTM with minor note on default argument.

The static analysis flags range(1, 21) in the default argument. While range is immutable (unlike lists), evaluating it at function definition time is unconventional. The existing if not denominators guard handles this correctly. No change required.


263-307: Well-structured vocabulary configuration class.

The NumberVocabulary dataclass provides a clean, declarative way to define language-specific number parsing rules. The static analysis warning about the unused gender parameter in the default swap_gender lambda (line 304) is expected—it's a no-op placeholder that language-specific implementations override.

ovos_number_parser/numbers_mwl.py (3)

17-21: TODO indicates uncertainty about gender transformation rules.

The TODO comment suggests the suffix-based gender transformation rules may need linguistic verification for Mirandese. Consider validating these rules with a Mirandese language reference or native speaker.


42-225: Well-structured Mirandese vocabulary configuration.

The NumberVocabulary definition for Mirandese is comprehensive, covering units, tens, hundreds, fractions, ordinals, and scale words for both short and long scales. The use of swap_gender_mwl and pluralize_mwl for morphological transformations aligns with the vocabulary-driven architecture.


233-340: Good deprecation pattern for backward compatibility.

The deprecated wrapper functions properly emit DeprecationWarning with stacklevel=2 (pointing to the caller) and delegate to the new MWL instance methods. This maintains backward compatibility while guiding users toward the new API.

ovos_number_parser/__init__.py (2)

1-33: Clean import structure for new language support.

The imports are well-organized, bringing in the new AST, MWL, PT_PT, and PT_BR extractors alongside the existing language-specific modules.


53-54: AST (Asturian) language routing follows established patterns.

The new Asturian language support is correctly integrated across all dispatch functions (numbers_to_digits, pronounce_number, pronounce_fraction, pronounce_ordinal, extract_number, is_fractional, is_ordinal), following the same delegation pattern used for MWL and PT variants.

Also applies to: 124-125

ovos_number_parser/numbers_pt.py (5)

19-47: Well-implemented gender transformation for Portuguese.

The function handles the common irregular case (dois/duas) explicitly and applies general suffix rules for other words. The pattern is consistent with the Mirandese implementation.


58-266: Comprehensive European Portuguese vocabulary configuration.

The _PT_PT vocabulary is well-structured with appropriate defaults (DEFAULT_SCALE=Scale.LONG for European convention). The Wikipedia reference in the comment provides good documentation provenance.


268-358: Clean Brazilian Portuguese variant with appropriate overrides.

The _PT_BR vocabulary correctly inherits from _PT_PT and overrides only the differences: spelling variations (dezesseis vs dezasseis), scale word spellings (-lhão vs -lião), and DEFAULT_SCALE=Scale.SHORT (reflecting Brazilian usage). The DIGIT_SPELLINGS entry for "meia" (colloquial for 6) is a nice regional touch.


363-507: Proper deprecation wrappers maintain backward compatibility.

The deprecated functions correctly emit DeprecationWarning with stacklevel=2 and delegate to the appropriate PT_PT or PT_BR extractor based on the variant parameter. This allows existing code to continue working while guiding users toward the new API.


575-584: Good addition of edge case tests.

The test block now includes edge cases like 1/0 and 0/0 for fraction pronunciation, which helps verify the DIVIDED_BY_ZERO handling works correctly.

Comment on lines +461 to +471
for scale_val, w in scales_map.items():
if token == w or token == self.vocab.pluralize(w):
current_number = current_number or 0
result = result or 0
if current_number is None:
current_number = 1
result += current_number * scale_val
current_number = None
found_scale = True
number_consumed = True
break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Dead code: if current_number is None is unreachable.

On line 463, current_number = current_number or 0 ensures current_number is never None at this point. Therefore, the condition if current_number is None on line 465 can never be True, making current_number = 1 unreachable.

If the intent is to default to 1 when no preceding number is specified (e.g., "million" → 1,000,000), the logic should be:

             for scale_val, w  in scales_map.items():
                 if token == w or token == self.vocab.pluralize(w):
-                    current_number = current_number or 0
-                    result = result or 0
                     if current_number is None:
                         current_number = 1
+                    result = result or 0
                     result += current_number * scale_val
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for scale_val, w in scales_map.items():
if token == w or token == self.vocab.pluralize(w):
current_number = current_number or 0
result = result or 0
if current_number is None:
current_number = 1
result += current_number * scale_val
current_number = None
found_scale = True
number_consumed = True
break
for scale_val, w in scales_map.items():
if token == w or token == self.vocab.pluralize(w):
if current_number is None:
current_number = 1
result = result or 0
result += current_number * scale_val
current_number = None
found_scale = True
number_consumed = True
break
🤖 Prompt for AI Agents
In ovos_number_parser/util.py around lines 461-471, the code sets current_number
= current_number or 0 which makes the subsequent check if current_number is None
unreachable; to fix, stop forcing current_number to 0 before the scale check
(remove or move the "current_number = current_number or 0" assignment) and
instead only ensure result is initialized (result = result or 0), then keep the
if current_number is None branch to set current_number = 1 when a scale word
appears without a preceding number; ensure number_consumed/found_scale handling
remains the same.

@github-actions github-actions bot added feature and removed feature labels Dec 4, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
tests/test_number_parser_mwl.py (2)

28-33: Consider removing or enabling the commented-out composite number tests.

The commented-out assertions for composite numbers (21, 35, 99) suggest these were originally intended but not completed. Since composite numbers are formed dynamically by RomanceNumberExtractor rather than stored in TENS, these tests may not be applicable to the vocabulary dictionary. Consider removing the dead code to avoid confusion.


127-135: Clean up commented-out ordinal assertions.

The commented-out tests for décimo (10th ordinal) with the @unittest.skip decorator syntax in comments are confusing. Either enable them with proper @unittest.skip decorators or remove them. The current mixed approach (decorator syntax inside comments) doesn't serve any purpose.

     def test_basic_ordinals_masculine(self):
         self.assertEqual(MWL.pronounce_ordinal(1), "purmerio")
         self.assertEqual(MWL.pronounce_ordinal(2), "segundo")
         self.assertEqual(MWL.pronounce_ordinal(3), "terceiro")
-        # @unittest.skip("TODO - fix me")
-        #self.assertEqual(MWL.pronounce_ordinal(10), "décimo")

     def test_basic_ordinals_feminine(self):
         self.assertEqual(MWL.pronounce_ordinal(1, GrammaticalGender.FEMININE), "purmeria")
         self.assertEqual(MWL.pronounce_ordinal(2, GrammaticalGender.FEMININE), "segunda")
         self.assertEqual(MWL.pronounce_ordinal(3, GrammaticalGender.FEMININE), "terceira")
-        #@unittest.skip("TODO - fix me")
-        #self.assertEqual(MWL.pronounce_ordinal(10, GrammaticalGender.FEMININE), "décima")
ovos_number_parser/numbers_mwl.py (1)

392-395: Remove extraneous f-string prefixes.

These print statements use f-string prefixes but the values are passed via comma rather than interpolation, making the f prefix unnecessary and potentially confusing.

-    print(f"'bint'i un' ->", MWL.extract_number("bint'i un"))
-    print(f"'bint'i ũa' ->", MWL.extract_number("bint'i ũa"))
-    print(f"'bint'i dous' ->", MWL.extract_number("bint'i dous"))
-    print(f"'bint'i dues' ->", MWL.extract_number("bint'i dues"))
+    print("'bint'i un' ->", MWL.extract_number("bint'i un"))
+    print("'bint'i ũa' ->", MWL.extract_number("bint'i ũa"))
+    print("'bint'i dous' ->", MWL.extract_number("bint'i dous"))
+    print("'bint'i dues' ->", MWL.extract_number("bint'i dues"))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7c27d20 and abb45b3.

📒 Files selected for processing (2)
  • ovos_number_parser/numbers_mwl.py (1 hunks)
  • tests/test_number_parser_mwl.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/test_number_parser_mwl.py (1)
ovos_number_parser/util.py (2)
  • GrammaticalGender (48-54)
  • text (87-91)
ovos_number_parser/numbers_mwl.py (2)
ovos_number_parser/util.py (13)
  • Scale (38-45)
  • GrammaticalGender (48-54)
  • DigitPronunciation (57-59)
  • NumberVocabulary (264-362)
  • RomanceNumberExtractor (365-877)
  • pronounce_ordinal (716-775)
  • is_fractional (382-397)
  • is_ordinal (371-380)
  • text (87-91)
  • extract_number (399-494)
  • pronounce_number (552-677)
  • numbers_to_digits (496-550)
  • pronounce_fraction (679-714)
ovos_number_parser/__init__.py (7)
  • pronounce_ordinal (198-245)
  • is_fractional (318-379)
  • is_ordinal (382-406)
  • extract_number (248-315)
  • pronounce_number (84-170)
  • numbers_to_digits (36-81)
  • pronounce_fraction (173-195)
🪛 Ruff (0.14.7)
ovos_number_parser/numbers_mwl.py

392-392: f-string without any placeholders

Remove extraneous f prefix

(F541)


393-393: f-string without any placeholders

Remove extraneous f prefix

(F541)


394-394: f-string without any placeholders

Remove extraneous f prefix

(F541)


395-395: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (10)
tests/test_number_parser_mwl.py (5)

1-5: LGTM! Clean imports and appropriate test dependencies.

The imports are minimal and correctly target the MWL extractor and GrammaticalGender enum needed for the tests.


53-114: Good test coverage for cardinal and fraction pronunciation.

The tests comprehensively cover basic cardinals, hundreds, thousands, large numbers, decimals, negatives, and fractions. The skipped gendered cardinals test with @unittest.skip("TODO - fix me") appropriately marks known limitations for future work.


154-186: LGTM! Comprehensive extraction tests with good edge case coverage.

The extraction tests appropriately cover cardinals (including gendered variants like ũa/dues), decimals (including negatives), ordinals (both genders), phrase extraction, and failure cases. The assertFalse usage for unrecognized input is correct.


192-209: LGTM! Basic digits conversion tests with appropriate skip markers.

The basic conversion tests cover essential cases. The skipped decimal and negative conversion tests are appropriately marked for future implementation.


226-240: Verify the expected pronunciation for the large number test.

The expected text at line 231 for 1_234_567_899 may have an inconsistency. The pronunciation "mil duzientos i trinta i quatro milhones quinhentos i sessenta i siete mil uitocientos i nobenta i nuobe" appears to describe a number structure that may not match exactly. In long scale, 1,234,567,899 would be interpreted differently than in short scale. Ensure the expected output aligns with the Scale.LONG default of MWL.

ovos_number_parser/numbers_mwl.py (5)

1-4: LGTM! Imports are clean and consolidated.

The duplicate import issue from the previous review has been addressed.


33-39: LGTM! Basic pluralization logic with acknowledged TODO.

The pluralization rules handle -on-ones and default -s suffixing. The TODO comment appropriately flags uncertainty for future refinement.


42-225: Well-structured vocabulary configuration for Mirandese.

The _MWL vocabulary is comprehensive, covering units, tens, hundreds, fractions, ordinals, and scale-specific number words. The configuration correctly sets DEFAULT_SCALE=Scale.LONG for Mirandese and includes alternative spellings for extraction flexibility.


227-227: LGTM! Clean extractor instantiation.

The MWL object is correctly instantiated as a RomanceNumberExtractor with the _MWL vocabulary configuration.


229-340: LGTM! Well-implemented deprecation wrappers.

The deprecated functions correctly:

  1. Emit DeprecationWarning with stacklevel=2 to point to the caller
  2. Provide clear migration guidance
  3. Fall back to MWL.vocab.DEFAULT_SCALE when scale is None
  4. Delegate to the new MWL.* methods

Comment on lines +20 to +21
elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'):
return word[:-1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Potential bug in masculine gender swap for words ending in -ma.

The logic for converting feminine -ma endings to masculine removes only the final character (word[:-1]), resulting in a word ending in m rather than the expected -o ending. For example, "décima" would become "décim" instead of "décimo".

Apply this diff to fix the gender swap:

     elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'):
-        return word[:-1]
+        return word[:-2] + 'o'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'):
return word[:-1]
elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'):
return word[:-2] + 'o'
🤖 Prompt for AI Agents
In ovos_number_parser/numbers_mwl.py around lines 20-21, the masculine gender
branch for words ending in 'ma' currently strips only the final character
(word[:-1]) which produces a trailing 'm' (e.g., "décima" -> "décim"); change
the transformation to replace the final 'a' with 'o' (for example return
word[:-1] + 'o') so "décima" becomes "décimo".

@github-actions github-actions bot added feature and removed feature labels Dec 4, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

♻️ Duplicate comments (1)
ovos_number_parser/util.py (1)

433-498: Scale words mis-parsed; exact-scale phrases and large numbers are wrong.

There are several coupled issues in extract_number:

  1. Scale words are handled as plain cardinals before the scale logic.
    Because numbers_map includes entries for scale words (e.g., "millón", "mil"), tokens like "millón" or "mil" hit the val is not None branch and are added directly to current_number, so "un millón" becomes 1 000 001 instead of 1 000 000, and "dous millóns cincocentos" doesn’t aggregate as expected.

  2. The current_number is None branch is unreachable.
    Inside the scale loop, you do:

    current_number = current_number or 0
    ...
    if current_number is None:
        current_number = 1

    After current_number = current_number or 0, current_number is never None, so that if never fires. Bare scale words like "millón" can’t default to 1.

  3. Trailing cardinals after a scale are dropped once number_consumed is set.
    After hitting a scale, number_consumed is set to True, and the final fold:

    if not number_consumed and current_number:
        result = result or 0
        result += current_number

    will skip any remaining current_number. This is the classic “scaled part + leftover hundreds” problem.

Given how central this is to GL and AST (and other Romance languages), this is serious for correctness.

A more robust structure would be:

  • Detect scale words before val is not None handling.
  • Allow current_number to be None to mean “implicit 1”.
  • Always fold any leftover current_number into result at the end if any numeric content was seen.

Sketch (condensed for clarity):

-        val = numbers_map.get(token)
-        if val is not None:
+        val = numbers_map.get(token)
+
+        # 1) Handle scale words first
+        scale_val = None
+        for s_val, w in scales_map.items():
+            if token == w or token == self.vocab.pluralize(w):
+                scale_val = s_val
+                break
+
+        if scale_val is not None:
+            # Implicit "one million" when no prefix
+            if current_number is None:
+                current_number = 1
+            result = (result or 0) + current_number * scale_val
+            current_number = None
+            number_consumed = True
+
+        # 2) Plain cardinal tokens
+        elif val is not None:
             current_number = current_number or 0
             result = result or 0
             if prev_token in self.vocab.NEGATIVE_SIGN:
                 is_negative = True
@@
-            else:
-                # Handle large scales like milhão, bilhão
-                found_scale = False
-                for scale_val, w  in scales_map.items():
-                    if token == w or token == self.vocab.pluralize(w):
-                        current_number = current_number or 0
-                        result = result or 0
-                        if current_number is None:
-                            current_number = 1
-                        result += current_number * scale_val
-                        current_number = None
-                        found_scale = True
-                        number_consumed = True
-                        break
-
-                # Handle decimal numbers
-                if not found_scale:
+            else:
+                # Handle decimal numbers

And at the end:

-        if not number_consumed and current_number:
-            result = result or 0
-            result += current_number
+        if current_number is not None:
+            # Fold any trailing cardinal/ordinal component
+            result = (result or 0) + current_number
+            number_consumed = True

Combined with the fraction fix, this should give sensible results for phrases like "un millón", "dous millóns cincocentos", and "tres cuartos".

🧹 Nitpick comments (5)
ovos_number_parser/util.py (3)

231-261: Avoid callable default range(...) in function signature.

convert_to_mixed_fraction uses denominators=range(1, 21) as a default. Linters flag this (B008) because defaults are evaluated at import time; while range is immutable and safe here, it’s still better to follow the pattern.

Consider:

-def convert_to_mixed_fraction(number: Union[int, float], denominators=range(1, 21)) -> Optional[Tuple[int, int, int]]:
+def convert_to_mixed_fraction(number: Union[int, float], denominators=None) -> Optional[Tuple[int, int, int]]:
@@
-    if not denominators:
-        denominators = range(1, 21)
+    if denominators is None:
+        denominators = range(1, 21)

304-306: Unused gender argument in default swap_gender lambda.

The default swap_gender ignores its gender parameter, which triggers ARG005 and can confuse readers.

If you want a truly no-op default but still keep the signature, you can either:

-    swap_gender: Callable[[str, GrammaticalGender], str] = lambda word, gender: word
+    swap_gender: Callable[[str, GrammaticalGender], str] = staticmethod(lambda word, _gender: word)

or define a small named function above and use _gender in its parameter list.


610-611: Rename unused loop variable to _ for clarity.

In the decimal pronunciation loop:

for i in range(n_zeros):
    decimal_pronunciation_parts.append(self.vocab.UNITS[0])

i is unused and triggers B007. Renaming it to _ better conveys intent and quiets linters:

-                    for i in range(n_zeros):
+                    for _ in range(n_zeros):
                         decimal_pronunciation_parts.append(self.vocab.UNITS[0])
ovos_number_parser/numbers_mwl.py (1)

401-404: __main__ demo: f-strings without placeholders.

Lines like:

print(f"'bint'i un' ->", MWL.extract_number("bint'i un"))

don’t interpolate anything; linters (F541) suggest dropping the f prefix:

-print(f"'bint'i un' ->", MWL.extract_number("bint'i un"))
+print("'bint'i un' ->", MWL.extract_number("bint'i un"))

Same for the following similar prints.

ovos_number_parser/numbers_gl.py (1)

104-114: Hundreds vocabulary needs feminine forms once _pronounce_up_to_999 uses swap_gender.

With the proposed change in _pronounce_up_to_999 (wrapping exact hundreds with swap_gender), HUNDREDS will be interpreted as the masculine base. For GL:

HUNDREDS={
    ...
}

and GENDERED_SPELLINGS only overrides units 1 and 2.

After _pronounce_up_to_999 starts calling swap_gender for exact hundreds, you should ensure swap_gender_gl (for single words) correctly turns "douscentos" into "douscentas", etc. If that’s hard to express generically, you can also add explicit entries to GENDERED_SPELLINGS[GrammaticalGender.FEMININE] for 200, 300, … 900 so they bypass the generic path entirely.

As is, tests such as:

GL.pronounce_number(200, gender=GrammaticalGender.FEMININE)

will continue to return "douscentos" until either swap_gender_gl or GENDERED_SPELLINGS encode the correct feminine hundreds.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between abb45b3 and c82b4ec.

📒 Files selected for processing (7)
  • ovos_number_parser/numbers_ast.py (1 hunks)
  • ovos_number_parser/numbers_gl.py (1 hunks)
  • ovos_number_parser/numbers_mwl.py (1 hunks)
  • ovos_number_parser/util.py (10 hunks)
  • tests/test_number_parser_ast.py (1 hunks)
  • tests/test_number_parser_gl.py (1 hunks)
  • tests/test_number_parser_mwl.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_number_parser_mwl.py
🧰 Additional context used
🧬 Code graph analysis (3)
ovos_number_parser/util.py (1)
ovos_number_parser/__init__.py (6)
  • is_ordinal (382-406)
  • is_fractional (318-379)
  • extract_number (248-315)
  • numbers_to_digits (36-81)
  • pronounce_number (84-170)
  • pronounce_ordinal (198-245)
ovos_number_parser/numbers_mwl.py (2)
ovos_number_parser/util.py (13)
  • Scale (38-45)
  • GrammaticalGender (48-54)
  • DigitPronunciation (57-59)
  • NumberVocabulary (264-366)
  • RomanceNumberExtractor (369-879)
  • pronounce_ordinal (716-775)
  • is_fractional (386-401)
  • is_ordinal (375-384)
  • text (87-91)
  • extract_number (403-498)
  • pronounce_number (556-677)
  • numbers_to_digits (500-554)
  • pronounce_fraction (679-714)
ovos_number_parser/__init__.py (7)
  • pronounce_ordinal (198-245)
  • is_fractional (318-379)
  • is_ordinal (382-406)
  • extract_number (248-315)
  • pronounce_number (84-170)
  • numbers_to_digits (36-81)
  • pronounce_fraction (173-195)
tests/test_number_parser_ast.py (2)
ovos_number_parser/util.py (2)
  • GrammaticalGender (48-54)
  • pronounce_number (556-677)
ovos_number_parser/__init__.py (1)
  • pronounce_number (84-170)
🪛 GitHub Actions: Run Unit Tests
tests/test_number_parser_gl.py

[error] 23-23: GL composite numbers gender feminine: expected 'vinte e dúas' but got 'vinte e dous'.


[error] 34-34: GL hundreds: feminine form 'douscentas' expected, but got 'douscentos'.


[error] 55-55: GL long scale: expected 'mil millóns' but got 'mil mil mil'.


[error] 47-47: GL millions: expected 'un millón' but got 'mil mil'.


[error] 80-80: GL ordinals: expected 'millonésimo' for 1,000,000 but got 'mil milésimo'.

tests/test_number_parser_ast.py

[error] 117-117: AST fraction extraction failed: expected 0.75 but got 3.25.


[error] 189-189: Integration: large numbers extraction mismatch: 1234567 vs 1004797.

🪛 Ruff (0.14.7)
ovos_number_parser/util.py

231-231: Do not perform function call range in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


304-304: Unused lambda argument: gender

(ARG005)


579-579: Avoid specifying long messages outside the exception class

(TRY003)


610-610: Loop control variable i not used within loop body

Rename unused i to _i

(B007)


737-737: Avoid specifying long messages outside the exception class

(TRY003)


794-794: Avoid specifying long messages outside the exception class

(TRY003)


847-847: Avoid specifying long messages outside the exception class

(TRY003)

ovos_number_parser/numbers_mwl.py

401-401: f-string without any placeholders

Remove extraneous f prefix

(F541)


402-402: f-string without any placeholders

Remove extraneous f prefix

(F541)


403-403: f-string without any placeholders

Remove extraneous f prefix

(F541)


404-404: f-string without any placeholders

Remove extraneous f prefix

(F541)

tests/test_number_parser_ast.py

21-21: Comment contains ambiguous (EN DASH). Did you mean - (HYPHEN-MINUS)?

(RUF003)

🔇 Additional comments (7)
tests/test_number_parser_gl.py (2)

19-82: GL tests clearly expose current backend bugs; keep them as-is.

The failing expectations here:

  • "vinte e dúas" vs "vinte e dous" (Line 23),
  • "douscentas" vs "douscentos" (Line 34),
  • "un millón" / "mil millóns" / "millonésimo" vs various "mil …" outputs,

are consistent with the issues in RomanceNumberExtractor (scale selection and lack of gender handling for exact hundreds) and swap_gender_gl. The test data looks correct and should be treated as the source of truth; the fixes belong in ovos_number_parser/util.py and ovos_number_parser/numbers_gl.py, not in these tests.


116-147: Extraction and numbers-to-digits tests for GL align with intended semantics.

The extraction expectations for:

  • "dous millóns cincocentos"2_000_500,
  • "mil vinte e tres"1023,
  • "trinta e cinco coma catro"35.4,

and numbers_to_digits phrases (e.g., 'atopamos dous millóns cincocentos insectos' -> 'atopamos 2000500 insectos') are exactly what the generic extractor is supposed to support. Current failures in these areas are due to extractor logic, not the tests.

No changes requested here; once the extract_number scale/fraction fixes are applied, these tests should start passing.

ovos_number_parser/numbers_ast.py (1)

4-221: AST vocabulary and helpers look consistent; main issues are in shared extractor.

The Asturian-specific pieces here—swap_gender_ast, pluralize_ast, _AST’s UNITS/TENS/HUNDREDS/FRACTION/ORDINAL maps, and the AST = RomanceNumberExtractor(_AST) wiring—are coherent and match what the AST tests exercise.

The pipeline failures (fraction extraction of "tres cuartos" and large-number integration) stem from the shared RomanceNumberExtractor implementation (scale and fraction handling), which is already flagged in util.py. No changes needed in this file once the shared logic is fixed.

tests/test_number_parser_ast.py (2)

73-90: Fraction extraction test correctly exposes “integer + fraction” vs “numerator/denominator” bug.

TestFractionsAst.test_fraction_pronounce and TestExtractAST.test_fraction_phrase expect:

  • AST.is_fractional("mediu")0.5,
  • AST.extract_number("tres cuartos")0.75.

Current implementation returns 3.25 for the latter because it always does current_number + fraction. This test is doing the right thing by enforcing 3/4 semantics for “tres cuartos”; the fix belongs in RomanceNumberExtractor.extract_number (see earlier comment), not in the test.


179-189: Large-number round-trip test is valid and highlights scale-selection bug.

TestIntegrationAST.test_large_numbers expects AST.extract_number(AST.pronounce_number(1_234_567)) == 1_234_567. The current mismatch (e.g., 1_004_797) is consistent with the shared pronounce_number choosing the smallest scale instead of the largest plus extract_number’s imperfect scale handling.

The test is appropriate; fixes should be applied in RomanceNumberExtractor.pronounce_number and extract_number as already outlined.

ovos_number_parser/numbers_gl.py (2)

322-429: Deprecated GL wrappers correctly delegate but share the same scale/gender issues.

The deprecated functions (pronounce_ordinal_gl, is_fractional_gl, is_ordinal_gl, extract_number_gl, pronounce_number_gl, numbers_to_digits_gl, pronounce_fraction_gl) are thin adapters over GL with a deprecation warning, which is good.

Just be aware:

  • They inherit all behavior from RomanceNumberExtractor and _GL, including the scale and gender bugs already discussed.
  • Once you fix the shared logic in util.py and the GL-specific swap_gender_gl / vocab, these wrappers should start passing old tests too.

No structural changes recommended here beyond those shared fixes.


432-542: __main__ GL demo matches intended semantics but will currently show wrong outputs until core fixes land.

The examples printed here (e.g.,

  • 1_234_567"un millón douscentos trinta e catro mil cincocentos sesenta e sete",
  • 1_000_000_000"mil millóns",
  • ordinals like 1_000_000"millonésimo",
  • "dous millóns cincocentos" extraction → 2_000_500,

) align with the expectations encoded in tests/test_number_parser_gl.py.

Right now, users running this script will see the same mispronunciations/extraction errors flagged in the tests (due to the shared scale/gender issues). Once RomanceNumberExtractor and swap_gender_gl are corrected, this demo script should output the intended strings; there’s nothing to change here structurally.

Comment on lines +11 to +39
def swap_gender_gl(word: str, gender: GrammaticalGender) -> str:
"""
This function takes the given text and checks if it is a fraction.
Convert a Portuguese word between masculine and feminine grammatical gender by adjusting its ending.
Args:
text (str): the string to check if fractional
Parameters:
word (str): The word to convert.
gender (GrammaticalGender): The target grammatical gender.
short_scale (bool): use short scale if True, long scale if False
Returns:
(bool) or (float): False if not a fraction, otherwise the fraction
str: The word with its ending swapped to match the specified gender, if applicable; otherwise, the original word.
"""
if input_str.endswith('s', -1):
input_str = input_str[:len(input_str) - 1] # e.g. "fifths"

aFrac = {"medio": 2, "media": 2, "terzo": 3, "cuarto": 4,
"cuarta": 4, "quinto": 5, "quinta": 5, "sexto": 6, "sexta": 6,
"séptimo": 7, "séptima": 7, "oitavo": 8, "oitava": 8,
"noveno": 9, "novena": 9, "décimo": 10, "décima": 10,
"onceavo": 11, "onceava": 11, "doceavo": 12, "doceava": 12}

if input_str.lower() in aFrac:
return 1.0 / aFrac[input_str]
if (input_str == "vixésimo" or input_str == "vixésima"):
return 1.0 / 20
if (input_str == "trixésimo" or input_str == "trixésima"):
return 1.0 / 30
if (input_str == "centésimo" or input_str == "centésima"):
return 1.0 / 100
if (input_str == "milésimo" or input_str == "milésima"):
return 1.0 / 1000
return False


def extract_number_gl(text, short_scale=True, ordinals=False):
if word == "dois" and gender == GrammaticalGender.FEMININE:
return "duas"
elif word == "duas" and gender == GrammaticalGender.MASCULINE:
return "dois"

elif gender == GrammaticalGender.FEMININE and word.endswith('o'):
return word[:-1] + 'a'
elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'):
return word[:-1]
elif gender == GrammaticalGender.MASCULINE and word.endswith('a'):
return word[:-1] + 'o'
elif gender == GrammaticalGender.FEMININE and word.endswith('os'):
return word[:-2] + 'as'
elif gender == GrammaticalGender.MASCULINE and word.endswith('as'):
return word[:-2] + 'os'
elif gender == GrammaticalGender.FEMININE and word.endswith('m'):
return word + 'a'
return word
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

swap_gender_gl is still Portuguese-flavored and doesn’t handle Galician composites.

A few points here:

  • The docstring still says “Convert a Portuguese word…”, but this module is Galician-specific.
  • Special-casing "dois"/"duas" is Portuguese; Galician UNITS use "dous"/"dúas", so that block never triggers.
  • swap_gender_gl operates on the entire phrase ("vinte e dous"), but only checks suffixes like 'o', 'os', 'a', 'as', 'm'. For "vinte e dous" (ending in 'us'), none of the conditions fire, so feminine GL.pronounce_number(22, gender=FEMININE) stays "vinte e dous" instead of "vinte e dúas" as the tests expect.

Given the shared _pronounce_up_to_999 already builds tens+unit phrases and then calls swap_gender, a pragmatic fix here is:

  • Make swap_gender_gl recognise "dous"/"dúas".
  • When given a multi-word phrase, apply gender swap only to the last token, not the whole string.

For example:

def swap_gender_gl(word: str, gender: GrammaticalGender) -> str:
-    if word == "dois" and gender == GrammaticalGender.FEMININE:
-        return "duas"
-    elif word == "duas" and gender == GrammaticalGender.MASCULINE:
-        return "dois"
+    # Handle multi-word phrases by swapping only the last token
+    if " " in word:
+        parts = word.split()
+        parts[-1] = swap_gender_gl(parts[-1], gender)
+        return " ".join(parts)
+
+    if word == "dous" and gender == GrammaticalGender.FEMININE:
+        return "dúas"
+    elif word == "dúas" and gender == GrammaticalGender.MASCULINE:
+        return "dous"
@@
-    elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'):
-        return word[:-1]
+    # …keep or adjust the generic suffix rules as appropriate for GL…

and update the docstring to say “Galician” instead of “Portuguese”.

Combined with the _pronounce_up_to_999 hundreds fix in util.py, this should give you the expected "vinte e dúas" and other gendered forms.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In ovos_number_parser/numbers_gl.py around lines 11 to 39, the function
swap_gender_gl still refers to Portuguese in the docstring and special-cases
"dois"/"duas", and it tries to swap gender on the whole phrase instead of only
the final token; this misses Galician forms like "dous"/"dúas" and multi-word
phrases like "vinte e dous". Update the docstring to say "Galician", add
recognition for Galician unit pair "dous" -> "dúas" (and the reverse), and
change the logic so when the input contains multiple words you split by
whitespace, apply the gender-swapping rules only to the last token (handling
endings 'o','a','os','as','m' and the new "dous"/"dúas" case), then rejoin and
return the full phrase.

Comment on lines +7 to +21
def swap_gender_mwl(word: str, gender: GrammaticalGender) -> str:
if word.endswith("un") and gender == GrammaticalGender.FEMININE:
return word[:-2] + "ũa"
elif word.endswith("ũa") and gender == GrammaticalGender.MASCULINE:
return word[:-2] + "ũa"
elif word == "dous" and gender == GrammaticalGender.FEMININE:
return "dues"
elif word == "dues" and gender == GrammaticalGender.MASCULINE:
return "dous"

# TODO - is this correct?
elif gender == GrammaticalGender.FEMININE and word.endswith('o'):
return word[:-1] + 'a'
elif gender == GrammaticalGender.MASCULINE and word.endswith('ma'):
return word[:-1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Masculine conversion for words ending in ũa looks wrong.

Current logic:

if word.endswith("un") and gender == GrammaticalGender.FEMININE:
    return word[:-2] + "ũa"
elif word.endswith("ũa") and gender == GrammaticalGender.MASCULINE:
    return word[:-2] + "ũa"

The second branch effectively returns the same "…ũa" ending for masculine by chopping two chars and re‑adding "ũa". For example, "dũa""dũa". That doesn’t actually switch gender and is likely a copy‑paste mistake.

You probably meant to restore the masculine "un" form:

-    elif word.endswith("ũa") and gender == GrammaticalGender.MASCULINE:
-        return word[:-2] + "ũa"
+    elif word.endswith("ũa") and gender == GrammaticalGender.MASCULINE:
+        return word[:-2] + "un"

Please double-check with MWL examples, but as written this branch can’t change the word’s gender.

Comment on lines +375 to +401
def is_ordinal(self, input_str: str, scale: Optional[Scale] = None) -> bool:
"""
Determine if a string is a Portuguese ordinal word.
Returns:
bool: True if the input string is recognized as a Portuguese ordinal, otherwise False.
"""
scale = scale or self.vocab.DEFAULT_SCALE
ordinals_map = self.vocab.get_ordinal_strings(scale)
return input_str in ordinals_map

def is_fractional(self, input_str: str) -> Union[float, bool]:
"""
Checks if the input string corresponds to a recognized Portuguese fractional word.
Returns:
The fractional value as a float if recognized; otherwise, False.
"""
fractions_map = self.vocab.get_fraction_strings()
input_str = input_str.lower().strip()

# handle fraction denominator strings
for word, den in fractions_map.items():
if input_str == word:
return 1.0 / den

return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fraction aggregation treats “three quarters” as 3.25 instead of 0.75.

In extract_number, the fractional branch always does:

current_number = current_number or 0
result = result or 0
fraction = self.is_fractional(token)
result += current_number + fraction

For AST, "tres cuartos" becomes 3 + 1/4 = 3.25 instead of 3/4 = 0.75, matching the failing test in tests/test_number_parser_ast.py (Line 117). Similar patterns will show up for “tres terzos”, etc.

You likely want to treat a bare “ ” phrase as N * (1/den) when there is no prior integer part, while still supporting “integer + fraction” for mixed forms.

One possible approach:

-            elif self.is_fractional(token):
-                current_number = current_number or 0
-                result = result or 0
-                fraction = self.is_fractional(token)
-                result += current_number + fraction
-                current_number = None
-                number_consumed = True
+            elif self.is_fractional(token):
+                fraction = self.is_fractional(token)
+                # Initialize result if needed
+                result = result or 0
+
+                # If we've already consumed a scale/decimal/fraction before,
+                # treat current_number as integer part.
+                if number_consumed or not current_number:
+                    result += (current_number or 0) + fraction
+                else:
+                    # Bare "tres cuartos" style: interpret as numerator/denominator
+                    denom = round(1 / fraction)
+                    result += current_number / denom
+
+                current_number = None
+                number_consumed = True

You might want to refine the conditions (e.g., based on token count or language specifics), but the current behavior is clearly wrong for the AST test case.

Comment on lines +588 to +637
if "." in str(number):
integer_part = int(number)
decimal_part_str = str(number).split('.')[1].rstrip("0")

# Handle cases where the decimal part rounds to zero
if decimal_part_str and int(decimal_part_str) == 0:
return self.pronounce_number(integer_part, places,
scale=scale,
digits=digits, gender=gender)

int_pronunciation = self.pronounce_number(integer_part, places,
scale=scale,
digits=digits, gender=gender)

decimal_pronunciation_parts = []
# pronounce decimals either as a whole number or digit by digit
if decimal_part_str:
if digits == DigitPronunciation.FULL_NUMBER:

# handle leading zeros
no_z = decimal_part_str.lstrip("0") # without zeros
n_zeros = len(decimal_part_str) - len(no_z)
for i in range(n_zeros):
decimal_pronunciation_parts.append(self.vocab.UNITS[0])

if n_zeros >= places:
# read all zeros
last_digit = int(decimal_part_str[n_zeros])
after_last_digit = int(decimal_part_str[n_zeros + 1]) if len(decimal_part_str) > n_zeros + 1 else 0
# round up last digit if needed
if after_last_digit >= 5:
last_digit += 1
decimal_pronunciation_parts.append(self._pronounce_up_to_999(last_digit, gender=gender))
else:
if len(decimal_part_str) > places:
last_digit = int(decimal_part_str[places - 1])
after_last_digit = int(decimal_part_str[places])
# round up last digit if needed
if after_last_digit >= 5:
last_digit += 1
decimal_part_str = decimal_part_str[:places - 1] + str(last_digit)
decimal_pronunciation_parts.append(self.pronounce_number(int(decimal_part_str), gender=gender))
else:
for digit in decimal_part_str:
decimal_pronunciation_parts.append(self.pronounce_number(int(digit), gender=gender))

decimal_pronunciation = " ".join(decimal_pronunciation_parts) or self.vocab.UNITS[0]
decimal_word = self.vocab.DECIMAL_MARKER[0]
return f"{int_pronunciation} {decimal_word} {decimal_pronunciation}"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Decimal pronunciation can crash on .0 values and has minor rounding edge cases.

In the decimal branch of pronounce_number:

  • For numbers like 1.0 or 2.00, decimal_part_str becomes "" after .rstrip("0"). The subsequent logic doesn’t guard against an empty string and will eventually call int(decimal_part_str), which raises ValueError.
  • The if n_zeros >= places: block indexes decimal_part_str[n_zeros] without ensuring n_zeros < len(decimal_part_str).

A safer pattern:

        if "." in str(number):
            integer_part = int(number)
            decimal_part_str = str(number).split('.')[1].rstrip("0")

-            # Handle cases where the decimal part rounds to zero
-            if decimal_part_str and int(decimal_part_str) == 0:
-                return self.pronounce_number(integer_part, places,
-                                           scale=scale,
-                                           digits=digits, gender=gender)
+            # If nothing remains after stripping zeros, it's effectively an integer
+            if not decimal_part_str:
+                return self.pronounce_number(
+                    integer_part, places, scale=scale, digits=digits, gender=gender
+                )

and ensure any indexing into decimal_part_str[...] is only done when len(decimal_part_str) > index.

🧰 Tools
🪛 Ruff (0.14.7)

610-610: Loop control variable i not used within loop body

Rename unused i to _i

(B007)

Comment on lines +647 to +677
# Find the smallest scale that fits the number
scale_candidates = [(scale_val, scale_str)
for scale_val, scale_str in scales_map.items()
if n >= scale_val]
scale_val: Tuple[int, str] = scale_candidates[0]

count = n // scale_val[0]
remainder = n % scale_val[0]

# Pronounce the 'count' part of the number
is_singular = count == 1 or scale_val[0] in self.vocab.NO_PLURAL
scale_word = scale_val[1] if is_singular else self.vocab.pluralize(scale_val[1])
if count == 1 and scale_val[0] in self.vocab.NO_PREV_UNIT:
count_str = scale_word
else:
count_pronunciation = self.pronounce_number(count, places, scale)
count_str = f"{count_pronunciation} {scale_word}"

# If there's no remainder, we're done
if remainder == 0:
return count_str

# Pronounce the remainder and join with the correct conjunction
remainder_str = self.pronounce_number(remainder, places, scale)
# Conjunction logic: add JOIN_WORD if the remainder is the last group and is
# less than 100 or a multiple of 100.
join_word = self.vocab.JOIN_WORD[0] if len(self.vocab.JOIN_WORD) else ""
if remainder < 100 or (remainder < 1000 and remainder % 100 == 0):
return f"{count_str} {join_word} {remainder_str}"
else:
return f"{count_str} {remainder_str}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Pronunciation of large numbers picks the smallest scale instead of the largest.

In pronounce_number, you compute:

scales_map = ...
scale_candidates = [(scale_val, scale_str)
                    for scale_val, scale_str in scales_map.items()
                    if n >= scale_val]
scale_val: Tuple[int, str] = scale_candidates[0]

Because dictionaries are insertion-ordered, this picks the first matching scale, i.e., the smallest value ≥ 1 000. For GL, where LONG_SCALE starts with 1000: "mil" then 10**6: "millón", 1_000_000 is pronounced as "mil mil"; 1_000_000_000 becomes "mil mil mil". This exactly matches the GL test failures.

You want the largest scale ≤ n. Suggestion:

-        scale_candidates = [(scale_val, scale_str)
-                            for scale_val, scale_str in scales_map.items()
-                            if n >= scale_val]
-        scale_val: Tuple[int, str] = scale_candidates[0]
+        scale_candidates = [
+            (scale_val, scale_str)
+            for scale_val, scale_str in scales_map.items()
+            if n >= scale_val
+        ]
+        if not scale_candidates:
+            return self._pronounce_up_to_999(n, gender)
+        # choose the largest applicable scale
+        scale_val, scale_str = max(scale_candidates, key=lambda item: item[0])

and then use scale_str instead of re-indexing scale_val[1] below.

The same bug exists in pronounce_ordinal (see comment there); both need to be fixed to get correct “un millón”, “mil millóns”, and “millonésimo”.

🤖 Prompt for AI Agents
In ovos_number_parser/util.py around lines 647 to 677, the code picks the first
matching scale (smallest) from scale_candidates causing large numbers to use
repeated smaller scales; change the logic to select the largest scale <= n by
using max(scale_candidates, key=lambda t: t[0]) (or sort and take last) and
unpack it as (scale_val, scale_str) so subsequent code uses scale_str rather
than re-indexing scale_val[1]; apply the exact same fix to the corresponding
block in pronounce_ordinal so both functions choose the largest applicable scale
and use the proper scale_str variable.

Comment on lines +657 to +676
is_singular = count == 1 or scale_val[0] in self.vocab.NO_PLURAL
scale_word = scale_val[1] if is_singular else self.vocab.pluralize(scale_val[1])
if count == 1 and scale_val[0] in self.vocab.NO_PREV_UNIT:
count_str = scale_word
else:
count_pronunciation = self.pronounce_number(count, places, scale)
count_str = f"{count_pronunciation} {scale_word}"

# If there's no remainder, we're done
if remainder == 0:
return count_str

# Pronounce the remainder and join with the correct conjunction
remainder_str = self.pronounce_number(remainder, places, scale)
# Conjunction logic: add JOIN_WORD if the remainder is the last group and is
# less than 100 or a multiple of 100.
join_word = self.vocab.JOIN_WORD[0] if len(self.vocab.JOIN_WORD) else ""
if remainder < 100 or (remainder < 1000 and remainder % 100 == 0):
return f"{count_str} {join_word} {remainder_str}"
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Hundreds ignore grammatical gender for exact multiples (e.g., 200 feminine).

_pronounce_up_to_999 applies swap_gender only in the composite case and returns self.vocab.HUNDREDS[n] directly for exact multiples of 100:

if n in self.vocab.HUNDREDS:
    return self.vocab.HUNDREDS[n]
...
return self.vocab.swap_gender(" ".join(parts), gender)

This means GL.pronounce_number(200, gender=FEMININE) always yields "douscentos", never "douscentas", which is exactly what the GL tests report.

A generic fix that works for all Romance languages:

-        if n in self.vocab.HUNDREDS:
-            return self.vocab.HUNDREDS[n]
+        if n in self.vocab.HUNDREDS:
+            base = self.vocab.HUNDREDS[n]
+            return self.vocab.swap_gender(base, gender)

Given other languages either have no gender variation for hundreds (in which case swap_gender will be a no-op) or can encode their rules in swap_gender, this change centralizes the behavior correctly.

🤖 Prompt for AI Agents
In ovos_number_parser/util.py around lines 657 to 676, the function handling
hundreds returns self.vocab.HUNDREDS[n] directly for exact multiples of 100,
causing grammatical gender to be ignored (e.g., 200 always yields masculine
form); change the logic so the returned hundred word is passed through
self.vocab.swap_gender(..., gender) (i.e., apply swap_gender to the hundred-word
in the exact-multiple branch), and ensure the same swap_gender usage remains for
the composite case so all hundreds respect the requested gender across
languages.

Comment on lines +748 to +775
scales_map = self.vocab.ORDINAL_SHORT_SCALE if scale == Scale.SHORT else self.vocab.ORDINAL_LONG_SCALE

# Find the smallest scale that fits the number
scale_candidates = [(scale_val, scale_str)
for scale_val, scale_str in scales_map.items()
if n >= scale_val]
scale_val: Tuple[int, str] = scale_candidates[0]

count = n // scale_val[0]
remainder = n % scale_val[0]

# Special case for "milésimo" and other large scales where 'um' is not needed
if count == 1:
count_str = self.vocab.swap_gender(scale_val[1], gender)
else:
# Pronounce the 'count' part of the number and the scale word
count_pronunciation = self.pronounce_number(count, scale=scale)
scale_word = self.vocab.swap_gender(scale_val[1], gender)
count_str = f"{count_pronunciation} {scale_word}"

# If there's no remainder, we're done
if remainder == 0:
return count_str

# Pronounce the remainder and join
remainder_str = self.pronounce_ordinal(remainder, gender, scale)

return f"{count_str} {remainder_str}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Ordinal pronunciation has the same “smallest scale” bug as cardinal pronunciation.

pronounce_ordinal mirrors pronounce_number in its scale handling:

scale_candidates = [(scale_val, scale_str)
                    for scale_val, scale_str in scales_map.items()
                    if n >= scale_val]
scale_val: Tuple[int, str] = scale_candidates[0]

For GL, 1_000_000 ordinal becomes "mil milésimo" instead of "millonésimo", exactly as seen in the GL tests.

Apply the same fix as for pronounce_number:

-        scale_candidates = [(scale_val, scale_str)
-                            for scale_val, scale_str in scales_map.items()
-                            if n >= scale_val]
-        scale_val: Tuple[int, str] = scale_candidates[0]
+        scale_candidates = [
+            (scale_val, scale_str)
+            for scale_val, scale_str in scales_map.items()
+            if n >= scale_val
+        ]
+        if not scale_candidates:
+            # 0–999 should already have been handled
+            raise ValueError("No ordinal scale found for number")
+        scale_val, scale_str = max(scale_candidates, key=lambda item: item[0])
@@
-        if count == 1:
-            count_str = self.vocab.swap_gender(scale_val[1], gender)
+        if count == 1:
+            count_str = self.vocab.swap_gender(scale_str, gender)
@@
-            scale_word = self.vocab.swap_gender(scale_val[1], gender)
+            scale_word = self.vocab.swap_gender(scale_str, gender)
🤖 Prompt for AI Agents
In ovos_number_parser/util.py around lines 748 to 775, the ordinal pronunciation
picks the first matching scale candidate resulting in wrong outputs (e.g., "mil
milésimo" for 1_000_000); change the selection logic to choose the largest
scale_val <= n (same fix as pronounce_number): build scale_candidates as you
already do, then set scale_val = max(scale_candidates, key=lambda t: t[0]) (or
sort and take the last) so the biggest fitting scale is used; keep the rest of
the logic (count, remainder, gender swap, recursion) unchanged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants