find_uniqueids_in_text fails to extract IMDb ID from URLs with language prefix (e.g. /de/) #257

Description

@chfury

The regex in scraper_datahelper.py used to extract IMDb IDs from NFO files fails when the URL contains a language prefix like /de/.

Affected regex:

res = re.search(r'imdb....?/title/tt([0-9]+)', input_text)

Example URL that fails:

https://www.imdb.com/de/title/tt14280366/

The regex expects imdb.com/title/... but the URL has imdb.com/de/title/..., so no ID is extracted. Kodi then falls back to a title search, which fails for movies whose filename contains transliterated umlauts (e.g. Kuechenbrigade instead of Küchenbrigade).
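The failure can be checked outside Kodi with a minimal sketch. It uses only the pattern quoted above; the wrapper name extract_imdb_digits is hypothetical and is not the scraper's actual function.

```python
import re

# Pattern quoted in this issue (from scraper_datahelper.py).
CURRENT_PATTERN = r'imdb....?/title/tt([0-9]+)'


def extract_imdb_digits(text):
    """Hypothetical helper: return the digits after 'tt', or None if no match."""
    res = re.search(CURRENT_PATTERN, text)
    return res.group(1) if res else None


# URL without a language prefix: '.com' fills the four wildcard characters.
print(extract_imdb_digits('https://www.imdb.com/title/tt14280366/'))     # 14280366

# URL with '/de/': after '.com' the pattern requires '/title/', but finds '/de/'.
print(extract_imdb_digits('https://www.imdb.com/de/title/tt14280366/'))  # None
```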

Proposed fix:

res = re.search(r'imdb....?/(?:[a-z]+/)?title/tt([0-9]+)', input_text)

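The non-capturing group (?:[a-z]+/)? optionally matches one lowercase path segment such as de/ before title/, so URLs with and without a language prefix both match and group 1 still holds the ID digits.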
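A quick sanity check of the proposed pattern, as a sketch rather than the scraper's code. The helper name extract_with_proposed and the xx-yy sample are made up for illustration; the third sample only shows what [a-z]+ does not cover and is not a claim about which prefixes IMDb actually serves.

```python
import re

# Proposed pattern from this issue.
PROPOSED_PATTERN = r'imdb....?/(?:[a-z]+/)?title/tt([0-9]+)'


def extract_with_proposed(text):
    """Hypothetical helper: return the digits after 'tt', or None if no match."""
    res = re.search(PROPOSED_PATTERN, text)
    return res.group(1) if res else None


samples = [
    'https://www.imdb.com/title/tt14280366/',        # no prefix, still matches
    'https://www.imdb.com/de/title/tt14280366/',     # language prefix now matches
    'https://www.imdb.com/xx-yy/title/tt14280366/',  # made-up hyphenated prefix
]

for url in samples:
    print(url, '->', extract_with_proposed(url))
```

The first two samples yield 14280366. The third yields None because [a-z]+ does not accept a hyphen, so a hyphenated prefix would need a wider character class if it has to be supported.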

Steps to reproduce:

  1. Have an NFO file containing an IMDb URL with a language prefix, e.g. https://www.imdb.com/de/title/tt14280366/
  2. Scan the file into the Kodi library using the TMDB Python scraper
  3. Kodi logs: Find movie with title '...' from year '...', which means the ID was not found and it fell back to a title search
