check to raise error when non-standard utf-8 #873

Merged
merged 14 commits into from
Mar 31, 2025

alhelguera
Collaborator

Catch the UTF-8 decoding error to give a better error message.

@alhelguera alhelguera linked an issue Feb 17, 2025 that may be closed by this pull request
@@ -323,6 +324,19 @@ def publish_from_csv(path: Path, new_topic: str = None) -> None:
    LOGGER.debug(f'Publishing station list from {path}')
    station_list = []
    with path.open() as fh:

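with path.open() as fh:
    try:
        # read the whole file once so any decoding error surfaces here
        _ = fh.readlines()
    except UnicodeDecodeError as err:
        msg = f'Invalid utf-8 in station metadata file: {err}'
        LOGGER.error(msg)
        raise RuntimeError(msg) from err

    # rewind so the CSV reader starts from the header row
    fh.seek(0)

    reader = csv.DictReader(fh)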
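
For anyone who wants to try the suggestion above in isolation, here is a minimal sketch. It is not the code merged in this PR: `read_station_rows`, `demo`, the column names and the sample station are all hypothetical, and it passes `encoding='utf-8'` explicitly so the result does not depend on the platform's default encoding.

```python
import csv
import logging
import tempfile
from pathlib import Path

LOGGER = logging.getLogger(__name__)


def read_station_rows(path: Path) -> list:
    """Hypothetical helper: read a station CSV, failing clearly on bad UTF-8."""
    with path.open(encoding='utf-8', newline='') as fh:
        try:
            # Reading the whole file forces every byte to be decoded now
            _ = fh.readlines()
        except UnicodeDecodeError as err:
            msg = f'Invalid utf-8 in station metadata file: {err}'
            LOGGER.error(msg)
            raise RuntimeError(msg) from err

        fh.seek(0)  # rewind so DictReader starts from the header row
        return list(csv.DictReader(fh))


def demo() -> tuple:
    """Write one valid and one Latin-1 encoded file, then read both."""
    header = 'station_name,wigos_station_identifier\n'  # made-up columns
    row = 'S\u00e3o Paulo,0-00000-0-00000\n'  # made-up station
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / 'good.csv'
        bad = Path(tmp) / 'bad.csv'
        good.write_bytes((header + row).encode('utf-8'))
        # In Latin-1 the a-tilde is the single byte 0xe3, which is not
        # valid UTF-8 when followed by a plain ASCII letter
        bad.write_bytes((header + row).encode('latin-1'))

        rows = read_station_rows(good)
        try:
            read_station_rows(bad)
            error = None
        except RuntimeError as err:
            error = str(err)
    return rows, error
```

Calling `demo()` returns the parsed rows from the valid file together with the error message produced for the Latin-1 file, which starts with `Invalid utf-8 in station metadata file:`.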

@tomkralidis tomkralidis added this to the sprint-017 milestone Feb 18, 2025
@tomkralidis tomkralidis added the bug and station metadata labels Feb 18, 2025
@tomkralidis
Collaborator

@alhelguera to add test data/CI integration.

@alhelguera alhelguera requested a review from tomkralidis March 27, 2025 14:57
- name: add Brazil wrongly encoded station data and check for error message
  env:
    STATION_METADATA: /data/wis2box/metadata/station/brazil.csv
  run: |

We should not be encoding test logic in CI. This test should be put in tests/integration/test_workflow.py instead.

@tomkralidis, I disagree with your comment. In your previous request you asked Alberto to write the test in CI, and this test is based on command-line logic: it demonstrates that the user receives a clear error when ingesting stations from the command line. We should not put this in tests/integration/test_workflow.py.
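
For reference, wherever the test ends up living, the behaviour under discussion could be exercised from Python roughly as follows. This is only a sketch under stated assumptions: `load_stations` is a hypothetical stand-in for the real ingest step, the test class and file contents are made up, and it uses the standard-library `unittest` module rather than whatever runner the project's own suite relies on.

```python
import csv
import tempfile
import unittest
from pathlib import Path


def load_stations(path: Path) -> list:
    """Hypothetical stand-in for the station ingest step under test."""
    try:
        with path.open(encoding='utf-8', newline='') as fh:
            return list(csv.DictReader(fh))
    except UnicodeDecodeError as err:
        msg = f'Invalid utf-8 in station metadata file: {err}'
        raise RuntimeError(msg) from err


class StationEncodingTest(unittest.TestCase):
    def test_wrongly_encoded_file_gives_clear_error(self):
        with tempfile.TemporaryDirectory() as tmp:
            path = Path(tmp) / 'stations.csv'
            # Written as Latin-1 on purpose: the 0xe3 byte is not valid UTF-8 here
            data = 'station_name\nS\u00e3o Paulo\n'.encode('latin-1')
            path.write_bytes(data)
            with self.assertRaises(RuntimeError) as ctx:
                load_stations(path)
        # The user-facing message should name the problem, not just fail
        self.assertIn('Invalid utf-8', str(ctx.exception))


def run_suite() -> unittest.TestResult:
    """Run the single test case and return the result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(StationEncodingTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

A command-line check in CI and a test like this cover the same failure from different ends: the former asserts on what the user sees in the terminal, the latter on the exception raised by the function.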

@alhelguera alhelguera requested a review from tomkralidis March 31, 2025 18:34
@tomkralidis tomkralidis merged commit 40b9e2b into main Mar 31, 2025
6 checks passed
@tomkralidis tomkralidis deleted the non-standard-utf-8-encoding-in-station branch March 31, 2025 19:00
Development

Successfully merging this pull request may close these issues.

detecting non-standard utf-8 encoding in station csv files
3 participants