Different providers/adapters output times differently #25

Open
caparker opened this issue Nov 19, 2022 · 2 comments


@caparker
Collaborator

PurpleAir provides them in Unix time, HabitatMap is similar but includes milliseconds, and Clarity provides ISO format. In the ingestor we are then using a try/catch block to determine the format each row is in, which makes ingest times very long for files that are, ironically, already in the right format.
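
To make the three formats above concrete, here is a minimal sketch of how each one could be normalized to an ISO 8601 string before it reaches the ingestor. The function names are hypothetical and not taken from the existing fetchers, and the sample values are made up for illustration. It assumes the Unix values are in UTC.

```python
from datetime import datetime, timezone


def unix_seconds_to_iso(value):
    """Unix time in seconds (the PurpleAir-style case) -> ISO 8601 UTC string."""
    return datetime.fromtimestamp(float(value), tz=timezone.utc).isoformat()


def unix_millis_to_iso(value):
    """Unix time in milliseconds (the HabitatMap-style case) -> ISO 8601 UTC string."""
    return datetime.fromtimestamp(float(value) / 1000, tz=timezone.utc).isoformat()


def iso_to_iso(value):
    """Already ISO (the Clarity-style case): parse to validate, then re-emit."""
    # Swap a trailing "Z" for an explicit offset, since older Python
    # versions of fromisoformat() do not accept "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00")).isoformat()


# The same instant expressed in each of the three input formats.
print(unix_seconds_to_iso("1668816000"))
print(unix_millis_to_iso("1668816000000"))
print(iso_to_iso("2022-11-19T00:00:00Z"))
```

All three calls describe the same instant, so they should print the same string.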

I propose that we force all fetchers to return data in the same format (ISO).
I also propose that we rewrite the ingestor method to assume ISO, but when it encounters something that is not ISO, it figures out what the format is and then updates its method. This assumes 1) most if not all files will be ISO (they should be), 2) if something is not ISO then the whole file will be in the same format, and/or 3) if we have a bunch of files stacked, then the updated method will be needed for at least multiple lines and is therefore worth keeping.

e.g.

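# in the ingestor
import csv

# assume ISO until a row proves otherwise
datetime_formatter = iso_formatter
for row in csv.reader(content.split('\n')):
    if not row:
        # a trailing newline produces an empty row; skip it
        continue
    try:
        row[2] = datetime_formatter(row[2])
    except Exception:
        # figure out what format this row is in and update the formatter
        datetime_formatter = chosen_formatter
        row[2] = datetime_formatter(row[2])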
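
A more complete version of the same idea could look like the sketch below. Everything in it is an assumption made for illustration: the formatter names, the `choose_formatter` helper, the `normalize_rows` wrapper, and the threshold used to tell seconds from milliseconds are not part of the current ingestor.

```python
import csv
from datetime import datetime, timezone


def _is_number(value):
    try:
        float(value)
    except ValueError:
        return False
    return True


def iso_formatter(value):
    # Reject bare numbers up front so an epoch value is never mistaken
    # for a compact (dashless) ISO date by a lenient parser.
    if _is_number(value):
        raise ValueError(f"not an ISO 8601 string: {value!r}")
    return datetime.fromisoformat(value.replace("Z", "+00:00")).isoformat()


def unix_formatter(value):
    return datetime.fromtimestamp(float(value), tz=timezone.utc).isoformat()


def unix_ms_formatter(value):
    return datetime.fromtimestamp(float(value) / 1000, tz=timezone.utc).isoformat()


def choose_formatter(value):
    """Pick a formatter for a value that the current formatter rejected."""
    if not _is_number(value):
        return iso_formatter
    # Heuristic (assumption): 1e11 seconds is thousands of years in the
    # future, so anything larger is treated as milliseconds.
    return unix_ms_formatter if float(value) > 1e11 else unix_formatter


def normalize_rows(content, column=2):
    """Rewrite the datetime column of CSV text to ISO, adapting as it goes."""
    datetime_formatter = iso_formatter  # fast path: assume ISO
    rows = []
    for row in csv.reader(content.splitlines()):
        if not row:
            continue
        try:
            row[column] = datetime_formatter(row[column])
        except (ValueError, OverflowError, OSError):
            # The current formatter failed: pick a new one and keep it
            # for the following rows.
            datetime_formatter = choose_formatter(row[column])
            row[column] = datetime_formatter(row[column])
        rows.append(row)
    return rows
```

For a file that is already ISO, the `except` branch never runs, so each row costs a single parse. One limitation of this sketch: once the millisecond formatter is active, a later value in seconds would not raise and would be silently converted to a date in 1970, so it relies on assumption 2) above holding within a file.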
@russbiggs
Member

It makes sense to standardize the datetimes across adapters. Just for clarity, why prefer ISO to milliseconds? Readability? Is it the common input format from the providers?

@caparker
Collaborator Author

Right now ISO is what the ingestor converts everything to for ingestion. Plus it's more readable.

Milliseconds seem odd to me given that, currently, all the values are 000, so they are not really used and still need to be parsed differently in Python. Also, the instruments themselves are actually averaging over a few minutes (or an hour), and the time is just the end time, so even if the milliseconds were being populated they would seem incorrectly precise.

Unix time could be an option, but I think it's slower to convert from Unix time to timestamptz than it is to go from an ISO string to timestamptz. I could be wrong there.
