Different providers/adapters output times differently #25

caparker · 2022-11-19T15:17:30Z

PurpleAir provides them in Unix time, HabitatMap is similar but includes milliseconds, and Clarity provides ISO format. Then in the ingestor we use a try/except block to determine the format each row is in, which makes ingest times very long for files that are, ironically, already in the right format.
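To make the difference concrete, here is a minimal sketch of a fetcher-side helper that turns each of the three formats described above into one ISO 8601 UTC string. The `to_iso` name and its `unit` argument are hypothetical. The mapping of providers to units only follows the description in this issue, and treating timezone-less ISO strings as UTC is an assumption.

```python
from datetime import datetime, timezone


def to_iso(value, unit="iso"):
    """Normalize a provider timestamp to an ISO 8601 UTC string (hypothetical helper).

    unit: "s"   -> Unix seconds (as described for PurpleAir)
          "ms"  -> Unix milliseconds (as described for HabitatMap)
          "iso" -> already ISO 8601 (as described for Clarity)
    """
    if unit == "s":
        dt = datetime.fromtimestamp(float(value), tz=timezone.utc)
    elif unit == "ms":
        dt = datetime.fromtimestamp(float(value) / 1000, tz=timezone.utc)
    elif unit == "iso":
        # fromisoformat only accepts a trailing "Z" from Python 3.11 on
        dt = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive times are UTC
    else:
        raise ValueError(f"unknown unit: {unit!r}")
    return dt.astimezone(timezone.utc).isoformat()
```

With every fetcher calling something like this before writing its output, all three providers would hand the ingestor the same string for the same instant.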

I propose that we force all fetchers to return data in the same format (ISO).

I also propose that we rewrite the ingestor method to assume ISO, and when it encounters something that is not ISO, figure out what format it is and switch to that method. This assumes that 1) most if not all files will be ISO (they should be), 2) if a value is not ISO then the whole file will be in that same format, and/or 3) if a bunch of files are stacked together, the updated method will be needed for at least several lines and is therefore worth keeping.

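e.g.

```python
# in the ingestor
# (iso_formatter, detect_formatter and content are placeholders)
import csv

datetime_formatter = iso_formatter
for row in csv.reader(content.split('\n')):
    if not row:
        continue  # skip blank lines, such as the one after a trailing newline
    try:
        row[2] = datetime_formatter(row[2])
    except Exception:
        # figure out what format this value is in and keep using that
        # formatter for the rest of the file
        datetime_formatter = detect_formatter(row[2])  # hypothetical helper
        row[2] = datetime_formatter(row[2])
```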
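A self-contained version of the ingestor sketch, runnable as-is. The formatter names, the `ingest` wrapper, and the 1e11 cutoff used to tell seconds from milliseconds are all hypothetical choices for illustration, not existing ingestor code.

```python
import csv
from datetime import datetime, timezone


def iso_formatter(value):
    """Parse an ISO 8601 string into a UTC ISO string; raise ValueError otherwise."""
    if value.strip().isdigit():
        # Python 3.11+ fromisoformat accepts compact forms, so reject bare numbers early
        raise ValueError(f"not ISO 8601: {value!r}")
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive times are UTC
    return dt.astimezone(timezone.utc).isoformat()


def unix_seconds_formatter(value):
    return datetime.fromtimestamp(float(value), tz=timezone.utc).isoformat()


def unix_ms_formatter(value):
    return datetime.fromtimestamp(float(value) / 1000, tz=timezone.utc).isoformat()


def detect_formatter(value):
    """Hypothetical heuristic: return a formatter that can parse `value`."""
    try:
        iso_formatter(value)
        return iso_formatter
    except ValueError:
        pass
    number = float(value)  # raises ValueError if the value is not numeric either
    # 1e11 seconds is roughly the year 5138, so values that large are taken as ms
    return unix_ms_formatter if abs(number) >= 1e11 else unix_seconds_formatter


def ingest(content, column=2):
    """Normalize `column` of each CSV row to ISO, assuming ISO until proven otherwise."""
    datetime_formatter = iso_formatter
    rows = []
    for row in csv.reader(content.splitlines()):
        if not row:
            continue  # skip blank lines
        try:
            row[column] = datetime_formatter(row[column])
        except ValueError:
            # switch formats once and keep the new formatter for later rows
            datetime_formatter = detect_formatter(row[column])
            row[column] = datetime_formatter(row[column])
        rows.append(row)
    return rows
```

Catching `ValueError` instead of a bare `Exception` keeps genuine bugs, such as an `IndexError` from a short row, from being mistaken for a format change. A file that is entirely ISO never leaves the fast path.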

russbiggs · 2022-11-19T16:28:33Z

Makes sense to standardize the datetimes across adapters. Just for clarity, why prefer ISO over milliseconds? Readability? Is it the common input format from the providers?

caparker · 2022-11-19T17:21:22Z

Right now ISO is what the ingestor converts everything to for ingestion. Plus it's more readable.

Milliseconds seem odd to me given that, currently, all the millisecond values are 000, so they aren't really used, and they need to be parsed differently in Python. Also, the instruments themselves average over a few minutes (or an hour), so the time is just the end of the averaging period; even if the milliseconds were populated, they would suggest more precision than the measurement actually has.
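Since the millisecond part is always 000 at the moment, converting those values to whole-second ISO strings loses nothing. A minimal sketch with a hypothetical `ms_to_iso` helper, assuming the millisecond timestamps arrive as integers or integer strings; it refuses to drop a non-zero millisecond part rather than silently rounding it away:

```python
from datetime import datetime, timezone


def ms_to_iso(ms_values):
    """Convert Unix-millisecond timestamps to whole-second ISO 8601 strings.

    Raises ValueError if any value has a non-zero millisecond part, since
    dropping it would silently change the timestamp.
    """
    out = []
    for ms in ms_values:
        seconds, millis = divmod(int(ms), 1000)
        if millis:
            raise ValueError(f"{ms} has a non-zero millisecond part")
        out.append(datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat())
    return out
```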

Unix time could be an option, but I think it's slower to convert from Unix time to timestamptz than from an ISO string to timestamptz. I could be wrong there, though.
