PurpleAir provides datetimes in Unix time, HabitatMap is similar but includes milliseconds, and Clarity provides ISO format. In the ingestor we then use a try/except block to determine the format of each row, which makes ingest times very long for files that are, ironically, already in the right format.
I propose that we force all fetchers to return datetimes in the same format (ISO).
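As a rough sketch of what normalizing in the fetchers could look like, the helper below converts epoch values to ISO strings. The names `epoch_to_iso` and `TO_ISO`, and the source keys, are hypothetical and not taken from the codebase; it also assumes the provider values are true Unix epoch values (seconds for PurpleAir, milliseconds for HabitatMap).

```python
from datetime import datetime, timezone


def epoch_to_iso(value, unit="s"):
    """Convert a Unix epoch value (seconds or milliseconds) to an ISO 8601 string in UTC."""
    seconds = int(value) / 1000 if unit == "ms" else int(value)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


# Hypothetical per-source converters; the keys are illustrative only.
TO_ISO = {
    "purpleair": lambda v: epoch_to_iso(v, "s"),    # Unix seconds
    "habitatmap": lambda v: epoch_to_iso(v, "ms"),  # Unix milliseconds
    "clarity": lambda v: v,                         # already ISO, passed through
}
```

With something like this in each fetcher, the ingestor would only ever see ISO strings and the format detection would become a fallback rather than the normal path.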
I also propose that we rewrite the ingestor method to assume ISO, and when it encounters something that is not ISO, figure out what the format is and switch its formatter. This assumes 1) most if not all files will be ISO (they should be), 2) if something is not ISO then the whole file will be in that same format, and/or 3) if we have a bunch of files stacked, then the updated formatter will be needed for at least several lines and is therefore worth keeping.
e.g.
# in the ingestor
datetime_formatter = iso_formatter
for row in csv.reader(content.split('\n')):
    try:
        row[2] = datetime_formatter(row[2])
    except Exception:
        # figure out what format and update the formatter
        datetime_formatter = choosen_formatter
        row[2] = datetime_formatter(row[2])
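A slightly fuller, runnable sketch of the same idea follows. The formatter functions, `choose_formatter`, and `format_rows` are hypothetical names, and the rule that 12 or more digits means milliseconds is an assumption, not something the providers guarantee. Purely numeric strings are rejected explicitly in `iso_formatter` because newer Python versions accept compact ISO forms, so a long epoch value could otherwise be misread as a date.

```python
import csv
from datetime import datetime, timezone


def iso_formatter(value):
    # Guard: never treat a purely numeric string as ISO.
    if value.strip().isdigit():
        raise ValueError(f"not an ISO datetime: {value!r}")
    return datetime.fromisoformat(value).isoformat()


def unix_seconds_formatter(value):
    return datetime.fromtimestamp(int(value), tz=timezone.utc).isoformat()


def unix_millis_formatter(value):
    return datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc).isoformat()


def choose_formatter(value):
    """Pick a formatter for a value the current formatter rejected."""
    text = value.strip()
    if text.isdigit():
        # Assumption: 12+ digits means milliseconds, fewer means seconds.
        return unix_millis_formatter if len(text) >= 12 else unix_seconds_formatter
    return iso_formatter


def format_rows(content, column=2):
    """Normalize the datetime column, switching formatter only on failure."""
    datetime_formatter = iso_formatter  # optimistic default
    rows = []
    for row in csv.reader(content.splitlines()):
        if not row:
            continue  # skip blank lines
        try:
            row[column] = datetime_formatter(row[column])
        except (ValueError, OverflowError, OSError):
            # Detect the format once and keep it for the following rows.
            datetime_formatter = choose_formatter(row[column])
            row[column] = datetime_formatter(row[column])
        rows.append(row)
    return rows
```

Note that this only re-detects when the current formatter raises. A seconds value read by the milliseconds formatter would not raise and would silently land in 1970, which is why assumption 2 above (one format per file) matters.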
It makes sense to standardize the datetimes across adapters. Just for clarity, why prefer ISO to milliseconds? Readability? Is it the common input format from the providers?
Right now ISO is what the ingestor converts to for ingestion. Plus it's more readable.
Milliseconds seem odd to me given that, currently, all the millisecond values are 000 and therefore not really used, and they need to be parsed differently in Python. Also, the instruments themselves are actually averaging over a few minutes (or an hour), so the time is just the end time of that period; even if the milliseconds were being populated, they would be misleadingly precise.
Unix time could be an option, but I think it's slower to convert from Unix time to timestamptz than it is to go from an ISO string to timestamptz. I could be wrong there.