Deprecate the covid-19-uk-data repo #68
Comments
Many thanks to you and everyone else who contributed to this repo.
Many thanks for all the help! Much appreciated.
Hi Tom. It makes sense, although it's sad to see it stop, as it's been an island of sanity in the lunacy of our four nations' approaches to reporting data streams. One thing this repo offers (which the "official" sources don't) is the commit history of the time series. This will be useful for investigating reporting delays and for recreating the data set as it was at particular points in time. For example, I think that delays in reporting cases in the early days of the outbreak may have significantly affected the interpretation of the situation, and hence decisions around the timing of the lockdown. Obviously the main use case for the evolution of the historical time series is the early stage, which wouldn't be affected by winding this up now, but my point is that the official sources do not provide the commit history in the same way, and this makes your repository unique in the UK. We may find that the historical data around local outbreaks are similarly interesting in the future. It's your call, and it will continue to be a useful resource either way. Cheers,
Hi Rob, Thanks for your comments. I agree that having a history of changes, so people can look back and see how things were reported at the time, is valuable. As you said, it's especially interesting at the beginning of the pandemic. I thought about this as a reason for continuing, but the change history is now being published for England, and for Scotland (on GitHub!) at least. Wales publishes a new spreadsheet every day, which may have revised historical figures in it (so it doesn't retain the change history), and NI doesn't publish its data in machine-readable form. I think it would be fairly easy for someone to write a GH action (or similar) that downloads and archives the Wales data every day. It could also translate it into a set of CSVs to make it easier to consume. Cheers,
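A minimal Python sketch of the daily archiving idea, as it might be run from a scheduled job such as a GitHub Action. It is only an illustration under stated assumptions: the function names, the `archive/wales` directory, and the URL in the usage note are hypothetical, and the real location of the Wales spreadsheet is not given here. Each download is stored under its date, and a download identical to the latest stored snapshot is skipped, so the archive only grows when figures are revised. Converting the spreadsheet to CSV would need an XLSX reader and is not shown.

```python
import hashlib
import urllib.request
from datetime import date
from pathlib import Path
from typing import Optional


def fetch(url: str) -> bytes:
    """Download the raw bytes at `url` (the URL is supplied by the caller)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def archive_snapshot(content: bytes, archive_dir: Path, day: date,
                     suffix: str = ".xlsx") -> Optional[Path]:
    """Store one day's download under an ISO-dated file name.

    Returns the path written, or None when the content is byte-for-byte
    identical to the most recent snapshot already in the archive.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(content).hexdigest()

    # ISO dates sort chronologically, so the last name is the latest snapshot.
    existing = sorted(archive_dir.glob(f"*{suffix}"))
    if existing:
        latest_digest = hashlib.sha256(existing[-1].read_bytes()).hexdigest()
        if latest_digest == digest:
            return None

    # A second run on the same day with changed content overwrites that day's file.
    target = archive_dir / f"{day.isoformat()}{suffix}"
    target.write_bytes(content)
    return target
```

A scheduled job could then call something like `archive_snapshot(fetch("https://example.org/wales-latest.xlsx"), Path("archive/wales"), date.today())` (placeholder URL) and commit the archive directory, which would also give a commit history of revisions.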
Thanks so much for all your help and assistance. The ever-changing goalposts in the ways the different countries chose to deal with their data, make it available, and change it every five minutes were a nightmare, and your repository has been a godsend!
Hello Tom, Guido
Thanks Guido. BTW you can get data for the other nations (except NI) at the links listed here: https://github.com/tomwhite/covid-19-uk-data#data-sources
Thanks for all your work Tom. Your data enabled us to build our application https://covidlive.co.uk. We'll be maintaining a limited fork of this repo at https://github.com/geeogi/covid-19-uk-data while we migrate to a new service.
Hello Tom, I think it is still possible to keep this project's code relevant by slightly changing its purpose. If, instead of dealing with just the UK, the project covered global statistics, it could still be useful. Many countries still do not provide easily consumable machine-readable data.
I would like to deprecate this repo and encourage consumers to move to official upstream data sources. I'd like to stop updates in a month's time (1 August 2020).
When I started curating UK COVID-19 data in early March, numbers for people tested, confirmed cases, and deaths were only available on web pages, with no historical time series. That has now changed, with all the UK health agencies (except Northern Ireland, see below) providing machine-readable historical datasets. In fact, most of the datasets are now much richer than the data provided in this repository, including data such as the number of hospitalizations and calls to helplines. For that reason, people who are working with COVID-19 data will typically be using the upstream sources anyway, to access this richer data.
As a case in point, the debate over Pillar 2 data has meant that the confirmed case numbers for England have become potentially misleading, so I have stopped providing them from this repository (#67). The data is still available from https://coronavirus.data.gov.uk/downloads/csv/coronavirus-cases_latest.csv, and in the last few days PHE have published week-level case numbers for England that contain Pillar 2 data (see the spreadsheet on this page: https://www.gov.uk/government/publications/national-covid-19-surveillance-reports). The hope is that they will publish this information at daily granularity. Until they do, this illustrates the fact that working with COVID data is messy and necessarily involves working with multiple sources of data, even with efforts like this one.
The lack of machine-readable data for Northern Ireland is another unfortunate reality. I have been able to work around this problem in the past by using an undocumented backend API to get the case numbers for LGDs, but this recently broke in such a way that it started reporting incorrect data. I feel it is wrong to rely on this undocumented API, given how it can silently break, and that people who want machine-readable data should make the case to the NI Department of Health (I was not successful in my request to them, see #63).
The data sources that this repo relies on are documented here: https://github.com/tomwhite/covid-19-uk-data#data-sources. Most consumers of the data should be able to move to these sources fairly easily. Most of them are in CSV or JSON format, at known locations, and with stable formats. There may be some challenges, though: URLs that change every day and parsing XLSX (for Wales) on some platforms spring to mind. These are the kind of things that I hope can be fixed by the community or the official providers.
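For sources whose URL changes every day, one possible approach is to generate candidate URLs from a date template and try the most recent first. The Python sketch below is only an illustration: `candidate_urls` and the example template are hypothetical, and the real naming scheme of any given upstream source would need to be checked.

```python
from datetime import date, timedelta
from typing import Iterator


def candidate_urls(template: str, latest: date,
                   max_days_back: int = 3) -> Iterator[str]:
    """Yield dated URLs, newest first, going back `max_days_back` days.

    `template` uses str.format with a `date` field, so any strftime
    pattern can be embedded, e.g. "{date:%Y-%m-%d}".
    """
    for offset in range(max_days_back + 1):
        day = latest - timedelta(days=offset)
        yield template.format(date=day)


# Hypothetical template; substitute the real pattern of the upstream source.
EXAMPLE_TEMPLATE = "https://example.org/data/covid-{date:%Y%m%d}.csv"
```

A consumer would loop over `candidate_urls(EXAMPLE_TEMPLATE, date.today())` and stop at the first URL that downloads successfully, which tolerates a source that has not yet published today's file.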