Moving to quantiles or other statistics #8

Open
giacomoorsi opened this issue Oct 16, 2023 · 0 comments
Labels
data analysis enhancement (Improvement of data analysis or data story), wontfix (This will not be worked on)

Comments

@giacomoorsi
Owner

giacomoorsi commented Oct 16, 2023

It is reasonable to consider showing median delays instead of average delays.

  • Median delays offer a simple intuition: half of the time, the train arrives with a delay lower than the value shown.
  • However, the median is probably not a concept the general population is familiar with.
  • The average delay is very sensitive to trains delayed in exceptional cases (e.g. infrastructure failure). A single delay of 500 minutes shifts the average considerably, but it hardly affects the median. For example, if over a 30-day month a train has 0 minutes of delay every day except one, when it is 120 minutes late, the average delay is 120 / 30 = 4 minutes, which would be displayed orange/red on the website, while the median is 0 minutes.
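
The one-month example above can be checked with a few lines of Python. This is only a sketch: the `delays` list is made-up data that mirrors the example (29 on-time days and one 120-minute delay), not data from the project.

```python
from statistics import mean, median

# Hypothetical month of arrival delays (in minutes) for one train:
# on time for 29 days, 120 minutes late on a single day.
delays = [0] * 29 + [120]

average_delay = mean(delays)    # 120 / 30 = 4 minutes
median_delay = median(delays)   # middle of the sorted values: 0 minutes

print(f"average: {average_delay} min, median: {median_delay} min")
```

The single exceptional day moves the average to 4 minutes, while the median stays at 0.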

However, showing median delays reveals good performance all over Italy for every kind of train, including InterCity trains, which are the most problematic (see picture below), so it doesn't prompt any discussion.

Screenshot 2023-10-16 at 12 36 39

That's why we should perhaps consider moving to other quantiles. For instance, the 0.9 quantile would tell us that 90% of the trains arrive with a delay lower than this value. However, this might be hard to explain to the general public.
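
To make the quantile idea concrete, here is a minimal sketch using the nearest-rank definition of a percentile (the smallest observed value such that at least the given share of observations is at or below it). The function name `percentile_nearest_rank` and the sample `delays` are illustrative assumptions, not part of the project, and other quantile definitions (e.g. with interpolation) would give slightly different numbers.

```python
def percentile_nearest_rank(values, percent):
    """Return the nearest-rank percentile of `values`.

    `percent` is an integer from 1 to 100. The result is the smallest
    observed value such that at least `percent`% of the observations
    are less than or equal to it.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    ordered = sorted(values)
    # 1-based rank, i.e. ceil(percent * n / 100) in integer arithmetic.
    rank = (percent * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical arrival delays (in minutes) for 10 runs of one train.
delays = [0, 0, 1, 1, 2, 3, 4, 6, 15, 90]

median_delay = percentile_nearest_rank(delays, 50)  # 5th smallest value
p90_delay = percentile_nearest_rank(delays, 90)     # 9th smallest value

print(f"median: {median_delay} min, 0.9 quantile: {p90_delay} min")
```

On this made-up sample the median looks fine (2 minutes), while the 0.9 quantile (15 minutes) exposes the late runs that the median hides.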

We could also explore showing the percentage of trains delayed: we set a delay threshold, for instance 3 or 5 minutes, and show the percentage of trains whose delay exceeds it.
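
The threshold-based percentage could be computed along these lines. The helper `percent_delayed` and the sample data are hypothetical, and whether a delay exactly equal to the threshold counts as "delayed" is a choice this sketch makes (it does not), not something decided in this issue.

```python
def percent_delayed(delays, threshold_minutes):
    """Percentage of observations with a delay strictly above the threshold."""
    if not delays:
        raise ValueError("delays must not be empty")
    late = sum(1 for d in delays if d > threshold_minutes)
    return 100 * late / len(delays)


# Hypothetical arrival delays (in minutes) for 10 runs of one train.
delays = [0, 0, 1, 1, 2, 3, 4, 6, 15, 90]

over_3 = percent_delayed(delays, 3)  # delays of 4, 6, 15 and 90 minutes count
over_5 = percent_delayed(delays, 5)  # delays of 6, 15 and 90 minutes count

print(f"more than 3 min late: {over_3}%, more than 5 min late: {over_5}%")
```

Unlike a quantile, this statistic can be read directly as "X% of trains were more than N minutes late", which may be easier to explain to the general public.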

giacomoorsi added the "website enhancement" (Improvement of website) label on Oct 16, 2023
giacomoorsi added the "wontfix" (This will not be worked on) and "data analysis enhancement" (Improvement of data analysis or data story) labels and removed the "website enhancement" (Improvement of website) label on Feb 8, 2024