Generate Stop Arrival Times #13

Open
1 task
radumas opened this issue Mar 14, 2017 · 7 comments


radumas commented Mar 14, 2017

A PostgreSQL function that gets called on every scraper run.
Its output would follow the GTFS spec:

 {trip_id, arrival_time, departure_time, stop_id, stop_sequence}

Note: The times would actually be timestamps.

Which brings up:

  • check the current TTC GTFS schema for trip_ids and stop_ids
@radumas radumas added this to the Generate Station Arrival Times and Performance Metrics milestone Mar 14, 2017
@radumas radumas self-assigned this Mar 14, 2017

radumas commented Mar 15, 2017

Or we could use the form a friend of mine developed, which stores the timestamps as date + interval.
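
A minimal sketch of the date + interval idea, written in Python so it runs standalone. It is not the form referred to above: the 4 a.m. service-day cutoff and both function names are assumptions made for illustration. GTFS stop times may run past 24:00:00 for trips that continue after midnight, which is what the interval part captures here.

```python
from datetime import datetime, time, timedelta

# Assumed cutoff: observations before this hour belong to the previous service day.
SERVICE_DAY_START_HOUR = 4


def split_timestamp(ts, day_start_hour=SERVICE_DAY_START_HOUR):
    """Split a timestamp into (service_date, interval since that date's midnight)."""
    service_date = ts.date()
    if ts.hour < day_start_hour:
        service_date -= timedelta(days=1)
    offset = ts - datetime.combine(service_date, time.min)
    return service_date, offset


def join_timestamp(service_date, offset):
    """Inverse of split_timestamp: date + interval gives back the timestamp."""
    return datetime.combine(service_date, time.min) + offset
```

With this cutoff, an arrival at 00:30 on 2017-03-08 is stored as service date 2017-03-07 plus an interval of 24 hours 30 minutes, and adding the two back together returns the original timestamp.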


radumas commented Mar 15, 2017

First draft

SELECT DISTINCT ON (pollid, lineid, trainid, traindirection, stationid) lineid, create_date, traindirection, trainid, stationid, timint, train_message
FROM ntas_data 
INNER JOIN requests USING (requestid)
INNER JOIN polls USING (pollid)
WHERE train_message = 'AtStation' OR timint < 1
ORDER BY pollid, lineid, trainid, traindirection, stationid, create_date

This leads to some problems with delayed trains:

'lineid' 'create_date' 'traindirection' 'trainid' 'stationid' 'timint' 'train_message'
1 '2017-03-07 20:27:13' 'North' 102 4 0.0 'AtStation'
1 '2017-03-07 20:27:13' 'North' 102 13 0.0 'Delayed'
1 '2017-03-07 20:27:13' 'South' 102 11 0.0 'AtStation'
1 '2017-03-07 20:27:13' 'South' 102 18 0.21618222222222222 'Arriving'
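
In the sample above, the 'Delayed' row reports a timint of 0.0, so it passes the `timint < 1` branch of the WHERE clause and train 102 appears to be at two northbound stations in the same poll. One possible tightening is sketched below in Python on the same four rows. Treating every 'Delayed' row as a non-arrival is an assumption, not a rule settled in this thread, and `is_arrival` is a hypothetical helper name.

```python
# Sample rows copied from the query output above:
# (lineid, create_date, traindirection, trainid, stationid, timint, train_message)
rows = [
    (1, "2017-03-07 20:27:13", "North", 102, 4, 0.0, "AtStation"),
    (1, "2017-03-07 20:27:13", "North", 102, 13, 0.0, "Delayed"),
    (1, "2017-03-07 20:27:13", "South", 102, 11, 0.0, "AtStation"),
    (1, "2017-03-07 20:27:13", "South", 102, 18, 0.21618222222222222, "Arriving"),
]


def is_arrival(train_message, timint):
    """Assumed rule: a 'Delayed' row never counts as an arrival, whatever its timint."""
    if train_message == "Delayed":
        return False
    return train_message == "AtStation" or timint < 1


# Keep only the rows that the assumed rule accepts as arrivals.
arrivals = [row for row in rows if is_arrival(row[6], row[5])]
```

The SQL equivalent would be an extra `train_message <> 'Delayed'` condition wrapped around the existing filter.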


radumas commented Oct 9, 2019

The arrival time inference algorithm I built in SQL has some issues. The graph below shows the difference between inferred and scheduled (GTFS) trips in the number of trips of each length (number of stops).

image
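
The comparison behind a graph like this can be sketched as follows. This is a hedged Python illustration rather than the query actually used: it assumes stop arrivals are available as (trip_id, stop_id) pairs, and both function names are hypothetical.

```python
from collections import Counter


def trip_length_counts(stop_arrivals):
    """Map trip length (number of stop rows) to the number of trips of that length."""
    stops_per_trip = Counter(trip_id for trip_id, _stop_id in stop_arrivals)
    return Counter(stops_per_trip.values())


def length_difference(inferred, scheduled):
    """For each trip length, inferred trip count minus scheduled trip count."""
    inferred_counts = trip_length_counts(inferred)
    scheduled_counts = trip_length_counts(scheduled)
    lengths = sorted(set(inferred_counts) | set(scheduled_counts))
    # Counter returns 0 for lengths that are missing on one side.
    return {n: inferred_counts[n] - scheduled_counts[n] for n in lengths}
```

A positive value at a short length means the inference produced more short trips than the schedule contains.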

Digging into where these problems might come from, an interesting finding is where my SQL query creates a lot of short (2- or 3-station) trips:

image


radumas commented Oct 23, 2019


radumas commented Oct 30, 2019

It turns out the station_char values for the "3 stop trips" are actually for Line 2, not Line 1, so there may be an issue of cross-pollination in the API.


radumas commented Dec 14, 2019

I managed to take a solid chunk out of the really short trips, but have introduced... trips that are somehow longer than scheduled.

image


radumas commented Dec 14, 2019

Grouping by station (getting rid of the platform number at the end) helps.
image
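
A minimal sketch of grouping by station, assuming the platform number is a run of trailing digits on an identifier such as station_char. That format, the placeholder codes, and the function names are assumptions for illustration, not confirmed by this thread.

```python
import re
from collections import defaultdict


def station_of(station_char):
    """Drop the assumed trailing platform digits, e.g. 'ABC1' becomes 'ABC'."""
    return re.sub(r"\d+$", "", station_char)


def group_by_station(station_chars):
    """Group platform-level codes under their station-level code."""
    groups = defaultdict(list)
    for code in station_chars:
        groups[station_of(code)].append(code)
    return dict(groups)
```

In PostgreSQL the same stripping could be done with `regexp_replace(station_char, '\d+$', '')` before grouping.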
