[BUG] Timing data contains laps with incorrect duplicate lap times #404

Open
theOehrly opened this issue Jun 20, 2023 · 5 comments
@theOehrly
Owner

theOehrly commented Jun 20, 2023

Describe the issue:

For the Qualifying of the 2023 Canadian GP, the timing data for some drivers contains laps that have the exact same lap time as a previous lap.

For example: Perez' first two laps, Verstappen's last two laps

Reference: https://www.fia.com/sites/default/files/2023_09_can_f1_q0_timing_qualifyingsessionlaptimes_v01.pdf

Edit after first investigation:

The laps that have incorrect lap times (and sector 3 times) are laps during which the session was red-flagged. In that case, the lap time and sector 3 time of the previous lap are received again from the API, i.e. the incorrectly duplicated data already exists in the source data.

Expected Behaviour

FastF1 should detect that these values are incorrect and ignore them.

Reproduce the code example:

import fastf1

session = fastf1.get_session(2023, 'Canada', 'Q')
session.load(telemetry=False)

ver = session.laps.pick_driver('VER')

print(ver.loc[:, ('LapNumber', 'Time', 'LapTime')])

Error message:

core           INFO 	Loading data for Canadian Grand Prix - Qualifying [v3.0.4]
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '27', '14', '44', '63', '31', '4', '55', '81', '23', '16', '11', '18', '20', '77', '22', '10', '21', '2', '24']

    LapNumber                   Time                LapTime
0         1.0 0 days 00:25:02.786000                    NaT
1         2.0 0 days 00:26:36.836000                    NaT
2         3.0 0 days 00:28:00.942000 0 days 00:01:24.106000
3         4.0 0 days 00:29:23.785000 0 days 00:01:22.843000
4         5.0 0 days 00:30:54.954000 0 days 00:01:31.169000
5         6.0 0 days 00:32:16.942000 0 days 00:01:21.988000
6         7.0 0 days 00:33:56.601000 0 days 00:01:39.659000
7         8.0 0 days 00:35:22.770000 0 days 00:01:26.169000
8         9.0 0 days 00:36:44.509000 0 days 00:01:21.739000
9        10.0 0 days 00:38:41.690000 0 days 00:01:57.181000
10       11.0 0 days 00:40:02.541000 0 days 00:01:20.851000
11       12.0 0 days 00:47:02.809000                    NaT
12       13.0 0 days 00:48:34.656000                    NaT
13       14.0 0 days 00:49:54.791000 0 days 00:01:20.135000
14       15.0 0 days 00:51:39.433000 0 days 00:01:44.642000
15       16.0 0 days 00:53:10.331000 0 days 00:01:30.898000
16       17.0 0 days 00:54:30.708000 0 days 00:01:20.377000
17       18.0 0 days 00:55:49.800000 0 days 00:01:19.092000
18       19.0 0 days 00:57:16.184000 0 days 00:01:26.384000
19       20.0 0 days 00:58:42.033000 0 days 00:01:25.849000
20       21.0 0 days 01:10:02.660000                    NaT
21       22.0 0 days 01:11:38.752000                    NaT
22       23.0 0 days 01:13:05.811000 0 days 00:01:27.059000
23       24.0 0 days 01:14:31.669000 0 days 00:01:25.858000
24       25.0 0 days 01:21:59.694000 0 days 00:01:25.858000
25       26.0 0 days 01:23:44.223000                    NaT
@theOehrly theOehrly added this to the v3.2.0 milestone Jun 21, 2023
@theOehrly theOehrly added bug Something isn't working accuracy Related to accuracy of timing or telemetry data labels Jun 21, 2023
@AND2797
Contributor

AND2797 commented Sep 15, 2023

I was wondering whether we could check the race_control_messages for RED FLAG and try to find any laps that have red_flag_time < Time < immediate_green_flag. This would require some processing and helper methods for further analysis of the laps and race_control_messages.

I also noticed that the lap "Time" (start time) is, I think, in GMT, while the race control messages may be in local time at the track. Should the race control message times be converted to GMT?
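A rough sketch of the window check proposed above, using plain pandas rather than FastF1 itself. The lap values are Verstappen's last laps from the output in the issue; the red-flag window (`red_flag_start`, `restart`) and the helper name `laps_during_red_flag` are hypothetical and only illustrate the comparison. It also assumes both the laps and the flag times are already expressed on the same clock (here: session time), which is exactly the conversion question raised above.

```python
import pandas as pd


def laps_during_red_flag(laps, red_flag_start, restart):
    """Mark laps whose 'Time' lies strictly inside the red-flag window.

    Hypothetical helper: implements red_flag_time < Time < immediate_green_flag.
    """
    return (laps["Time"] > red_flag_start) & (laps["Time"] < restart)


# Laps 23-25 for VER, copied from the output in the issue description.
laps = pd.DataFrame({
    "LapNumber": [23, 24, 25],
    "Time": pd.to_timedelta(["01:13:05.811", "01:14:31.669", "01:21:59.694"]),
    "LapTime": pd.to_timedelta(["00:01:27.059", "00:01:25.858", "00:01:25.858"]),
})

# ASSUMPTION: made-up red-flag window in session time, for illustration only.
red_flag_start = pd.Timedelta("01:15:00")
restart = pd.Timedelta("01:23:00")

suspect = laps_during_red_flag(laps, red_flag_start, restart)
print(laps.loc[suspect, ["LapNumber", "LapTime"]])
```

With real data, the window would have to be derived from the race control messages, and a lap that merely ends inside the window is only a candidate for closer inspection, not proof of a wrong lap time.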

@d-tomasino
Contributor

d-tomasino commented Sep 18, 2023

import fastf1

session = fastf1.get_session(2023, 'Canada', 'Q')
session.load(telemetry=False)

ver = session.laps.pick_driver('VER')

# copy the selected columns so that the helper columns are not added to a view of the laps
ver_df = ver.loc[:, ['LapNumber', 'Time', 'LapTime']].copy()

# difference between each lap's finish time and the previous lap's finish time
ver_df['sub_time'] = ver_df['Time'].diff()

# True if 'sub_time' equals 'LapTime'
ver_df['bool_check'] = ver_df['sub_time'] == ver_df['LapTime']
# True if 'LapTime' equals the 'LapTime' of the previous row
ver_df['bool_previous_lap'] = ver_df['LapTime'] == ver_df['LapTime'].shift(1)


# if "bool_check" is False and "bool_previous_lap" is True, set "LapTime" to NaT
ver_df['LapTime'] = ver_df['LapTime'].mask(~ver_df['bool_check'] & ver_df['bool_previous_lap'])

# remove the helper columns that were used to detect the duplicates
ver_df = ver_df.drop(columns=['sub_time', 'bool_check', 'bool_previous_lap'])

I tried something like this, but please correct me if it could end up discarding "useful" laps. This piece of code only adds some temporary columns to check two conditions:

  1. The first condition checks whether the difference to the previous lap's "Time" equals the current "LapTime". For a normal lap, it should always return True;
  2. The second condition checks whether the current "LapTime" equals the previous "LapTime". This strengthens the first condition by also checking whether the lap is a possible duplicate.

So, if the first condition is False and the second condition is True, we can set that lap time to None. Finally, the temporary columns are removed from the dataframe.
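For illustration, here are the two conditions applied to Verstappen's last three timed laps, with the values copied from the output in the issue description. This is a minimal standalone sketch in plain pandas; the function name `flag_repeated_lap_times` is made up for this example and is not part of FastF1.

```python
import pandas as pd


def flag_repeated_lap_times(laps):
    """Flag laps that repeat the previous lap time although the elapsed
    session time since the previous lap does not match that lap time."""
    # Condition 1: elapsed time since the previous lap equals 'LapTime'.
    matches_elapsed = laps["Time"].diff() == laps["LapTime"]
    # Condition 2: 'LapTime' is identical to the previous row's 'LapTime'.
    repeats_previous = laps["LapTime"] == laps["LapTime"].shift(1)
    # Suspicious only if condition 1 is False and condition 2 is True.
    return ~matches_elapsed & repeats_previous


# Laps 23-25 for VER, copied from the output in the issue description.
laps = pd.DataFrame({
    "LapNumber": [23, 24, 25],
    "Time": pd.to_timedelta(["01:13:05.811", "01:14:31.669", "01:21:59.694"]),
    "LapTime": pd.to_timedelta(["00:01:27.059", "00:01:25.858", "00:01:25.858"]),
})

flags = flag_repeated_lap_times(laps)
# Replace flagged lap times with NaT (the default fill value of Series.mask).
cleaned = laps["LapTime"].mask(flags)
print(pd.DataFrame({"LapNumber": laps["LapNumber"], "flagged": flags, "LapTime": cleaned}))
```

Lap 24 ends 1:25.858 after lap 23, so it is consistent and kept. Lap 25 ends 7:28.025 after lap 24 but reports 1:25.858 again, so it is the only lap flagged.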

@theOehrly
Owner Author

@d-tomasino this seems to work, although I'm not entirely happy with a solution like this, because it simply assumes that any lap time matching these criteria is incorrect. F1 drivers surprisingly often set two successive laps with exactly the same time (this can actually happen multiple times per race).

So this would need more extensive testing on multiple sessions, with manual verification that the removed laps were correctly detected.

Additionally, your first check is in theory already implemented in the API parser. It should warn the user about "timing integrity errors", but apparently it is not triggered here. Before fixing this, we should figure out why this warning is not shown, because there has to be something else going on.

@d-tomasino
Contributor

@theOehrly thanks for the reply! You're right, it's understandable that two consecutive laps with exactly the same time are not that rare. However, in that case (as far as I understand) the difference in "Time" between the two adjacent rows should match the "LapTime" value. So for two genuine consecutive laps, the two conditions should be True and True, instead of False and True as in this case (the red flag issue). Obviously I could be wrong, so please correct me if I got something wrong.

In any case, as soon as I can, I'll try to take a look at the "timing integrity errors" warning that is not shown first, so that we can solve this step by step.
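To make the True/True versus False/True distinction concrete, here is a small sketch with made-up data: three genuine laps with an identical lap time, followed by one lap that repeats the same lap time after a long interruption. All values and column names in `checks` are invented for this example.

```python
import pandas as pd

# ASSUMPTION: synthetic laps. Rows 0-2 are genuine 1:30.000 laps set back to
# back; row 3 reports 1:30.000 again although 7 minutes have passed.
laps = pd.DataFrame({
    "Time": pd.to_timedelta(["00:10:00", "00:11:30", "00:13:00", "00:20:00"]),
    "LapTime": pd.to_timedelta(["00:01:30"] * 4),
})

checks = pd.DataFrame({
    # Condition 1: elapsed time since the previous lap equals 'LapTime'.
    "matches_elapsed": laps["Time"].diff() == laps["LapTime"],
    # Condition 2: 'LapTime' equals the previous row's 'LapTime'.
    "repeats_previous": laps["LapTime"] == laps["LapTime"].shift(1),
})
# Only the False/True combination is treated as a suspected duplicate.
checks["flagged"] = ~checks["matches_elapsed"] & checks["repeats_previous"]
print(checks)
```

The genuine repeated laps (rows 1 and 2) come out as True/True and are kept; only row 3 is False/True. This does not settle whether every False/True lap in real data is actually wrong, which is the part that still needs manual verification.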

@theOehrly
Owner Author

theOehrly commented Jul 9, 2024

Noting that this may not just be limited to laps near red flags, see #612. Also remember to investigate potential relation with #473
