date_time_between does not generate same values on multiple runs with time passed between runs #2149

Fashimpaur · 2025-01-13T19:11:15Z

Faker version: 33.3.8
OS: Sonoma 14.6.1

faker date_time_between does not reproduce the same date and time for a given seed.

Steps to reproduce

Run the code:

from faker import Faker

fake = Faker()
seed_start = 2595
Faker.seed(seed_start)
for i in range(10):
    print(fake.date_time_between('-3y'))
    if i == 4:  # reset the seed so the second 5 passes repeat the first 5
        Faker.seed(seed_start)

Output:

2024-02-08 17:08:30.053243
2024-01-04 09:40:53.843103
2024-12-22 22:12:02.216101
2025-01-08 12:54:04.497594
2025-01-05 16:27:42.947237
2024-02-08 17:08:30.053243    # the same values are repeated for the second 5 times
2024-01-04 09:40:53.843103
2024-12-22 22:12:02.216101
2025-01-08 12:54:04.497594
2025-01-05 16:27:42.947237

Run the code again... after the delay to post the first results:

2024-02-08 17:12:24.053243
2024-01-04 09:44:47.843103
2024-12-22 22:15:56.216101
2025-01-08 12:57:58.497594
2025-01-05 16:31:36.947237
2024-02-08 17:12:24.053243
2024-01-04 09:44:47.843103
2024-12-22 22:15:56.216101
2025-01-08 12:57:58.497594
2025-01-05 16:31:36.947237

Expected behavior

Expected that the values should be the same as the first time:

2024-02-08 17:08:30.053243
2024-01-04 09:40:53.843103
2024-12-22 22:12:02.216101
2025-01-08 12:54:04.497594
2025-01-05 16:27:42.947237
2024-02-08 17:08:30.053243
2024-01-04 09:40:53.843103
2024-12-22 22:12:02.216101
2025-01-08 12:54:04.497594
2025-01-05 16:27:42.947237


stefan6419846 · 2025-01-13T19:18:23Z

Without constant time input values, this is not really surprising. Your time input value is non-constant because it is relative to the current time.

Fashimpaur · 2025-01-13T20:11:58Z

@stefan6419846 But I do not enter a time. Am I supposed to in order to accomplish what I am trying to do?

stefan6419846 · 2025-01-14T06:29:22Z

If you have a look at the method signature

faker/faker/providers/date_time/__init__.py

Lines 2033 to 2038 in bbcab85

    
    def date_time_between(
        self,
        start_date: DateParseType = "-30y",
        end_date: DateParseType = "now",
        tzinfo: Optional[TzInfo] = None,
    ) -> datetime:

then you will see that without specifying a fixed start and end date, this will always depend on the current time. To get reproducible results, you will have to pin both of them in addition to setting an explicit seed.

Fashimpaur · 2025-01-14T08:08:23Z

That is absurd. Clearly my examples show the dates are reproducible for the same seed and only the time drifts from one invocation to the next. If both a start and end datetime were required as you imply, then even the dates should drift. See how the date parts on lines 1&6, 2&7, 3&8, etc. in the actual output match but the time values drift.


fcurella · 2025-01-14T15:04:51Z

Invoking date_time_between with a relative start_date will assume that end_date is the time of invocation, leading to different results because the parameters are different. Using a seed will not freeze time.
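The effect described above can be reproduced with the standard library alone. The sketch below is a simplified model, not Faker's implementation: `pick_between` is a hypothetical helper that assumes the value is one seeded, uniform draw between the two resolved bounds, and `window` is only a rough stand-in for `'-3y'`.

```python
import random
from datetime import datetime, timedelta


def pick_between(seed: int, start: datetime, end: datetime) -> datetime:
    """Simplified model: one seeded, uniform draw between two resolved bounds."""
    fraction = random.Random(seed).random()  # same seed -> same fraction
    return start + (end - start) * fraction


window = timedelta(days=3 * 365)            # rough stand-in for '-3y' (assumption)
elapsed = timedelta(minutes=3, seconds=54)  # time that passed between two runs

first_now = datetime(2025, 1, 13, 19, 0, 0)
second_now = first_now + elapsed

# Both bounds are resolved relative to "now", so the whole window slides.
first = pick_between(2595, first_now - window, first_now)
second = pick_between(2595, second_now - window, second_now)

drift = second - first
print(first)
print(second)
print(drift)
```

In this model the seed fixes only the position inside the window. When the window slides by the elapsed time, every value slides with it, which is consistent with the original report, where each timestamp moved by 3 minutes 54 seconds while the microseconds stayed the same.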

As suggested by @stefan6419846 , you need to pin the end_date:

from datetime import datetime

from faker import Faker

fake = Faker()
seed_start = int(2595)
Faker.seed(seed_start)
end_date = datetime.now()  # Freeze the upper bound
for i in range(10):
    print(fake.date_time_between('-3y', end_date=end_date))  # pass end_date to Faker, not to print()
    if i == 4:  # reset the seed  to clear previous values for the second 5 passes
        Faker.seed(seed_start)

Fashimpaur · 2025-01-14T18:22:00Z

Ok. I think you also missed my issue. Pinning the end_date falsely gives you the impression that the date is locked in and reproducible. Let me show you why it is not: I used your code exactly as it is above and ran it. I got the following result:

![image](https://github.com/user-attachments/assets/dc40904a-a746-4d65-af53-71364c92f038)

In the time it took me to write this so far and copy, highlight, and paste the data, this is the result I got when I ran it again:

![image](https://github.com/user-attachments/assets/27bb6a90-46d0-4189-a54b-35b8d0c29d68)

You can see that the dates in all groupings of 5, in both the first run and the second, are the same. However, you must also see that the times (faked, not datetime.now() values) are different between the first run and the second run. For some the hour has changed, for some the minutes, and for the rest the seconds. This IS the problem. The microseconds reproduce the same fake value in the corresponding output row in all tests, but the hours, minutes, and seconds do not stay the same between runs. Even with an end_date pinning the value for time (which, by the way, is also changing, as time does not stop), it is not really pinned. And if you run the code with a fixed end_date, it will still not keep the hours, minutes, and seconds locked for each run of Faker.date_time_between.

To get around this Faker shortcoming, I have created the following two functions in a utilities.py script as a fix. I pass my Faker instance to them so that they use the same instance as the calling script:

    from datetime import datetime, time

    from faker import Faker

    def generate_date_time_between(fake: Faker, offset: str = '-3y'):
        dt = fake.date_between(offset)
        tm = generate_fake_time(fake)
        return datetime.combine(dt, tm)

    def generate_fake_time(fake: Faker):
        hour = fake.random_int(min=0, max=23)
        minute = fake.random_int(min=0, max=59)  # renamed so the built-in min() is not shadowed
        sec = fake.random_int(min=0, max=59)
        microsec = fake.random_int(min=0, max=999999)
        return time(hour, minute, sec, microsec)

When run using the following code:

    fake = Faker()
    seed_start = int(2595)
    Faker.seed(seed_start)
    for i in range(10):
        print(generate_date_time_between(fake, '-3y'))
        if i == 4:  # reset the seed to clear previous values for the second 5 passes
            Faker.seed(seed_start)

I get the first run:

2024-02-09 21:49:46.357320
2025-01-06 19:06:12.875047
2022-05-12 11:19:59.946491
2022-03-23 15:02:16.964874
2024-10-12 19:04:18.445665
2024-02-09 21:49:46.357320
2025-01-06 19:06:12.875047
2022-05-12 11:19:59.946491
2022-03-23 15:02:16.964874
2024-10-12 19:04:18.445665

and every run thereafter, regardless of how long I take between runs, I get:

2024-02-09 21:49:46.357320
2025-01-06 19:06:12.875047
2022-05-12 11:19:59.946491
2022-03-23 15:02:16.964874
2024-10-12 19:04:18.445665
2024-02-09 21:49:46.357320
2025-01-06 19:06:12.875047
2022-05-12 11:19:59.946491
2022-03-23 15:02:16.964874
2024-10-12 19:04:18.445665

Again note that the first and second groups of 5 timestamps have the same values in each corresponding row and that all ten between the first and second runs have the same values. No end_date required.


Fashimpaur · 2025-01-15T17:25:08Z

posting just to keep issue alive

fcurella · 2025-01-15T17:32:15Z

I understand it might be inconvenient, but I believe Faker is behaving as intended.

For your use case, I can see a few different options:

set end_date to a specific, hardcoded datetime (eg end_date=datetime(2025, 1, 15, 23, 59, 59))
use something like freezegun
generate time parts with random_int like in your example

Fashimpaur · 2025-01-16T00:04:57Z

So, I have a start_date '-3y' and an end_date of datetime(2025, 1, 14, 23, 59, 59):

    fake = Faker()
    seed_start = int(2595)
    Faker.seed(seed_start)
    for i in range(10):
        print(fake.date_time_between(
            start_date='-3y',
            end_date=datetime(2025, 1, 14, 23, 59, 59))
        )
        if i == 4:  # reset the seed  to clear previous values for the second 5 passes
            Faker.seed(seed_start)

and, when I ran it the first time, I got:

2024-02-10 09:38:53.907332
2024-01-06 02:45:48.356157
2024-12-24 09:31:25.905156
2025-01-09 23:57:14.013255
2025-01-07 03:33:39.685516
2024-02-10 09:38:53.907332
2024-01-06 02:45:48.356157
2024-12-24 09:31:25.905156
2025-01-09 23:57:14.013255
2025-01-07 03:33:39.685516

Then, after some time, I ran it again, and I got:

2024-02-10 09:41:16.572580
2024-01-06 02:48:25.845434
2024-12-24 09:31:34.980653
2025-01-09 23:57:16.114559
2025-01-07 03:33:42.983982
2024-02-10 09:41:16.572580
2024-01-06 02:48:25.845434
2024-12-24 09:31:34.980653
2025-01-09 23:57:16.114559
2025-01-07 03:33:42.983982

The times do not stay the same between runs. Functioning as expected I guess. Should I have locked down the start_date? Maybe it is because I do not include the tzinfo?

I should not expect that someone actually will look at the library and fix the issue. Why bother having an Issues tab on your GitHub repo? You only need to put that everything is functioning as expected and then you don't have to bother with comments like mine.

Yes. You are correct. I will use the random_int time generation in my example. I do not understand what is so incomprehensible. The date portion is repeatable and reproducible for a given seed. The time portion is adrift on a flimsy raft in rough water. I thought that the whole idea of locking something in with a seed was to make sure you could reproduce the data time and again. Apparently, I am mistaken. I will stick to using only the parts of Faker that work as expected and leave the unreliable parts for others.

stefan6419846 · 2025-01-16T13:36:28Z

Changing tzinfo should not have any effect as long as the machine running the code is not changed. You will have to lock the start date for reproducible builds, yes - as already mentioned, you are still using a value relative to the UNIX timestamp you are currently running on. This is just how it works as the seed does not pin the time - this is why you should use a fixed start date or some time mocking mechanism as proposed multiple times before.

> I should not expect that someone actually will look at the library and fix the issue. Why bother having an Issues tab on your GitHub repo? You only need to put that everything is functioning as expected and then you don't have to bother with comments like mine.

You have received responses explaining why the current behavior makes sense from both the maintainer and someone who happens to stroll around here from time to time (and has contributed some smaller changes). If this is not what you want, you can always write your own generator code instead - the faker library is a FOSS project which offers you a wide range of different data generator methods, but nobody forces you to actually make use of it (or all of them). Just looking at the other issues and PRs (as well as on the releases), you will quickly see that indeed regular bugfixes and enhancements are being made.

fcurella · 2025-01-16T14:11:07Z

If the end_date and the start_date are the same between runs, then the range is consistent, and using the same seed should indeed give you the same result. I'll take a look at this when I'm back at my desk in a couple of hours.

@Fashimpaur I didn't mean to sound dismissive of your issue.

I understand you're frustrated, but please let's all make an effort to stay courteous. And let's remember we are all volunteers here, donating the little spare time we have to the project 🙂

fcurella · 2025-01-16T15:05:08Z

I can confirm this is indeed a bug, and I can reproduce.

The issue is that the string parser interprets -3y as "3 years before now" and doesn't account for end_date.
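A minimal sketch of that diagnosis, using only the standard library. This is not the actual patch: `pick_between` and `draw` are hypothetical helpers, the draw is assumed to be uniform between the resolved bounds, and `window` is a rough stand-in for `'-3y'`. It contrasts resolving the relative offset against the current time with resolving it against the fixed `end_date`.

```python
import random
from datetime import datetime, timedelta


def pick_between(seed: int, start: datetime, end: datetime) -> datetime:
    """Simplified model: one seeded, uniform draw between two resolved bounds."""
    fraction = random.Random(seed).random()
    return start + (end - start) * fraction


window = timedelta(days=3 * 365)  # rough stand-in for '-3y' (assumption)
end_date = datetime(2025, 1, 14, 23, 59, 59)


def draw(now: datetime, anchor_on_end_date: bool) -> datetime:
    # The relative offset is resolved against either "now" or the fixed end_date.
    anchor = end_date if anchor_on_end_date else now
    return pick_between(2595, anchor - window, end_date)


run_1 = datetime(2025, 1, 15, 18, 0, 0)
run_2 = run_1 + timedelta(minutes=5)

drifts = draw(run_1, False) != draw(run_2, False)  # start follows the clock
stable = draw(run_1, True) == draw(run_2, True)    # start follows end_date
print(drifts, stable)
```

In this model, anchoring on the clock moves only the lower bound, so each value shifts by the elapsed time scaled by its distance from `end_date`. That is consistent with the uneven drift in the report above, where early-2024 values moved by minutes and January 2025 values by only a few seconds.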

I'm working on a fix.

Fashimpaur · 2025-01-16T15:58:21Z

Awesome, thanks!


fcurella · 2025-01-16T17:10:23Z

@Fashimpaur can you check out the patch/2149 branch and confirm whether it fixes your issue?

Fashimpaur · 2025-01-16T21:29:51Z

@fcurella, I ran it twice with some delay between runs. I get the following results:

2024-02-10 04:06:46.053243        2024-02-10 04:06:46.053243
2024-01-05 20:39:09.843103        2024-01-05 20:39:09.843103
2024-12-24 09:10:18.216101        2024-12-24 09:10:18.216101
2025-01-09 23:52:20.497594        2025-01-09 23:52:20.497594
2025-01-07 03:25:58.947237        2025-01-07 03:25:58.947237
2024-02-10 04:06:46.053243        2024-02-10 04:06:46.053243
2024-01-05 20:39:09.843103        2024-01-05 20:39:09.843103
2024-12-24 09:10:18.216101        2024-12-24 09:10:18.216101
2025-01-09 23:52:20.497594        2025-01-09 23:52:20.497594
2025-01-07 03:25:58.947237        2025-01-07 03:25:58.947237

Based on the test, this is fixed and exactly what I expected from Faker! Awesome job and many thanks to you and the Faker team.

Dennis


Fashimpaur · 2025-01-16T21:40:33Z

@fcurella,

I apologize that I did not pay attention to the fact that it was you doing the change. I edited my comment. Can you ping here when the version bump is complete that includes this change? Thanks again!

fcurella · 2025-01-22T17:00:41Z

I've just released the fix in v34.0.0

fcurella · 2025-01-23T15:20:07Z

Hi @Fashimpaur ,

I'll have to revert the fix for this issue; it's causing too many backward compatibility issues.

For your use case, would it be acceptable to use absolute datetime objects rather than relative ones? Something like:

    fake = Faker()
    seed_start = int(2595)
    Faker.seed(seed_start)
    for i in range(10):
        print(fake.date_time_between(
            start_date=datetime(2022, 1, 14, 23, 59, 59),
            end_date=datetime(2025, 1, 14, 23, 59, 59),
        ))
        if i == 4:  # reset the seed  to clear previous values for the second 5 passes
            Faker.seed(seed_start)

Fashimpaur · 2025-01-23T19:04:31Z

I can do that with minimal effort to refactor. I ran 3 tests:

2024-02-10 02:01:43.101736        2024-02-10 02:01:43.101736        2024-02-10 02:01:43.101736
2024-01-05 18:21:07.276604        2024-01-05 18:21:07.276604        2024-01-05 18:21:07.276604
2024-12-24 09:02:20.923892        2024-12-24 09:02:20.923892        2024-12-24 09:02:20.923892
2025-01-09 23:50:29.987263        2025-01-09 23:50:29.987263        2025-01-09 23:50:29.987263
2025-01-07 03:23:05.476628        2025-01-07 03:23:05.476628        2025-01-07 03:23:05.476628
2024-02-10 02:01:43.101736        2024-02-10 02:01:43.101736        2024-02-10 02:01:43.101736
2024-01-05 18:21:07.276604        2024-01-05 18:21:07.276604        2024-01-05 18:21:07.276604
2024-12-24 09:02:20.923892        2024-12-24 09:02:20.923892        2024-12-24 09:02:20.923892
2025-01-09 23:50:29.987263        2025-01-09 23:50:29.987263        2025-01-09 23:50:29.987263
2025-01-07 03:23:05.476628        2025-01-07 03:23:05.476628        2025-01-07 03:23:05.476628

There were varying delays between capturing the dates and they reliably returned the same values.

Problem resolved. Your change did fix it for relative dates and I would have liked to see that too but I am guessing it is causing too many issues with approval to merge the PR.

Thanks @fcurella. I appreciate your time investment in this.

fcurella added a commit that referenced this issue Jan 16, 2025

Fix #2149. Account for end_date when calculating relative `date_time_between` (f222129)

fcurella added a commit that referenced this issue Jan 16, 2025

Fix #2149. Account for end_date when calculating relative `date_time_between` (e61d064)

fcurella closed this as completed in a0e656c Jan 17, 2025

fcurella reopened this Jan 23, 2025
