date_time_between does not generate same values on multiple runs with time passed between runs #2149

Open
Fashimpaur opened this issue Jan 13, 2025 · 19 comments

Comments

@Fashimpaur

  • Faker version: 33.3.8
  • OS: macOS Sonoma 14.6.1

Faker's date_time_between does not reproduce the same date and time for a given seed when time passes between runs.

Steps to reproduce

Run the code:

from faker import Faker

fake = Faker()
seed_start = int(2595)
Faker.seed(seed_start)
for i in range(10):
    print(fake.date_time_between('-3y'))
    if i == 4:  # reset the seed to clear previous values for the second 5 passes
        Faker.seed(seed_start)

Output:

2024-02-08 17:08:30.053243
2024-01-04 09:40:53.843103
2024-12-22 22:12:02.216101
2025-01-08 12:54:04.497594
2025-01-05 16:27:42.947237
2024-02-08 17:08:30.053243    # the same values are repeated for the second 5 times
2024-01-04 09:40:53.843103
2024-12-22 22:12:02.216101
2025-01-08 12:54:04.497594
2025-01-05 16:27:42.947237

Run the code again... after the delay to post the first results:

2024-02-08 17:12:24.053243
2024-01-04 09:44:47.843103
2024-12-22 22:15:56.216101
2025-01-08 12:57:58.497594
2025-01-05 16:31:36.947237
2024-02-08 17:12:24.053243
2024-01-04 09:44:47.843103
2024-12-22 22:15:56.216101
2025-01-08 12:57:58.497594
2025-01-05 16:31:36.947237

Expected behavior

Expected that the values should be the same as the first time:

2024-02-08 17:08:30.053243
2024-01-04 09:40:53.843103
2024-12-22 22:12:02.216101
2025-01-08 12:54:04.497594
2025-01-05 16:27:42.947237
2024-02-08 17:08:30.053243
2024-01-04 09:40:53.843103
2024-12-22 22:12:02.216101
2025-01-08 12:54:04.497594
2025-01-05 16:27:42.947237
@stefan6419846
Contributor

Without constant time input values, this is not really surprising. Your time input value is non-constant because it is relative to the current time.

@Fashimpaur
Author

@stefan6419846 But I do not enter a time. Am I supposed to in order to accomplish what I am trying to do?

@stefan6419846
Contributor

If you have a look at the method signature

def date_time_between(
    self,
    start_date: DateParseType = "-30y",
    end_date: DateParseType = "now",
    tzinfo: Optional[TzInfo] = None,
) -> datetime:
then you will see that without specifying a fixed start and end date, this will always depend on the current time. To get reproducible results, you will have to pin both of them in addition to setting an explicit seed.
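
To illustrate the point above, here is a minimal stdlib-only sketch, not Faker's actual implementation. It assumes the random datetime is drawn as start + (end - start) * r, with r coming from a seeded generator; pick_between and run are hypothetical names, and "-3y" is approximated as 3 * 365 days. When both bounds are resolved against the clock, the same seed yields the same r, but every result moves by however much time passed between runs (3 minutes 54 seconds here, which matches the shift between the two outputs reported above).

```python
import random
from datetime import datetime, timedelta


def pick_between(rng, start, end):
    """Toy stand-in for a seeded draw of a datetime in [start, end]."""
    return start + (end - start) * rng.random()


def run(now, seed=2595, count=3):
    """Simulate one script run whose bounds are both relative to `now`."""
    rng = random.Random(seed)              # same seed on every run
    start = now - timedelta(days=3 * 365)  # "-3y", resolved against the clock
    end = now                              # "now", resolved against the clock
    return [pick_between(rng, start, end) for _ in range(count)]


# Two runs with the same seed, started 3 min 54 s apart (assumed values).
first_now = datetime(2025, 1, 13, 12, 0, 0)
second_now = first_now + timedelta(minutes=3, seconds=54)

first = run(first_now)
second = run(second_now)

# Every value is shifted by exactly the time that passed between the runs.
shifts = [b - a for a, b in zip(first, second)]
print(shifts)
```

The seed fixes the sequence of random draws, not the interval they are mapped onto, so the interval has to be fixed separately.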

@Fashimpaur
Author

Fashimpaur commented Jan 14, 2025 via email

@fcurella
Collaborator

Invoking date_time_between with a relative start_date will assume that end_date is the time of invocation, leading to different results because the parameters are different. Using a seed will not freeze time.

As suggested by @stefan6419846 , you need to pin the end_date:

from datetime import datetime

from faker import Faker

fake = Faker()
seed_start = int(2595)
Faker.seed(seed_start)
end_date = datetime.now()  # Freeze the upper bound
for i in range(10):
    print(fake.date_time_between('-3y', end_date))  # pass end_date to Faker, not to print()
    if i == 4:  # reset the seed  to clear previous values for the second 5 passes
        Faker.seed(seed_start)

@Fashimpaur
Author

Fashimpaur commented Jan 14, 2025 via email

@Fashimpaur
Author

posting just to keep issue alive

@fcurella
Collaborator

I understand it might be inconvenient, but I believe Faker is behaving as intended.

For your use case, I can see a few different options:

  • set end_date to a specific, hardcoded datetime (e.g. end_date=datetime(2025, 1, 15, 23, 59, 59))
  • use something like freezegun
  • generate time parts with random_int like in your example
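
As a rough illustration of the third option, here is a stdlib-only sketch. It is an assumption about what such an approach could look like, since the example being referred to is not shown in this thread: random.Random.randint stands in for Faker's random_int, and seeded_datetimes and the fixed date range are hypothetical. The point is that the clock is never read, so only the seed and the hardcoded bounds determine the output.

```python
import random
from datetime import date, datetime, time, timedelta


def seeded_datetimes(seed, first_day, last_day, count):
    """Build datetimes from seeded integer draws only; the clock is never read."""
    rng = random.Random(seed)
    span_days = (last_day - first_day).days
    results = []
    for _ in range(count):
        # Date part: a whole number of days inside the fixed range.
        day = first_day + timedelta(days=rng.randint(0, span_days))
        # Time parts: independent integer draws, as random_int would provide.
        moment = time(
            hour=rng.randint(0, 23),
            minute=rng.randint(0, 59),
            second=rng.randint(0, 59),
        )
        results.append(datetime.combine(day, moment))
    return results


run_one = seeded_datetimes(2595, date(2022, 1, 14), date(2025, 1, 14), 5)
run_two = seeded_datetimes(2595, date(2022, 1, 14), date(2025, 1, 14), 5)
print(run_one == run_two)  # True: same seed and same fixed bounds
```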

@Fashimpaur
Author

Fashimpaur commented Jan 16, 2025

So, I have a start_date '-3y' and an end_date of datetime(2025, 1, 14, 23, 59, 59):

    from datetime import datetime

    from faker import Faker

    fake = Faker()
    seed_start = int(2595)
    Faker.seed(seed_start)
    for i in range(10):
        print(fake.date_time_between(
            start_date='-3y',
            end_date=datetime(2025, 1, 14, 23, 59, 59),
        ))
        if i == 4:  # reset the seed to clear previous values for the second 5 passes
            Faker.seed(seed_start)

and, when I ran it the first time, I got:

2024-02-10 09:38:53.907332
2024-01-06 02:45:48.356157
2024-12-24 09:31:25.905156
2025-01-09 23:57:14.013255
2025-01-07 03:33:39.685516
2024-02-10 09:38:53.907332
2024-01-06 02:45:48.356157
2024-12-24 09:31:25.905156
2025-01-09 23:57:14.013255
2025-01-07 03:33:39.685516

Then, after some time, I ran it again, and I got:

2024-02-10 09:41:16.572580
2024-01-06 02:48:25.845434
2024-12-24 09:31:34.980653
2025-01-09 23:57:16.114559
2025-01-07 03:33:42.983982
2024-02-10 09:41:16.572580
2024-01-06 02:48:25.845434
2024-12-24 09:31:34.980653
2025-01-09 23:57:16.114559
2025-01-07 03:33:42.983982

The times do not stay the same between runs. Functioning as expected I guess. Should I have locked down the start_date? Maybe it is because I do not include the tzinfo?

I should not expect that someone actually will look at the library and fix the issue. Why bother having an Issues tab on your GitHub repo? You only need to put that everything is functioning as expected and then you don't have to bother with comments like mine.

Yes. You are correct. I will use the random_int time generation in my example. I do not understand what is so incomprehensible. The date portion is repeatable and reproducible for a given seed. The time portion is adrift on a flimsy raft in rough water. I thought that the whole idea of locking something in with a seed was to make sure you could reproduce the data time and again. Apparently, I am mistaken. I will stick to using only the parts of Faker that work as expected and leave the unreliable parts for others.

@stefan6419846
Contributor

Changing tzinfo should not have any effect as long as the machine running the code is not changed. You will have to lock the start date for reproducible results, yes. As already mentioned, you are still using a value relative to the current time on the machine running the code. The seed does not pin the time, which is why you should use a fixed start date or some time-mocking mechanism, as proposed multiple times before.

I should not expect that someone actually will look at the library and fix the issue. Why bother having an Issues tab on your GitHub repo? You only need to put that everything is functioning as expected and then you don't have to bother with comments like mine.

You have received responses explaining why the current behavior makes sense from both the maintainer and someone who happens to stroll around here from time to time (and has contributed some smaller changes). If this is not what you want, you can always write your own generator code instead - the faker library is a FOSS project which offers you a wide range of different data generator methods, but nobody forces you to actually make use of it (or all of them). Just looking at the other issues and PRs (as well as on the releases), you will quickly see that indeed regular bugfixes and enhancements are being made.

@fcurella
Collaborator

If the end_date and the start_date are the same across runs, then the range is consistent, and using the same seed should indeed give you the same result. I'll take a look at this when I'm back at my desk in a couple of hours.

@Fashimpaur I didn't mean to sound dismissive of your issue.

I understand you're frustrated, but please let's all make an effort to stay courteous. And let's remember we are all volunteers here, donating the little spare time we have to the project 🙂

@fcurella
Collaborator

I can confirm this is indeed a bug, and I can reproduce.

The issue is that the string parser interprets -3y as "3 years before now" and doesn't account for end_date, so the start of the range still moves with the clock even when end_date is fixed.

I'm working on a fix.
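
The difference can be sketched with the standard library alone. This is a toy model, not Faker's parser or the contents of the patch: pick_between and run are hypothetical names, "-3y" is approximated as 3 * 365 days, and resolving the relative start against end_date is only an assumption about what accounting for end_date could mean. With a pinned end_date, a start resolved against the clock still drifts between runs, while a start resolved against end_date does not.

```python
import random
from datetime import datetime, timedelta

THREE_YEARS = timedelta(days=3 * 365)  # rough stand-in for "-3y"


def pick_between(rng, start, end):
    """Toy stand-in for a seeded draw of a datetime in [start, end]."""
    return start + (end - start) * rng.random()


def run(now, end, relative_to_end, seed=2595, count=3):
    """Simulate one run; `relative_to_end` chooses what "-3y" is measured from."""
    rng = random.Random(seed)
    reference = end if relative_to_end else now
    start = reference - THREE_YEARS
    return [pick_between(rng, start, end) for _ in range(count)]


end = datetime(2025, 1, 14, 23, 59, 59)  # pinned upper bound
now_a = datetime(2025, 1, 16, 9, 0, 0)   # clock at the first run (assumed)
now_b = now_a + timedelta(minutes=5)     # clock at the second run (assumed)

# Start measured from the clock: the range changes, so the values change.
drifting = run(now_a, end, False) != run(now_b, end, False)
# Start measured from end_date: the range is fixed, so the values repeat.
stable = run(now_a, end, True) == run(now_b, end, True)
print(drifting, stable)
```

In the drifting case the shift is not constant: values near end_date barely move and values near the start move the most, which is the pattern in the two outputs posted above.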

@Fashimpaur
Author

Fashimpaur commented Jan 16, 2025 via email

@fcurella
Collaborator

@Fashimpaur can you check out the patch/2149 branch and confirm whether it fixes your issue?

@Fashimpaur
Author

Fashimpaur commented Jan 16, 2025 via email

@Fashimpaur
Author

@fcurella,

I apologize that I did not pay attention to the fact that it was you doing the change. I edited my comment. Can you ping here when the version bump is complete that includes this change? Thanks again!

@fcurella
Collaborator

I've just released the fix in v34.0.0

@fcurella
Collaborator

fcurella commented Jan 23, 2025

Hi @Fashimpaur ,

I'll have to revert the fix for this issue; it's causing too many backward-compatibility issues.

For your use, would it be acceptable to use absolute datetime objects rather than relative? Something like:

    from datetime import datetime

    from faker import Faker

    fake = Faker()
    seed_start = int(2595)
    Faker.seed(seed_start)
    for i in range(10):
        print(fake.date_time_between(
            start_date=datetime(2022, 1, 14, 23, 59, 59),
            end_date=datetime(2025, 1, 14, 23, 59, 59),
        ))
        if i == 4:  # reset the seed to clear previous values for the second 5 passes
            Faker.seed(seed_start)

@fcurella fcurella reopened this Jan 23, 2025
@Fashimpaur
Author

Fashimpaur commented Jan 23, 2025

I can do that with minimal effort to refactor. I ran 3 tests:

2024-02-10 02:01:43.101736        2024-02-10 02:01:43.101736        2024-02-10 02:01:43.101736
2024-01-05 18:21:07.276604        2024-01-05 18:21:07.276604        2024-01-05 18:21:07.276604
2024-12-24 09:02:20.923892        2024-12-24 09:02:20.923892        2024-12-24 09:02:20.923892
2025-01-09 23:50:29.987263        2025-01-09 23:50:29.987263        2025-01-09 23:50:29.987263
2025-01-07 03:23:05.476628        2025-01-07 03:23:05.476628        2025-01-07 03:23:05.476628
2024-02-10 02:01:43.101736        2024-02-10 02:01:43.101736        2024-02-10 02:01:43.101736
2024-01-05 18:21:07.276604        2024-01-05 18:21:07.276604        2024-01-05 18:21:07.276604
2024-12-24 09:02:20.923892        2024-12-24 09:02:20.923892        2024-12-24 09:02:20.923892
2025-01-09 23:50:29.987263        2025-01-09 23:50:29.987263        2025-01-09 23:50:29.987263
2025-01-07 03:23:05.476628        2025-01-07 03:23:05.476628        2025-01-07 03:23:05.476628

There were varying delays between capturing the dates and they reliably returned the same values.

Problem resolved. Your change did fix it for relative dates and I would have liked to see that too but I am guessing it is causing too many issues with approval to merge the PR.

Thanks @fcurella. I appreciate your time investment in this.


3 participants