date_time_between does not generate same values on multiple runs with time passed between runs #2149
Comments
Without constant time input values, this is not really surprising: your time input is not constant, because it is relative to the current time.
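To make this concrete, here is a simplified model built on plain `random`. It is not Faker's code: the `pick_between` helper and the 3×365-day stand-in for `'-3y'` are assumptions for illustration. The seed fixes only the random fraction. A relative bound is recomputed from the clock on every run, so the result moves by exactly as much time as has passed between runs.

```python
import random
from datetime import datetime, timedelta

# Simplified stand-in for a relative "-3y" bound; Faker's own year length may differ.
SPAN = timedelta(days=3 * 365)


def pick_between(seed: int, now: datetime) -> datetime:
    """Model: seeded uniform pick in [now - SPAN, now].

    Not Faker's implementation. The bounds come from the clock,
    while the seed only fixes the random fraction within them.
    """
    rng = random.Random(seed)
    start = now - SPAN  # a relative start moves whenever "now" moves
    offset = timedelta(seconds=rng.random() * SPAN.total_seconds())
    return start + offset


first_run = pick_between(2595, datetime(2025, 1, 14, 0, 29, 0))
second_run = pick_between(2595, datetime(2025, 1, 14, 1, 6, 0))  # 37 minutes later
print(first_run)
print(second_run)
print(second_run - first_run)  # 0:37:00, the same distance the clock moved
```

A shift of a few minutes changes the hours, minutes and seconds but usually leaves the date alone. This matches the symptom described in the rest of the thread.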
@stefan6419846 But I do not enter a time. Am I supposed to, in order to accomplish what I am trying to do?
If you have a look at the method signature (faker/providers/date_time/__init__.py, lines 2033 to 2038 in bbcab85, https://github.com/joke2k/faker/blob/bbcab85add3f6bf52ae1e1862f5350622e425c51/faker/providers/date_time/__init__.py#L2033-L2038), you will see that without specifying a fixed start and end date, this will always depend on the current time. To get reproducible results, you will have to pin both of them in addition to setting an explicit seed.

That is absurd. My examples clearly show that the dates are reproducible for the same seed and only the time drifts from one invocation to the next. If both a start and an end datetime were required, as you imply, then even the dates should drift. See how the date parts on lines 1 & 6, 2 & 7, 3 & 8, etc. in the actual output match, but the time values drift.
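Invoking date_time_between with a relative start_date will assume that end_date is the time of invocation, leading to different results because the parameters are different. Using a seed will not freeze time.
As suggested by @stefan6419846, you need to pin the end_date:

```python
from datetime import datetime

from faker import Faker

fake = Faker()
seed_start = int(2595)
Faker.seed(seed_start)
end_date = datetime.now()  # Freeze the upper bound
for i in range(10):
    # end_date is printed next to the fake value; it is not passed to date_time_between
    print(fake.date_time_between('-3y'), end_date)
    if i == 4:  # reset the seed to clear previous values for the second 5 passes
        Faker.seed(seed_start)
```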
Ok. I think you also missed my issue. Pinning the end_date falsely gives
you the impression that the date is locked in and reproducible. Let me show
you why it is not:
I used your code exactly as it is above and ran it. I got the following
result:
![image](https://github.com/user-attachments/assets/dc40904a-a746-4d65-af53-71364c92f038)
In the time it took me to write this so far and to copy, highlight, and paste
the data, this is the result I got when I ran it again:
![image](https://github.com/user-attachments/assets/27bb6a90-46d0-4189-a54b-35b8d0c29d68)
You can see that the dates in all groupings of 5, in both the first run and the second, are the same. However, you must also see that the times (the faked values, not the datetime.now() values) differ between the first run and the second run: in some rows the hour has changed, in others the minutes, and in the rest the seconds. This IS the problem. The microseconds reproduce the same fake value in the corresponding output row in all tests, but the hours, minutes, and seconds do not stay the same between runs. Even the end_date that is supposed to pin the time keeps changing (time does not stop), so it is not really pinned. And if you run the code with a fixed end_date, it still does not keep the hours, minutes, and seconds locked for each run of Faker.date_time_between.
To get around this Faker shortcoming, I have created the following two
functions in a utilities.py script as a fix. I pass my Faker instance to
them so that they use the same instance as the calling script:
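```python
from datetime import datetime, time

from faker import Faker


def generate_date_time_between(fake: Faker, offset: str = '-3y') -> datetime:
    # Random date (its range is relative to today) combined with a random time of day
    dt = fake.date_between(offset)
    tm = generate_fake_time(fake)
    return datetime.combine(dt, tm)


def generate_fake_time(fake: Faker) -> time:
    # Every time component comes from the seeded generator, never from the clock
    hour = fake.random_int(min=0, max=23)
    minute = fake.random_int(min=0, max=59)
    second = fake.random_int(min=0, max=59)
    microsecond = fake.random_int(min=0, max=999999)
    return time(hour, minute, second, microsecond)
```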
When run using the following code:
```python
from faker import Faker

from utilities import generate_date_time_between

fake = Faker()
seed_start = int(2595)
Faker.seed(seed_start)
for i in range(10):
    print(generate_date_time_between(fake, '-3y'))
    if i == 4:  # reset the seed to clear previous values for the second 5 passes
        Faker.seed(seed_start)
```
For the first run, I get:
2024-02-09 21:49:46.357320
2025-01-06 19:06:12.875047
2022-05-12 11:19:59.946491
2022-03-23 15:02:16.964874
2024-10-12 19:04:18.445665
2024-02-09 21:49:46.357320
2025-01-06 19:06:12.875047
2022-05-12 11:19:59.946491
2022-03-23 15:02:16.964874
2024-10-12 19:04:18.445665
and every run thereafter, regardless of how long I wait between runs, I get:
2024-02-09 21:49:46.357320
2025-01-06 19:06:12.875047
2022-05-12 11:19:59.946491
2022-03-23 15:02:16.964874
2024-10-12 19:04:18.445665
2024-02-09 21:49:46.357320
2025-01-06 19:06:12.875047
2022-05-12 11:19:59.946491
2022-03-23 15:02:16.964874
2024-10-12 19:04:18.445665
Again note that the first and second groups of 5 timestamps have the same
values in each corresponding row and that all ten between the first and
second runs have the same values.
No end_date required.
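One caveat is worth checking. `date_between` defaults its end to `'today'`, and `'-3y'` is also relative. If both are resolved against the current date, as `date_time_between` resolves its bounds against the current time, the workaround above would be reproducible within a day but shift by one day after midnight. That is an assumption and was not verified in this thread. The sketch below models it with plain `random`. `pick_date` is a hypothetical helper, not Faker's code, and the 3×365-day span is a simplification.

```python
import random
from datetime import date, timedelta

SPAN_DAYS = 3 * 365  # simplified stand-in for "-3y"


def pick_date(seed: int, today: date) -> date:
    """Model: seeded uniform pick of a day in [today - SPAN_DAYS, today]."""
    rng = random.Random(seed)
    return today - timedelta(days=SPAN_DAYS) + timedelta(days=rng.randint(0, SPAN_DAYS))


same_day = pick_date(2595, date(2025, 1, 14)) == pick_date(2595, date(2025, 1, 14))
next_day_shift = pick_date(2595, date(2025, 1, 15)) - pick_date(2595, date(2025, 1, 14))
print(same_day)        # True: same seed, same calendar day
print(next_day_shift)  # 1 day, 0:00:00: the whole sequence moves once the date rolls over
```

If this holds for Faker too, passing absolute date bounds to `date_between` removes the remaining dependency on the calendar.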
Posting just to keep the issue alive.
I understand it might be inconvenient, but I believe Faker is behaving as intended. For your use case, I can see a few different options:
So, I have a start_date of '-3y' and an end_date of datetime(2025, 1, 14, 23, 59, 59):
and when I ran it the first time, I got:
Then, after some time, I ran it again and got:
The times do not stay the same between runs. Functioning as expected, I guess. Should I have locked down the start_date? Maybe it is because I do not include the tzinfo? I should not expect that someone will actually look at the library and fix the issue. Why bother having an Issues tab on your GitHub repo? You only need to say that everything is functioning as expected, and then you don't have to bother with comments like mine.

Yes. You are correct. I will use the random_int time generation in my example. I do not understand what is so incomprehensible. The date portion is repeatable and reproducible for a given seed. The time portion is adrift on a flimsy raft in rough water. I thought that the whole idea of locking something in with a seed was to make sure you could reproduce the data time and again. Apparently, I am mistaken. I will stick to using only the parts of Faker that work as expected and leave the unreliable parts for others.
Changing
You have received responses explaining why the current behavior makes sense from both the maintainer and someone who happens to stroll around here from time to time (and has contributed some smaller changes). If this is not what you want, you can always write your own generator code instead; the faker library is a FOSS project which offers you a wide range of different data generator methods, but nobody forces you to actually use it (or all of them). Just looking at the other issues and PRs (as well as at the releases), you will quickly see that regular bugfixes and enhancements are indeed being made.
If the
@Fashimpaur I didn't mean to sound dismissive of your issue. I understand you're frustrated, but please let's all make an effort to stay courteous. And let's remember we are all volunteers here, donating the little spare time we have to the project 🙂
I can confirm this is indeed a bug, and I can reproduce it. The issue is that the string parser converts -3y as "3 years from now" (https://github.com/joke2k/faker/blob/master/faker/providers/date_time/__init__.py#L2003) and doesn't account for end_date. I'm working on a fix.

Awesome, thanks!
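Here is a simplified model of that root cause, built on plain `random` rather than Faker's code. `start_from_now` and `start_from_end` are hypothetical names, and the 3×365-day span is a simplification. If the relative start is resolved against the clock, pinning only end_date still leaves one moving bound. That is consistent with the drift reported earlier with end_date=datetime(2025, 1, 14, 23, 59, 59). Resolving the start against end_date is an assumption about what the patch does, based on the description that the parser "doesn't account for end_date".

```python
import random
from datetime import datetime, timedelta

SPAN = timedelta(days=3 * 365)  # simplified stand-in for "-3y"
END = datetime(2025, 1, 14, 23, 59, 59)  # a pinned, absolute end_date


def pick(seed: int, start: datetime, end: datetime) -> datetime:
    """Seeded uniform pick in [start, end]."""
    rng = random.Random(seed)
    return start + timedelta(seconds=rng.random() * (end - start).total_seconds())


def start_from_now(seed: int, now: datetime) -> datetime:
    # Modelled pre-fix behaviour: "-3y" is resolved against the clock,
    # so the interval still changes even though END is pinned.
    return pick(seed, now - SPAN, END)


def start_from_end(seed: int) -> datetime:
    # Modelled fixed behaviour: "-3y" is resolved against end_date.
    return pick(seed, END - SPAN, END)


run1 = start_from_now(2595, datetime(2025, 1, 14, 9, 0, 0))
run2 = start_from_now(2595, datetime(2025, 1, 14, 9, 30, 0))
print(run1 == run2)                                  # False: drifts with the clock
print(start_from_end(2595) == start_from_end(2595))  # True: both bounds are fixed
```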
@Fashimpaur can you check out the patch/2149 branch and confirm whether it fixes your issue?

@fcurella,
I ran it twice with some delay between runs. I get the following results:
2024-02-10 04:06:46.053243 2024-02-10 04:06:46.053243
2024-01-05 20:39:09.843103 2024-01-05 20:39:09.843103
2024-12-24 09:10:18.216101 2024-12-24 09:10:18.216101
2025-01-09 23:52:20.497594 2025-01-09 23:52:20.497594
2025-01-07 03:25:58.947237 2025-01-07 03:25:58.947237
2024-02-10 04:06:46.053243 2024-02-10 04:06:46.053243
2024-01-05 20:39:09.843103 2024-01-05 20:39:09.843103
2024-12-24 09:10:18.216101 2024-12-24 09:10:18.216101
2025-01-09 23:52:20.497594 2025-01-09 23:52:20.497594
2025-01-07 03:25:58.947237 2025-01-07 03:25:58.947237
Based on the test, this is fixed and exactly what I expected from Faker!
Awesome job, and many thanks to you and the Faker team.
Dennis
I apologize that I did not notice that it was you making the change; I edited my comment. Can you ping here when a version that includes this change is released? Thanks again!
I've just released the fix in v34.0.0.
Hi @Fashimpaur, I'll have to revert the fix for this issue; it's causing too many backward-compatibility issues. For your use case, would it be acceptable to use absolute datetime objects rather than relative ones? Something like:

```python
from datetime import datetime

from faker import Faker

fake = Faker()
seed_start = int(2595)
Faker.seed(seed_start)
for i in range(10):
    # Both bounds are absolute, so nothing depends on the time of invocation
    print(fake.date_time_between(
        start_date=datetime(2022, 1, 14, 23, 59, 59),
        end_date=datetime(2025, 1, 14, 23, 59, 59),
    ))
    if i == 4:  # reset the seed to clear previous values for the second 5 passes
        Faker.seed(seed_start)
```
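You may still want to write ranges in the relative `'-3y'` style. One option is to convert the range once into absolute bounds, anchored at a fixed reference datetime, and pass those to `date_time_between` as in the snippet above. Below is a minimal sketch. `pinned_bounds` is a hypothetical helper, not part of Faker, and the Feb 29 fallback is a design choice of this sketch. With the reference datetime used above, it produces exactly the bounds from that example.

```python
from datetime import datetime


def pinned_bounds(reference: datetime, years: int = 3) -> tuple[datetime, datetime]:
    """Turn a relative "-<years>y" range into absolute bounds anchored at `reference`.

    Hypothetical helper: Feb 29 falls back to Feb 28 when the target year
    has no leap day.
    """
    try:
        start = reference.replace(year=reference.year - years)
    except ValueError:  # Feb 29 in a non-leap target year
        start = reference.replace(year=reference.year - years, day=28)
    return start, reference


start_date, end_date = pinned_bounds(datetime(2025, 1, 14, 23, 59, 59))
print(start_date, end_date)  # 2022-01-14 23:59:59 2025-01-14 23:59:59
# With Faker installed, the pair can then be passed on as:
# fake.date_time_between(start_date=start_date, end_date=end_date)
```

The reference must be a constant rather than `datetime.now()`. Otherwise the bounds drift again on every run.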
I can do that with minimal refactoring. I ran 3 tests:
There were varying delays between capturing the dates, and they reliably returned the same values. Problem resolved. Your change did fix it for relative dates, and I would have liked to see that too, but I am guessing it is causing too many issues for the PR to be approved and merged. Thanks @fcurella. I appreciate your time investment in this.
Faker's date_time_between does not reproduce the same date and time for a given seed.
Steps to reproduce
Run the code:
Output:
Run the code again, after the delay needed to post the first results:
Expected behavior
The values were expected to be the same as in the first run: