Enhance augmentation objects with references to a random state. #1874

Closed
Erotemic opened this issue Aug 12, 2024 · 2 comments · Fixed by #2031
Labels
enhancement New feature or request

Comments

Erotemic (Contributor) commented Aug 12, 2024

Suggested Improvement

Looking at the current code, augmentation objects draw random samples from the global random state in the random module. This is ideal for maximally random pipelines, which are affected by any other use of the global random state outside of albumentations itself.

This is not ideal for cases where the subcomponents of a system want their random generators to be seeded and not impacted by other components. For instance, right now there is no way for me to define a seeded augmentation pipeline that does not interfere with any other usage of the global random state.

I suggest adding a parameter to each augmentation class, called seed, random_state, or rng, that defaults to None. When it is None, it resolves to the global random state, which keeps the current behavior.

If it is an integer, a new random.Random object is created from it; if rng is already a random.Random object, it is kept as-is. This allows augmentation pipelines to be independent of the global random state while still using an internally consistent random state.
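The resolution rule described above could look roughly like the sketch below. The names resolve_rng and RandomFlip are hypothetical and are not part of albumentations; returning the random module itself for the None case is an assumption, chosen because its module-level functions all draw from the shared global state.

```python
import random


def resolve_rng(rng=None):
    """Hypothetical helper: turn None, an int seed, or a random.Random into an RNG."""
    if rng is None:
        # The module-level functions (random.random, random.uniform, ...)
        # share the global state, so the module works as a stand-in RNG.
        return random
    if isinstance(rng, random.Random):
        # Already a generator: keep it as-is so callers can share it.
        return rng
    if isinstance(rng, int):
        # Integer seed: build a private generator, leaving global state alone.
        return random.Random(rng)
    raise TypeError(f"Cannot resolve {type(rng).__name__} to a random state")


class RandomFlip:
    """Hypothetical transform that owns a reference to its random state."""

    def __init__(self, p=0.5, rng=None):
        self.p = p
        self.rng = resolve_rng(rng)

    def should_apply(self):
        # Draw from the instance's own generator, not the global one.
        return self.rng.random() < self.p
```

With this shape, RandomFlip(rng=0) produces the same sequence of decisions no matter what other code does with the global random module, while RandomFlip() behaves as it does today.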

Potential Benefits

  • Default behavior is unchanged
  • Makes it easy to test augmentation pipelines without modifying the global state
  • Makes it possible to set up a highly random, but consistent augmentation pipeline independent of any global random usage.
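As a rough illustration of the testing benefit, the sketch below uses a hypothetical sample_angles function as a stand-in for an augmentation that draws its parameters from an explicitly passed random.Random. It is not albumentations code; it only shows that a locally seeded generator is reproducible and leaves the global state untouched.

```python
import random


def sample_angles(rng, n=5):
    """Hypothetical stand-in for an augmentation drawing rotation angles."""
    return [rng.uniform(-30.0, 30.0) for _ in range(n)]


def demo():
    before = random.getstate()

    # Sampling from a private, seeded generator...
    first = sample_angles(random.Random(42))
    # ...does not advance the global generator.
    global_untouched = random.getstate() == before

    # An unrelated component consumes global randomness in between.
    random.random()

    # The seeded pipeline still reproduces the same parameters.
    second = sample_angles(random.Random(42))
    return first == second, global_untouched
```

Here demo() checks both properties at once: the two seeded runs match, and the global state is the same before and after the first run.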

Additional Information

This is how the (now defunct) imgaug library handled randomness, where random states are explicitly passed and maintained.

I see there is a random_utils module which somewhat handles this, but only for numpy random states. As documented in CONTRIBUTING, it is only there to ensure that any numpy.random usage is impacting the global Python random state.

I've written a function that I widely use called ensure_rng that handles the resolution of an argument to a valid random state object. In fact, it can also convert between the stdlib random.Random and np.random.RandomState objects. This might be useful here. It doesn't exactly handle what is done in random_utils.get_random_state, but it is compatible with it.

I also see that ReplayCompose is a good solution to the problem of creating reproducible pipelines, but I believe maintaining a random state in each augmentation instance is complementary, especially in the realm of testing.

ternaus (Collaborator) commented Aug 12, 2024

Thanks. Makes sense. Let me think about it.

ternaus added the enhancement (New feature or request) label Aug 12, 2024

ternaus (Collaborator) commented Oct 26, 2024

@Erotemic

You can set the random state for numpy random per transform and in Compose as:

aug.set_random_state(0)
