Enhance augmentation objects with references to a random state. #1874

Erotemic · 2024-08-12T22:00:47Z

Suggested Improvement

Looking at the current code, augmentation objects draw random samples from the global random state in the random module. This is ideal for maximally random pipelines, which are then affected by any other use of the global random state outside of albumentations itself.

This is not ideal for cases where the subcomponents of a system want their random generators to be seeded and not impacted by other components. For instance, right now there is no way for me to define a seeded augmentation pipeline that does not interfere with any other usage of the global random state.

I suggest adding a parameter to each augmentation class called seed, random_state, or rng that defaults to None. When it is None, it resolves to the global random state, which keeps the current behavior.

If it is an integer, a new random.Random object is created from it. If rng is already a random.Random object, it is kept as-is. Either way, augmentation pipelines can be independent of the global random state while still using an internally consistent random state.
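A minimal sketch of the resolution rule described above, using only the standard library. The name `resolve_rng` is hypothetical (it is not an albumentations function, and it is not the `ensure_rng` helper mentioned later); returning the `random` module itself for `None` is one assumed way to stand in for the global state.

```python
import random


def resolve_rng(rng=None):
    """Hypothetical helper: map None / int / random.Random to a sampler."""
    if rng is None:
        # Keep the current behavior: the random module itself acts as the
        # global generator (random.random, random.randint, ...).
        return random
    if isinstance(rng, random.Random):
        # The caller owns this generator, so use it as-is.
        return rng
    if isinstance(rng, int):
        # An integer seeds a fresh generator private to the caller.
        return random.Random(rng)
    raise TypeError(f"Cannot resolve {rng!r} to a random generator")


# Two generators built from the same seed produce the same stream.
a = resolve_rng(0)
b = resolve_rng(0)
assert a.random() == b.random()
```

Because every branch returns an object exposing the same sampling methods, an augmentation could call `self.rng.random()` without caring which case applied.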

Potential Benefits

Default behavior is unchanged
Makes it easy to test augmentation pipelines without modifying the global state
Makes it possible to set up a highly random, but consistent augmentation pipeline independent of any global random usage.
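To illustrate these benefits, here is a toy sketch. `RandomFlip` is a made-up stand-in for an augmentation, not an albumentations class, and the `rng` parameter is the proposed (assumed) argument. It shows that a seeded instance gives the same results whether or not other code draws from the global random state.

```python
import random


class RandomFlip:
    """Toy stand-in for an augmentation; not an albumentations class."""

    def __init__(self, p=0.5, rng=None):
        self.p = p
        if rng is None:
            # Default: fall back to the global generator (current behavior).
            self.rng = random
        elif isinstance(rng, random.Random):
            # Reuse a generator supplied by the caller.
            self.rng = rng
        else:
            # Treat anything else as a seed for a private generator.
            self.rng = random.Random(rng)

    def __call__(self, row):
        # Reverse the row with probability p, drawing from our own generator.
        return row[::-1] if self.rng.random() < self.p else row


def run(seed, disturb_global):
    aug = RandomFlip(p=0.5, rng=seed)
    out = []
    for _ in range(20):
        if disturb_global:
            random.random()  # unrelated global usage elsewhere in a program
        out.append(aug([1, 2, 3]))
    return out


# The seeded pipeline is unaffected by draws from the global state.
same = run(seed=42, disturb_global=False) == run(seed=42, disturb_global=True)
print(same)  # True
```

A test written this way never needs to call `random.seed`, so it leaves the global state untouched for the rest of the test suite.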

Additional Information

This is how the (now defunct) imgaug library handled randomness, where random states are explicitly passed and maintained.

I see there is a random_utils module that partly handles this, but only for numpy random states. As documented in CONTRIBUTING, its only purpose is to tie any numpy.random usage to the global Python random state.

I've written a function that I widely use called ensure_rng that handles the resolution of an argument to a valid random state object. In fact, it can also convert between the stdlib random.Random and np.random.RandomState objects. This might be useful here, although it doesn't exactly handle what is done in random_utils.get_random_state, but it is compatible with it.

I also see that ReplayCompose is a good solution to the problem of creating reproducible pipelines, but I believe maintaining a random state in each augmentation instance is complementary, especially in the realm of testing.


ternaus · 2024-08-12T23:00:13Z

Thanks. Makes sense. Let me think about it.

ternaus · 2024-10-26T00:29:43Z

@Erotemic

You can set the random state for numpy random, both per transform and in Compose, with:

aug.set_random_state(0)

ternaus added the enhancement New feature or request label Aug 12, 2024

ternaus mentioned this issue Oct 25, 2024

Clean random generator #2031

Merged

ternaus closed this as completed in #2031 Oct 26, 2024
