🧩 Add pluggable SIMD-accelerated random generator (Mersenne Twister) and RNG abstraction for deterministic ML pipelines #7525
Open
asp2286 wants to merge 9 commits into dotnet:main from asp2286:MersenneTwister
…st/MLContext - Introduce internal IRandomSource and IRandomBulkSource - Add adapters/shims: RandomSourceAdapter, RandomFromRandomSource, RandomShim - Implement SIMD-backed MersenneTwisterRandomSource (MT19937) and core MersenneTwister - Wire IRandomSource through HostEnvironmentBase, ConsoleEnvironment, LocalEnvironment, MLContext - Add tests for determinism and mixed-call consumption
…0.0 pack vs 4.0.0 baseline
/azp list
Commenter does not have sufficient privileges for PR 7525 in repo dotnet/machinelearning
…FromRandomSource, and ResourceManagerUtils logic; fix Windows native cmake script; add APICompat suppression
/azp run
Commenter does not have sufficient privileges for PR 7525 in repo dotnet/machinelearning
/azp run MachineLearning-CI
Commenter does not have sufficient privileges for PR 7525 in repo dotnet/machinelearning
…te cross-test DLL/PDB contention under coverlet
…h, handle abandoned, correct ownership release) to deflake concurrent model downloads in CI
📖 Overview
This PR introduces a new pluggable random number generation (RNG) infrastructure into ML.NET, replacing the previous System.Random dependency with a deterministic, high-performance SIMD-accelerated Mersenne Twister (MT19937) implementation.
The goal is to provide a faster, deterministic, and cross-platform reproducible RNG foundation for all stochastic algorithms (e.g., Random Forest, KMeans++, Isolation Forest) while maintaining full backward compatibility.
🚀 Key Changes
New interfaces:
- IRandomSource: unified, injectable RNG abstraction
- IRandomBulkSource: efficient, vectorized bulk-fill API
New RNG backend:
- MersenneTwister: pure C# MT19937 implementation
- MersenneTwisterRandomSource: SIMD-optimized version using System.Runtime.Intrinsics.X86.Avx2 or System.Runtime.Intrinsics.Arm.AdvSimd, with automatic scalar fallback
Integration:
- IRandomSource wired through HostEnvironmentBase, ConsoleEnvironment, LocalEnvironment, and MLContext
- RandomSource property available on all IHost and MLContext instances
- Rand property retained and wired through adapters
Adapters for compatibility:
- RandomSourceAdapter
- RandomFromRandomSource
- RandomShim
Testing and validation:
- Tests for determinism and mixed-call consumption (Rand + RandomSource)
⚡ Performance Impact
The new RNG is up to 5× faster in real workloads.
It eliminates per-call allocations and leverages vectorized bit-generation via SIMD instructions.
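To make the allocation point concrete, here is a minimal scalar sketch of MT19937 with a bulk-fill method that writes into a caller-provided span, so that no buffer is allocated per call. It is not the PR's SIMD code: the type name ScalarMt19937 and its members are hypothetical, chosen only for illustration, and the vectorized Avx2/AdvSimd paths are omitted.

```csharp
using System;

// Scalar MT19937 reference sketch. NOT the PR's MersenneTwisterRandomSource;
// the type and member names are hypothetical.
public sealed class ScalarMt19937
{
    private const int N = 624;
    private const int M = 397;
    private const uint MatrixA = 0x9908B0DFu;
    private const uint UpperMask = 0x80000000u;
    private const uint LowerMask = 0x7FFFFFFFu;

    private readonly uint[] _state = new uint[N];
    private int _index = N; // forces a twist before the first output

    public ScalarMt19937(uint seed)
    {
        // Standard MT19937 seeding recurrence.
        _state[0] = seed;
        for (int i = 1; i < N; i++)
        {
            uint prev = _state[i - 1];
            _state[i] = unchecked(1812433253u * (prev ^ (prev >> 30)) + (uint)i);
        }
    }

    // Regenerates all 624 state words in one pass.
    private void Twist()
    {
        for (int i = 0; i < N; i++)
        {
            uint y = (_state[i] & UpperMask) | (_state[(i + 1) % N] & LowerMask);
            uint next = _state[(i + M) % N] ^ (y >> 1);
            if ((y & 1u) != 0) next ^= MatrixA;
            _state[i] = next;
        }
        _index = 0;
    }

    // Returns the next tempered 32-bit output.
    public uint NextUInt32()
    {
        if (_index >= N) Twist();
        uint y = _state[_index++];
        y ^= y >> 11;
        y ^= (y << 7) & 0x9D2C5680u;
        y ^= (y << 15) & 0xEFC60000u;
        y ^= y >> 18;
        return y;
    }

    // Bulk fill: writes into the caller's buffer, so nothing is allocated per call.
    public void Fill(Span<uint> destination)
    {
        for (int i = 0; i < destination.Length; i++)
            destination[i] = NextUInt32();
    }
}
```

Seeded with 5489, the first output of this sketch is 3499211612, the standard MT19937 reference value. A caller can reuse one buffer across many Fill calls, and a vectorized backend would presumably tighten the same loop rather than change its contract.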
📊 Benchmark results (Isolation Forest prototype)
All benchmarks use identical seeds and datasets.
Deterministic equivalence confirmed across runs and architectures.
🔬 Determinism & Reproducibility
- Each IHost and MLContext obtains an independent deterministic stream
- IHost.Rand remains functional and maps to the new RNG internally
🧠 Motivation
This refactor lays the foundation for future high-performance stochastic algorithms in ML.NET.
Reliable, cross-platform determinism and reproducible random streams are critical for modern ML workloads, testing, and research reproducibility.
It also unlocks future optimizations for:
🔜 Next Steps
In the next PR, I will introduce a native Isolation Forest implementation built entirely in C# using this RNG backend.
Preliminary testing shows the Isolation Forest algorithm using MersenneTwisterRandomSource performs ~5× faster than scikit-learn’s Python version while producing numerically consistent anomaly scores.
This follow-up contribution will:
- Add IsolationForestTrainer to ML.NET
✅ Checklist
- … (IRandomSource, IRandomBulkSource)
- … MLContext
- … (Rand, RandomShim, etc.)
🧾 References
🧩 Example usage
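A minimal sketch of how a consumer might inject a deterministic source and still hand it to code that expects System.Random. Everything here is an assumption made for illustration: the members of the PR's IRandomSource are not shown in this description, so ISketchRandomSource, XorShift32Source, and SketchRandomAdapter are hypothetical names, and a tiny xorshift32 generator stands in for the Mersenne Twister to keep the example short.

```csharp
using System;

// Hypothetical one-method abstraction; the PR's real IRandomSource may differ.
public interface ISketchRandomSource
{
    uint NextUInt32();
}

// Stand-in generator (xorshift32), used here instead of MT19937 for brevity.
public sealed class XorShift32Source : ISketchRandomSource
{
    private uint _state;

    // xorshift32 must not start from an all-zero state.
    public XorShift32Source(uint seed) => _state = seed == 0 ? 0x9E3779B9u : seed;

    public uint NextUInt32()
    {
        uint x = _state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        _state = x;
        return x;
    }
}

// Exposes an injected source through the System.Random surface,
// in the spirit of the compatibility adapters listed under Key Changes.
public sealed class SketchRandomAdapter : Random
{
    private readonly ISketchRandomSource _source;

    public SketchRandomAdapter(ISketchRandomSource source)
        => _source = source ?? throw new ArgumentNullException(nameof(source));

    // Uniform double in [0, 1) built from one 32-bit draw.
    public override double NextDouble() => _source.NextUInt32() * (1.0 / 4294967296.0);

    protected override double Sample() => NextDouble();

    // Non-negative int below int.MaxValue, as System.Random.Next() documents.
    public override int Next()
    {
        int value;
        do { value = (int)(_source.NextUInt32() >> 1); } while (value == int.MaxValue);
        return value;
    }

    // Scaling a double is simple but slightly biased; acceptable for a sketch.
    public override int Next(int maxValue)
    {
        if (maxValue < 0) throw new ArgumentOutOfRangeException(nameof(maxValue));
        return (int)(NextDouble() * maxValue);
    }
}

public static class Demo
{
    // Two adapters fed by identically seeded sources yield identical draws.
    public static bool SameSeedSameSequence(uint seed, int count)
    {
        Random a = new SketchRandomAdapter(new XorShift32Source(seed));
        Random b = new SketchRandomAdapter(new XorShift32Source(seed));
        for (int i = 0; i < count; i++)
        {
            if (a.Next(1000) != b.Next(1000)) return false;
        }
        return true;
    }
}
```

Because the adapter derives from System.Random, existing call sites that accept a Random keep working while the underlying bit stream comes from the injected source.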