SciRS2 adopts a modular structure, providing independent crates for each functional area of SciPy. This design offers several advantages:
- Flexible Dependency Management: Users can select only the features they need
- Parallel Development: Each module can be developed, tested, and released independently
- Clear Responsibility: Each module focuses on a specific functional area
- Code Organization: Logical separation of related code improves maintainability
- Prevention of Circular Dependencies: Clear dependency relationships between modules
The workspace is organized into the following crate structure:
```
/
├── Cargo.toml                 (Workspace configuration)
├── scirs2-core/               (Core utilities and common functionality)
│   ├── Cargo.toml
│   └── src/
│       ├── constants.rs
│       ├── error.rs
│       ├── lib.rs
│       └── utils.rs
├── scirs2-linalg/             (Linear algebra module)
│   ├── Cargo.toml
│   └── src/
│       ├── basic.rs
│       ├── blas.rs
│       ├── decomposition.rs
│       ├── eigen.rs
│       ├── error.rs
│       ├── lapack.rs
│       ├── lib.rs
│       ├── norm.rs
│       ├── solve.rs
│       └── special.rs
├── scirs2-integrate/
├── scirs2-interpolate/
├── scirs2-optimize/
├── scirs2-fft/
├── scirs2-stats/
├── scirs2-special/
├── scirs2-signal/
├── scirs2-sparse/
├── scirs2-spatial/
└── scirs2/                    (Main integration crate)
    ├── Cargo.toml
    └── src/
        └── lib.rs             (Re-exports from all other crates)
```
The responsibilities of each crate are as follows:
- scirs2-core: Core utilities and common functionality used by other modules
- scirs2-linalg: Linear algebra functionality (BLAS/LAPACK wrappers, matrix operations, decompositions)
- scirs2-integrate: Numerical integration algorithms
- scirs2-interpolate: Interpolation functionality
- scirs2-optimize: Optimization algorithms
- scirs2-fft: Fast Fourier Transform
- scirs2-stats: Statistical functions
- scirs2-special: Special functions
- scirs2-signal: Signal processing
- scirs2-sparse: Sparse matrix operations
- scirs2-spatial: Spatial algorithms
- scirs2: Main integration crate that re-exports functionality from all other crates
The modular design helps avoid circular dependencies:
- scirs2-core: Contains shared utilities and has no dependencies on other project crates
- scirs2-{module}: Each module depends only on scirs2-core, not on other modules
- scirs2: The main crate that depends on and re-exports all other modules
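As an illustration, this layering means the main crate's lib.rs reduces to feature-gated re-exports. A minimal sketch (the feature names are assumptions, consistent with the flags shown later):

```rust
// scirs2/src/lib.rs (sketch): re-export each sub-crate behind a
// feature flag so downstream users compile only what they enable.
#[cfg(feature = "linalg")]
pub use scirs2_linalg as linalg;

#[cfg(feature = "stats")]
pub use scirs2_stats as stats;

#[cfg(feature = "fft")]
pub use scirs2_fft as fft;
```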
To avoid code duplication and improve consistency across modules:
- Use scirs2-core validation functions: Always use validation utilities from `scirs2-core::validation` for parameter checking, shape validation, numerical bounds, etc.
- Use scirs2-core error handling: Leverage the error system from `scirs2-core::error` and extend it with module-specific error types when necessary
- Use scirs2-core numeric traits: Apply numeric traits from `scirs2-core::numeric` for generic numerical operations
- Use scirs2-core caching mechanisms: Employ `scirs2-core::cache` for optimizing performance-critical operations
- Use scirs2-core configuration system: Utilize `scirs2-core::config` for module configuration
- Use scirs2-core constants: Reference mathematical and physical constants from `scirs2-core::constants`
- Use scirs2-core parallel processing: Leverage `scirs2-core::parallel` for multi-threaded operations (when feature-enabled)
- Use scirs2-core SIMD operations: Apply `scirs2-core::simd` for vectorized computations (when feature-enabled)
- Use scirs2-core utilities: Employ common utility functions from `scirs2-core::utils` for general operations
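A sketch of how a module function might lean on these shared facilities (hedged: `check_positive` and `CoreResult` are illustrative names, not scirs2-core's confirmed API):

```rust
use ndarray::ArrayView1;
// Hypothetical paths; the real names in scirs2-core may differ.
use scirs2_core::error::CoreResult;
use scirs2_core::validation::check_positive;

/// Scale a signal, delegating parameter checks to the shared
/// scirs2-core validation utilities instead of ad-hoc asserts.
pub fn scale(signal: ArrayView1<'_, f64>, factor: f64) -> CoreResult<Vec<f64>> {
    // Shared validation keeps error messages consistent across modules.
    check_positive(factor, "factor")?;
    Ok(signal.iter().map(|x| x * factor).collect())
}
```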
For optimal performance and code consistency:
- SIMD Acceleration: When implementing numerical algorithms that process arrays or vectors:
  - Always use `scirs2-core::simd` operations rather than implementing custom SIMD code
  - Enable the `simd` feature flag in module Cargo.toml files to access this functionality
  - Consider providing both scalar and SIMD implementations with feature flags
- Parallel Processing: When processing large datasets or performing computation-heavy tasks:
  - Use `scirs2-core::parallel` instead of direct Rayon usage
  - Enable the `parallel` feature flag in module Cargo.toml files
  - Use the `par_`-prefixed functions for parallel equivalents of common operations
- Caching: For computations with repeated inputs or expensive calculations:
  - Use `scirs2-core::cache::TTLSizedCache` for data that should expire
  - Use `scirs2-core::cache::CacheBuilder` to configure custom caches
  - Use the `#[cached]` macro from scirs2-core for function-level memoization
- Memory Efficiency: For operations on large datasets:
  - Use `scirs2-core::parallel::chunk_wise_op` for processing data in manageable chunks
  - Use `scirs2-core::parallel::memory_efficient_cumsum` and similar operations

This approach ensures all modules benefit from the same optimizations and performance characteristics.
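As a concrete shape for the memory-efficiency guideline, a call might look like the sketch below (the `chunk_wise_op` signature is assumed, not confirmed; only the pattern matters):

```rust
// Hypothetical signature: apply `op` to fixed-size chunks of `data`
// and concatenate the results, keeping peak memory bounded.
use scirs2_core::parallel::chunk_wise_op;

fn normalize(data: &[f64], max: f64) -> Vec<f64> {
    chunk_wise_op(data, 4096, |chunk: &[f64]| {
        chunk.iter().map(|x| x / max).collect::<Vec<f64>>()
    })
}
```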
The main scirs2 crate provides feature flags to selectively enable modules:

```toml
# Using only linear algebra and statistics
scirs2 = { version = "0.1.0", features = ["linalg", "stats"] }

# Using all features
scirs2 = { version = "0.1.0", features = ["all"] }
```
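With those features enabled, the re-exported modules are used through the scirs2 namespace. The call below is a hypothetical illustration of that surface, not a confirmed signature:

```rust
// Hypothetical usage of a re-exported module.
use scirs2::stats;

fn main() {
    let data = [1.0, 2.0, 3.0, 4.0];
    // `stats::mean` stands in for whatever the module actually exposes.
    let m = stats::mean(&data).unwrap();
    println!("mean = {m}");
}
```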
Each module can depend on scirs2-core:

```toml
[dependencies]
scirs2-core = { version = "0.1.0" }
```

Each sub-crate defines its own error types to avoid circular dependencies:
- scirs2-core: Defines base error traits and common error handling utilities
- scirs2-{module}: Each module defines its own error types specific to its domain
- Error Conversion: Each module provides conversions to/from core error types
- Result Types: All functions use the `Result` type with appropriate error types
- Detailed Messages: Error variants include detailed information for debugging
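Put together, a module's error handling might look like this sketch (the `CoreError` path is an assumption; the shape of the pattern is the point):

```rust
use scirs2_core::error::CoreError; // hypothetical path

/// Domain-specific error type for a module, following the
/// conventions above: its own variants plus a core passthrough.
#[derive(Debug)]
pub enum LinalgError {
    /// The matrix is singular; solving or inversion failed.
    SingularMatrix(String),
    /// An error propagated from shared scirs2-core utilities.
    Core(CoreError),
}

// Error Conversion: allow `?` on core results inside this module.
impl From<CoreError> for LinalgError {
    fn from(err: CoreError) -> Self {
        LinalgError::Core(err)
    }
}

/// Result Types: the module-wide alias used by all public functions.
pub type LinalgResult<T> = Result<T, LinalgError>;
```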
The overall design follows these principles:
- API Compatibility: Maintain API similarity with SciPy where appropriate for Rust
- Rust Idioms: Leverage Rust's type system, ownership model, and performance features
- Type Safety: Use Rust's type system to prevent common numerical errors
- Performance: Prioritize computational efficiency without sacrificing safety
- Zero-cost Abstractions: Ensure high-level interfaces don't compromise performance
SciRS2 primarily uses ndarray for multi-dimensional array operations:
- N-dimensional arrays with static or dynamic dimensionality
- Broadcasting capabilities similar to NumPy
- Efficient indexing and slicing
- Integration with BLAS/LAPACK via optional features
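For instance, ndarray offers NumPy-style construction, broadcasting, slicing, and matrix products:

```rust
use ndarray::{array, s};

fn main() {
    // 2x3 array of f64 (static dimensionality: Array2<f64>).
    let a = array![[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]];

    // Broadcasting a scalar, as in NumPy.
    let b = &a * 2.0;

    // Slicing: all rows, first two columns.
    println!("{}", b.slice(s![.., ..2]));

    // Matrix product with the transpose (2x3 · 3x2 = 2x2).
    println!("{}", a.dot(&a.t()));
}
```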
Additionally, it uses:
- `nalgebra` for specialized linear algebra operations
- `num-complex` for complex number support
- Custom data structures for specific algorithms
Key implementation strategies include:
- Generic Programming:
  - Use traits to define algorithm requirements
  - Support multiple numeric types (f32, f64, complex)
  - Use trait bounds to enforce constraints
- Function Design:
  - Match SciPy's function signatures where appropriate
  - Use builder patterns for complex configurations
  - Provide sensible defaults that match SciPy behavior
- Performance Optimizations:
  - SIMD instructions where applicable
  - Parallelization via Rayon
  - Efficient memory usage patterns
  - FFI bindings to established C/Fortran libraries for critical paths
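The sketch below combines the first two strategies: a numeric-generic routine bounded by num-traits' `Float`, configured through a builder with sensible defaults (all names are illustrative, not an existing scirs2 API):

```rust
use num_traits::Float;

/// Options for a root-finding routine, built with defaults
/// via the builder pattern recommended above (names illustrative).
pub struct SolverOptions<T: Float> {
    tol: T,
    max_iter: usize,
}

impl<T: Float> SolverOptions<T> {
    pub fn new() -> Self {
        Self { tol: T::epsilon().sqrt(), max_iter: 100 }
    }
    pub fn tol(mut self, tol: T) -> Self { self.tol = tol; self }
    pub fn max_iter(mut self, n: usize) -> Self { self.max_iter = n; self }
}

/// Bisection root finder, generic over f32/f64 via the Float trait.
pub fn bisect<T: Float>(f: impl Fn(T) -> T, mut lo: T, mut hi: T, opt: &SolverOptions<T>) -> T {
    let two = T::one() + T::one();
    for _ in 0..opt.max_iter {
        let mid = (lo + hi) / two;
        if (hi - lo) < opt.tol {
            return mid;
        }
        // Keep the half-interval whose endpoints bracket the sign change.
        if f(lo) * f(mid) <= T::zero() { hi = mid; } else { lo = mid; }
    }
    (lo + hi) / two
}
```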
The scirs2-linalg module provides:
- Basic matrix operations: determinants, inverses, etc.
- Matrix decompositions: LU, QR, SVD, Cholesky
- Eigenvalue/eigenvector computations
- Matrix norms and condition numbers
- Linear equation solvers
- Special matrix functions
- BLAS and LAPACK wrappers
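A hedged usage sketch of this surface (`det` and `solve` mirror scipy.linalg naming; their exact paths and signatures in scirs2-linalg are assumptions):

```rust
use ndarray::array;
// Hypothetical imports mirroring scipy.linalg; real paths may differ.
use scirs2_linalg::{det, solve};

fn main() {
    let a = array![[3.0, 1.0], [1.0, 2.0]];
    let b = array![9.0, 8.0];

    // Determinant and a linear solve of A x = b.
    let d = det(&a.view()).unwrap();
    let x = solve(&a.view(), &b.view()).unwrap();
    println!("det = {d}, x = {x}");
}
```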
The scirs2-integrate module provides:
- Numerical integration algorithms
- ODE solvers
- Quadrature methods
The scirs2-interpolate module provides:
- 1D interpolation methods (linear, nearest, cubic)
- Spline interpolation (cubic splines, Akima splines)
- Multi-dimensional interpolation (regular grid and scattered data)
- Advanced interpolation methods:
- Radial Basis Function (RBF) interpolation
- Kriging interpolation with uncertainty quantification
- Barycentric interpolation
- Grid transformation and resampling utilities
- Tensor product interpolation for high-dimensional data
- Utility functions for error estimation, differentiation, and integration
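As a reference point for the 1D methods, a minimal self-contained linear interpolant looks like this (a sketch, independent of the module's actual API):

```rust
/// Piecewise-linear interpolation at x over sorted knots (xs, ys).
/// Minimal reference implementation, not the module's actual API.
fn interp1d_linear(xs: &[f64], ys: &[f64], x: f64) -> f64 {
    // Find the first knot at or right of x (xs sorted ascending).
    match xs.iter().position(|&xi| xi >= x) {
        Some(0) => ys[0],            // clamp left of the range
        None => *ys.last().unwrap(), // clamp right of the range
        Some(i) => {
            let t = (x - xs[i - 1]) / (xs[i] - xs[i - 1]);
            ys[i - 1] + t * (ys[i] - ys[i - 1])
        }
    }
}

fn main() {
    let xs = [0.0, 1.0, 2.0];
    let ys = [0.0, 10.0, 40.0];
    println!("{}", interp1d_linear(&xs, &ys, 1.5)); // 25
}
```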
The scirs2-optimize module provides:
- Local and global optimization algorithms
- Constrained and unconstrained optimization
- Linear and nonlinear programming
The scirs2-fft module provides:
- Fast Fourier Transform algorithms
- Real and complex FFT
- Multi-dimensional FFT
The scirs2-stats module provides:
- Descriptive statistics
- Probability distributions
- Statistical tests
- Random number generation
The scirs2-special module provides:
- Special mathematical functions
- Bessel functions
- Gamma and beta functions
- Orthogonal polynomials
The scirs2-signal module provides:
- Filter design and application
- Signal analysis
- Window functions
The scirs2-sparse module provides:
- Various sparse matrix formats
- Sparse matrix operations
- Sparse linear solvers
The scirs2-spatial module provides:
- Distance computations
- Spatial transformations
- Spatial data structures
Beyond the SciPy-equivalent modules, SciRS2 includes functionality oriented toward machine learning. Automatic differentiation support covers:
- Reverse-mode automatic differentiation
- Tensor-based computation with graph tracking
- Gradient computation and propagation
- Neural network operations:
- Activation functions
- Cross-entropy loss functions
- Convolution operations
- Pooling operations
- Optimizers for machine learning (SGD, Adam, Momentum SGD, AdaGrad)
- Higher-order derivatives
- BLAS acceleration for linear algebra operations
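To make "graph tracking with reverse-mode accumulation" concrete, here is a tiny self-contained tape sketch; it is conceptual only, not the module's actual design:

```rust
/// One node on the tape: up to two parents with local partial derivatives.
#[derive(Clone, Copy)]
struct Node {
    parents: [(usize, f64); 2],
}

/// A minimal gradient tape: forward ops record nodes, backward replays them.
struct Tape {
    nodes: Vec<Node>,
    values: Vec<f64>,
}

impl Tape {
    fn var(&mut self, v: f64) -> usize {
        self.values.push(v);
        self.nodes.push(Node { parents: [(0, 0.0), (0, 0.0)] });
        self.values.len() - 1
    }

    fn mul(&mut self, a: usize, b: usize) -> usize {
        let v = self.values[a] * self.values[b];
        self.values.push(v);
        // d(ab)/da = b and d(ab)/db = a: the local partials.
        self.nodes.push(Node { parents: [(a, self.values[b]), (b, self.values[a])] });
        self.values.len() - 1
    }

    /// Reverse pass: propagate adjoints from `out` back to every input.
    fn grad(&self, out: usize) -> Vec<f64> {
        let mut adj = vec![0.0; self.values.len()];
        adj[out] = 1.0;
        for i in (0..self.nodes.len()).rev() {
            for (p, d) in self.nodes[i].parents {
                adj[p] += adj[i] * d;
            }
        }
        adj
    }
}

fn main() {
    let mut t = Tape { nodes: vec![], values: vec![] };
    let x = t.var(3.0);
    let y = t.mul(x, x); // y = x^2
    let g = t.grad(y);
    println!("dy/dx = {}", g[x]); // 6.0
}
```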
Neural network functionality covers:
- Neural network building blocks (layers, activations, loss functions)
- Backpropagation infrastructure
- Model architecture implementations
Machine learning optimization utilities cover:
- Stochastic gradient descent and variants
- Learning rate scheduling
- Regularization techniques
Graph processing functionality covers:
- Graph operations and algorithms
- Support for graph neural networks
- Centrality measures and community detection
Data transformation utilities cover:
- Data normalization and standardization
- Feature engineering utilities
- Dimensionality reduction
Evaluation metrics cover:
- Classification metrics (accuracy, F1, ROC)
- Regression metrics (MSE, MAE)
- Model evaluation utilities
Text processing utilities cover:
- Tokenization utilities
- Embedding operations
- Text preprocessing
Dataset utilities cover:
- Standard dataset loaders and interfaces
- Data splitting and validation utilities
- Batch processing
Computer vision functionality covers:
- Image processing operations
- Feature extraction
- Image transforms and augmentation
Time series functionality covers:
- Time series decomposition
- Forecasting algorithms
- Temporal feature extraction
Clustering functionality covers:
- Vector quantization algorithms
- Hierarchical clustering
- Density-based clustering
N-dimensional image processing covers:
- Filtering operations
- Morphological operations
- Image measurements and analysis
Testing is organized in several layers:
- Unit Tests: Test individual functions and algorithms
- Property Tests: Verify mathematical properties and invariants
- Numerical Tests: Compare against reference implementations
- Benchmark Tests: Monitor performance characteristics
- Integration Tests: Test cross-module functionality
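For example, a property test asserts an invariant over many inputs instead of one hand-computed answer:

```rust
use ndarray::Array2;

/// Property: transposition is an involution, (A^T)^T == A,
/// checked across several shapes rather than one fixed case.
#[test]
fn transpose_is_involution() {
    for (rows, cols) in [(1, 1), (2, 3), (5, 4)] {
        let a = Array2::from_shape_fn((rows, cols), |(i, j)| (i * cols + j) as f64);
        assert_eq!(a.t().t(), a);
    }
}
```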
Documentation spans several levels:
- API Documentation: Comprehensive docs for all public APIs
- Tutorials: Step-by-step guides for common tasks
- Theory: Mathematical background for implemented algorithms
- Examples: Practical usage examples with code
- Performance Notes: Guidance on algorithm selection
Development proceeds in phases:
- Phase 1: Implement core infrastructure and linear algebra (scirs2-linalg)
- Phase 2: Develop basic statistical and optimization functionality
- Phase 3: Add integration, interpolation, and FFT modules
- Phase 4: Implement remaining modules based on priority
- Ongoing: Continuous integration, testing, and performance optimization