Grok Mini V3 is an enhanced runtime built on the Predictive Autograd Engine (PAE). It provides a lightweight, NumPy-only implementation of neural network training and inference with safety-first autonomous capabilities.
Components:

- predictive_autograd_engine.py: Production-grade NumPy autograd with broadcasting-safe gradients, stable numerics, and checkpoint management
- grok_mini_v3.py: MLP implementation using PAE Tensors with flexible architecture
- runtime_v3.py: Training runtime with SGD and Adam optimizers
- self_agents.py: Agent introspection and audit scaffolding for autonomous behaviors
Safety features:

- Action approval gates
- Full audit trails
- Environment-based security controls
SuperGrok provides an optimized inference pipeline with:
- Sharded checkpoint loading
- Layer swapping for large models
- Optional quantization (8-bit/4-bit via bitsandbytes)
- Streaming generation
AutoTrainer provides an automated training pipeline with:
- Text dataset handling
- Mixed precision training
- Checkpointing and resumption
- Validation and metrics
Quick start:

```python
from grok_mini_v3 import GrokMiniV3
from runtime_v3 import demo_train
import numpy as np

# Create model
model = GrokMiniV3(input_dim=8, hidden_dims=[16, 8], output_dim=1, seed=42)

# Train on synthetic data
model = demo_train(model, epochs=20, batch_size=32, lr=0.01)

# Make predictions
X_test = np.random.randn(5, 8)
predictions = model.predict_numpy(X_test)
print(predictions)
```

Save and load checkpoints:

```python
from predictive_autograd_engine import save_state_dict, load_state_dict

# Save model
state = model.state_dict()
save_state_dict(state, "model_checkpoint.npz", metadata={"epoch": 20})

# Load model
loaded_state = load_state_dict("model_checkpoint.npz")
model.load_state_dict(loaded_state)
```

Run inference from a saved checkpoint:

```python
from supergrok.inference import load_model, predict
import numpy as np

# Load model from checkpoint
model = load_model("model_checkpoint.npz", input_dim=8, hidden_dims=[16, 8])

# Predict
X = np.random.randn(10, 8)
predictions = predict(model, X)
```

The model is a feed-forward MLP:

```
Input (input_dim)
↓
Linear + ReLU (hidden_dims[0])
↓
Linear + ReLU (hidden_dims[1])
↓
...
↓
Linear (output_dim)
↓
Output
```
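As a rough sketch of what this stack computes (a hypothetical helper, not the actual GrokMiniV3 internals), each hidden layer applies a linear map followed by ReLU, and the output layer is linear only:

```python
import numpy as np

# Illustrative forward pass for the diagram above; `weights`, `biases`,
# and this helper are assumptions, not the real GrokMiniV3 API.
def mlp_forward(x, weights, biases):
    *hidden, last = list(zip(weights, biases))
    for W, b in hidden:
        x = np.maximum(x @ W + b, 0.0)  # Linear + ReLU
    W, b = last
    return x @ W + b                    # final Linear, no activation
```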
Features:
- Xavier weight initialization (sketched after this list)
- Configurable hidden layer dimensions
- ReLU activations (customizable)
- Efficient NumPy operations
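For reference, Xavier (Glorot) uniform initialization scales the weight range by the layer's fan-in and fan-out; a minimal sketch of the idea, not necessarily the engine's exact code:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng=None):
    # Glorot uniform: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))
    rng = rng or np.random.default_rng()
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))
```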
Optimizers:
- SGD: Stochastic gradient descent with momentum
- Adam: Adaptive moment estimation with weight decay (update rule sketched below)
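A single Adam update, with weight decay folded in as an L2 term (an assumption; runtime_v3 may handle decay differently), looks like this:

```python
import numpy as np

# Illustrative Adam step; not runtime_v3's exact implementation.
def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0):
    grad = grad + weight_decay * param      # L2-style weight decay (assumed)
    m = beta1 * m + (1 - beta1) * grad      # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad**2   # second-moment estimate
    m_hat = m / (1 - beta1**t)              # bias correction, t starts at 1
    v_hat = v / (1 - beta2**t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```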
Features:
- Mini-batch training (sketched after this list)
- Automatic checkpoint saving
- Loss tracking
- Synthetic dataset generation
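The mini-batch iteration behind this roughly follows the standard shuffle-and-slice pattern (an illustrative sketch; demo_train's actual loop may differ):

```python
import numpy as np

# Shuffle once per pass, then yield contiguous slices of the permutation.
def iterate_minibatches(X, y, batch_size, rng=np.random.default_rng(0)):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]
```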
All autonomous behaviors are gated by environment flags:
```bash
# Enable autonomous actions
export ALLOW_SELF_ACTIONS=true

# Enable network access
export ALLOW_NETWORK=true

# DANGEROUS: Auto-approve all actions
export AUTO_APPROVE=true  # Use with extreme caution!
```

Default behavior: all actions require manual operator approval.
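A gate check along these lines (illustrative; self_agents.py's actual logic may differ) would consult the flags before any action runs:

```python
import os

def _enabled(name):
    # Interpret "true"/"1"/"yes" (any case) as on.
    return os.environ.get(name, "").strip().lower() in {"true", "1", "yes"}

def gate_action(requires_network=False):
    """Return 'deny', 'ask', or 'allow' for a proposed autonomous action."""
    if not _enabled("ALLOW_SELF_ACTIONS"):
        return "deny"
    if requires_network and not _enabled("ALLOW_NETWORK"):
        return "deny"
    # Default: escalate to a human operator unless AUTO_APPROVE is set.
    return "allow" if _enabled("AUTO_APPROVE") else "ask"
```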
```python
from self_agents import get_auditor

auditor = get_auditor()
auditor.print_audit_summary()

# Get full audit log
actions = auditor.get_audit_log()
for action in actions:
    print(f"{action.timestamp}: {action.action_type} - {action.description}")
```

For large models that don't fit in memory:
```python
from supergrok.loader import create_shards_from_checkpoint, ShardedModelLoader

# Create shards
create_shards_from_checkpoint("large_model.pt", "shards/", num_shards=8)

# Load with lazy loading
loader = ShardedModelLoader("shards/", max_loaded=2)
param = loader.get_param("layer0.W")
```

Keep only a subset of layers on GPU:
```python
from supergrok.loader import LayerSwapManager

# Assuming you have a PyTorch model
manager = LayerSwapManager(model, device="cuda", keep_on_gpu=4)
manager.register_hooks()
# The model will automatically swap layers during the forward pass
```

Reduce model size with 8-bit quantization:
```python
from supergrok.quantize import quantize_checkpoint_simple

# Quantize checkpoint for storage
quantize_checkpoint_simple(
    "model.pt",
    "model_quantized.pt",
    bits=8,
)
```
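Conceptually, simple 8-bit quantization stores each tensor as int8 codes plus a scale factor; a hedged NumPy sketch of the scheme (not quantize_checkpoint_simple's actual code):

```python
import numpy as np

def quantize_8bit(w):
    # Symmetric per-tensor scheme: int8 codes plus one float32 scale.
    scale = max(np.abs(w).max() / 127.0, 1e-12)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, np.float32(scale)

def dequantize_8bit(q, scale):
    # Lossy reconstruction; per-element error is at most scale / 2.
    return q.astype(np.float32) * scale
```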
Train GrokMiniV2 models (from the main repo) with the automated pipeline:

```bash
python -m autotrainer.runtime \
--train-file data/train.txt \
--valid-file data/valid.txt \
--batch-size 8 \
--max-epochs 10 \
--lr 1e-4 \
--mixed-precision
```

Training tips:

- Use appropriate batch sizes: larger batches train faster but use more memory
- Adjust learning rate: Start with 0.01 for SGD, 0.001 for Adam
- Enable momentum: Helps escape local minima
- Save checkpoints regularly: Every few epochs
- Monitor loss curves: loss should decrease steadily (see the sketch below)
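For instance, if you collect per-epoch losses during training, a quick plot shows whether they are trending down (the losses list here is purely illustrative):

```python
import matplotlib.pyplot as plt

# Hypothetical per-epoch losses collected during training.
losses = [0.92, 0.61, 0.44, 0.35, 0.30, 0.27]
plt.plot(losses, marker="o")
plt.xlabel("epoch")
plt.ylabel("training loss")
plt.show()
```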
Run self-tests:
```bash
# PAE tests
python predictive_autograd_engine.py

# Agent system demo
python self_agents.py
```

Core (required):
- NumPy >= 1.21
Optional:
- PyTorch (for SuperGrok, AutoTrainer)
- Transformers (for tokenizers)
- bitsandbytes (for quantization)
- tqdm (for progress bars)
Limitations:

- CPU-only for PAE (NumPy backend)
- Small to medium models (< 1B parameters)
- No distributed training
- Limited to supervised learning
For production workloads with large models, consider:
- PyTorch with CUDA
- DeepSpeed or FSDP
- Dedicated serving infrastructure
See repository license.
Contributions welcome! Areas for improvement:
- Additional activation functions
- Convolutional layers
- Recurrent architectures
- Advanced optimizers (AdamW, LAMB)
- Distributed training support
Built with ❤️ by Massive Magnetics