
Optimization of Hyperparameters and Evaluation Metrics in config.h #79

@Eamon2009


The current hyperparameters in config.h lead to low training throughput and high-variance loss estimates during evaluation. Specifically, the evaluation iteration count (EVAL_ITERS), the evaluation interval (EVAL_INTERVAL), and the dropout rate (DROPOUT) can be tuned to improve convergence stability and reduce computational overhead in the native C++ training loop.
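To illustrate why the evaluation iteration count matters, here is a minimal sketch of a mean-loss estimator. It is not taken from the repository: `estimate_loss`, `estimate_spread`, and `make_synthetic_loss` are hypothetical names, and the Gaussian noise model stands in for a real forward pass on a sampled batch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <random>

// Mean loss over `eval_iters` batches. `batch_loss` is a hypothetical callback
// standing in for one forward pass on a freshly sampled evaluation batch.
inline double estimate_loss(int eval_iters, const std::function<float()>& batch_loss) {
    assert(eval_iters > 0);
    double sum = 0.0;
    for (int i = 0; i < eval_iters; ++i) sum += batch_loss();
    return sum / eval_iters;
}

// Empirical standard deviation of the reported loss across `trials`
// independent evaluations, each averaging `eval_iters` batches.
inline double estimate_spread(int eval_iters, int trials,
                              const std::function<float()>& batch_loss) {
    assert(trials > 0);
    double sum = 0.0, sum_sq = 0.0;
    for (int t = 0; t < trials; ++t) {
        const double m = estimate_loss(eval_iters, batch_loss);
        sum += m;
        sum_sq += m * m;
    }
    const double mean = sum / trials;
    return std::sqrt(std::max(0.0, sum_sq / trials - mean * mean));
}

// Synthetic stand-in for a model (assumption): Gaussian noise around a fixed loss.
inline std::function<float()> make_synthetic_loss(unsigned seed, float mean, float stddev) {
    return [rng = std::mt19937(seed),
            dist = std::normal_distribution<float>(mean, stddev)]() mutable {
        return dist(rng);
    };
}
```

If per-batch losses are roughly independent, the spread of the reported mean shrinks with the square root of the number of batches averaged, so 100 batches should give about a tenfold reduction relative to a single batch. Calling `estimate_spread(1, ...)` and `estimate_spread(100, ...)` on the synthetic loss is one way to check this.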

Over-Regularisation (DROPOUT = 0.2f)

For a compact, character-level architecture ($N_{\text{embd}} = 128$, $N_{\text{layer}} = 4$, $N_{\text{head}} = 4$) containing fewer than 1 million parameters, a 20% dropout rate is excessively aggressive. Regularisation this strong risks underfitting the training corpus and slows the reduction of the cross-entropy loss.
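The parameter count can be sanity-checked with a short calculation. The sketch below assumes a GPT-style decoder layout (bias-free Q/K/V projections, a biased output projection, a 4x MLP, two LayerNorms per block, learned positional embeddings, an untied output head) and a vocabulary of 65 characters. None of these details are stated in this issue, so treat `ModelShape` and `count_parameters` as hypothetical.

```cpp
#include <cassert>

// Hypothetical model description; `vocab` is an assumed value, not from config.h.
struct ModelShape {
    long long vocab;
    long long block_size;
    long long n_embd;
    long long n_layer;
};

// Parameter count under the assumed GPT-style layout (requires C++14).
constexpr long long count_parameters(const ModelShape& s) {
    const long long d = s.n_embd;
    const long long attn = 3 * d * d + (d * d + d);            // Q, K, V (no bias) + output projection
    const long long mlp = (d * 4 * d + 4 * d) + (4 * d * d + d); // expand to 4d, project back to d
    const long long norms = 2 * (2 * d);                       // two LayerNorms: scale and shift
    const long long block = attn + mlp + norms;
    const long long embeddings = s.vocab * d + s.block_size * d; // token + positional tables
    const long long head = 2 * d + (d * s.vocab + s.vocab);    // final LayerNorm + output head
    return embeddings + s.n_layer * block + head;
}

// N_EMBD = 128, N_LAYER = 4, BLOCK_SIZE = 64; vocab = 65 is an assumption.
constexpr ModelShape kShape{65, 64, 128, 4};
static_assert(count_parameters(kShape) < 1000000, "expected fewer than 1M parameters");
```

Under these assumptions the total comes to roughly 0.8 million parameters, consistent with the figure above. The head count does not enter the calculation, because splitting the embedding across heads leaves the projection sizes unchanged.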

Proposed configuration:

```cpp
// Batching and Iteration Budget
static const int BATCH_SIZE = 16;       // Increased from 4 to stabilize gradients and make better use of vectorization
static const int BLOCK_SIZE = 64;       // Context length
static const int MAX_ITERS = 5000;      // Reduced from 10000 because each iteration now processes more tokens
static const int EVAL_INTERVAL = 250;   // Increased from 20 to reduce how often training pauses for evaluation

// Learning Rate Schedule
static const float LEARNING_RATE = 5e-4f; // Adjusted marginally upward to scale with the larger batch size

// Statistical Stability
static const int EVAL_ITERS = 100;      // Increased from 1 to yield a low-variance mean loss

// Architecture and Regularisation
static const int N_EMBD = 128;
static const int N_HEAD = 4;
static const int N_LAYER = 4;
static const float DROPOUT = 0.05f;     // Reduced from 0.2f to accelerate early-stage convergence
```
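The effect of these values on the training and evaluation budget can be worked out at compile time. The sketch below is only an arithmetic check: `TrainConfig` and its helpers are hypothetical, the previous BLOCK_SIZE is assumed to be unchanged at 64, and evaluation is assumed to run once every EVAL_INTERVAL iterations on a single split.

```cpp
#include <cassert>

// Hypothetical bundle of the budget-related settings from config.h.
struct TrainConfig {
    int batch_size;
    int block_size;
    int max_iters;
    int eval_interval;
    int eval_iters;
};

constexpr long long tokens_per_iter(const TrainConfig& c) {
    return static_cast<long long>(c.batch_size) * c.block_size;
}
constexpr long long total_train_tokens(const TrainConfig& c) {
    return tokens_per_iter(c) * c.max_iters;
}
// Number of times training pauses to evaluate (assumes one pause per interval).
constexpr long long eval_points(const TrainConfig& c) {
    return c.max_iters / c.eval_interval;
}
// Forward-only batches spent on evaluation, per split.
constexpr long long total_eval_batches(const TrainConfig& c) {
    return eval_points(c) * c.eval_iters;
}

// Previous values as quoted in the comments; BLOCK_SIZE = 64 is assumed unchanged.
constexpr TrainConfig kPrevious{4, 64, 10000, 20, 1};
constexpr TrainConfig kProposed{16, 64, 5000, 250, 100};

static_assert(tokens_per_iter(kPrevious) == 256, "4 * 64");
static_assert(tokens_per_iter(kProposed) == 1024, "16 * 64");
static_assert(eval_points(kPrevious) == 500, "10000 / 20");
static_assert(eval_points(kProposed) == 20, "5000 / 250");
```

Under these assumptions, tokens per iteration rise from 256 to 1024 and total training tokens from 2.56M to 5.12M, even with half as many iterations. Evaluation pauses fall from 500 to 20, while the number of evaluation batches rises from 500 to 2000, so the net change in evaluation cost depends on how expensive each pause is relative to a forward pass.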

Labels: enhancement
