Optimization of Hyperparameters and Evaluation Metrics in config.h #79

The current hyperparameter configuration in config.h yields sub-optimal training throughput and high variance in the loss reported during evaluation. Specifically, the evaluation iteration count (EVAL_ITERS), the evaluation interval (EVAL_INTERVAL), and the dropout rate (DROPOUT) can be tuned to improve convergence stability and reduce computational overhead in the native C++ training loop.

Over-Regularisation (DROPOUT = 0.2f)

For a compact, character-level architecture ($N_{\text{embd}} = 128$, $N_{\text{layer}} = 4$, $N_{\text{head}} = 4$) with fewer than 1 million parameters, a 20% dropout rate is overly aggressive. Regularisation this strong risks underfitting the structure of the training corpus and slows cross-entropy minimization.

```cpp
static const int BATCH_SIZE = 16;       // Increased from 4 to reduce gradient noise and make better use of vectorized kernels
static const int BLOCK_SIZE = 64;       // Context length (tokens per sequence)
static const int MAX_ITERS = 5000;      // Reduced from 10000 because each iteration now processes 4x as many tokens
static const int EVAL_INTERVAL = 250;   // Increased from 20 so the loop pauses for evaluation far less often

// Learning Rate
static const float LEARNING_RATE = 5e-4f; // Raised slightly to match the larger batch size

// Statistical Stability
static const int EVAL_ITERS = 100;      // Increased from 1: averaging 100 batches cuts the standard error of the mean loss ~10x (independent batches)

// Architectural Regularisation
static const int N_EMBD = 128;
static const int N_HEAD = 4;
static const int N_LAYER = 4;
static const float DROPOUT = 0.05f;     // Reduced from 0.2f to accelerate early-stage convergence
```
