diff --git a/improvement_plan.md b/improvement_plan.md
deleted file mode 100644
index 0912c37..0000000
--- a/improvement_plan.md
+++ /dev/null
@@ -1,131 +0,0 @@
-# Improvement Plan
-
-## 1. Parameter Optimization
-
-### Problem
-
-The scorer uses fixed global constants (decay exponents, stability coefficient, spacing
-weight, etc.). Finding optimal values requires running the benchmark and minimizing
-days-to-mastery across student profiles. Each evaluation is expensive (a full simulation),
-the function is not differentiable, and there are 5-10 parameters.
-
-### Algorithms
-
-#### Nelder-Mead
-
-A derivative-free optimization algorithm that maintains a simplex (N+1 vertices in
-N-dimensional space). At each step it reflects the worst vertex through the centroid of
-the remaining vertices, then expands, contracts, or shrinks based on the result.
-
-- Evaluations needed: 50-200 for 5 parameters, 150-500 for 10.
-- Strengths: simple to implement (~50 lines of logic), deterministic "next point" logic,
-  no gradients needed.
-- Weaknesses: can converge to local minima, degrades above ~15 parameters, no native
-  bound handling (use parameter transforms instead).
-
-#### CMA-ES (Covariance Matrix Adaptation Evolution Strategy)
-
-Maintains a multivariate normal distribution over parameter space. Each iteration samples
-a population, evaluates them, and updates the distribution's mean, covariance matrix, and
-step size based on the best candidates. The covariance matrix learns correlations between
-parameters.
-
-- Evaluations needed: 200-500 for 5 parameters, 500-2000 for 10.
-- Strengths: handles non-convex and multimodal landscapes, learns parameter correlations,
-  very robust.
-- Weaknesses: higher evaluation cost than Nelder-Mead for simple landscapes, moderately
-  complex to implement (covariance update, step-size control).
-
-#### Bayesian Optimization
-
-Maintains a Gaussian Process surrogate model fitted to all evaluations so far. An
-acquisition function (e.g. Expected Improvement) balances exploration and exploitation
-to pick the next point. Specifically designed for expensive evaluations.
-
-- Evaluations needed: 30-100 for 5 parameters, 80-250 for 10.
-- Strengths: most sample-efficient method, models uncertainty explicitly, ideal for
-  expensive objective functions.
-- Weaknesses: complex to implement (GP fitting, kernel hyperparameters, acquisition
-  function optimization), practically requires a library.
-
-#### Powell's Method
-
-Performs sequential 1D line searches along a set of directions, updating the direction
-set after each cycle to incorporate curvature information.
-
-- Evaluations needed: similar to Nelder-Mead.
-- Strengths: often faster convergence than Nelder-Mead for smooth functions.
-- Weaknesses: requires implementing a line search subroutine, slightly more complex.
-
-#### Differential Evolution
-
-Population-based: creates new candidates by combining difference vectors from random
-population members.
-
-- Evaluations needed: 500-2000 for 5 parameters, 2000-10000 for 10.
-- Strengths: simple to implement, robust for multimodal problems.
-- Weaknesses: too many evaluations for tight budgets.
-
-#### Random / Latin Hypercube Search
-
-Random sampling, optionally with stratified coverage (Latin Hypercube). No learning
-between evaluations.
-
-- Useful as an initialization phase for directed methods, not as a standalone approach
-  for 5+ parameters.
-
-### Agent-Driven Optimization
-
-These algorithms are well-suited for an AI agent to run autonomously because the agent
-can implement the logic, run the benchmark, observe results, and decide next steps without
-human intervention.
-
-#### Recommended approach: phased strategy
-
-**Phase 1 — Exploration (20-50 evaluations):** Latin Hypercube Sampling across parameter
-bounds to get broad coverage. Identifies promising regions and which parameters matter most.
-
-**Phase 2 — Directed optimization (remaining budget):** Nelder-Mead starting from the best
-point found in Phase 1. If budget permits (>300 total), run 2-3 Nelder-Mead instances from
-different starting points to mitigate local minima.
-
-**Phase 3 — Local refinement (last 10-20% of budget):** Small perturbation study around the
-best point to confirm it is a genuine minimum and assess parameter sensitivity.
-
-#### Bound handling
-
-Transform bounded parameters to unbounded space before optimizing:
-
-- Parameters in (0, inf): log transform.
-- Parameters in (0, 1): logit transform.
-- Parameters in (a, b): logit of (x - a) / (b - a).
-
-The agent optimizes in transformed space and maps back for evaluation. This is cleaner
-than clamping or penalty functions.
-
-### Evaluation Budget Summary
-
-| Approach              | Budget (5 params) | Budget (10 params) | Implementability |
-|-----------------------|--------------------|--------------------|------------------|
-| Nelder-Mead           | 50-200             | 150-500            | Trivial          |
-| Multi-start N-M       | 150-400            | 300-500+           | Trivial          |
-| Powell's method       | 50-200             | 150-500            | Moderate         |
-| CMA-ES                | 200-500            | 500-2000           | Moderate-Hard    |
-| Bayesian Optimization | 30-100             | 80-250             | Hard (library)   |
-| Differential Evolution| 500-2000           | 2000-10000         | Easy but costly  |
-| Random / LHS          | 500+ (poor)        | 1000+ (poor)       | Trivial          |
-
-### Parameters to Optimize
-
-Candidates from `exercise_scorer.rs`:
-
-- `DECLARATIVE_CURVE_DECAY` (-0.5)
-- `PROCEDURAL_CURVE_DECAY` (-0.3)
-- `STABILITY_COEFFICIENT` (2.1)
-- `DIFFICULTY_GRADE_ADJUSTMENT_SCALE` (0.6)
-- `DIFFICULTY_REVERSION_WEIGHT` (0.1)
-- `PERFORMANCE_WEIGHT_DECAY` (0.98)
-- `SPACING_EFFECT_WEIGHT` (0.7)
-
-The benchmark's `days_to_mastery` aggregated across all student profiles is the objective
-to minimize.
diff --git a/src/benchmark.rs b/src/benchmark.rs
index 74126c9..fe53579 100644
--- a/src/benchmark.rs
+++ b/src/benchmark.rs
@@ -136,7 +136,7 @@ impl Default for Benchmark {
                 exercises_per_session: 25,
                 initial_performance: [0.3, 0.2, 0.25, 0.15, 0.1],
                 trials_before_stable: 5,
-                stable_performance: [0.02, 0.08, 0.1, 0.3, 0.5],
+                stable_performance: [0.02, 0.05, 0.1, 0.33, 0.5],
                 lapse_rate: 0.07,
             },
             below_median_profile: StudentProfile {
diff --git a/src/exercise_scorer.rs b/src/exercise_scorer.rs
index 3b99642..b6b18de 100644
--- a/src/exercise_scorer.rs
+++ b/src/exercise_scorer.rs
@@ -32,34 +32,34 @@ pub trait ExerciseScorer {
 
 // Adjustable constants: these can be tuned to calibrate the scorer.
 
-/// The decay exponent used in the power-law forgetting curve for declarative exercises (e.g. memory
-/// recall). The value is taken from the FSRS-4.5 implementation.
-const DECLARATIVE_CURVE_DECAY: f32 = -0.5;
-
 /// The decay exponent used in the power-law forgetting curve for procedural exercises (e.g. playing
 /// a piece of music). The value is higher than for declarative exercises, reflecting the slower
 /// decay of procedural memory.
-const PROCEDURAL_CURVE_DECAY: f32 = -0.3;
+const PROCEDURAL_CURVE_DECAY: f32 = -0.2;
+
+/// The decay exponent used in the power-law forgetting curve for declarative exercises (e.g. memory
+/// recall).
+const DECLARATIVE_CURVE_DECAY: f32 = -0.4;
 
 /// A scaling coefficient applied to the stability update term for each review. The per-review
 /// multiplicative change is `1 + STABILITY_COEFFICIENT * P * E * spacing_gain`. The resulting
 /// stability is clamped to `MIN_STABILITY..MAX_STABILITY`.
-const STABILITY_COEFFICIENT: f32 = 2.1;
+const STABILITY_COEFFICIENT: f32 = 2.5;
 
 /// The per-trial difficulty adjustment scale. Good grades reduce difficulty, poor grades increase
 /// it.
-const DIFFICULTY_GRADE_ADJUSTMENT_SCALE: f32 = 0.6;
+const DIFFICULTY_GRADE_ADJUSTMENT_SCALE: f32 = 1.05;
 
 /// How much the dynamic difficulty is pulled back toward the base estimate after each review.
-const DIFFICULTY_REVERSION_WEIGHT: f32 = 0.1;
+const DIFFICULTY_REVERSION_WEIGHT: f32 = 0.16;
 
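-/// The per-day decay factor for exponential weighting of performance. Latest score weight 1.0,
-/// scores one day old are multiplied by it, two days old by its square and so on.
-const PERFORMANCE_WEIGHT_DECAY: f32 = 0.98;
+/// The decay factor for exponential weighting of performance. The latest score has weight 1.0.
+/// Older scores use it raised to the elapsed weeks (time average) or position (position average).
+const PERFORMANCE_WEIGHT_DECAY: f32 = 0.95;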
 
 /// The weight of the interval-aware spacing effect during successful reviews. Larger values
 /// increase stability growth when pre-review retrievability is low.
-const SPACING_EFFECT_WEIGHT: f32 = 0.7;
+const SPACING_EFFECT_WEIGHT: f32 = 0.65;
 
 /// The minimum weighted score required to apply the old-good retrievability floor. This floor is
 /// applied to exercises with strong historical performance to prevent them from dropping too low
@@ -216,32 +216,46 @@ impl PowerLawScorer {
         difficulty.clamp(MIN_DIFFICULTY, MAX_DIFFICULTY)
     }
 
-    /// Computes the time-decayed weighted average performance from all entries.
+    /// Computes a blended weighted average performance from all entries.
     ///
-    /// Weights decay by elapsed days from the most recent entry so irregular practice cadence is
-    /// modeled more accurately.
+    /// Two averages are combined: a time-based average where weights decay by elapsed weeks, and a
+    /// position-based average where weights decay by ordinal position (most recent = 1, next =
+    /// decay, then decay squared, etc.). The two are blended 80/20 time/position.
     fn compute_weighted_avg<T: TimestampedValue>(entries: &[T]) -> f32 {
         if entries.is_empty() {
             return 0.0;
         }
 
-        // Start from the latest timestamp and compute the weights based on the number of days
-        // from it.
+        // Time-based average: weights decay by elapsed weeks from the most recent entry.
         let newest_timestamp = entries[0].timestamp();
-        let mut sum_weighted = 0.0;
-        let mut sum_weights = 0.0;
+        let mut time_sum_weighted = 0.0;
+        let mut time_sum_weights = 0.0;
         for entry in entries {
-            let elapsed_days = ((newest_timestamp.saturating_sub(entry.timestamp())) as f32
-                / SECONDS_PER_DAY)
+            let elapsed_weeks = ((newest_timestamp.saturating_sub(entry.timestamp())) as f32
+                / SECONDS_PER_DAY
+                / 7.0)
                 .max(0.0);
             let weight = PERFORMANCE_WEIGHT_DECAY
-                .powf(elapsed_days)
+                .powf(elapsed_weeks)
+                .max(PERFORMANCE_WEIGHT_MIN);
+            time_sum_weighted += weight * entry.value();
+            time_sum_weights += weight;
+        }
+        let time_avg = time_sum_weighted / time_sum_weights;
+
+        // Position-based average: weights decay by ordinal position regardless of timestamps.
+        let mut pos_sum_weighted = 0.0;
+        let mut pos_sum_weights = 0.0;
+        for (i, entry) in entries.iter().enumerate() {
+            let weight = PERFORMANCE_WEIGHT_DECAY
+                .powf(i as f32)
                 .max(PERFORMANCE_WEIGHT_MIN);
-            sum_weighted += weight * entry.value();
-            sum_weights += weight;
+            pos_sum_weighted += weight * entry.value();
+            pos_sum_weights += weight;
         }
+        let pos_avg = pos_sum_weighted / pos_sum_weights;
 
-        sum_weighted / sum_weights
+        0.8 * time_avg + 0.2 * pos_avg
     }
 
     /// Returns the forgetting-curve decay exponent for the given exercise type.
@@ -856,7 +870,7 @@ mod test {
             PowerLawScorer::compute_retrievability(&ExerciseType::Declarative, 100.0, stability);
         let very_old_procedural =
             PowerLawScorer::compute_retrievability(&ExerciseType::Procedural, 100.0, stability);
-        assert!(very_old_declarative < 0.25);
+        assert!(very_old_declarative < 0.26);
         assert!(very_old_declarative < very_old_procedural);
     }
 
@@ -950,7 +964,7 @@ mod test {
         let mean = PowerLawScorer::compute_weighted_avg(&single_trial);
         assert!((mean - 5.0).abs() < 1e-6);
 
-        // Multiple trials: [5.0, 4.0, 3.0] should be approx 4.03 at this decay rate.
+        // Multiple trials: [5.0, 4.0, 3.0] should be approx 4.017 at this decay rate.
         let multi_trials = vec![
             ExerciseTrial {
                 score: 5.0,
@@ -966,7 +980,7 @@ mod test {
             },
         ];
         let weighted = PowerLawScorer::compute_weighted_avg(&multi_trials);
-        assert!((weighted - 4.013).abs() < 0.001);
+        assert!((weighted - 4.017).abs() < 0.01);
 
         // Irregular spacing should down-weight distant failures more than dense spacing.
         let dense_low_tail = vec![