This folder hosts the evaluation harness used to produce the BiasBios, CounterFact, and PronChange results reported in *Spectral Attention Steering for Prompt Highlighting* (ICLR 2026). Follow the steps below to reproduce the paper's numbers end to end.
Download the preprocessed benchmark bundle from our Hugging Face release (https://huggingface.co/datasets/waylonli/SEKA-datasets). The archive contains the ready-to-use files for BiasBios, CounterFact, PronChange, Lost-in-the-middle, and other datasets for expert projection generation. Unpack it into the repo’s `data/` directory so the paths resolve as written.
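After unpacking, a quick layout check can catch a misplaced archive before a long run. The sketch below only looks for the data paths that appear in this README's commands; the bundle may contain more, and the helper name `missing_data_paths` is hypothetical rather than part of the repository.

```python
from pathlib import Path

# Paths taken from the commands in this README (assumption: the bundle
# unpacks to exactly these locations under data/).
EXPECTED = (
    "biasbios/biasbios.json",
    "counterfact",
    "synthetic/pair_qa_new.jsonl",
)


def missing_data_paths(data_root="data", expected=EXPECTED):
    """Return the expected entries that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_data_paths()
    if missing:
        print("Missing under data/:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```

Run it from the repository root; an empty result means every path referenced by the commands in this README is present.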
You can either regenerate the projection banks or download the pre-built archives released with the paper:
- Generate locally using the builders in `src/custom_builders/`. For example:

  ```bash
  python src/custom_builders/synthetic_qa_builder.py \
    --model pretrained/Qwen3-4B-Base \
    --data data/synthetic/pair_qa_new.jsonl \
    --output_dir projections/biasbios/Qwen3-4B-Base \
    --max_samples 200 \
    --min_diff 0.20 \
    --top_pct 0.90
  ```

- Download the projection packs shipped with the camera-ready (links forthcoming) and unpack them under `projections/<task>/<model>/`.
Each benchmark expects `*_pos_proj.pt` (and optionally `*_neg_proj.pt`) to be available at the paths you pass via `--pos` / `--neg`.
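A small pre-flight check can confirm the projection files are where the evaluation commands expect them. This is a sketch that assumes the `projections/<task>/<model>_pos_proj.pt` naming used in the example commands in this README; `projection_paths` and `check_projections` are hypothetical helpers, not repository functions.

```python
from pathlib import Path


def projection_paths(task, model_name, root="projections"):
    """Build the (pos, neg) paths following the naming in this README's commands."""
    base = Path(root) / task
    return base / f"{model_name}_pos_proj.pt", base / f"{model_name}_neg_proj.pt"


def check_projections(task, model_name, root="projections"):
    """Return (pos, neg_or_None); raise if the required positive file is absent."""
    pos, neg = projection_paths(task, model_name, root)
    if not pos.is_file():
        raise FileNotFoundError(f"positive projection not found: {pos}")
    # The negative projection is optional, so a missing file is reported as None.
    return pos, (neg if neg.is_file() else None)
```

For example, `check_projections("biasbios", "Qwen3-4B-Base")` returns the two paths to pass to `--pos` and `--neg`, with `None` in place of a negative projection that was not built.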
Use the Python scripts directly to reproduce the main results. Swap in the model/projection pair of interest; the commands below match the settings used in the paper.
BiasBios:

```bash
python benchmarks/eval_bias_gen.py \
  --model pretrained/Qwen3-4B-Base \
  --data_path data/biasbios/biasbios.json \
  --output_dir benchmarks/biasbios/results/seka-qwen3-4b \
  --overwrite_output_dir \
  --batch_size 32 \
  --max_new_tokens 64 \
  --seka \
  --pos projections/biasbios/Qwen3-4B-Base_pos_proj.pt \
  --neg projections/biasbios/Qwen3-4B-Base_neg_proj.pt \
  --amplify_pos 1.0 \
  --amplify_neg 0.8 \
  --layers last10
```

CounterFact:

```bash
python benchmarks/eval_fact_gen.py \
  --model pretrained/Qwen3-4B-Base \
  --data_path data/counterfact \
  --output_dir benchmarks/counterfact/results/seka-qwen3-4b \
  --overwrite_output_dir \
  --benchmarks efficacy paraphrase \
  --add_unmediated_fact True \
  --batch_size 32 \
  --max_new_tokens 64 \
  --seka \
  --pos projections/counterfact/Qwen3-4B-Base_pos_proj.pt \
  --neg projections/counterfact/Qwen3-4B-Base_neg_proj.pt \
  --amplify_pos 1.56 \
  --amplify_neg 0.0 \
  --layers last10
```

PronChange:

```bash
python benchmarks/eval_biasbios_instruction.py \
  --model pretrained/gemma-3-4b-pt \
  --data_path data/biasbios/biasbios.json \
  --task pronchange \
  --output_dir benchmarks/pronchange/results/seka-gemma-3-4b \
  --overwrite_output_dir \
  --batch_size 32 \
  --max_new_tokens 256 \
  --seka \
  --pos projections/pronchange/gemma-3-4b-pt_pos_proj.pt \
  --neg projections/pronchange/gemma-3-4b-pt_neg_proj.pt \
  --amplify_pos 0.40 \
  --amplify_neg 0.00 \
  --layers last10 \
  --example_subset 28297:29297  # optional slice used in the paper
```

Each run emits `metric_result.json` (headline metrics) and task-specific logs inside the specified `--output_dir`.
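To compare several runs side by side, the metric files can be collected with a few lines of Python. The sketch below assumes only that each `--output_dir` contains a `metric_result.json` holding valid JSON; the key names inside are not documented here, so the parsed object is returned unchanged. `load_metrics` and `summarize` are hypothetical names.

```python
import json
from pathlib import Path


def load_metrics(output_dir):
    """Read metric_result.json from a run's --output_dir."""
    path = Path(output_dir) / "metric_result.json"
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def summarize(run_dirs):
    """Map each run directory to its metrics, or to None if no metrics file exists yet."""
    summary = {}
    for run_dir in run_dirs:
        try:
            summary[str(run_dir)] = load_metrics(run_dir)
        except FileNotFoundError:
            summary[str(run_dir)] = None
    return summary


if __name__ == "__main__":
    runs = [
        "benchmarks/biasbios/results/seka-qwen3-4b",
        "benchmarks/counterfact/results/seka-qwen3-4b",
        "benchmarks/pronchange/results/seka-gemma-3-4b",
    ]
    print(json.dumps(summarize(runs), indent=2))
```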
| Argument | Description |
|---|---|
| `--model` | HF identifier or local path under `pretrained/` |
| `--data_path` | Path to the processed dataset JSON / folder |
| `--pos`, `--neg` | Paths to SEKA projection tensors (`*_pos_proj.pt`, `*_neg_proj.pt`) |
| `--amplify_pos`, `--amplify_neg` | Steering coefficients (values from Tables 2–4 of the paper) |
| `--layers` | Layer subset (`last10`, `all`, `0,1,2`, etc.) parsed by `_parse_layers` |
| `--example_subset` | Optional `start:end` slice for quick sanity checks |
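To make the `--layers` and `--example_subset` formats concrete, here is a hypothetical re-implementation of the parsing. It is not the repository's `_parse_layers`; it assumes that `lastN` selects the final N layers, that explicit indices are zero-based, and that `start:end` is a half-open slice. Check the actual source before relying on any of these.

```python
def parse_layers_sketch(spec, num_layers):
    """Turn 'all', 'lastN', or '0,1,2' into a list of layer indices (assumed semantics)."""
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(num_layers))
    if spec.startswith("last"):
        count = int(spec[len("last"):])
        if not 0 < count <= num_layers:
            raise ValueError(f"cannot take last {count} of {num_layers} layers")
        return list(range(num_layers - count, num_layers))
    indices = [int(part) for part in spec.split(",") if part.strip()]
    for idx in indices:
        if not 0 <= idx < num_layers:
            raise ValueError(f"layer index {idx} out of range for {num_layers} layers")
    return indices


def parse_subset_sketch(spec):
    """Turn 'start:end' into an (int, int) pair, assuming a half-open slice."""
    start, end = (int(part) for part in spec.split(":"))
    if start < 0 or end <= start:
        raise ValueError(f"invalid subset {spec!r}")
    return start, end
```

Under these assumptions, `last10` on a 36-layer model selects layers 26 through 35, and `28297:29297` covers 1000 examples.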
- Projection mismatch – Ensure you load projections trained for the same model family and task you are evaluating.
- Negative amplitude behaviour – If you do not wish to use a negative projection, omit the `--neg` flag entirely; passing the flag with `--amplify_neg 0.0` will attenuate the positive term.
- Cluster modules – The example scripts assume CUDA modules named `cuda/12.x`; adapt to your HPC environment if necessary.
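When launching runs from a script, the negative-projection advice above is easy to encode once so it is not forgotten. The sketch below builds the SEKA-related arguments and leaves out `--neg` when no negative projection is supplied. `seka_args` is a hypothetical helper, and dropping `--amplify_neg` together with `--neg` is an assumption about what the evaluation scripts accept.

```python
def seka_args(pos, amplify_pos, neg=None, amplify_neg=None, layers="last10"):
    """Build the SEKA command-line arguments, omitting --neg when it is not wanted."""
    args = ["--seka", "--pos", str(pos), "--amplify_pos", str(amplify_pos)]
    if neg is not None:
        args += ["--neg", str(neg)]
        # Assumption: --amplify_neg only matters when --neg is passed.
        if amplify_neg is not None:
            args += ["--amplify_neg", str(amplify_neg)]
    args += ["--layers", layers]
    return args


if __name__ == "__main__":
    base = ["python", "benchmarks/eval_bias_gen.py",
            "--model", "pretrained/Qwen3-4B-Base",
            "--data_path", "data/biasbios/biasbios.json"]
    extra = seka_args("projections/biasbios/Qwen3-4B-Base_pos_proj.pt", 1.0)
    print(" ".join(base + extra))
```

The resulting list can be passed to `subprocess.run` along with the remaining arguments from the commands in this README.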
That’s it! The commands above recreate the CounterFact, BiasBios, and PronChange numbers reported in the paper.