AISteer360


The AI Steerability 360 toolkit is an extensible library for general-purpose steering of LLMs. The toolkit supports the implementation of steering methods across a range of model control surfaces (input, structural, state, and output), the composition of steering methods (into a SteeringPipeline), and the comparison of steering methods (and pipelines) on custom tasks/metrics.

To get started, please see the documentation.

Installation

The toolkit uses uv as the package manager (Python 3.11+). After installing uv, install the toolkit by running:

uv venv --python 3.11 && uv pip install .

Activate the environment by running source .venv/bin/activate. On Windows, run the two chained commands separately (uv venv --python 3.11, then uv pip install .) and activate with .venv\Scripts\activate.

Inference is facilitated by Hugging Face. Before steering, create a .env file in the root directory containing your Hugging Face API token in the following format:

HUGGINGFACE_TOKEN=hf_***
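If you need to read the token yourself (for example, in a standalone script), a minimal stdlib-only .env parser is sketched below. The toolkit reads the file on its own, so this is purely illustrative, and load_env is a hypothetical helper, not part of the toolkit's API.

```python
import os

def load_env(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines from a .env file and export them
    into the process environment. Comments and blank lines are skipped."""
    values = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
    os.environ.update(values)
    return values
```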

Some Hugging Face models (e.g. meta-llama/Meta-Llama-3.1-8B-Instruct) are behind an access gate. Check that you have access via the model's Hub page with the same account whose token you pass to the toolkit.

Note

AISteer360 runs the model inside your process. For efficient inference, run the toolkit on a machine with enough GPU memory for both the base checkpoint and the additional overhead introduced by your steering method/pipeline.
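As a back-of-envelope sanity check (a rule of thumb, not a toolkit utility), the resident weight memory is roughly the parameter count times bytes per parameter, plus a method-dependent overhead. The function and its overhead_frac default below are assumptions for illustration only.

```python
def estimate_gpu_memory_gb(num_params: float,
                           bytes_per_param: int = 2,
                           overhead_frac: float = 0.2) -> float:
    """Rough GPU memory estimate: model weights plus a fractional overhead
    for activations, KV cache, and steering-method state. `overhead_frac`
    is a guess; real overhead varies by method, batch size, and sequence length."""
    weights_gb = num_params * bytes_per_param / 1024**3
    return weights_gb * (1 + overhead_frac)
```

For an 8B-parameter model in fp16 (2 bytes per parameter), this gives roughly 18 GB before generation buffers, which is why a single 16 GB card is typically not enough.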

Featured applications

The ability to benchmark and compare steering methods on realistic use cases is one of the main features of the toolkit. The featured examples below illustrate this functionality.

Steering for instruction following
A model's instruction following ability is an important measure of its general usability. This notebook studies the effect of post-hoc attention steering (PASTA) on a model's ability to follow instructions. We sweep over the steering strength and investigate the trade-off between a model's instruction following ability and general response quality.

Open In Colab
Steering for commonsense reasoning
Multiple choice question answering is a common format for evaluating a model's reasoning ability. This notebook benchmarks steering methods on the CommonsenseQA dataset, comparing few-shot prompting against a LoRA adapter trained with DPO. We sweep over the number of few-shot examples and study how accuracy scales relative to the fine-tuned baseline across two models.

Open In Colab
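The few-shot sweep above hinges on how the k worked examples are assembled into a prompt. A generic k-shot prompt builder might look like the sketch below; this is illustrative only, not the toolkit's implementation, and the Q:/A: template is an assumption.

```python
def build_few_shot_prompt(examples: list, question: str, k: int) -> str:
    """Prepend k worked Q/A examples to the target question.
    `examples` is a list of (question, answer) pairs; k=0 yields zero-shot."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples[:k])
    prefix = f"{shots}\n\n" if shots else ""
    return f"{prefix}Q: {question}\nA:"
```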
Composite steering for truthfulness
One of the primary features of the toolkit is the ability to compose multiple steering methods into one model operation. This notebook composes a state control (PASTA) with an output control (DeAL) with the goal of improving the model's truthfulness (as measured on TruthfulQA) without significantly degrading informativeness. We sweep over the joint parameter space of the controls and compare each control's individual performance (via the trade-off between truthfulness and informativeness) to that of the composition.

Open In Colab
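Conceptually, composition treats each control as a transformation of the model's generation function, applied in sequence. The sketch below uses made-up names (Control, compose_pipeline, and the toy controls) to illustrate the idea; it is not the toolkit's actual SteeringPipeline API.

```python
from typing import Callable, List

# A "control" wraps a text-generation function and returns a steered version.
Control = Callable[[Callable[[str], str]], Callable[[str], str]]

def compose_pipeline(generate: Callable[[str], str],
                     controls: List[Control]) -> Callable[[str], str]:
    """Apply each control to the generation function in order; the first
    control in the list becomes the innermost wrapper."""
    for control in controls:
        generate = control(generate)
    return generate

# Toy controls: one rewrites the prompt (input-side), one edits the output.
uppercase_prompt: Control = lambda gen: (lambda prompt: gen(prompt.upper()))
add_suffix: Control = lambda gen: (lambda prompt: gen(prompt) + " [steered]")

steered = compose_pipeline(lambda p: f"echo:{p}", [uppercase_prompt, add_suffix])
```

Calling steered("hi") first upper-cases the prompt, then appends the suffix to the output, showing how input-side and output-side controls stack without knowing about each other.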

Control library

Demonstrations of each of the implemented steering methods in the toolkit are provided in the examples/notebooks/control_* folders; links to Colab notebooks are provided below.

| Method | Authors | Notebook |
| --- | --- | --- |
| Activation Addition (ActAdd) | Turner et al., 2023 | Open In Colab |
| Contrastive Activation Addition (CAA) | Panickssery et al., 2023 | Open In Colab |
| Conditional Activation Steering (CAST) | Lee et al., 2024 | Open In Colab |
| Decoding-time Alignment (DeAL) | Huang et al., 2024 | Open In Colab |
| Few-shot Learning | Brown et al., 2020 | Open In Colab |
| Inference-Time Intervention (ITI) | Li et al., 2023 | Open In Colab |
| Post-hoc Attention Steering (PASTA) | Zhang et al., 2023 | Open In Colab |
| Reward-Augmented Decoding (RAD) | Deng & Raffel, 2023 | Open In Colab |
| Self-Disciplined Autoregressive Sampling (SASA) | Ko et al., 2025 | Open In Colab |
| Thinking Intervention | Wu et al., 2025 | Open In Colab |

The toolkit also provides wrappers for the following external libraries.

| Wrapper | Authors | Notebook |
| --- | --- | --- |
| MergeKit | Goddard et al., 2024 | Open In Colab |
| TRL | von Werra et al., 2020 | Open In Colab |

Contributing

We invite community contributions, primarily on broadening the set of steering methods (via new controls) and evaluations (via new use cases and metrics). We also welcome bug/issue reports, documentation improvements, and new features. Specifics on how to contribute can be found in our contribution guidelines. To make contributing easier, we have prepared the following tutorials.

Adding a new steering method

If an existing steering method is not yet in the toolkit but you feel it should be, or you have developed a new steering method of your own, the toolkit is designed to make contributing new methods relatively easy. Please see the tutorial on adding your own steering method for a detailed guide.

Adding a new use case / benchmark

Use cases enable comparison of different steering methods on a common task. The base UseCase class (aisteer360/evaluation/use_cases/) and the Benchmark class (aisteer360/evaluation/benchmark.py) enable this comparison. If you'd like to compare various steering methods/pipelines on a novel use case, please see the tutorial on adding your own use case.

Adding a new metric

Metrics are used by a given benchmark to quantify model performance across steering pipelines in a comparable way. We've included a selection of generic metrics (see aisteer360/evaluation/metrics/). If you'd like to add new generic metrics or custom metrics (for a new use case), please see the tutorial on adding your own metric.
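As a point of reference, a generic metric reduces to a callable mapping predictions and references to a score. The standalone exact-match sketch below only illustrates the idea; the toolkit's actual base class lives under aisteer360/evaluation/metrics/ and its interface may differ.

```python
def exact_match(predictions: list, references: list) -> float:
    """Fraction of predictions that match their reference exactly
    (after stripping surrounding whitespace). Returns 0.0 on empty input."""
    if not predictions:
        return 0.0
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return matches / len(predictions)
```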

IBM ❤️ Open Source AI

The AI Steerability 360 toolkit has been brought to you by IBM.
