Releases: decoderesearch/circuit-tracer

v0.5.0

18 Apr 09:51
4bb8c0e

This release brings two new features:

Top-K Transcoders

  • Top-K transcoders are now supported (thanks to @zsquaredz for helping with this!). This means that these Llama-3 8B Instruct transcoders are now usable.
  • In the config.yaml, specify activation: topk to mark the transcoders as top-k transcoders, and set k to the desired value (e.g., k: 128). The weight files for these transcoders are in the same format as those of other (e.g., ReLU) transcoders.
  • load_relu_transcoder is now load_transcoder, and serves as a general function for loading per-layer transcoders. It behaves the same as the old load_relu_transcoder, except that you can now pass in the activation_fn you want used.
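Based on the field names above, a top-k transcoder's config.yaml would contain something like the following (a minimal sketch; any other fields in your config stay as they were):

```yaml
# Hypothetical config.yaml excerpt for a top-k transcoder.
# Only the two fields named in these release notes are shown.
activation: topk  # marks these as top-k transcoders
k: 128            # the value of k
```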

Local Features

  • Thanks to @s-ewbank, there is now a features_dir argument for serve and circuit-tracer start-server that allows you to specify a local directory where locally-computed features live! This is helpful if you've trained your own transcoders / computed your own features, and don't wish to upload them to Huggingface.
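For example, the server could be pointed at a local feature directory like so (a sketch: the exact flag spelling --features_dir is assumed from the argument name above, and the path is a placeholder):

```shell
# Serve the frontend using locally-computed feature files
# instead of fetching them from Huggingface.
circuit-tracer start-server --features_dir ~/my_transcoders/features
```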

v0.4.1

28 Feb 16:19
a2e9eb9

This release bumps the version of nnsight to v0.6.1! This should improve the performance of ReplacementModels using the nnsight backend. Read more about this nnsight release here.

v0.4.0

23 Feb 20:55
fad653f

New Feature: Attribution Targets

Previously, circuit-tracer only allowed attribution back from either the top-n tokens or those tokens representing top-p of probability mass. Now, you can attribute back from a wider set of quantities! These include:

  • Arbitrary tokens (as specified by either a list of token strings or tensor of token ids)
  • Arbitrary d-model-size vectors (such as the difference between two logits' unembedding vectors)
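The logit-difference example in the second bullet can be made concrete with a toy numpy sketch (this is illustrative only, not circuit-tracer's API; the matrix and token ids are made up):

```python
import numpy as np

# Toy sketch: build a d_model-size attribution target as the difference
# between two tokens' unembedding vectors (a "logit diff" direction).
rng = np.random.default_rng(0)
d_model, vocab = 8, 16
W_U = rng.standard_normal((d_model, vocab))  # toy unembedding matrix

tok_a, tok_b = 3, 7                          # hypothetical token ids
direction = W_U[:, tok_a] - W_U[:, tok_b]    # shape: (d_model,)

# Projecting a residual-stream vector onto this direction recovers exactly
# the difference of the two logits, so attributing back from it asks:
# what pushed logit(tok_a) above logit(tok_b)?
resid = rng.standard_normal(d_model)
logits = W_U.T @ resid
assert np.isclose(resid @ direction, logits[tok_a] - logits[tok_b])
```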

Want to know more? Check out demos/attribution_targets_demo.ipynb

Thanks a bunch to @speediedan for contributing this awesome feature!

Minor changes

  • For GemmaScope-2 Transcoders and Gemma-3-IT models, prompts must now start with <bos><start_of_turn>user\n; otherwise, an error will be thrown.
  • Circuit-tracer is now officially part of decode research! (thanks @hijohnnylin)
  • Circuit-tracer's version in the pyproject.toml has been updated to match the tag (thanks @hijohnnylin!)
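The required prompt prefix from the first bullet can be checked up front before attribution (a trivial sketch; the example prompt is made up):

```python
# With GemmaScope-2 transcoders on Gemma-3-IT, attribution prompts must
# begin with this exact prefix, or circuit-tracer will raise an error.
REQUIRED_PREFIX = "<bos><start_of_turn>user\n"

prompt = "<bos><start_of_turn>user\nWhat is the capital of France?"
assert prompt.startswith(REQUIRED_PREFIX)
```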

v0.3.1

14 Jan 08:19
e09b5f3

This release fixes two bugs:

  • Error nodes for skip transcoders were computed without accounting for the skip connection, resulting in inflated error nodes and outward edges. This would have affected Llama 3.2 1B graphs made with skip transcoders, as well as Gemma 3 graphs made with skip transcoders.
  • Error nodes for Gemma-3 instruct models were only being zeroed out at position 0, rather than at the first 4 positions (corresponding to the 4 static BOS-adjacent tokens that their transcoders were not trained on).

v0.3.0

08 Jan 16:33
9317b2a

New Features

NNsight Backend

This release introduces the NNsight backend! Now, you can create a ReplacementModel that is a subclass of NNsight's LanguageModel class, instead of being a HookedTransformer, like so:

from circuit_tracer import ReplacementModel
model = ReplacementModel.from_pretrained("google/gemma-2-2b", "gemma", backend='nnsight')

The nnsight backend behaves identically to the original (TransformerLens) backend, including all of the same features. This means that you can use circuit-tracer with any model, including those not yet ported to TransformerLens. However, note that you still need to have transcoders for your model in order to use circuit-tracer.

Using the nnsight backend with a totally new model does entail some extra work, to specify where relevant parts of the model are. In particular, you need to fill out a TransformerLens_NNSight_Mapping in utils/tl_nnsight_mapping.py, if one does not yet exist; see utils/MAPPING_INFO.md for more information on what this is, and how to fill it out.

By default, circuit-tracer installs both backends. But, if you want to use only one, you can also keep only TransformerLens or only NNsight installed; as long as you set the backend argument appropriately, there should be no dependence on the other backend.

Right now, the NNsight backend is still somewhat slower than the TransformerLens backend when it comes to performing interventions; however, performance improvements to both NNsight and circuit-tracer are coming soon to speed things up!

GemmaScope 2 Transcoders

Google DeepMind has released new transcoders as part of GemmaScope 2, and these are compatible with circuit-tracer! We provide HuggingFace repos containing configuration files that allow these to be used with circuit-tracer. These correspond to models in the transcoder_all and clt subfolders of the HuggingFace model repos; take the whole path to the desired transcoder and replace google/ with mwhanna/ to get the circuit-tracer-compatible repo.
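The repo-name substitution above amounts to a one-line string replacement (a sketch; the Google-side path is an assumed example, constructed to mirror the mwhanna path used below):

```python
# Map a GemmaScope 2 transcoder path on the google/ org to the
# circuit-tracer-compatible repo on mwhanna/.
google_path = "google/gemma-scope-2-1b-pt/transcoder_all/width_262k_l0_small_affine"
ct_repo = google_path.replace("google/", "mwhanna/", 1)
print(ct_repo)  # mwhanna/gemma-scope-2-1b-pt/transcoder_all/width_262k_l0_small_affine
```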

You can load these models into an NNsight-backend ReplacementModel as follows:

import torch
from circuit_tracer import ReplacementModel

model = ReplacementModel.from_pretrained(
    "google/gemma-3-1b-pt", 
    "mwhanna/gemma-scope-2-1b-pt/transcoder_all/width_262k_l0_small_affine", 
    dtype=torch.bfloat16, 
    backend='nnsight',
)

Currently, only some of these models support full circuit-tracer functionality: only 270m models and some 1b models allow for the visualization of graphs. This is due to a lack of feature files containing activation information that would allow the others to be visualized; such feature files will be added in the coming days.

Caching

In order to use the lazy_decoder and lazy_encoder options on transcoders, they must be stored in circuit-tracer-compatible format. So far, we've been (re-)uploading transcoders (including all of GemmaScope-2) in that format to HuggingFace; however, this is time- and space-inefficient. circuit-tracer now supports instead creating a local cache of models, by calling e.g.

from circuit_tracer.utils.caching import save_transcoders_to_cache

hf_ref = "mwhanna/gemma-scope-2-1b-pt/transcoder_all/width_262k_l0_small_affine"
cache_dir = '~/.cache/'
save_transcoders_to_cache(hf_ref, cache_dir=cache_dir)

You can also empty the cache using empty_cache. Since all current transcoders on mntss/ and mwhanna/ are in the correct format, this isn't yet necessary, but it may become necessary in the future to save HuggingFace repository space.

Breaking Changes

  • Removed zero_bos parameter: The zero_bos argument has been removed from setup_attribution, get_transcoder_activations, and related methods.
  • Demo utilities moved: demos/utils.py has been moved to circuit_tracer/utils/demo_utils.py. Update your imports accordingly.

Other

Improved testing

Test coverage has been improved over past releases. We now test interventions more thoroughly, ensuring the correct functioning of ReplacementModel.feature_intervention and its various keyword arguments; we also test ReplacementModel.feature_intervention_generate more thoroughly.

New tests have also been added to check that the TransformerLens and NNsight backends produce identical results.

v0.2.0

05 Aug 17:35
23a2c10

New Features

Cross-Layer Transcoders (CLT)

Introducing support for cross-layer transcoders, where features read from one layer and write to all subsequent layers. This enables shorter attribution paths by representing cross-layer dependencies as single features.

from circuit_tracer.transcoder.cross_layer_transcoder import load_clt
clt = load_clt("/path/to/clt", lazy_decoder=True)
model = ReplacementModel.from_config(cfg, clt)

Consolidated Transcoder Repository System

Transcoders and their associated feature files are now consolidated in single repositories, eliminating configuration file complexity. Feature examples are loaded directly from HuggingFace repositories in the frontend:

# Simply point to transcoder repository
model = ReplacementModel.from_pretrained(
    "google/gemma-2-2b", 
    "mntss/gemma-scope-transcoders"
)

Transcoder Lazy Loading

Memory-efficient lazy loading ensures only actively used weights are kept in memory:

  • Lazy decoder: Loads only the decoder rows that are actually used (recommended for CLTs)
  • Lazy encoder: Loads encoder weights only when accessed (use when memory constrained)

from circuit_tracer.utils.hf_utils import load_transcoder_from_hub
transcoder, config = load_transcoder_from_hub("mntss/gemma-scope-transcoders")

Text Generation with Feature Interventions

Generate text while steering model behavior through feature interventions. This is now much easier to set up and use:

generation, logits, activations = model.feature_intervention_generate(
    prompt="The capital of France is",
    interventions=[(layer, slice(1, None), feature_idx, value)],
    max_new_tokens=50
)

Additional Changes

Updated Tokenization

Attribution now automatically enforces special token prepending. The prepended token is ignored in attribution to avoid position-0 artifacts.

Breaking Changes

  • Removed YAML-based configuration system
  • ReplacementModel now accepts either TranscoderSet or CrossLayerTranscoder instances
  • Some internal import paths have changed due to module reorganization

New Transcoder Releases

We're excited to also release new transcoders: