Releases: decoderesearch/circuit-tracer

v0.5.0

18 Apr 09:51
4bb8c0e

This release brings two new features:

Top-K Transcoders

  • Top-K transcoders are now supported (thanks to @zsquaredz for helping with this!). This means that these Llama-3 8B Instruct transcoders are now usable.
  • In the config.yaml, specify activation: topk to mark the transcoders as top-k transcoders, and set k to the desired value (e.g., k: 128). The weight files for these transcoders are in the same format as those of other (e.g., ReLU) transcoders.
  • load_relu_transcoder is now load_transcoder, and serves as a general function for loading per-layer transcoders. It behaves the same as the old load_relu_transcoder, except that you can now pass in the activation_fn you want used.
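Based on the field names above, a top-k transcoder's config.yaml would contain something like the following (a minimal sketch; any other fields in your config stay as they were):

```yaml
# Hypothetical config.yaml excerpt for a top-k transcoder.
# Only the two fields named in these release notes are shown.
activation: topk  # marks these as top-k transcoders
k: 128            # the value of k
```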

Local Features

  • Thanks to @s-ewbank, there is now a features_dir argument for serve and circuit-tracer start-server that allows you to specify a local directory where locally-computed features live! This is helpful if you've trained your own transcoders / computed your own features, and don't wish to upload them to Huggingface.
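For example, the server could be pointed at a local feature directory like so (a sketch: the exact flag spelling --features_dir is assumed from the argument name above, and the path is a placeholder):

```shell
# Serve the frontend using locally-computed feature files
# instead of fetching them from Huggingface.
circuit-tracer start-server --features_dir ~/my_transcoders/features
```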

v0.4.1

28 Feb 16:19
a2e9eb9

This release bumps the version of nnsight to v0.6.1! This should improve the performance of ReplacementModels using the nnsight backend. Read more about this nnsight release here.

v0.4.0

23 Feb 20:55
fad653f

New Feature: Attribution Targets

Previously, circuit-tracer only allowed attribution back from either the top-n tokens or those tokens representing top-p of probability mass. Now, you can attribute back from a wider set of quantities! These include:

  • Arbitrary tokens (as specified by either a list of token strings or tensor of token ids)
  • Arbitrary d-model-size vectors (such as the difference between two logits' unembedding vectors)
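The logit-difference example in the second bullet can be made concrete with a toy numpy sketch (this is illustrative only, not circuit-tracer's API; the matrix and token ids are made up):

```python
import numpy as np

# Toy sketch: build a d_model-size attribution target as the difference
# between two tokens' unembedding vectors (a "logit diff" direction).
rng = np.random.default_rng(0)
d_model, vocab = 8, 16
W_U = rng.standard_normal((d_model, vocab))  # toy unembedding matrix

tok_a, tok_b = 3, 7                          # hypothetical token ids
direction = W_U[:, tok_a] - W_U[:, tok_b]    # shape: (d_model,)

# Projecting a residual-stream vector onto this direction recovers exactly
# the difference of the two logits, so attributing back from it asks:
# what pushed logit(tok_a) above logit(tok_b)?
resid = rng.standard_normal(d_model)
logits = W_U.T @ resid
assert np.isclose(resid @ direction, logits[tok_a] - logits[tok_b])
```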

Want to know more? Check out demos/attribution_targets_demo.ipynb

Thanks a bunch to @speediedan for contributing this awesome feature!

Minor changes

  • For GemmaScope-2 Transcoders and Gemma-3-IT models, prompts must now start with <bos><start_of_turn>user\n; otherwise, an error will be thrown.
  • Circuit-tracer is now officially part of decode research! (thanks @hijohnnylin)
  • Circuit-tracer's version in the pyproject.toml has been updated to match the tag (thanks @hijohnnylin!)
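The required prompt prefix from the first bullet can be checked up front before attribution (a trivial sketch; the example prompt is made up):

```python
# With GemmaScope-2 transcoders on Gemma-3-IT, attribution prompts must
# begin with this exact prefix, or circuit-tracer will raise an error.
REQUIRED_PREFIX = "<bos><start_of_turn>user\n"

prompt = "<bos><start_of_turn>user\nWhat is the capital of France?"
assert prompt.startswith(REQUIRED_PREFIX)
```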

v0.3.1

14 Jan 08:19
e09b5f3

This release fixes two bugs:

  • Error nodes for skip transcoders were computed without accounting for the skip connection, resulting in inflated error nodes and outward edges. This would have affected Llama 3.2 1B graphs made with skip transcoders, as well as Gemma 3 graphs made with skip transcoders.
  • Error nodes for Gemma-3 instruct models were only being zeroed out at position 0, rather than at the first 4 positions (corresponding to the 4 static BOS-adjacent tokens that their transcoders were not trained on).

v0.3.0

08 Jan 16:33
9317b2a

New Features

NNsight Backend

This release introduces the NNsight backend! Now, you can create a ReplacementModel that is a subclass of NNsight's LanguageModel class, instead of being a HookedTransformer, like so:

from circuit_tracer import ReplacementModel
model = ReplacementModel.from_pretrained("google/gemma-2-2b", "gemma", backend='nnsight')

The nnsight backend behaves identically to the original (TransformerLens) backend, including all of the same features. This means that you can use circuit-tracer with any model, including those not yet ported to TransformerLens. However, note that you still need to have transcoders for your model in order to use circuit-tracer.

Using the nnsight backend with a totally new model does entail some extra work, to specify where relevant parts of the model are. In particular, you need to fill out a TransformerLens_NNSight_Mapping in utils/tl_nnsight_mapping.py, if one does not yet exist; see utils/MAPPING_INFO.md for more information on what this is, and how to fill it out.

By default, circuit-tracer installs both backends. But, if you want to use only one, you can also keep only TransformerLens or only NNsight installed; as long as you set the backend argument appropriately, there should be no dependence on the other backend.

Right now, the NNsight backend is still somewhat slower than the TransformerLens backend when it comes to performing interventions; however, performance improvements to both NNsight and circuit-tracer are coming soon to speed things up!

GemmaScope 2 Transcoders

Google DeepMind has released new transcoders as part of GemmaScope 2, and these are compatible with circuit-tracer! We provide HuggingFace repos containing configuration files that allow these to be used with circuit-tracer. These correspond to models in the transcoder_all and clt subfolders of the HuggingFace model repos; take the whole path to the desired transcoder and replace google/ with mwhanna/ to get the circuit-tracer-compatible repo.
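The repo-name substitution above amounts to a one-line string replacement (a sketch; the Google-side path is an assumed example, constructed to mirror the mwhanna path used below):

```python
# Map a GemmaScope 2 transcoder path on the google/ org to the
# circuit-tracer-compatible repo on mwhanna/.
google_path = "google/gemma-scope-2-1b-pt/transcoder_all/width_262k_l0_small_affine"
ct_repo = google_path.replace("google/", "mwhanna/", 1)
print(ct_repo)  # mwhanna/gemma-scope-2-1b-pt/transcoder_all/width_262k_l0_small_affine
```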

You can load these models into an NNsight-backend ReplacementModel as follows:

import torch
from circuit_tracer import ReplacementModel

model = ReplacementModel.from_pretrained(
    "google/gemma-3-1b-pt", 
    "mwhanna/gemma-scope-2-1b-pt/transcoder_all/width_262k_l0_small_affine", 
    dtype=torch.bfloat16, 
    backend='nnsight',
)

Currently, only some of these models support full circuit-tracer functionality: only 270m models and some 1b models allow for the visualization of graphs. This is due to a lack of feature files containing activation information that would allow the others to be visualized; such feature files will be added in the coming days.

Caching

In order to use the lazy_decoder and lazy_encoder options on transcoders, they must be stored in circuit-tracer-compatible format. So far, we've been (re-)uploading transcoders (including all of GemmaScope-2) in that format to HuggingFace; however, this is time- and space-inefficient. circuit-tracer now supports instead creating a local cache of models, by calling e.g.

from circuit_tracer.utils.caching import save_transcoders_to_cache

hf_ref = "mwhanna/gemma-scope-2-1b-pt/transcoder_all/width_262k_l0_small_affine"
cache_dir = '~/.cache/'
save_transcoders_to_cache(hf_ref, cache_dir=cache_dir)

You can also empty the cache using empty_cache. Since all current transcoders on mntss/ and mwhanna/ are in the correct format, this isn't yet necessary, but it may become necessary in the future to save HuggingFace repository space.

Breaking Changes

  • Removed zero_bos parameter: The zero_bos argument has been removed from setup_attribution, get_transcoder_activations, and related methods.
  • Demo utilities moved: demos/utils.py has been moved to circuit_tracer/utils/demo_utils.py. Update your imports accordingly.

Other

Improved testing

Test coverage has been improved over past releases. We now test interventions more thoroughly, ensuring the correct functioning of ReplacementModel.feature_intervention and its various keyword arguments; we also test ReplacementModel.feature_intervention_generate more thoroughly.

New tests have also been added to check that the TransformerLens and NNsight backends produce identical results.

v0.2.0

05 Aug 17:35
23a2c10

New Features

Cross-Layer Transcoders (CLT)

Introducing support for cross-layer transcoders, where features read from one layer and write to all subsequent layers. This enables shorter attribution paths by representing cross-layer dependencies as single features.

from circuit_tracer.transcoder.cross_layer_transcoder import load_clt
clt = load_clt("/path/to/clt", lazy_decoder=True)
model = ReplacementModel.from_config(cfg, clt)

Consolidated Transcoder Repository System

Transcoders and their associated feature files are now consolidated in single repositories, eliminating configuration file complexity. Feature examples are loaded directly from HuggingFace repositories in the frontend:

# Simply point to transcoder repository
model = ReplacementModel.from_pretrained(
    "google/gemma-2-2b", 
    "mntss/gemma-scope-transcoders"
)

Transcoder Lazy Loading

Memory-efficient lazy loading ensures only actively used weights are kept in memory:

  • Lazy decoder: Loads only the decoder rows that are actually used (recommended for CLTs)
  • Lazy encoder: Loads encoder weights only when accessed (use when memory constrained)

from circuit_tracer.utils.hf_utils import load_transcoder_from_hub
transcoder, config = load_transcoder_from_hub("mntss/gemma-scope-transcoders")

Text Generation with Feature Interventions

Generate text while steering model behavior through feature interventions. This is now much easier to set up and use:

generation, logits, activations = model.feature_intervention_generate(
    prompt="The capital of France is",
    interventions=[(layer, slice(1, None), feature_idx, value)],
    max_new_tokens=50
)

Additional Changes

Updated Tokenization

Attribution now automatically enforces special token prepending. The prepended token is ignored in attribution to avoid position-0 artifacts.

Breaking Changes

  • Removed YAML-based configuration system
  • ReplacementModel now accepts either TranscoderSet or CrossLayerTranscoder instances
  • Some internal import paths have changed due to module reorganization

New Transcoder Releases

We're excited to also release new transcoders: