
Tokre

Tokre is a token-regex library created for poking around SAE latents. It's used in the backend for https://sparselatents.com/search.

Tokre can be used for:

  • Creating 'synthetic features' more quickly and easily than you could in plain Python. These are useful for searching SAE latents and for quickly validating or disproving a hypothesis about a particular SAE latent.
  • Optionally tuning a synthetic feature to try to predict a given SAE latent.

Induction ([A][B].*[A]) is literally written in tokre as

[a=.].*[a]

Taking the cosine similarity of this synthetic feature with the latent activations of a gemma-2-2b SAE identifies this latent:

[image]
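As a rough, self-contained sketch of that workflow (assuming get_mask returns a 0/1 array shaped like the token grid; the token strings and latent activations below are placeholder data, not from the repo):

import numpy as np
import tokre
from transformers import AutoTokenizer

tokre.setup(tokenizer=AutoTokenizer.from_pretrained('google/gemma-2-2b'))

# Induction pattern: bind any token to the name `a`, allow anything in between,
# then require the same token to appear again.
induction = tokre.SynthFeat(r'[a=.].*[a]')

tok_strs = np.array([
    [' The', ' cat', ' sat', '.', ' The', ' cat', ' slept', '.'],
])

synth_acts = np.asarray(induction.get_mask(tok_strs), dtype=float).ravel()

# Stand-in for the SAE latent's activations on the same tokens; in practice these
# come from running the model and SAE over the text.
latent_acts = np.random.rand(synth_acts.shape[0])

cos_sim = synth_acts @ latent_acts / (
    np.linalg.norm(synth_acts) * np.linalg.norm(latent_acts) + 1e-8
)
print(f'cosine similarity: {cos_sim:.3f}')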

Matching words that start with 'M' looks like this:

punctuation_token = [re `[.*(:|;|,|"|\?|\.|!).*]`]
starts_with_m = [re `( m| M|M).*`]
nospace_token = [re `[^\s].*`]
m_word_token = [starts_with_m][nospace_token](?<![punctuation_token])

[m_word_token]

Using this script, we find this latent:

[image]

A major-cities script:

( Paris | New York | London | Tokyo | Shanghai | Dubai | Singapore | Sydney | Mumbai | Istanbul | São Paulo | Moscow | Berlin | Toronto | Seoul )

finds this latent:

[image]

Tokre can be used to debug hypotheses and more easily find interesting exceptions:

[image]

Tokre can be used with any tiktoken or Hugging Face tokenizer.
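For example, with tiktoken (a sketch, assuming tokre.setup accepts a tiktoken encoding object the same way it accepts a Hugging Face tokenizer):

import tiktoken
import tokre

# tiktoken encoding for the GPT-2 vocabulary; any tiktoken encoding should work.
enc = tiktoken.get_encoding('gpt2')
tokre.setup(tokenizer=enc)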

Usage

pip install tokre
import numpy as np
import tokre
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('google/gemma-2-2b')

tokre.setup(tokenizer=tokenizer)

script = r'''
punctuation_token = [re `[.*(:|;|,|"|\?|\.|!).*]`]
starts_with_m = [re `( m| M|M).*`]
nospace_token = [re `[^\s].*`]
m_word_token = [starts_with_m]([nospace_token](?<![punctuation_token]))*
'''

# A 2D numpy array of token strings.
# A tensor of token ids would also work.
tok_strs = np.array([
   [' What', "'", 's', ' the', ' nicest', ' part', ' of', ' Massachusetts', '?'],
   ['Hello', ' World', '!', ' Here', "'", 's', ' an', ' example', '.']
])

synth = tokre.SynthFeat(script)
synth_acts = synth.get_mask(tok_strs)
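
If get_mask returns a 0/1 mask shaped like tok_strs (as the usage above suggests), you can pull out the tokens where the synthetic feature fires:

# Boolean mask over token positions (assumes synth_acts is 0/1 and shaped like tok_strs).
mask = np.asarray(synth_acts, dtype=bool)
print(tok_strs[mask])  # e.g. the tokens of words starting with 'M'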
