This repository contains reproducible experiments comparing Regular Transformers (RT) and Hierarchical Transformers (HT) on Context-Free Grammar (CFG) tasks and protein structure prediction.
```bash
# Clone and setup
git clone https://github.com/aboitreaud/hierarchical-transformers-cfg.git
# Or clone via SSH
# git clone [email protected]:aboitreaud/hierarchical-transformers-cfg.git
cd hierarchical-transformers-cfg
# Install dependencies
pip install -r requirements.txt
# Run all experiments (this will take several hours)
./scripts/reproduce_all.sh
```

```bash
# HT-2L vs RT-4L comparison
python experiments/scripts/run_ht_2l_vs_rt_4l.py
# HT-4L vs RT-8L parameter-matched comparison
python experiments/scripts/run_ht_4l_vs_rt_8l.py
# Depth sweep analysis (CFG-3L to CFG-11L)
python experiments/scripts/run_depth_sweep.py
# Binary HT vs RT on deep CFG-9L
python experiments/scripts/run_ht_binary_vs_rt_cfg9l.py
# Protein structure prediction (requires dataset)
python experiments/scripts/run_protein_ht_vs_rt.py
```

The `scripts/reproduce_all.sh` script runs the following experiments:
This experiment compares RT and HT performance on a 7-level CFG. We use a cosine learning-rate schedule from 6e-4 to 6e-5, training for 100 epochs with 5k sentences per epoch. The models compared are a 6-layer RT and a 4-transformer HT.
Script: run_ht_2l_vs_rt_4l.py
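For intuition, a cosine schedule between those endpoints takes only a few lines. This is a generic sketch (not the repository's exact scheduler), using the max_lr/min_lr values quoted above:

```python
import math

def cosine_lr(step: int, total_steps: int,
              max_lr: float = 6e-4, min_lr: float = 6e-5) -> float:
    """Cosine-anneal the learning rate from max_lr down to min_lr."""
    progress = min(step / total_steps, 1.0)  # fraction of training completed
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With 100 epochs of 5k sentences each, `total_steps` would be `100 * 5000 / batch_size`.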
This is a parameter-matched comparison of hierarchy vs. depth. We compare a 4-layer HT against an 8-layer RT with similar parameter counts to see which architecture is more parameter-efficient.
Script: run_ht_4l_vs_rt_8l.py
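Parameter matching can be checked directly. A minimal sketch, assuming the models are PyTorch `nn.Module`s (`ht` and `rt` are placeholder names):

```python
import torch.nn as nn

def count_params(model: nn.Module) -> float:
    """Trainable parameter count in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# e.g.: print(f"HT-4L: {count_params(ht):.2f}M  RT-8L: {count_params(rt):.2f}M")
```

The same count feeds the parameter-efficiency metric (accuracy per million parameters) reported below.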
This experiment studies transformer accuracy as a function of CFG depth. We test CFG depths from 3L through 11L and analyze both transformer performance and spectral-clustering heuristics for reproduction.
Script: run_depth_sweep.py
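Schematically, the sweep is a loop over grammar depths. In the sketch below, `build_cfg` and `train_and_eval` are hypothetical stand-ins for the repository's actual helpers, shown only to fix the shape of the experiment:

```python
# Hypothetical sketch: one training run per CFG depth.
results = {}
for depth in range(3, 12):                    # CFG-3L through CFG-11L
    grammar = build_cfg(L=depth)              # hypothetical: construct a depth-L grammar
    results[depth] = train_and_eval(grammar)  # hypothetical: returns final accuracy
```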
This experiment compares a binary-hierarchy HT against a flat RT on a deep grammar. We use a 2-level HT vs. a standard RT on a 9-level CFG to test performance on very deep hierarchical structures.
Script: run_ht_binary_vs_rt_cfg9l.py
This experiment compares protein structure prediction using a hierarchical (atom + residue) transformer vs. a regular transformer. We use a Kabsch-aligned MSE loss for 3D coordinate prediction.
Script: run_protein_ht_vs_rt.py
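Since the loss is the distinctive piece here, below is a minimal PyTorch sketch of a Kabsch-aligned MSE: the standard Kabsch algorithm followed by a plain MSE. It illustrates the idea and is not necessarily the repository's exact implementation:

```python
import torch

def kabsch_mse(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """MSE between two (N, 3) coordinate sets after optimal rigid alignment."""
    # Remove translation by centering both point clouds.
    p = pred - pred.mean(dim=0, keepdim=True)
    q = target - target.mean(dim=0, keepdim=True)
    # Kabsch: optimal rotation from the SVD of the 3x3 covariance matrix.
    u, _, vt = torch.linalg.svd(p.T @ q)
    v = vt.T.clone()
    # If the best orthogonal map is a reflection, flip one axis so det(R) = +1.
    if torch.linalg.det(v @ u.T) < 0:
        v[:, -1] = -v[:, -1]
    rot = v @ u.T  # rotation taking pred onto target
    return ((p @ rot.T - q) ** 2).mean()
```

Centering removes translation and the SVD-derived rotation removes orientation, so the loss scores only the internal geometry of the predicted structure.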
```
hierarchical-transformers-cfg/
├── src/ # Source code
│ ├── models/ # Model implementations
│ │ ├── cfg.py # Context-Free Grammar implementation
│ │ ├── transformers.py # RT and HT models
│ │ └── protein_models.py # Protein-specific models
│ └── utils/ # Training utilities
│ └── training.py # Training loops and helpers
├── experiments/ # Experiment configurations and scripts
│ ├── configs/ # YAML configuration files
│ ├── scripts/ # Individual experiment scripts
│ └── results/ # Experiment outputs (created during runs)
├── scripts/ # Main execution scripts
│ ├── reproduce_all.sh # Master script to run all experiments
│ └── setup_environment.sh # Environment setup script
├── requirements.txt # Python dependencies
└── README.md
```
- Python 3.8 or later
- CUDA-capable GPU (recommended for faster training); PyTorch 2.0.0 or later
- At least 16GB RAM for larger experiments
```bash
# Create virtual environment (recommended)
python -m venv cfg-env
source cfg-env/bin/activate # On Windows: cfg-env\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Or use the setup script (recommended)
./scripts/setup_environment.sh
```
If you want to track experiments with Weights & Biases:
```bash
wandb login
```

This is a standard GPT-style transformer with token embeddings and positional embeddings. It uses multi-head self-attention with causal masking and feed-forward networks with GELU activation.
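For orientation, the core of such a model is a stack of blocks like the one below. This is a generic pre-norm sketch in PyTorch; the layer sizes and norm placement are illustrative assumptions, not taken from the repository:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One GPT-style block: causal multi-head self-attention + GELU feed-forward."""
    def __init__(self, d_model: int, n_head: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t = x.size(1)
        # Causal mask: position i may only attend to positions <= i.
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        x = x + self.attn(h, h, h, attn_mask=mask, need_weights=False)[0]
        return x + self.ff(self.ln2(x))
```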
This model has multiple transformer levels that correspond to the CFG hierarchy. It uses parent embedders for level transitions and dimension reducers for multi-level fusion; a rough sketch of the idea follows. The architecture is specialized for hierarchical sequence modeling.
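As a loose illustration of a level transition (the `ParentEmbedder` name, pooling-by-concatenation, and fixed span are assumptions made for this sketch, not the repository's actual classes):

```python
import torch
import torch.nn as nn

class ParentEmbedder(nn.Module):
    """Illustrative level transition: pool each group of `span` child tokens
    into one parent token for the next transformer level."""
    def __init__(self, d_model: int, span: int):
        super().__init__()
        self.span = span
        self.proj = nn.Linear(d_model * span, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # (B, T, D) -> (B, T // span, D)
        b, t, d = x.shape  # assumes t is divisible by span
        groups = x.view(b, t // self.span, self.span * d)  # concat children per parent
        return self.proj(groups)

# Multi-level fusion could then concatenate child- and parent-level features and
# map them back down with a linear "dimension reducer", e.g. (hypothetical names):
# fused = dim_reducer(torch.cat([child_feats, parent_feats_upsampled], dim=-1))
```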
- Protein HT: Uses separate transformers for atoms and residues
- Protein RT: Uses single transformer on atom sequences only
- Kabsch-MSE Loss: Rotation and translation invariant loss function
Each experiment uses YAML configuration files in experiments/configs/:
```yaml
# Example: ht_2l_vs_rt_4l.yaml
experiment_name: "ht_2l_vs_rt_4l"
cfg:
  L: 7                           # CFG depth
  ns: [1, 3, 3, 3, 5, 5, 9, 10]  # Symbols per level
  nr: [2, 2, 2, 2, 2, 2, 2]      # Rules per symbol
  T: [2, 2, 2, 2, 2, 4, 4]       # Rule lengths
models:
  rt:
    type: "GPT"
    n_layer: 6
    n_head: 6
    head_size: 64
training:
  num_epochs: 100
  max_lr: 6e-4
  min_lr: 6e-5
```

- Accuracy: Percentage of grammatically correct generated sentences
- Per-level Accuracy: Correctness at each CFG level
- Parameter Efficiency: Accuracy per million parameters
- Kabsch-MSE: Rotation-invariant coordinate prediction error
- Training/Validation Loss: Learning curves over epochs
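To make the `cfg` block in the example config concrete, here is a small sampler for this style of synthetic grammar, assuming the usual construction in which each level-i symbol expands via one of nr[i] randomly drawn rules into T[i] symbols of level i+1. This is an illustrative sketch, not the repository's `cfg.py`:

```python
import random

def make_grammar(ns, nr, T, seed=0):
    """For each level i and symbol s, draw nr[i] rules, each a tuple of T[i]
    symbols from the ns[i + 1] symbols of the next level."""
    rng = random.Random(seed)
    return [
        {s: [tuple(rng.randrange(ns[i + 1]) for _ in range(T[i]))
             for _ in range(nr[i])]
         for s in range(ns[i])}
        for i in range(len(T))
    ]

def sample_sentence(rules, rng=random):
    """Expand the root (level 0) down to the terminal level."""
    seq = [0]  # ns[0] == 1: a single root symbol
    for level_rules in rules:
        seq = [t for s in seq for t in rng.choice(level_rules[s])]
    return seq
```

Under this construction, the example values give sentences of length prod(T) = 2^5 * 4^2 = 512 terminal symbols.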
CUDA Out of Memory

```yaml
# Reduce batch size in config files
batch_size: 50  # Instead of 100
```

Missing Protein Dataset

```bash
# The protein experiment requires protein_dataset.pkl
# Create it using the data preparation script:
python scripts/prepare_protein_dataset.py
# Skip with: python run_protein_ht_vs_rt.py --skip-if-missing
```

Slow Training

```yaml
# Reduce epochs for testing
num_epochs: 10  # Instead of 100
```

This project is licensed under the MIT License - see the LICENSE file for details.