Copilot AI commented Nov 18, 2025

Analyzed the 57-file legacy codebase, identified the core β-VAE + Random Forest algorithm, and reimplemented it as a clean seti_ml package focused on Phase 1 (synthetic signal generation with setigen). The structure for Phase 2 (real SRT plates) is in place.

Implementation

New seti_ml package (2,900 LOC)

  • data/signal_generation.py - Setigen-based ETI signal injection with ABACAD cadence
  • data/preprocessing.py - Log normalization, 4096→512 bin downsampling pipeline
  • models/vae.py - β-VAE (6D latent) with TensorFlow 2.x
  • models/classifier.py - Random Forest on flattened 36D features
  • training/ - VAE and classifier training scripts with config support
  • inference/detector.py - End-to-end detection pipeline
  • tests/test_integration.py - Full pipeline validation (passing)
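The preprocessing step listed above (log normalization and 4096→512 bin downsampling) could look roughly like the following. This is a minimal sketch, not the code in data/preprocessing.py: the function name `preprocess`, the order of operations (downsample first, then log), the averaging of 8 adjacent bins, and the per-frame min-max scaling are all assumptions made for illustration.

```python
import numpy as np


def preprocess(frame, out_bins=512, eps=1e-12):
    """Downsample the frequency axis and log-normalize a (time, freq) spectrogram.

    Hypothetical helper: the PR does not show the real implementation.
    """
    frame = np.asarray(frame, dtype=np.float64)
    n_time, n_freq = frame.shape
    if n_freq % out_bins != 0:
        raise ValueError("n_freq must be a multiple of out_bins")
    factor = n_freq // out_bins  # 4096 // 512 == 8

    # Average each group of `factor` adjacent frequency bins (assumed method).
    down = frame.reshape(n_time, out_bins, factor).mean(axis=2)

    # Log scale; eps guards against log(0).
    logged = np.log(down + eps)

    # Min-max scale the whole frame to [0, 1] (assumed normalization).
    lo, hi = logged.min(), logged.max()
    if hi == lo:
        return np.zeros_like(logged)
    return (logged - lo) / (hi - lo)
```

Calling `preprocess` on a `(16, 4096)` array returns a `(16, 512)` array, which matches the `(16, 512, 1)` VAE input shape once a channel axis is added.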

Algorithm: Detect signals in ABACAD pattern (ON-OFF-ON-OFF-ON-OFF cadence)

  • Extract 6D latent features per observation via VAE encoder
  • Classify 36D concatenated features (6 obs × 6D) with RF
  • Threshold probability for detection
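The three steps above can be sketched with NumPy as follows. The helper names `cadence_features` and `threshold_detections` and the default threshold of 0.5 are hypothetical; the probabilities themselves would come from the trained Random Forest, for example `rf.predict_proba(features)[:, 1]` with a fitted scikit-learn `RandomForestClassifier`.

```python
import numpy as np

N_OBS = 6       # observations in one ABACAD cadence
LATENT_DIM = 6  # latent dimensions produced by the VAE encoder


def cadence_features(latents):
    """Flatten (n_cadences, 6, 6) latent vectors into (n_cadences, 36) features."""
    latents = np.asarray(latents)
    if latents.ndim != 3 or latents.shape[1:] != (N_OBS, LATENT_DIM):
        raise ValueError("expected shape (n_cadences, 6, 6)")
    # Row-major reshape keeps each observation's 6 values contiguous:
    # [obs0 latent, obs1 latent, ..., obs5 latent].
    return latents.reshape(latents.shape[0], N_OBS * LATENT_DIM)


def threshold_detections(probs, threshold=0.5):
    """Flag a cadence as a detection when its probability reaches the threshold."""
    return np.asarray(probs) >= threshold
```

Keeping the observations in cadence order inside the 36D vector matters, because the classifier has to compare the ON slots against the OFF slots.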

Bug Fixes

  • Drift rate bias: Changed random() → uniform() in signal generation (eliminated 2× negative slope bias per 2025 bug report)
  • API compatibility: Added ascending=False to Frame.from_data() for setigen 2.x
  • VAE decoder: Dynamic reshape calculation based on output shape instead of hardcoded (1, 16, 128)
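The PR does not show the original drift rate code, so the exact cause of the bias is not reproduced here. As an illustration of the property the fix is aiming for, the sketch below draws drift rates symmetrically around zero, so positive and negative slopes are equally likely. The function name `sample_drift_rates` and the `max_drift` value are hypothetical.

```python
import numpy as np


def sample_drift_rates(n, max_drift=4.0, seed=None):
    """Draw n drift rates (Hz/s) uniformly from [-max_drift, max_drift)."""
    rng = np.random.default_rng(seed)
    # A single symmetric uniform draw gives no preference to either sign.
    return rng.uniform(-max_drift, max_drift, size=n)
```

A quick sanity check on a generated dataset is to confirm that the fraction of negative drift rates is close to one half.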

Phase Structure

Phase 1 (Complete): Synthetic chi-squared noise + setigen signal injection
Phase 2 (Ready): preprocessing.py:create_background_plates(use_synthetic=False) placeholder for real SRT data loading
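For Phase 1, a synthetic chi-squared background can be generated directly with NumPy. This is only a sketch of the idea, not the body of `create_background_plates`: the function name `synthetic_background`, the degrees of freedom `df=4`, and the unit-mean scaling are assumptions. The default shape follows the 16 time rows and 4096 frequency bins mentioned elsewhere in this description.

```python
import numpy as np


def synthetic_background(n_time=16, n_freq=4096, df=4, mean_power=1.0, seed=None):
    """Return an (n_time, n_freq) plate of chi-squared distributed noise power."""
    rng = np.random.default_rng(seed)
    # A chi-squared variable with `df` degrees of freedom has mean `df`,
    # so dividing by `df` yields unit-mean noise before rescaling.
    noise = rng.chisquare(df, size=(n_time, n_freq)) / df
    return mean_power * noise
```

In Phase 2 a function like this would be swapped for a loader that returns real SRT plates of the same shape, leaving the signal injection step unchanged.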

Usage

from seti_ml.data.preprocessing import create_background_plates
from seti_ml.data.signal_generation import generate_dataset
from seti_ml.models.vae import build_vae
from seti_ml.inference.detector import SETIDetector

# Generate synthetic data
plates = create_background_plates(n_plates=1000)
signals = generate_dataset(plates, 1000, 'true_fast', snr_base=20.0)

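# Build the VAE and run detection.
# NOTE: `classifier` (the trained Random Forest) and `test_data` are not defined
# in this snippet; they must come from the training step and your own test set.
vae = build_vae(input_shape=(16, 512, 1), latent_dim=6)
detector = SETIDetector(vae_model=vae, classifier=classifier)
detections, probs, metrics = detector.detect(test_data)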

Documentation: 6 guides (1,329 lines) including Italian summary, technical analysis, and API docs

Cleanup: Removed ~40 duplicate/unused legacy files; preserved original code in existing directories for reference

Original prompt

Analyze this entire repository, find out what the entry point is and which files and functions are actually used and which are not, and help me rebuild the whole algorithm from scratch. I need to reimplement everything, keeping the same logic and foundation but possibly adding improvements, optimizing and structuring the code better, and making the most of existing, state-of-the-art library functions.

In the first phase we will use only data simulated with setigen. Later, once we have verified that the model works correctly on simulated data, we will use, as in this repository, plates, i.e. "backgrounds" taken directly from observations of the Sardinia Radio Telescope, made up of noise or RFI, into which we will inject ETI signals with setigen.



Copilot AI changed the title from "[WIP] Analyze repository and reimplement the algorithm with optimizations" to "Restructure ML_GBT_SETI with clean implementation for Phase 1 synthetic data" Nov 18, 2025
Copilot AI requested a review from filippozuddas November 18, 2025 15:36
