EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs


📄 Paper | 📦 Model | 🚀 HF Space | 🌐 Web Demo | 📊 EchoX-Dialogues | 📊 EchoX-Dialogues-Plus


Key Features

  • Mitigates Acoustic-Semantic Gap in Speech-to-Speech LLMs
  • Introduces Echo Training with a Novel Three-Stage Pipeline (S2T, T2C, Echo)
  • Trained on Only 6k Hours of Curated Data, Ensuring Efficiency
  • Achieves State-of-the-Art Performance in Knowledge-Based QA Benchmarks
  • Preserves Reasoning and Knowledge Abilities for Interactive Speech Tasks

Performance


EchoX achieves state-of-the-art results on knowledge-based question-answering benchmarks while training on only about 6k hours of curated data, setting a new bar for data efficiency among speech-to-speech language models.

Datasets and Models

Dataset

EchoX is trained on curated datasets for each stage of the pipeline, covering ASR, TTS, and spoken question answering (SQA). The datasets used are as follows:

Task    Data                 Samples     Duration (h)   Stage    Download
ASR     LibriSpeech          281,241     960            I        -
ASR     MLS                  723,636     3,000          I        -
TTS     AudioQA-1M           178,576     989            II       -
TTS     SpeechInstruct       31,563      84             II       -
TTS     HH-RLHF-Speech       124,945     656            II       -
SQA     sharechatx           43,223      178            I, III   Link
SQA     Magpie-Pro-Speech+   117,000     327            I, III   Link
Total                        1,500,184   6,194

Model

The following pre-trained models are available for download:

Model      Parameters   Training Data   Download
EchoX-3B   3 billion    6k hours        EchoX-3B Model
EchoX-8B   8 billion    6k hours        EchoX-8B Model

Quickstart

Environment Setup

To set up your environment, follow these steps:

git clone https://github.com/FreedomIntelligence/EchoX.git
cd EchoX
conda create -n echox python=3.10 pip=24.0
conda activate echox
pip install -r requirements.txt
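
Optionally, verify the environment before downloading weights. A minimal check, assuming requirements.txt pulls in PyTorch (the inference code relies on it) and that a CUDA device will be used:

# Minimal environment check; assumes requirements.txt installs PyTorch.
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())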

Model Download

Download the models to this repository directory using the following commands:

pip install -U huggingface_hub
huggingface-cli download --resume-download FreedomIntelligence/EchoX-8B --local-dir EchoX-8B
huggingface-cli download --resume-download openai/whisper-large-v3 --local-dir whisper-large-v3

Note: If the models are downloaded to a different location, or the 3B version is used, update the model directory paths in inference/echox_stream.py and {your_EchoX_weight_directory}/config.json accordingly.
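
If you prefer to script the download, the same weights can be fetched with the huggingface_hub Python API; a minimal sketch equivalent to the CLI commands above (downloads resume automatically):

# Fetch EchoX-8B and the Whisper encoder into the repository directory,
# mirroring the huggingface-cli commands above.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="FreedomIntelligence/EchoX-8B", local_dir="EchoX-8B")
snapshot_download(repo_id="openai/whisper-large-v3", local_dir="whisper-large-v3")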

Inference

Run inference on a test case:

python demo.py

Alternatively, start the Gradio web interface:

python app.py

To use a specific GPU:

CUDA_VISIBLE_DEVICES=1 python app.py
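
The same restriction can be applied from Python by setting the variable in the child process's environment before launch; a small sketch (this launcher is illustrative, not part of the repository):

# Launch the Gradio app with only GPU 1 visible; CUDA_VISIBLE_DEVICES
# must be set before the process initializes CUDA.
import os
import subprocess

env = dict(os.environ, CUDA_VISIBLE_DEVICES="1")
subprocess.run(["python", "app.py"], env=env, check=True)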

Citation

If you use EchoX in your research or projects, please cite our paper:

@misc{zhang2025echoxmitigatingacousticsemanticgap,
      title={EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs}, 
      author={Yuhao Zhang and Yuhao Du and Zhanchen Dai and Xiangnan Ma and Kaiqi Kou and Benyou Wang and Haizhou Li},
      year={2025},
      eprint={2509.09174},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.09174}, 
}

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.
