Conversation

@breakstring

1. Add Spark-TTS Web API with FastAPI implementation
2. Add Docker support for Spark-TTS deployment

- Implement comprehensive FastAPI-based TTS API service
- Add API endpoints for text-to-speech with voice cloning and creation
- Create example client script for API interaction (a minimal sketch follows below)
- Include environment configuration and startup script
- Add README with detailed API usage and configuration instructions
- Configure .env.example for flexible service setup
- Implement file cleanup and output management
- Support multiple audio input and output methods
- Create Dockerfile for building Spark-TTS images with flexible model inclusion
- Add docker_builder.sh script for easy image building
- Implement docker-compose.yml with multiple service configurations
- Add .dockerignore to optimize Docker build context
- Update README and run_api.sh to support Docker deployment
- Configure environment variables and service types for containerized deployment
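
For orientation, here is a minimal sketch of how a client might call such a service. The port, endpoint path, and request fields below are illustrative assumptions; the authoritative version is the example client script included in this PR.

# Hypothetical client sketch: POST text to a /tts endpoint and save the WAV reply.
# The URL, route, and JSON schema are assumptions, not this PR's actual API surface.
import requests

resp = requests.post(
    "http://localhost:8000/tts",             # assumed host/port and route
    json={"text": "Hello from Spark-TTS."},  # assumed request schema
    timeout=120,                             # TTS inference can be slow on CPU
)
resp.raise_for_status()
with open("output.wav", "wb") as f:
    f.write(resp.content)                    # assumes the endpoint returns audio bytes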
@breakstring mentioned this pull request Mar 8, 2025
@D34DC3N73R

Tested this out, but I get the following error in the startup logs:

ERROR:api.main:Model initialization failed: 
 requires the protobuf library but it was not found in your environment. Checkout the instructions on the
installation page of its repo: https://github.com/protocolbuffers/protobuf/tree/master/python#installation and follow the ones
that match your environment. Please note that you may need to restart your runtime after installation.

Adding protobuf==4.21.12 to requirements.txt and building again solves the issue.

@breakstring
Author


It's very strange. I checked my own environment and there is no protobuf package installed, yet there is no such error at runtime (in both the Docker logs and the local run logs).

(sparktts) azureuser@t4-westus2:~/Spark-TTS$ pip list
Package                  Version
------------------------ ------------
accelerate               0.26.0
aiofiles                 23.2.1
annotated-types          0.7.0
antlr4-python3-runtime   4.9.3
anyio                    4.8.0
audioread                3.0.1
certifi                  2025.1.31
cffi                     1.17.1
charset-normalizer       3.4.1
click                    8.1.8
decorator                5.2.1
einops                   0.8.1
einx                     0.3.0
fastapi                  0.115.11
ffmpy                    0.5.0
filelock                 3.17.0
frozendict               2.4.6
fsspec                   2025.2.0
gradio                   5.18.0
gradio_client            1.7.2
h11                      0.14.0
httpcore                 1.0.7
httpx                    0.28.1
huggingface-hub          0.29.2
idna                     3.10
Jinja2                   3.1.6
joblib                   1.4.2
lazy_loader              0.4
librosa                  0.10.2.post1
llvmlite                 0.44.0
markdown-it-py           3.0.0
MarkupSafe               2.1.5
mdurl                    0.1.2
mpmath                   1.3.0
msgpack                  1.1.0
networkx                 3.4.2
numba                    0.61.0
numpy                    2.1.3
nvidia-cublas-cu12       12.4.5.8
nvidia-cuda-cupti-cu12   12.4.127
nvidia-cuda-nvrtc-cu12   12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12        9.1.0.70
nvidia-cufft-cu12        11.2.1.3
nvidia-curand-cu12       10.3.5.147
nvidia-cusolver-cu12     11.6.1.9
nvidia-cusparse-cu12     12.3.1.170
nvidia-nccl-cu12         2.21.5
nvidia-nvjitlink-cu12    12.4.127
nvidia-nvtx-cu12         12.4.127
omegaconf                2.3.0
orjson                   3.10.15
packaging                24.2
pandas                   2.2.3
pillow                   11.1.0
pip                      25.0
platformdirs             4.3.6
pooch                    1.8.2
psutil                   7.0.0
pycparser                2.22
pydantic                 2.10.6
pydantic_core            2.27.2
pydub                    0.25.1
Pygments                 2.19.1
python-dateutil          2.9.0.post0
python-dotenv            1.0.1
python-multipart         0.0.20
pytz                     2025.1
PyYAML                   6.0.2
regex                    2024.11.6
requests                 2.32.3
rich                     13.9.4
ruff                     0.9.9
safehttpx                0.1.6
safetensors              0.5.2
scikit-learn             1.6.1
scipy                    1.15.2
semantic-version         2.10.0
setuptools               75.8.0
shellingham              1.5.4
six                      1.17.0
sniffio                  1.3.1
soundfile                0.12.1
soxr                     0.5.0.post1
starlette                0.46.0
sympy                    1.13.1
threadpoolctl            3.5.0
tokenizers               0.20.3
tomlkit                  0.13.2
torch                    2.5.1
torchaudio               2.5.1
tqdm                     4.66.5
transformers             4.46.2
triton                   3.1.0
typer                    0.15.2
typing_extensions        4.12.2
tzdata                   2025.1
urllib3                  2.3.0
uvicorn                  0.34.0
websockets               15.0.1
wheel                    0.45.1

At the same time, I also used some other methods to check for the protobuf package, and it does not exist either.
[screenshot]
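
A minimal sketch of those checks (assuming it runs inside the same environment): look for protobuf both via pip metadata and via the import machinery, without importing it as a side effect.

# Check for protobuf two ways; in this environment neither should find it.
import importlib.metadata
import importlib.util

try:
    print("pip metadata:", importlib.metadata.version("protobuf"))
except importlib.metadata.PackageNotFoundError:
    print("pip metadata: protobuf not installed")

try:
    spec = importlib.util.find_spec("google.protobuf")
except ModuleNotFoundError:  # the parent "google" namespace is absent entirely
    spec = None
print("import check:", "importable" if spec else "google.protobuf not importable")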

@D34DC3N73R

This is the full error:

:~/test-sparktts$ docker run -p 7860:7860 --name test-sparktts --gpus all -e SERVICE_TYPE=webui spark-tts:latest-full
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2447, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/transformers/models/qwen2/tokenization_qwen2_fast.py", line 120, in __init__
    super().__init__(
  File "/usr/local/lib/python3.12/site-packages/transformers/tokenization_utils_fast.py", line 116, in __init__
    fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: expected value at line 1 column 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/webui.py", line 260, in <module>
    demo = build_ui(
           ^^^^^^^^^
  File "/app/webui.py", line 97, in build_ui
    model = initialize_model(model_dir, device=device)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/webui.py", line 47, in initialize_model
    model = SparkTTS(model_dir, device)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/cli/SparkTTS.py", line 44, in __init__
    self._initialize_inference()
  File "/app/cli/SparkTTS.py", line 48, in _initialize_inference
    self.tokenizer = AutoTokenizer.from_pretrained(f"{self.model_dir}/LLM")
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py", line 920, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2213, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2448, in _from_pretrained
    except import_protobuf_decode_error():
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 87, in import_protobuf_decode_error
    raise ImportError(PROTOBUF_IMPORT_ERROR.format(error_message))
ImportError: 
 requires the protobuf library but it was not found in your environment. Checkout the instructions on the
installation page of its repo: https://github.com/protocolbuffers/protobuf/tree/master/python#installation and follow the ones
that match your environment. Please note that you may need to restart your runtime after installation.

Adding this allows me to run the container after a rebuild:

$ cat requirements.txt
einops==0.8.1
einx==0.3.0
numpy==2.2.3
omegaconf==2.3.0
packaging==24.2
safetensors==0.5.2
soundfile==0.12.1
soxr==0.5.0.post1
torch==2.5.1
torchaudio==2.5.1
tqdm==4.66.5
transformers==4.46.2
gradio==5.18.0
fastapi==0.115.11
uvicorn==0.34.0
python-dotenv==1.0.1
protobuf==4.21.12

From within the container:

root@d0dad5f76940:/app# pip show protobuf
Name: protobuf
Version: 4.21.12
Summary: 
Home-page: https://developers.google.com/protocol-buffers/
Author: [email protected]
Author-email: [email protected]
License: 3-Clause BSD License
Location: /usr/local/lib/python3.12/site-packages
Requires: 
Required-by: 

@breakstring
Author

Oops, it's the webui part.

I'm very sorry. I packaged the webui part into Docker but didn't test that part of the code, because the webui is existing code and I assumed it would work fine. I will take some time today to verify it.
Thank you very much for your clarification.

@breakstring
Author

[screenshot: container startup logs]
I just found a clean VM, set up the environment, completely rebuilt the image, and ran your command without encountering the protobuf error you mentioned. The warning in the first line is something I had seen before.

After starting, the corresponding WebUI also opens as expected. That said, the WebUI has some strange issues of its own: sometimes I can generate audio, but most of the time I can't, which is also why I repackaged this FastAPI-based Web API interface. Gradio is too difficult to use...

@D34DC3N73R

You are correct on that. I completely wiped my build cache, downloaded the model fresh from HF, and did not receive the error on startup. Sorry for the false report!
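
For anyone who hits this later: the root failure in the traceback above is "Exception: expected value at line 1 column 1", a JSON parse error while loading the fast tokenizer file; the protobuf ImportError only appears because transformers raises it while setting up the except clause of its fallback path. A parse error like that is consistent with a stale or partial model download, for example a Git LFS pointer stub left in place of the real tokenizer.json. A quick check is sketched below; the path is an assumption based on the f"{self.model_dir}/LLM" seen in the traceback.

# Sketch: detect a Git LFS pointer stub or non-JSON content in tokenizer.json.
# The model path is an assumption; adjust it to your local model directory.
from pathlib import Path

tok = Path("pretrained_models/Spark-TTS-0.5B/LLM/tokenizer.json")  # assumed path
head = tok.read_bytes()[:64]
if head.startswith(b"version https://git-lfs"):
    print("Git LFS pointer stub; re-download the model files")
elif not head.lstrip().startswith(b"{"):
    print("not JSON; the file is likely corrupt or truncated")
else:
    print("looks like real JSON")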

@phong-phuong

While your intent was to have separate images, one that includes the pretrained models and a lite one that doesn't, the commands here copy and delete files in separate layers, which only adds to the file size.

As a result, the lite image actually contains the pretrained models in earlier layers, twice: once in the /tmp folder and a second time in the final destination.

For reference, the pretrained models are around 3.67 GB.
Personally, I would avoid including the models in the image entirely and let the user mount them, both to avoid this complexity and to avoid redundant copies of the models in the Docker image store and on disk.

Lite image is 17 GB:
[screenshot]

Lite image should be 10 GB:
[screenshot]

# Copy context
COPY . /tmp/context/  # 1st copy (+3.67GB)

# Check if model directory exists
RUN if [ -d "/tmp/context/pretrained_models" ]; then \
    echo "Found pretrained_models directory"; \
else \
    echo "pretrained_models directory not found"; \
fi

# Decide whether to copy model files based on INCLUDE_MODELS parameter
RUN if [ "${INCLUDE_MODELS}" = "true" ]; then \
    echo "Including models in the image"; \
    if [ -d "/tmp/context/pretrained_models" ]; then \
        cp -r /tmp/context/pretrained_models/* /app/pretrained_models/ || echo "No model files to copy"; \ # 2nd copy (+3.67GB)
    else \
        echo "Warning: pretrained_models directory not found in build context"; \
    fi; \
else \
    echo "Models will need to be mounted at runtime"; \
fi

# Clean up temporary directory (note: this runs in a separate layer, so it does not reduce the image size)
RUN rm -rf /tmp/context
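
One layer-friendly alternative, sketched below under the assumption that the rest of the Dockerfile stays the same (the base image name and trailing steps are placeholders): do the conditional work in a throwaway build stage, so the final image receives a single COPY layer containing exactly the prepared tree, and nothing deleted in staging survives into the shipped image.

# Sketch only: base image and surrounding steps are placeholders, not this PR's
# actual Dockerfile. Deletions in the staging stage never reach the final image,
# because only the COPY --from layer below is shipped.
FROM python:3.12-slim AS staging
ARG INCLUDE_MODELS=false
COPY . /staged/
RUN if [ "${INCLUDE_MODELS}" != "true" ]; then \
        rm -rf /staged/pretrained_models && mkdir -p /staged/pretrained_models; \
    fi

FROM python:3.12-slim
WORKDIR /app
COPY --from=staging /staged/ /app/
# ...pip install, EXPOSE, ENTRYPOINT as in the original Dockerfile...

Alternatively, as suggested above, ship no models at all and mount them at runtime (for example: docker run -v ./pretrained_models:/app/pretrained_models ...), which also keeps a single copy on disk.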

@breakstring
Author


Thanks for your feedback. I'm traveling these days and will check it next week once I have time. @phong-phuong
