Conversation

@breakstring

1. Add Spark-TTS Web API with FastAPI implementation
2. Add Docker support for Spark-TTS deployment

- Implement comprehensive FastAPI-based TTS API service
- Add API endpoints for text-to-speech with voice cloning and creation
- Create example client script for API interaction (a minimal sketch follows below)
- Include environment configuration and startup script
- Add README with detailed API usage and configuration instructions
- Configure .env.example for flexible service setup
- Implement file cleanup and output management
- Support multiple audio input and output methods
- Create Dockerfile for building Spark-TTS images with flexible model inclusion
- Add docker_builder.sh script for easy image building
- Implement docker-compose.yml with multiple service configurations
- Add .dockerignore to optimize Docker build context
- Update README and run_api.sh to support Docker deployment
- Configure environment variables and service types for containerized deployment
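
For orientation, here is a minimal sketch of how a client might call such a service. The port, endpoint path, and request fields below are illustrative assumptions; the authoritative version is the example client script included in this PR.

# Hypothetical client sketch: POST text to a /tts endpoint and save the WAV reply.
# The URL, route, and JSON schema are assumptions, not this PR's actual API surface.
import requests

resp = requests.post(
    "http://localhost:8000/tts",             # assumed host/port and route
    json={"text": "Hello from Spark-TTS."},  # assumed request schema
    timeout=120,                             # TTS inference can be slow on CPU
)
resp.raise_for_status()
with open("output.wav", "wb") as f:
    f.write(resp.content)                    # assumes the endpoint returns audio bytes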
@breakstring mentioned this pull request Mar 8, 2025
@D34DC3N73R

Tested this out, but I get the following error in the startup logs:

ERROR:api.main:Model initialization failed: 
 requires the protobuf library but it was not found in your environment. Checkout the instructions on the
installation page of its repo: https://github.com/protocolbuffers/protobuf/tree/master/python#installation and follow the ones
that match your environment. Please note that you may need to restart your runtime after installation.

Adding protobuf==4.21.12 to requirements.txt and building again solves the issue.

@breakstring
Author


It's very strange. I checked my own environment and there is no protobuf package installed, yet there is no such error at runtime (in both the Docker logs and the local run logs).

(sparktts) azureuser@t4-westus2:~/Spark-TTS$ pip list
Package                  Version
------------------------ ------------
accelerate               0.26.0
aiofiles                 23.2.1
annotated-types          0.7.0
antlr4-python3-runtime   4.9.3
anyio                    4.8.0
audioread                3.0.1
certifi                  2025.1.31
cffi                     1.17.1
charset-normalizer       3.4.1
click                    8.1.8
decorator                5.2.1
einops                   0.8.1
einx                     0.3.0
fastapi                  0.115.11
ffmpy                    0.5.0
filelock                 3.17.0
frozendict               2.4.6
fsspec                   2025.2.0
gradio                   5.18.0
gradio_client            1.7.2
h11                      0.14.0
httpcore                 1.0.7
httpx                    0.28.1
huggingface-hub          0.29.2
idna                     3.10
Jinja2                   3.1.6
joblib                   1.4.2
lazy_loader              0.4
librosa                  0.10.2.post1
llvmlite                 0.44.0
markdown-it-py           3.0.0
MarkupSafe               2.1.5
mdurl                    0.1.2
mpmath                   1.3.0
msgpack                  1.1.0
networkx                 3.4.2
numba                    0.61.0
numpy                    2.1.3
nvidia-cublas-cu12       12.4.5.8
nvidia-cuda-cupti-cu12   12.4.127
nvidia-cuda-nvrtc-cu12   12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12        9.1.0.70
nvidia-cufft-cu12        11.2.1.3
nvidia-curand-cu12       10.3.5.147
nvidia-cusolver-cu12     11.6.1.9
nvidia-cusparse-cu12     12.3.1.170
nvidia-nccl-cu12         2.21.5
nvidia-nvjitlink-cu12    12.4.127
nvidia-nvtx-cu12         12.4.127
omegaconf                2.3.0
orjson                   3.10.15
packaging                24.2
pandas                   2.2.3
pillow                   11.1.0
pip                      25.0
platformdirs             4.3.6
pooch                    1.8.2
psutil                   7.0.0
pycparser                2.22
pydantic                 2.10.6
pydantic_core            2.27.2
pydub                    0.25.1
Pygments                 2.19.1
python-dateutil          2.9.0.post0
python-dotenv            1.0.1
python-multipart         0.0.20
pytz                     2025.1
PyYAML                   6.0.2
regex                    2024.11.6
requests                 2.32.3
rich                     13.9.4
ruff                     0.9.9
safehttpx                0.1.6
safetensors              0.5.2
scikit-learn             1.6.1
scipy                    1.15.2
semantic-version         2.10.0
setuptools               75.8.0
shellingham              1.5.4
six                      1.17.0
sniffio                  1.3.1
soundfile                0.12.1
soxr                     0.5.0.post1
starlette                0.46.0
sympy                    1.13.1
threadpoolctl            3.5.0
tokenizers               0.20.3
tomlkit                  0.13.2
torch                    2.5.1
torchaudio               2.5.1
tqdm                     4.66.5
transformers             4.46.2
triton                   3.1.0
typer                    0.15.2
typing_extensions        4.12.2
tzdata                   2025.1
urllib3                  2.3.0
uvicorn                  0.34.0
websockets               15.0.1
wheel                    0.45.1

At the same time, I also used some other methods to check for the protobuf package, and it does not exist either.
[screenshot]
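
A minimal sketch of those checks (assuming it runs inside the same environment): look for protobuf both via pip metadata and via the import machinery, without importing it as a side effect.

# Check for protobuf two ways; in this environment neither should find it.
import importlib.metadata
import importlib.util

try:
    print("pip metadata:", importlib.metadata.version("protobuf"))
except importlib.metadata.PackageNotFoundError:
    print("pip metadata: protobuf not installed")

try:
    spec = importlib.util.find_spec("google.protobuf")
except ModuleNotFoundError:  # the parent "google" namespace is absent entirely
    spec = None
print("import check:", "importable" if spec else "google.protobuf not importable")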

@D34DC3N73R

This is the full error:

:~/test-sparktts$ docker run -p 7860:7860 --name test-sparktts --gpus all -e SERVICE_TYPE=webui spark-tts:latest-full
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2447, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/transformers/models/qwen2/tokenization_qwen2_fast.py", line 120, in __init__
    super().__init__(
  File "/usr/local/lib/python3.12/site-packages/transformers/tokenization_utils_fast.py", line 116, in __init__
    fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: expected value at line 1 column 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/webui.py", line 260, in <module>
    demo = build_ui(
           ^^^^^^^^^
  File "/app/webui.py", line 97, in build_ui
    model = initialize_model(model_dir, device=device)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/webui.py", line 47, in initialize_model
    model = SparkTTS(model_dir, device)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/cli/SparkTTS.py", line 44, in __init__
    self._initialize_inference()
  File "/app/cli/SparkTTS.py", line 48, in _initialize_inference
    self.tokenizer = AutoTokenizer.from_pretrained(f"{self.model_dir}/LLM")
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py", line 920, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2213, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2448, in _from_pretrained
    except import_protobuf_decode_error():
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 87, in import_protobuf_decode_error
    raise ImportError(PROTOBUF_IMPORT_ERROR.format(error_message))
ImportError: 
 requires the protobuf library but it was not found in your environment. Checkout the instructions on the
installation page of its repo: https://github.com/protocolbuffers/protobuf/tree/master/python#installation and follow the ones
that match your environment. Please note that you may need to restart your runtime after installation.

Adding this allows me to run the container after a rebuild:

$ cat requirements.txt
einops==0.8.1
einx==0.3.0
numpy==2.2.3
omegaconf==2.3.0
packaging==24.2
safetensors==0.5.2
soundfile==0.12.1
soxr==0.5.0.post1
torch==2.5.1
torchaudio==2.5.1
tqdm==4.66.5
transformers==4.46.2
gradio==5.18.0
fastapi==0.115.11
uvicorn==0.34.0
python-dotenv==1.0.1
protobuf==4.21.12

From within the container:

root@d0dad5f76940:/app# pip show protobuf
Name: protobuf
Version: 4.21.12
Summary: 
Home-page: https://developers.google.com/protocol-buffers/
Author: [email protected]
Author-email: [email protected]
License: 3-Clause BSD License
Location: /usr/local/lib/python3.12/site-packages
Requires: 
Required-by: 

@breakstring
Author

Oops, it's the webui part.

I'm very sorry. I packaged the webui part into Docker but didn't test that part of the code, because the webui is existing code and I assumed it would work fine. I will take some time today to verify it.
Thank you very much for your clarification.

@breakstring
Author

[screenshot: container startup logs]
I just found a clean VM, set up the environment, completely rebuilt the image, and ran your command without encountering the protobuf error you mentioned. The warning in the first line is something I had seen before.

After starting, the corresponding WebUI also opens as expected. That said, the WebUI has some strange issues of its own: sometimes I can generate audio, but most of the time I can't, which is also why I repackaged this FastAPI-based Web API interface. Gradio is too difficult to use...

@D34DC3N73R

You are correct on that. I completely wiped my build cache, downloaded the model fresh from HF, and did not receive the error on startup. Sorry for the false report!
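
For anyone who hits this later: the root failure in the traceback above is "Exception: expected value at line 1 column 1", a JSON parse error while loading the fast tokenizer file; the protobuf ImportError only appears because transformers raises it while setting up the except clause of its fallback path. A parse error like that is consistent with a stale or partial model download, for example a Git LFS pointer stub left in place of the real tokenizer.json. A quick check is sketched below; the path is an assumption based on the f"{self.model_dir}/LLM" seen in the traceback.

# Sketch: detect a Git LFS pointer stub or non-JSON content in tokenizer.json.
# The model path is an assumption; adjust it to your local model directory.
from pathlib import Path

tok = Path("pretrained_models/Spark-TTS-0.5B/LLM/tokenizer.json")  # assumed path
head = tok.read_bytes()[:64]
if head.startswith(b"version https://git-lfs"):
    print("Git LFS pointer stub; re-download the model files")
elif not head.lstrip().startswith(b"{"):
    print("not JSON; the file is likely corrupt or truncated")
else:
    print("looks like real JSON")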

@phong-phuong

While your intent was to have separate images, one that includes the pretrained models and a lite one that doesn't, the commands here copy and delete files in separate layers, which only adds to the file size.

As a result, the lite image actually contains the pretrained models in earlier layers, twice: once in the /tmp folder and a second time in the final destination.

For reference, the pretrained models are around 3.67 GB.
Personally, I would avoid including the models in the image entirely and let the user mount them, both to avoid this complexity and to avoid redundant copies of the models in the Docker image store and on disk.

Lite image is 17 GB:
[screenshot]

Lite image should be 10 GB:
[screenshot]

# Copy context
COPY . /tmp/context/  # 1st copy (+3.67GB)

# Check if model directory exists
RUN if [ -d "/tmp/context/pretrained_models" ]; then \
    echo "Found pretrained_models directory"; \
else \
    echo "pretrained_models directory not found"; \
fi

# Decide whether to copy model files based on INCLUDE_MODELS parameter
RUN if [ "${INCLUDE_MODELS}" = "true" ]; then \
    echo "Including models in the image"; \
    if [ -d "/tmp/context/pretrained_models" ]; then \
        cp -r /tmp/context/pretrained_models/* /app/pretrained_models/ || echo "No model files to copy"; \ # 2nd copy (+3.67GB)
    else \
        echo "Warning: pretrained_models directory not found in build context"; \
    fi; \
else \
    echo "Models will need to be mounted at runtime"; \
fi

# Clean up temporary directory (note: this runs in a separate layer, so it does not reduce the image size)
RUN rm -rf /tmp/context
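
One layer-friendly alternative, sketched below under the assumption that the rest of the Dockerfile stays the same (the base image name and trailing steps are placeholders): do the conditional work in a throwaway build stage, so the final image receives a single COPY layer containing exactly the prepared tree, and nothing deleted in staging survives into the shipped image.

# Sketch only: base image and surrounding steps are placeholders, not this PR's
# actual Dockerfile. Deletions in the staging stage never reach the final image,
# because only the COPY --from layer below is shipped.
FROM python:3.12-slim AS staging
ARG INCLUDE_MODELS=false
COPY . /staged/
RUN if [ "${INCLUDE_MODELS}" != "true" ]; then \
        rm -rf /staged/pretrained_models && mkdir -p /staged/pretrained_models; \
    fi

FROM python:3.12-slim
WORKDIR /app
COPY --from=staging /staged/ /app/
# ...pip install, EXPOSE, ENTRYPOINT as in the original Dockerfile...

Alternatively, as suggested above, ship no models at all and mount them at runtime (for example: docker run -v ./pretrained_models:/app/pretrained_models ...), which also keeps a single copy on disk.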

@breakstring
Author


Thanks for your feedback. I'm traveling these days and will check it next week once I have time. @phong-phuong
