16 changes: 12 additions & 4 deletions CLAUDE.md
@@ -47,6 +47,7 @@ chatterbox
| `generate(model, text, voice)` | Generate speech |
| `create_voice_embedding(model, audio)` | Create speaker embedding |
| `tts_chunked(model, text, voice)` | Long texts, sentence-chunked, gc per chunk |
| `chatterbox_defaults()` | Per-card setup: GC options + backend + chunking thresholds |
| `chatterbox_gc_options()` | Print torch GC settings for this GPU (set before torch loads) |
| `quick_tts(text, ref_audio, output)` | One-liner convenience (loads whole model per call) |

@@ -694,8 +695,12 @@ See `vignettes/performance.md` for the full story. Two facts dominate:
| lean eager R (ATen builtins, no nn_module) | 71 | proves the per-op R call is the cost, not wrapper style |

End-to-end long-form (~20s audio): jit ~6s wall vs container ~6s -
-container parity. On 6GB hardware (GTX 1660 Ti, rate 0.75): traced
-88-94, pure R 300-360; jit not yet validated there.
+container parity. On 6GB hardware (GTX 1660 Ti, rate 0.75, June 2026):
+jit 35-38 ms/token (4.7GB peak) vs container 30 - the fastest native
+path there too;
+traced 88-94 but its 350-position cache truncates long-form at ~120
+tokens; pure R 254-287. `chatterbox_defaults()` returns the per-card
+setup (GC tier + backend + chunking).
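
To compare the ms/token figures on one scale, they can be converted to a real-time factor (wall time per second of audio). This is a rough sketch: the 25 tokens per second of audio is inferred from the print method's "max_new_tokens = 1000 (~40 s of audio)" note, not from a documented constant, and `real_time_factor` is a hypothetical helper rather than a package function.

```r
# Assumption: ~25 speech tokens per second of audio (1000 tokens ~ 40 s).
tokens_per_audio_sec <- 1000 / 40

# Wall-clock seconds spent per second of generated audio.
real_time_factor <- function(ms_per_token) {
  ms_per_token * tokens_per_audio_sec / 1000
}

# ms/token figures quoted in this PR.
ms_per_token <- c(jit_16gb = 11, container_6gb = 30,
                  jit_6gb_low = 35, jit_6gb_high = 38,
                  traced_6gb_low = 88, traced_6gb_high = 94)

rtf <- real_time_factor(ms_per_token)
print(round(rtf, 2))
```

Under that assumption, values below 1 are faster than real time: jit on the 6 GB card stays just under 1, while traced lands above 2. The same arithmetic puts 20 s of audio at 11 ms/token at about 5.5 s, in line with the ~6 s wall figure quoted above.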

### Architecture note: pure R package since June 2026

@@ -722,10 +727,13 @@ There is no `useDynLib` and no compiled code.

### When to Use What

-- jit + tuned GC: default on any GPU.
+- jit + tuned GC: default on any GPU (fastest native path on both
+measured cards).
 - Container: production deployments via tts.api/gpu.ctl.
-- Traced: long-running sessions, short utterances.
+- Traced: niche - short utterances only (350-position cache cap).
 - Pure R: debugging, CPU-only.
+- `chatterbox_defaults()`: detects the card, returns GC options +
+backend + chunking thresholds as one pasteable snippet.
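
A minimal sketch of how the returned options could be applied at the top of a script. The `defaults` list is a hand-built stand-in mirroring the fields this PR documents for `chatterbox_defaults(vram_gb = 6)`, so the snippet runs without the package or a GPU. In real use the list would come from the function itself.

```r
# Stand-in for chatterbox_defaults(vram_gb = 6); built by hand so the
# example runs without chatterbox installed.
defaults <- list(
  device = "cuda",
  options = list(torch.cuda_allocator_reserved_rate = 0.75),
  backend = "jit"
)

# The GC options only take effect if they are set before torch loads.
if (isNamespaceLoaded("torch")) {
  warning("torch is already loaded; restart R and set the options first")
}

# Skip the call on CPU, where the options list is empty.
if (length(defaults$options) > 0) {
  do.call(options, defaults$options)
}

getOption("torch.cuda_allocator_reserved_rate")
```

Guarding on `length(defaults$options)` keeps the same code path usable for the CPU tier, where `options` is `list()`.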
## Related

- Alternative to tts.api container backend for local TTS (no Docker required)
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: chatterbox
Title: Text-to-Speech Using Chatterbox TTS Engine
-Version: 0.1.0.8
+Version: 0.1.0.9
Authors@R:
c(person("Troy", "Hernandez", role = c("aut", "cre"),
email = "troy@cornball.ai",
2 changes: 2 additions & 0 deletions NAMESPACE
@@ -1,6 +1,7 @@
# tinyrox says don't edit this manually, but it can't stop you!

export(chatterbox)
export(chatterbox_defaults)
export(chatterbox_gc_options)
export(compute_mel_spectrogram)
export(compute_mel_spectrogram_ve)
@@ -46,5 +47,6 @@ export(voice_convert)
export(write_audio)

S3method(print,chatterbox)
S3method(print,chatterbox_defaults)
S3method(print,chatterbox_gc_options)
S3method(print,voice_embedding)
9 changes: 9 additions & 0 deletions NEWS.md
@@ -1,3 +1,12 @@
# chatterbox 0.1.0.9 (development)

- New `chatterbox_defaults()`: detects the GPU and returns the full
recommended setup (GC options, backend, token budget, chunking
threshold) as a pasteable snippet.
- 6GB hardware validation: jit measures 35-38 ms/token vs container 30;
per-card guidance updated (jit is the fastest native backend on every
measured card).

# chatterbox 0.1.0.8 (development)

- New `generate_batch()`: several texts, one batched S3Gen synthesis
149 changes: 149 additions & 0 deletions R/defaults.R
@@ -0,0 +1,149 @@
# Hardware-adaptive defaults: GC settings, backend, and chunking
# thresholds per detected GPU/CPU. Measured tiers: 16 GB (RTX 5060 Ti)
# and 6 GB (GTX 1660 Ti); 8/12 GB projected from the tier rule.

#' Recommended chatterbox settings for this machine
#'
#' Detects the GPU (or its absence) and returns everything worth setting
#' for it: the torch GC options (which must be set BEFORE torch loads -
#' see \code{\link{chatterbox_gc_options}} for why), the fastest
#' validated backend, the per-call token budget, and when to switch to
#' \code{\link{tts_chunked}}. Printing the result shows a ready-to-paste
#' setup snippet.
#'
#' Measured tiers (long-form, tuned GC): 16 GB RTX 5060 Ti - jit
#' 11 ms/token, container parity; 6 GB GTX 1660 Ti - jit 35-38 ms/token
#' vs container 30, in 4.7 GB VRAM. The 8 and 12 GB tiers are projected
#' from the rule (the GC trigger line must clear the ~4.6 GB loaded
#' model) and marked as such when printed.
#'
#' @param vram_gb Total GPU memory in GB. Default: detected via
#' nvidia-smi; 0 (or detection failure) means CPU-only. Cards under
#' 5 GB are treated as CPU: the loaded model alone needs ~4.6 GB.
#' @return An object of class \code{"chatterbox_defaults"}: a list with
#' \code{device}, \code{vram_gb}, \code{options} (for
#' \code{do.call(options, ...)} before torch loads), \code{backend},
#' \code{max_new_tokens}, \code{chunk_chars}, and \code{measured}.
#' @examples
#' chatterbox_defaults(vram_gb = 6)
#' chatterbox_defaults(vram_gb = 0) # CPU
#' @export
chatterbox_defaults <- function(vram_gb = NULL) {
if (is.null(vram_gb)) {
smi <- suppressWarnings(tryCatch(
system2("nvidia-smi",
c("--query-gpu=memory.total",
"--format=csv,noheader,nounits"),
stdout = TRUE, stderr = FALSE),
error = function(e) character(0)
))
vram_gb <- if (length(smi) >= 1 && nzchar(smi[1]) &&
!is.na(suppressWarnings(as.numeric(smi[1])))) {
round(as.numeric(smi[1]) / 1024, 1)
} else {
0
}
}

if (vram_gb < 5) {
# CPU, or a card too small to be supported: the loaded model
# floor is ~4.6 GB and the measured 6 GB peak was 4.7 GB, so
# anything under 5 GB cannot run the CUDA path. The CUDA
# allocator knobs are irrelevant; only the CPU allocation
# odometer exists, and it measured as minor.
out <- list(
device = "cpu",
vram_gb = vram_gb,
options = list(),
backend = "r",
max_new_tokens = 1000L,
chunk_chars = 200L,
measured = FALSE
)
} else {
rate <- if (vram_gb <= 6.5) 0.75 else if (vram_gb <= 10) 0.6 else 0.5
out <- list(
device = "cuda",
vram_gb = vram_gb,
options = list(torch.cuda_allocator_reserved_rate = rate),
backend = "jit",
max_new_tokens = 1000L,
chunk_chars = 200L,
# Measured tiers: a 6 GB card (GTX 1660 Ti) and a
# 16 GB card (RTX 5060 Ti); every other CUDA size
# (5-5.5 GB, and above 6.5 up to 14 GB) is a projection
measured = (vram_gb > 5.5 && vram_gb <= 6.5) ||
vram_gb >= 14
)
}

if (isNamespaceLoaded("torch") && length(out$options) > 0) {
warning("torch is already initialized in this session; the GC ",
"options take effect only in a fresh R session that sets ",
"them before torch loads.", call. = FALSE)
}

structure(out, class = "chatterbox_defaults")
}

#' Print method for chatterbox_defaults
#'
#' @param x Object from \code{\link{chatterbox_defaults}}
#' @param ... Ignored
#' @return \code{x}, invisibly
#' @export
print.chatterbox_defaults <- function(x, ...) {
if (x$device == "cpu") {
cat("CPU-only setup (no usable GPU detected).\n\n",
" library(chatterbox)\n",
" model <- load_chatterbox(chatterbox(\"cpu\"))\n\n",
"Use backend = \"r\". Expect minutes per utterance; for\n",
"anything longer than a sentence or two, use tts_chunked()\n",
"so audio arrives incrementally.\n", sep = "")
return(invisible(x))
}

if (isTRUE(x$measured)) {
tier <- "measured"
} else {
tier <- "projected"
}
rate <- x$options$torch.cuda_allocator_reserved_rate
cat(sprintf("Recommended for a %s GB GPU (%s tier) - put the\n",
format(x$vram_gb), tier))
cat("options() line in .Rprofile or at the top of your script,\n")
cat("BEFORE torch loads:\n\n")
cat(sprintf(" options(torch.cuda_allocator_reserved_rate = %.2f)\n",
rate))
cat(" library(chatterbox)\n")
cat(" model <- load_chatterbox(chatterbox(\"cuda\"))\n")
cat(sprintf(
" result <- generate(model, text, voice, backend = \"%s\")\n\n",
x$backend))
cat(sprintf(
"Per call, up to max_new_tokens = %d (~40 s of audio). For\n",
x$max_new_tokens))
cat(sprintf(
"longer texts use tts_chunked() (sentence chunks, ~%d chars,\n",
x$chunk_chars))
cat("one gc() per chunk). In your own batch loops, call gc() after\n")
cat("each generate().\n")

if (x$vram_gb <= 6.5) {
cat("\nNote: on a ", format(x$vram_gb), " GB card the model floor",
" leaves little headroom,\nso the 0.8 backstop still fires",
" some collections. Measured on a\nGTX 1660 Ti: jit",
" 35-38 ms/token (~4.7 GB peak) vs container 30;\npure R",
" ~10x slower. Do NOT lower",
" torch.cuda_allocator_allocated_rate\nhere - 60% of a small",
" card sits below the model floor and recreates\nthe",
" constant-collection regime.\n", sep = "")
} else if (x$vram_gb >= 8) {
cat("\nOptional, to hold the VRAM plateau lower (e.g. shared",
" GPUs), at\nno speed cost:\n\n",
" options(torch.cuda_allocator_allocated_rate = 0.6)\n",
sep = "")
}

invisible(x)
}
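
The tier rule in `chatterbox_defaults()` can be tabulated on its own to see where the boundaries fall. The sketch below copies the thresholds from the function above into a hypothetical standalone helper, `tier_for`, so it runs without the package. The `trigger_gb` column is plain arithmetic and assumes the rate is a fraction of total VRAM, which the print note's "60% of a small card" wording suggests but this diff does not spell out.

```r
# Hypothetical standalone copy of the tier logic; not a package function.
tier_for <- function(vram_gb) {
  if (vram_gb < 5) {
    # Below the ~4.6 GB model floor: treated as CPU, no allocator rate.
    return(data.frame(vram_gb = vram_gb, device = "cpu",
                      rate = NA_real_, measured = FALSE))
  }
  rate <- if (vram_gb <= 6.5) 0.75 else if (vram_gb <= 10) 0.6 else 0.5
  measured <- (vram_gb > 5.5 && vram_gb <= 6.5) || vram_gb >= 14
  data.frame(vram_gb = vram_gb, device = "cuda",
             rate = rate, measured = measured)
}

tiers <- do.call(rbind, lapply(c(4, 5, 6, 8, 10, 12, 13, 16), tier_for))

# Assumption: rate is a fraction of total VRAM.
tiers$trigger_gb <- tiers$rate * tiers$vram_gb
print(tiers)
```

Under that assumption the 6 GB tier's product is 4.5 GB, just under the ~4.6 GB model floor, which fits the note that the backstop still fires some collections there. The 8 GB tier comes out at 4.8 GB.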
11 changes: 5 additions & 6 deletions R/gc_options.R
@@ -102,12 +102,11 @@ print.chatterbox_gc_options <- function(x, ...) {
if (vram_gb <= 6.5) {
cat("\nNote: on a ", vram_gb, " GB card the model floor leaves",
" little headroom, so the\n0.8 backstop still fires some",
-" collections: expect ~3-5x from tuning for\npure R, not",
-" the ~10x larger cards see. traced = TRUE measured",
-" fastest\non 6 GB hardware (88-94 ms/token, ~5 GB peak -",
-" tight but it fits).\nDo NOT lower allocated_rate here -",
-" 60% of a small card sits below\nthe model floor and",
-" recreates the constant-collection regime.\n", sep = "")
+" collections. backend = \"jit\" measured\nfastest on 6 GB",
+" hardware (35-38 ms/token, ~4.7 GB peak, vs the\n",
+"container's 30). Do NOT lower allocated_rate here - 60%",
+" of a small\ncard sits below the model floor and recreates",
+" the\nconstant-collection regime.\n", sep = "")
}

invisible(x)
2 changes: 1 addition & 1 deletion R/s3gen.R
@@ -1122,7 +1122,7 @@ s3gen <- torch::nn_module(
if (!is.null(speech_token_lens)) {
gen_mel_lens <- (speech_token_len * 2L)$to(dtype = torch::torch_long())
gen_mask <- (!make_pad_mask(gen_mel_lens,
-max_len = output_mels$size(3)))$unsqueeze(2)$to(
+max_len = output_mels$size(3)))$unsqueeze(2)$to(
dtype = output_mels$dtype, device = output_mels$device)
output_mels <- output_mels * gen_mask
}
47 changes: 47 additions & 0 deletions inst/tinytest/test_defaults.R
@@ -0,0 +1,47 @@
# chatterbox_defaults tier logic (no GPU or weights needed)

d6 <- chatterbox::chatterbox_defaults(vram_gb = 6)
expect_inherits(d6, "chatterbox_defaults")
expect_identical(d6$device, "cuda")
expect_equal(d6$options$torch.cuda_allocator_reserved_rate, 0.75)
expect_identical(d6$backend, "jit")
expect_true(d6$measured)

d8 <- chatterbox::chatterbox_defaults(vram_gb = 8)
expect_equal(d8$options$torch.cuda_allocator_reserved_rate, 0.6)
expect_false(d8$measured)

d12 <- chatterbox::chatterbox_defaults(vram_gb = 12)
expect_equal(d12$options$torch.cuda_allocator_reserved_rate, 0.5)
expect_false(d12$measured)

d16 <- chatterbox::chatterbox_defaults(vram_gb = 16)
expect_equal(d16$options$torch.cuda_allocator_reserved_rate, 0.5)
expect_true(d16$measured)

dcpu <- chatterbox::chatterbox_defaults(vram_gb = 0)
expect_identical(dcpu$device, "cpu")
expect_identical(dcpu$backend, "r")
expect_identical(dcpu$options, list())

# cards under 5 GB cannot hold the ~4.6 GB model: treated as CPU
expect_identical(chatterbox::chatterbox_defaults(vram_gb = 2)$device, "cpu")
expect_identical(chatterbox::chatterbox_defaults(vram_gb = 4)$device, "cpu")
expect_identical(chatterbox::chatterbox_defaults(vram_gb = 4.9)$device, "cpu")

# 5-5.5 GB runs CUDA but is a projection, not the measured 6 GB tier
d5 <- chatterbox::chatterbox_defaults(vram_gb = 5)
expect_identical(d5$device, "cuda")
expect_false(d5$measured)

# 13 GB sits between measured tiers: projected
expect_false(chatterbox::chatterbox_defaults(vram_gb = 13)$measured)

# print methods run and return invisibly
expect_stdout(print(d6), "jit")
expect_stdout(print(d6), "0.75")
expect_stdout(print(d8), "projected")
expect_stdout(print(dcpu), "CPU-only")

# the GC option is applicable directly
expect_silent(do.call(options, d16$options))
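
The tests above always pass `vram_gb` explicitly, so the nvidia-smi parsing branch is not exercised. One way to make it testable without a GPU would be to factor the parsing into a small helper. The sketch below does that with a hypothetical `parse_vram_gb`, which is not part of the package; it mirrors the checks in `chatterbox_defaults()` (first line only, MiB to GB, fall back to 0).

```r
# Hypothetical helper mirroring the detection branch in chatterbox_defaults().
# smi_lines: character vector as returned by system2(..., stdout = TRUE),
# one line per GPU, each holding total memory in MiB.
parse_vram_gb <- function(smi_lines) {
  if (length(smi_lines) < 1 || !nzchar(smi_lines[1])) {
    return(0)
  }
  mib <- suppressWarnings(as.numeric(smi_lines[1]))
  if (is.na(mib)) {
    return(0)
  }
  round(mib / 1024, 1)
}

parse_vram_gb("6144")              # single 6 GB card
parse_vram_gb(c("16311", "6144"))  # multi-GPU: only the first line is used
parse_vram_gb(character(0))        # nvidia-smi missing or silent
parse_vram_gb("N/A")               # unparseable output
```

Like the original, this reads only the first GPU reported, so on a mixed multi-GPU machine the result depends on device order.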
38 changes: 38 additions & 0 deletions man/chatterbox_defaults.Rd
@@ -0,0 +1,38 @@
% tinyrox says don't edit this manually, but it can't stop you!
\name{chatterbox_defaults}
\alias{chatterbox_defaults}
\title{Recommended chatterbox settings for this machine}
\usage{
chatterbox_defaults(vram_gb = NULL)
}
\arguments{
\item{vram_gb}{Total GPU memory in GB. Default: detected via
nvidia-smi; 0 (or detection failure) means CPU-only. Cards under
5 GB are treated as CPU: the loaded model alone needs ~4.6 GB.}
}
\value{
An object of class \code{"chatterbox_defaults"}: a list with
\code{device}, \code{vram_gb}, \code{options} (for
\code{do.call(options, ...)} before torch loads), \code{backend},
\code{max_new_tokens}, \code{chunk_chars}, and \code{measured}.
}
\description{
Detects the GPU (or its absence) and returns everything worth setting
for it: the torch GC options (which must be set BEFORE torch loads -
see \code{\link{chatterbox_gc_options}} for why), the fastest
validated backend, the per-call token budget, and when to switch to
\code{\link{tts_chunked}}. Printing the result shows a ready-to-paste
setup snippet.
}
\details{
Measured tiers (long-form, tuned GC): 16 GB RTX 5060 Ti - jit
11 ms/token, container parity; 6 GB GTX 1660 Ti - jit 35-38 ms/token
vs container 30, in 4.7 GB VRAM. The 8 and 12 GB tiers are projected
from the rule (the GC trigger line must clear the ~4.6 GB loaded
model) and marked as such when printed.

}
\examples{
chatterbox_defaults(vram_gb = 6)
chatterbox_defaults(vram_gb = 0) # CPU
}
18 changes: 18 additions & 0 deletions man/print.chatterbox_defaults.Rd
@@ -0,0 +1,18 @@
% tinyrox says don't edit this manually, but it can't stop you!
\name{print.chatterbox_defaults}
\alias{print.chatterbox_defaults}
\title{Print method for chatterbox_defaults}
\usage{
\method{print}{chatterbox_defaults}(x, ...)
}
\arguments{
\item{x}{Object from \code{\link{chatterbox_defaults}}}

\item{...}{Ignored}
}
\value{
\code{x}, invisibly
}
\description{
Print method for chatterbox_defaults
}
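
The documented return value, `x` invisibly, follows the usual convention for S3 print methods. Below is a minimal sketch of that pattern using a hypothetical class `demo_defaults` (not the package's class), showing how `withVisible()` can confirm the behaviour.

```r
# Hypothetical constructor and print method, for illustration only.
new_demo <- function(rate) {
  structure(list(rate = rate), class = "demo_defaults")
}

print.demo_defaults <- function(x, ...) {
  cat(sprintf("options(torch.cuda_allocator_reserved_rate = %.2f)\n", x$rate))
  invisible(x)  # keeps the object from auto-printing a second time
}

d <- new_demo(0.75)

# withVisible() reports whether the value would have auto-printed.
res <- withVisible(print(d))
res$visible
```

Returning the object invisibly lets callers write `d <- print(d)` or chain the call without duplicate console output.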