Injecting or disrupting cadence in tts output #5

@chris-english

Single spaces in the text sent to TTS can have two effects. First, under voice clone, pitch might increase because of the 'time of a space' when the text is rendered as speech.
(This can be adjusted with ffmpeg if it is built with rubberband, though this query hopes to avoid that additional post-processing step.)
Second, the uniformity of single spacing between words sounds unnatural to the ear. Several experiments suggest that, on the text side, the number of spaces between words can only be increased from 1 to 2. At 3 and above, the output becomes chaotic (single letters are pronounced, interspersed among whole words). The following functions work on the text before it is sent through the TTS process, and are limited to changing 1 space to 2 in some places:

cadence_for2 <- function(sent) { # sentence(s); doubles 'some' of the spaces in a paragraph
  space_where <- unlist(gregexpr(' ', sent)) # character positions of the spaces (-1 if none)
  sent_split <- unlist(strsplit(sent, split = '')) # one element per character
  # keep only the spaces whose character position is a multiple of 3
  space_mod3 <- space_where[space_where > 0 & space_where %% 3 == 0]
  for (k in seq_along(space_mod3)) { # seq_along() skips the loop when nothing qualifies
    sent_split[space_mod3[k]] <- sub(' ', '  ', sent_split[space_mod3[k]]) # swap to 2 spaces
  }
  sent_for2 <- paste0(sent_split, collapse = '')
  return(sent_for2)
}
Then use the returned text object in tts_chunked.

cadence2 <- function(sent) { # sentence(s); a replacement is possible at every space encountered
  spaces <- c(' ', '  ') # 1 or 2 spaces
  space_where <- unlist(gregexpr(' ', sent))
  num_space <- length(space_where)
  set.seed(42) # fixed seed: the same input always gets the same spacing
  # interleave words and sampled spaces; rbind() recycles the shorter row, hence suppressWarnings()
  inter <- suppressWarnings(rbind(unlist(strsplit(sent, split = ' ')), sample(spaces, num_space, replace = TRUE)))
  attributes(inter) <- NULL # drop dims to get the interleaved vector; pads the ending, useful in tts_chunked for between-sentence space
  sent_cad2 <- paste0(inter, collapse = '')
  return(sent_cad2)
}
Then use the returned text object in tts_chunked.
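
A possible variation on the random approach, offered only as a sketch: the helper below uses base R's `regmatches<-` to replace each single space with either one or two spaces, so the share of doubled spaces and the seed can be varied per call. The name `cadence_random` and its `p_double` and `seed` parameters are hypothetical and not part of the code above. It assumes the input has only single spaces between words, and it never emits 3 or more spaces, given the chaotic output reported at that width.

```r
# Hypothetical helper (not from the original post): randomly widen single
# spaces to double spaces. p_double is the chance that a space is doubled.
cadence_random <- function(sent, p_double = 0.5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)        # optional, for reproducible output
  m <- gregexpr(" ", sent, fixed = TRUE)    # match data for every single space
  n <- sum(m[[1]] > 0)                      # number of spaces (0 when none found)
  if (n == 0) return(sent)
  # draw one replacement (1 or 2 spaces) per matched space
  repl <- sample(c(" ", "  "), n, replace = TRUE,
                 prob = c(1 - p_double, p_double))
  regmatches(sent, m) <- list(repl)         # substitute the matches in place
  sent
}

# Usage: roughly a third of the spaces doubled, reproducible via the seed
txt <- cadence_random("the quick brown fox jumps over the lazy dog",
                      p_double = 1 / 3, seed = 42)
```

The returned string would then be passed to tts_chunked in the same way as the output of the two functions above. Unlike `cadence2`, this sketch does not pad the end of the text.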

Are there model-side variants of the special character [SPACE] that could be invoked to introduce more variability in inter-word spacing?
