Single spaces in text sent for speech synthesis seem to have two effects. First, under voice clone, pitch might increase because of the 'time of a space' when the text is translated to speech.
(This can be adjusted with ffmpeg if it is built with rubberband, though this query hopes to avoid that additional post-processing step.)
Second, the uniformity of single spacing between words sounds unnatural to the ear. Several experiments suggest that, on the text side, the number of spaces between words can only be increased from 1 -> 2. At 3 and above, the output becomes chaotic (single letters are pronounced, interspersed among whole words). The following functions work on the text to be sent through the TTS process and are limited to changing 1 -> 2 spaces in some instances:
cadence_for2 <- function(sent) { # sentence(s); widens only some of the spaces in a paragraph
  space_where <- unlist(gregexpr(' ', sent))        # character positions of the spaces (-1 if none)
  sent_split <- unlist(strsplit(sent, split = ''))  # one element per character
  space_mod3 <- space_where[space_where %% 3 == 0]  # keep spaces whose position is a multiple of 3
  for (k in seq_along(space_mod3)) {                # seq_along() skips the loop when nothing is selected
    sent_split[space_mod3[k]] <- sub(' ', '  ', sent_split[space_mod3[k]]) # swap to 2 spaces
  }
  sent_for2 <- paste0(sent_split, collapse = '')
  return(sent_for2)
}
Then use the returned text object in tts_chunked.
cadence2 <- function(sent) { # sentence(s); a replacement is possible at every space encountered
  spaces <- c(' ', '  ') # 1 or 2
  num_space <- length(unlist(gregexpr(' ', sent)))
  set.seed(42) # fixed seed: the same spacing pattern on every call
  # rbind() recycles the shorter vector (warning suppressed), so the last word also gets a separator
  inter <- suppressWarnings(rbind(unlist(strsplit(sent, split = ' ')), sample(spaces, num_space, replace = TRUE)))
  attributes(inter) <- NULL # interleave, pads ending, useful in tts_chunked for between-sentence space
  sent_cad2 <- paste0(inter, collapse = '')
  return(sent_cad2)
}
Then use the returned text object in tts_chunked.
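For comparison, here is a minimal vectorized sketch of the same 1 -> 2 idea. It splits on single spaces, draws a one- or two-space separator for each gap, and pastes the pieces back together. The name `vary_spacing` and its `p_double` and `seed` arguments are hypothetical and not part of any TTS package. It assumes words are separated by exactly one space, and it does not pad the end of the text.

```r
# Hypothetical helper (not from any TTS package): widen a random subset of
# single spaces to double spaces. Assumes words are separated by one space.
vary_spacing <- function(sent, p_double = 0.5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # optional, for reproducible output
  words <- strsplit(sent, " ", fixed = TRUE)[[1]]
  n_gaps <- length(words) - 1
  if (n_gaps < 1) return(sent)         # zero or one word: nothing to widen
  gaps <- sample(c(" ", "  "), n_gaps, replace = TRUE,
                 prob = c(1 - p_double, p_double))
  # Interleave words and gaps; the last word gets no trailing separator.
  paste0(c(rbind(words, c(gaps, ""))), collapse = "")
}

out <- vary_spacing("the quick brown fox jumps over the lazy dog", seed = 42)
```

Leaving `seed` as NULL gives a different spacing pattern on each call, which may matter if the goal is less uniform cadence across chunks. `p_double` sets the share of gaps that are widened.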
Are there model-side variants of the special character [SPACE] that could be invoked to introduce more variability in inter-word spacing?