Scoping speech metadata sent to the TTS
While exploring NVDA's source and the TTS-engine side of the SAPI 5.4 API, I realized that screen readers send much more than basic speech strings to be spoken by the TTS. In the case of SAPI, NVDA sends SSML, which ISpTTSEngine::Speak receives in SPVSTATE.
Metadata includes:
LangID - the language associated with the whole or a section of an announcement, which the TTS can use to adjust vocalization. We could use this to test that NVDA correctly processes a multi-language web page.
EmphAdj - Not sure this is used, but presumably it could confirm that <em> semantics are picked up and conveyed by the screen reader.
PitchAdj - Could test that NVDA is correctly increasing pitch for capital letters.
SilenceMSecs - Via the SSML <silence> tag, NVDA inserts this for BreakCommands. Could be used to test appropriate cadence.
There's also SPVACTIONS, which includes SPVA_Pronounce and SPVA_SpellOut. I think NVDA provides its own spelling functionality, but it does appear to use <pron>.
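To make that concrete, here's a rough Python sketch of the kind of assertions a test could make if those SPVSTATE fields were captured from an instrumented engine. CapturedFragment and the capture mechanism are hypothetical; the field names just mirror SPVSTATE.

```python
# Hypothetical sketch: if an instrumented TTS engine logged the per-fragment
# state it receives in ISpTTSEngine::Speak, a test could assert on it roughly
# like this. The capture mechanism is assumed, not something that exists today.
from dataclasses import dataclass

@dataclass
class CapturedFragment:
    text: str
    lang_id: int        # SPVSTATE.LangID (e.g. 0x0409 = en-US, 0x040C = fr-FR)
    emph_adj: int       # SPVSTATE.EmphAdj
    pitch_adj: int      # SPVSTATE.PitchAdj
    silence_msecs: int  # SPVSTATE.SilenceMSecs
    action: str         # SPVSTATE.eAction, e.g. "SPVA_Speak", "SPVA_SpellOut"

def assert_multilingual(fragments: list[CapturedFragment]) -> None:
    """A page mixing English and French should yield fragments in both LangIDs."""
    assert {f.lang_id for f in fragments} >= {0x0409, 0x040C}

def assert_capitals_raise_pitch(fragments: list[CapturedFragment]) -> None:
    """NVDA's 'capital pitch change' setting should surface as a PitchAdj bump."""
    assert any(f.pitch_adj > 0 for f in fragments if f.text.isupper())
```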
Should "observe spoken text" should include this level of detail?
Technical speech observation solution scopes to a particular TTS API
I realized looking through NVDA's source that it has many synthDrivers, currently for SAPI 4, SAPI 5, OneCore, and eSpeak. Our SAPI 5 driver only tests NVDA's code path for SAPI 5.
Is it worth documenting this... tradeoff?
Pragmatically, I think the chance of finding a bug in a specific TTS driver is low, and finding a comprehensive solution probably isn't worth the effort. That said, the drivers do have some complexity: synthDrivers/oneCore.py maintains its own queue, and all three handle SSML differently (looking at the commit history, eSpeak seems to allow malformed SSML while OneCore rejects it).
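If we ever did want a cheap cross-driver check, the SSML difference at least is easy to flag with a well-formedness test over whatever markup a driver emits. A minimal, self-contained sketch follows; the example strings are invented, and getting at each driver's real output would require hooks inside NVDA.

```python
# Minimal sketch: a well-formedness check that could flag the kind of malformed
# SSML that eSpeak tolerates but OneCore rejects. The markup strings below are
# invented examples, not real driver output.
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    try:
        # Wrap in a root element since drivers may emit bare fragments.
        ET.fromstring(f"<speak>{markup}</speak>")
        return True
    except ET.ParseError:
        return False

print(is_well_formed('Hello <break time="50ms"/> world'))  # True
print(is_well_formed('Hello <break time="50ms"> world'))   # False: unclosed tag
```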
@WestonThayer This is great information; thanks for carrying out the research and writing it up.
Keep in mind that a virtual system-level (i.e. SAPI5 on Windows) engine is only one of the paths that will be investigated going forward. It is likely that screen-reader-specific code will also be needed to implement parts of the automation driver protocol, and such in-process facilities may also involve capturing the speech before it even leaves the screen reader's boundaries, e.g. with a "tee"-like synth driver to allow speech to be captured while also speaking it out loud for developers and/or testers. That would make use of similar things to what you've outlined here, albeit SR-specific internal ones, e.g. NVDA's formatting/command fields.
https://github.com/bocoup/aria-at-automation#observe-spoken-text
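To illustrate, the "tee"-like synth driver mentioned above might look roughly like the following Python sketch. It assumes NVDA's SynthDriver interface (speak/cancel/check) and its module-loading convention, and it omits the settings/voice plumbing a real driver needs; names like speechTee and capturedSequences are invented.

```python
# Rough sketch of a "tee" synth driver for NVDA: forward every call to a real
# driver so speech stays audible, while recording the raw speech sequences for
# a test harness to inspect. API details are approximate, not a working driver.
from synthDriverHandler import SynthDriver as _BaseSynthDriver

capturedSequences = []  # a real harness would expose this somewhere queryable

class SynthDriver(_BaseSynthDriver):
    # NVDA loads the class named SynthDriver from a module in synthDrivers/.
    name = "speechTee"
    description = "Speech tee (capture while forwarding to another synth)"

    @classmethod
    def check(cls):
        return True

    def __init__(self):
        super().__init__()
        # Wrap a concrete driver; oneCore is just an example choice here.
        from synthDrivers import oneCore
        self._inner = oneCore.SynthDriver()

    def speak(self, speechSequence):
        # speechSequence mixes plain strings with command objects such as
        # LangChangeCommand, PitchCommand and BreakCommand.
        capturedSequences.append(list(speechSequence))
        self._inner.speak(speechSequence)

    def cancel(self):
        self._inner.cancel()
```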