Contribution Request: Offline speech recognition #11
Willing to do what I can to help! :) |
Excellent, thanks for the offer! What skills do you possess? |
PHP, HTML, JavaScript, Java, C#... I graduated from a secondary technical college in engineering in Austria. I work as a sysadmin and have advanced Linux skills. I've never worked with Node.js before, but I have basic knowledge. |
Okay, that's perfect. Have you ever worked with PocketSphinx or CMUSphinx before? |
@patrickjquinn Which module is currently providing the speech recognition? |
@patrickjquinn No, but it looks interesting! And I'm willing to study this shit xD @Marak |
Ah, the person behind say! I'm using your fantastic module for the RasPi client! At the moment it's done using online APIs (Google Cloud Speech and Wit.ai), with node-record-lpcm16 handling audio capture on the clients. I've experimented with PocketSphinx but found it to be... too unreliable. Hence my desire to build something more fit for the task that can be trained dynamically and manually by the community. Open-source STT would be a massive coup for the open-source community working on projects such as this. Think you might be able to help out? |
@developingUnicorn excellent :) Well, I'd suggest you try to get https://github.com/cmusphinx/node-pocketsphinx or https://syl22-00.github.io/pocketsphinx.js/ (both JavaScript bindings for PocketSphinx) recognising speech locally; that should be all the research you'll need :) You can contact me via the project's Gitter: https://gitter.im/P-Brain/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge |
To kick this off, I've started a new project, Luther: https://github.com/patrickjquinn/Luther (i.e. Martin Luther King, i.e. free speech). Initially it just contains a giant text file of 450k+ English words. I'll expand this to a giant list of English sentences, popular musicians, and slang words sourced from various databases of such information. |
Okay guys, tomorrow I'm going to populate Luther with a setup guide for PocketSphinx and a precompiled English dictionary for it. I'll also create a basic Node module for recording raw input and isolating the frequencies of human speech, which should allow for easier extraction; a rough sketch is below. Also, when the time comes that we have a solid platform, I'll host it on a beefy box "in the cloud" with 20 or so cores (I have some spare Azure hosting credits) so everyone can access it via a simple API for their projects! Can anyone who wants to help let me know so I can add them as admins to the Luther project?
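Roughly what I have in mind for the recording module, as a minimal sketch. It assumes node-record-lpcm16's record()/stream()/stop() API (option names vary between versions), with the speech-band filtering still to come:

```js
// Sketch: capture raw 16 kHz, 16-bit mono PCM from the mic to a file.
// Assumes node-record-lpcm16 >= 1.x (record()/stream()/stop()); older
// versions expose start()/stop() instead, and option names vary.
const recorder = require('node-record-lpcm16');
const fs = require('fs');

const out = fs.createWriteStream('speech.raw');

const recording = recorder.record({
  sampleRate: 16000, // PocketSphinx expects 16 kHz mono
  channels: 1,
  audioType: 'raw',
});

recording.stream()
  .on('error', (err) => console.error('recorder error:', err))
  .pipe(out);

// Stop after five seconds for this demo; the real module would stop on
// silence and band-pass the samples to the ~300-3400 Hz speech range.
setTimeout(() => recording.stop(), 5000);
```
|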
I'll help! I started playing around with PocketSphinx and I'm totally into this! Looking forward to being a part of this project! |
Excellent, I'll add you as an admin! Did you make any progress getting it to recognise speech? |
Not yet; here in Austria we have to do military service at the age of 18 and I have 5 months left, so my time is limited. But I hope I'll get it working by Friday! |
No rush! It's for your own benefit, not mine :) I'll have everything mentioned above committed by tonight |
Does XNOR.ai do image/video recognition too? |
Yes indeed they do, but I don't believe they have released anything yet. While I also believe it should be a priority, it's a gigantic task and way beyond the capabilities of any one or two people (especially if one of those people is me). Basically, it's not something I can handle alone, hence this contribution request. |
Looking around, I found https://github.com/zzmp/juliusjs, a 'fork' of https://github.com/julius-speech/julius, which does speech recognition on Ubuntu. |
Any updates on this? Getting local speech recognition to work right can be hard. Will we have default support for macOS? Looking forward to project updates. This is awesome work being done here! |
I think we'll almost certainly be using PocketSphinx for speech recognition unless we can find something better. I attempted to get PocketSphinx and Node.js talking to each other last week, but nobody maintains the Node.js bindings anymore. To answer your question, though: it's almost certain it will be cross-platform compatible as long as all the dependencies support it too. |
Maybe there's some hope from Mozilla's "Deep Speech" engine? They're claiming a 6.5% word error rate at this point. https://github.com/mozilla/DeepSpeech |
Iiiinnnteresting... anyone want to attempt to write a Node wrapper for this? If we can make this work, it makes the project (and some of the modular forks I've been working on behind closed doors) more viable versus other open-source VAs, and we can start more actively maintaining it. Long term, I'd love to see this or a variant of it as a proper open-source Alexa competitor with an open skills ecosystem and companion apps. |
It sounds like a great idea to me. I think DeepSpeech only processes chunks of audio, though? We'd also need to extract those chunks from a stream, which is quite a big chunk of work to do right |
From my research there is a branch for doing real-time analysis called "streaming-interface" that should be able to accept a raw stream from the mic. It just requires a rebuild using the build instructions.
The other option is to capture the mic stream after a Snowboy keyword is detected and save that stream to a file, then run DeepSpeech over it and extract the text. Less elegant, but it should work.
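For the second option, the DeepSpeech side could look roughly like this. A sketch only: the Model/stt names follow the deepspeech npm package's README, and the constructor signature has changed between releases, so treat the details as assumptions.

```js
// Sketch: once Snowboy fires its 'hotword' event, record the mic to a WAV
// file, then run DeepSpeech over it. Model/stt follow the deepspeech npm
// package's README; the constructor signature differs between releases.
const DeepSpeech = require('deepspeech');
const fs = require('fs');

const model = new DeepSpeech.Model('models/output_graph.pbmm');

function transcribe(wavPath) {
  const wav = fs.readFileSync(wavPath);
  // stt() wants raw 16-bit, 16 kHz mono samples, so skip the canonical
  // 44-byte WAV header (a proper WAV parser would be safer).
  return model.stt(wav.slice(44));
}

// Hypothetical wiring: recordToFile() doesn't exist yet, it's the piece
// we'd build around node-record-lpcm16.
// detector.on('hotword', () =>
//   recordToFile('cmd.wav', () => console.log(transcribe('cmd.wav'))));
```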
|
I'm mainly thinking about detecting when a command ends. With the first one, does it work like Google's, where you tell it to start and then it automatically ends on silence? With the second, I think I read that the Node.js bindings for DeepSpeech can accept an audio buffer, so we could at least cut out the filesystem.
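If the bindings do take a buffer, the file step collapses into buffering chunks in memory. A rough sketch, assuming stt() accepts a Buffer of 16-bit PCM and reusing the model from the earlier sketch:

```js
// Sketch: skip the temp file by accumulating PCM chunks between hotword
// and silence, then feeding one Buffer to stt(). Assumes stt() accepts a
// Buffer of 16-bit, 16 kHz mono PCM; micStream is the recorder's stream
// and model is the DeepSpeech model from the earlier sketch.
const chunks = [];
micStream.on('data', (chunk) => chunks.push(chunk));
micStream.on('end', () => {
  console.log(model.stt(Buffer.concat(chunks)));
});
```
|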
We would have to build a timeout wrapper ourselves (similar to how the RasPi client does it) that automatically closes the mic stream a few seconds after it hears the keyword.
The alternative is to start a timer once the decibel level on the mic drops below a certain threshold (i.e. silence).
Which do you think? Any other options that might work here?
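For the second option, something like this sketch; the RMS threshold is an arbitrary placeholder that would need tuning:

```js
// Sketch of the threshold timer: watch the RMS level of each PCM chunk and
// call stop() once it stays quiet for silenceMs. The threshold of 500 (on
// 16-bit samples, full scale 32767) is an arbitrary placeholder.
function stopOnSilence(stream, stop, { threshold = 500, silenceMs = 1500 } = {}) {
  let timer = null;
  stream.on('data', (chunk) => {
    let sum = 0;
    for (let i = 0; i + 1 < chunk.length; i += 2) {
      const sample = chunk.readInt16LE(i);
      sum += sample * sample;
    }
    const rms = Math.sqrt(sum / (chunk.length / 2));
    if (rms < threshold) {
      if (!timer) timer = setTimeout(stop, silenceMs); // quiet: start countdown
    } else {
      clearTimeout(timer); // speech again: cancel the countdown
      timer = null;
    }
  });
}
```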
|
Of those two options I prefer the second; otherwise there'd be a large delay after short commands. I'd really like to do it like in this Stack Exchange answer: https://dsp.stackexchange.com/a/17629. The problem is that's a lot of work, and it may need some input normalisation, or the silence threshold could be set dynamically, maybe from an average calculation?
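For the dynamic threshold, maybe an exponential moving average of the RMS level, with anything well below it treated as silence. A sketch, with alpha and ratio values that would need tuning against real mic input:

```js
// Sketch: derive the silence threshold from an exponential moving average
// of recent RMS readings instead of hard-coding it. alpha and ratio are
// guesses that would need tuning.
function makeAdaptiveSilenceDetector(alpha = 0.05, ratio = 0.5) {
  let avg = null;
  return function isSilence(rms) {
    if (avg === null) avg = rms;           // seed with the first reading
    avg = alpha * rms + (1 - alpha) * avg; // update the running average
    return rms < avg * ratio;              // quiet relative to recent input
  };
}
```
|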
I like it. I'll dig around to see what pre-packaged options are available for Node to do this, possibly FFmpeg.
Otherwise, if we can get a transcription stream piped out of DeepSpeech, we can basically start a short timer immediately which is reset whenever new text is transcribed.
So: 'hotword' -> 'start timer' -> 'user speech' -> 'reset timer' -> 'no user speech' -> 'timer stops mic and re-inits hotword detection'.
That's how I've done it on iOS and it works really well.
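In code, the flow might look like this sketch; onHotword and onTranscript are hypothetical hooks for wherever Snowboy and DeepSpeech surface their events:

```js
// Sketch of that flow. onHotword/onTranscript are hypothetical hooks;
// stopMicAndRearm closes the mic and re-inits hotword detection.
function makeCommandWindow(stopMicAndRearm, timeoutMs = 2000) {
  let timer = null;
  const arm = () => {
    clearTimeout(timer);
    timer = setTimeout(stopMicAndRearm, timeoutMs);
  };
  return {
    onHotword: arm,    // 'hotword' -> start timer
    onTranscript: arm, // 'user speech' -> reset timer
  };
}
```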
|
The best approach would probably be having a timer after detecting relative silence, but also including a keyboard shortcut, tap detection, or some such mechanism so the user can manually declare they're done; that could speed things up a bit.
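The manual escape hatch could be as simple as this sketch on a machine with a keyboard; stopMicAndRearm stands in for whatever handler the silence timer already calls:

```js
// Sketch: let any keypress end the command early, alongside the silence
// timer. Only works when stdin is a TTY; stopMicAndRearm is the same
// hypothetical handler the silence timer calls.
if (process.stdin.isTTY) {
  process.stdin.setRawMode(true);
  process.stdin.resume();
  process.stdin.once('data', () => stopMicAndRearm());
}
```
|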
Hi guys, I want to add open-source offline speech recognition to this project, initially just for English. It might be worth investigating the work of XNOR.ai and, failing that, building a full, optimised model for PocketSphinx.
Longer term, I'd like this model to be trained by interactions with the platform, and then to have some sort of central repository for the model so it can be synced across all instances of the platform.
Anyone willing to help? Or have any ideas?