Contribution Request: Offline speech recognition #11

patrickjquinn · 2017-01-28T17:03:10Z

Hi guys, I want to include offline speech recognition that's open source to this project, initially just for English. It might be worth investigating the work of XNOR.ai and failing that, building a full, optimised model for PocketSphinx.

Longer term id like this model to to be trained by interactions with the platform and then have some sort of central repository for the model so it can be synced across all instances of the platform.

Any one willing to help? Or have any ideas?

h4ckd0tm3 · 2017-01-28T17:48:22Z

Willing to do what i can to help! :)

patrickjquinn · 2017-01-28T18:10:08Z

Excellent thanks for the offer, what skills do you posses?

h4ckd0tm3 · 2017-01-28T18:22:09Z

PHP, HTML, JavaScript, Java, C#....

I graduated at a Secondary Technical College in Engeneering in Austria. Working as Sysadmin. Advanced Linux skills.

Never worked with Node.js before but basic knowlege.

patrickjquinn · 2017-01-28T18:55:15Z

Okay that's perfect, have you ever worked with PocketSphinx or CMUSphinx before?

Marak · 2017-01-28T18:59:20Z

@patrickjquinn Which module is currently providing the speech recognition?

h4ckd0tm3 · 2017-01-28T19:14:43Z

@patrickjquinn No but looks interesting! And im willing to study this shit xD

@Marak
As far as i know annyang

patrickjquinn · 2017-01-28T19:17:20Z

Ah the person behind say! I'm using your fantastic module for the RasPi client!

At the moment, it's done using online APIs ( Google cloud speech and Wit.ai) and node-record-lpc16 to handle speech recognition on the clients.

I've experimented with PocketSphinx but found it to be...too unreliable.

Hence my desire to build something more fit-for task that can be trained dynamically and manually by the community. Open source SST would be a massive coo for the open source community working on projects such as this.

Think you might be able to help out?

patrickjquinn · 2017-01-28T21:19:16Z

@developingUnicorn excellent :) well I'd suggest you try and get https://github.com/cmusphinx/node-pocketsphinx or https://syl22-00.github.io/pocketsphinx.js/ (both JavaScript bindings for PocketSphinx) recognising speech locally that should be all the research you'll need :) you can contact me via the projects Gitter https://gitter.im/P-Brain/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge

patrickjquinn · 2017-01-29T18:43:10Z

In order to kick this off, i've started a new project; Luther https://github.com/patrickjquinn/Luther (i.e Martin Luther King, i.e Free speech), initially just containing a giant text file of 450+k english words. I'll expand this to a giant list of english sentences, popular musicians and slang words sourced from various different databases of such information.

patrickjquinn · 2017-01-31T22:12:26Z

Okay guys so tomorrow I'm going to populate Luther with a set up guide for PocketSphinx and a precompiled English dictionary for it. I'll also create a basic node module for recording raw input and isolating the frequencies for human speech which should allow for easier extraction.

Also when the time comes that we have a solid platform, I'll host it on a beefy box "in the cloud" with 20 (I have some spare Azure hosting credits) or so cores so everyone can access it, via a simple API for their projects!

Can anyone who wants to help let me know so I can add them as admins to the Luther project?

h4ckd0tm3 · 2017-02-01T08:39:06Z

I'll help! Started to playing around with pocketphinx and i'm totally i to this! Looking forward to be a part of this project!

patrickjquinn · 2017-02-01T08:42:13Z

Excellent, i'll add you as an admin! Did you make any progress getting it recognising speech?

h4ckd0tm3 · 2017-02-01T14:56:35Z

Not by now, here in Austria we have to do military service by the age of 18 and i have 5 Months left so my time is limited. But i hope i'll geht it working by friday!

patrickjquinn · 2017-02-01T16:03:32Z

No rush! It's for your own benefit not mine :) I'll have everything mentioned above commited by tonight

staberas · 2017-02-12T18:26:25Z

does XNOR.ai does image/video recognition too?
i also think offline should be the priority of this project.

patrickjquinn · 2017-02-12T19:02:13Z

Yes indeed they do, but I dont believe they have released anything yet.

While I also believe it should be a priority, its a gigantic task and way beyond the capabilities of any one or two people (Especially if one of those people is me). Basically its not something i can handle alone. Hence this contribution request.

staberas · 2017-02-16T11:58:13Z

looking around i found this https://github.com/zzmp/juliusjs which is a 'fork' from this https://github.com/julius-speech/julius speech recognition for ubuntu.

Marak · 2017-03-02T00:37:20Z

Any updates on this? Getting local speech recognition to work right can be hard.

Will we have default support for MacOS?

Looking forward to project updates. This is awesome work being done here!

timstableford · 2017-03-02T08:30:51Z

I think we'll almost certainly be using pocketsphinx for speech recognition unless we can find something better. I attempted to get pocketsphinx and nodejs talking to each other last week but nobody maintains the nodejs bindings anymore. To answer your question though, it's almost certain it will be cross-platform compatible as long as all the dependencies support it too.

i-am-malaquias · 2018-05-28T21:28:45Z

Maybe there's some hope from Mozilla's project "Deep Speech" engine? they're claiming 6.5% error rate at this point. https://github.com/mozilla/DeepSpeech

patrickjquinn · 2018-05-28T21:37:48Z

Iiiinnnteresting....anyone want to attempt to write a nose wrapper for this?

If we can make this work then it makes the project (and some of the modular forks I’ve been working on behind closed doors) more viable vs other open source VAs and we can start more actively maintaining it.

Long terms I’d love to see this or a variant of this as a proper open source Alexa competitor with an open skills ecosystem and companion apps.

timstableford · 2018-05-31T08:31:36Z

It sounds like a great idea to me. I think DeepSpeech only processes chunks of audio though? We'd also need to extract those chunks from a stream which is quite a big chunk of work to do right

patrickjquinn · 2018-05-31T08:35:58Z

From my research there is a branch for doing real time analysis called “streaming-interface” that should be able to accept a raw stream from the mic. It just requires a rebuild using the build instructions. The other option is to capture the mic stream after a snowboy keyword is detected and save that stream to a file. Then run DeepSpeech over it and extract the text. Less elegant but should work.

…

On 31 May 2018, 09:31 +0100, Tim Stableford ***@***.***>, wrote: It sounds like a great idea to me. I think DeepSpeech only processes chunks of audio though? We'd also need to extract those chunks from a stream which is quite a big chunk of work to do right — You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub, or mute the thread.

timstableford · 2018-05-31T11:03:14Z

I'm mainly thinking about detecting when a command ends. With the first one, does it work like Google's where you tell it to start and then it automatically ends on silence? With the second I think I read that the NodeJS bindings for deep speech and accept an audio buffer so we could at least cut out the filesystem

patrickjquinn · 2018-05-31T11:08:05Z

We would have to build a timeout wrapper ourselves (similar to how the RasPi client does it) that automatically closes the mic stream a few seconds after it hears the keyword. The alternative is to start a timer once the decibel level on the mic drops below a certain threshold (i.e silence). Which do you think? Any other options that might work here?

…

On 31 May 2018, 12:03 +0100, Tim Stableford ***@***.***>, wrote: I'm mainly thinking about detecting when a command ends. With the first one, does it work like Google's where you tell it to start and then it automatically ends on silence? With the second I think I read that the NodeJS bindings for deep speech and accept an audio buffer so we could at least cut out the filesystem — You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub, or mute the thread.

timstableford · 2018-05-31T12:20:05Z

Of those two options I prefer the second, otherwise there'd be a large delay after small commands. I'd really like to do it like in this StackOverflow answer. The problem is that's a lot of work and it may need some input normalisation or the silence threshold to be dynamically set off of maybe an average calculation?

patrickjquinn · 2018-05-31T14:17:25Z

I like it. I'll dig around to see what pre-packaged options are available for Node to do this, possibly Ffmpeg. Otherwise, If we can get a transcription stream piped out of DeepSpeech we can basically start a short timer immediately which is reset when new text is transcribed. So 'hotword'-> 'start timer -> 'user speech'-> 'reset timer' -> 'no user speech' -> 'timer stops mic and re-inits hotword detection' Thats how I've done it on iOS and it works really well.

…

On Thu, May 31, 2018 at 1:20 PM Tim Stableford ***@***.***> wrote: Of those two options I prefer the second, otherwise there'd be a large delay after small commands. I'd really like to do it like in this StackOverflow answer <https://dsp.stackexchange.com/a/17629>. The problem is that's a lot of work and it may need some input normalisation or the silence threshold to be dynamically set off of maybe an average calculation? — You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub <#11 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABvC6pHksaFYF2yUCW9Zrb76csuEya21ks5t39_1gaJpZM4Lwjf0> .

i-am-malaquias · 2018-05-31T17:52:35Z

Best approach would probably be having a timer after detecting relative silence, but also including a key shortcut or a tap detection or any such mechanism to have the user manually declare they're done, could speed up things a bit.

patrickjquinn closed this as completed Feb 7, 2017

patrickjquinn reopened this Feb 7, 2017

patrickjquinn added the help wanted label Feb 7, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Contribution Request: Offline speech recognition #11

Contribution Request: Offline speech recognition #11

patrickjquinn commented Jan 28, 2017

h4ckd0tm3 commented Jan 28, 2017

patrickjquinn commented Jan 28, 2017

h4ckd0tm3 commented Jan 28, 2017

patrickjquinn commented Jan 28, 2017

Marak commented Jan 28, 2017

h4ckd0tm3 commented Jan 28, 2017 •

edited

Loading

patrickjquinn commented Jan 28, 2017

patrickjquinn commented Jan 28, 2017

patrickjquinn commented Jan 29, 2017

patrickjquinn commented Jan 31, 2017

h4ckd0tm3 commented Feb 1, 2017

patrickjquinn commented Feb 1, 2017

h4ckd0tm3 commented Feb 1, 2017 •

edited

Loading

patrickjquinn commented Feb 1, 2017

staberas commented Feb 12, 2017 •

edited

Loading

patrickjquinn commented Feb 12, 2017

staberas commented Feb 16, 2017 •

edited

Loading

Marak commented Mar 2, 2017

timstableford commented Mar 2, 2017

i-am-malaquias commented May 28, 2018

patrickjquinn commented May 28, 2018

timstableford commented May 31, 2018

patrickjquinn commented May 31, 2018 via email

timstableford commented May 31, 2018

patrickjquinn commented May 31, 2018 via email

timstableford commented May 31, 2018

patrickjquinn commented May 31, 2018 via email

i-am-malaquias commented May 31, 2018

Contribution Request: Offline speech recognition #11

Contribution Request: Offline speech recognition #11

Comments

patrickjquinn commented Jan 28, 2017

h4ckd0tm3 commented Jan 28, 2017

patrickjquinn commented Jan 28, 2017

h4ckd0tm3 commented Jan 28, 2017

patrickjquinn commented Jan 28, 2017

Marak commented Jan 28, 2017

h4ckd0tm3 commented Jan 28, 2017 • edited Loading

patrickjquinn commented Jan 28, 2017

patrickjquinn commented Jan 28, 2017

patrickjquinn commented Jan 29, 2017

patrickjquinn commented Jan 31, 2017

h4ckd0tm3 commented Feb 1, 2017

patrickjquinn commented Feb 1, 2017

h4ckd0tm3 commented Feb 1, 2017 • edited Loading

patrickjquinn commented Feb 1, 2017

staberas commented Feb 12, 2017 • edited Loading

patrickjquinn commented Feb 12, 2017

staberas commented Feb 16, 2017 • edited Loading

Marak commented Mar 2, 2017

timstableford commented Mar 2, 2017

i-am-malaquias commented May 28, 2018

patrickjquinn commented May 28, 2018

timstableford commented May 31, 2018

patrickjquinn commented May 31, 2018 via email

timstableford commented May 31, 2018

patrickjquinn commented May 31, 2018 via email

timstableford commented May 31, 2018

patrickjquinn commented May 31, 2018 via email

i-am-malaquias commented May 31, 2018

h4ckd0tm3 commented Jan 28, 2017 •

edited

Loading

h4ckd0tm3 commented Feb 1, 2017 •

edited

Loading

staberas commented Feb 12, 2017 •

edited

Loading

staberas commented Feb 16, 2017 •

edited

Loading