I've started developing an application that allows the user to compose and send an email completely hands-free... by voice command only.
However, I'm having trouble finding a decent, free, open-source speech recognition (speech-to-text) engine / API to use.
Does anyone know of one? I tried PocketSphinx but had trouble compiling it on Windows using VS2008.
I'm also wondering what API the Windows Live Search app uses. Its speech recognition is already decent, and if it's included with Windows Mobile 6.1 or .NET Compact Framework 3.5, I would prefer to use that. But I'm having trouble determining whether this speech recognition is available to 3rd-party developers and, if so, how to interface with it.
Any help would be greatly appreciated!
OMG I hate timeouts lol
So I had this nice long post about how I thought it might be one of three things, and I whipped out my Omnia and disconnected the network and blah blah.
When I hit post, I got a not-logged-in timeout.
So here's the short of it:
It uses a server, probably related to UC (aka Office Communications Server, aka Speech Server 2007). You can get to it (and all the Microsoft Speech technologies, including Voice Command) here:
http://www.microsoft.com/speech/speech2007/default.mspx
A little more searching led me to the MSDN Channel 9 blog on said subject:
http://blogs.msdn.com/speech/archiv...h-for-mobile-now-with-speech-recognition.aspx
which states:
"The speech recognition functionality for the application doesn't actually sit on the Windows Mobile phone. Instead, the phone takes your speech input, sends it to a server, the server does it's recognition magic, and sends the results back to the phone. "
Speech Server 2007
Thanks for the reply MerlinJim... sucks about the timeout! That's why on a long post I always copy the text to the clipboard... that way if it times out I can just paste it in! (It's happened to me too many times for me to not do that now!)
Yeah I've looked at Speech Server 2007 as well... and I was thinking that maybe Live Search offloaded the speech recognition to a server. There's a little lag between what you say and when it guesses what you said.
I guess something like that would work. If you're writing an email then you need an Internet connection, and so sending the voice data to a speech server would be plausible. The only downside would be if it used up a lot of data transfer/bandwidth, and the user was on metered bandwidth.
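To put a rough number on the bandwidth worry, here's a back-of-the-envelope sketch. I'm assuming uncompressed 16 kHz, 16-bit mono audio and one minute of dictation. I don't know what format or compression Live Search actually uses, so treat these as illustrative figures only; the class and method names are made up for the example.

```java
// Rough estimate of how much data raw (uncompressed PCM) speech audio takes.
// All inputs are assumptions for illustration, not measurements of Live Search.
final class VoiceBandwidthEstimate {

    /** Bytes per second of uncompressed PCM audio. */
    static long pcmBytesPerSecond(int sampleRateHz, int bitsPerSample, int channels) {
        return (long) sampleRateHz * (bitsPerSample / 8) * channels;
    }

    /** Total bytes for a clip of the given length in seconds. */
    static long pcmBytes(int sampleRateHz, int bitsPerSample, int channels, int seconds) {
        return pcmBytesPerSecond(sampleRateHz, bitsPerSample, channels) * seconds;
    }

    public static void main(String[] args) {
        int sampleRateHz = 16_000; // assumed sample rate
        int bitsPerSample = 16;    // assumed sample size
        int channels = 1;          // mono
        int seconds = 60;          // assumed length of one dictated email

        long perSecond = pcmBytesPerSecond(sampleRateHz, bitsPerSample, channels);
        long total = pcmBytes(sampleRateHz, bitsPerSample, channels, seconds);

        System.out.println("Bytes per second: " + perSecond);          // 32000
        System.out.println("Bytes per minute of dictation: " + total); // 1920000
    }
}
```

So under those assumptions, a minute of talking is a bit under 2 MB before any compression. A real client would likely compress the audio, which would bring that down, but I can't say by how much.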
The lag would be a bit of a drawback, because if the Speech Server guessed incorrectly what you said, but you kept talking (due to the processing lag), then you would have to go back and correct what you had said.
And also sometimes the Live Maps speech recognition is WAY off. Like I'll say "1 Jefferson Parkway" and it will come back with something like "Did you say 'Parkstone Apartments?'"
It's also speaker-independent, so you don't do any training. I would rather train an app to recognize my voice specifically, because I would be the only user of it.
But it may be my only solution for right now. Thanks for the info! I was beginning to think that no one knew the answer.
Perhaps, but there IS a speech application loaded ON Windows Mobile 6.1 itself that has text-to-speech and speech recognition capabilities
(my Blackjack II running WM 6.1 has it).
I can't find any API to use it, though. The only ways to trigger this TTS capability are:
1) SMS announcing
2) appointment announcing
3) call announcing
There's no actual program for doing TTS on demand.
Any progress on this or any other speech-to-text program? I'm really interested in finding one.
Wouldn't mind being a beta tester, either.
Let's say I have two or more voices installed for the same language-country combination, like "en-US". One could be a female voice and the other a male voice, for example. How could I switch between them programmatically? There seems to be no way...
I can easily switch at any time between "en-GB" and "en-US" voices by using something like:
myTTS.setLanguage(locale);
but for switching voices within the same locale, nothing. It looks to me like a flaw in the design of the Android TextToSpeech class... Or is there some way I'm not aware of? FYI, the high-quality IVONA voices (still in free beta in the Google Play Store) have a "Select preferred voice" setting for each language locale. The other TTS engines I have installed (Pico, Google and SVOX Classic) do not offer this option yet, so maybe the only way to do this would be to access the private settings of the IVONA engine?
Greg
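One possible angle, sketched under assumptions: as far as I know, Android API level 21 and later added `TextToSpeech.getVoices()` and `TextToSpeech.setVoice(Voice)`, and each `Voice` reports a name and a locale. That doesn't help on older releases, and whether an engine exposes more than one voice per locale is up to the engine. The selection logic itself is simple, so here it is in plain Java. `VoicePicker` and `VoiceInfo` are hypothetical stand-ins invented for this example (not Android classes), so it runs without the SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java sketch of "cycle to the next voice within the same locale".
// VoiceInfo is a hypothetical stand-in for whatever the TTS engine reports.
final class VoicePicker {

    /** Hypothetical description of an installed voice: a name plus its locale. */
    static final class VoiceInfo {
        final String name;
        final Locale locale;

        VoiceInfo(String name, Locale locale) {
            this.name = name;
            this.locale = locale;
        }
    }

    /** Returns every installed voice whose locale equals the requested one. */
    static List<VoiceInfo> voicesFor(List<VoiceInfo> installed, Locale locale) {
        List<VoiceInfo> matches = new ArrayList<>();
        for (VoiceInfo voice : installed) {
            if (voice.locale.equals(locale)) {
                matches.add(voice);
            }
        }
        return matches;
    }

    /**
     * Picks the voice that follows currentName among the voices for the locale,
     * wrapping around at the end. Returns the first match if currentName is
     * unknown, or null if the locale has no voices at all.
     */
    static VoiceInfo nextVoice(List<VoiceInfo> installed, Locale locale, String currentName) {
        List<VoiceInfo> candidates = voicesFor(installed, locale);
        if (candidates.isEmpty()) {
            return null;
        }
        int currentIndex = -1;
        for (int i = 0; i < candidates.size(); i++) {
            if (candidates.get(i).name.equals(currentName)) {
                currentIndex = i;
            }
        }
        return candidates.get((currentIndex + 1) % candidates.size());
    }

    public static void main(String[] args) {
        List<VoiceInfo> installed = new ArrayList<>();
        installed.add(new VoiceInfo("us-female", Locale.US));
        installed.add(new VoiceInfo("uk-male", Locale.UK));
        installed.add(new VoiceInfo("us-male", Locale.US));

        VoiceInfo next = nextVoice(installed, Locale.US, "us-female");
        System.out.println(next.name); // prints "us-male"
    }
}
```

On a device, the idea would be to build the list from `myTTS.getVoices()` and hand the chosen voice to `myTTS.setVoice(...)`. That wiring isn't shown here and I haven't tested it against IVONA or the other engines.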
Does anyone know if Google has open-sourced its offline voice recognition code? If so, where can I find it?
As I understand it, SREC from Nuance is not that; it is an old online recognition system, right?
Hi, I am a dyslexic Android user, and I work in disability services for a small university. I am wondering if there is a practical TTS solution that would be simpler than the common "copy/paste" apps.
iOS has VoiceOver, which (for dyslexics) puts Android TTS support to shame.
My dream is to be able to point to a chunk of text and have it read aloud without switching apps. Is there any workaround or plugin I could use to make this possible?
I have a Verizon HTC Droid DNA. I would be willing to root this if there is a possibility of this working.
I have Tasker run functions, and it sometimes talks to me. Now, coming from a 2.3 device to a 4.3 one, I see the voice is different, and it's not terribly nice. Are there any alternative voices to try? I know that in the past other makers (Samsung) have had different voices.
I'll answer my own question in case anyone else finds this useful.
If you're unhappy with text-to-speech on your Android device, there are several options to try, both paid and free.
For me (after extensive research & testing), this solution works best:
SVOX Classic Text To Speech Engine
https://play.google.com/store/apps/details?id=com.svox.classic
The voice I found clearest and easiest on the ears was this:
SVOX US English Grace Voice
https://play.google.com/store/apps/details?id=com.svox.classic.langpack.eng_usa_fem
Naturally, your mileage will vary, but this one worked for me. It doesn't seem to work in Google Now speech, but it does in Maps (navigation), Tasker & FBReader TTS.