For the last couple of years, I've been waffling on about how embedded vision and embedded speech capabilities are racing towards us.
One example I often use is that of an electric toaster. Imagine purchasing a new toaster, taking it out of its box, and powering it up. Now imagine it asking for your name. Later, when you drop your bread into the toaster, it could ask how you like your toast. In my case, I might respond "I prefer it a tad on the dark side, if you don't mind." When the toast pops up, the toaster might inquire how well it did, and I might say "Not bad, but perhaps a shade darker next time."
Using embedded vision, the toaster will recognize me the next time I use it. Furthermore, it will recognize if I'm using the same type of bread as last time, in which case it will simply get on with the job, and not engage me in idle chit-chat.
Over time, the toaster will learn my toasting preferences for white bread, wheat bread, bagels, muffins, croissants, and so forth. Similarly, it will learn the individual preferences of other family members and guests. It's not hard to envisage how this type of capability could make all of our lives a lot easier.
We are still in the early days with regard to the widespread use of this technology, but it's coming -- oh yes, it's coming.
In the case of speech recognition, one option is to perform all of the processing wirelessly, online, in the cloud. This is the way the Amazon Echo works, for example (see also Have you heard of the Amazon Echo?). Once the cloud servers have processed what you are saying, the Echo typically responds by telling you the answer to the question you asked or performing some task like adding an item to your shopping list or playing some music.
An alternative cloud-based scheme is to have your speech uploaded and evaluated in the cloud, which then returns an ASCII-based text equivalent of what you said. It's then your job -- or the job of your software application -- to parse this ASCII text and decide what to do with it.
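As a toy illustration of that second scheme, suppose the cloud service hands you back a lowercase ASCII string, and it's your job to pick the command out of it. The phrases and the function below are my own invented examples, not any particular service's API; a minimal sketch of the parsing step might look like this:

```cpp
#include <string.h>

// Given the ASCII text returned by a hypothetical cloud speech
// service, decide which command (if any) it contains.
// Returns a simple command code: 1 = light on, 2 = light off,
// 0 = no command we know how to handle.
int parseCommand(const char* text) {
    if (strstr(text, "light on") != NULL)  return 1;
    if (strstr(text, "light off") != NULL) return 2;
    return 0;  // didn't match anything we recognize
}
```

A real application would obviously need to cope with punctuation, word order, and near-misses, which is exactly why this approach puts more of the burden on your software than the first scheme does.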
Another approach is to perform the speech recognition locally, offline. Now, as you may recall, I have a lot of Arduino-based hobby projects on the go. These include my BADASS Display, my Vetinari Clock, my Inamorata Prognostication Engine, and my Caveman Diorama. I would love to be able to control all of these little scamps using voice commands.
There are a number of speech-recognition shields available for the Arduino. A friend of mine recently purchased one for around $50, but he said it was pretty unintuitive and not particularly easy to use. Also, he had to train it to his voice; he later decided to splash out an additional $200 on the package that made it speaker-independent.
All of which leads us to this Kickstarter project that just launched a few days ago. This is for a standalone (offline) Arduino speech recognizer/synthesizer shield called MOVI, which is the brainchild of Bertrand Irissou and Gerald Friedland. They currently have a fully working multi-board prototype, and they have launched the Kickstarter project in order to transmogrify this prototype into a one-board shield that they will make available to the Maker community. In fact, they recently won the Maker Faire Editor's Blue Ribbon Award at the Bay Area Maker Faire in May 2015.
This little beauty will be able to recognize hundreds of sentences of the user's choice, where these can range from simple and generic ("Light on") to complex and specific ("Turn on the ceiling light in the bedroom"). Furthermore, MOVI is speaker independent -- it will respond in the same way to the same sentences, irrespective of who is saying them.
The usage model for MOVI looks to be very simple and intuitive. You start off by defining the sentences you want it to recognize in your setup() function; for example:
recognizer.addSentence (1, "Turn table light on");
recognizer.addSentence (2, "Turn table light off");
recognizer.addSentence (3, "Turn ceiling light on");
recognizer.addSentence (4, "Turn ceiling light off");
:
Later, in the main body of your code, you might use something like:
speechResult = recognizer.poll();
if (speechResult == 1) {
// Your commands go here
}
From what I've seen on the Kickstarter videos, MOVI really does look rather interesting. Of course, I have all sorts of questions, so I'm hoping to set up a telephone call with MOVI's creators so I can learn a lot more, in which case you may well see a follow-up column on this little beauty. Meanwhile, what do you think about speech recognition in general and MOVI in particular?