Buster is my work-in-progress, interactive voice controlled robot arm. Buster accepts basic commands in spoken English. Buster will also answer basic questions about his status.
After several months of tinkering, I was pleased to recently hit the major milestone of having the foundational elements for speech recognition, speech synthesis, and arm control integrated and playing nicely together. I now have a solid platform for exploring some of my natural language processing ideas.
Even at this nascent stage, Buster can be fairly clever. He recognizes that “raise the arm,” “lift arm,” and “move the arm up” are all roughly equivalent. Similarly, he recognizes that “what is the height of the arm” and “how high is the arm” are asking for the same information. Buster thinks in millimeters, but if you give a command in centimeters he will convert the value. If you give a command that would move the arm beyond its range, Buster will remind you what the limit is and won't execute the command.
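To make the unit conversion and range check concrete, here is a minimal sketch. It is not Buster's actual code: the function names, the unit spellings, and especially the travel limits (0 to 120 mm) are assumptions made up for illustration.

```cpp
#include <string>

// Hypothetical travel limits in millimeters; the real arm's range is not stated here.
constexpr double kMinHeightMm = 0.0;
constexpr double kMaxHeightMm = 120.0;

// Convert a spoken distance to millimeters (Buster's internal unit).
// Returns false if the unit word is not recognized.
bool toMillimeters(double value, const std::string& unit, double& outMm) {
    if (unit == "MILLIMETER" || unit == "MILLIMETERS") { outMm = value; return true; }
    if (unit == "CENTIMETER" || unit == "CENTIMETERS") { outMm = value * 10.0; return true; }
    return false;
}

struct MoveResult {
    bool accepted;      // false means the command is refused and nothing moves
    double targetMm;    // new height if accepted, otherwise the unchanged height
    std::string reply;  // what the robot would say
};

// Decide whether a relative move stays inside the allowed range.
MoveResult planMove(double currentMm, double deltaMm) {
    const double target = currentMm + deltaMm;
    if (target > kMaxHeightMm) {
        return {false, currentMm,
                "The maximum height is " +
                    std::to_string(static_cast<int>(kMaxHeightMm)) + " millimeters"};
    }
    if (target < kMinHeightMm) {
        return {false, currentMm,
                "The minimum height is " +
                    std::to_string(static_cast<int>(kMinHeightMm)) + " millimeters"};
    }
    return {true, target, "OK"};
}
```

With these assumed limits, asking for a 5 centimeter raise from 100 mm converts to 50 mm, lands at 150 mm, and is refused with a reminder of the maximum.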
I have a lot of ambition for Buster. In the near term I'll be focused on making him smarter, not just in terms of more commands and system queries, but also adding general conversational and question answering functionality. I'd also like to integrate some basic vision functions and perhaps other sensors and even other appendages. To that end, modularity has been a primary focus in Buster's design. I'm going on the assumption that over time virtually every element of hardware and software will be upgraded, replaced, enhanced or added to.
For now, I wanted to share my progress here and show this short video of Buster 0.1 in action:
Some facts/details about the current Buster build:
Buster's brain is a Raspberry Pi 2 running the standard Raspbian operating system.
I'm programming in C++ and compiling with GCC.
Speech recognition is handled using the open source PocketSphinx library. (I have a walkthrough on installing PocketSphinx here: http://www.robotrebels.org/index.php?topic=220.0 and a boilerplate code example here: http://www.robotrebels.org/index.php?topic=239.0)
For speech synthesis I'm using the open source Flite library.
PocketSphinx and Flite are Carnegie Mellon University Language Technologies Institute projects. They are both offered as lightweight implementations of more comprehensive tools (Sphinx and Festival respectively), which made them appropriate choices for Buster. Running real-time speech recognition and speech synthesis pushes the Pi close to its limits, so lightweight is the order of the day.
Buster's secret sauce is the command and query parser, a set of C++ routines that rely heavily on regular expression pattern matching. PocketSphinx outputs a string of words with no notion of their meaning. The parser examines this output, using both keywords and word order to find known structures, and then decomposes the string of words into a structured command or query. PocketSphinx also returns spoken numbers as text (e.g. “TEN”), which the parser converts to a numeric value.
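The number-word conversion can be sketched roughly as follows. This is an illustration only, not the parser's real routine: the function name wordsToNumber is hypothetical, and I'm assuming upper-case, space-separated words like the “TEN” example above.

```cpp
#include <map>
#include <sstream>
#include <string>

// Convert number words such as "TWENTY FIVE" or "ONE HUNDRED TEN" to an integer.
// Returns false if the text is empty or contains a word that is not a number word.
// Note: this sketch does not check that the words form a grammatical number,
// so an odd sequence like "FIVE FIVE" is simply summed.
bool wordsToNumber(const std::string& text, int& out) {
    static const std::map<std::string, int> values = {
        {"ZERO", 0}, {"ONE", 1}, {"TWO", 2}, {"THREE", 3}, {"FOUR", 4},
        {"FIVE", 5}, {"SIX", 6}, {"SEVEN", 7}, {"EIGHT", 8}, {"NINE", 9},
        {"TEN", 10}, {"ELEVEN", 11}, {"TWELVE", 12}, {"THIRTEEN", 13},
        {"FOURTEEN", 14}, {"FIFTEEN", 15}, {"SIXTEEN", 16},
        {"SEVENTEEN", 17}, {"EIGHTEEN", 18}, {"NINETEEN", 19},
        {"TWENTY", 20}, {"THIRTY", 30}, {"FORTY", 40}, {"FIFTY", 50},
        {"SIXTY", 60}, {"SEVENTY", 70}, {"EIGHTY", 80}, {"NINETY", 90}};

    std::istringstream in(text);
    std::string word;
    int total = 0;
    bool sawWord = false;
    while (in >> word) {
        if (word == "HUNDRED") {
            // A bare "HUNDRED" counts as one hundred.
            total = (total == 0 ? 1 : total) * 100;
        } else {
            auto it = values.find(word);
            if (it == values.end()) return false;  // not a number word
            total += it->second;
        }
        sawWord = true;
    }
    if (!sawWord) return false;
    out = total;
    return true;
}
```

A lookup table plus a running total is enough for the small values an arm command needs; anything larger than the hundreds would need extra handling.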
The parser accommodates a lot of variation in terms of sentence structure and synonyms. At the same time it is fairly restrictive, in that it doesn't make too many educated guesses. If too much information is missing or not understood, Buster will simply say “I did not understand what you said” and take no action.
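Here is a small sketch of that balance between flexibility and strictness, using std::regex. The patterns and the names Kind, classify, and replyFor are assumptions for illustration and cover only the example phrases mentioned earlier; the real parser is more involved.

```cpp
#include <regex>
#include <string>

enum class Kind { Unknown, MoveUp, QueryHeight };

// Map a recognized word string to a command or query type.
// regex_match must match the whole string, so extra or missing words
// fall through to Unknown instead of being guessed at.
Kind classify(const std::string& words) {
    static const std::regex moveUp(
        R"((?:(?:RAISE|LIFT)(?: THE)? ARM|MOVE(?: THE)? ARM UP))");
    static const std::regex queryHeight(
        R"((?:WHAT IS THE HEIGHT OF THE ARM|HOW HIGH IS THE ARM))");

    if (std::regex_match(words, moveUp)) return Kind::MoveUp;
    if (std::regex_match(words, queryHeight)) return Kind::QueryHeight;
    return Kind::Unknown;
}

// What the robot would say for each outcome (the replies other than the
// "did not understand" message are made up for this example).
std::string replyFor(Kind kind) {
    switch (kind) {
        case Kind::MoveUp:      return "Raising the arm";
        case Kind::QueryHeight: return "Reporting the arm height";
        default:                return "I did not understand what you said";
    }
}
```

Synonyms and optional words live inside the patterns, while anything that matches no pattern is rejected outright, which is the "no educated guesses" behavior described above.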
Buster's arm is a MeArm that I assembled from a kit. At around US$50, I think the MeArm is a pretty good value. Since it uses standard 9-gram hobby servos, it was familiar and easy to interface with the electronics. The MeArm has some obvious limitations and a well-known issue with strain on the servo that rotates the base. But I've successfully picked up small objects and moved them around, so for the moment I'm happy enough.
To drive the servos I'm using a bare ATmega328P IC programmed as an Arduino. The Arduino and the Pi are coupled using SPI.
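For a sense of what might travel over that SPI link, here is a sketch of a tiny servo command frame. The frame layout (start byte, servo index, angle, XOR checksum) is entirely hypothetical and is not the protocol Buster actually uses; it only shows one simple way to package a servo position so the receiving side can detect corrupted bytes.

```cpp
#include <array>
#include <cstdint>

// Hypothetical 4-byte frame: start marker, servo index, angle in degrees, XOR checksum.
constexpr std::uint8_t kStartByte = 0xA5;

using ServoFrame = std::array<std::uint8_t, 4>;

// Build a frame for one servo. Angles above 180 are clamped, assuming a
// typical hobby-servo range of 0 to 180 degrees.
ServoFrame encodeServoFrame(std::uint8_t servo, std::uint8_t angleDeg) {
    if (angleDeg > 180) angleDeg = 180;
    ServoFrame frame = {{kStartByte, servo, angleDeg, 0}};
    frame[3] = static_cast<std::uint8_t>(frame[0] ^ frame[1] ^ frame[2]);
    return frame;
}

// Check the start marker and checksum, as the receiving side might do.
bool frameIsValid(const ServoFrame& f) {
    return f[0] == kStartByte &&
           f[3] == static_cast<std::uint8_t>(f[0] ^ f[1] ^ f[2]);
}
```

On the Pi side, bytes like these would be handed to whatever SPI transfer routine is in use; that part depends on the hardware setup and is left out here.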
The microphone is the mic from a Logitech USB webcam. I've been using the webcam in other projects, and since Buster will hopefully have vision soon it made sense to use it now.
The speaker is a cheap amplified unit I picked up from DX for a couple of bucks. Overall, speech quality is fairly poor right now, but this can't all be blamed on the speaker. The actual synthesis out of Flite and the Pi's PCM audio output are both culpable as well.