The RobotRebel.org Community

Topic: Installing PocketSphinx on a Raspberry Pi (Read 9343 times)

Ralph

  • Member
  • *
  • Posts: 36
Installing PocketSphinx on a Raspberry Pi
« on: October 19, 2015, 03:25:21 PM »
I've started to experiment with PocketSphinx for my Raspberry Pi projects. PocketSphinx is a speech recognition library written in C that is part of the open source CMU Sphinx project.   PocketSphinx has my interest as a speech recognition solution for a number of reasons.  Not the least of these is the promise of being lightweight and efficient enough to do something useful on the Pi.  Also, it appears to be reasonably straightforward to integrate with C/C++ projects and can run standalone without an active Internet connection.

As always, Step 0 is to get the tools installed and running.  As such tasks go, the PocketSphinx install went fairly smoothly.  I'm posting my notes and some commentary in hopes it will be useful to others.  I claim no great expertise here.  Comments, corrections, questions, and elaborations are all most sincerely welcome.

Obviously I didn't pull this stuff out of thin air.  The following resources were essential to getting going and troubleshooting:

http://cmusphinx.sourceforge.net/wiki/tutorialPocketSphinx
https://wolfpaulus.com/journal/embedded/raspberrypi2-sr/
https://wiki.archlinux.org/index.php/Advanced_Linux_Sound_Architecture

The posting from Wolf Paulus' Journal at wolfpaulus.com was particularly thorough and extremely helpful. 

My install was on a Raspberry Pi 2 running Raspbian (Debian 7.8). For audio input I'm using the microphone of a Logitech C270 webcam. I run my Pi headless and access it from a terminal on my desktop PC, so almost everything here was done at the command line. I used nano to edit some of the config files mentioned, and FileZilla for moving files between machines.

I installed PocketSphinx version 5prealpha as downloaded from http://sourceforge.net/projects/cmusphinx/files/pocketsphinx/5prealpha/ . Installing from a file named pre-alpha is kind of scary.  But it was the most recent version in the repository by a few years, and the official CMU Sphinx wiki recommends using 5prealpha at the time of this writing. Future generations should check at the main CMU Sphinx site and Sourceforge for the most recent version.

The bulk of the process I followed was generic to Linux, so much of the information here should be applicable to other embedded Linux systems. 

***********************************************
Configure Audio Input and Output in ALSA
***********************************************

ALSA is the Advanced Linux Sound Architecture.  It provides sound card drivers and an API to interact with audio devices. 

Some ALSA tools referenced here include:

amixer - a shell command to change audio settings
alsamixer - a more elaborate audio device configuration tool with a text-based UI
arecord - command-line sound recorder
aplay -  command-line sound player

PocketSphinx relies on ALSA to access the microphone, so ALSA needs to be configured first.

The Pi doesn't have an analog audio input, so you will need a USB microphone or a USB sound card. A USB microphone appears as a sound card, which is convenient but could be confusing at first.

To get a list of available sound cards and verify your USB microphone is present, enter at the command line:

Code: [Select]
cat /proc/asound/cards
My result:
Code: [Select]
0 [ALSA           ]: bcm2835 - bcm2835 ALSA
                      bcm2835 ALSA
1 [U0x46d0x825    ]: USB-Audio - USB Device 0x46d:0x825
                      USB Device 0x46d:0x825 at usb-3f980000.usb-1.4, high speed
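
If you'd rather have a script find the USB microphone than read the list by eye, the listing can be parsed with awk. This is a minimal sketch that assumes the layout shown above, where each card's first line starts with its index and its bracketed ID; `usb_card_index` and `usb_card_id` are hypothetical helper names, not ALSA tools.

```shell
# Hypothetical helpers for a /proc/asound/cards-style listing.
# Each card's first line looks like: " 1 [U0x46d0x825    ]: USB-Audio - ..."

# usb_card_index [FILE]: print the index of the first USB-Audio card.
usb_card_index() {
    awk '/^ *[0-9]+ \[/ && /USB-Audio/ { print $1; exit }' "${1:-/proc/asound/cards}"
}

# usb_card_id [FILE]: print the bracketed ID of the first USB-Audio card,
# with the padding spaces removed.
usb_card_id() {
    awk '/^ *[0-9]+ \[/ && /USB-Audio/ {
        id = $0
        sub(/^.*\[/, "", id)    # drop everything up to and including "["
        sub(/ *].*$/, "", id)   # drop padding and everything from "]" on
        print id
        exit
    }' "${1:-/proc/asound/cards}"
}
```

With the listing above, `usb_card_index` prints 1 and `usb_card_id` prints U0x46d0x825. With no argument, both read the live /proc/asound/cards.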

Good news - the webcam shows up as a microphone with no effort at all. Although the mic was visible, it wasn't yet the first sound card, which is the one ALSA uses by default. That is controlled by a configuration file, alsa-base.conf, which needs to be edited.

You can use nano to edit the file:

Code: [Select]
sudo nano /etc/modprobe.d/alsa-base.conf
There is a line in alsa-base.conf that keeps the USB mic from loading as the first sound card (card 0):

Code: [Select]
options snd-usb-audio index=-2
I changed the index to 0:

Code: [Select]
options snd-usb-audio index=0
Once the module is reloaded (rebooting is the simplest way), the mic shows up first:
Code: [Select]
0 [U0x46d0x825    ]: USB-Audio - USB Device 0x46d:0x825
                      USB Device 0x46d:0x825 at usb-3f980000.usb-1.4, high speed
1 [ALSA           ]: bcm2835 - bcm2835 ALSA
                      bcm2835 ALSA
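
The same edit can be made without opening an editor, which is handy when setting up more than one Pi. This is a minimal sketch, assuming the file contains exactly the `options snd-usb-audio index=-2` line shown above. `set_usb_audio_first` is a hypothetical name. It keeps a copy of the original with a .bak suffix, and like the nano edit it needs root for the real file. Since sudo can't run a shell function, either run it from a root shell or run the sed line itself with sudo.

```shell
# set_usb_audio_first [FILE]: change "options snd-usb-audio index=-2" to
# index=0 so the USB mic becomes card 0. The original file is kept as
# FILE.bak (GNU sed's -i.bak). Needs root for /etc/modprobe.d.
set_usb_audio_first() {
    conf="${1:-/etc/modprobe.d/alsa-base.conf}"
    sed -i.bak 's/^options snd-usb-audio index=-2$/options snd-usb-audio index=0/' "$conf"
}
```

As with the manual edit, the new index only applies the next time the snd-usb-audio module loads.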

The ALSA tool amixer will let you see the properties of the cards and their active controls. Entering amixer with the following parameters shows the control named 'Mic' (index 0) on card 0, which is now the USB mic:

Code: [Select]
amixer -c 0 sget 'Mic',0
My result:

Code: [Select]
Simple mixer control 'Mic',0
  Capabilities: cvolume cvolume-joined cswitch cswitch-joined penum
  Capture channels: Mono
  Limits: Capture 0 - 16
  Mono: Capture 0 [0%] [6.00dB] [on]

The mic was there, but the capture level was set to 0%. I used AlsaMixer (enter alsamixer -c 0 at the command line to launch it) and increased the capture level to 25%.

[Attached screenshot: AlsaMixer.png]

Now amixer showed the following:
Code: [Select]
Simple mixer control 'Mic',0
  Capabilities: cvolume cvolume-joined cswitch cswitch-joined penum
  Capture channels: Mono
  Limits: Capture 0 - 16
  Mono: Capture 4 [25%] [12.00dB] [on]
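
The percentage amixer shows is the raw value's position within the control's limits, which is why a raw value of 4 on a 0-16 scale reads as 25%. Here is a small sketch of that arithmetic with hypothetical helper names. It rounds to the nearest integer, which may not match amixer's own rounding in every case.

```shell
# raw_to_percent RAW MIN MAX: position of RAW within MIN..MAX as a
# percentage, rounded to the nearest integer (integer arithmetic only).
raw_to_percent() {
    echo $(( (($1 - $2) * 100 + ($3 - $2) / 2) / ($3 - $2) ))
}

# percent_to_raw PCT MIN MAX: the raw step closest to PCT percent.
percent_to_raw() {
    echo $(( $2 + ($1 * ($3 - $2) + 50) / 100 ))
}
```

`percent_to_raw 25 0 16` prints 4, matching the "Capture 4 [25%]" line above. To skip the alsamixer UI, I believe `amixer -c 0 sset 'Mic',0 25%` sets the same level, but check the result with sget.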

I tested the mic using arecord at the command line to record 10 seconds (-d 10) of CD-quality audio (-f cd) from card 0, device 0 (-D plughw:0,0) into a file named test.wav:

Code: [Select]
arecord -D plughw:0,0 -d 10 -f cd ./test.wav
The file was created, and I was able to copy it to my main PC and play it. However, when I tried to play it on the Pi using aplay, I got a nasty little error:

Code: [Select]
ALSA lib pcm_dmix.c:1018:(snd_pcm_dmix_open) unable to open slave
aplay: main:682: audio open error: No such file or directory

(Technically PocketSphinx doesn't need playback, but now is the time to get this fixed.)
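
Before sorting out playback, a quick sanity check on test.wav is its size. `-f cd` means 16-bit (2-byte) little-endian samples at 44100 Hz in stereo, so 10 seconds is 44100 * 2 * 2 * 10 = 1,764,000 bytes of audio, plus a standard 44-byte WAV header. Below is a sketch of that calculation (`expected_wav_bytes` is a hypothetical helper). I'd treat a small difference from `ls -l test.wav` as normal.

```shell
# expected_wav_bytes SECONDS RATE CHANNELS BYTES_PER_SAMPLE
# Size of a plain PCM WAV file: audio data plus the standard 44-byte header.
expected_wav_bytes() {
    echo $(( $1 * $2 * $3 * $4 + 44 ))
}
```

The en-us acoustic model works on 16 kHz mono audio, so a recording closer to what PocketSphinx hears is `arecord -D plughw:0,0 -d 10 -f S16_LE -r 16000 -c 1 ./test16k.wav`. For that file, `expected_wav_bytes 10 16000 1 2` gives 320044.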

The problem here appears to be that the USB mic, now card 0, has become ALSA's default device, and it can record but has nothing to play back through. ALSA needs specific instructions to record from the USB mic and play through the audio/video jack on the Pi. These go in a configuration file named asound.conf in the Pi's /etc folder. If no software on your Pi has ever required an ALSA config file, then one may not exist. I used sudo nano to create one with the following settings. Note that the entry for card in the first block (pcm.usb) is particular to my USB webcam microphone. Everything else is boilerplate to use a USB mic and the audio output jack:

Code: [Select]
pcm.usb
{
    type hw
    card 0x46d:0x825
}

pcm.internal
{
    type hw
    card ALSA
}

pcm.!default
{
    type asym
    playback.pcm
    {
        type plug
        slave.pcm "internal"
    }
    capture.pcm
    {
        type plug
        slave.pcm "usb"
    }
}

ctl.!default
{
    type asym
    playback.pcm
    {
        type plug
        slave.pcm "internal"
    }
    capture.pcm
    {
        type plug
        slave.pcm "usb"
    }
}

Now I was able to both record from my USB mic and play back to analog headphones.
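
If your microphone isn't the C270, the pcm.usb block should be the only part of the config above that needs to change. Below is a sketch that prints that block for a given card (`usb_pcm_block` is a hypothetical helper). ALSA's card field accepts either a card index such as 0 or a card ID, which is the bracketed name from /proc/asound/cards, such as U0x46d0x825.

```shell
# usb_pcm_block CARD: print the device-specific pcm.usb block for
# /etc/asound.conf. CARD is an ALSA card index (e.g. 0) or card ID
# (the bracketed name in /proc/asound/cards, e.g. U0x46d0x825).
usb_pcm_block() {
    printf 'pcm.usb\n{\n    type hw\n    card %s\n}\n' "$1"
}
```

For example, `usb_pcm_block U0x46d0x825` prints a block to paste in place of the first block in asound.conf.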

So far so good so I went on to...

****************************************
Installing PocketSphinx
*****************************************

To use PocketSphinx, you need to install both PocketSphinx and the support library Sphinxbase.

Before the actual install, make sure your environment is up to date and install the required tools and dependencies. According to the CMUSphinx wiki you need: gcc, automake, autoconf, libtool, bison, swig (at least version 2.0), the Python development package, and the PulseAudio development package. I was pretty sure I was good to go with gcc etc., and pretty sure I wasn't going to want PulseAudio (libasound2-dev in the list below is the ALSA development package), so I entered the following at the command prompt, line by line, directly from the info at the Wolf Paulus site:

Code: [Select]
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install bison
sudo apt-get install libasound2-dev
sudo apt-get install swig
sudo apt-get install python-dev
sudo apt-get install mplayer
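
The CMUSphinx wiki asks for swig 2.0 or later, and apt-get doesn't tell you which version you got. Below is a sketch of a check using GNU sort's -V version ordering. `version_at_least` and `swig_version` are hypothetical helpers, and the parsing assumes `swig -version` prints a line of the form "SWIG Version X.Y.Z".

```shell
# version_at_least HAVE NEED: succeed (exit status 0) if HAVE >= NEED.
# Relies on GNU sort -V, which orders version numbers component by component.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# swig_version: print the installed swig version number.
swig_version() {
    swig -version | awk '/SWIG Version/ { print $3 }'
}
```

Usage: `version_at_least "$(swig_version)" 2.0 && echo "swig is new enough"`.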

What follows are the steps for building Sphinxbase and PocketSphinx, to be entered at the command line one line at a time. The steps download the files to the Pi, uncompress them, compile them and install them. A few notes:

  • The "5prealpha" portion of the file names for both Sphinxbase and PocketSphinx is version specific.
  • Both Sphinxbase and PocketSphinx should be the same version.
  • The "./configure --enable-fixed" option on the Sphinxbase install selects fixed-point arithmetic, which speeds up operation on smaller devices like the Pi

Sphinxbase

Code: [Select]
cd ~/
wget http://sourceforge.net/projects/cmusphinx/files/sphinxbase/5prealpha/sphinxbase-5prealpha.tar.gz
tar -zxvf ./sphinxbase-5prealpha.tar.gz
cd ./sphinxbase-5prealpha
./configure --enable-fixed
make clean all
sudo make install



PocketSphinx

Code: [Select]
cd ~/
wget http://sourceforge.net/projects/cmusphinx/files/pocketsphinx/5prealpha/pocketsphinx-5prealpha.tar.gz
tar -zxvf pocketsphinx-5prealpha.tar.gz
cd ./pocketsphinx-5prealpha
./configure
make clean all
sudo make install
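
Both packages must be the same version, and the two builds differ only in package name and configure options, so they can be wrapped in one function. This is a sketch, assuming the SourceForge URL pattern shown above holds for other versions. `sphinx_url`, `run` and `build_sphinx_pkg` are hypothetical names. Set DRY_RUN=1 to print the commands instead of running them.

```shell
VERSION=5prealpha   # use the same version for sphinxbase and pocketsphinx

# sphinx_url PKG VERSION: tarball URL, following the pattern used above.
sphinx_url() {
    echo "http://sourceforge.net/projects/cmusphinx/files/$1/$2/$1-$2.tar.gz"
}

# run CMD...: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# build_sphinx_pkg PKG VERSION [CONFIGURE_OPTIONS...]
# Download, unpack, configure, build and install one package under ~/,
# stopping at the first failing step.
build_sphinx_pkg() {
    pkg=$1; ver=$2; shift 2
    run cd "$HOME" || return 1
    run wget "$(sphinx_url "$pkg" "$ver")" || return 1
    run tar -zxvf "./$pkg-$ver.tar.gz" || return 1
    run cd "./$pkg-$ver" || return 1
    run ./configure "$@" || return 1
    run make clean all || return 1
    run sudo make install
}
```

Usage, in the same order as above: `build_sphinx_pkg sphinxbase "$VERSION" --enable-fixed`, then `build_sphinx_pkg pocketsphinx "$VERSION"`.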

Finally, set the following environment variables so that the dynamic linker and pkg-config can find the new libraries in /usr/local/lib. If you skip this step you might later get an error like: "error while loading shared libraries: libpocketsphinx.so.3":

Code: [Select]
cd ~/
export LD_LIBRARY_PATH=/usr/local/lib
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
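
These exports only last for the current shell session, so the error can come back after you log out. One option is to append them to ~/.bashrc. The sketch below (hypothetical helper `add_line_once`) adds a line only if it isn't already there, so running it twice is harmless. On Debian-based systems /usr/local/lib is normally already in the linker's search configuration, so running `sudo ldconfig` after `make install` may also be enough to fix the shared-library error.

```shell
# add_line_once FILE LINE: append LINE to FILE unless FILE already has an
# identical line (FILE is created if it doesn't exist).
add_line_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}
```

Usage: `add_line_once ~/.bashrc 'export LD_LIBRARY_PATH=/usr/local/lib'`, and likewise for the PKG_CONFIG_PATH line.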

Creating a Language Model

On the install, the acoustic model files were conveniently placed in the /usr/local/share/pocketsphinx/model/en-us/en-us directory. There is an additional not-so-optional step required to run PocketSphinx effectively on the Pi: creating a custom language model containing the words and phrases you wish to detect in your application. (Technically PocketSphinx will run on a Pi without the custom language model, but far too slowly for continuous detection from a microphone.) The custom language model requires two additional files: (1) a *.dic pronunciation dictionary and (2) a *.lm trigram language model. Carnegie Mellon has an online tool where you can upload a text file of words and sentences and generate these files: http://www.speech.cs.cmu.edu/tools/lmtool-new.html

For initial testing I created a text file with the following phrases, saved it and uploaded it to the online tool:
Code: [Select]
Left arm up
Left arm down
Right arm up
Right arm down
Rotate left
Rotate right
Walk forward
Walk backward
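
The resulting dictionary will only cover the words in this file, so it can help to list them before uploading. Below is a sketch (hypothetical helper `corpus_words`) that prints each distinct word once. It uppercases the words to match the uppercase words PocketSphinx reports, as in the ROTATE LEFT result further down.

```shell
# corpus_words FILE: print the distinct words of a one-phrase-per-line
# corpus, uppercased and sorted.
corpus_words() {
    tr -s '[:space:]' '\n' < "$1" |     # one word per line
        tr '[:lower:]' '[:upper:]' |    # uppercase
        grep -v '^$' |                  # drop empty lines
        LC_ALL=C sort -u                # sort, removing duplicates
}
```

For the eight phrases above it prints nine words: ARM, BACKWARD, DOWN, FORWARD, LEFT, RIGHT, ROTATE, UP and WALK.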

The tool produced a handful of custom files, including 2193.lm and 2193.dic which I downloaded and copied to the Pi.  (2193 is just a unique number assigned by the tool to my batch of files.)  I didn't place the files in the local share directory with the other model, but just went ahead and placed them in a directory I'm using for my various speech related work.
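
A quick check after downloading is that every corpus word has an entry in the .dic file. Each .dic line starts with the word followed by its phones, and alternate pronunciations are written as WORD(2), WORD(3) and so on. Below is a sketch with a hypothetical helper name. In the usage line, corpus.txt stands for whatever you named the uploaded text file.

```shell
# missing_from_dict CORPUS DIC: print (uppercased) corpus words that have
# no entry in the .dic file, each once, in corpus order.
missing_from_dict() {
    awk 'FILENAME == ARGV[1] {           # first file: the dictionary
             w = toupper($1)
             sub(/\([0-9]+\)$/, "", w)   # WORD(2) counts as WORD
             if (w != "") known[w] = 1
             next
         }
         {                               # second file: the corpus
             for (i = 1; i <= NF; i++) {
                 w = toupper($i)
                 if (!(w in known) && !(w in shown)) { print w; shown[w] = 1 }
             }
         }' "$2" "$1"
}
```

`missing_from_dict corpus.txt 2193.dic` prints nothing if every word is covered.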

**************************************
A Basic Test
**************************************

The PocketSphinx install includes a sample application named pocketsphinx_continuous, a command line utility that captures sound from a mic or file and converts it to text.  Here it is useful for validating the PocketSphinx install.

Some needed parameters:
-hmm: path to directory containing acoustic model files
-lm: trigram language model input file
-dict: pronunciation dictionary (lexicon) input file
-adcdev: platform-specific name for the audio input (the microphone) - for the Pi we set it to sysdefault
-inmic: to use the mic input set to yes

Based on those parameters I built the following command to enter at the command prompt. As written it uses relative paths for the custom *.lm and *.dic files, so run it from the directory where they are stored:

Code: [Select]
pocketsphinx_continuous -hmm /usr/local/share/pocketsphinx/model/en-us/en-us -lm 2193.lm -dict 2193.dic -adcdev sysdefault -inmic yes
By default the pocketsphinx_continuous utility spits out a wealth of information as it runs. As I read the custom phrases above aloud, I can see that it is picking them out, and picking them out quite well!
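
To start the test from any directory and keep the long command in one place, it can be wrapped in a shell function. In this sketch, `ps_listen` and the ~/speech directory in the usage line are hypothetical. The options are the ones listed above. PS_CMD is only there so the command can be printed (PS_CMD=echo) instead of run.

```shell
# Acoustic model directory from the install above (override if yours differs).
HMM=${HMM:-/usr/local/share/pocketsphinx/model/en-us/en-us}

# ps_listen MODEL_BASE [EXTRA_OPTIONS...]: listen on the mic using
# MODEL_BASE.lm and MODEL_BASE.dic; extra options are passed through.
ps_listen() {
    base=$1; shift
    for f in "$base.lm" "$base.dic"; do
        [ -r "$f" ] || { echo "ps_listen: cannot read $f" >&2; return 1; }
    done
    ${PS_CMD:-pocketsphinx_continuous} -hmm "$HMM" -lm "$base.lm" \
        -dict "$base.dic" -adcdev sysdefault -inmic yes "$@"
}
```

For example, `ps_listen ~/speech/2193`. PocketSphinx's `-logfn /dev/null` option, passed as an extra argument, should send the INFO lines to /dev/null and leave mainly the recognized phrases on screen.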

Below is an excerpt from my testing showing the output when I said "Rotate Left":
Code: [Select]
READY....
Input overrun, read calls are too rare (non-fatal)
Listening...
Input overrun, read calls are too rare (non-fatal)
INFO: cmn_prior.c(131): cmn_prior_update: from < 26.97 10.93  2.76  1.63 -1.68  4.28  4.29 -2.27 -0.11  3.25  3.70 -1.72  4.75 >
INFO: cmn_prior.c(149): cmn_prior_update: to   < 29.20 12.47  1.66  2.47 -0.41  5.29  3.09 -4.41 -1.92  1.77  4.31 -2.93  3.89 >
INFO: ngram_search_fwdtree.c(1553):      424 words recognized (3/fr)
INFO: ngram_search_fwdtree.c(1555):    12365 senones evaluated (87/fr)
INFO: ngram_search_fwdtree.c(1559):     5991 channels searched (42/fr), 987 1st, 3988 last
INFO: ngram_search_fwdtree.c(1562):      536 words for which last channels evaluated (3/fr)
INFO: ngram_search_fwdtree.c(1564):      148 candidate words for entering last phone (1/fr)
INFO: ngram_search_fwdtree.c(1567): fwdtree 0.96 CPU 0.676 xRT
INFO: ngram_search_fwdtree.c(1570): fwdtree 3.37 wall 2.374 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 7 words
INFO: ngram_search_fwdflat.c(948):      348 words recognized (2/fr)
INFO: ngram_search_fwdflat.c(950):     9293 senones evaluated (65/fr)
INFO: ngram_search_fwdflat.c(952):     5151 channels searched (36/fr)
INFO: ngram_search_fwdflat.c(954):      738 words searched (5/fr)
INFO: ngram_search_fwdflat.c(957):      277 word transitions (1/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.44 CPU 0.310 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.43 wall 0.305 xRT
INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.101
INFO: ngram_search.c(1279): Eliminated 1 nodes before end node
INFO: ngram_search.c(1384): Lattice has 100 nodes, 28 links
INFO: ps_lattice.c(1380): Bestpath score: -1759
INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:101:140) = -121468
INFO: ps_lattice.c(1441): Joint P(O,S) = -133720 P(S|O) = -12252
INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(878): bestpath 0.00 wall 0.001 xRT
ROTATE LEFT

The only disappointing aspect of the output is the "Input overrun, read calls are too rare (non-fatal)" warning. I dug into this issue as deeply as I could, and it appears that audio is being captured through the microphone slightly faster than the Pi is processing it through the recognition routines. I don't get the warnings if I use a *.wav file as input. The impact of the overruns appears minor, as pocketsphinx_continuous is running well enough. It does occasionally miss a word, which is likely attributable to the overruns. Regardless, pocketsphinx_continuous is just the sample application. PocketSphinx itself is endlessly tunable, and optimizing speed versus accuracy for a particular use is a major part of the undertaking.
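
The CPU and xRT figures in the log put a number on "slightly faster than the Pi is processing it". xRT (times real time) is processing time divided by the length of the audio, so a value under 1 means a pass kept up with the speech. This sketch uses two hypothetical helpers. `cpu_xrt_total` adds up the CPU xRT of each search pass in a log excerpt covering one utterance, ignoring the wall-clock lines. `audio_seconds` recovers the utterance length from a CPU/xRT pair.

```shell
# cpu_xrt_total LOGFILE: sum the xRT values on "... N CPU X xRT" lines
# (fwdtree, fwdflat, bestpath) of a single-utterance log excerpt.
cpu_xrt_total() {
    awk 'NF >= 3 && $NF == "xRT" && $(NF-2) == "CPU" { s += $(NF-1) }
         END { printf "%.3f\n", s }' "$1"
}

# audio_seconds CPU_SECONDS XRT: utterance length implied by one log line,
# since xRT = CPU seconds / seconds of audio.
audio_seconds() {
    awk -v c="$1" -v x="$2" 'BEGIN { printf "%.2f\n", c / x }'
}
```

For the "Rotate left" excerpt above, the CPU passes add up to 0.676 + 0.310 + 0.000 = 0.986 xRT, and `audio_seconds 0.96 0.676` gives about 1.42 seconds of speech. So recognition needed nearly a second of CPU per second of audio. That fits the overrun warnings, although the log alone doesn't prove they are the cause.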

**************************************
Moving On
**************************************

PocketSphinx is now installed and available to be integrated with projects. My own plan is to develop using C++, as I've been doing with my other Raspberry Pi projects. For C/C++ development, the general recommendation seems to be to study the source code for pocketsphinx_continuous, which you can find in the source directory at ~/pocketsphinx-5prealpha/src/programs/continuous.c. There is also a more compact "hello world" example at the CMU Sphinx wiki.
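
For building your own C or C++ programs, the PKG_CONFIG_PATH setting from earlier lets pkg-config supply the compiler and linker flags. The installs above put pocketsphinx.pc and sphinxbase.pc in /usr/local/lib/pkgconfig. Below is a sketch of a compile helper (hypothetical name `ps_compile`). DRY_RUN=1 prints the command instead of running it.

```shell
# ps_compile SOURCE OUTPUT: compile a C (.c) or C++ (anything else) file
# against the installed PocketSphinx and Sphinxbase, using pkg-config's flags.
ps_compile() {
    src=$1; out=$2
    case "$src" in
        *.c) cc=${CC:-gcc} ;;
        *)   cc=${CXX:-g++} ;;
    esac
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$cc $src -o $out \$(pkg-config --cflags --libs pocketsphinx sphinxbase)"
        return 0
    fi
    flags=$(pkg-config --cflags --libs pocketsphinx sphinxbase) || {
        echo "ps_compile: pkg-config can't find pocketsphinx; check PKG_CONFIG_PATH" >&2
        return 1
    }
    # $flags is unquoted on purpose so it splits into separate options.
    $cc "$src" -o "$out" $flags
}
```

Usage: `ps_compile main.cpp myrobot`, where both file names are placeholders for your own project.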


cevinius

  • Member
  • *
  • Posts: 127
    • www.cevinius.com
Re: Installing PocketSphinx on a Raspberry Pi
« Reply #1 on: October 19, 2015, 07:21:31 PM »
Thanks very much for posting this!! Very helpful. I've got to try this out on a Pi!! :D

ZeroMax

  • Badass when not suffering from IBS
  • Member
  • *
  • Posts: 105
  • Do you want ants? Because that's how you get ants.
    • DON'T LOOK AT MY ART!
Re: Installing PocketSphinx on a Raspberry Pi
« Reply #2 on: October 19, 2015, 08:12:03 PM »
I've tried several variations on that including Jasper and always run into some wall or another.  I can't wait to give this a try.
DROP * FROM* WHERE 1=1
SUDO RM /DEV/

 
