
The RobotRebel.org Community

Topic: Boilerplate PocketSphinx C++ code to capture microphone input on Raspberry Pi

Ralph

Below I've adapted and simplified the demonstration code that comes with PocketSphinx (continuous.c) to hopefully make it easier for people just getting started to follow the logic and write their own C/C++ programs.  Mostly what I've done is rearrange things a bit, strip out some extraneous code, and add a bunch of comments.  I also removed the routines related to reading from files and focused on microphone input.  I've preserved the variable names etc. from continuous.c whenever possible to facilitate comparison with the original.

This boilerplate requires a custom dictionary and language model (custom.dic and custom.lm) to be present in the local directory.

Reference http://www.robotrebels.org/index.php?topic=220.0 for details on installing PocketSphinx on a Raspberry Pi and creating the dictionary and language files.

To compile, save the source code as ps_boilerplate.cpp and enter the following command (of course you can name the source and target files anything you want):
Code:
g++ -O3 -o ps_boilerplate ps_boilerplate.cpp  `pkg-config --cflags --libs pocketsphinx sphinxbase`

To run the program:
Code:
./ps_boilerplate


Concept of Operation

Among its capabilities, the PocketSphinx API (together with the sphinxbase library it depends on) provides functions to:
  • manage audio input devices
  • monitor incoming audio for the presence of speech
  • decode spoken audio into text

This code provides a boilerplate framework that uses those capabilities in an application performing near-continuous speech recognition from a microphone.

There are two functions:

main -
Performs some basic housekeeping to configure and initialize a decoder instance and open the microphone, then starts a loop that calls the core recognition routine and prints the results to the screen, one decoded statement at a time. The text is placed in a variable named decoded_speech. This boilerplate could be expanded by adding functionality to "do something" with the contents of the decoded_speech variable.

recognize_from_microphone -
When called, waits until the next spoken statement has ended and returns a string containing its decoded text.

The basic flow of recognize_from_microphone is:
  • start recording audio and mark the start of an utterance
  • read audio from the microphone buffer and feed it to the PocketSphinx decoder
  • test for the presence of speech and, once it is detected, set a flag that speech has started
  • keep testing for the absence of speech; once speech has started and then stopped, mark the end of the utterance
  • stop recording
  • get the hypothesis (best guess) of the recorded speech-to-text and return it

Code:
#include <cstdlib>
#include <iostream>
#include <string>
#include <pocketsphinx.h>
#include <sphinxbase/ad.h>
#include <sphinxbase/err.h>

using namespace std;

string recognize_from_microphone();

ps_decoder_t *ps;                  // pocketsphinx decoder instance
cmd_ln_t *config;                  // configuration structure
ad_rec_t *ad;                      // audio recording structure - for use with ALSA functions

int16 adbuf[4096];                 // buffer array to hold audio data
uint8 utt_started, in_speech;      // flags for tracking active speech - has speech started? - is speech currently happening?
int32 k;                           // holds the number of samples read into the audio buffer
char const *hyp;                   // pointer to "hypothesis" (best guess at the decoded result)


int main(int argc, char *argv[]) {

  config = cmd_ln_init(NULL, ps_args(), TRUE,                   // Load the configuration structure - ps_args() passes the default values
    "-hmm", "/usr/local/share/pocketsphinx/model/en-us/en-us",  // path to the standard US English acoustic model
    "-lm", "custom.lm",                                         // custom language model (file must be present)
    "-dict", "custom.dic",                                      // custom dictionary (file must be present)
    "-logfn", "/dev/null",                                      // suppress log info from being sent to screen
    NULL);
  if (config == NULL) {
    cerr << "Failed to create config object" << endl;
    return EXIT_FAILURE;
  }

  ps = ps_init(config);                                                        // initialize the pocketsphinx decoder
  if (ps == NULL) {                                                            // e.g. a model or dictionary file is missing
    cerr << "Failed to create recognizer" << endl;
    return EXIT_FAILURE;
  }

  ad = ad_open_dev("sysdefault", (int) cmd_ln_float32_r(config, "-samprate")); // open default microphone at default samplerate
  if (ad == NULL) {
    cerr << "Failed to open audio device" << endl;
    return EXIT_FAILURE;
  }

  while(1){
    string decoded_speech = recognize_from_microphone();          // call the function to capture and decode speech
    cout << "Decoded Speech: "<< decoded_speech << "\n" <<endl;   // send decoded speech to screen
  }

  // Not reached while the loop above runs forever; shown for completeness.
  ad_close(ad);                                                    // close the microphone
  ps_free(ps);                                                     // release the decoder
  cmd_ln_free_r(config);                                           // release the configuration
  return EXIT_SUCCESS;
}

string recognize_from_microphone(){

    ad_start_rec(ad);                                // start recording
    ps_start_utt(ps);                                // mark the start of the utterance
    utt_started = FALSE;                             // clear the utt_started flag

    while(1) {
        k = ad_read(ad, adbuf, 4096);                // read up to 4096 samples; k is the number actually read (negative on error)
        if (k < 0) {                                 // audio device error
            cerr << "Failed to read audio" << endl;
            exit(EXIT_FAILURE);
        }
        ps_process_raw(ps, adbuf, k, FALSE, FALSE);  // send the audio buffer to the pocketsphinx decoder

        in_speech = ps_get_in_speech(ps);            // test to see if speech is being detected

        if (in_speech && !utt_started) {             // if speech has started and utt_started flag is false
            utt_started = TRUE;                      // then set the flag
        }

        if (!in_speech && utt_started) {             // if speech has ended and the utt_started flag is true
            ps_end_utt(ps);                          // then mark the end of the utterance
            ad_stop_rec(ad);                         // stop recording
            hyp = ps_get_hyp(ps, NULL );             // query pocketsphinx for "hypothesis" of decoded statement
            return hyp ? hyp : "";                   // return the hypothesis (empty string if nothing was recognized)
        }
    }
}

vladtech

Hello, I was wondering if you had a PocketSphinx boilerplate in C rather than C++. I know C but not C++. Thank you.

Ralph

As noted above, this "boilerplate" is adapted and simplified from the demonstration code that comes with PocketSphinx. That file is named "continuous.c"; it is straight C, so it should meet your needs, and it should already be present on your computer if PocketSphinx is installed. (Unfortunately the original continuous.c code is considerably more complex than the boilerplate, but that's why I posted the boilerplate.) Off the top of my head I can't remember exactly where I used C++ syntax in my simplified version, but I suspect the changes you would need to make are primarily to use only standard C headers, remove the "using namespace" directive, have recognize_from_microphone return a "const char *" instead of a "string", and use "printf" in place of "cout" for sending text to the console.

For getting started using C with PocketSphinx, it would probably help to go straight to the source at the PocketSphinx tutorial - http://cmusphinx.sourceforge.net/wiki/tutorialPocketSphinx

vladtech

Thanks to Ralph, I was able to make the changes needed to turn it into a pure C boilerplate. Here is the code:

Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pocketsphinx.h>
#include <sphinxbase/ad.h>
#include <sphinxbase/err.h>

const char * recognize_from_microphone(void);

ps_decoder_t *ps;                  // pocketsphinx decoder instance
cmd_ln_t *config;                  // configuration structure
ad_rec_t *ad;                      // audio recording structure - for use with ALSA functions

int16 adbuf[4096];                 // buffer array to hold audio data
uint8 utt_started, in_speech;      // flags for tracking active speech - has speech started? - is speech currently happening?
int32 k;                           // holds the number of samples read into the audio buffer
char const *hyp;                   // pointer to "hypothesis" (best guess at the decoded result)
char const *decoded_speech;        // text returned by recognize_from_microphone()


int main(int argc, char *argv[]) {

  config = cmd_ln_init(NULL, ps_args(), TRUE,                   // Load the configuration structure - ps_args() passes the default values
    "-hmm", "/usr/local/share/pocketsphinx/model/en-us/en-us",  // path to the standard US English acoustic model
    "-lm", "custom.lm",                                         // custom language model (file must be present)
    "-dict", "custom.dic",                                      // custom dictionary (file must be present)
    "-logfn", "/dev/null",                                      // suppress log info from being sent to screen
    NULL);
  if (config == NULL) {
    fprintf(stderr, "Failed to create config object\n");
    return EXIT_FAILURE;
  }

  ps = ps_init(config);                                                        // initialize the pocketsphinx decoder
  if (ps == NULL) {                                                            // e.g. a model or dictionary file is missing
    fprintf(stderr, "Failed to create recognizer\n");
    return EXIT_FAILURE;
  }

  ad = ad_open_dev("sysdefault", (int) cmd_ln_float32_r(config, "-samprate")); // open default microphone at default samplerate
  if (ad == NULL) {
    fprintf(stderr, "Failed to open audio device\n");
    return EXIT_FAILURE;
  }

  while(1){
    decoded_speech = recognize_from_microphone();            // call the function to capture and decode speech
    printf("You Said: %s\n", decoded_speech);                // send decoded speech to screen
  }

  // Not reached while the loop above runs forever; shown for completeness.
  ad_close(ad);                                              // close the microphone
  ps_free(ps);                                               // release the decoder
  cmd_ln_free_r(config);                                     // release the configuration
  return EXIT_SUCCESS;
}

const char * recognize_from_microphone(void){

    ad_start_rec(ad);                                // start recording
    ps_start_utt(ps);                                // mark the start of the utterance
    utt_started = FALSE;                             // clear the utt_started flag

    while(1) {
        k = ad_read(ad, adbuf, 4096);                // read up to 4096 samples; k is the number actually read (negative on error)
        if (k < 0) {                                 // audio device error
            fprintf(stderr, "Failed to read audio\n");
            exit(EXIT_FAILURE);
        }
        ps_process_raw(ps, adbuf, k, FALSE, FALSE);  // send the audio buffer to the pocketsphinx decoder

        in_speech = ps_get_in_speech(ps);            // test to see if speech is being detected

        if (in_speech && !utt_started) {             // if speech has started and utt_started flag is false
            utt_started = TRUE;                      // then set the flag
        }

        if (!in_speech && utt_started) {             // if speech has ended and the utt_started flag is true
            ps_end_utt(ps);                          // then mark the end of the utterance
            ad_stop_rec(ad);                         // stop recording
            hyp = ps_get_hyp(ps, NULL );             // query pocketsphinx for "hypothesis" of decoded statement
            return hyp ? hyp : "";                   // return the hypothesis (empty string if nothing was recognized);
                                                     // the text belongs to the decoder, so copy it if you need to keep it
        }
    }
}



Ralph

Excellent work - and a sincere "thank you" for coming back to the site and posting the standard C version for others who may need it.  Best of luck moving forward with your PocketSphinx project.

 
