The RobotRebel.org Community

Topic: Boilerplate PocketSphinx C++ code to capture microphone input on Raspberry Pi

Ralph

Below I've adapted and simplified the demonstration code that comes with PocketSphinx (continuous.c) to hopefully make it easier for people just getting started to follow the logic and write their own C/C++ programs.  Mostly what I've done is rearrange things a bit, strip out some extraneous code, and add a bunch of comments.  I also removed the routines related to reading from files and focused on microphone input.  I've preserved the variable names etc. from continuous.c whenever possible to facilitate comparison with the original.

This boilerplate requires a custom dictionary and language model (custom.dic and custom.lm) to be present in the local directory.
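
A quick pre-flight check for those two files can save some head-scratching, because the boilerplate below sends PocketSphinx's log output to /dev/null, so an error about a missing file may never reach the screen. The following is only a sketch: missing_files is a made-up helper name (not part of PocketSphinx), and it only tests whether each file can be opened for reading, not whether its contents are valid.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not a PocketSphinx function):
// returns the subset of paths that cannot be opened for reading.
std::vector<std::string> missing_files(const std::vector<std::string> &paths) {
    std::vector<std::string> missing;
    for (const std::string &path : paths) {
        std::ifstream in(path);          // try to open the file for reading
        if (!in.is_open()) {
            missing.push_back(path);     // remember every file that failed to open
        }
    }
    return missing;
}
```

One way to use it would be to call missing_files({"custom.lm", "custom.dic"}) at the top of main and print whatever comes back before initializing the decoder.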

See http://www.robotrebels.org/index.php?topic=220.0 for details on installing PocketSphinx on a Raspberry Pi and creating the dictionary and language model files.

To compile, save the source code as ps_boilerplate.cpp and enter the following string at the command line (of course you can name the source and target files anything you want):
Code: [Select]
g++ -O3 -o ps_boilerplate ps_boilerplate.cpp  `pkg-config --cflags --libs pocketsphinx sphinxbase`

To run the program:
Code: [Select]
./ps_boilerplate


Concept of Operation

Among its capabilities, the PocketSphinx API provides functions to:
  • manage audio input devices
  • monitor incoming audio for the presence of speech
  • decode spoken audio into text

This code provides a boilerplate framework for using those capabilities in an application that performs near-continuous speech recognition from a microphone.

There are two functions:

main -
Performs some basic housekeeping to configure and initialize a decoder instance and then starts a loop that calls the core recognition routine and sends the results to the screen, one decoded statement at a time.  The text is placed in a variable named decoded_speech.  This boilerplate could be expanded by adding functionality to "do something" with the contents of the decoded_speech variable.

recognize_from_microphone -
When called, returns a string containing the decoded text of the next spoken statement.

The basic flow of recognize_from_microphone is:
  • start recording audio
  • start reading from the audio buffer and feeding the audio to the PocketSphinx decoder
  • test for the presence of speech and, once it is detected, set a flag that speech has started
  • then test for the absence of speech and, once silence is detected, mark the end of the utterance
  • stop recording
  • get the hypothesis (best guess) of recorded speech-to-text and return
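
The start/end logic in the middle of that list is a small two-state machine, and it can be tried out without a microphone. The sketch below is a simulation only: utterance_end_index is a hypothetical helper, and each element of the input vector stands in for one result of ps_get_in_speech after a buffer has been processed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical simulation of the loop's flag logic (no PocketSphinx calls).
// Returns the index of the buffer on which the loop would end the utterance,
// or -1 if that never happens within the given readings.
long utterance_end_index(const std::vector<bool> &in_speech_per_buffer) {
    bool utt_started = false;                        // same role as the utt_started flag
    for (std::size_t i = 0; i < in_speech_per_buffer.size(); ++i) {
        bool in_speech = in_speech_per_buffer[i];
        if (in_speech && !utt_started) {
            utt_started = true;                      // first buffer that contains speech
        }
        if (!in_speech && utt_started) {
            return static_cast<long>(i);             // first silent buffer after speech
        }
    }
    return -1;                                       // still waiting for speech to start or end
}
```

Leading silence is ignored because utt_started is still false, and the function reports the first silent buffer that follows speech, which is the point where the real loop ends the utterance and asks for the hypothesis.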

Code: [Select]
#include <iostream>
#include <string>
#include <pocketsphinx.h>
#include <sphinxbase/ad.h>
#include <sphinxbase/err.h>

using namespace std;

string recognize_from_microphone();

ps_decoder_t *ps;                  // create pocketsphinx decoder structure
cmd_ln_t *config;                  // create configuration structure
ad_rec_t *ad;                      // create audio recording structure - for use with ALSA functions

int16 adbuf[4096];                 // buffer array to hold audio data
uint8 utt_started, in_speech;      // flags for tracking active speech - has speech started? - is speech currently happening?
int32 k;                           // holds the number of frames in the audio buffer
char const *hyp;                   // pointer to "hypothesis" (best guess at the decoded result)


int main(int argc, char *argv[]) {

  config = cmd_ln_init(NULL, ps_args(), TRUE,                   // Load the configuration structure - ps_args() passes the default values
    "-hmm", "/usr/local/share/pocketsphinx/model/en-us/en-us",  // path to the standard US English acoustic model
    "-lm", "custom.lm",                                         // custom language model (file must be present)
    "-dict", "custom.dic",                                      // custom dictionary (file must be present)
    "-logfn", "/dev/null",                                      // suppress log info from being sent to screen
     NULL);

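  ps = ps_init(config);                                                        // initialize the pocketsphinx decoder
  if (ps == NULL) {                                                            // e.g. a model or dictionary file could not be loaded
    cerr << "Failed to initialize the decoder" << endl;
    return 1;
  }
  ad = ad_open_dev("sysdefault", (int) cmd_ln_float32_r(config, "-samprate")); // open default microphone at default samplerate
  if (ad == NULL) {                                                            // the microphone could not be opened
    cerr << "Failed to open the audio device" << endl;
    return 1;
  }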

  while(1){                                                                   
    string decoded_speech = recognize_from_microphone();          // call the function to capture and decode speech           
    cout << "Decoded Speech: "<< decoded_speech << "\n" <<endl;   // send decoded speech to screen
   }

 ad_close(ad);                                                    // close the microphone
}
 
string recognize_from_microphone(){

    ad_start_rec(ad);                                // start recording
    ps_start_utt(ps);                                // mark the start of the utterance
    utt_started = FALSE;                             // clear the utt_started flag

    while(1) {                                       
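        k = ad_read(ad, adbuf, 4096);                // read up to 4096 samples into adbuf; k is the number actually read
        if (k < 0)                                   // a negative count signals a read error
            E_FATAL("Failed to read audio\n");       // log the error and exit, as continuous.c does
        ps_process_raw(ps, adbuf, k, FALSE, FALSE);  // send the audio buffer to the pocketsphinx decoder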

        in_speech = ps_get_in_speech(ps);            // test to see if speech is being detected

        if (in_speech && !utt_started) {             // if speech has started and utt_started flag is false                           
            utt_started = TRUE;                      // then set the flag
        }
 
        if (!in_speech && utt_started) {             // if speech has ended and the utt_started flag is true
            ps_end_utt(ps);                          // then mark the end of the utterance
            ad_stop_rec(ad);                         // stop recording
            hyp = ps_get_hyp(ps, NULL );             // query pocketsphinx for "hypothesis" of decoded statement
            return hyp ? hyp : "";                   // return the hypothesis, or an empty string if there is none (hyp can be NULL)
        }
    }

}
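
As mentioned in the description of main, the natural next step is to "do something" with decoded_speech. One possible approach, sketched here purely as an illustration, is a lookup table from phrases to actions. The Action values, the example phrases and the action_for helper are all hypothetical and would need to match the words in your own custom.dic. The text is lower-cased before the lookup on the assumption that the decoded words come back in whatever case the dictionary uses.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical action codes; not part of PocketSphinx.
enum class Action { None, Forward, Stop };

// Return a lower-cased copy so that matching is case-insensitive.
std::string to_lower(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// Map a decoded phrase to an action; unknown or empty phrases map to None.
Action action_for(const std::string &decoded_speech) {
    static const std::map<std::string, Action> commands = {
        {"go forward", Action::Forward},             // example phrases only
        {"stop", Action::Stop},
    };
    auto it = commands.find(to_lower(decoded_speech));
    return it == commands.end() ? Action::None : it->second;
}
```

Inside the while loop in main, the result of action_for(decoded_speech) could then drive a switch statement that calls whatever motor or output routines the project needs.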

vladtech

Hello, I was wondering if you had a boilerplate for PocketSphinx in C rather than C++. I know C but not C++. Thank you.

Ralph

As noted above, this "boilerplate" is adapted and simplified from the demonstration code that comes with PocketSphinx.  That file is named "continuous.c" - it is straight C so it should meet your needs.  It should already be present on your computer if PocketSphinx is installed.  (Unfortunately the original continuous.c code is considerably more complex than the boilerplate - but that's why I posted the boilerplate.)  Off the top of my head I can't remember exactly what I did when I used C++ syntax in my simplified version, but I suspect the changes you would need to make are primarily to use only standard C headers, remove the "using namespace" directive, and use "printf" in place of "cout" for sending text to the console.

For getting started using C with PocketSphinx, it would probably help to go straight to the source at the PocketSphinx tutorial - http://cmusphinx.sourceforge.net/wiki/tutorialPocketSphinx

vladtech

Thanks to Ralph I was able to make some changes to make it a pure C boilerplate. Here is the code:

Code: [Select]
#include <stdio.h>
#include <string.h>
#include <pocketsphinx.h>
#include <sphinxbase/ad.h>
#include <sphinxbase/err.h>

const char * recognize_from_microphone();

ps_decoder_t *ps;                  // create pocketsphinx decoder structure
cmd_ln_t *config;                  // create configuration structure
ad_rec_t *ad;                      // create audio recording structure - for use with ALSA functions

int16 adbuf[4096];                 // buffer array to hold audio data
uint8 utt_started, in_speech;      // flags for tracking active speech - has speech started? - is speech currently happening?
int32 k;                           // holds the number of frames in the audio buffer
char const *hyp;                   // pointer to "hypothesis" (best guess at the decoded result)
char const *decoded_speech;


int main(int argc, char *argv[]) {

  config = cmd_ln_init(NULL, ps_args(), TRUE,                   // Load the configuration structure - ps_args() passes the default values
    "-hmm", "/usr/local/share/pocketsphinx/model/en-us/en-us",  // path to the standard US English acoustic model
    "-lm", "custom.lm",                                         // custom language model (file must be present)
    "-dict", "custom.dic",                                      // custom dictionary (file must be present)
    "-logfn", "/dev/null",                                      // suppress log info from being sent to screen
     NULL);

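  ps = ps_init(config);                                                        // initialize the pocketsphinx decoder
  if (ps == NULL) {                                                            // e.g. a model or dictionary file could not be loaded
    fprintf(stderr, "Failed to initialize the decoder\n");
    return 1;
  }
  ad = ad_open_dev("sysdefault", (int) cmd_ln_float32_r(config, "-samprate")); // open default microphone at default samplerate
  if (ad == NULL) {                                                            // the microphone could not be opened
    fprintf(stderr, "Failed to open the audio device\n");
    return 1;
  }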

  while(1){                                                                   
    decoded_speech = recognize_from_microphone();            // call the function to capture and decode speech           
    printf("You Said: %s\n", decoded_speech); // send decoded speech to screen

   }

 ad_close(ad);                                                    // close the microphone
}
 
const char * recognize_from_microphone(){

    ad_start_rec(ad);                                // start recording
    ps_start_utt(ps);                                // mark the start of the utterance
    utt_started = FALSE;                             // clear the utt_started flag

    while(1) {                                       
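        k = ad_read(ad, adbuf, 4096);                // read up to 4096 samples into adbuf; k is the number actually read
        if (k < 0)                                   // a negative count signals a read error
            E_FATAL("Failed to read audio\n");       // log the error and exit, as continuous.c does
        ps_process_raw(ps, adbuf, k, FALSE, FALSE);  // send the audio buffer to the pocketsphinx decoder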

        in_speech = ps_get_in_speech(ps);            // test to see if speech is being detected

        if (in_speech && !utt_started) {             // if speech has started and utt_started flag is false                           
            utt_started = TRUE;                      // then set the flag
        }
 
        if (!in_speech && utt_started) {             // if speech has ended and the utt_started flag is true
            ps_end_utt(ps);                          // then mark the end of the utterance
            ad_stop_rec(ad);                         // stop recording
            hyp = ps_get_hyp(ps, NULL );             // query pocketsphinx for "hypothesis" of decoded statement
            return hyp ? hyp : "";                   // return the hypothesis, or an empty string if there is none (hyp can be NULL)
        }
    }

}



Ralph

Excellent work - and a sincere "thank you" for coming back to the site and posting the standard C version for others who may need it.  Best of luck moving forward with your PocketSphinx project.

 
