Welcome to Robot Rebels, an online robot maker community.



The RobotRebel.org Community

Topic: Boilerplate PocketSphinx C++ code to capture microphone input on Raspberry Pi

Ralph

  • Member
  • Posts: 36
Below I've adapted and simplified the demonstration code that comes with PocketSphinx (continuous.c) to hopefully make it easier for people just getting started to follow the logic and write their own C/C++ programs.  Mostly what I've done is rearrange things a bit, strip out some extraneous code, and add a bunch of comments.  I also removed the routines related to reading from files and focused on microphone input.  I've preserved the variable names etc. from continuous.c whenever possible to facilitate comparison with the original.

This boilerplate requires a custom dictionary and language model (custom.dic and custom.lm) to be present in the local directory.

See http://www.robotrebels.org/index.php?topic=220.0 for details on installing PocketSphinx on a Raspberry Pi and creating the dictionary and language model files.

To compile, save the source code as ps_boilerplate.cpp and run the following command (of course, you can name the source and target files anything you want):
Code:
g++ -O3 -o ps_boilerplate ps_boilerplate.cpp  `pkg-config --cflags --libs pocketsphinx sphinxbase`

To run the program:
Code:
./ps_boilerplate


Concept of Operation

Among its capabilities, the PocketSphinx API provides functions to:
  • manage audio input devices
  • monitor incoming audio for the presence of speech
  • decode spoken audio into text

This code provides a boilerplate framework that uses those capabilities to perform near-continuous speech recognition from a microphone.

There are two functions:

main -
Performs some basic housekeeping to configure and initialize a decoder instance and then starts a loop that calls the core recognition routine and sends the results to the screen, one decoded statement at a time.  The text is placed in a variable named decoded_speech.  This boilerplate could be expanded by adding functionality to "do something" with the contents of the decoded_speech variable.

recognize_from_microphone -
When called, returns a string containing the decoded text of the next spoken statement.

The basic flow of recognize_from_microphone is:
  • start recording audio
  • start reading from the audio buffer and feeding the audio to the PocketSphinx decoder
  • test for the presence of speech and, once it is detected, set a flag indicating that speech has started
  • then test for the absence of speech; once speech has stopped, mark the end of the utterance
  • stop recording
  • get the hypothesis (best guess) of the decoded text and return it

Code:
#include <cstdlib>
#include <iostream>
#include <string>
#include <unistd.h>                // usleep()
#include <pocketsphinx.h>
#include <sphinxbase/ad.h>
#include <sphinxbase/err.h>

using namespace std;

string recognize_from_microphone();

ps_decoder_t *ps;                  // pointer to the pocketsphinx decoder
cmd_ln_t *config;                  // pointer to the configuration structure
ad_rec_t *ad;                      // pointer to the audio recording device - for use with the ad_* (ALSA) functions

int16 adbuf[4096];                 // buffer array to hold audio samples
uint8 utt_started, in_speech;      // flags for tracking active speech - has speech started? - is speech currently happening?
int32 k;                           // holds the number of samples read into the audio buffer
char const *hyp;                   // pointer to "hypothesis" (best guess at the decoded result)


int main(int argc, char *argv[]) {

  config = cmd_ln_init(NULL, ps_args(), TRUE,                   // Load the configuration structure - ps_args() passes the default values
    "-hmm", "/usr/local/share/pocketsphinx/model/en-us/en-us",  // path to the standard English acoustic model
    "-lm", "custom.lm",                                         // custom language model (file must be present)
    "-dict", "custom.dic",                                      // custom dictionary (file must be present)
    "-logfn", "/dev/null",                                      // suppress log info from being sent to screen
    NULL);
  if (config == NULL) {
    cerr << "Failed to create the configuration" << endl;
    return 1;
  }

  ps = ps_init(config);                                                        // initialize the pocketsphinx decoder
  if (ps == NULL) {                                                            // usually a bad model path or a missing custom.lm/custom.dic
    cerr << "Failed to initialize the decoder" << endl;
    return 1;
  }

  ad = ad_open_dev("sysdefault", (int) cmd_ln_float32_r(config, "-samprate")); // open default microphone at default samplerate
  if (ad == NULL) {
    cerr << "Failed to open the audio device" << endl;
    return 1;
  }

  while (1) {
    string decoded_speech = recognize_from_microphone();          // call the function to capture and decode speech
    cout << "Decoded Speech: " << decoded_speech << "\n" << endl; // send decoded speech to screen
  }

  // Cleanup - only reached if the loop above is given an exit condition
  ad_close(ad);                                                    // close the microphone
  ps_free(ps);                                                     // free the decoder
  cmd_ln_free_r(config);                                           // free the configuration
  return 0;
}

string recognize_from_microphone() {

    ad_start_rec(ad);                                // start recording
    ps_start_utt(ps);                                // mark the start of the utterance
    utt_started = FALSE;                             // clear the utt_started flag

    while (1) {
        k = ad_read(ad, adbuf, 4096);                // read up to 4096 samples; k is the number actually read (may be 0, ad_read does not block)
        if (k < 0) {                                 // a negative value means the read failed
            cerr << "Failed to read audio" << endl;
            exit(EXIT_FAILURE);
        }
        ps_process_raw(ps, adbuf, k, FALSE, FALSE);  // send the audio buffer to the pocketsphinx decoder

        in_speech = ps_get_in_speech(ps);            // test to see if speech is being detected

        if (in_speech && !utt_started) {             // if speech has started and utt_started flag is false
            utt_started = TRUE;                      // then set the flag
        }

        if (!in_speech && utt_started) {             // if speech has ended and the utt_started flag is true
            ps_end_utt(ps);                          // then mark the end of the utterance
            ad_stop_rec(ad);                         // stop recording
            hyp = ps_get_hyp(ps, NULL);              // query pocketsphinx for "hypothesis" of decoded statement
            return hyp ? string(hyp) : string();     // ps_get_hyp() can return NULL, e.g. when nothing was recognized
        }

        usleep(100000);                              // pause ~100 ms so the loop does not spin the CPU between reads
    }
}

vladtech

  • Member
  • Posts: 8
Hello, I was wondering if you had a PocketSphinx boilerplate for C rather than C++. I know C but not C++. Thank you.

Ralph

  • Member
  • Posts: 36
As noted above, this "boilerplate" is adapted and simplified from the demonstration code that comes with PocketSphinx. That file is named "continuous.c" - it is straight C, so it should meet your needs. It should already be present on your computer if PocketSphinx is installed. (Unfortunately, the original continuous.c code is considerably more complex than the boilerplate - but that's why I posted the boilerplate.) Off the top of my head I can't remember exactly where I used C++ syntax in my simplified version, but I suspect the changes you would need to make are primarily to use only standard C headers, remove the "using namespace" directive, return a "const char *" instead of a C++ "string", and use "printf" in place of "cout" for sending text to the console.

For getting started using C with PocketSphinx, it would probably help to go straight to the source at the PocketSphinx tutorial - http://cmusphinx.sourceforge.net/wiki/tutorialPocketSphinx

vladtech

  • Member
  • Posts: 8
Thanks to Ralph, I was able to make some changes to turn it into a pure C boilerplate. Here is the code:

Code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>                // usleep()
#include <pocketsphinx.h>
#include <sphinxbase/ad.h>
#include <sphinxbase/err.h>

const char *recognize_from_microphone(void);

ps_decoder_t *ps;                  // pointer to the pocketsphinx decoder
cmd_ln_t *config;                  // pointer to the configuration structure
ad_rec_t *ad;                      // pointer to the audio recording device - for use with the ad_* (ALSA) functions

int16 adbuf[4096];                 // buffer array to hold audio samples
uint8 utt_started, in_speech;      // flags for tracking active speech - has speech started? - is speech currently happening?
int32 k;                           // holds the number of samples read into the audio buffer
char const *hyp;                   // pointer to "hypothesis" (best guess at the decoded result)
char const *decoded_speech;        // text returned by recognize_from_microphone()


int main(int argc, char *argv[]) {

  config = cmd_ln_init(NULL, ps_args(), TRUE,                   // Load the configuration structure - ps_args() passes the default values
    "-hmm", "/usr/local/share/pocketsphinx/model/en-us/en-us",  // path to the standard English acoustic model
    "-lm", "custom.lm",                                         // custom language model (file must be present)
    "-dict", "custom.dic",                                      // custom dictionary (file must be present)
    "-logfn", "/dev/null",                                      // suppress log info from being sent to screen
    NULL);
  if (config == NULL) {
    fprintf(stderr, "Failed to create the configuration\n");
    return 1;
  }

  ps = ps_init(config);                                                        // initialize the pocketsphinx decoder
  if (ps == NULL) {                                                            // usually a bad model path or a missing custom.lm/custom.dic
    fprintf(stderr, "Failed to initialize the decoder\n");
    return 1;
  }

  ad = ad_open_dev("sysdefault", (int) cmd_ln_float32_r(config, "-samprate")); // open default microphone at default samplerate
  if (ad == NULL) {
    fprintf(stderr, "Failed to open the audio device\n");
    return 1;
  }

  while (1) {
    decoded_speech = recognize_from_microphone();            // call the function to capture and decode speech
    printf("You Said: %s\n", decoded_speech);                // send decoded speech to screen
  }

  /* Cleanup - only reached if the loop above is given an exit condition */
  ad_close(ad);                                              // close the microphone
  ps_free(ps);                                               // free the decoder
  cmd_ln_free_r(config);                                     // free the configuration
  return 0;
}

const char *recognize_from_microphone(void) {

    ad_start_rec(ad);                                // start recording
    ps_start_utt(ps);                                // mark the start of the utterance
    utt_started = FALSE;                             // clear the utt_started flag

    while (1) {
        k = ad_read(ad, adbuf, 4096);                // read up to 4096 samples; k is the number actually read (may be 0, ad_read does not block)
        if (k < 0) {                                 // a negative value means the read failed
            fprintf(stderr, "Failed to read audio\n");
            exit(EXIT_FAILURE);
        }
        ps_process_raw(ps, adbuf, k, FALSE, FALSE);  // send the audio buffer to the pocketsphinx decoder

        in_speech = ps_get_in_speech(ps);            // test to see if speech is being detected

        if (in_speech && !utt_started) {             // if speech has started and utt_started flag is false
            utt_started = TRUE;                      // then set the flag
        }

        if (!in_speech && utt_started) {             // if speech has ended and the utt_started flag is true
            ps_end_utt(ps);                          // then mark the end of the utterance
            ad_stop_rec(ad);                         // stop recording
            hyp = ps_get_hyp(ps, NULL);              // query pocketsphinx for "hypothesis" of decoded statement
            return hyp ? hyp : "";                   // may be NULL; the string is owned by the decoder, so use it before the next utterance
        }

        usleep(100000);                              // pause ~100 ms so the loop does not spin the CPU between reads
    }
}



Ralph

  • Member
  • Posts: 36
Excellent work - and a sincere "thank you" for coming back to the site and posting the standard C version for others who may need it.  Best of luck moving forward with your PocketSphinx project.

 
