Hello everyone,

Today I’m going to show you how to expand the dictionary of the acoustic model for Sphinx4. In simple terms, this tutorial shows how to add more words to Sphinx’s word database (the dictionary) so that it can recognize words that are not available in the default acoustic models provided by CMU Sphinx. This tutorial is based on the “HelloWorld” example provided by CMU Sphinx.

Important files in this example:

1) HelloWorld.java

2) hello.gram

3) helloworld.config.xml

Acoustic model used in this example:

WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar

Let’s say we are creating a speech recognition (SR) system for ABC National Airlines. Everything will work fine, and Sphinx will recognize most words except the names of the cities and states of India. Here is how to add those names to the dictionary.

PART ONE

Step 1 : Create a text file named “words.txt”, write all the names of the cities and states in it, and save it.

Step 2 : Open this link : http://www.speech.cs.cmu.edu/tools/lmtool.html

Step 3 : On that page, go to the “Sentence corpus file:” section, browse to the “words.txt” file, and click “Compile Knowledge Base”.

Step 4 : On the next page, click the “Dictionary” link and save the .DIC file.
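
If your list of names comes from somewhere messy, Step 1 can be scripted. The following is only a minimal sketch: the class name CorpusWriter and the sample names are made up for illustration, and it assumes one name per line is an acceptable layout for “words.txt”.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Sphinx4.
final class CorpusWriter {

    /** Trims each name, drops blanks, and removes case-insensitive duplicates, keeping first-seen order. */
    static List<String> normalize(List<String> names) {
        List<String> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String name : names) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Set.add returns false when the lower-cased name was already seen.
            if (seen.add(trimmed.toLowerCase(Locale.ROOT))) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Sample names only; replace them with your own list.
        List<String> names = Arrays.asList("Mumbai", "Delhi", " Gujarat ", "delhi", "");
        Path out = Paths.get("words.txt");
        Files.write(out, normalize(names), StandardCharsets.UTF_8);
        System.out.println("Wrote " + out.toAbsolutePath());
    }
}
```

Running it writes “words.txt” into the current directory, ready to be uploaded in Step 3.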

PART TWO

Step 1 : Extract the WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar file.

Step 2 : Go to the edu\cmu\sphinx\model\acoustic\WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz\dict folder.

Step 3 : Open the “cmudict.0.6d” file in that folder.

Step 4 : Copy the contents of the .DIC file you downloaded in PART ONE, paste them into the “cmudict.0.6d” file, and save.

Step 5 : Zip the extracted hierarchy back exactly as it was, with the edu folder at the top level of the archive, and give the zip file the same name as the JAR file.
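
The copy and paste in Step 4 can also be done in code, which helps when the word list changes often. This is a rough sketch and not anything shipped with Sphinx4: the class DictionaryMerger is hypothetical, and it assumes that every dictionary line starts with the word, followed by a space or tab and then the phonemes. Words that are already present are skipped.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Sphinx4.
final class DictionaryMerger {

    /** Returns the headword: everything before the first whitespace character. */
    static String headword(String line) {
        String trimmed = line.trim();
        for (int i = 0; i < trimmed.length(); i++) {
            if (Character.isWhitespace(trimmed.charAt(i))) {
                return trimmed.substring(0, i);
            }
        }
        return trimmed;
    }

    /** Appends entries from extra whose headword is not already in base (case-insensitive). */
    static List<String> merge(List<String> base, List<String> extra) {
        List<String> merged = new ArrayList<>(base);
        Set<String> known = new HashSet<>();
        for (String line : base) {
            if (!line.trim().isEmpty()) {
                known.add(headword(line).toLowerCase(Locale.ROOT));
            }
        }
        for (String line : extra) {
            if (line.trim().isEmpty()) {
                continue;
            }
            // Keep the new line only if its headword was not known yet.
            if (known.add(headword(line).toLowerCase(Locale.ROOT))) {
                merged.add(line);
            }
        }
        return merged;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: DictionaryMerger <cmudict.0.6d> <downloaded.dic> <output file>");
            return;
        }
        // ISO-8859-1 maps bytes one-to-one, so lines are copied through unchanged.
        List<String> base = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        List<String> extra = Files.readAllLines(Paths.get(args[1]), StandardCharsets.ISO_8859_1);
        Files.write(Paths.get(args[2]), merge(base, extra), StandardCharsets.ISO_8859_1);
    }
}
```

The sketch writes to a separate output file so that the original “cmudict.0.6d” stays untouched until you have checked the result and copied it over.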

Now remove the “WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar” file from the project’s CLASSPATH and add “WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.zip” in its place.
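
The packaging step is easy to get wrong by hand, because many zip tools add an extra top-level folder. Here is a small sketch, using only the JDK, that zips a folder so that entry names are relative to that folder. ArchivePacker is a made-up name, and the output file should live outside the folder being packed.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Sphinx4.
final class ArchivePacker {

    /** Zips every regular file under root, naming entries relative to root with '/' separators. */
    static void pack(Path root, Path archive) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(archive);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                // Zip entry names always use forward slashes, even on Windows.
                String name = root.relativize(file).toString().replace(File.separatorChar, '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ArchivePacker <extracted folder> <output archive>");
            return;
        }
        pack(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Pass the folder that directly contains edu as the first argument. Entry names then begin with edu/, the same way they do in the original JAR.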

That’s it! We are done. Sphinx will now also recognize all the names of cities and states that we wrote in the “words.txt” file.
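
Before running the demo, it can be worth checking that every word from “words.txt” really made it into the dictionary. The sketch below is one way to do that. DictionaryCheck is a hypothetical class, and it assumes the same line layout as before: the word first, then whitespace, then the phonemes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Sphinx4.
final class DictionaryCheck {

    /** Returns the words from wordLines that have no entry in dictLines (case-insensitive). */
    static List<String> missingWords(List<String> wordLines, List<String> dictLines) {
        Set<String> known = new HashSet<>();
        for (String line : dictLines) {
            // The first whitespace-separated token of a dictionary line is the word.
            String[] parts = line.trim().split("\\s+", 2);
            if (!parts[0].isEmpty()) {
                known.add(parts[0].toLowerCase(Locale.ROOT));
            }
        }
        List<String> missing = new ArrayList<>();
        for (String line : wordLines) {
            // Multi-word names such as "New Delhi" are checked word by word.
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()
                        && !known.contains(word.toLowerCase(Locale.ROOT))
                        && !missing.contains(word)) {
                    missing.add(word);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: DictionaryCheck <words.txt> <dictionary file>");
            return;
        }
        List<String> words = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        List<String> dict = Files.readAllLines(Paths.get(args[1]), StandardCharsets.ISO_8859_1);
        for (String word : missingWords(words, dict)) {
            System.out.println("No dictionary entry for: " + word);
        }
    }
}
```

If the program prints nothing, every word in the list has at least one pronunciation in the dictionary file you pointed it at.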

Now, FAQ time. I will be posting the FAQ and a few important notes in the comments. 🙂

If you have any queries, please feel free to ask.

Regards,

81 Responses

  1. Hi, is there any way to make Sphinx ignore words that are unknown to the dictionary, so I can avoid warnings like this:

    14:37:14.732 WARNING dictionary Missing word: &

    14:37:14.732 WARNING jsgfGrammar Can’t find pronunciation for &

    14:37:14.732 WARNING dictionary Missing word: tammi

    14:37:14.732 WARNING jsgfGrammar Can’t find pronunciation for Tammi

    14:37:14.733 WARNING dictionary Missing word: –

    I just want to play dynamically added MP3s by title, so Sphinx should recognize spoken track titles even if some parts of them are ignored (titles usually differ enough). Thanks for your help.

  2. Hi sir,

    Can you please tell us how to add extra words to the gram file that have already been added to the dict file? Even after adding these words to the gram file, we get this exception when running the class file of the demo program:

    java.lang.NoClassDefFoundError: HelloWorld/edu/cmu/sphinx/demo/helloworld/HelloWorld (wrong name: edu/cmu/sphinx/demo/helloworld/HelloWorld)

    at java.lang.ClassLoader.defineClass1(Native Method)

    at java.lang.ClassLoader.defineClass(Unknown Source)

    at java.security.SecureClassLoader.defineClass(Unknown Source)

    at java.net.URLClassLoader.defineClass(Unknown Source)

    at java.net.URLClassLoader.access$000(Unknown Source)

    at java.net.URLClassLoader$1.run(Unknown Source)

    at java.security.AccessController.doPrivileged(Native Method)

    at java.net.URLClassLoader.findClass(Unknown Source)

    at java.lang.ClassLoader.loadClass(Unknown Source)

    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)

    at java.lang.ClassLoader.loadClass(Unknown Source)

    at java.lang.ClassLoader.loadClassInternal(Unknown Source)

    Exception in thread “main” .

    Can you please solve our problem, sir?

  3. Thanks for your marvelous guidelines. I expanded my acoustic model to recognize extra words, but my problem is that I can’t run the project. The displayed errors are:

    Exception in thread “main” java.lang.NullPointerException

    at edu.cmu.sphinx.util.props.SaxLoader.load(SaxLoader.java:74)

    at edu.cmu.sphinx.util.props.ConfigurationManager.<init>(ConfigurationManager.java:58)

    at edu.cmu.sphinx.demo.helloworld.HelloWorld.main(HelloWorld.java:33)

    I didn’t include HelloWorld.jar in the Sphinx library. If I do, the application only recognizes the words from the demo, such as ‘hello will’. Please help me with this; it is very important for my academics.

  4. Hi Puneet!

    I am working on speech recognition using Sphinx 4, and I am trying to execute system commands, such as running an exe file, by voice command. I read the Sphinx specifications and can now run the Hello World demo. But I am still unable to add new words to it. For example, I want to add Open and Close commands to the grammar. I edited the gram file accordingly, but my program still prints words such as good morning | hello that were specified in the previous .gram file. I read somewhere that this can be done with Apache Ant, but I am unable to figure out how. Please help!

  5. Hi Puneet,

    Very concise tutorial. However, I am using Sphinx4 embedded in the voce library for Processing. The library itself uses the WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar and sphinx4.jar files. When I follow your steps and replace WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar with WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.zip, I get the following error:

    Exception in thread “Animation Thread” java.lang.NullPointerException

    at edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader.loadProperties(ModelLoader.java:372)

    at edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader.getIsBinaryDefault(ModelLoader.java:386)

    at edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader.newProperties(ModelLoader.java:346)

    It seems that Processing does not really like the zip file. Is there any way I can package the zip file as a jar? I tried changing the extension but got the same error. Please help.

  6. Mahipal:

    Exception in thread “main” java.lang.IncompatibleClassChangeError: Found class edu.cmu.sphinx.util.props.PropertySheet, but interface was expected

    at edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model.newProperties(Model.java:159)

    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:505)

    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:287)

    at edu.cmu.sphinx.linguist.flat.FlatLinguist.setupAcousticModel(FlatLinguist.java:278)

    at edu.cmu.sphinx.linguist.flat.FlatLinguist.newProperties(FlatLinguist.java:244)

    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:505)

    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:287)

    at edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager.newProperties(SimpleBreadthFirstSearchManager.java:182)

    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:505)

    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:287)

    at edu.cmu.sphinx.decoder.AbstractDecoder.newProperties(AbstractDecoder.java:65)

    at edu.cmu.sphinx.decoder.Decoder.newProperties(Decoder.java:37)

    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:505)

    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:287)

    at edu.cmu.sphinx.recognizer.Recognizer.newProperties(Recognizer.java:90)

    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:505)

    at edu.cmu.sphinx.util.props.ConfigurationManager.lookup(ConfigurationManager.java:161)

    at HelloWorld.main(HelloWorld.java:27)

    Hi, I’m getting this exception. Please help me.

  7. Exception in thread “main” java.lang.IncompatibleClassChangeError: Found class edu.cmu.sphinx.util.props.PropertySheet, but interface was expected

    at edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model.newProperties(Model.java:159)

    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:505)

    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:287)

    at edu.cmu.sphinx.linguist.flat.FlatLinguist.setupAcousticModel(FlatLinguist.java:278)

    at edu.cmu.sphinx.linguist.flat.FlatLinguist.newProperties(FlatLinguist.java:244)

    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:505)

    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:287)

    at edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager.newProperties(SimpleBreadthFirstSearchManager.java:182)

    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:505)

    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:287)

    at edu.cmu.sphinx.decoder.AbstractDecoder.newProperties(AbstractDecoder.java:65)

    at edu.cmu.sphinx.decoder.Decoder.newProperties(Decoder.java:37)

    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:505)

    at edu.cmu.sphinx.util.props.PropertySheet.getComponent(PropertySheet.java:287)

    at edu.cmu.sphinx.recognizer.Recognizer.newProperties(Recognizer.java:90)

    at edu.cmu.sphinx.util.props.PropertySheet.getOwner(PropertySheet.java:505)

    at edu.cmu.sphinx.util.props.ConfigurationManager.lookup(ConfigurationManager.java:161)

    at HelloWorld.main(HelloWorld.java:27)

    Hi Puneet, I’m getting this exception. Please help me.

  8. Hi, I found the solution to the problem above. But now I’m trying to run the HelloNGram source file, so please help: I just want to know how to add extra sentences to the hellongram.gram file.

  11. Hi Puneet,

    I wanted to thank you for all your tutorials. I also wanted to ask: when I replace the WSJ file with a zip one, I get this exception:

    class not found !java.lang.ClassNotFoundException: edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model

    Problem configuring HelloWorld: Property exception component:’flatLinguist’ property:’acousticModel’ – component ‘wsj’ is missing

    edu.cmu.sphinx.util.props.InternalConfigurationException: component ‘wsj’ is missing

    Property exception component:’flatLinguist’ property:’acousticModel’ – component ‘wsj’ is missing

    edu.cmu.sphinx.util.props.InternalConfigurationException: component ‘wsj’ is missing

    Could you please guide me through this? Thanks a lot.

  12. I am working on speech recognition. After expanding the dictionary, I am running the HelloWorld project in Eclipse and get an error like: Exception in thread “main” java.lang.NullPointerException at edu.cmu.sphinx.util.props.SaxLoader.load(SaxLoader.java:74). How can I solve this?

  13. Bhai, it’s not working. Whenever I use the zip, it gives a null pointer exception:

    at edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader.loadProperties(ModelLoader.java:372)

    at edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader.getIsBinaryDefault(ModelLoader.java:386)

    at edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader.newProperties(ModelLoader.java:346)

    at edu.cmu.sphinx.util.props.ConfigurationManager.lookup(ConfigurationManager.java:214)

    at edu.cmu.sphinx.util.props.ValidatingPropertySheet.getComponent(ValidatingPropertySheet.java:403)

    at edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model.newProperties(Model.java:158)

    Any idea how to fix it? Or could you please send your zip file?

  14. Hi, your tutorials have been helpful, as I am working on a voice recognition project for my final year. I need a way to build a language model for assertiveness (how to be assertive): it should take the user’s voice as input, convert it to text, and then analyze the result using NLP processors. Can you help?

  15. Bro,

    I am getting that null pointer exception too:

    Loading…

    Exception in thread “main” java.lang.NullPointerException

    at edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader.loadProperties(ModelLoader.java:372)

    at edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader.getIsBinaryDefault(ModelLoader.java:386)

    at edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader.newProperties(ModelLoader.java:346)

    at edu.cmu.sphinx.util.props.ConfigurationManager.lookup(ConfigurationManager.java:214)

    at edu.cmu.sphinx.util.props.ValidatingPropertySheet.getComponent(ValidatingPropertySheet.java:403)

    at edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model.newProperties(Model.java:158)

    at edu.cmu.sphinx.util.props.ConfigurationManager.lookup(ConfigurationManager.java:214)

    at edu.cmu.sphinx.util.props.ValidatingPropertySheet.getComponent(ValidatingPropertySheet.java:403)

    at edu.cmu.sphinx.linguist.flat.FlatLinguist.setupAcousticModel(FlatLinguist.java:204)

    at edu.cmu.sphinx.linguist.flat.FlatLinguist.newProperties(FlatLinguist.java:167)

    at edu.cmu.sphinx.util.props.ConfigurationManager.lookup(ConfigurationManager.java:214)

    at edu.cmu.sphinx.util.props.ValidatingPropertySheet.getComponent(ValidatingPropertySheet.java:403)

    at edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager.newProperties(SimpleBreadthFirstSearchManager.java:183)

    at edu.cmu.sphinx.util.props.ConfigurationManager.lookup(ConfigurationManager.java:214)

    at edu.cmu.sphinx.util.props.ValidatingPropertySheet.getComponent(ValidatingPropertySheet.java:403)

    at edu.cmu.sphinx.decoder.Decoder.newProperties(Decoder.java:71)

    at edu.cmu.sphinx.util.props.ConfigurationManager.lookup(ConfigurationManager.java:214)

    at edu.cmu.sphinx.util.props.ValidatingPropertySheet.getComponent(ValidatingPropertySheet.java:403)

    at edu.cmu.sphinx.recognizer.Recognizer.newProperties(Recognizer.java:93)

    at edu.cmu.sphinx.util.props.ConfigurationManager.lookup(ConfigurationManager.java:214)

    at demo.sphinx.helloworld.HelloWorld.main(HelloWorld.java:49)

    Bhai, any idea how to solve this issue?

  16. sir,

    I want to add all English words to my .gram file. Is there any way to include all English words in a .gram file, or do I have to insert every word manually? Please help me out with this, sir.

    Thank you

  17. Dear sir,

    I have been following you for a long time, and your posts are very helpful to me. I am working on speech-to-text conversion and want to support Indian accents to increase accuracy. Do I need to work with the acoustic models, or with something else? Please help me with this, as you have done many times before.

    I will deeply appreciate your response.

    Thank you.

  18. Hello, I trained language models for Portuguese with sphinxtrain, and now I would like to use these models in Sphinx4. I followed your tutorial and it worked with the demo. Could you tell me how to use Sphinx4 with the models I created?

  19. Hey Punit, I watched your tutorials and they are very helpful, but I have a problem. My Sphinx demo HelloWorld program runs, but it does not recognize my voice. The output is: Start speaking. Ctrl-C to quit. You said: start speaking. Ctrl-C to quit. Help me.

  20. Hello sir, I want to develop an application that can play video and music and open pictures from a library using voice recognition. Is it possible to develop? How?

  21. Hi, this code is very useful to me, but in some scenarios, whenever I speak a word that is not present in the gram file, it randomly picks some word. Please help me overcome this bug.
