"I Think, I Conceptualize, I Analyze, I Design, and I Create." ~ Puneet Kalra

Cognitive Robotics Research Centre Of University Of Wales, Newport Puneet Kalra


Puneet Kalra - www.puneetk.com - Socializing Robots

Expanding Dictionary Of Acoustic Model

January 5th, 2010 by Puneet Kalra

Hello Everyone,

Today I’m going to show you how to expand the dictionary that ships inside the acoustic model for Sphinx4. In simple words, this tutorial shows how to add more words to Sphinx’s word database (the dictionary) so that it can recognize words that are not available in the default acoustic models provided by CMU Sphinx. This tutorial is based on the “HelloWorld” example provided by CMU Sphinx.

Important files in this example :
1) HelloWorld.java
2) hello.gram
3) helloworld.config.xml

Acoustic Model used in this example :
WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar

Let’s say we are creating a speech recognition (SR) system for ABC National Airlines. Everything will go fine and Sphinx will recognize most of the words, except the names of the cities and states of India. Now I will show you how to add the names of cities and states to the dictionary.

PART ONE
Step 1 : Create a text file “words.txt”, write all the names of the cities and states in it, and save it.
Step 2 : Open this link : http://www.speech.cs.cmu.edu/tools/lmtool.html
Step 3 : On that page, go to the “Sentence corpus file:” section, browse to the “words.txt” file, and click “Compile Knowledge Base”.
Step 4 : On the next page, click the “Dictionary” link and save the .DIC file.

PART TWO
Step 1 : Extract the WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar file.
Step 2 : Go to the edu\cmu\sphinx\model\acoustic\WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz\dict folder.
Step 3 : Open the “cmudict.0.6d” file in that folder.
Step 4 : Copy the contents of the .DIC file you downloaded in PART ONE, paste them into the “cmudict.0.6d” file, and save.
Step 5 : Zip the extracted hierarchy back up exactly as it was, and give the ZIP file the same name as the JAR file (with a .zip extension).
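
If you would rather not copy and paste by hand, Step 4 can be scripted. The following is only a minimal sketch, not part of Sphinx4: the class name DictionaryMerger and its methods are my own hypothetical names. It assumes each dictionary line starts with the word, followed by whitespace and the phonemes, and it skips any new entry whose word is already present (compared case-insensitively). Alternate pronunciations such as WORD(2) are treated as separate entries.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of Sphinx4): appends new dictionary entries
// to an existing dictionary without duplicating words that are already there.
final class DictionaryMerger {

    // First whitespace-delimited token of a non-blank line, upper-cased.
    // Assumption: dictionary lines look like "WORD<whitespace>PHONEMES".
    public static String headword(String line) {
        return line.trim().split("\\s+", 2)[0].toUpperCase(Locale.ROOT);
    }

    // Returns the base lines followed by every new line whose word is not yet known.
    public static List<String> merge(List<String> baseLines, List<String> newLines) {
        Set<String> seen = new HashSet<>();
        for (String line : baseLines) {
            if (!line.trim().isEmpty()) {
                seen.add(headword(line));
            }
        }
        List<String> merged = new ArrayList<>(baseLines);
        for (String line : newLines) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines in the downloaded .DIC file
            }
            if (seen.add(headword(line))) {
                merged.add(line.trim());
            }
        }
        return merged;
    }

    // Usage: java DictionaryMerger <path to cmudict.0.6d> <path to downloaded .DIC>
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: DictionaryMerger <base dictionary> <new .DIC file>");
            return;
        }
        Path base = Paths.get(args[0]);
        Path extra = Paths.get(args[1]);
        // ISO-8859-1 decodes any byte sequence, so unexpected bytes do not abort the read.
        List<String> merged = merge(
                Files.readAllLines(base, StandardCharsets.ISO_8859_1),
                Files.readAllLines(extra, StandardCharsets.ISO_8859_1));
        Files.write(base, merged, StandardCharsets.ISO_8859_1); // overwrites the base file
    }
}
```

The sketch overwrites the base dictionary in place, so keep a backup copy of “cmudict.0.6d” before running it.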

Now remove the “WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar” file from the project’s CLASSPATH and add “WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.zip” in its place.
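
The re-zipping in Step 5 can also be done from Java. This is a rough sketch under my own assumptions, with hypothetical names (ModelZipper, zipDirectory, entryName). The point it illustrates is that entry names inside the archive should be relative to the extracted root and use forward slashes, so that the “edu” folder sits at the top level of the ZIP just as it did in the JAR.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper (not part of Sphinx4): zips an extracted model folder.
final class ModelZipper {

    // Archive entry name for a file: path relative to the root, with '/' separators.
    public static String entryName(Path root, Path file) {
        return root.relativize(file).toString().replace('\\', '/');
    }

    // Writes every regular file under root into zipFile.
    public static void zipDirectory(Path root, Path zipFile) throws IOException {
        List<Path> files;
        // Collect the file list first, so the output ZIP is never picked up by the walk.
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                zip.putNextEntry(new ZipEntry(entryName(root, file)));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
    }

    // Usage: java ModelZipper <extracted folder> <output .zip>
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ModelZipper <extracted folder> <output zip>");
            return;
        }
        zipDirectory(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Pass the folder that directly contains “edu” as the first argument; if you pass its parent instead, the paths inside the archive will not match those of the original JAR.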

That’s it! We are done. Now Sphinx will also recognize all the names of cities and states that we wrote in the “words.txt” file, as long as those words are also listed in the grammar file (hello.gram).
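
Before running the demo, it can help to confirm that the new words really ended up in the repackaged dictionary. Here is a small sketch with hypothetical names (DictionaryCheck, missingWords). It assumes the dictionary sits at the same path inside the ZIP as in PART TWO, and that a word counts as present when it matches the first token of a dictionary line, ignoring case.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper (not part of Sphinx4): reports words.txt entries
// that are absent from the dictionary inside the repackaged ZIP.
final class DictionaryCheck {

    // Dictionary location inside the archive, taken from PART TWO of this tutorial.
    static final String DICT_ENTRY =
            "edu/cmu/sphinx/model/acoustic/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d";

    // Returns every whitespace-separated word from wordLines that has no dictionary entry.
    public static List<String> missingWords(List<String> dictLines, List<String> wordLines) {
        Set<String> known = new HashSet<>();
        for (String line : dictLines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                known.add(trimmed.split("\\s+", 2)[0].toUpperCase(Locale.ROOT));
            }
        }
        List<String> missing = new ArrayList<>();
        for (String line : wordLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Multi-word names are checked word by word.
            for (String word : trimmed.split("\\s+")) {
                if (!known.contains(word.toUpperCase(Locale.ROOT))) {
                    missing.add(word);
                }
            }
        }
        return missing;
    }

    // Usage: java DictionaryCheck <model .zip> <words.txt>
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: DictionaryCheck <model zip> <words.txt>");
            return;
        }
        List<String> dictLines = new ArrayList<>();
        try (ZipFile zip = new ZipFile(args[0])) {
            ZipEntry entry = zip.getEntry(DICT_ENTRY);
            if (entry == null) {
                System.err.println("Dictionary not found in archive: " + DICT_ENTRY);
                return;
            }
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                    zip.getInputStream(entry), StandardCharsets.ISO_8859_1))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    dictLines.add(line);
                }
            }
        }
        List<String> words = Files.readAllLines(Paths.get(args[1]), StandardCharsets.ISO_8859_1);
        System.out.println("Missing from dictionary: " + missingWords(dictLines, words));
    }
}
```

If the “Dictionary not found” message appears, the archive layout most likely differs from the original JAR, which is worth fixing before anything else.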
Now, FAQ time. I will be posting an FAQ and a few important notes in the comments. :)

If you have any queries, please feel free to ask.
Regards,

60 Responses

  1. Abubakkar says:

    How should i make sphinx4 recognise all the english words????

  2. sravani says:

    hi sir

    can u please tell us how to add extra words in gram file which already added to the dict file?? even after adding these words in gram file while running the class file of demo program we are getting exception

    java.lang.NoClassDefFoundError: HelloWorld/edu/cmu/sphinx/demo/helloworld/HelloWorld (wrong name: edu/cmu/sphinx/demo/helloworld/HelloWorld)
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(Unknown Source)
    at java.security.SecureClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.access$000(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClassInternal(Unknown Source)
    Exception in thread “main” .

    can u please solve our prob sir???

  3. Niphin says:

    Thanx for your marvelous guidelines. I did expanded my acoustic model to recognize extra words. But my problem is that i cant run the project. the displayed errors are:

    Exception in thread “main” java.lang.NullPointerException
    at edu.cmu.sphinx.util.props.SaxLoader.load(SaxLoader.java:74)
    at edu.cmu.sphinx.util.props.ConfigurationManager.<init>(ConfigurationManager.java:58)
    at edu.cmu.sphinx.demo.helloworld.HelloWorld.main(HelloWorld.java:33)

    I didn’t included the HelloWorld.jar in sphinx library,If I did the application only recognizes the words ‘hello will’ etc as in the demo. Please help me with this its very important for my academics

  4. Acs says:

    Hi Puneet!
    I am working on speech recognition and using Sphinx 4, i am trying to execute system commands like running an exe file using voice command. I read out d specifications of Sphinx and can now run Hello World Demo effectively. But still i am unable to add new words to it. For e.g. I want to add Open and Close commands to grammer, I edited d gram file for same but my prog still printing words from good morning | hello etc those were specfied in prev .gram file. I read smwhr that it can be done wid Apache Ant, but i am unable to figure out how to do so..? Please Help !!

