Puneet Kalra - www.Puneetk.com


“Developing a real-time, human-like robotic system with extraordinary artificial intelligence. Not only artificial intelligence: a system that can learn new things by itself.” ~Puneet Kalra


Expanding the Dictionary of an Acoustic Model

January 5th, 2010 by Puneet Kalra

Hello Everyone,

Today I’m going to show you how to expand the dictionary of an acoustic model for Sphinx4. In simple words, this tutorial shows how to add more words to Sphinx’s word database (the dictionary) so that it can recognize words that are not in the default acoustic models provided by CMU Sphinx. The tutorial is based on the “HelloWorld” example provided by CMU Sphinx.

Important files in this example:
1) HelloWorld.java
2) hello.gram
3) helloworld.config.xml

Acoustic model used in this example:
WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar

Let’s say we are creating a speech recognition (SR) system for ABC National Airlines. Everything goes fine, and Sphinx recognizes most of the words except the names of Indian cities and states. Now I will show you how to add the names of cities and states to the dictionary.

PART ONE
Step 1: Create a text file “words.txt”, write all the names of cities and states in it, and save it.
Step 2: Open this link: http://www.speech.cs.cmu.edu/tools/lmtool.html
Step 3: On that page, go to the “Sentence corpus file:” section, browse to the “words.txt” file and click “Compile Knowledge Base”.
Step 4: On the next page, click the “Dictionary” link and save that .DIC file.
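If you want to check what lmtool actually gave you before pasting it anywhere, a small Java sketch like the one below can list the entries. It assumes the usual CMU layout of one entry per line (the word, then whitespace, then its phones), with alternate pronunciations written as WORD(2). The class name DicFile is hypothetical; it is not part of Sphinx4.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Sphinx4): lists the entries of a CMU-style .dic file. */
public class DicFile {

    /** Maps each word to its pronunciations; alternates written as WORD(2) are grouped under WORD. */
    public static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> dict = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith(";;;")) {
                continue; // skip blank lines and cmudict-style comments
            }
            String[] parts = line.split("\\s+", 2); // [word, phone string]
            if (parts.length < 2) {
                continue; // a word with no phones is not a usable entry
            }
            String word = parts[0].replaceAll("\\(\\d+\\)$", ""); // PUNE(2) -> PUNE
            String phones = parts[1].trim().replaceAll("\\s+", " ");
            dict.computeIfAbsent(word, k -> new ArrayList<>()).add(phones);
        }
        return dict;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: the .DIC file saved from lmtool in Step 4 (use whatever name you saved it under)
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        parse(lines).forEach((word, prons) -> System.out.println(word + " -> " + prons));
    }
}
```

Run it as `java DicFile yourfile.dic`; every city or state you wrote in words.txt should show up with at least one pronunciation.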

PART TWO
Step 1: Extract the WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar file.
Step 2: Go to the edu\cmu\sphinx\model\acoustic\WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz\dict folder.
Step 3: Open the “cmudict.0.6d” file in that folder.
Step 4: Copy the entries from the .DIC file you downloaded in PART ONE, paste them into the “cmudict.0.6d” file and save it.
Step 5: Zip the extracted hierarchy back up exactly as it was. The zip file should have the same name as the JAR file (with a .zip extension instead of .jar).
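Step 4 can also be done with a few lines of Java instead of copy and paste. The sketch below is a hypothetical helper, not part of Sphinx4: it appends each lmtool entry whose word is not already in cmudict.0.6d and leaves existing words alone, so running it twice does not duplicate anything. It assumes that, like the rest of the dictionary, the new lines only need the word and its phones separated by whitespace; words are compared case-insensitively.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper (not part of Sphinx4) for PART TWO, Step 4. */
public class DictMerger {

    /** Upper-case headword of a dictionary line (PUNE(2) -> PUNE), or null for blank, comment or malformed lines. */
    static String headword(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2 || parts[0].startsWith(";;;")) {
            return null;
        }
        return parts[0].replaceAll("\\(\\d+\\)$", "").toUpperCase(Locale.ROOT);
    }

    /** Appends new entries whose word is not already in the dictionary; existing words are left untouched. */
    public static List<String> merge(List<String> dictLines, List<String> newLines) {
        Set<String> known = new HashSet<>();
        for (String line : dictLines) {
            String word = headword(line);
            if (word != null) {
                known.add(word);
            }
        }
        List<String> merged = new ArrayList<>(dictLines);
        for (String line : newLines) {
            String word = headword(line);
            // "known" only holds the ORIGINAL words, so PUNE and PUNE(2) from lmtool are both kept
            if (word != null && !known.contains(word)) {
                merged.add(line.trim());
            }
        }
        return merged;
    }

    public static void main(String[] args) throws IOException {
        Path cmudict = Paths.get(args[0]); // .../dict/cmudict.0.6d inside the extracted JAR
        Path dicFile = Paths.get(args[1]); // the .DIC file from PART ONE
        // ISO-8859-1 maps every byte to a char, so reading never fails on unexpected bytes
        List<String> dict = Files.readAllLines(cmudict, StandardCharsets.ISO_8859_1);
        List<String> merged = merge(dict, Files.readAllLines(dicFile, StandardCharsets.ISO_8859_1));
        Files.write(cmudict, merged, StandardCharsets.ISO_8859_1);
        System.out.println("Added " + (merged.size() - dict.size()) + " line(s) to " + cmudict);
    }
}
```

Skipping words that already exist is a deliberate choice: it keeps the original WSJ pronunciations intact. If you actually want to replace one, edit that line by hand.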

Now remove “WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar” from your project’s CLASSPATH and add “WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.zip” in its place.
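Step 5 (zipping the hierarchy back “as it was”) is where things usually go wrong: the zip must contain edu/... at its top level, not an extra parent folder, and entry names must use “/” even on Windows. The sketch below does that with java.util.zip; the class name ModelZipper is hypothetical. It also writes explicit folder entries, on the assumption that something may look up a folder rather than a file inside the model; plain file lookups work without them.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper (not part of Sphinx4) for PART TWO, Step 5. */
public class ModelZipper {

    /** Zips everything under sourceDir, with paths relative to it (so "edu/..." stays at the top level). */
    public static void zipDirectory(Path sourceDir, Path zipFile) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            paths = walk.filter(p -> !p.equals(sourceDir)).sorted().collect(Collectors.toList());
        }
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path p : paths) {
                // Zip entry names always use '/', even when the files were extracted on Windows
                String name = sourceDir.relativize(p).toString().replace('\\', '/');
                if (Files.isDirectory(p)) {
                    zip.putNextEntry(new ZipEntry(name + "/")); // explicit folder entry
                } else {
                    zip.putNextEntry(new ZipEntry(name));
                    Files.copy(p, zip); // file contents
                }
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: the folder the JAR was extracted into (it contains "edu")
        // args[1]: e.g. WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.zip, created OUTSIDE args[0]
        zipDirectory(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Keep the output file outside the extracted folder; otherwise a zip left over from an earlier run would be packed into the new one.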

That’s it! We are done. Sphinx will now also recognize all the names of cities and states that we wrote in the “words.txt” file, as long as those words also appear in the grammar (hello.gram).
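To double-check the result, a short Java sketch can compare the words in words.txt against the cmudict.0.6d that is actually on the classpath. It assumes the resource path from PART TWO, Step 2 (written with “/”, as class-loader paths require) and should be run with the same CLASSPATH as your Sphinx project; the class name MissingWordCheck is hypothetical. Since the class loader returns the first copy it finds, words you added that still show up as missing suggest the old JAR is still on the CLASSPATH ahead of the new ZIP.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical check (not part of Sphinx4): which words from words.txt are absent from the loaded dictionary? */
public class MissingWordCheck {
    static final String DICT_RESOURCE =
        "edu/cmu/sphinx/model/acoustic/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d";

    /** Collects upper-case headwords, folding WORD(2) into WORD and skipping blank or ";;;" lines. */
    public static Set<String> headwords(BufferedReader reader) throws IOException {
        Set<String> words = new HashSet<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String first = line.trim().split("\\s+")[0];
            if (first.isEmpty() || first.startsWith(";;;")) {
                continue;
            }
            words.add(first.replaceAll("\\(\\d+\\)$", "").toUpperCase(Locale.ROOT));
        }
        return words;
    }

    /** Splits each line of words.txt into words (so "New Delhi" checks NEW and DELHI) and returns the unknown ones. */
    public static List<String> missing(List<String> wordLines, Set<String> known) {
        Set<String> result = new LinkedHashSet<>();
        for (String line : wordLines) {
            for (String token : line.trim().split("\\s+")) {
                String word = token.toUpperCase(Locale.ROOT);
                if (!word.isEmpty() && !known.contains(word)) {
                    result.add(word);
                }
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) throws IOException {
        List<String> wordLines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        InputStream in = MissingWordCheck.class.getClassLoader().getResourceAsStream(DICT_RESOURCE);
        if (in == null) {
            System.out.println("Dictionary not found on the classpath: " + DICT_RESOURCE);
            return;
        }
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.ISO_8859_1))) {
            List<String> missing = missing(wordLines, headwords(reader));
            System.out.println(missing.isEmpty() ? "All words found." : "Missing: " + missing);
        }
    }
}
```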
Now it’s FAQ time. I will be posting FAQs and a few important notes in the comments. :)

If you have any queries, please feel free to ask.
Regards,


17 Responses

  1. King says:

    Hello sir,
    Thanks a lot for such a nice tutorial… I have one request: please provide a tutorial on changing the grammar in the HelloWorld program, i.e. how do I add or change a few more names besides the ones provided?
    For example, I want to add words like “please jhon” etc…

  2. nassir says:

    Hi friend,
    can you explain this in a video?
    Thank you

  3. Billie Alesci says:

    Because of reading your blog, I decided to start my own. I had never been interested in keeping a blog until I saw how interesting yours was, then I was inspired!

  4. Pava says:

    hello Puneet,

    we tried to extend the acoustic dictionary as you said in the tutorial.

    But we get errors like:

    16:38:40.740 WARNING dictionary Missing word: pune
    16:38:40.748 WARNING jsgfGrammar Can’t find pronunciation for Pune

    Note: we’re working in the Eclipse IDE.

  5. Puneet Kalra says:

    Hello,

    Thanks for posting, and sorry for the late reply. I was a little busy.

    Well, ..

    16:38:40.740 WARNING dictionary Missing word: pune
    16:38:40.748 WARNING jsgfGrammar Can’t find pronunciation for Pune

    ..

    This means Sphinx is unable to find the pronunciation of “Pune” in the dictionary file of the acoustic model.

    Follow the steps carefully :) and it will work for sure :)

    If you still face any problem, please feel free to post here.

    Regards,

  6. shahid says:

    But we can’t run the zip file in DOS.

  7. Puneet Kalra says:

    Hello Shahid,

    Hmm, well, I have never tried it.

    I’d suggest using the Eclipse IDE.

    OR

    You’ll have to follow the complete process of packing the files into a JAR.

    Hope it works! :)

    If you have any queries, please feel free to ask :)
    Regards,

  8. nita says:

    Hi friend…

    I want a dictionary tool for Indian English… can you help me… please

  9. Puneet Kalra says:

    Hello Nita,

    Thanks for posting,

    I just googled it. Here’s something I found for you:
    http://sourceforge.net/projects/hindiasr/

    Try this out!

    Or

    As I said in my post, if you only need to recognize a few words, then this trick is for you! :)

    Best of luck.

  10. nita says:

    Hey, thanks friend… I didn’t expect a reply from you… actually that was my first comment on any website :)… but I really want a tool for English, not Hindi :( Can you help me…

  11. Puneet Kalra says:

    Hello again :)

    Can you please be more specific? What exactly do you need?

    Regards,

  12. nita says:

    Hi..
    Step 2 in PART ONE of your post talked about a dictionary tool… right??
    I think that tool only gives pronunciations in a US accent… For my project I need a tool that gives pronunciations in an Indian English accent… hope you understand :(

  13. Puneet Kalra says:

    Hello Nita,

    Hmm, I searched on Google. I don’t think there’s any acoustic model for Indian English available for free. There are a few paid ones, from HP, IBM and a few others.

    If you need any kind of help, Please feel free to post :)

    Regards

  14. nita says:

    Ohh…ok friend..thank you very much:)

  15. nita says:

    Hello..
    Is any speech analysis done in Sphinx??

  16. nita says:

    Hi Puneet Kalra,

    I need to do speech analysis using Sphinx. Is it possible, or do we need to try some alternative software? Please let me know your suggestion on this. Also, does Sphinx return only text as output?
