ASR - Automatic Speech Recognition

Sonic


Overview

SONIC is a toolkit for research and development of new algorithms for continuous speech recognition. The software has been under development at the Center for Spoken Language Research (CSLR) since March 2001. SONIC serves as a test bed for integrating new ideas and for supporting research activities at the University of Colorado that include speech recognition as a core component. The toolkit is far from complete and is not meant as a general-purpose HMM toolkit (as is the case for the HTK toolkit from Cambridge). Instead, SONIC is designed specifically for speech recognition research, with careful attention to the speed and efficiency needed for real-time use in live applications.

Main Features

SONIC is based on continuous density hidden Markov model (CDHMM) technology. The acoustic models are decision-tree state-clustered HMMs with associated gamma probability density functions to model state durations.
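As an illustration of what a gamma state-duration model computes, the following is a minimal sketch, not SONIC's implementation. The shape/scale parameterization, the function names, and the method-of-moments fit are assumptions made for this example; the text above does not say how SONIC estimates or applies these densities.

```python
import math

def gamma_log_pdf(duration, shape, scale):
    """Log of the gamma density at a state duration (in frames).

    Hypothetical helper: uses the shape/scale parameterization
    p(d) = d^(shape-1) * exp(-d/scale) / (Gamma(shape) * scale^shape).
    """
    if duration <= 0:
        return float("-inf")
    return ((shape - 1.0) * math.log(duration)
            - duration / scale
            - math.lgamma(shape)
            - shape * math.log(scale))

def fit_gamma_moments(durations):
    """Method-of-moments estimate of (shape, scale) from observed durations.

    Assumes at least two distinct duration values so the variance is non-zero.
    """
    n = len(durations)
    mean = sum(durations) / n
    var = sum((d - mean) ** 2 for d in durations) / n
    return mean * mean / var, var / mean
```

In a decoder, a log-density of this kind would typically be added to the path score when a state is exited, penalizing implausibly short or long stays in that state.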

The recognizer implements a two-pass search strategy. The first pass consists of a time-synchronous, beam-pruned Viterbi token-passing search through a lexical prefix tree. Cross-word acoustic models and trigram or four-gram language models are applied in the first pass of search. During the second pass, the resulting word lattice is converted into a word graph. Longer-span language models can be used to rescore the word graph using an A* algorithm or to compute word-posterior probabilities to provide word-level confidence scores.
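The idea behind the first pass can be sketched on a toy HMM. The code below is a simplified illustration of time-synchronous, beam-pruned Viterbi token passing over a flat state set; it is not SONIC's decoder and omits the lexical prefix tree, cross-word models, and language model. The function names and data layout are hypothetical.

```python
import math

def prune(tokens, beam):
    """Drop tokens whose score is more than `beam` below the best token."""
    if not tokens:
        return tokens
    best = max(score for score, _ in tokens.values())
    return {s: t for s, t in tokens.items() if t[0] >= best - beam}

def beam_viterbi(frame_loglik, log_trans, log_init, beam):
    """Return (best_log_score, best_state_path).

    frame_loglik: list over frames of {state: log p(observation | state)}
    log_trans:    {state: {next_state: log transition probability}}
    log_init:     {state: log initial probability}
    """
    # One token per state: (accumulated log score, state history).
    tokens = {s: (lp + frame_loglik[0][s], [s]) for s, lp in log_init.items()}
    tokens = prune(tokens, beam)
    for loglik in frame_loglik[1:]:
        new_tokens = {}
        # Advance every surviving token by exactly one frame (time-synchronous).
        for state, (score, path) in tokens.items():
            for nxt, lt in log_trans[state].items():
                cand = score + lt + loglik[nxt]
                # Viterbi recombination: keep only the best token per state.
                if nxt not in new_tokens or cand > new_tokens[nxt][0]:
                    new_tokens[nxt] = (cand, path + [nxt])
        tokens = prune(new_tokens, beam)
    return max(tokens.values(), key=lambda t: t[0])
```

A wider beam keeps more competing hypotheses alive at each frame and lowers the risk of pruning the eventual best path, at the cost of more computation per frame.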

The recognizer toolkit consists of a core speech recognition engine and programming interface (API). The current implementation allows for two modes of speech recognition:

Keyword / Grammar Decoding – continuous speech recognition constrained by a finite-state grammar. This mode also allows for keyword and grammar spotting capabilities;
N-gram Decoding – speech recognition based on statistical n-gram language models.
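The difference between the two modes can be illustrated with a small sketch: a finite-state grammar either accepts or rejects a word sequence, whereas an n-gram model assigns every sequence a score. Everything below is hypothetical and for illustration only. The deterministic grammar table, the add-k smoothed bigram model, and all names are assumptions, and none of this reflects SONIC's API or its language-model format.

```python
import math
from collections import Counter

def grammar_accepts(words, transitions, start, finals):
    """True if `words` traces a path from `start` to a final grammar state.

    transitions: {(state, word): next_state} for a deterministic grammar.
    """
    state = start
    for word in words:
        key = (state, word)
        if key not in transitions:
            return False
        state = transitions[key]
    return state in finals

def train_bigram(sentences, k=1.0):
    """Build an add-k smoothed bigram model; returns a log-probability function."""
    history, bigrams, vocab = Counter(), Counter(), set()
    for sent in sentences:
        toks = ["<s>"] + sent + ["</s>"]
        vocab.update(toks)
        for prev, cur in zip(toks, toks[1:]):
            history[prev] += 1
            bigrams[(prev, cur)] += 1
    v = len(vocab)

    def logprob(words):
        toks = ["<s>"] + words + ["</s>"]
        return sum(
            math.log((bigrams[(p, c)] + k) / (history[p] + k * v))
            for p, c in zip(toks, toks[1:])
        )

    return logprob
```

With a grammar such as `{(0, "call"): 1, (1, "home"): 2}`, the sequence "home call" is simply rejected, while the bigram model still scores it, only with a lower probability than "call home".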

SONIC incorporates speaker adaptation and normalization methods such as Maximum Likelihood Linear Regression (MLLR), Vocal Tract Length Normalization (VTLN), and cepstral mean and variance normalization. In addition, advanced language-modeling strategies such as concept language models are incorporated into the toolkit.
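Of the methods listed, cepstral mean and variance normalization is the simplest to show in code. The following is a minimal per-utterance sketch, assuming features arrive as a list of equal-length cepstral vectors; the function name and the `eps` guard are assumptions, and SONIC may well compute its statistics differently (for example, incrementally for live input).

```python
import math

def cmvn(frames, eps=1e-8):
    """Normalize each cepstral dimension to zero mean and unit variance.

    frames: non-empty list of equal-length feature vectors (lists of floats).
    eps:    small constant guarding against division by zero.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[i] for f in frames) / n for i in range(dim)]
    stds = [
        math.sqrt(sum((f[i] - means[i]) ** 2 for f in frames) / n)
        for i in range(dim)
    ]
    return [
        [(f[i] - means[i]) / (stds[i] + eps) for i in range(dim)]
        for f in frames
    ]
```

Subtracting the mean removes stationary channel effects that appear as a constant offset in the cepstral domain, and scaling by the standard deviation reduces mismatch in feature dynamic range between recordings.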

References

Pellom B. 2001. "SONIC: The University of Colorado Continuous Speech Recognizer", Technical Report TR-CSLR-2001-01, University of Colorado, USA, 2001.

Pellom B. and Hacioglu K. 2003. "Recent Improvements in the CU SONIC ASR System for Noisy Speech: The SPINE Task", Proc. ICASSP, Hong Kong, 2003.

Cosi P. and Pellom B. 2005. "Italian Children’s Speech Recognition For Advanced Interactive Literacy Tutors", in CD-ROM Proceedings INTERSPEECH 2005, Lisbon, Portugal, 2005, pp. 2201-2204.

Cosi P. 2008. "Recent Advances in Sonic Italian Children’s Speech Recognition for Interactive Literacy Tutors", in Proceedings of 1st Workshop on Child, Computer and Interaction (ICMI'08, 10th International Conference on Multimodal Interfaces, post-conference workshop), Chania, Crete, Greece, October 23, 2008, CD-ROM.

For more information please contact:

Piero Cosi

Istituto di Scienze e Tecnologie della Cognizione - Sede Secondaria di Padova "ex Istituto di Fonetica e Dialettologia";
CNR di Padova (e-mail: piero.cosi@pd.istc.cnr.it).