Bavieca 



Overview

The BAVIECA ASR toolkit was developed by Daniel Bolaños at Boulder Learning Inc.

The BAVIECA toolkit includes a set of command-line tools that can be used to build large-vocabulary speech recognition systems from scratch. According to Bolaños (2012), "BAVIECA is an open-source speech recognition toolkit intended for speech research and system development. The toolkit supports lattice-based discriminative training, wide phonetic-context, efficient acoustic scoring, large n-gram language models, and the most common feature and model transformations. BAVIECA is written entirely in C++ and presents a simple and modular design with an emphasis on scalability and reusability. BAVIECA achieves competitive results in standard benchmarks. The toolkit is distributed under the highly unrestricted Apache 2.0 license, and is freely available on Source Forge".

Moreover, as stated on its official website (http://www.bavieca.org/tools.html#), BAVIECA "offers an Application Programming Interface (API) that exposes speech processing features such as speech recognition, speech activity detection, forced alignment, etc. This API is provided as a C++ library that can be used to create stand-alone applications that exploit BAVIECA's speech recognition features".

Compared to existing open-source automatic speech recognition (ASR) toolkits, such as HTK (Young et al., 2009), CMU-Sphinx (Lee et al., 1990; Walker et al., 2004), RWTH (Rybach et al., 2009), JULIUS (Lee et al., 2001) and the more recent Kaldi (Povey et al., 2011), BAVIECA is characterized by a simple and modular design that favors scalability and reusability, a small code base, a focus on real-time performance and a highly unrestricted license.

Main Features

The list below, taken from the BAVIECA web page (www.bavieca.org), summarizes the main features of the toolkit.

Large vocabulary continuous speech recognition

Acoustic modeling

Language modeling

Speaker adaptation

Feature extraction

Lattice processing and n-best list generation

Speech activity detection

References

Bolaños D. (2012), "The Bavieca Open-Source Speech Recognition Toolkit". In Proceedings of the IEEE Workshop on Spoken Language Technology (SLT), December 2-5, 2012, Miami, FL, USA.

Young S., Evermann G., Gales M., Hain T., Kershaw D., Liu X., Moore G., Odell J., Ollason D., Povey D., Valtchev V., and Woodland P. (2009), The HTK Book (for version 3.4). Cambridge Univ. Eng. Dept.

Lee K.F., Hon H.W., and Reddy R. (1990), "An overview of the SPHINX speech recognition system". In IEEE Transactions on Acoustics, Speech and Signal Processing 38(1), 35-45.

Walker W., Lamere P., Kwok P., Raj B., Singh R., Gouvea E., Wolf P., and Woelfel J. (2004), "Sphinx-4: A Flexible Open Source Framework for Speech Recognition". Sun Microsystems Inc., Technical Report SML1 TR2004-0811.

Rybach D., Gollan C., Heigold G., Hoffmeister B., Lööf J., Schlüter R., and Ney H. (2009), "The RWTH Aachen University Open Source Speech Recognition System". In Proceedings of INTERSPEECH 2009, 2111-2114.

Lee A., Kawahara T., and Shikano K. (2001), "JULIUS - an open source real-time large vocabulary recognition engine". In Proceedings of INTERSPEECH 2001, 1691-1694.

Povey D., Ghoshal A., Boulianne G., Burget L., Glembek O., Goel N., Hannemann M., Motlíček P., Qian Y., Schwarz P., Silovský J., Stemmer G., and Veselý K. (2011), "The Kaldi Speech Recognition Toolkit". In Proceedings of ASRU 2011.



For more information, please contact:

Piero Cosi, Istituto di Scienze e Tecnologie della Cognizione, Sede Secondaria di Padova ("ex Istituto di Fonetica e Dialettologia"), CNR di Padova (e-mail: piero.cosi@pd.istc.cnr.it).

 

