
Books Written By Dr. Kai-Fu Lee
Automatic Speech Recognition: The Development of the SPHINX System
Speech recognition has a long history as one of the difficult problems in artificial intelligence and computer science. As one moves from problem-solving tasks such as puzzles and chess to perceptual tasks such as speech and vision, the problem characteristics change dramatically: from knowledge-poor to knowledge-rich, from low data rates to high data rates, and from slow response times (minutes to hours) to instantaneous response. Taken together, these characteristics increase the computational complexity of the problem by several orders of magnitude. Further, speech provides a challenging task domain that embodies many of the requirements of intelligent behavior: operate in real time; exploit vast amounts of knowledge; tolerate errorful, unexpected, or unknown input; use symbols and abstractions; communicate in natural language; and learn from the environment.

Voice input to computers offers a number of advantages. It provides a natural, fast, hands-free, eyes-free, location-free input medium. However, many still-unsolved problems prevent routine use of speech as an input device by non-experts. These include cost, real-time response, speaker independence, robustness to variations such as noise, microphone, speech rate, and loudness, and the ability to handle non-grammatical speech. Satisfactory solutions to each of these problems can be expected within the next decade. Recognition of unrestricted spontaneous continuous speech appears unsolvable at present. However, by adding simple constraints, such as a clarification dialog to resolve ambiguity, we believe it will be possible to develop systems capable of accepting very large vocabulary continuous speech dictation.
Readings in Speech Recognition
Table of Contents
Speech recognition by machine: a review | D. R. Reddy | Pages: 8-38
The value of speech recognition systems | Wayne A. Lea | Pages: 39-46
Digital representations of speech signals | Ronald W. Schafer, Lawrence R. Rabiner | Pages: 49-64
Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences | Steven B. Davis, Paul Mermelstein | Pages: 65-74
Vector quantization | Robert M. Gray | Pages: 75-100
A joint synchrony/mean-rate model of auditory speech processing | Stephanie Seneff | Pages: 101-111
Isolated and connected word recognition—theory and selected applications | Lawrence R. Rabiner, Stephen E. Levinson | Pages: 115-153
Minimum prediction residual principle applied to speech recognition | Fumitada Itakura | Pages: 154-158
Dynamic programming algorithm optimization for spoken word recognition | Hiroaki Sakoe, Seibi Chiba | Pages: 159-165
Speaker-independent recognition of isolated words using clustering techniques | Lawrence R. Rabiner, Stephen E. Levinson, Aaron E. Rosenberg, Jay G. Wilpon | Pages: 166-179
Two-level DP-matching—a dynamic programming-based pattern matching algorithm for connected word recognition | Hiroaki Sakoe | Pages: 180-187
The use of a one-stage dynamic programming algorithm for connected word recognition | Hermann Ney | Pages: 188-196
The use of speech knowledge in automatic speech recognition | Victor W. Zue | Pages: 200-213
Performing fine phonetic distinctions: templates versus features | Ronald A. Cole, Richard M. Stern, Moshé J. Lasry | Pages: 214-224
Recognition of speaker-dependent continuous speech with KEAL | G. Mercier, D. Bigorgne, L. Miclet, L. Le Guennec, M. Querre | Pages: 225-234
The Hearsay-II speech understanding system: a tutorial | Lee D. Erman, Victor R. Lesser | Pages: 235-245
Learning and plan refinement in a knowledge-based system for automatic speech recognition | Renato De Mori, Lily Lam, Michel Gilloux | Pages: 246-262
A tutorial on hidden Markov models and selected applications in speech recognition | Lawrence R. Rabiner | Pages: 267-296
Stochastic modeling for automatic speech understanding | James K. Baker | Pages: 297-307
A maximum likelihood approach to continuous speech recognition | Lalit R. Bahl, Frederick Jelinek, Robert L. Mercer | Pages: 308-319
High performance connected digit recognition using hidden Markov models | Lawrence R. Rabiner, Jay G. Wilpon, Frank K. Soong | Pages: 320-331
Speech recognition with continuous-parameter hidden Markov models | Lalit R. Bahl, Peter F. Brown, Peter V. de Souza, Robert L. Mercer | Pages: 332-339
Semi-continuous hidden Markov models for speech signals | X. D. Huang, M. A. Jack | Pages: 340-346
Context-dependent phonetic hidden Markov models for speaker-independent continuous speech recognition | Kai-Fu Lee | Pages: 347-366
A stochastic segment model for phoneme-based continuous speech recognition | S. Roucos, M. O. Dunham | Pages: 367-370
Review of neural networks for speech recognition | Richard P. Lippmann | Pages: 374-392
Phoneme recognition using time-delay neural networks | Alexander Waibel, Toshiyuki Hanazawa, Geoffrey Hinton, Kiyohiro Shikano, Kevin J. Lang | Pages: 393-404
Consonant recognition by modular construction of large phonemic time-delay neural networks | Alex Waibel, Hidefumi Sawai, Kiyohiro Shikano | Pages: 405-408
Learned phonetic discrimination using connectionist networks | R. L. Watrous, L. Shastri, A. H. Waibel | Pages: 409-412
The “neural” phonetic typewriter | Teuvo Kohonen | Pages: 413-424
Shift-tolerant LVQ and hybrid LVQ-HMM for phoneme recognition | Erik McDermott, Hitoshi Iwamida, Shigeru Katagiri, Yoh’ichi Tohkura | Pages: 425-438
Speaker-independent word recognition using dynamic programming neural networks | Hiroaki Sakoe, Ryosuke Isotani, Kazunaga Yoshida, Ken-ichi Iso, Takao Watanabe | Pages: 439-442
Speaker-independent word recognition using a neural prediction model | Ken-ichi Iso, Takao Watanabe | Pages: 443-446
Self-organized language modeling for speech recognition | F. Jelinek | Pages: 450-506
A tree-based statistical language model for natural language speech recognition | Lalit R. Bahl, Peter F. Brown, Peter V. de Souza, Robert L. Mercer | Pages: 507-514
Modification of Earley’s algorithm for speech recognition | Annedore Paeseler | Pages: 515-518
Language processing for speech understanding | W. A. Woods | Pages: 519-533
Prosodic knowledge sources for word hypothesization in a continuous speech recognition system | Alex Waibel | Pages: 534-537
High level knowledge sources in usable speech recognition systems | Sheryl R. Young, Alexander G. Hauptmann, Wayne H. Ward, Edward T. Smith, Philip Werner | Pages: 538-549
Review of the ARPA speech understanding project | D. H. Klatt | Pages: 554-575
The Harpy speech understanding system | Bruce Lowerre | Pages: 576-586
The development of an experimental discrete dictation recognizer | Frederick Jelinek | Pages: 587-595
BYBLOS: the BBN continuous speech recognition system | Y. L. Chow, M. O. Dunham, O. A. Kimball, M. A. Krasner, G. F. Kubala, J. Makhoul, P. J. Price, S. Roucos, R. M. Schwartz | Pages: 596-599
An overview of the SPHINX speech recognition system | Kai-Fu Lee, Hsiao-Wuen Hon, Raj Reddy | Pages: 600-610
ATR HMM-LR continuous speech recognition system | Toshiyuki Hanazawa, Kenji Kita, Satoshi Nakamura, Takeshi Kawabata, Kiyohiro Shikano | Pages: 611-614
A word hypothesizer for a large vocabulary continuous speech understanding system | L. Fissore, P. Laface, G. Micca, R. Pieraccini | Pages: 615-618