Books Written By Dr. Kai-Fu Lee

Automatic Speech Recognition: The Development of the SPHINX System

Speech recognition has a long history of being one of the difficult problems in Artificial Intelligence and Computer Science. As one goes from problem-solving tasks such as puzzles and chess to perceptual tasks such as speech and vision, the problem characteristics change dramatically: knowledge poor to knowledge rich; low data rates to high data rates; slow response time (minutes to hours) to instantaneous response time. Taken together, these characteristics increase the computational complexity of the problem by several orders of magnitude. Further, speech provides a challenging task domain which embodies many of the requirements of intelligent behavior: operate in real time; exploit vast amounts of knowledge; tolerate errorful, unexpected, unknown input; use symbols and abstractions; communicate in natural language; and learn from the environment. Voice input to computers offers a number of advantages. It provides a natural, fast, hands-free, eyes-free, location-free input medium. However, there are many as yet unsolved problems that prevent routine use of speech as an input device by non-experts. These include cost, real-time response, speaker independence, robustness to variations such as noise, microphone, speech rate and loudness, and the ability to handle non-grammatical speech. Satisfactory solutions to each of these problems can be expected within the next decade. Recognition of unrestricted spontaneous continuous speech appears unsolvable at present. However, by the addition of simple constraints, such as clarification dialog to resolve ambiguity, we believe it will be possible to develop systems capable of accepting very large vocabulary continuous speech dictation.

Readings in speech recognition

Table of Contents

Speech recognition by machine: a review

D. R. Reddy

Pages: 8-38

The value of speech recognition systems

Wayne A. Lea

Pages: 39-46

Digital representations of speech signals

Ronald W. Schafer, Lawrence R. Rabiner

Pages: 49-64

Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences

Steven B. Davis, Paul Mermelstein

Pages: 65-74

Vector quantization

Robert M. Gray

Pages: 75-100

A joint synchrony/mean-rate model of auditory speech processing

Stephanie Seneff

Pages: 101-111

Isolated and connected word recognition—theory and selected applications

Lawrence R. Rabiner, Stephen E. Levinson

Pages: 115-153

Minimum prediction residual principle applied to speech recognition

Fumitada Itakura

Pages: 154-158

Dynamic programming algorithm optimization for spoken word recognition

Hiroaki Sakoe, Seibi Chiba

Pages: 159-165

Speaker-independent recognition of isolated words using clustering techniques

Lawrence R. Rabiner, Stephen E. Levinson, Aaron E. Rosenberg, Jay G. Wilpon

Pages: 166-179

Two-level DP-matching—a dynamic programming-based pattern matching algorithm for connected word recognition

Hiroaki Sakoe

Pages: 180-187

The use of a one-stage dynamic programming algorithm for connected word recognition

Hermann Ney

Pages: 188-196

The use of speech knowledge in automatic speech recognition

Victor W. Zue

Pages: 200-213

Performing fine phonetic distinctions: templates versus features

Ronald A. Cole, Richard M. Stern, Moshé J. Lasry

Pages: 214-224

Recognition of speaker-dependent continuous speech with KEAL

G. Mercier, D. Bigorgne, L. Miclet, L. Le Guennec, M.


Pages: 225-234

The Hearsay-II speech understanding system: a tutorial

Lee D. Erman, Victor R. Lesser

Pages: 235-245

Learning and plan refinement in a knowledge-based system for automatic speech recognition

Renato De Mori, Lily Lam, Michel Gilloux

Pages: 246-262

A tutorial on hidden Markov models and selected applications in speech recognition

Lawrence R. Rabiner

Pages: 267-296

Stochastic modeling for automatic speech understanding

James K. Baker

Pages: 297-307

A maximum likelihood approach to continuous speech recognition

Lalit R. Bahl, Frederick Jelinek, Robert L. Mercer

Pages: 308-319

High performance connected digit recognition using hidden Markov models

Lawrence R. Rabiner, Jay G. Wilpon, Frank K. Soong

Pages: 320-331

Speech recognition with continuous-parameter hidden Markov models

Lalit R. Bahl, Peter F. Brown, Peter V. de Souza, Robert L. Mercer

Pages: 332-339

Semi-continuous hidden Markov models for speech signals

X. D. Huang, M. A. Jack

Pages: 340-346

Context-dependent phonetic hidden Markov models for speaker-independent continuous speech recognition

Kai-Fu Lee

Pages: 347-366

A stochastic segment model for phoneme-based continuous speech recognition

S. Roucos, M. O. Dunham

Pages: 367-370

Review of neural networks for speech recognition

Richard P. Lippmann

Pages: 374-392

Phoneme recognition using time-delay neural networks

Alexander Waibel, Toshiyuki Hanazawa, Geoffrey Hinton, Kiyohiro Shikano, Kevin J. Lang

Pages: 393-404

Consonant recognition by modular construction of large phonemic time-delay neural networks

Alex Waibel, Hidefumi Sawai, Kiyohiro Shikano

Pages: 405-408

Learned phonetic discrimination using connectionist networks

R. L. Watrous, L. Shastri, A. H. Waibel

Pages: 409-412

Shift-tolerant LVQ and hybrid LVQ-HMM for phoneme recognition

Erik McDermott, Hitoshi


, Shigeru Katagiri, Yoh’ichi Tohkura

Pages: 425-438

The “neural” phonetic typewriter

Teuvo Kohonen

Pages: 413-424

Speaker-independent word recognition using dynamic programming neural networks

Hiroaki Sakoe, Ryosuke Isotani, Kazunaga Yoshida, Ken-ichi Iso, Takao Watanabe

Pages: 439-442

Speaker-independent word recognition using a neural prediction model

Ken-ichi Iso, Takao Watanabe

Pages: 443-446

Self-organized language modeling for speech recognition

F. Jelinek

Pages: 450-506

A tree-based statistical language model for natural language speech recognition

Lalit R. Bahl, Peter F. Brown, Peter V. de Souza, Robert L. Mercer

Pages: 507-514

Modification of Earley’s algorithm for speech recognition



Pages: 515-518

Language processing for speech understanding

W. A. Woods

Pages: 519-533

Prosodic knowledge sources for word hypothesization in a continuous speech recognition system

Alex Waibel

Pages: 534-537

High level knowledge sources in usable speech recognition systems

Sheryl R. Young, Alexander G. Hauptmann, Wayne H. Ward, Edward T. Smith, Philip Werner

Pages: 538-549

Review of the ARPA speech understanding project

D. H. Klatt

Pages: 554-575

The Harpy speech understanding system

Bruce Lowerre

Pages: 576-586

The development of an experimental discrete dictation recognizer

Frederick Jelinek

Pages: 587-595

BYBLOS: the BBN continuous speech recognition system

Y. L. Chow, M. O. Dunham, O. A. Kimball, M. A. Krasner, G. F. Kubala, J. Makhoul, P. J. Price, S. Roucos, R. M. Schwartz

Pages: 596-599

An overview of the SPHINX speech recognition system

Kai-Fu Lee, Hsiao-Wuen Hon, Raj Reddy

Pages: 600-610

ATR HMM-LR continuous speech recognition system



, Kenji Kita, Satoshi Nakamura, Takeshi Kawabata, Kiyohiro Shikano

Pages: 611-614

A word hypothesizer for a large vocabulary continuous speech understanding system

L. Fissore, P. Laface, G. Micca, R. Pieraccini

Pages: 615-618