Techniques for noise robustness in automatic speech recognition / editors, Tuomas Virtanen, Rita Singh, Bhiksha Raj
Author Virtanen, Tuomas
Edition [1st edition]
Publication/distribution/printing Chichester, West Sussex, U.K. : Wiley, 2012
Physical description 1 online resource (516 p.)
Classification 006.4/54
Other authors (persons) Virtanen, Tuomas
Singh, Rita
Raj, Bhiksha
Topical subject Automatic speech recognition
Speech processing systems
ISBN 1-283-64550-5
1-118-39268-X
1-118-39267-1
1-118-39266-3
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note -- List of Contributors xv -- Acknowledgments xvii -- 1 Introduction 1 / Tuomas Virtanen, Rita Singh, Bhiksha Raj -- 1.1 Scope of the Book 1 -- 1.2 Outline 2 -- 1.3 Notation 4 -- Part One FOUNDATIONS -- 2 The Basics of Automatic Speech Recognition 9 / Rita Singh, Bhiksha Raj, Tuomas Virtanen -- 2.1 Introduction 9 -- 2.2 Speech Recognition Viewed as Bayes Classification 10 -- 2.3 Hidden Markov Models 11 -- 2.3.1 Computing Probabilities with HMMs 12 -- 2.3.2 Determining the State Sequence 17 -- 2.3.3 Learning HMM Parameters 19 -- 2.3.4 Additional Issues Relating to Speech Recognition Systems 20 -- 2.4 HMM-Based Speech Recognition 24 -- 2.4.1 Representing the Signal 24 -- 2.4.2 The HMM for a Word Sequence 25 -- 2.4.3 Searching through all Word Sequences 26 -- References 29 -- 3 The Problem of Robustness in Automatic Speech Recognition 31 / Bhiksha Raj, Tuomas Virtanen, Rita Singh -- 3.1 Errors in Bayes Classification 31 -- 3.1.1 Type 1 Condition: Mismatch Error 33 -- 3.1.2 Type 2 Condition: Increased Bayes Error 34 -- 3.2 Bayes Classification and ASR 35 -- 3.2.1 All We Have is a Model: A Type 1 Condition 35 -- 3.2.2 Intrinsic Interferences - Signal Components that are Unrelated to the Message: A Type 2 Condition 36 -- 3.2.3 External Interferences - The Data are Noisy: Type 1 and Type 2 Conditions 36 -- 3.3 External Influences on Speech Recordings 36 -- 3.3.1 Signal Capture 37 -- 3.3.2 Additive Corruptions 41 -- 3.3.3 Reverberation 42 -- 3.3.4 A Simplified Model of Signal Capture 43 -- 3.4 The Effect of External Influences on Recognition 44 -- 3.5 Improving Recognition under Adverse Conditions 46 -- 3.5.1 Handling the Model Mismatch Error 46 -- 3.5.2 Dealing with Intrinsic Variations in the Data 47 -- 3.5.3 Dealing with Extrinsic Variations 47 -- References 50 -- Part Two SIGNAL ENHANCEMENT -- 4 Voice Activity Detection, Noise Estimation, and Adaptive Filters for Acoustic Signal Enhancement 53 / Rainer Martin, Dorothea Kolossa -- 4.1 Introduction 53 -- 4.2 Signal Analysis and Synthesis 55 -- 4.2.1 DFT-Based Analysis Synthesis with Perfect Reconstruction 55 -- 4.2.2 Probability Distributions for Speech and Noise DFT Coefficients 57 -- 4.3 Voice Activity Detection 58 -- 4.3.1 VAD Design Principles 58 -- 4.3.2 Evaluation of VAD Performance 62 -- 4.3.3 Evaluation in the Context of ASR 62 -- 4.4 Noise Power Spectrum Estimation 65 -- 4.4.1 Smoothing Techniques 65 -- 4.4.2 Histogram and GMM Noise Estimation Methods 67 -- 4.4.3 Minimum Statistics Noise Power Estimation 67 -- 4.4.4 MMSE Noise Power Estimation 68 -- 4.4.5 Estimation of the A Priori Signal-to-Noise Ratio 69 -- 4.5 Adaptive Filters for Signal Enhancement 71 -- 4.5.1 Spectral Subtraction 71 -- 4.5.2 Nonlinear Spectral Subtraction 73 -- 4.5.3 Wiener Filtering 74 -- 4.5.4 The ETSI Advanced Front End 75 -- 4.5.5 Nonlinear MMSE Estimators 75 -- 4.6 ASR Performance 80 -- 4.7 Conclusions 81 -- References 82 -- 5 Extraction of Speech from Mixture Signals 87 / Paris Smaragdis -- 5.1 The Problem with Mixtures 87 -- 5.2 Multichannel Mixtures 88 -- 5.2.1 Basic Problem Formulation 88 -- 5.2.2 Convolutive Mixtures 92 -- 5.3 Single-Channel Mixtures 98 -- 5.3.1 Problem Formulation 98 -- 5.3.2 Learning Sound Models 100 -- 5.3.3 Separation by Spectrogram Factorization 101 -- 5.3.4 Dealing with Unknown Sounds 105 -- 5.4 Variations and Extensions 107 -- 5.5 Conclusions 107 -- References 107 -- 6 Microphone Arrays 109 / John McDonough, Kenichi Kumatani -- 6.1 Speaker Tracking 110 -- 6.2 Conventional Microphone Arrays 113 -- 6.3 Conventional Adaptive Beamforming Algorithms 120 -- 6.3.1 Minimum Variance Distortionless Response Beamformer 120 -- 6.3.2 Noise Field Models 122 -- 6.3.3 Subband Analysis and Synthesis 123 -- 6.3.4 Beamforming Performance Criteria 126 -- 6.3.5 Generalized Sidelobe Canceller Implementation 129 -- 6.3.6 Recursive Implementation of the GSC 130 -- 6.3.7 Other Conventional GSC Beamformers 131 -- 6.3.8 Beamforming based on Higher Order Statistics 132 -- 6.3.9 Online Implementation 136 -- 6.3.10 Speech-Recognition Experiments 140 --
6.4 Spherical Microphone Arrays 142 -- 6.5 Spherical Adaptive Algorithms 148 -- 6.6 Comparative Studies 149 -- 6.7 Comparison of Linear and Spherical Arrays for DSR 152 -- 6.8 Conclusions and Further Reading 154 -- References 155 -- Part Three FEATURE ENHANCEMENT -- 7 From Signals to Speech Features by Digital Signal Processing 161 / Matthias Wölfel -- 7.1 Introduction 161 -- 7.1.1 About this Chapter 162 -- 7.2 The Speech Signal 162 -- 7.3 Spectral Processing 163 -- 7.3.1 Windowing 163 -- 7.3.2 Power Spectrum 165 -- 7.3.3 Spectral Envelopes 166 -- 7.3.4 LP Envelope 166 -- 7.3.5 MVDR Envelope 169 -- 7.3.6 Warping the Frequency Axis 171 -- 7.3.7 Warped LP Envelope 175 -- 7.3.8 Warped MVDR Envelope 176 -- 7.3.9 Comparison of Spectral Estimates 177 -- 7.3.10 The Spectrogram 179 -- 7.4 Cepstral Processing 179 -- 7.4.1 Definition and Calculation of Cepstral Coefficients 180 -- 7.4.2 Characteristics of Cepstral Sequences 181 -- 7.5 Influence of Distortions on Different Speech Features 182 -- 7.5.1 Objective Functions 182 -- 7.5.2 Robustness against Noise 185 -- 7.5.3 Robustness against Echo and Reverberation 187 -- 7.5.4 Robustness against Changes in Fundamental Frequency 189 -- 7.6 Summary and Further Reading 191 -- References 191 -- 8 Features Based on Auditory Physiology and Perception 193 / Richard M. Stern, Nelson Morgan -- 8.1 Introduction 193 -- 8.2 Some Attributes of Auditory Physiology and Perception 194 -- 8.2.1 Peripheral Processing 194 -- 8.2.2 Processing at more Central Levels 200 -- 8.2.3 Psychoacoustical Correlates of Physiological Observations 202 -- 8.2.4 The Impact of Auditory Processing on Conventional Feature Extraction 206 -- 8.2.5 Summary 208 -- 8.3 “Classic” Auditory Representations 208 -- 8.4 Current Trends in Auditory Feature Analysis 213 -- 8.5 Summary 221 -- Acknowledgments 222 -- References 222 -- 9 Feature Compensation 229 / Jasha Droppo -- 9.1 Life in an Ideal World 229 -- 9.1.1 Noise Robustness Tasks 229 -- 9.1.2 Probabilistic Feature Enhancement 230 --
9.1.3 Gaussian Mixture Models 231 -- 9.2 MMSE-SPLICE 232 -- 9.2.1 Parameter Estimation 233 -- 9.2.2 Results 236 -- 9.3 Discriminative SPLICE 237 -- 9.3.1 The MMI Objective Function 238 -- 9.3.2 Training the Front-End Parameters 239 -- 9.3.3 The Rprop Algorithm 240 -- 9.3.4 Results 241 -- 9.4 Model-Based Feature Enhancement 242 -- 9.4.1 The Additive Noise-Mixing Equation 243 -- 9.4.2 The Joint Probability Model 244 -- 9.4.3 Vector Taylor Series Approximation 246 -- 9.4.4 Estimating Clean Speech 247 -- 9.4.5 Results 247 -- 9.5 Switching Linear Dynamic System 248 -- 9.6 Conclusion 249 -- References 249 -- 10 Reverberant Speech Recognition 251 / Reinhold Haeb-Umbach, Alexander Krueger -- 10.1 Introduction 251 -- 10.2 The Effect of Reverberation 252 -- 10.2.1 What is Reverberation? 252 -- 10.2.2 The Relationship between Clean and Reverberant Speech Features 254 -- 10.2.3 The Effect of Reverberation on ASR Performance 258 -- 10.3 Approaches to Reverberant Speech Recognition 258 -- 10.3.1 Signal-Based Techniques 259 -- 10.3.2 Front-End Techniques 260 -- 10.3.3 Back-End Techniques 262 -- 10.3.4 Concluding Remarks 265 -- 10.4 Feature Domain Model of the Acoustic Impulse Response 265 -- 10.5 Bayesian Feature Enhancement 267 -- 10.5.1 Basic Approach 268 -- 10.5.2 Measurement Update 269 -- 10.5.3 Time Update 270 -- 10.5.4 Inference 271 -- 10.6 Experimental Results 272 -- 10.6.1 Databases 272 -- 10.6.2 Overview of the Tested Methods 273 -- 10.6.3 Recognition Results on Reverberant Speech 274 -- 10.6.4 Recognition Results on Noisy Reverberant Speech 276 -- 10.7 Conclusions 277 -- Acknowledgment 278 -- References 278 -- Part Four MODEL ENHANCEMENT -- 11 Adaptation and Discriminative Training of Acoustic Models 285 / Yannick Estève, Paul Deléglise -- 11.1 Introduction 285 -- 11.1.1 Acoustic Models 286 -- 11.1.2 Maximum Likelihood Estimation 287 -- 11.2 Acoustic Model Adaptation and Noise Robustness 288 -- 11.2.1 Static (or Offline) Adaptation 289 -- 11.2.2 Dynamic (or Online) Adaptation 289 --
11.3 Maximum A Posteriori Reestimation 290 -- 11.4 Maximum Likelihood Linear Regression 293 -- 11.4.1 Class Regression Tree 294 -- 11.4.2 Constrained Maximum Likelihood Linear Regression 297 -- 11.4.3 CMLLR Implementation 297 -- 11.4.4 Speaker Adaptive Training 298 -- 11.5 Discriminative Training 299 -- 11.5.1 MMI Discriminative Training Criterion 301 -- 11.5.2 MPE Discriminative Training Criterion 302 -- 11.5.3 I-smoothing 303 -- 11.5.4 MPE Implementation 304 -- 11.6 Conclusion 307 -- References 308 -- 12 Factorial Models for Noise Robust Speech Recognition 311 / John R. Hershey, Steven J. Rennie, Jonathan Le Roux -- 12.1 Introduction 311 -- 12.2 The Model-Based Approach 313 -- 12.3 Signal Feature Domains 314 -- 12.4 Interaction Models 317 -- 12.4.1 Exact Interaction Model 318 -- 12.4.2 Max Model 320 -- 12.4.3 Log-Sum Model 321 -- 12.4.4 Mel Interaction Model 321 -- 12.5 Inference Methods 322 -- 12.5.1 Max Model Inference 322 -- 12.5.2 Parallel Model Combination 324 -- 12.5.3 Vector Taylor Series Approaches 326 -- 12.5.4 SNR-Dependent Approaches 331 -- 12.6 Efficient Likelihood Evaluation in Factorial Models 332 -- 12.6.1 Efficient Inference using the Max Model 332 -- 12.6.2 Efficient Vector-Taylor Series Approaches 334 -- 12.6.3 Band Quantization 335 -- 12.7 Current Directions 337 -- 12.7.1 Dynamic Noise Models for Robust ASR 338 -- 12.7.2 Multi-Talker Speech Recognition using Graphical Models 339 -- 12.7.3 Noise Robust ASR using Non-Negative Basis Representations 340 -- References 341 -- 13 Acoustic Model Training for Robust Speech Recognition 347 / Michael L. Seltzer -- 13.1 Introduction 347 -- 13.2 Traditional Training Methods for Robust Speech Recognition 348 -- 13.3 A Brief Overview of Speaker Adaptive Training 349 -- 13.4 Feature-Space Noise Adaptive Training 351 -- 13.4.1 Experiments using fNAT 352 -- 13.5 Model-Space Noise Adaptive Training 353 -- 13.6 Noise Adaptive Training using VTS Adaptation 355 -- 13.6.1 Vector Taylor Series HMM Adaptation 355 -- 13.6.2 Updating the Acoustic Model Parameters 357 --
13.6.3 Updating the Environmental Parameters 360 -- 13.6.4 Implementation Details 360 -- 13.6.5 Experiments using NAT 361 -- 13.7 Discussion 364 -- 13.7.1 Comparison of Training Algorithms 364 -- 13.7.2 Comparison to Speaker Adaptive Training 364 -- 13.7.3 Related Adaptive Training Methods 365 -- 13.8 Conclusion 366 -- References 366 -- Part Five COMPENSATION FOR INFORMATION LOSS -- 14 Missing-Data Techniques: Recognition with Incomplete Spectrograms 371 / Jon Barker -- 14.1 Introduction 371 -- 14.2 Classification with Incomplete Data 373 -- 14.2.1 A Simple Missing Data Scenario 374 -- 14.2.2 Missing Data Theory 376 -- 14.2.3 Validity of the MAR Assumption 378 -- 14.2.4 Marginalising Acoustic Models 379 -- 14.3 Energetic Masking 381 -- 14.3.1 The Max Approximation 381 -- 14.3.2 Bounded Marginalisation 382 -- 14.3.3 Missing Data ASR in the Cepstral Domain 384 -- 14.3.4 Missing Data ASR with Dynamic Features 386 -- 14.4 Meta-Missing Data: Dealing with Mask Uncertainty 388 -- 14.4.1 Missing Data with Soft Masks 388 -- 14.4.2 Sub-band Combination Approaches 391 -- 14.4.3 Speech Fragment Decoding 393 -- 14.5 Some Perspectives on Performance 395 -- References 396 -- 15 Missing-Data Techniques: Feature Reconstruction 399 / Jort Florent Gemmeke, Ulpu Remes -- 15.1 Introduction 399 -- 15.2 Missing-Data Techniques 401 -- 15.3 Correlation-Based Imputation 402 -- 15.3.1 Fundamentals 402 -- 15.3.2 Implementation 404 -- 15.4 Cluster-Based Imputation 406 -- 15.4.1 Fundamentals 406 -- 15.4.2 Implementation 408 -- 15.4.3 Advances 409 -- 15.5 Class-Conditioned Imputation 411 -- 15.5.1 Fundamentals 411 -- 15.5.2 Implementation 412 -- 15.5.3 Advances 413 -- 15.6 Sparse Imputation 414 -- 15.6.1 Fundamentals 414 -- 15.6.2 Implementation 416 -- 15.6.3 Advances 418 -- 15.7 Other Feature-Reconstruction Methods 420 -- 15.7.1 Parametric Approaches 420 -- 15.7.2 Nonparametric Approaches 421 -- 15.8 Experimental Results 421 -- 15.8.1 Feature-Reconstruction Methods 422 -- 15.8.2 Comparison with Other Methods 424 -- 15.8.3 Advances 426 -- 15.8.4 Combination with Other Methods 427 -- 15.9 Discussion and Conclusion 428 -- Acknowledgments 429 -- References 430 -- 16 Computational Auditory Scene Analysis and Automatic Speech Recognition 433 / Arun Narayanan, DeLiang Wang -- 16.1 Introduction 433 -- 16.2 Auditory Scene Analysis 434 -- 16.3 Computational Auditory Scene Analysis 435 -- 16.3.1 Ideal Binary Mask 435 -- 16.3.2 Typical CASA Architecture 438 -- 16.4 CASA Strategies 440 -- 16.4.1 IBM Estimation Based on Local SNR Estimates 440 -- 16.4.2 IBM Estimation using ASA Cues 442 -- 16.4.3 IBM Estimation as Binary Classification 448 -- 16.4.4 Binaural Mask Estimation Strategies 451 -- 16.5 Integrating CASA with ASR 452 -- 16.5.1 Uncertainty Transform Model 454 -- 16.6 Concluding Remarks 458 -- Acknowledgment 458 -- References 458 -- 17 Uncertainty Decoding 463 / Hank Liao -- 17.1 Introduction 463 -- 17.2 Observation Uncertainty 465 -- 17.3 Uncertainty Decoding 466 -- 17.4 Feature-Based Uncertainty Decoding 468 -- 17.4.1 SPLICE with Uncertainty 470 -- 17.4.2 Front-End Joint Uncertainty Decoding 471 -- 17.4.3 Issues with Feature-Based Uncertainty Decoding 472 -- 17.5 Model-Based Joint Uncertainty Decoding 473 -- 17.5.1 Parameter Estimation 475 -- 17.5.2 Comparisons with Other Methods 476 -- 17.6 Noisy CMLLR 477 -- 17.7 Uncertainty and Adaptive Training 480 -- 17.7.1 Gradient-Based Methods 481 -- 17.7.2 Factor Analysis Approaches 482 -- 17.8 In Combination with Other Techniques 483 -- 17.9 Conclusions 484 -- References 485 -- Index 487.
Record no. UNINA-9910141378103321
Virtanen, Tuomas
Chichester, West Sussex, U.K. : Wiley, 2012
Printed material
Available at: Univ. Federico II
Techniques for noise robustness in automatic speech recognition / editors, Tuomas Virtanen, Rita Singh, Bhiksha Raj
Record no. UNINA-9910820996203321
Virtanen, Tuomas
Chichester, West Sussex, U.K. : Wiley, 2012
Printed material
Available at: Univ. Federico II