Spoken language processing [electronic resource] / edited by Joseph Mariani
Publication/distribution London : ISTE
Physical description 1 online resource (505 p.)
Classification 006.4/54
006.454
Other authors (persons) Mariani, Joseph
Series ISTE
Topical subject Automatic speech recognition
Speech processing systems
ISBN 1-282-25394-8
9786613814593
0-470-61118-9
0-470-39381-5
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Spoken Language Processing; Table of Contents; Preface; Chapter 1. Speech Analysis; 1.1. Introduction; 1.1.1. Source-filter model; 1.1.2. Speech sounds; 1.1.3. Sources; 1.1.4. Vocal tract; 1.1.5. Lip-radiation; 1.2. Linear prediction; 1.2.1. Source-filter model and linear prediction; 1.2.2. Autocorrelation method: algorithm; 1.2.3. Lattice filter; 1.2.4. Models of the excitation; 1.3. Short-term Fourier transform; 1.3.1. Spectrogram; 1.3.2. Interpretation in terms of filter bank; 1.3.3. Block-wise interpretation; 1.3.4. Modification and reconstruction; 1.4. A few other representations;
1.4.1. Bilinear time-frequency representations; 1.4.2. Wavelets; 1.4.3. Cepstrum; 1.4.4. Sinusoidal and harmonic representations; 1.5. Conclusion; 1.6. References; Chapter 2. Principles of Speech Coding; 2.1. Introduction; 2.1.1. Main characteristics of a speech coder; 2.1.2. Key components of a speech coder; 2.2. Telephone-bandwidth speech coders; 2.2.1. From predictive coding to CELP; 2.2.2. Improved CELP coders; 2.2.3. Other coders for telephone speech; 2.3. Wideband speech coding; 2.3.1. Transform coding; 2.3.2. Predictive transform coding; 2.4. Audiovisual speech coding;
2.4.1. A transmission channel for audiovisual speech; 2.4.2. Joint coding of audio and video parameters; 2.4.3. Prospects; 2.5. References; Chapter 3. Speech Synthesis; 3.1. Introduction; 3.2. Key goal: speaking for communicating; 3.2.1. What acoustic content?; 3.2.2. What melody?; 3.2.3. Beyond the strict minimum; 3.3. Synoptic presentation of the elementary modules in speech synthesis systems; 3.3.1. Linguistic processing; 3.3.2. Acoustic processing; 3.3.3. Training models automatically; 3.3.4. Operational constraints; 3.4. Description of linguistic processing; 3.4.1. Text pre-processing;
3.4.2. Grapheme-to-phoneme conversion; 3.4.3. Syntactic-prosodic analysis; 3.4.4. Prosodic analysis; 3.5. Acoustic processing methodology; 3.5.1. Rule-based synthesis; 3.5.2. Unit-based concatenative synthesis; 3.6. Speech signal modeling; 3.6.1. The source-filter assumption; 3.6.2. Articulatory model; 3.6.3. Formant-based modeling; 3.6.4. Auto-regressive modeling; 3.6.5. Harmonic plus noise model; 3.7. Control of prosodic parameters: the PSOLA technique; 3.7.1. Methodology background; 3.7.2. The ancestors of the method; 3.7.3. Descendants of the method; 3.7.4. Evaluation;
3.8. Towards variable-size acoustic units; 3.8.1. Constitution of the acoustic database; 3.8.2. Selection of sequences of units; 3.9. Applications and standardization; 3.10. Evaluation of speech synthesis; 3.10.1. Introduction; 3.10.2. Global evaluation; 3.10.3. Analytical evaluation; 3.10.4. Summary for speech synthesis evaluation; 3.11. Conclusions; 3.12. References; Chapter 4. Facial Animation for Visual Speech; 4.1. Introduction; 4.2. Applications of facial animation for visual speech; 4.2.1. Animation movies; 4.2.2. Telecommunications; 4.2.3. Human-machine interfaces;
4.2.4. A tool for speech research
Record no. UNINA-9910139497403321
Held at: Univ. Federico II
Spoken language processing [electronic resource] / edited by Joseph Mariani
Publication/distribution London : ISTE
Physical description 1 online resource (505 p.)
Classification 006.4/54
006.454
Other authors (persons) Mariani, Joseph
Series ISTE
Topical subject Automatic speech recognition
Speech processing systems
ISBN 1-282-25394-8
9786613814593
0-470-61118-9
0-470-39381-5
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Spoken Language Processing; Table of Contents; Preface; Chapter 1. Speech Analysis; 1.1. Introduction; 1.1.1. Source-filter model; 1.1.2. Speech sounds; 1.1.3. Sources; 1.1.4. Vocal tract; 1.1.5. Lip-radiation; 1.2. Linear prediction; 1.2.1. Source-filter model and linear prediction; 1.2.2. Autocorrelation method: algorithm; 1.2.3. Lattice filter; 1.2.4. Models of the excitation; 1.3. Short-term Fourier transform; 1.3.1. Spectrogram; 1.3.2. Interpretation in terms of filter bank; 1.3.3. Block-wise interpretation; 1.3.4. Modification and reconstruction; 1.4. A few other representations;
1.4.1. Bilinear time-frequency representations; 1.4.2. Wavelets; 1.4.3. Cepstrum; 1.4.4. Sinusoidal and harmonic representations; 1.5. Conclusion; 1.6. References; Chapter 2. Principles of Speech Coding; 2.1. Introduction; 2.1.1. Main characteristics of a speech coder; 2.1.2. Key components of a speech coder; 2.2. Telephone-bandwidth speech coders; 2.2.1. From predictive coding to CELP; 2.2.2. Improved CELP coders; 2.2.3. Other coders for telephone speech; 2.3. Wideband speech coding; 2.3.1. Transform coding; 2.3.2. Predictive transform coding; 2.4. Audiovisual speech coding;
2.4.1. A transmission channel for audiovisual speech; 2.4.2. Joint coding of audio and video parameters; 2.4.3. Prospects; 2.5. References; Chapter 3. Speech Synthesis; 3.1. Introduction; 3.2. Key goal: speaking for communicating; 3.2.1. What acoustic content?; 3.2.2. What melody?; 3.2.3. Beyond the strict minimum; 3.3. Synoptic presentation of the elementary modules in speech synthesis systems; 3.3.1. Linguistic processing; 3.3.2. Acoustic processing; 3.3.3. Training models automatically; 3.3.4. Operational constraints; 3.4. Description of linguistic processing; 3.4.1. Text pre-processing;
3.4.2. Grapheme-to-phoneme conversion; 3.4.3. Syntactic-prosodic analysis; 3.4.4. Prosodic analysis; 3.5. Acoustic processing methodology; 3.5.1. Rule-based synthesis; 3.5.2. Unit-based concatenative synthesis; 3.6. Speech signal modeling; 3.6.1. The source-filter assumption; 3.6.2. Articulatory model; 3.6.3. Formant-based modeling; 3.6.4. Auto-regressive modeling; 3.6.5. Harmonic plus noise model; 3.7. Control of prosodic parameters: the PSOLA technique; 3.7.1. Methodology background; 3.7.2. The ancestors of the method; 3.7.3. Descendants of the method; 3.7.4. Evaluation;
3.8. Towards variable-size acoustic units; 3.8.1. Constitution of the acoustic database; 3.8.2. Selection of sequences of units; 3.9. Applications and standardization; 3.10. Evaluation of speech synthesis; 3.10.1. Introduction; 3.10.2. Global evaluation; 3.10.3. Analytical evaluation; 3.10.4. Summary for speech synthesis evaluation; 3.11. Conclusions; 3.12. References; Chapter 4. Facial Animation for Visual Speech; 4.1. Introduction; 4.2. Applications of facial animation for visual speech; 4.2.1. Animation movies; 4.2.2. Telecommunications; 4.2.3. Human-machine interfaces;
4.2.4. A tool for speech research
Record no. UNINA-9910830012803321
Held at: Univ. Federico II
Spoken language processing [electronic resource] / edited by Joseph Mariani
Publication/distribution London : ISTE
Physical description 1 online resource (505 p.)
Classification 006.4/54
006.454
Other authors (persons) Mariani, Joseph
Series ISTE
Topical subject Automatic speech recognition
Speech processing systems
ISBN 1-282-25394-8
9786613814593
0-470-61118-9
0-470-39381-5
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Spoken Language Processing; Table of Contents; Preface; Chapter 1. Speech Analysis; 1.1. Introduction; 1.1.1. Source-filter model; 1.1.2. Speech sounds; 1.1.3. Sources; 1.1.4. Vocal tract; 1.1.5. Lip-radiation; 1.2. Linear prediction; 1.2.1. Source-filter model and linear prediction; 1.2.2. Autocorrelation method: algorithm; 1.2.3. Lattice filter; 1.2.4. Models of the excitation; 1.3. Short-term Fourier transform; 1.3.1. Spectrogram; 1.3.2. Interpretation in terms of filter bank; 1.3.3. Block-wise interpretation; 1.3.4. Modification and reconstruction; 1.4. A few other representations;
1.4.1. Bilinear time-frequency representations; 1.4.2. Wavelets; 1.4.3. Cepstrum; 1.4.4. Sinusoidal and harmonic representations; 1.5. Conclusion; 1.6. References; Chapter 2. Principles of Speech Coding; 2.1. Introduction; 2.1.1. Main characteristics of a speech coder; 2.1.2. Key components of a speech coder; 2.2. Telephone-bandwidth speech coders; 2.2.1. From predictive coding to CELP; 2.2.2. Improved CELP coders; 2.2.3. Other coders for telephone speech; 2.3. Wideband speech coding; 2.3.1. Transform coding; 2.3.2. Predictive transform coding; 2.4. Audiovisual speech coding;
2.4.1. A transmission channel for audiovisual speech; 2.4.2. Joint coding of audio and video parameters; 2.4.3. Prospects; 2.5. References; Chapter 3. Speech Synthesis; 3.1. Introduction; 3.2. Key goal: speaking for communicating; 3.2.1. What acoustic content?; 3.2.2. What melody?; 3.2.3. Beyond the strict minimum; 3.3. Synoptic presentation of the elementary modules in speech synthesis systems; 3.3.1. Linguistic processing; 3.3.2. Acoustic processing; 3.3.3. Training models automatically; 3.3.4. Operational constraints; 3.4. Description of linguistic processing; 3.4.1. Text pre-processing;
3.4.2. Grapheme-to-phoneme conversion; 3.4.3. Syntactic-prosodic analysis; 3.4.4. Prosodic analysis; 3.5. Acoustic processing methodology; 3.5.1. Rule-based synthesis; 3.5.2. Unit-based concatenative synthesis; 3.6. Speech signal modeling; 3.6.1. The source-filter assumption; 3.6.2. Articulatory model; 3.6.3. Formant-based modeling; 3.6.4. Auto-regressive modeling; 3.6.5. Harmonic plus noise model; 3.7. Control of prosodic parameters: the PSOLA technique; 3.7.1. Methodology background; 3.7.2. The ancestors of the method; 3.7.3. Descendants of the method; 3.7.4. Evaluation;
3.8. Towards variable-size acoustic units; 3.8.1. Constitution of the acoustic database; 3.8.2. Selection of sequences of units; 3.9. Applications and standardization; 3.10. Evaluation of speech synthesis; 3.10.1. Introduction; 3.10.2. Global evaluation; 3.10.3. Analytical evaluation; 3.10.4. Summary for speech synthesis evaluation; 3.11. Conclusions; 3.12. References; Chapter 4. Facial Animation for Visual Speech; 4.1. Introduction; 4.2. Applications of facial animation for visual speech; 4.2.1. Animation movies; 4.2.2. Telecommunications; 4.2.3. Human-machine interfaces;
4.2.4. A tool for speech research
Record no. UNINA-9910841631303321
Held at: Univ. Federico II
Sprachkommunikation 2012 : Beiträge zur 10. ITG-Fachtagung vom 26. bis 28. September 2012 in Braunschweig / wissenschaftliche Tagungsleitung, Prof. Dr.-Ing. Tim Fingscheidt, Technische Universität Braunschweig, Institut für Nachrichtentechnik, Prof. Dr.-Ing. Walter Kellermann, Friedrich-Alexander-Universität Erlangen-Nürnberg, Lehrstuhl für Multimediakommunikation und Signalverarbeitung ; Veranstalter, Informationstechnische Gesellschaft im VDE (ITG), ITG-Fachausschüsse 4.3 "Sprachakustik" und 4.4 "Sprachverarbeitung."
Publication/distribution VDE
Physical description 296 pages : illustrations ; 30 cm
Classification 006.4/54
Other authors (persons) Fingscheidt, Tim <1966->
Kellermann, Walter <1965->
Series ITG-Fachbericht
Topical subject Speech processing systems
Automatic speech recognition
Format Printed material
Bibliographic level Monograph
Language of publication eng
Record no. UNISA-996281146803316
Held at: Univ. di Salerno
Techniques for noise robustness in automatic speech recognition / editors, Tuomas Virtanen, Rita Singh, Bhiksha Raj
Author Virtanen, Tuomas
Edition [1st edition]
Publication/distribution Chichester, West Sussex, U.K. : Wiley, 2012
Physical description 1 online resource (516 p.)
Classification 006.4/54
Other authors (persons) Virtanen, Tuomas
Singh, Rita
Raj, Bhiksha
Topical subject Automatic speech recognition
Speech processing systems
ISBN 1-283-64550-5
1-118-39268-X
1-118-39267-1
1-118-39266-3
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note -- List of Contributors xv -- Acknowledgments xvii -- 1 Introduction 1 / Tuomas Virtanen, Rita Singh, Bhiksha Raj -- 1.1 Scope of the Book 1 -- 1.2 Outline 2 -- 1.3 Notation 4 -- Part One FOUNDATIONS -- 2 The Basics of Automatic Speech Recognition 9 / Rita Singh, Bhiksha Raj, Tuomas Virtanen -- 2.1 Introduction 9 -- 2.2 Speech Recognition Viewed as Bayes Classification 10 -- 2.3 Hidden Markov Models 11 -- 2.3.1 Computing Probabilities with HMMs 12 -- 2.3.2 Determining the State Sequence 17 -- 2.3.3 Learning HMM Parameters 19 -- 2.3.4 Additional Issues Relating to Speech Recognition Systems 20 -- 2.4 HMM-Based Speech Recognition 24 -- 2.4.1 Representing the Signal 24 -- 2.4.2 The HMM for a Word Sequence 25 -- 2.4.3 Searching through all Word Sequences 26 -- References 29 -- 3 The Problem of Robustness in Automatic Speech Recognition 31 / Bhiksha Raj, Tuomas Virtanen, Rita Singh -- 3.1 Errors in Bayes Classification 31 -- 3.1.1 Type 1 Condition: Mismatch Error 33 -- 3.1.2 Type 2 Condition: Increased Bayes Error 34 -- 3.2 Bayes Classification and ASR 35 -- 3.2.1 All We Have is a Model: A Type 1 Condition 35 -- 3.2.2 Intrinsic Interferences - Signal Components that are Unrelated to the Message: A Type 2 Condition 36 -- 3.2.3 External Interferences - The Data are Noisy: Type 1 and Type 2 Conditions 36 -- 3.3 External Influences on Speech Recordings 36 -- 3.3.1 Signal Capture 37 -- 3.3.2 Additive Corruptions 41 -- 3.3.3 Reverberation 42 -- 3.3.4 A Simplified Model of Signal Capture 43 -- 3.4 The Effect of External Influences on Recognition 44 -- 3.5 Improving Recognition under Adverse Conditions 46 -- 3.5.1 Handling the Model Mismatch Error 46 -- 3.5.2 Dealing with Intrinsic Variations in the Data 47 -- 3.5.3 Dealing with Extrinsic Variations 47 -- References 50 -- Part Two SIGNAL ENHANCEMENT -- 4 Voice Activity Detection, Noise Estimation, and Adaptive Filters for Acoustic Signal Enhancement 53 / Rainer Martin, Dorothea Kolossa -- 4.1 Introduction 53 -- 4.2 Signal Analysis and Synthesis 55 --
4.2.1 DFT-Based Analysis Synthesis with Perfect Reconstruction 55 -- 4.2.2 Probability Distributions for Speech and Noise DFT Coefficients 57 -- 4.3 Voice Activity Detection 58 -- 4.3.1 VAD Design Principles 58 -- 4.3.2 Evaluation of VAD Performance 62 -- 4.3.3 Evaluation in the Context of ASR 62 -- 4.4 Noise Power Spectrum Estimation 65 -- 4.4.1 Smoothing Techniques 65 -- 4.4.2 Histogram and GMM Noise Estimation Methods 67 -- 4.4.3 Minimum Statistics Noise Power Estimation 67 -- 4.4.4 MMSE Noise Power Estimation 68 -- 4.4.5 Estimation of the A Priori Signal-to-Noise Ratio 69 -- 4.5 Adaptive Filters for Signal Enhancement 71 -- 4.5.1 Spectral Subtraction 71 -- 4.5.2 Nonlinear Spectral Subtraction 73 -- 4.5.3 Wiener Filtering 74 -- 4.5.4 The ETSI Advanced Front End 75 -- 4.5.5 Nonlinear MMSE Estimators 75 -- 4.6 ASR Performance 80 -- 4.7 Conclusions 81 -- References 82 -- 5 Extraction of Speech from Mixture Signals 87 / Paris Smaragdis -- 5.1 The Problem with Mixtures 87 -- 5.2 Multichannel Mixtures 88 -- 5.2.1 Basic Problem Formulation 88 -- 5.2.2 Convolutive Mixtures 92 -- 5.3 Single-Channel Mixtures 98 -- 5.3.1 Problem Formulation 98 -- 5.3.2 Learning Sound Models 100 -- 5.3.3 Separation by Spectrogram Factorization 101 -- 5.3.4 Dealing with Unknown Sounds 105 -- 5.4 Variations and Extensions 107 -- 5.5 Conclusions 107 -- References 107 -- 6 Microphone Arrays 109 / John McDonough, Kenichi Kumatani -- 6.1 Speaker Tracking 110 -- 6.2 Conventional Microphone Arrays 113 -- 6.3 Conventional Adaptive Beamforming Algorithms 120 -- 6.3.1 Minimum Variance Distortionless Response Beamformer 120 -- 6.3.2 Noise Field Models 122 -- 6.3.3 Subband Analysis and Synthesis 123 -- 6.3.4 Beamforming Performance Criteria 126 -- 6.3.5 Generalized Sidelobe Canceller Implementation 129 -- 6.3.6 Recursive Implementation of the GSC 130 -- 6.3.7 Other Conventional GSC Beamformers 131 -- 6.3.8 Beamforming based on Higher Order Statistics 132 -- 6.3.9 Online Implementation 136 -- 6.3.10 Speech-Recognition Experiments 140 --
6.4 Spherical Microphone Arrays 142 -- 6.5 Spherical Adaptive Algorithms 148 -- 6.6 Comparative Studies 149 -- 6.7 Comparison of Linear and Spherical Arrays for DSR 152 -- 6.8 Conclusions and Further Reading 154 -- References 155 -- Part Three FEATURE ENHANCEMENT -- 7 From Signals to Speech Features by Digital Signal Processing 161 / Matthias Wölfel -- 7.1 Introduction 161 -- 7.1.1 About this Chapter 162 -- 7.2 The Speech Signal 162 -- 7.3 Spectral Processing 163 -- 7.3.1 Windowing 163 -- 7.3.2 Power Spectrum 165 -- 7.3.3 Spectral Envelopes 166 -- 7.3.4 LP Envelope 166 -- 7.3.5 MVDR Envelope 169 -- 7.3.6 Warping the Frequency Axis 171 -- 7.3.7 Warped LP Envelope 175 -- 7.3.8 Warped MVDR Envelope 176 -- 7.3.9 Comparison of Spectral Estimates 177 -- 7.3.10 The Spectrogram 179 -- 7.4 Cepstral Processing 179 -- 7.4.1 Definition and Calculation of Cepstral Coefficients 180 -- 7.4.2 Characteristics of Cepstral Sequences 181 -- 7.5 Influence of Distortions on Different Speech Features 182 -- 7.5.1 Objective Functions 182 -- 7.5.2 Robustness against Noise 185 -- 7.5.3 Robustness against Echo and Reverberation 187 -- 7.5.4 Robustness against Changes in Fundamental Frequency 189 -- 7.6 Summary and Further Reading 191 -- References 191 -- 8 Features Based on Auditory Physiology and Perception 193 / Richard M. Stern, Nelson Morgan -- 8.1 Introduction 193 -- 8.2 Some Attributes of Auditory Physiology and Perception 194 -- 8.2.1 Peripheral Processing 194 -- 8.2.2 Processing at more Central Levels 200 -- 8.2.3 Psychoacoustical Correlates of Physiological Observations 202 -- 8.2.4 The Impact of Auditory Processing on Conventional Feature Extraction 206 -- 8.2.5 Summary 208 -- 8.3 “Classic” Auditory Representations 208 -- 8.4 Current Trends in Auditory Feature Analysis 213 -- 8.5 Summary 221 -- Acknowledgments 222 -- References 222 -- 9 Feature Compensation 229 / Jasha Droppo -- 9.1 Life in an Ideal World 229 -- 9.1.1 Noise Robustness Tasks 229 -- 9.1.2 Probabilistic Feature Enhancement 230 --
9.1.3 Gaussian Mixture Models 231 -- 9.2 MMSE-SPLICE 232 -- 9.2.1 Parameter Estimation 233 -- 9.2.2 Results 236 -- 9.3 Discriminative SPLICE 237 -- 9.3.1 The MMI Objective Function 238 -- 9.3.2 Training the Front-End Parameters 239 -- 9.3.3 The Rprop Algorithm 240 -- 9.3.4 Results 241 -- 9.4 Model-Based Feature Enhancement 242 -- 9.4.1 The Additive Noise-Mixing Equation 243 -- 9.4.2 The Joint Probability Model 244 -- 9.4.3 Vector Taylor Series Approximation 246 -- 9.4.4 Estimating Clean Speech 247 -- 9.4.5 Results 247 -- 9.5 Switching Linear Dynamic System 248 -- 9.6 Conclusion 249 -- References 249 -- 10 Reverberant Speech Recognition 251 / Reinhold Haeb-Umbach, Alexander Krueger -- 10.1 Introduction 251 -- 10.2 The Effect of Reverberation 252 -- 10.2.1 What is Reverberation? 252 -- 10.2.2 The Relationship between Clean and Reverberant Speech Features 254 -- 10.2.3 The Effect of Reverberation on ASR Performance 258 -- 10.3 Approaches to Reverberant Speech Recognition 258 -- 10.3.1 Signal-Based Techniques 259 -- 10.3.2 Front-End Techniques 260 -- 10.3.3 Back-End Techniques 262 -- 10.3.4 Concluding Remarks 265 -- 10.4 Feature Domain Model of the Acoustic Impulse Response 265 -- 10.5 Bayesian Feature Enhancement 267 -- 10.5.1 Basic Approach 268 -- 10.5.2 Measurement Update 269 -- 10.5.3 Time Update 270 -- 10.5.4 Inference 271 -- 10.6 Experimental Results 272 -- 10.6.1 Databases 272 -- 10.6.2 Overview of the Tested Methods 273 -- 10.6.3 Recognition Results on Reverberant Speech 274 -- 10.6.4 Recognition Results on Noisy Reverberant Speech 276 -- 10.7 Conclusions 277 -- Acknowledgment 278 -- References 278 -- Part Four MODEL ENHANCEMENT -- 11 Adaptation and Discriminative Training of Acoustic Models 285 / Yannick Estève, Paul Deléglise -- 11.1 Introduction 285 -- 11.1.1 Acoustic Models 286 -- 11.1.2 Maximum Likelihood Estimation 287 -- 11.2 Acoustic Model Adaptation and Noise Robustness 288 -- 11.2.1 Static (or Offline) Adaptation 289 -- 11.2.2 Dynamic (or Online) Adaptation 289 --
11.3 Maximum A Posteriori Reestimation 290 -- 11.4 Maximum Likelihood Linear Regression 293 -- 11.4.1 Class Regression Tree 294 -- 11.4.2 Constrained Maximum Likelihood Linear Regression 297 -- 11.4.3 CMLLR Implementation 297 -- 11.4.4 Speaker Adaptive Training 298 -- 11.5 Discriminative Training 299 -- 11.5.1 MMI Discriminative Training Criterion 301 -- 11.5.2 MPE Discriminative Training Criterion 302 -- 11.5.3 I-smoothing 303 -- 11.5.4 MPE Implementation 304 -- 11.6 Conclusion 307 -- References 308 -- 12 Factorial Models for Noise Robust Speech Recognition 311 / John R. Hershey, Steven J. Rennie, Jonathan Le Roux -- 12.1 Introduction 311 -- 12.2 The Model-Based Approach 313 -- 12.3 Signal Feature Domains 314 -- 12.4 Interaction Models 317 -- 12.4.1 Exact Interaction Model 318 -- 12.4.2 Max Model 320 -- 12.4.3 Log-Sum Model 321 -- 12.4.4 Mel Interaction Model 321 -- 12.5 Inference Methods 322 -- 12.5.1 Max Model Inference 322 -- 12.5.2 Parallel Model Combination 324 -- 12.5.3 Vector Taylor Series Approaches 326 -- 12.5.4 SNR-Dependent Approaches 331 -- 12.6 Efficient Likelihood Evaluation in Factorial Models 332 -- 12.6.1 Efficient Inference using the Max Model 332 -- 12.6.2 Efficient Vector-Taylor Series Approaches 334 -- 12.6.3 Band Quantization 335 -- 12.7 Current Directions 337 -- 12.7.1 Dynamic Noise Models for Robust ASR 338 -- 12.7.2 Multi-Talker Speech Recognition using Graphical Models 339 -- 12.7.3 Noise Robust ASR using Non-Negative Basis Representations 340 -- References 341 -- 13 Acoustic Model Training for Robust Speech Recognition 347 / Michael L. Seltzer -- 13.1 Introduction 347 -- 13.2 Traditional Training Methods for Robust Speech Recognition 348 -- 13.3 A Brief Overview of Speaker Adaptive Training 349 -- 13.4 Feature-Space Noise Adaptive Training 351 -- 13.4.1 Experiments using fNAT 352 -- 13.5 Model-Space Noise Adaptive Training 353 -- 13.6 Noise Adaptive Training using VTS Adaptation 355 -- 13.6.1 Vector Taylor Series HMM Adaptation 355 -- 13.6.2 Updating the Acoustic Model Parameters 357 --
13.6.3 Updating the Environmental Parameters 360 -- 13.6.4 Implementation Details 360 -- 13.6.5 Experiments using NAT 361 -- 13.7 Discussion 364 -- 13.7.1 Comparison of Training Algorithms 364 -- 13.7.2 Comparison to Speaker Adaptive Training 364 -- 13.7.3 Related Adaptive Training Methods 365 -- 13.8 Conclusion 366 -- References 366 -- Part Five COMPENSATION FOR INFORMATION LOSS -- 14 Missing-Data Techniques: Recognition with Incomplete Spectrograms 371 / Jon Barker -- 14.1 Introduction 371 -- 14.2 Classification with Incomplete Data 373 -- 14.2.1 A Simple Missing Data Scenario 374 -- 14.2.2 Missing Data Theory 376 -- 14.2.3 Validity of the MAR Assumption 378 -- 14.2.4 Marginalising Acoustic Models 379 -- 14.3 Energetic Masking 381 -- 14.3.1 The Max Approximation 381 -- 14.3.2 Bounded Marginalisation 382 -- 14.3.3 Missing Data ASR in the Cepstral Domain 384 -- 14.3.4 Missing Data ASR with Dynamic Features 386 -- 14.4 Meta-Missing Data: Dealing with Mask Uncertainty 388 -- 14.4.1 Missing Data with Soft Masks 388 -- 14.4.2 Sub-band Combination Approaches 391 -- 14.4.3 Speech Fragment Decoding 393 -- 14.5 Some Perspectives on Performance 395 -- References 396 -- 15 Missing-Data Techniques: Feature Reconstruction 399 / Jort Florent Gemmeke, Ulpu Remes -- 15.1 Introduction 399 -- 15.2 Missing-Data Techniques 401 -- 15.3 Correlation-Based Imputation 402 -- 15.3.1 Fundamentals 402 -- 15.3.2 Implementation 404 -- 15.4 Cluster-Based Imputation 406 -- 15.4.1 Fundamentals 406 -- 15.4.2 Implementation 408 -- 15.4.3 Advances 409 -- 15.5 Class-Conditioned Imputation 411 -- 15.5.1 Fundamentals 411 -- 15.5.2 Implementation 412 -- 15.5.3 Advances 413 -- 15.6 Sparse Imputation 414 -- 15.6.1 Fundamentals 414 -- 15.6.2 Implementation 416 -- 15.6.3 Advances 418 -- 15.7 Other Feature-Reconstruction Methods 420 -- 15.7.1 Parametric Approaches 420 -- 15.7.2 Nonparametric Approaches 421 -- 15.8 Experimental Results 421 -- 15.8.1 Feature-Reconstruction Methods 422 -- 15.8.2 Comparison with Other Methods 424 --
15.8.3 Advances 426 -- 15.8.4 Combination with Other Methods 427 -- 15.9 Discussion and Conclusion 428 -- Acknowledgments 429 -- References 430 -- 16 Computational Auditory Scene Analysis and Automatic Speech Recognition 433 / Arun Narayanan, DeLiang Wang -- 16.1 Introduction 433 -- 16.2 Auditory Scene Analysis 434 -- 16.3 Computational Auditory Scene Analysis 435 -- 16.3.1 Ideal Binary Mask 435 -- 16.3.2 Typical CASA Architecture 438 -- 16.4 CASA Strategies 440 -- 16.4.1 IBM Estimation Based on Local SNR Estimates 440 -- 16.4.2 IBM Estimation using ASA Cues 442 -- 16.4.3 IBM Estimation as Binary Classification 448 -- 16.4.4 Binaural Mask Estimation Strategies 451 -- 16.5 Integrating CASA with ASR 452 -- 16.5.1 Uncertainty Transform Model 454 -- 16.6 Concluding Remarks 458 -- Acknowledgment 458 -- References 458 -- 17 Uncertainty Decoding 463 / Hank Liao -- 17.1 Introduction 463 -- 17.2 Observation Uncertainty 465 -- 17.3 Uncertainty Decoding 466 -- 17.4 Feature-Based Uncertainty Decoding 468 -- 17.4.1 SPLICE with Uncertainty 470 -- 17.4.2 Front-End Joint Uncertainty Decoding 471 -- 17.4.3 Issues with Feature-Based Uncertainty Decoding 472 -- 17.5 Model-Based Joint Uncertainty Decoding 473 -- 17.5.1 Parameter Estimation 475 -- 17.5.2 Comparisons with Other Methods 476 -- 17.6 Noisy CMLLR 477 -- 17.7 Uncertainty and Adaptive Training 480 -- 17.7.1 Gradient-Based Methods 481 -- 17.7.2 Factor Analysis Approaches 482 -- 17.8 In Combination with Other Techniques 483 -- 17.9 Conclusions 484 -- References 485 -- Index 487.
Record no. UNINA-9910141378103321
Held at: Univ. Federico II
Techniques for noise robustness in automatic speech recognition / editors, Tuomas Virtanen, Rita Singh, Bhiksha Raj
Author Virtanen, Tuomas
Edition [1st edition]
Publication/distribution Chichester, West Sussex, U.K. : Wiley, 2012
Physical description 1 online resource (516 p.)
Classification 006.4/54
Other authors (persons) Virtanen, Tuomas
Singh, Rita
Raj, Bhiksha
Topical subject Automatic speech recognition
Speech processing systems
ISBN 1-283-64550-5
1-118-39268-X
1-118-39267-1
1-118-39266-3
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note -- List of Contributors xv -- Acknowledgments xvii -- 1 Introduction 1 / Tuomas Virtanen, Rita Singh, Bhiksha Raj -- 1.1 Scope of the Book 1 -- 1.2 Outline 2 -- 1.3 Notation 4 -- Part One FOUNDATIONS -- 2 The Basics of Automatic Speech Recognition 9 / Rita Singh, Bhiksha Raj, Tuomas Virtanen -- 2.1 Introduction 9 -- 2.2 Speech Recognition Viewed as Bayes Classification 10 -- 2.3 Hidden Markov Models 11 -- 2.3.1 Computing Probabilities with HMMs 12 -- 2.3.2 Determining the State Sequence 17 -- 2.3.3 Learning HMM Parameters 19 -- 2.3.4 Additional Issues Relating to Speech Recognition Systems 20 -- 2.4 HMM-Based Speech Recognition 24 -- 2.4.1 Representing the Signal 24 -- 2.4.2 The HMM for a Word Sequence 25 -- 2.4.3 Searching through all Word Sequences 26 -- References 29 -- 3 The Problem of Robustness in Automatic Speech Recognition 31 / Bhiksha Raj, Tuomas Virtanen, Rita Singh -- 3.1 Errors in Bayes Classification 31 -- 3.1.1 Type 1 Condition: Mismatch Error 33 -- 3.1.2 Type 2 Condition: Increased Bayes Error 34 -- 3.2 Bayes Classification and ASR 35 -- 3.2.1 All We Have is a Model: A Type 1 Condition 35 -- 3.2.2 Intrinsic Interferences - Signal Components that are Unrelated to the Message: A Type 2 Condition 36 -- 3.2.3 External Interferences - The Data are Noisy: Type 1 and Type 2 Conditions 36 -- 3.3 External Influences on Speech Recordings 36 -- 3.3.1 Signal Capture 37 -- 3.3.2 Additive Corruptions 41 -- 3.3.3 Reverberation 42 -- 3.3.4 A Simplified Model of Signal Capture 43 -- 3.4 The Effect of External Influences on Recognition 44 -- 3.5 Improving Recognition under Adverse Conditions 46 -- 3.5.1 Handling the Model Mismatch Error 46 -- 3.5.2 Dealing with Intrinsic Variations in the Data 47 -- 3.5.3 Dealing with Extrinsic Variations 47 -- References 50 -- Part Two SIGNAL ENHANCEMENT -- 4 Voice Activity Detection, Noise Estimation, and Adaptive Filters for Acoustic Signal Enhancement 53 / Rainer Martin, Dorothea Kolossa -- 4.1 Introduction 53 -- 4.2 Signal Analysis and Synthesis 55 --
4.2.1 DFT-Based Analysis Synthesis with Perfect Reconstruction 55 -- 4.2.2 Probability Distributions for Speech and Noise DFT Coefficients 57 -- 4.3 Voice Activity Detection 58 -- 4.3.1 VAD Design Principles 58 -- 4.3.2 Evaluation of VAD Performance 62 -- 4.3.3 Evaluation in the Context of ASR 62 -- 4.4 Noise Power Spectrum Estimation 65 -- 4.4.1 Smoothing Techniques 65 -- 4.4.2 Histogram and GMM Noise Estimation Methods 67 -- 4.4.3 Minimum Statistics Noise Power Estimation 67 -- 4.4.4 MMSE Noise Power Estimation 68 -- 4.4.5 Estimation of the A Priori Signal-to-Noise Ratio 69 -- 4.5 Adaptive Filters for Signal Enhancement 71 -- 4.5.1 Spectral Subtraction 71 -- 4.5.2 Nonlinear Spectral Subtraction 73 -- 4.5.3 Wiener Filtering 74 -- 4.5.4 The ETSI Advanced Front End 75 -- 4.5.5 Nonlinear MMSE Estimators 75 -- 4.6 ASR Performance 80 -- 4.7 Conclusions 81 -- References 82 -- 5 Extraction of Speech from Mixture Signals 87 / Paris Smaragdis -- 5.1 The Problem with Mixtures 87 -- 5.2 Multichannel Mixtures 88 -- 5.2.1 Basic Problem Formulation 88 -- 5.2.2 Convolutive Mixtures 92 -- 5.3 Single-Channel Mixtures 98 -- 5.3.1 Problem Formulation 98 -- 5.3.2 Learning Sound Models 100 -- 5.3.3 Separation by Spectrogram Factorization 101 -- 5.3.4 Dealing with Unknown Sounds 105 -- 5.4 Variations and Extensions 107 -- 5.5 Conclusions 107 -- References 107 -- 6 Microphone Arrays 109 / John McDonough, Kenichi Kumatani -- 6.1 Speaker Tracking 110 -- 6.2 Conventional Microphone Arrays 113 -- 6.3 Conventional Adaptive Beamforming Algorithms 120 -- 6.3.1 Minimum Variance Distortionless Response Beamformer 120 -- 6.3.2 Noise Field Models 122 -- 6.3.3 Subband Analysis and Synthesis 123 -- 6.3.4 Beamforming Performance Criteria 126 -- 6.3.5 Generalized Sidelobe Canceller Implementation 129 -- 6.3.6 Recursive Implementation of the GSC 130 -- 6.3.7 Other Conventional GSC Beamformers 131 -- 6.3.8 Beamforming based on Higher Order Statistics 132 -- 6.3.9 Online Implementation 136 -- 6.3.10 Speech-Recognition Experiments 140 --
6.4 Spherical Microphone Arrays 142 -- 6.5 Spherical Adaptive Algorithms 148 -- 6.6 Comparative Studies 149 -- 6.7 Comparison of Linear and Spherical Arrays for DSR 152 -- 6.8 Conclusions and Further Reading 154 -- References 155 -- Part Three FEATURE ENHANCEMENT -- 7 From Signals to Speech Features by Digital Signal Processing 161 / Matthias WŠ olfel -- 7.1 Introduction 161 -- 7.1.1 About this Chapter 162 -- 7.2 The Speech Signal 162 -- 7.3 Spectral Processing 163 -- 7.3.1 Windowing 163 -- 7.3.2 Power Spectrum 165 -- 7.3.3 Spectral Envelopes 166 -- 7.3.4 LP Envelope 166 -- 7.3.5 MVDR Envelope 169 -- 7.3.6 Warping the Frequency Axis 171 -- 7.3.7 Warped LP Envelope 175 -- 7.3.8 Warped MVDR Envelope 176 -- 7.3.9 Comparison of Spectral Estimates 177 -- 7.3.10 The Spectrogram 179 -- 7.4 Cepstral Processing 179 -- 7.4.1 Definition and Calculation of Cepstral Coefficients 180 -- 7.4.2 Characteristics of Cepstral Sequences 181 -- 7.5 Influence of Distortions on Different Speech Features 182 -- 7.5.1 Objective Functions 182 -- 7.5.2 Robustness against Noise 185 -- 7.5.3 Robustness against Echo and Reverberation 187 -- 7.5.4 Robustness against Changes in Fundamental Frequency 189 -- 7.6 Summary and Further Reading 191 -- References 191 -- 8 Features Based on Auditory Physiology and Perception 193 / Richard M. 
Stern, Nelson Morgan -- 8.1 Introduction 193 -- 8.2 Some Attributes of Auditory Physiology and Perception 194 -- 8.2.1 Peripheral Processing 194 -- 8.2.2 Processing at more Central Levels 200 -- 8.2.3 Psychoacoustical Correlates of Physiological Observations 202 -- 8.2.4 The Impact of Auditory Processing on Conventional Feature Extraction 206 -- 8.2.5 Summary 208 -- 8.3 “Classic” Auditory Representations 208 -- 8.4 Current Trends in Auditory Feature Analysis 213 -- 8.5 Summary 221 -- Acknowledgments 222 -- References 222 -- 9 Feature Compensation 229 / Jasha Droppo -- 9.1 Life in an Ideal World 229 -- 9.1.1 Noise Robustness Tasks 229 -- 9.1.2 Probabilistic Feature Enhancement 230.
9.1.3 Gaussian Mixture Models 231 -- 9.2 MMSE-SPLICE 232 -- 9.2.1 Parameter Estimation 233 -- 9.2.2 Results 236 -- 9.3 Discriminative SPLICE 237 -- 9.3.1 The MMI Objective Function 238 -- 9.3.2 Training the Front-End Parameters 239 -- 9.3.3 The Rprop Algorithm 240 -- 9.3.4 Results 241 -- 9.4 Model-Based Feature Enhancement 242 -- 9.4.1 The Additive Noise-Mixing Equation 243 -- 9.4.2 The Joint Probability Model 244 -- 9.4.3 Vector Taylor Series Approximation 246 -- 9.4.4 Estimating Clean Speech 247 -- 9.4.5 Results 247 -- 9.5 Switching Linear Dynamic System 248 -- 9.6 Conclusion 249 -- References 249 -- 10 Reverberant Speech Recognition 251 / Reinhold Haeb-Umbach, Alexander Krueger -- 10.1 Introduction 251 -- 10.2 The Effect of Reverberation 252 -- 10.2.1 What is Reverberation? 252 -- 10.2.2 The Relationship between Clean and Reverberant Speech Features 254 -- 10.2.3 The Effect of Reverberation on ASR Performance 258 -- 10.3 Approaches to Reverberant Speech Recognition 258 -- 10.3.1 Signal-Based Techniques 259 -- 10.3.2 Front-End Techniques 260 -- 10.3.3 Back-End Techniques 262 -- 10.3.4 Concluding Remarks 265 -- 10.4 Feature Domain Model of the Acoustic Impulse Response 265 -- 10.5 Bayesian Feature Enhancement 267 -- 10.5.1 Basic Approach 268 -- 10.5.2 Measurement Update 269 -- 10.5.3 Time Update 270 -- 10.5.4 Inference 271 -- 10.6 Experimental Results 272 -- 10.6.1 Databases 272 -- 10.6.2 Overview of the Tested Methods 273 -- 10.6.3 Recognition Results on Reverberant Speech 274 -- 10.6.4 Recognition Results on Noisy Reverberant Speech 276 -- 10.7 Conclusions 277 -- Acknowledgment 278 -- References 278 -- Part Four MODEL ENHANCEMENT -- 11 Adaptation and Discriminative Training of Acoustic Models 285 / Yannick Est`eve, Paul Del'eglise -- 11.1 Introduction 285 -- 11.1.1 Acoustic Models 286 -- 11.1.2 Maximum Likelihood Estimation 287 -- 11.2 Acoustic Model Adaptation and Noise Robustness 288 -- 11.2.1 Static (or Offline) Adaptation 289 -- 11.2.2 Dynamic (or Online) 
Adaptation 289.
11.3 Maximum A Posteriori Reestimation 290 -- 11.4 Maximum Likelihood Linear Regression 293 -- 11.4.1 Class Regression Tree 294 -- 11.4.2 Constrained Maximum Likelihood Linear Regression 297 -- 11.4.3 CMLLR Implementation 297 -- 11.4.4 Speaker Adaptive Training 298 -- 11.5 Discriminative Training 299 -- 11.5.1 MMI Discriminative Training Criterion 301 -- 11.5.2 MPE Discriminative Training Criterion 302 -- 11.5.3 I-smoothing 303 -- 11.5.4 MPE Implementation 304 -- 11.6 Conclusion 307 -- References 308 -- 12 Factorial Models for Noise Robust Speech Recognition 311 / John R. Hershey, Steven J. Rennie, Jonathan Le Roux -- 12.1 Introduction 311 -- 12.2 The Model-Based Approach 313 -- 12.3 Signal Feature Domains 314 -- 12.4 Interaction Models 317 -- 12.4.1 Exact Interaction Model 318 -- 12.4.2 Max Model 320 -- 12.4.3 Log-Sum Model 321 -- 12.4.4 Mel Interaction Model 321 -- 12.5 Inference Methods 322 -- 12.5.1 Max Model Inference 322 -- 12.5.2 Parallel Model Combination 324 -- 12.5.3 Vector Taylor Series Approaches 326 -- 12.5.4 SNR-Dependent Approaches 331 -- 12.6 Efficient Likelihood Evaluation in Factorial Models 332 -- 12.6.1 Efficient Inference using the Max Model 332 -- 12.6.2 Efficient Vector-Taylor Series Approaches 334 -- 12.6.3 Band Quantization 335 -- 12.7 Current Directions 337 -- 12.7.1 Dynamic Noise Models for Robust ASR 338 -- 12.7.2 Multi-Talker Speech Recognition using Graphical Models 339 -- 12.7.3 Noise Robust ASR using Non-Negative Basis Representations 340 -- References 341 -- 13 Acoustic Model Training for Robust Speech Recognition 347 / Michael L. 
Seltzer -- 13.1 Introduction 347 -- 13.2 Traditional Training Methods for Robust Speech Recognition 348 -- 13.3 A Brief Overview of Speaker Adaptive Training 349 -- 13.4 Feature-Space Noise Adaptive Training 351 -- 13.4.1 Experiments using fNAT 352 -- 13.5 Model-Space Noise Adaptive Training 353 -- 13.6 Noise Adaptive Training using VTS Adaptation 355 -- 13.6.1 Vector Taylor Series HMM Adaptation 355 -- 13.6.2 Updating the Acoustic Model Parameters 357.
13.6.3 Updating the Environmental Parameters 360 -- 13.6.4 Implementation Details 360 -- 13.6.5 Experiments using NAT 361 -- 13.7 Discussion 364 -- 13.7.1 Comparison of Training Algorithms 364 -- 13.7.2 Comparison to Speaker Adaptive Training 364 -- 13.7.3 Related Adaptive Training Methods 365 -- 13.8 Conclusion 366 -- References 366 -- Part Five COMPENSATION FOR INFORMATION LOSS -- 14 Missing-Data Techniques: Recognition with Incomplete Spectrograms 371 / Jon Barker -- 14.1 Introduction 371 -- 14.2 Classification with Incomplete Data 373 -- 14.2.1 A Simple Missing Data Scenario 374 -- 14.2.2 Missing Data Theory 376 -- 14.2.3 Validity of the MAR Assumption 378 -- 14.2.4 Marginalising Acoustic Models 379 -- 14.3 Energetic Masking 381 -- 14.3.1 The Max Approximation 381 -- 14.3.2 Bounded Marginalisation 382 -- 14.3.3 Missing Data ASR in the Cepstral Domain 384 -- 14.3.4 Missing Data ASR with Dynamic Features 386 -- 14.4 Meta-Missing Data: Dealing with Mask Uncertainty 388 -- 14.4.1 Missing Data with Soft Masks 388 -- 14.4.2 Sub-band Combination Approaches 391 -- 14.4.3 Speech Fragment Decoding 393 -- 14.5 Some Perspectives on Performance 395 -- References 396 -- 15 Missing-Data Techniques: Feature Reconstruction 399 / Jort Florent Gemmeke, Ulpu Remes -- 15.1 Introduction 399 -- 15.2 Missing-Data Techniques 401 -- 15.3 Correlation-Based Imputation 402 -- 15.3.1 Fundamentals 402 -- 15.3.2 Implementation 404 -- 15.4 Cluster-Based Imputation 406 -- 15.4.1 Fundamentals 406 -- 15.4.2 Implementation 408 -- 15.4.3 Advances 409 -- 15.5 Class-Conditioned Imputation 411 -- 15.5.1 Fundamentals 411 -- 15.5.2 Implementation 412 -- 15.5.3 Advances 413 -- 15.6 Sparse Imputation 414 -- 15.6.1 Fundamentals 414 -- 15.6.2 Implementation 416 -- 15.6.3 Advances 418 -- 15.7 Other Feature-Reconstruction Methods 420 -- 15.7.1 Parametric Approaches 420 -- 15.7.2 Nonparametric Approaches 421 -- 15.8 Experimental Results 421 -- 15.8.1 Feature-Reconstruction Methods 422 -- 15.8.2 Comparison with 
Other Methods 424.
15.8.3 Advances 426 -- 15.8.4 Combination with Other Methods 427 -- 15.9 Discussion and Conclusion 428 -- Acknowledgments 429 -- References 430 -- 16 Computational Auditory Scene Analysis and Automatic Speech Recognition 433 / Arun Narayanan, DeLiang Wang -- 16.1 Introduction 433 -- 16.2 Auditory Scene Analysis 434 -- 16.3 Computational Auditory Scene Analysis 435 -- 16.3.1 Ideal Binary Mask 435 -- 16.3.2 Typical CASA Architecture 438 -- 16.4 CASA Strategies 440 -- 16.4.1 IBM Estimation Based on Local SNR Estimates 440 -- 16.4.2 IBM Estimation using ASA Cues 442 -- 16.4.3 IBM Estimation as Binary Classification 448 -- 16.4.4 Binaural Mask Estimation Strategies 451 -- 16.5 Integrating CASA with ASR 452 -- 16.5.1 Uncertainty Transform Model 454 -- 16.6 Concluding Remarks 458 -- Acknowledgment 458 -- References 458 -- 17 Uncertainty Decoding 463 / Hank Liao -- 17.1 Introduction 463 -- 17.2 Observation Uncertainty 465 -- 17.3 Uncertainty Decoding 466 -- 17.4 Feature-Based Uncertainty Decoding 468 -- 17.4.1 SPLICE with Uncertainty 470 -- 17.4.2 Front-End Joint Uncertainty Decoding 471 -- 17.4.3 Issues with Feature-Based Uncertainty Decoding 472 -- 17.5 Model-Based Joint Uncertainty Decoding 473 -- 17.5.1 Parameter Estimation 475 -- 17.5.2 Comparisons with Other Methods 476 -- 17.6 Noisy CMLLR 477 -- 17.7 Uncertainty and Adaptive Training 480 -- 17.7.1 Gradient-Based Methods 481 -- 17.7.2 Factor Analysis Approaches 482 -- 17.8 In Combination with Other Techniques 483 -- 17.9 Conclusions 484 -- References 485 -- Index 487.
Record no. UNINA-9910820996203321
Virtanen, Tuomas
Chichester, West Sussex, U.K. : Wiley, 2012
Printed material
Held at: Univ. Federico II
Twilio cookbook : over 70 easy-to-follow recipes, from exploring the key features of Twilio to building advanced telephony apps / Roger Stringer ; cover image by Rick Cartledge
Author Stringer, Roger
Edition [Second edition.]
Publication Birmingham, England : Packt Publishing, 2014
Physical description 1 online resource (334 p.)
Classification 006.454
Series Quick answers to common problems
Topical subject Automatic speech recognition
Speech processing systems
Genre/form Electronic books.
ISBN 1-78355-066-X
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Into the Frying Pan; Introduction; Adding two-factor voice authentication to verify users; Using Twilio SMS to set up two-factor authentication for secure websites; Adding order verification; Adding the Click-to-Call functionality to your website; Recording a phone call; Setting up a company directory; Setting up Text-to-Speech; Chapter 2: Now We're Cooking; Introduction; Tracking account usage; Screening calls; Buying a phone number; Setting up a voicemail system;
Building an emergency calling system; Chapter 3: Conducting Surveys via SMS; Introduction; Why use PDO instead of the standard MySQL functions?; Letting users subscribe to receive surveys; Building a survey tree; Sending a survey to your users; Adding tracking for each user; Listening to user responses and commands; Building a chart of responses; Chapter 4: Building a Conference Calling System; Introduction; Scheduling a conference call; Sending an SMS to all participants at the time of the call; Starting and recording a conference; Joining a conference call from the web browser;
Monitoring the conference call; Muting a participant; Chapter 5: Combining Twilio with Other APIs; Introduction; Searching for local businesses via text; Getting the local weather forecast; Searching for local movie listings; Searching for classifieds; Getting local TV listings; Searching Google using SMS; Searching the stock market; Getting the latest headlines; Chapter 6: Sending and Receiving SMS Messages; Introduction; Sending a message from a website; Replying to a message from the phone; Forwarding SMS messages to another phone number; Sending bulk SMS to a list of contacts;
Tracking orders with SMS; Sending and receiving group chats; Sending SMS messages in a phone call; Monitoring a website; Chapter 7: Building a Reminder System; Introduction; Scheduling reminders via text; Getting notified when the time comes; Retrieving a list of upcoming reminders; Canceling an upcoming reminder; Adding another person to a reminder; Chapter 8: Building an IVR System; Introduction; Setting up IVRs; Screening and recording calls; Logging and reporting calls; Looking up HighriseHQ contacts on incoming calls; Getting directions; Leaving a message;
Sending an SMS to your Salesforce.com contacts; Chapter 9: Building Your Own PBX; Introduction; Getting started with PBX; Setting up a subaccount for each user; Letting a user purchase a custom phone number; Allowing users to make calls from their call logs; Allowing incoming phone calls; Allowing outgoing phone calls; Deleting a subaccount; Chapter 10: Digging into OpenVBX; Introduction; Building a call log plugin; Building a searchable company directory; Collecting Stripe payments; Tracking orders; Building a caller ID routing plugin; Testing call flows;
Chapter 11: Sending and Receiving Picture Messages
Record no. UNINA-9910453753003321
Stringer, Roger
Birmingham, England : Packt Publishing, 2014
Printed material
Held at: Univ. Federico II
Twilio cookbook : over 70 easy-to-follow recipes, from exploring the key features of Twilio to building advanced telephony apps / Roger Stringer ; cover image by Rick Cartledge
Author Stringer, Roger
Edition [Second edition.]
Publication Birmingham, England : Packt Publishing, 2014
Physical description 1 online resource (334 p.)
Classification 006.454
Series Quick answers to common problems
Topical subject Automatic speech recognition
Speech processing systems
ISBN 1-78355-066-X
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Into the Frying Pan; Introduction; Adding two-factor voice authentication to verify users; Using Twilio SMS to set up two-factor authentication for secure websites; Adding order verification; Adding the Click-to-Call functionality to your website; Recording a phone call; Setting up a company directory; Setting up Text-to-Speech; Chapter 2: Now We're Cooking; Introduction; Tracking account usage; Screening calls; Buying a phone number; Setting up a voicemail system;
Building an emergency calling system; Chapter 3: Conducting Surveys via SMS; Introduction; Why use PDO instead of the standard MySQL functions?; Letting users subscribe to receive surveys; Building a survey tree; Sending a survey to your users; Adding tracking for each user; Listening to user responses and commands; Building a chart of responses; Chapter 4: Building a Conference Calling System; Introduction; Scheduling a conference call; Sending an SMS to all participants at the time of the call; Starting and recording a conference; Joining a conference call from the web browser;
Monitoring the conference call; Muting a participant; Chapter 5: Combining Twilio with Other APIs; Introduction; Searching for local businesses via text; Getting the local weather forecast; Searching for local movie listings; Searching for classifieds; Getting local TV listings; Searching Google using SMS; Searching the stock market; Getting the latest headlines; Chapter 6: Sending and Receiving SMS Messages; Introduction; Sending a message from a website; Replying to a message from the phone; Forwarding SMS messages to another phone number; Sending bulk SMS to a list of contacts;
Tracking orders with SMS; Sending and receiving group chats; Sending SMS messages in a phone call; Monitoring a website; Chapter 7: Building a Reminder System; Introduction; Scheduling reminders via text; Getting notified when the time comes; Retrieving a list of upcoming reminders; Canceling an upcoming reminder; Adding another person to a reminder; Chapter 8: Building an IVR System; Introduction; Setting up IVRs; Screening and recording calls; Logging and reporting calls; Looking up HighriseHQ contacts on incoming calls; Getting directions; Leaving a message;
Sending an SMS to your Salesforce.com contacts; Chapter 9: Building Your Own PBX; Introduction; Getting started with PBX; Setting up a subaccount for each user; Letting a user purchase a custom phone number; Allowing users to make calls from their call logs; Allowing incoming phone calls; Allowing outgoing phone calls; Deleting a subaccount; Chapter 10: Digging into OpenVBX; Introduction; Building a call log plugin; Building a searchable company directory; Collecting Stripe payments; Tracking orders; Building a caller ID routing plugin; Testing call flows;
Chapter 11: Sending and Receiving Picture Messages
Record no. UNINA-9910791060203321
Stringer, Roger
Birmingham, England : Packt Publishing, 2014
Printed material
Held at: Univ. Federico II
Twilio cookbook : over 70 easy-to-follow recipes, from exploring the key features of Twilio to building advanced telephony apps / Roger Stringer ; cover image by Rick Cartledge
Author Stringer, Roger
Edition [Second edition.]
Publication Birmingham, England : Packt Publishing, 2014
Physical description 1 online resource (334 p.)
Classification 006.454
Series Quick answers to common problems
Topical subject Automatic speech recognition
Speech processing systems
ISBN 1-78355-066-X
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Into the Frying Pan; Introduction; Adding two-factor voice authentication to verify users; Using Twilio SMS to set up two-factor authentication for secure websites; Adding order verification; Adding the Click-to-Call functionality to your website; Recording a phone call; Setting up a company directory; Setting up Text-to-Speech; Chapter 2: Now We're Cooking; Introduction; Tracking account usage; Screening calls; Buying a phone number; Setting up a voicemail system;
Building an emergency calling system; Chapter 3: Conducting Surveys via SMS; Introduction; Why use PDO instead of the standard MySQL functions?; Letting users subscribe to receive surveys; Building a survey tree; Sending a survey to your users; Adding tracking for each user; Listening to user responses and commands; Building a chart of responses; Chapter 4: Building a Conference Calling System; Introduction; Scheduling a conference call; Sending an SMS to all participants at the time of the call; Starting and recording a conference; Joining a conference call from the web browser;
Monitoring the conference call; Muting a participant; Chapter 5: Combining Twilio with Other APIs; Introduction; Searching for local businesses via text; Getting the local weather forecast; Searching for local movie listings; Searching for classifieds; Getting local TV listings; Searching Google using SMS; Searching the stock market; Getting the latest headlines; Chapter 6: Sending and Receiving SMS Messages; Introduction; Sending a message from a website; Replying to a message from the phone; Forwarding SMS messages to another phone number; Sending bulk SMS to a list of contacts;
Tracking orders with SMS; Sending and receiving group chats; Sending SMS messages in a phone call; Monitoring a website; Chapter 7: Building a Reminder System; Introduction; Scheduling reminders via text; Getting notified when the time comes; Retrieving a list of upcoming reminders; Canceling an upcoming reminder; Adding another person to a reminder; Chapter 8: Building an IVR System; Introduction; Setting up IVRs; Screening and recording calls; Logging and reporting calls; Looking up HighriseHQ contacts on incoming calls; Getting directions; Leaving a message;
Sending an SMS to your Salesforce.com contacts; Chapter 9: Building Your Own PBX; Introduction; Getting started with PBX; Setting up a subaccount for each user; Letting a user purchase a custom phone number; Allowing users to make calls from their call logs; Allowing incoming phone calls; Allowing outgoing phone calls; Deleting a subaccount; Chapter 10: Digging into OpenVBX; Introduction; Building a call log plugin; Building a searchable company directory; Collecting Stripe payments; Tracking orders; Building a caller ID routing plugin; Testing call flows;
Chapter 11: Sending and Receiving Picture Messages
Record no. UNINA-9910819884403321
Stringer, Roger
Birmingham, England : Packt Publishing, 2014
Printed material
Held at: Univ. Federico II
Voice communication between humans and machines [electronic resource] / David B. Roe and Jay G. Wilpon, editors
Publication Washington, D.C. : National Academy Press, 1994
Physical description viii, 548 p. : ill.
Classification 006.4/54
Other authors (persons) Roe, David B.
Wilpon, Jay G.
Topical subject Automatic speech recognition
Human-machine systems
Genre/form Electronic books.
ISBN 1-280-19590-8
9786610195909
0-309-55625-2
0-585-00181-2
Format Printed material
Bibliographic level Monograph
Language of publication eng
Record no. UNINA-9910455934203321
Washington, D.C. : National Academy Press, 1994
Printed material
Held at: Univ. Federico II