
Speech and Computer : 17th International Conference, SPECOM 2015, Athens, Greece, September 20-24, 2015, Proceedings / edited by Andrey Ronzhin, Rodmonga Potapova, Nikos Fakotakis




Title: Speech and Computer : 17th International Conference, SPECOM 2015, Athens, Greece, September 20-24, 2015, Proceedings / edited by Andrey Ronzhin, Rodmonga Potapova, Nikos Fakotakis
Publication: Cham : Springer International Publishing : Imprint: Springer, 2015
Edition: 1st ed. 2015.
Physical description: 1 online resource (XVI, 506 p., 135 illus.)
Classification: 006.35
Topical subject: Artificial intelligence
Application software
Pattern recognition
Information storage and retrieval
Optical data processing
Database management
Artificial Intelligence
Information Systems Applications (incl. Internet)
Pattern Recognition
Information Storage and Retrieval
Image Processing and Computer Vision
Database Management
Person (secondary responsibility): Ronzhin, A. L. (Andreĭ Leonidovich)
Potapova, Rodmonga
Fakotakis, Nikos
General notes: Bibliographic Level Mode of Issuance: Monograph
Bibliography note: Includes bibliographical references and index.
Contents note: Intro -- Preface -- Organization -- About Athens -- Contents -- Invited Talks -- Multimodal Human-Robot Interaction from the Perspective of a Speech Scientist -- 1 Introduction -- 2 Major Differences Between HCI and HRI -- 3 Different Types of Robots and Resulting Implications for Interaction Schemes -- 4 Basic Interaction Schemes in Human-Robot-Communication -- 5 Conclusions -- References -- A Decade of Discriminative Language Modeling for Automatic Speech Recognition -- 1 Introduction -- 2 Features -- 2.1 Linguistic Features -- 2.2 Statistically Derived Features -- 2.3 Acoustic Features -- 3 Algorithms -- 4 Training Approaches -- 4.1 Supervised Training -- 4.2 Semi-supervised Training -- 4.3 Unsupervised Training -- 4.4 Summary of Experiments on Training Approaches -- 5 Conclusion -- References -- Conference Papers -- A Bilingual Kazakh-Russian System for Automatic Speech Recognition and Synthesis -- 1 Introduction -- 2 The Kazakh Language -- 3 Speech Synthesis and Transcription for Kazakh -- 3.1 Dictionary and POS Tagging -- 3.2 Building Transcription Rules and Synthesizing Speech -- 4 Automatic Speech Recognition for Kazakh -- 4.1 The Speech Database -- 4.2 Acoustic Models -- 4.3 Experiments -- 5 Conclusions and Future Work -- References -- A Comparative Study of Speech Processing in Microphone Arrays with Multichannel Alignment and Zelinski Post-Filtering -- 1 Introduction -- 2 Experiments and Results -- 2.1 MA Directivity Patterns -- 2.2 Incoherent and Coherent Noise Reduction Level -- 2.3 Spectrograms of the Processed Speech Signal -- 2.4 Signal-to-Deviation Ratio -- 3 Conclusions -- References -- A Comparison of RNN LM and FLM for Russian Speech Recognition -- 1 Introduction -- 2 Related Works -- 3 Recurrent Neural Network Language Model Topology -- 4 Creation of Language Models for Russian ASR.
4.1 Creation of the Baseline Language Models -- 4.2 Creation of Recurrent Neural Network Language Models -- 5 Experiments -- 5.1 Experimental Setup -- 5.2 Experiments on Rescoring N-Best Lists Using FLM -- 5.3 Experiments on Rescoring N-Best Lists Using RNN LM -- 6 Conclusion -- References -- A Frequency Domain Adaptive Decorrelating Algorithm for Speech Enhancement -- 1 Introduction -- 2 Mixing Model -- 3 Proposed Frequency Domain (FD-SAD) Algorithm -- 4 Simulations, Results, and Analysis -- 4.1 System Mismatch (SM) Evaluation -- 4.2 Segmental SNR (SegSNR) Evaluation -- 5 Conclusion -- References -- Acoustic Markers of Emotional State "Aggression" -- 1 Introduction -- 2 Method and Procedure -- 3 Conclusion -- 3.1 Prospects of Investigation -- References -- Algorithms for Low Bit-Rate Coding with Adaptation to Statistical Characteristics of Speech Signal -- 1 Introduction -- 2 Related Works -- 2.1 Structural Scheme of the Hybrid MELP/CELP Coder -- 2.2 Experimental Study of the Developed Adaptive Hybrid MELP/CELP Coder -- 3 Conclusion -- References -- Analysing Human-Human Negotiations with the Aim to Develop a Dialogue System -- 1 Introduction -- 2 Empirical Material and Used Software -- 3 Analysis of Human-Human Dialogues: Argument-Based Negotiation -- 3.1 Arguments and Negotiation in Telemarketing Calls -- 3.2 Negotiation in Travel Dialogues -- 3.3 Arguments and Negotiation in Everyday Dialogues -- 4 Discussion -- 5 Conclusion -- References -- Analysis of Facial Motion Capture Data for Visual Speech Synthesis -- 1 Introduction -- 2 Speech Data and Collection -- 3 Methods -- 3.1 Interpretation of Speech Data by Animation Model -- 3.2 Approximation of Speech Data -- 4 Evaluation -- 4.1 Objective Evaluation -- 4.2 Verification by Animation Model -- 5 Conclusions -- References -- Auditory-Perceptual Recognition of the Emotional State of Aggression.
1 Introduction -- 2 Method, Procedure, and Results -- 3 Conclusions -- 4 Discussion -- 5 Prospects of Investigation -- References -- Automatic Classification and Prediction of Attitudes: Audio - Visual Analysis of Video Blogs -- 1 Introduction -- 2 Methodology -- 2.1 The Vlog Corpus -- 2.2 Attitude Annotation -- 2.3 Multimodal Feature Extraction -- 3 Attitude Classification Model -- 3.1 Feature Analysis -- 3.2 Prediction by Prosodic and Visual Features -- 4 Discussion -- 5 Conclusion -- References -- Automatic Close Captioning for Live Hungarian Television Broadcast Speech: A Fast and Resource-Efficient Approach -- 1 Introduction -- 2 Related Work -- 3 System Description -- 3.1 Training Data -- 3.2 Language Modeling -- 3.3 Acoustic Models -- 3.4 Test Data and Decoding -- 4 Results -- 4.1 Broadcast Conversation -- 4.2 Decoding with Advanced Language Models -- 5 Conclusions and Future Work -- References -- Automatic Estimation of Web Bloggers' Age Using Regression Models -- 1 Introduction -- 2 Background Work -- 3 Proposed Age Estimation of Web Bloggers Using Regression Models -- 4 Experimental Setup and Results -- 5 Conclusion -- References -- Automatic Preprocessing Technique for Detection of Corrupted Speech Signal Fragments for the Purpose of Speaker Recognition -- 1 Introduction -- 2 Preprocessing Technique -- 3 Preprocessing Technique -- 3.1 Click Detector -- 3.2 Overloading Detector -- 3.3 Clipping Detector -- 3.4 Tones Detector -- 3.5 Music Detector -- 3.6 Voice Activity Detector -- 4 Experimental and Results -- 5 Conclusions -- References -- Automatic Sound Recognition of Urban Environment Events -- 1 Introduction -- 2 System Description -- 3 Experimental Setup -- 3.1 Audio Data Description -- 3.2 Feature Extraction -- 3.3 Classification -- 4 Experimental Results -- 5 Conclusions -- References.
Automatically Trained TTS for Effective Attacks to Anti-spoofing System -- 1 Introduction -- 2 Anti-spoofing System -- 3 Spoofing Attack Modelling -- 4 Experiments -- 4.1 TTS Training Database -- 4.2 Evaluation Results -- 5 Conclusion -- References -- EmoChildRu: Emotional Child Russian Speech Corpus -- 1 Introduction -- 2 Emotional Child Russian Speech Corpus - EmoChildRu -- 2.1 Data Collection -- 2.2 Corpus and Software Structure -- 3 Data Analysis -- 4 Experimental Results -- 4.1 Human Recognition of Emotional States -- 4.2 Automatic Classification of Emotional States -- 5 Discussion -- 6 Conclusions -- References -- Cognitive Mechanism of Semantic Content Decoding of Spoken Discourse in Noise -- 1 Introduction -- 2 Method and Experiment -- 3 Discussion -- 3.1 MRA Text Assessment Method -- 4 Conclusion -- References -- Combining Prosodic and Lexical Classifiers for Two-Pass Punctuation Detection in a Russian ASR System -- 1 Introduction -- 2 System Overview -- 2.1 The Lexical Classifier -- 2.2 The Prosodic Classifier -- 2.3 The Combined Model -- 2.4 Second Pass for Question Mark Detection -- 3 Experimental Setup -- 3.1 The Datasets -- 3.2 ASR Setup -- 4 Results and Discussion -- 5 Conclusions and Future Research -- References -- Construction of a Modern Greek Grammar Checker Through Mnemosyne Formalism -- 1 Introduction -- 2 Particularities of Modern Greek Language -- 3 Lexical Ambiguity in Modern Greek -- 4 Features of the Grammar Checker -- 5 Implementation of Software -- 6 "Kanon" Formalism -- 7 Evaluation -- References -- Contribution to the Design of an Expressive Speech Synthesis System for the Arabic Language -- 1 Introduction -- 2 System Description -- 2.1 Orthographic-to-Phonetic Transcription -- 2.2 Diphone Database -- 2.3 Diphone Concatenation -- 2.4 Voice Transformation -- 3 Experiments and Results -- 4 Conclusion and Future Works.
References -- Deep Neural Network Based Continuous Speech Recognition for Serbian Using the Kaldi Toolkit -- 1 Introduction -- 2 GMM-HMM Recipe -- 3 DNN Recipe -- 4 Data Preparation -- 5 Experimental Results -- 6 Conclusion -- References -- DNN-Based Speech Synthesis: Importance of Input Features and Training Data -- 1 Introduction -- 2 Framework -- 2.1 Database and Input/Output Features -- 2.2 DNN Setup -- 2.3 HMM Setup -- 2.4 Synthesis -- 3 Results -- 3.1 Objective Evaluation -- 3.2 Subjective Evaluation -- 4 Conclusions -- References -- Emotion State Manifestation in Voice Features: Chimpanzees, Human Infants, Children, Adults -- 1 Introduction -- 2 Method -- 3 Results -- 3.1 Experiment 1 -- 3.2 Experiment 2 -- 3.3 Experiment 3 -- 4 Conclusion and Discussion -- References -- Estimation of Vowel Spectra Near Vocal Chords with Restoration of a Clipped Speech Signal -- 1 Problem Statement -- 2 Output Signals -- 3 Input Signals -- 4 Transfer Function of the Vocal Tract -- 5 Perception Tests -- 6 Algorithm for Restoration of Clipped Signals -- 7 Conclusion -- References -- Fast Algorithm for Precise Estimation of Fundamental Frequency on Short Time Intervals -- 1 Problem Statement -- 2 Model and Cost Function -- 3 Minimum of the Cost Function -- 4 The Basic Algebraic Transformations -- 5 Unbiased Criterion -- 6 Example -- 7 Evaluation of the Algorithm -- 8 Conclusion -- References -- Gender Classification of Web Authors Using Feature Selection and Language Models -- 1 Introduction -- 2 Proposed Gender Identification Methodology -- 2.1 Feature Extraction -- 2.2 Feature Selection -- 2.3 Classification -- 3 Experimental Setup and Evaluation -- 4 Conclusion -- References -- Improving Acoustic Models for Russian Spontaneous Speech Recognition -- 1 Introduction -- 2 Applying the SWB Recipe to Russian Data -- 3 Lowering Sensitivity to Acoustic Variability.
4 Speaker-Dependent Bottleneck Features.
Summary: This book constitutes the refereed proceedings of the 17th International Conference on Speech and Computer, SPECOM 2015, held in Athens, Greece, in September 2015. The 59 revised full papers presented together with 2 invited talks were carefully reviewed and selected from 104 initial submissions. The papers cover a wide range of topics in the area of computer speech processing such as recognition, synthesis, and understanding and related domains including signal processing, language and text processing, multi-modal speech processing or human-computer interaction.
Uniform title: Speech and Computer
ISBN: 3-319-23132-4
Format: Printed material
Bibliographic level: Monograph
Language of publication: English
Record no.: 9910485021203321
Held at: Univ. Federico II
Series: Lecture Notes in Artificial Intelligence ; 9319