LEADER 05527nam  22007335  450 
001 9910720086103321
005 20230509085529.0
010   $a981-9924-01-4
024 7 $a10.1007/978-981-99-2401-1
035   $a(CKB)5720000000183818
035   $a(MiAaPQ)EBC7248740
035   $a(Au-PeEL)EBL7248740
035   $a(DE-He213)978-981-99-2401-1
035   $a(BIP)090181598
035   $a(PPN)270613749
035   $a(EXLCZ)995720000000183818
100   $a20230509d2023      u|         0
101 0 $aeng
135   $aurcnu||||||||
181   $ctxt$2rdacontent
182   $cc$2rdamedia
183   $acr$2rdacarrier
200 10$aMan-Machine Speech Communication $e17th National Conference, NCMMSC 2022, Hefei, China, December 15–18, 2022, Proceedings /$fedited by Ling Zhenhua, Gao Jianqing, Yu Kai, Jia Jia
205   $a1st ed. 2023.
210  1$aSingapore :$cSpringer Nature Singapore :$cImprint: Springer,$d2023.
215   $a1 online resource (342 pages)
225 1 $aCommunications in Computer and Information Science,$x1865-0937 ;$v1765
311   $a981-9924-00-6 
320   $aIncludes bibliographical references and index.
327   $aMCPN: A Multiple Cross-Perception Network for Real-Time Emotion Recognition in Conversation -- Baby Cry Recognition Based on Acoustic Segment Model -- A Multi-feature Sets Fusion Strategy with Similar Samples Removal for Snore Sound Classification -- Multi-Hypergraph Neural Networks for Emotion Recognition in Multi-Party Conversations -- Using Emoji as an Emotion Modality in Text-Based Depression Detection -- Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis -- Semantic enhancement framework for robust speech recognition -- Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model -- Predictive AutoEncoders are Context-Aware Unsupervised Anomalous Sound Detectors -- A pipelined framework with serialized output training for overlapping speech recognition -- Adversarial Training Based on Meta-Learning in Unseen Domains for Speaker Verification -- Multi-Speaker Multi-Style Speech Synthesis with Timbre and Style Disentanglement -- Multiple Confidence Gates for Joint Training of SE and ASR -- Detecting Escalation Level from Speech with Transfer Learning and Acoustic-Linguistic Information Fusion -- Pre-training Techniques For Improving Text-to-Speech Synthesis By Automatic Speech Recognition Based Data Enhancement -- A Time-Frequency Attention Mechanism with Subsidiary Information for Effective Speech Emotion Recognition -- Interplay between prosody and syntax-semantics: Evidence from the prosodic features of Mandarin tag questions -- Improving Fine-grained Emotion Control and Transfer with Gated Emotion Representations in Speech Synthesis -- Violence Detection through Fusing Visual Information to Auditory Scene -- Mongolian Text-to-Speech Challenge under Low-Resource Scenario for NCMMSC2022 -- VC-AUG Voice Conversion based Data Augmentation for Text-Dependent Speaker Verification -- Transformer-based potential emotional relation mining network for emotion recognition in conversation
-- FastFoley Non-Autoregressive Foley Sound Generation Based On Visual Semantics -- Structured Hierarchical Dialogue Policy with Graph Neural Networks -- Deep Reinforcement Learning for On-line Dialogue State Tracking -- Dual Learning for Dialogue State Tracking -- Automatic Stress Annotation and Prediction For Expressive Mandarin TTS -- MnTTS2 An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset.
330   $aThis book constitutes the refereed proceedings of the 17th National Conference on Man-Machine Speech Communication, NCMMSC 2022, held in Hefei, China, in December 2022. The 21 full papers and 7 short papers included in this book were carefully reviewed and selected from 108 submissions. They were organized in topical sections as follows: MCPN: A Multiple Cross-Perception Network for Real-Time Emotion Recognition in Conversation; Baby Cry Recognition Based on Acoustic Segment Model; MnTTS2 An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset.
410  0$aCommunications in Computer and Information Science,$x1865-0937 ;$v1765
606   $aComputer vision
606   $aNatural language processing (Computer science)
606   $aSignal processing
606   $aArtificial intelligence
606   $aUser interfaces (Computer systems)
606   $aHuman-computer interaction
606   $aComputer Vision
606   $aNatural Language Processing (NLP)
606   $aSignal, Speech and Image Processing 
606   $aArtificial Intelligence
606   $aUser Interfaces and Human Computer Interaction
610   $aScience
615  0$aComputer vision.
615  0$aNatural language processing (Computer science).
615  0$aSignal processing.
615  0$aArtificial intelligence.
615  0$aUser interfaces (Computer systems).
615  0$aHuman-computer interaction.
615 14$aComputer Vision.
615 24$aNatural Language Processing (NLP).
615 24$aSignal, Speech and Image Processing.
615 24$aArtificial Intelligence.
615 24$aUser Interfaces and Human Computer Interaction.
676   $a006.4
702   $aZhenhua$b Ling
801  0$bMiAaPQ
801  1$bMiAaPQ
801  2$bMiAaPQ
906   $aBOOK
912   $a9910720086103321
996   $aMan-Machine Speech Communication$93389332
997   $aUNINA