| Title: |
Big Data Analytics and Knowledge Discovery : 26th International Conference, DaWaK 2024, Naples, Italy, August 26–28, 2024, Proceedings / edited by Robert Wrembel, Silvia Chiusano, Gabriele Kotsis, A Min Tjoa, Ismail Khalil
|
| Publication: | Cham : Springer Nature Switzerland : Imprint: Springer, 2024 |
| Edition: | 1st ed. 2024. |
| Physical description: | 1 online resource (409 pages) |
| Discipline: | 005.7 |
| Topical subject: | Statistics |
| Data mining | |
| Information technology - Management | |
| Artificial intelligence | |
| Data Mining and Knowledge Discovery | |
| Computer Application in Administrative Data Processing | |
| Artificial Intelligence | |
| Dades massives | |
| Mineria de dades | |
| Intel·ligència artificial | |
| Genre/form subject: | Congressos |
| Llibres electrònics | |
| Person (secondary responsibility): | Wrembel, Robert |
| Bibliography note: | Includes bibliographical references and index. |
| Contents note: | Intro -- Preface -- Organization -- Abstracts of Keynote Talks -- Multimodal Deep Learning in Medical Imaging -- Digital Humanism as an Enabler for a Holistic Socio-Technical Approach to the Latest Developments in Computer Science and Artificial Intelligence -- Deep Entity Processing in the Era of Large Language Models: Challenges and Opportunities -- Contents -- Modeling and Design -- LiteSelect: A Lightweight Adaptive Learning Algorithm for Online Index Selection -- 1 Introduction -- 2 The Online Index Selection Problem -- 3 LiteSelect: A Lightweight Online Index Tuner -- 3.1 Algorithm LiteSelect -- 3.2 Fine Tuning LiteSelect -- 4 Experimental Evaluation -- 4.1 Experimental Setup -- 4.2 Parameter Impact Analysis -- 4.3 Index Tuning Performance Comparison -- 5 Related Work -- 6 Conclusion -- References -- IDAGEmb: An Incremental Data Alignment Based on Graph Embedding -- 1 Introduction -- 2 Background -- 2.1 Existing Data Alignment Approaches -- 2.2 Graph Embedding in Representation Learning -- 2.3 Discussion -- 3 Methodology -- 3.1 Research Design -- 3.2 Preliminaries -- 3.3 Adopted Algorithm for IDAGEmb -- 4 Experiments and Results -- 4.1 Experiment Configuration -- 4.2 Experiment #1: Embedding Method Selection -- 4.3 Experiment #2: Comparison with Static Methods (Effectiveness and Efficiency) -- 4.4 Experiment #3: Model Sensitivity to Data Order Variation -- 5 Conclusion and Outlook -- References -- Learning Paradigms and Modelling Methodologies for Digital Twins in Process Industry -- 1 Introduction and Motivation -- 1.1 Research Questions (RQs) -- 1.2 Structure of Review -- 2 Literature Search Strategy -- 2.1 Quality Assessment Checks -- 2.2 Selection of Primary Studies -- 2.3 Data Synthesis and Analysis Approach -- 3 Reporting the Review -- 3.1 Overview of All Studies -- 3.2 Overview of All Primary Studies. |
| 4 Evaluating the Research Questions -- 5 Discussion and Conclusion -- References -- Entity Matching and Similarity -- MultiMatch: Low-Resource Generalized Entity Matching Using Task-Conditioned Hyperadapters in Multitask Learning -- 1 Introduction -- 2 Background -- 2.1 Problem Formulation -- 2.2 Entity Matching with Single-task Objective Models -- 2.3 Fully Fine-tuning Methods -- 2.4 Parameter-Efficient Fine-tuning Methods -- 2.5 Entity Matching with Parameter-Efficient Multi-task Models -- 3 MultiMatch Training -- 4 Experiments -- 5 Analysis -- 5.1 Single Versus Multiple Objective Models -- 5.2 Task Ablation Experiments -- 6 Conclusions and Future Work -- References -- Embedding-Based Data Matching for Disparate Data Sources -- 1 Context and Main Issues -- 2 Proposed Framework -- 2.1 Problem Statement -- 2.2 Overview -- 3 Experiments -- 3.1 RQ1. Effectiveness and Stability -- 3.2 RQ2. Ablation -- 4 Conclusion -- References -- Subtree Similarity Search Based on Structure and Text -- 1 Introduction -- 2 Problem Definition -- 3 Related Works -- 3.1 Tree Edit Distance -- 3.2 Lower Bounds of Tree Edit Distance -- 3.3 Upper Bounds of Tree Edit Distance -- 3.4 Subtree Similarity Search -- 3.5 Other Related Problems -- 4 Preliminaries -- 5 Proposed Method -- 6 Experiments -- 6.1 Dataset -- 6.2 Methods -- 6.3 Effect of the Recall -- 6.4 Effect of the Document Size -- 6.5 Effect of the Query Size -- 6.6 Accuracy -- 7 Conclusion -- References -- Classification -- Towards Hybrid Embedded Feature Selection and Classification Approach with Slim-TSF -- 1 Introduction -- 2 Related Work -- 3 Methodology -- 4 Experimental Evaluations -- 4.1 Data Collection -- 4.2 Experimental Settings -- 4.3 Bootstrapping -- 4.4 Remarks -- 5 Conclusions -- References -- Evaluation of High Sparsity Strategies for Efficient Binary Classification -- 1 Introduction -- 2 Related Work. | |
| 3 Materials and Methods -- 4 Results and Discussion -- 5 Conclusions and Future Work -- References -- Incremental SMOTE with Control Coefficient for Classifiers in Data Starved Medical Applications -- 1 Introduction -- 2 Related Work -- 3 Method -- 3.1 An Incremental Synthetic Data Generation System -- 4 Experiments -- 4.1 Datasets and Experiments Setup -- 4.2 Statistical Analysis -- 4.3 Performance Evaluation on Classifiers -- 5 Conclusions -- References -- Exploring Evaluation Metrics for Binary Classification in Data Analysis: the Worthiness Benchmark Concept -- 1 Introduction and Related Research -- 2 Methodology -- 3 Discussion and Conclusion -- References -- Machine Learning Methods and Applications -- Exploring Causal Chain Identification: Comprehensive Insights from Text and Knowledge Graphs -- 1 Introduction -- 2 Related Work -- 3 Methodology -- 3.1 In-Chain Domain Knowledge -- 3.2 CK-CEVAE -- 3.3 Chained Prediction Unit -- 4 Experiments -- 4.1 Chains Acquisition -- 4.2 Domain Detection Model -- 4.3 Models Configurations -- 4.4 Overall Analysis -- 4.5 Ablation Study -- 5 Case Study: Understanding Semantic Continuity in Knowledge Graphs -- 6 Discussion -- 7 Conclusion -- References -- Towards Regional Explanations with Validity Domains for Local Explanations -- 1 Introduction -- 2 Related Work -- 2.1 Explanation Methods -- 2.2 Explanation Evaluation Metrics -- 2.3 Validity Domain of Models -- 3 Toy Example -- 4 Our Proposal -- 4.1 Validity Domain -- 4.2 Model Summary -- 4.3 Evaluation Metrics -- 5 Experiments -- 5.1 Protocol -- 5.2 Evaluation of Methods -- 5.3 Model Summary -- 5.4 Sensitivity Analysis -- 6 Discussion and Limits -- 7 Conclusion and Perspectives -- References -- Analyzing a Decade of Evolution: Trends in Natural Language Processing -- 1 Introduction -- 2 Methodology -- 2.1 PDF Parsing -- 3 Results -- 4 Conclusion. | |
| 5 Limitations -- References -- Improving Serendipity for Collaborative Metric Learning Based on Mutual Proximity -- 1 Introduction -- 2 Background -- 2.1 Serendipity -- 2.2 Collaborative Metric Learning (CML) -- 2.3 Mutual Proximity (MP) -- 2.4 Advantages and Originality of the Proposed Method -- 3 Methodology -- 3.1 Learning Embeddings -- 3.2 Searching Embedding Space and Recommending Items -- 4 Experiments -- 4.1 Datasets -- 4.2 Metrics -- 4.3 Results -- 5 Conclusions and Discussion -- References -- Ada2vec: Adaptive Representation Learning for Large-Scale Dynamic Heterogeneous Networks -- 1 Introduction -- 2 Related Work -- 3 Problem Definition -- 4 The Ada2vec Framework -- 4.1 Part 1 Dynamic -- 4.2 Part 2 Heterogeneity -- 4.3 Part 3 Change -- 5 Experimental Evaluations -- 5.1 Data -- 5.2 Benchmarks -- 5.3 Classification -- 5.4 Clustering -- 5.5 Performance Analysis -- 6 Conclusion and Future Work -- References -- Differentially-Private Neural Network Training with Private Features and Public Labels -- 1 Introduction -- 2 Background -- 2.1 Differential Privacy -- 2.2 DP-SGD -- 3 Related Work -- 4 Proposed Approach -- 4.1 Sanitization Layer -- 4.2 Bounding Sensitivity and Adding Noise -- 4.3 Design Choices and Tradeoffs -- 5 Experimental Evaluation -- 5.1 Experimental Settings -- 5.2 Results -- 6 Conclusion -- References -- Time Series -- Series2Graph++: Distributed Detection of Correlation Anomalies in Multivariate Time Series -- 1 Introduction -- 2 Related Work -- 3 Series2Graph++ -- 4 Experiments -- 5 Conclusion -- References -- Anomaly Detection from Time Series Under Uncertainty -- 1 Introduction -- 2 Related Work -- 3 Proposed Approach -- 4 Experiments -- 4.1 Uncertainty Quantification Evaluation -- 4.2 Model Performance -- 5 Conclusion -- References -- Comparison of Measures for Characterizing the Difficulty of Time Series Classification. | |
| 1 Introduction -- 2 Methodology -- 2.1 Data and Models -- 2.2 Complexity Measures -- 3 Analysis -- 3.1 Correlation Analysis -- 3.2 Relationships Between the Complexity Measures -- 4 Conclusion -- References -- Dynamic Time Warping for Phase Recognition in Tribological Sensor Data -- 1 Introduction -- 2 Related Work -- 3 Method -- 3.1 Dynamic Time Warping (DTW) -- 3.2 Tribological Use Case -- 3.3 Experiments -- 4 Results -- 4.1 Classification of the Whole Wear Phases -- 4.2 Partial Classification of the Wear Phases -- 5 Conclusion -- References -- Data Repositories -- Putting Co-Design-Supporting Data Lakes to the Test: An Evaluation on AEC Case Studies -- 1 Motivation: Data Management in AEC -- 2 ArchIBALD Architecture Development and Definition -- 2.1 Requirement Analysis -- 2.2 Design of the ArchIBALD Architecture -- 3 Scenario-Based Case Studies: Context and Overview -- 3.1 The livMatS Biomimetic Shell -- 3.2 Co-Design of Robotic Prefabrication -- 3.3 Co-Design of End-Effectors for On-Site Assembly -- 3.4 Co-Design of On-Site Planning and Execution -- 4 Evaluation -- 4.1 Case Study 1: Co-Design of Robotic Prefabrication -- 4.2 Case Study 2: Co-Design of End-Effectors -- 4.3 Case Study 3: Co-Design of On-Site Planning and Execution -- 5 Conclusion -- References -- Creating and Querying Data Cubes in Python Using PyCube -- 1 Introduction -- 2 Related Work -- 3 Preliminaries -- 4 Use Case -- 4.1 Initializing PyCube -- 4.2 Analyzing the Data in the View -- 5 Populating the View -- 5.1 Generating the SQL Query -- 5.2 Converting Result Sets to Dataframes -- 6 Experiments -- 6.1 Experimental Setup -- 6.2 Data Retrieval Speeds -- 6.3 Memory Usage -- 6.4 Code Comparison -- 7 Conclusion and Future Work -- References -- An E-Commerce Benchmark for Evaluating Performance Trade-Offs in Document Stores -- 1 Introduction -- 2 Benchmark Design. | |
| 2.1 E-Commerce Application. | |
| Summary/abstract: | This book constitutes the proceedings of the 26th International Conference on Big Data Analytics and Knowledge Discovery, DaWaK 2024, which took place in Naples, Italy, during August 26-28, 2024. The 16 full and 20 short papers included in this book were carefully reviewed and selected from 83 submissions. They were organized in topical sections as follows: modeling and design; entity matching and similarity; classification; machine learning methods and applications; time series; data repositories; optimization; and data quality and applications. |
| Authorized title: | Big data analytics and knowledge discovery |
| ISBN: | 9783031683237 |
| 9783031683220 | |
| Format: | Printed material |
| Bibliographic level: | Monograph |
| Language of publication: | English |
| Record No.: | 9910881092203321 |
| Find it here: | Univ. Federico II |