Big Data Analytics and Knowledge Discovery: 26th International Conference, DaWaK 2024, Naples, Italy, August 26–28, 2024, Proceedings / edited by Robert Wrembel, Silvia Chiusano, Gabriele Kotsis, A Min Tjoa, Ismail Khalil. 1st ed. 2024. Cham: Springer Nature Switzerland; Imprint: Springer, 2024. 1 online resource (409 pages). Lecture Notes in Computer Science, 1611-3349; 14912.
ISBN 9783031683237 (electronic bk.); ISBN 9783031683220. DOI 10.1007/978-3-031-68323-7.
Print version: Wrembel, Robert. Big Data Analytics and Knowledge Discovery. Cham: Springer, c2024. ISBN 9783031683220.
Includes bibliographical references and index.
Intro -- Preface -- Organization -- Abstracts of Keynote Talks -- Multimodal Deep Learning in Medical Imaging -- Digital Humanism as an Enabler for a Holistic Socio-Technical Approach to the Latest Developments in Computer Science and Artificial Intelligence -- Deep Entity Processing in the Era of Large Language Models: Challenges and Opportunities -- Contents -- Modeling and Design -- LiteSelect: A Lightweight Adaptive Learning Algorithm for Online Index Selection -- 1 Introduction -- 2 The Online Index Selection Problem -- 3 LiteSelect: An Lightweight Online Index Tuner -- 3.1 Algorithm LiteSelect -- 3.2 Fine Tuning LiteSelect -- 4 Experimental Evaluation -- 4.1 Experimental Setup -- 4.2 Parameter Impact Analysis -- 4.3 Index Tuning Performance Comparison -- 5 Related Work -- 6 Conclusion -- References -- IDAGEmb: An Incremental Data Alignment Based on Graph Embedding -- 1 Introduction -- 2 Background -- 2.1 Existing Data Alignment Approaches -- 2.2 Graph Embedding in Representation Learning -- 2.3 Discussion -- 3 Methodology -- 3.1 Research Design -- 3.2 Preliminaries -- 3.3 Adopted Algorithm for IDAGEmb -- 4 Experiments and Results -- 4.1 Experiment
Configuration -- 4.2 Experiment #1: Embedding Method Selection -- 4.3 Experiment #2: Comparison with Static Methods (effectiveness and Efficiency) -- 4.4 Experiment #3: Model Sensitivity to Data Order Variation -- 5 Conclusion and Outlook -- References -- Learning Paradigms and Modelling Methodologies for Digital Twins in Process Industry -- 1 Introduction and Motivation -- 1.1 Research Questions (RQs) -- 1.2 Structure of Review -- 2 Literature Search Strategy -- 2.1 Quality Assessment Checks -- 2.2 Selection of Primary Studies -- 2.3 Data Synthesis and Analysis Approach -- 3 Reporting the Review -- 3.1 Overview of All Studies -- 3.2 Overview of All Primary Studies -- 4 Evaluating the Research Questions -- 5 Discussion and Conclusion -- References -- Entity Matching and Similarity -- MultiMatch: Low-Resource Generalized Entity Matching Using Task-Conditioned Hyperadapters in Multitask Learning -- 1 Introduction -- 2 Background -- 2.1 Problem Formulation -- 2.2 Entity Matching with Single-task Objective Models -- 2.3 Fully Fine-tuning Methods -- 2.4 Parameter-Efficient Fine-tuning Methods -- 2.5 Entity Matching with Parameter-Efficient Multi-task Models -- 3 MultiMatch Training -- 4 Experiments -- 5 Analysis -- 5.1 Single Versus Multiple Objective Models -- 5.2 Task Ablation Experiments -- 6 Conclusions and Future Work -- References -- Embedding-Based Data Matching for Disparate Data Sources -- 1 Context and Main Issues -- 2 Proposed Framework -- 2.1 Problem Statement -- 2.2 Overview -- 3 Experiments -- 3.1 RQ1. Effectiveness and Stability -- 3.2 RQ2.
Ablation -- 4 Conclusion -- References -- Subtree Similarity Search Based on Structure and Text -- 1 Introduction -- 2 Problem Definition -- 3 Related Works -- 3.1 Tree Edit Distance -- 3.2 Lower Bounds of Tree Edit Distance -- 3.3 Upper Bounds of Tree Edit Distance -- 3.4 Subtree Similarity Search -- 3.5 Other Related Problems -- 4 Preliminaries -- 5 Proposed Method -- 6 Experiments -- 6.1 Dataset -- 6.2 Methods -- 6.3 Effect of the Recall -- 6.4 Effect of the Document Size -- 6.5 Effect of the Query Size -- 6.6 Accuracy -- 7 Conclusion -- References -- Classification -- Towards Hybrid Embedded Feature Selection and Classification Approach with Slim-TSF -- 1 Introduction -- 2 Related Work -- 3 Methodology -- 4 Experimental Evaluations -- 4.1 Data Collection -- 4.2 Experimental Settings -- 4.3 Bootstrapping -- 4.4 Remarks -- 5 Conclusions -- References -- Evaluation of High Sparsity Strategies for Efficient Binary Classification -- 1 Introduction -- 2 Related Work -- 3 Materials and Methods -- 4 Results and Discussion -- 5 Conclusions and Future Work -- References -- Incremental SMOTE with Control Coefficient for Classifiers in Data Starved Medical Applications -- 1 Introduction -- 2 Related Work -- 3 Method -- 3.1 An Incremental Synthetic Data Generation System -- 4 Experiments -- 4.1 Datasets and Experiments Setup -- 4.2 Statistical Analysis -- 4.3 Performance Evaluation on Classifiers -- 5 Conclusions -- References -- Exploring Evaluation Metrics for Binary Classification in Data Analysis: the Worthiness Benchmark Concept -- 1 Introduction and Related Research -- 2 Methodology -- 3 Discussion and Conclusion -- References -- Machine Learning Methods and Applications -- Exploring Causal Chain Identification: Comprehensive Insights from Text and Knowledge Graphs -- 1 Introduction -- 2 Related Work -- 3 Methodology -- 3.1 In-Chain Domain Knowledge -- 3.2 CK-CEVAE -- 3.3 Chained Prediction Unit -- 4 Experiments -- 4.1 Chains Acquisition -- 4.2 Domain Detection Model --
4.3 Models Configurations -- 4.4 Overall Analysis -- 4.5 Ablation Study -- 5 Case Study: Understanding Semantic Continuity in Knowledge Graphs -- 6 Discussion -- 7 Conclusion -- References -- Towards Regional Explanations with Validity Domains for Local Explanations -- 1 Introduction -- 2 Related Work -- 2.1 Explanation Methods -- 2.2 Explanation Evaluation Metrics -- 2.3 Validity Domain of Models -- 3 Toy Example -- 4 Our Proposal -- 4.1 Validity Domain -- 4.2 Model Summary -- 4.3 Evaluation Metrics -- 5 Experiments -- 5.1 Protocol -- 5.2 Evaluation of Methods -- 5.3 Model Summary -- 5.4 Sensitivity Analysis -- 6 Discussion and Limits -- 7 Conclusion and Perspectives -- References -- Analyzing a Decade of Evolution: Trends in Natural Language Processing -- 1 Introduction -- 2 Methodology -- 2.1 PDF Parsing -- 3 Results -- 4 Conclusion -- 5 Limitations -- References -- Improving Serendipity for Collaborative Metric Learning Based on Mutual Proximity -- 1 Introduction -- 2 Background -- 2.1 Serendipity -- 2.2 Collaborative Metric Learning (CML) -- 2.3 Mutual Proximity (MP) -- 2.4 Advantages and Originality of the Proposed Method -- 3 Methodology -- 3.1 Learning Embeddings -- 3.2 Searching Embedding Space and Recommending Items -- 4 Experiments -- 4.1 Datasets -- 4.2 Metrics -- 4.3 Results -- 5 Conclusions and Discussion -- References -- Ada2vec: Adaptive Representation Learning for Large-Scale Dynamic Heterogeneous Networks -- 1 Introduction -- 2 Related Work -- 3 Problem Definition -- 4 The Ada2vec Framework -- 4.1 Part 1 Dynamic -- 4.2 Part 2 Heterogeneity -- 4.3 Part 3 Change -- 5 Experimental Evaluations -- 5.1 Data -- 5.2 Benchmarks -- 5.3 Classification -- 5.4 Clustering -- 5.5 Performance Analysis -- 6 Conclusion and Future Work -- References -- Differentially-Private Neural Network Training with Private Features and Public Labels -- 1 Introduction -- 2 Background -- 2.1 Differential Privacy -- 2.2 DP-SGD -- 3 Related Work -- 4 Proposed Approach -- 4.1
Sanitization Layer -- 4.2 Bounding Sensitivity and Adding Noise -- 4.3 Design Choices and Tradeoffs -- 5 Experimental Evaluation -- 5.1 Experimental Settings -- 5.2 Results -- 6 Conclusion -- References -- Time Series -- Series2Graph++: Distributed Detection of Correlation Anomalies in Multivariate Time Series -- 1 Introduction -- 2 Related Work -- 3 Series2Graph++ -- 4 Experiments -- 5 Conclusion -- References -- Anomaly Detection from Time Series Under Uncertainty -- 1 Introduction -- 2 Related Work -- 3 Proposed Approach -- 4 Experiments -- 4.1 Uncertainty Quantification Evaluation -- 4.2 Model Performance -- 5 Conclusion -- References -- Comparison of Measures for Characterizing the Difficulty of Time Series Classification -- 1 Introduction -- 2 Methodology -- 2.1 Data and Models -- 2.2 Complexity Measures -- 3 Analysis -- 3.1 Correlation Analysis -- 3.2 Relationships Between the Complexity Measures -- 4 Conclusion -- References -- Dynamic Time Warping for Phase Recognition in Tribological Sensor Data -- 1 Introduction -- 2 Related Work -- 3 Method -- 3.1 Dynamic Time Warping (DTW) -- 3.2 Tribological Use Case -- 3.3 Experiments -- 4 Results -- 4.1 Classification of the Whole Wear Phases -- 4.2 Partial Classification of the Wear Phases -- 5 Conclusion -- References -- Data Repositories -- Putting Co-Design-Supporting Data Lakes to the Test: An Evaluation on AEC Case Studies -- 1 Motivation: Data Management in AEC -- 2 ArchIBALD Architecture Development and Definition -- 2.1 Requirement Analysis -- 2.2 Design of the ArchIBALD Architecture -- 3 Scenario-Based Case Studies: Context and Overview -- 3.1 The livMatS Biomimetic Shell -- 3.2 Co-Design of Robotic Prefabrication -- 3.3 Co-Design of End-Effectors for On-Site Assembly -- 3.4 Co-Design of On-Site Planning and Execution -- 4 Evaluation -- 4.1 Case Study 1: Co-Design of Robotic Prefabrication -- 4.2 Case Study 2: Co-Design of End-Effectors -- 4.3 Case Study 3: Co-Design of On-Site Planning and Execution -- 5
Conclusion -- References -- Creating and Querying Data Cubes in Python Using PyCube -- 1 Introduction -- 2 Related Work -- 3 Preliminaries -- 4 Use Case -- 4.1 Initializing PyCube -- 4.2 Analyzing the Data in the View -- 5 Populating the View -- 5.1 Generating the SQL Query -- 5.2 Converting Result Sets to Dataframes -- 6 Experiments -- 6.1 Experimental Setup -- 6.2 Data Retrieval Speeds -- 6.3 Memory Usage -- 6.4 Code Comparison -- 7 Conclusion and Future Work -- References -- An E-Commerce Benchmark for Evaluating Performance Trade-Offs in Document Stores -- 1 Introduction -- 2 Benchmark Design -- 2.1 E-Commerce Application.
This book constitutes the proceedings of the 26th International Conference on Big Data Analytics and Knowledge Discovery, DaWaK 2024, which took place in Naples, Italy, during August 26–28, 2024. The 16 full and 20 short papers included in this book were carefully reviewed and selected from 83 submissions. They are organized in topical sections as follows: modeling and design; entity matching and similarity; classification; machine learning methods and applications; time series; data repositories; optimization; and data quality and applications.
Subjects: Statistics; Data mining; Information technology -- Management; Artificial intelligence; Data Mining and Knowledge Discovery; Computer Application in Administrative Data Processing; Dades massives (thub); Mineria de dades (thub); Intel·ligència artificial (thub); Congressos (thub); Llibres electrònics (thub).
Dewey classification: 005.7.