LEADER 05568nam 22006974a 450
001 996211655103316
005 20230617031032.0
010 $a1-280-36625-7
010 $a9786610366255
010 $a0-470-30781-1
010 $a0-471-45864-3
010 $a0-471-44835-4
035 $a(CKB)1000000000018977
035 $a(EBL)159847
035 $a(OCoLC)123112222
035 $a(SSID)ssj0000295984
035 $a(PQKBManifestationID)11250991
035 $a(PQKBTitleCode)TC0000295984
035 $a(PQKBWorkID)10322357
035 $a(PQKB)10628633
035 $a(MiAaPQ)EBC159847
035 $a(EXLCZ)991000000000018977
100 $a20021105d2003 uy 0
101 0 $aeng
135 $aur|n|---|||||
181 $ctxt
182 $cc
183 $acr
200 10$aExploratory data mining and data cleaning$b[electronic resource] /$fTamraparni Dasu, Theodore Johnson
210 $aNew York $cWiley-Interscience$d2003
215 $a1 online resource (226 p.)
225 1 $aWiley series in probability and statistics
300 $aDescription based upon print version of record.
311 $a0-471-26851-8
320 $aIncludes bibliographical references (p. 189-195) and index.
327 $aExploratory Data Mining and Data Cleaning; Contents; Preface; 1. Exploratory Data Mining and Data Cleaning: An Overview; 1.1 Introduction; 1.2 Cautionary Tales; 1.3 Taming the Data; 1.4 Challenges; 1.5 Methods; 1.6 EDM; 1.6.1 EDM Summaries-Parametric; 1.6.2 EDM Summaries-Nonparametric; 1.7 End-to-End Data Quality (DQ); 1.7.1 DQ in Data Preparation; 1.7.2 EDM and Data Glitches; 1.7.3 Tools for DQ; 1.7.4 End-to-End DQ: The Data Quality Continuum; 1.7.5 Measuring Data Quality; 1.8 Conclusion; 2. Exploratory Data Mining; 2.1 Introduction; 2.2 Uncertainty; 2.2.1 Annotated Bibliography
327 $a2.3 EDM: Exploratory Data Mining; 2.4 EDM Summaries; 2.4.1 Typical Values; 2.4.2 Attribute Variation; 2.4.3 Example; 2.4.4 Attribute Relationships; 2.4.5 Annotated Bibliography; 2.5 What Makes a Summary Useful?; 2.5.1 Statistical Properties; 2.5.2 Computational Criteria; 2.5.3 Annotated Bibliography; 2.6 Data-Driven Approach-Nonparametric Analysis; 2.6.1 The Joy of Counting; 2.6.2 Empirical Cumulative Distribution Function (ECDF); 2.6.3 Univariate Histograms; 2.6.4 Annotated Bibliography; 2.7 EDM in Higher Dimensions; 2.8 Rectilinear Histograms; 2.9 Depth and Multivariate Binning
327 $a2.9.1 Data Depth; 2.9.2 Aside: Depth-Related Topics; 2.9.3 Annotated Bibliography; 2.10 Conclusion; 3. Partitions and Piecewise Models; 3.1 Divide and Conquer; 3.1.1 Why Do We Need Partitions?; 3.1.2 Dividing Data; 3.1.3 Applications of Partition-Based EDM Summaries; 3.2 Axis-Aligned Partitions and Data Cubes; 3.2.1 Annotated Bibliography; 3.3 Nonlinear Partitions; 3.3.1 Annotated Bibliography; 3.4 DataSpheres (DS); 3.4.1 Layers; 3.4.2 Data Pyramids; 3.4.3 EDM Summaries; 3.4.4 Annotated Bibliography; 3.5 Set Comparison Using EDM Summaries; 3.5.1 Motivation; 3.5.2 Comparison Strategy
327 $a3.5.3 Statistical Tests for Change; 3.5.4 Application-Two Case Studies; 3.5.5 Annotated Bibliography; 3.6 Discovering Complex Structure in Data with EDM Summaries; 3.6.1 Exploratory Model Fitting in Interactive Response Time; 3.6.2 Annotated Bibliography; 3.7 Piecewise Linear Regression; 3.7.1 An Application; 3.7.2 Regression Coefficients; 3.7.3 Improvement in Fit; 3.7.4 Annotated Bibliography; 3.8 One-Pass Classification; 3.8.1 Quantile-Based Prediction with Piecewise Models; 3.8.2 Simulation Study; 3.8.3 Annotated Bibliography; 3.9 Conclusion; 4. Data Quality; 4.1 Introduction
327 $a4.2 The Meaning of Data Quality; 4.2.1 An Example; 4.2.2 Data Glitches; 4.2.3 Conventional Definition of DQ; 4.2.4 Times Have Changed; 4.2.5 Annotated Bibliography; 4.3 Updating DQ Metrics: Data Quality Continuum; 4.3.1 Data Gathering; 4.3.2 Data Delivery; 4.3.3 Data Monitoring; 4.3.4 Data Storage; 4.3.5 Data Integration; 4.3.6 Data Retrieval; 4.3.7 Data Mining/Analysis; 4.3.8 Annotated Bibliography; 4.4 The Meaning of Data Quality Revisited; 4.4.1 Data Interpretation; 4.4.2 Data Suitability; 4.4.3 Dataset Type; 4.4.4 Attribute Type; 4.4.5 Application Type
327 $a4.4.6 Data Quality-A Many Splendored Thing
330 $aWritten for practitioners of data mining, data cleaning and database management. Presents a technical treatment of data quality including process, metrics, tools and algorithms. Focuses on developing an evolving modeling strategy through an iterative data exploration loop and incorporation of domain knowledge. Addresses methods of detecting, quantifying and correcting data quality issues that can have a significant impact on findings and decisions, using commercially available tools as well as new algorithmic approaches. Uses case studies to illustrate applications in real
410 0$aWiley series in probability and statistics.
606 $aData mining
606 $aElectronic data processing$xData preparation
606 $aElectronic data processing$xQuality control
615 0$aData mining.
615 0$aElectronic data processing$xData preparation.
615 0$aElectronic data processing$xQuality control.
676 $a005.741
676 $a006.3
676 $a006.312
700 $aDasu$b Tamraparni$0281835
701 $aJohnson$b Theodore$0281836
801 0$bMiAaPQ
801 1$bMiAaPQ
801 2$bMiAaPQ
906 $aBOOK
912 $a996211655103316
996 $aExploratory data mining and data cleaning$9673537
997 $aUNISA