LEADER 05554nam 2200649 450 001 9910132206803321 005 20200520144314.0 010 $a1-118-87357-2 010 $a1-118-87405-6 010 $a1-118-87358-0 035 $a(CKB)3710000000121099 035 $a(EBL)1699137 035 $a(DLC) 2014003777 035 $a(Au-PeEL)EBL1699137 035 $a(CaPaEBR)ebr10878036 035 $a(CaONFJC)MIL615362 035 $a(OCoLC)869460667 035 $a(CaSebORM)9781118873571 035 $a(MiAaPQ)EBC1699137 035 $a(PPN)189240091 035 $a(EXLCZ)993710000000121099 100 $a20140610h20142014 uy 0 101 0 $aeng 135 $aur|n|---||||| 181 $2rdacontent 182 $2rdamedia 183 $2rdacarrier 200 10$aDiscovering knowledge in data $ean introduction to data mining /$fDaniel T. Larose, Chantal D. Larose 205 $a2nd ed. 210 1$aHoboken, New Jersey :$cIEEE,$d2014. 210 4$d©2014 215 $a1 online resource (336 p.) 225 1 $aWiley Series on Methods and Applications in Data Mining 300 $aIncludes index. 311 $a0-470-90874-2 320 $aIncludes bibliographical references at the end of each chapter and index. 327 $aDISCOVERING KNOWLEDGE IN DATA; Contents; Preface; 1 An Introduction to Data Mining; 1.1 What is Data Mining?; 1.2 Wanted: Data Miners; 1.3 The Need for Human Direction of Data Mining; 1.4 The Cross-Industry Standard Practice for Data Mining; 1.4.1 Crisp-DM: The Six Phases; 1.5 Fallacies of Data Mining; 1.6 What Tasks Can Data Mining Accomplish?; 1.6.1 Description; 1.6.2 Estimation; 1.6.3 Prediction; 1.6.4 Classification; 1.6.5 Clustering; 1.6.6 Association; References; Exercises; 2 Data Preprocessing; 2.1 Why do We Need to Preprocess the Data?; 2.2 Data Cleaning; 2.3 Handling Missing Data 327 $a2.4 Identifying Misclassifications; 2.5 Graphical Methods for Identifying Outliers; 2.6 Measures of Center and Spread; 2.7 Data Transformation; 2.8 Min-Max Normalization; 2.9 Z-Score Standardization; 2.10 Decimal Scaling; 2.11 Transformations to Achieve Normality; 2.12 Numerical Methods for Identifying Outliers; 2.13 Flag Variables; 2.14 Transforming Categorical Variables into Numerical Variables; 2.15 Binning Numerical Variables; 2.16 Reclassifying Categorical 
Variables; 2.17 Adding an Index Field; 2.18 Removing Variables that are Not Useful; 2.19 Variables that Should Probably Not Be Removed 327 $a2.20 Removal of Duplicate Records; 2.21 A Word About Id Fields; THE R ZONE; References; Exercises; Hands-On Analysis; 3 Exploratory Data Analysis; 3.1 Hypothesis Testing Versus Exploratory Data Analysis; 3.2 Getting to Know the Data Set; 3.3 Exploring Categorical Variables; 3.4 Exploring Numeric Variables; 3.5 Exploring Multivariate Relationships; 3.6 Selecting Interesting Subsets of the Data for Further Investigation; 3.7 Using EDA to Uncover Anomalous Fields; 3.8 Binning Based on Predictive Value; 3.9 Deriving New Variables: Flag Variables; 3.10 Deriving New Variables: Numerical Variables 327 $a3.11 Using EDA to Investigate Correlated Predictor Variables; 3.12 Summary; THE R ZONE; Reference; Exercises; Hands-On Analysis; 4 Univariate Statistical Analysis; 4.1 Data Mining Tasks in Discovering Knowledge in Data; 4.2 Statistical Approaches to Estimation and Prediction; 4.3 Statistical Inference; 4.4 How Confident are We in Our Estimates?; 4.5 Confidence Interval Estimation of the Mean; 4.6 How to Reduce the Margin of Error; 4.7 Confidence Interval Estimation of the Proportion; 4.8 Hypothesis Testing for the Mean; 4.9 Assessing the Strength of Evidence Against the Null Hypothesis 327 $a4.10 Using Confidence Intervals to Perform Hypothesis Tests; 4.11 Hypothesis Testing for the Proportion; THE R ZONE; Reference; Exercises; 5 Multivariate Statistics; 5.1 Two-Sample t-Test for Difference in Means; 5.2 Two-Sample Z-Test for Difference in Proportions; 5.3 Test for Homogeneity of Proportions; 5.4 Chi-Square Test for Goodness of Fit of Multinomial Data; 5.5 Analysis of Variance; 5.6 Regression Analysis; 5.7 Hypothesis Testing in Regression; 5.8 Measuring the Quality of a Regression Model; 5.9 Dangers of Extrapolation; 5.10 Confidence Intervals for the Mean Value of Given 327 $a5.11 Prediction Intervals for a Randomly Chosen Value of Given 
330 $a"This is a new edition of a highly praised, successful reference on data mining, now more important than ever due to the growth of the field and wide range of applications. This edition features new chapters on multivariate statistical analysis, covering analysis of variance and chi-square procedures; cost-benefit analyses; and time-series data analysis. There is also extensive coverage of the R statistical programming language. Graduate and advanced undergraduate students of computer science and statistics, managers/CEOs/CFOs, marketing executives, market researchers and analysts, sales analysts, and medical professionals will want this comprehensive reference"--$cProvided by publisher. 410 0$aWiley series on methods and applications in data mining. 606 $aData mining 615 0$aData mining. 676 $a006.3/12 686 $aCOM021040$aCOM021030$2bisacsh 700 $aLarose$b Daniel T.$0497081 702 $aLarose$b Chantal D. 801 0$bMiAaPQ 801 1$bMiAaPQ 801 2$bMiAaPQ 906 $aBOOK 912 $a9910132206803321 996 $aDiscovering knowledge in data$9754739 997 $aUNINA