Discovering knowledge in data : an introduction to data mining / / Daniel T. Larose, Chantal D. Larose |
Autore | Larose Daniel T. |
Edizione | [2nd ed.] |
Pubbl/distr/stampa | Hoboken, New Jersey : , : IEEE, , 2014 |
Descrizione fisica | 1 online resource (336 p.) |
Disciplina | 006.3/12 |
Collana | Wiley Series on Methods and Applications in Data Mining |
Soggetto topico | Data mining |
ISBN |
1-118-87357-2
1-118-87405-6 1-118-87358-0 |
Classificazione | COM021040COM021030 |
Formato | Materiale a stampa ![]() |
Livello bibliografico | Monografia |
Lingua di pubblicazione | eng |
Nota di contenuto |
DISCOVERING KNOWLEDGE IN DATA; Contents; Preface; 1 An Introduction to Data Mining; 1.1 What is Data Mining?; 1.2 Wanted: Data Miners; 1.3 The Need for Human Direction of Data Mining; 1.4 The Cross-Industry Standard Practice for Data Mining; 1.4.1 Crisp-DM: The Six Phases; 1.5 Fallacies of Data Mining; 1.6 What Tasks Can Data Mining Accomplish?; 1.6.1 Description; 1.6.2 Estimation; 1.6.3 Prediction; 1.6.4 Classification; 1.6.5 Clustering; 1.6.6 Association; References; Exercises; 2 Data Preprocessing; 2.1 Why do We Need to Preprocess the Data?; 2.2 Data Cleaning; 2.3 Handling Missing Data
2.4 Identifying Misclassifications2.5 Graphical Methods for Identifying Outliers; 2.6 Measures of Center and Spread; 2.7 Data Transformation; 2.8 Min-Max Normalization; 2.9 Z-Score Standardization; 2.10 Decimal Scaling; 2.11 Transformations to Achieve Normality; 2.12 Numerical Methods for Identifying Outliers; 2.13 Flag Variables; 2.14 Transforming Categorical Variables into Numerical Variables; 2.15 Binning Numerical Variables; 2.16 Reclassifying Categorical Variables; 2.17 Adding an Index Field; 2.18 Removing Variables that are Not Useful; 2.19 Variables that Should Probably Not Be Removed 2.20 Removal of Duplicate Records2.21 A Word About Id Fields; THE R ZONE; References; Exercises; Hands-On Analysis; 3 Exploratory Data Analysis; 3.1 Hypothesis Testing Versus Exploratory Data Analysis; 3.2 Getting to Know the Data Set; 3.3 Exploring Categorical Variables; 3.4 Exploring Numeric Variables; 3.5 Exploring Multivariate Relationships; 3.6 Selecting Interesting Subsets of the Data for Further Investigation; 3.7 Using EDA to Uncover Anomalous Fields; 3.8 Binning Based on Predictive Value; 3.9 Deriving New Variables: Flag Variables; 3.10 Deriving New Variables: Numerical Variables 3.11 Using EDA to Investigate Correlated Predictor Variables3.12 Summary; THE R ZONE; Reference; Exercises; Hands-On Analysis; 4 Univariate Statistical Analysis; 4.1 Data Mining Tasks in Discovering Knowledge in Data; 4.2 Statistical Approaches to Estimation and Prediction; 4.3 Statistical Inference; 4.4 How Confident are We in Our Estimates?; 4.5 Confidence Interval Estimation of the Mean; 4.6 How to Reduce the Margin of Error; 4.7 Confidence Interval Estimation of the Proportion; 4.8 Hypothesis Testing for the Mean; 4.9 Assessing the Strength of Evidence Against the Null Hypothesis 4.10 Using Confidence Intervals to Perform Hypothesis Tests4.11 Hypothesis Testing for the Proportion; THE R ZONE; Reference; Exercises; 5 Multivariate Statistics; 5.1 Two-Sample t-Test for Difference in Means; 5.2 Two-Sample Z-Test for Difference in Proportions; 5.3 Test for Homogeneity of Proportions; 5.4 Chi-Square Test for Goodness of Fit of Multinomial Data; 5.5 Analysis of Variance; 5.6 Regression Analysis; 5.7 Hypothesis Testing in Regression; 5.8 Measuring the Quality of a Regression Model; 5.9 Dangers of Extrapolation; 5.10 Confidence Intervals for the Mean Value of Given 5.11 Prediction Intervals for a Randomly Chosen Value of Given |
Record Nr. | UNISA-996198490203316 |
Larose Daniel T.
![]() |
||
Hoboken, New Jersey : , : IEEE, , 2014 | ||
![]() | ||
Lo trovi qui: Univ. di Salerno | ||
|
Discovering knowledge in data : an introduction to data mining / / Daniel T. Larose, Chantal D. Larose |
Autore | Larose Daniel T. |
Edizione | [2nd ed.] |
Pubbl/distr/stampa | Hoboken, New Jersey : , : IEEE, , 2014 |
Descrizione fisica | 1 online resource (336 p.) |
Disciplina | 006.3/12 |
Collana | Wiley Series on Methods and Applications in Data Mining |
Soggetto topico | Data mining |
ISBN |
1-118-87357-2
1-118-87405-6 1-118-87358-0 |
Classificazione | COM021040COM021030 |
Formato | Materiale a stampa ![]() |
Livello bibliografico | Monografia |
Lingua di pubblicazione | eng |
Nota di contenuto |
DISCOVERING KNOWLEDGE IN DATA; Contents; Preface; 1 An Introduction to Data Mining; 1.1 What is Data Mining?; 1.2 Wanted: Data Miners; 1.3 The Need for Human Direction of Data Mining; 1.4 The Cross-Industry Standard Practice for Data Mining; 1.4.1 Crisp-DM: The Six Phases; 1.5 Fallacies of Data Mining; 1.6 What Tasks Can Data Mining Accomplish?; 1.6.1 Description; 1.6.2 Estimation; 1.6.3 Prediction; 1.6.4 Classification; 1.6.5 Clustering; 1.6.6 Association; References; Exercises; 2 Data Preprocessing; 2.1 Why do We Need to Preprocess the Data?; 2.2 Data Cleaning; 2.3 Handling Missing Data
2.4 Identifying Misclassifications2.5 Graphical Methods for Identifying Outliers; 2.6 Measures of Center and Spread; 2.7 Data Transformation; 2.8 Min-Max Normalization; 2.9 Z-Score Standardization; 2.10 Decimal Scaling; 2.11 Transformations to Achieve Normality; 2.12 Numerical Methods for Identifying Outliers; 2.13 Flag Variables; 2.14 Transforming Categorical Variables into Numerical Variables; 2.15 Binning Numerical Variables; 2.16 Reclassifying Categorical Variables; 2.17 Adding an Index Field; 2.18 Removing Variables that are Not Useful; 2.19 Variables that Should Probably Not Be Removed 2.20 Removal of Duplicate Records2.21 A Word About Id Fields; THE R ZONE; References; Exercises; Hands-On Analysis; 3 Exploratory Data Analysis; 3.1 Hypothesis Testing Versus Exploratory Data Analysis; 3.2 Getting to Know the Data Set; 3.3 Exploring Categorical Variables; 3.4 Exploring Numeric Variables; 3.5 Exploring Multivariate Relationships; 3.6 Selecting Interesting Subsets of the Data for Further Investigation; 3.7 Using EDA to Uncover Anomalous Fields; 3.8 Binning Based on Predictive Value; 3.9 Deriving New Variables: Flag Variables; 3.10 Deriving New Variables: Numerical Variables 3.11 Using EDA to Investigate Correlated Predictor Variables3.12 Summary; THE R ZONE; Reference; Exercises; Hands-On Analysis; 4 Univariate Statistical Analysis; 4.1 Data Mining Tasks in Discovering Knowledge in Data; 4.2 Statistical Approaches to Estimation and Prediction; 4.3 Statistical Inference; 4.4 How Confident are We in Our Estimates?; 4.5 Confidence Interval Estimation of the Mean; 4.6 How to Reduce the Margin of Error; 4.7 Confidence Interval Estimation of the Proportion; 4.8 Hypothesis Testing for the Mean; 4.9 Assessing the Strength of Evidence Against the Null Hypothesis 4.10 Using Confidence Intervals to Perform Hypothesis Tests4.11 Hypothesis Testing for the Proportion; THE R ZONE; Reference; Exercises; 5 Multivariate Statistics; 5.1 Two-Sample t-Test for Difference in Means; 5.2 Two-Sample Z-Test for Difference in Proportions; 5.3 Test for Homogeneity of Proportions; 5.4 Chi-Square Test for Goodness of Fit of Multinomial Data; 5.5 Analysis of Variance; 5.6 Regression Analysis; 5.7 Hypothesis Testing in Regression; 5.8 Measuring the Quality of a Regression Model; 5.9 Dangers of Extrapolation; 5.10 Confidence Intervals for the Mean Value of Given 5.11 Prediction Intervals for a Randomly Chosen Value of Given |
Record Nr. | UNINA-9910132206803321 |
Larose Daniel T.
![]() |
||
Hoboken, New Jersey : , : IEEE, , 2014 | ||
![]() | ||
Lo trovi qui: Univ. Federico II | ||
|
Discovering knowledge in data : an introduction to data mining / / Daniel T. Larose, Chantal D. Larose |
Autore | Larose Daniel T. |
Edizione | [2nd ed.] |
Pubbl/distr/stampa | Hoboken, New Jersey : , : IEEE, , 2014 |
Descrizione fisica | 1 online resource (336 p.) |
Disciplina | 006.3/12 |
Collana | Wiley Series on Methods and Applications in Data Mining |
Soggetto topico | Data mining |
ISBN |
1-118-87357-2
1-118-87405-6 1-118-87358-0 |
Classificazione | COM021040COM021030 |
Formato | Materiale a stampa ![]() |
Livello bibliografico | Monografia |
Lingua di pubblicazione | eng |
Nota di contenuto |
DISCOVERING KNOWLEDGE IN DATA; Contents; Preface; 1 An Introduction to Data Mining; 1.1 What is Data Mining?; 1.2 Wanted: Data Miners; 1.3 The Need for Human Direction of Data Mining; 1.4 The Cross-Industry Standard Practice for Data Mining; 1.4.1 Crisp-DM: The Six Phases; 1.5 Fallacies of Data Mining; 1.6 What Tasks Can Data Mining Accomplish?; 1.6.1 Description; 1.6.2 Estimation; 1.6.3 Prediction; 1.6.4 Classification; 1.6.5 Clustering; 1.6.6 Association; References; Exercises; 2 Data Preprocessing; 2.1 Why do We Need to Preprocess the Data?; 2.2 Data Cleaning; 2.3 Handling Missing Data
2.4 Identifying Misclassifications2.5 Graphical Methods for Identifying Outliers; 2.6 Measures of Center and Spread; 2.7 Data Transformation; 2.8 Min-Max Normalization; 2.9 Z-Score Standardization; 2.10 Decimal Scaling; 2.11 Transformations to Achieve Normality; 2.12 Numerical Methods for Identifying Outliers; 2.13 Flag Variables; 2.14 Transforming Categorical Variables into Numerical Variables; 2.15 Binning Numerical Variables; 2.16 Reclassifying Categorical Variables; 2.17 Adding an Index Field; 2.18 Removing Variables that are Not Useful; 2.19 Variables that Should Probably Not Be Removed 2.20 Removal of Duplicate Records2.21 A Word About Id Fields; THE R ZONE; References; Exercises; Hands-On Analysis; 3 Exploratory Data Analysis; 3.1 Hypothesis Testing Versus Exploratory Data Analysis; 3.2 Getting to Know the Data Set; 3.3 Exploring Categorical Variables; 3.4 Exploring Numeric Variables; 3.5 Exploring Multivariate Relationships; 3.6 Selecting Interesting Subsets of the Data for Further Investigation; 3.7 Using EDA to Uncover Anomalous Fields; 3.8 Binning Based on Predictive Value; 3.9 Deriving New Variables: Flag Variables; 3.10 Deriving New Variables: Numerical Variables 3.11 Using EDA to Investigate Correlated Predictor Variables3.12 Summary; THE R ZONE; Reference; Exercises; Hands-On Analysis; 4 Univariate Statistical Analysis; 4.1 Data Mining Tasks in Discovering Knowledge in Data; 4.2 Statistical Approaches to Estimation and Prediction; 4.3 Statistical Inference; 4.4 How Confident are We in Our Estimates?; 4.5 Confidence Interval Estimation of the Mean; 4.6 How to Reduce the Margin of Error; 4.7 Confidence Interval Estimation of the Proportion; 4.8 Hypothesis Testing for the Mean; 4.9 Assessing the Strength of Evidence Against the Null Hypothesis 4.10 Using Confidence Intervals to Perform Hypothesis Tests4.11 Hypothesis Testing for the Proportion; THE R ZONE; Reference; Exercises; 5 Multivariate Statistics; 5.1 Two-Sample t-Test for Difference in Means; 5.2 Two-Sample Z-Test for Difference in Proportions; 5.3 Test for Homogeneity of Proportions; 5.4 Chi-Square Test for Goodness of Fit of Multinomial Data; 5.5 Analysis of Variance; 5.6 Regression Analysis; 5.7 Hypothesis Testing in Regression; 5.8 Measuring the Quality of a Regression Model; 5.9 Dangers of Extrapolation; 5.10 Confidence Intervals for the Mean Value of Given 5.11 Prediction Intervals for a Randomly Chosen Value of Given |
Record Nr. | UNINA-9910817415203321 |
Larose Daniel T.
![]() |
||
Hoboken, New Jersey : , : IEEE, , 2014 | ||
![]() | ||
Lo trovi qui: Univ. Federico II | ||
|