Data Mining for Business Analytics : Concepts, Techniques, and Applications with XLMiner
Author: Bruce, Peter C.
Edition: [3rd ed.]
Published/distributed: Hoboken : John Wiley & Sons, Incorporated, 2016
Physical description: 1 online resource (549 pages)
Discipline: 005.54
Other authors (persons): Shmueli, Galit
Patel, Nitin R.
Topical subject: Business--Data processing
Genre/form: Electronic books.
ISBN: 9781118729243
9781118729472
Format: Printed material
Bibliographic level: Monograph
Language of publication: eng
Contents note: Cover -- Title Page -- Copyright -- Dedication -- Contents -- Foreword -- Preface to the Third Edition -- Preface to the First Edition -- Acknowledgments -- Part I: Preliminaries -- Chapter 1: Introduction -- 1.1 What Is Business Analytics? -- 1.2 What Is Data Mining? -- 1.3 Data Mining and Related Terms -- 1.4 Big Data -- 1.5 Data Science -- 1.6 Why Are There So Many Different Methods? -- 1.7 Terminology and Notation -- 1.8 Road Maps to This Book -- Order of Topics -- Chapter 2: Overview of the Data Mining Process -- 2.1 Introduction -- 2.2 Core Ideas in Data Mining -- Classification -- Prediction -- Association Rules and Recommendation Systems -- Predictive Analytics -- Data Reduction and Dimension Reduction -- Data Exploration and Visualization -- Supervised and Unsupervised Learning -- 2.3 The Steps in Data Mining -- 2.4 Preliminary Steps -- Organization of Datasets -- Sampling from a Database -- Oversampling Rare Events in Classification Tasks -- Preprocessing and Cleaning the Data -- 2.5 Predictive Power and Overfitting -- Creation and Use of Data Partitions -- Overfitting -- 2.6 Building a Predictive Model with XLMiner -- Predicting Home Values in the West Roxbury Neighborhood -- Modeling Process -- 2.7 Using Excel for Data Mining -- 2.8 Automating Data Mining Solutions -- Data Mining Software Tools: The State of the Market -- Problems -- Part II: Data Exploration and Dimension Reduction -- Chapter 3: Data Visualization -- 3.1 Uses of Data Visualization -- 3.2 Data Examples -- Example 1: Boston Housing Data -- Example 2: Ridership on Amtrak Trains -- 3.3 Basic Charts: Bar Charts, Line Graphs, and Scatter Plots -- Distribution Plots: Boxplots and Histograms -- Heatmaps: Visualizing Correlations and Missing Values -- 3.4 Multidimensional Visualization -- Adding Variables: Color, Size, Shape, Multiple Panels, and Animation.
Manipulations: Rescaling, Aggregation and Hierarchies, Zooming, Filtering -- Reference: Trend Line and Labels -- Scaling up to Large Datasets -- Multivariate Plot: Parallel Coordinates Plot -- Interactive Visualization -- 3.5 Specialized Visualizations -- Visualizing Networked Data -- Visualizing Hierarchical Data: Treemaps -- Visualizing Geographical Data: Map Charts -- 3.6 Summary: Major Visualizations and Operations, by Data Mining Goal -- Prediction -- Classification -- Time Series Forecasting -- Unsupervised Learning -- Problems -- Chapter 4: Dimension Reduction -- 4.1 Introduction -- 4.2 Curse of Dimensionality -- 4.3 Practical Considerations -- Example 1: House Prices in Boston -- 4.4 Data Summaries -- Summary Statistics -- Pivot Tables -- 4.5 Correlation Analysis -- 4.6 Reducing the Number of Categories in Categorical Variables -- 4.7 Converting a Categorical Variable to a Numerical Variable -- 4.8 Principal Components Analysis -- Example 2: Breakfast Cereals -- Principal Components -- Normalizing the Data -- Using Principal Components for Classification and Prediction -- 4.9 Dimension Reduction Using Regression Models -- 4.10 Dimension Reduction Using Classification and Regression Trees -- Problems -- Part III: Performance Evaluation -- Chapter 5: Evaluating Predictive Performance -- 5.1 Introduction -- 5.2 Evaluating Predictive Performance -- Benchmark: The Average -- Prediction Accuracy Measures -- Comparing Training and Validation Performance -- Lift Chart -- 5.3 Judging Classifier Performance -- Benchmark: The Naive Rule -- Class Separation -- The Classification Matrix -- Using the Validation Data -- Accuracy Measures -- Propensities and Cutoff for Classification -- Performance in Unequal Importance of Classes -- Asymmetric Misclassification Costs -- Generalization to More Than Two Classes -- 5.4 Judging Ranking Performance.
Lift Charts for Binary Data -- Decile Lift Charts -- Beyond Two Classes -- Lift Charts Incorporating Costs and Benefits -- Lift as Function of Cutoff -- 5.5 Oversampling -- Oversampling the Training Set -- Evaluating Model Performance Using a Non-oversampled Validation Set -- Evaluating Model Performance If Only Oversampled Validation Set Exists -- Problems -- Part IV: Prediction and Classification Methods -- Chapter 6: Multiple Linear Regression -- 6.1 Introduction -- 6.2 Explanatory vs. Predictive Modeling -- 6.3 Estimating the Regression Equation and Prediction -- Example: Predicting the Price of Used Toyota Corolla Cars -- 6.4 Variable Selection in Linear Regression -- Reducing the Number of Predictors -- How to Reduce the Number of Predictors -- Problems -- Chapter 7: k-Nearest-Neighbors (k-NN) -- 7.1 The k-NN Classifier (categorical outcome) -- Determining Neighbors -- Classification Rule -- Example: Riding Mowers -- Choosing k -- Setting the Cutoff Value -- k-NN with More Than Two Classes -- Converting Categorical Variables to Binary Dummies -- 7.2 k-NN for a Numerical Response -- 7.3 Advantages and Shortcomings of k-NN Algorithms -- Problems -- Chapter 8: The Naive Bayes Classifier -- 8.1 Introduction -- Cutoff Probability Method -- Conditional Probability -- Example 1: Predicting Fraudulent Financial Reporting -- 8.2 Applying the Full (Exact) Bayesian Classifier -- Using the "Assign to the Most Probable Class" Method -- Using the Cutoff Probability Method -- Practical Difficulty with the Complete (Exact) Bayes Procedure -- Solution: Naive Bayes -- Example 2: Predicting Fraudulent Financial Reports, Two Predictors -- Example 3: Predicting Delayed Flights -- 8.3 Advantages and Shortcomings of the Naive Bayes Classifier -- Problems -- Chapter 9: Classification and Regression Trees -- 9.1 Introduction -- 9.2 Classification Trees.
Recursive Partitioning -- Example 1: Riding Mowers -- Measures of Impurity -- Tree Structure -- Classifying a New Observation -- 9.3 Evaluating the Performance of a Classification Tree -- Example 2: Acceptance of Personal Loan -- 9.4 Avoiding Overfitting -- Stopping Tree Growth: CHAID -- Pruning the Tree -- 9.5 Classification Rules from Trees -- 9.6 Classification Trees for More Than Two Classes -- 9.7 Regression Trees -- Prediction -- Measuring Impurity -- Evaluating Performance -- 9.8 Advantages, Weaknesses, and Extensions -- 9.9 Improving Prediction: Multiple Trees -- Problems -- Chapter 10: Logistic Regression -- 10.1 Introduction -- 10.2 The Logistic Regression Model -- Example: Acceptance of Personal Loan -- Model with a Single Predictor -- Estimating the Logistic Model from Data: Computing Parameter Estimates -- Interpreting Results in Terms of Odds (for a Profiling Goal) -- 10.3 Evaluating Classification Performance -- Variable Selection -- 10.4 Example of Complete Analysis: Predicting Delayed Flights -- Data Preprocessing -- Model Fitting and Estimation -- Model Interpretation -- Model Performance -- Variable Selection -- 10.5 Appendix: Logistic Regression for Profiling -- Appendix A: Why Linear Regression Is Problematic for a Categorical Response -- Appendix B: Evaluating Explanatory Power -- Appendix C: Logistic Regression for More Than Two Classes -- Problems -- Chapter 11: Neural Nets -- 11.1 Introduction -- 11.2 Concept and Structure of a Neural Network -- 11.3 Fitting a Network to Data -- Example 1: Tiny Dataset -- Computing Output of Nodes -- Preprocessing the Data -- Training the Model -- Example 2: Classifying Accident Severity -- Avoiding Overfitting -- Using the Output for Prediction and Classification -- 11.4 Required User Input -- 11.5 Exploring the Relationship Between Predictors and Response.
11.6 Advantages and Weaknesses of Neural Networks -- Unsupervised Feature Extraction and Deep Learning -- Problems -- Chapter 12: Discriminant Analysis -- 12.1 Introduction -- Example 1: Riding Mowers -- Example 2: Personal Loan Acceptance -- 12.2 Distance of an Observation from a Class -- 12.3 Fisher's Linear Classification Functions -- 12.4 Classification Performance of Discriminant Analysis -- 12.5 Prior Probabilities -- 12.6 Unequal Misclassification Costs -- 12.7 Classifying More Than Two Classes -- Example 3: Medical Dispatch to Accident Scenes -- 12.8 Advantages and Weaknesses -- Problems -- Chapter 13: Combining Methods: Ensembles and Uplift Modeling -- 13.1 Ensembles -- Why Ensembles Can Improve Predictive Power -- Simple Averaging -- Bagging -- Boosting -- Advantages and Weaknesses of Ensembles -- 13.2 Uplift (Persuasion) Modeling -- A-B Testing -- Uplift -- Gathering the Data -- A Simple Model -- Modeling Individual Uplift -- Using the Results of an Uplift Model -- 13.3 Summary -- Problems -- Part V: Mining Relationships among Records -- Chapter 14: Association Rules and Collaborative Filtering -- 14.1 Association Rules -- Discovering Association Rules in Transaction Databases -- Example 1: Synthetic Data on Purchases of Phone Faceplates -- Generating Candidate Rules -- The Apriori Algorithm -- Selecting Strong Rules -- Data Format -- The Process of Rule Selection -- Interpreting the Results -- Rules and Chance -- Example 2: Rules for Similar Book Purchases -- 14.2 Collaborative Filtering -- Data Type and Format -- Example 3: Netflix Prize Contest -- User-Based Collaborative Filtering: "People Like You" -- Item-Based Collaborative Filtering -- Advantages and Weaknesses of Collaborative Filtering -- Collaborative Filtering vs. Association Rules -- 14.3 Summary -- Problems -- Chapter 15: Cluster Analysis -- 15.1 Introduction.
Example: Public Utilities.
Record no.: UNINA-9910795808803321
Held at: Univ. Federico II

Statistics for Data Science and Analytics
Author: Bruce, Peter C.
Edition: [1st ed.]
Published/distributed: Newark : John Wiley & Sons, Incorporated, 2024
Physical description: 1 online resource (381 pages)
Other authors (persons): Gedeck, Peter
Dobbins, Janet
ISBN: 1-394-25383-4
1-394-25382-6
1-394-25381-8
Format: Printed material
Bibliographic level: Monograph
Language of publication: eng
Contents note: Cover -- Title Page -- Copyright -- Contents -- About the Authors -- Acknowledgments -- About the Companion Website -- Introduction -- Chapter 1 Statistics and Data Science -- 1.1 Big Data: Predicting Pregnancy -- 1.2 Phantom Protection from Vitamin E -- 1.3 Statistician, Heal Thyself -- 1.4 Identifying Terrorists in Airports -- 1.5 Looking Ahead -- 1.6 Big Data and Statisticians -- 1.6.1 Data Scientists -- Chapter 2 Designing and Carrying Out a Statistical Study -- 2.1 Statistical Science -- 2.2 Big Data -- 2.3 Data Science -- 2.4 Example: Hospital Errors -- 2.5 Experiment -- 2.6 Designing an Experiment -- 2.6.1 A/B Tests -- A Controlled Experiment for the Hospital Plans -- 2.6.2 Randomizing -- 2.6.3 Planning -- 2.6.4 Bias -- 2.6.4.1 Placebo -- 2.6.4.2 Blinding -- 2.6.4.3 Before‐after Pairing -- 2.7 The Data -- 2.7.1 Dataframe Format -- 2.8 Variables and Their Flavors -- 2.8.1 Numeric Variables -- 2.8.2 Categorical Variables -- 2.8.3 Binary Variables -- 2.8.4 Text Data -- 2.8.5 Random Variables -- 2.8.6 Simplified Columnar Format -- 2.9 Python: Data Structures and Operations -- 2.9.1 Primary Data Types -- 2.9.2 Comments -- 2.9.3 Variables -- 2.9.4 Operations on Data -- 2.9.4.1 Converting Data Types -- 2.9.5 Advanced Data Structures -- 2.9.5.1 Classes and Objects -- 2.9.5.2 Data Types and Their Declaration -- 2.10 Are We Sure We Made a Difference? -- 2.11 Is Chance Responsible? The Foundation of Hypothesis Testing -- 2.11.1 Looking at Just One Hospital -- 2.12 Probability -- 2.12.1 Interpreting Our Result -- 2.13 Significance or Alpha Level -- 2.13.1 Increasing the Sample Size -- 2.13.2 Simulating Probabilities with Random Numbers -- 2.14 Other Kinds of Studies -- 2.15 When to Use Hypothesis Tests -- 2.16 Experiments Falling Short of the Gold Standard -- 2.17 Summary -- 2.18 Python: Iterations and Conditional Execution -- 2.18.1 if Statements.
2.18.2 for Statements -- 2.18.3 while Statements -- 2.18.4 break and continue Statements -- 2.18.5 Example: Calculate Mean, Standard Deviation, Subsetting -- 2.18.6 List Comprehensions -- 2.19 Python: Numpy, scipy, and pandas-The Workhorses of Data Science -- 2.19.1 Numpy -- 2.19.2 Scipy -- 2.19.3 Pandas -- 2.19.3.1 Reading and Writing Data -- 2.19.3.2 Accessing Data -- 2.19.3.3 Manipulating Data -- 2.19.3.4 Iterating Over a DataFrame -- 2.19.3.5 And a Lot More -- Exercises -- Chapter 3 Exploring and Displaying the Data -- 3.1 Exploratory Data Analysis -- 3.2 What to Measure-Central Location -- 3.2.1 Mean -- 3.2.2 Median -- 3.2.3 Mode -- 3.2.4 Expected Value -- 3.2.5 Proportions for Binary Data -- 3.2.5.1 Percents -- 3.3 What to Measure-Variability -- 3.3.1 Range -- 3.3.2 Percentiles -- 3.3.3 Interquartile Range -- 3.3.4 Deviations and Residuals -- 3.3.5 Mean Absolute Deviation -- 3.3.6 Variance and Standard Deviation -- 3.3.6.1 Denominator of N or N-1? -- 3.3.7 Population Variance -- 3.3.8 Degrees of Freedom -- 3.4 What to Measure-Distance (Nearness) -- 3.5 Test Statistic -- 3.5.1 Test Statistic for this Study -- 3.6 Examining and Displaying the Data -- 3.6.1 Frequency Tables -- 3.6.2 Histograms -- 3.6.3 Bar Chart -- 3.6.4 Box Plots -- 3.6.5 Tails and Skew -- 3.6.6 Errors and Outliers Are Not the Same Thing! -- 3.7 Python: Exploratory Data Analysis/Data Visualization -- 3.7.1 Matplotlib -- 3.7.2 Data Visualization Using Pandas and Seaborn -- Exercises -- Chapter 4 Accounting for Chance-Statistical Inference -- 4.1 Avoid Being Fooled by Chance -- 4.2 The Null Hypothesis -- 4.3 Repeating the Experiment -- 4.3.1 Shuffling and Picking Numbers from a Hat or Box -- 4.3.2 How Many Reshuffles? -- 4.3.3 The t‐Test -- 4.3.4 Conclusion -- 4.4 Statistical Significance -- 4.4.1 Bottom Line -- 4.4.1.1 Statistical Significance as a Screening Device.
4.4.2 Torturing the Data -- 4.4.3 Practical Significance -- 4.5 Power -- 4.6 The Normal Distribution -- 4.6.1 The Exact Test -- 4.7 Summary -- 4.8 Python: Random Numbers -- 4.8.1 Generating Random Numbers Using the random Package -- 4.8.2 Random Numbers in numpy and scipy -- 4.8.3 Using Random Numbers in Other Packages -- 4.8.4 Example: Implement a Resampling Experiment -- 4.8.5 Write Functions for Code Reuse -- 4.8.6 Organizing Code into Modules -- Exercises -- Chapter 5 Probability -- 5.1 What Is Probability -- 5.2 Simple Probability -- 5.2.1 Venn Diagrams -- 5.3 Probability Distributions -- 5.3.1 Binomial Distribution -- 5.3.1.1 Example -- 5.4 From Binomial to Normal Distribution -- 5.4.1 Standardization (Normalization) -- 5.4.2 Standard Normal Distribution -- 5.4.2.1 z‐Tables -- 5.4.3 The 95 Percent Rule -- 5.5 Appendix: Binomial Formula and Normal Approximation -- 5.5.1 Normal Approximation -- 5.6 Python: Probability -- 5.6.1 Converting Counts to Probabilities -- 5.6.2 Probability Distributions in Python -- 5.6.3 Probability Distributions in random -- 5.6.4 Probability Distributions in the scipy Package -- 5.6.4.1 Continuous Distributions -- 5.6.4.2 Discrete Distributions -- Exercises -- Chapter 6 Categorical Variables -- 6.1 Two‐way Tables -- 6.2 Conditional Probability -- 6.2.1 From Numbers to Percentages to Conditional Probabilities -- 6.3 Bayesian Estimates -- 6.3.1 Let's Review the Different Probabilities -- 6.3.2 Bayesian Calculations -- 6.4 Independence -- 6.4.1 Chi‐square Test -- 6.4.1.1 Sensor Calibration -- 6.4.1.2 Standardizing Departure from Expected -- 6.5 Multiplication Rule -- 6.6 Simpson's Paradox -- 6.7 Python: Counting and Contingency Tables -- 6.7.1 Counting in Python -- 6.7.2 Counting in Pandas -- 6.7.3 Two‐way Tables Using Pandas -- 6.7.4 Chi‐square Test -- Exercises -- Chapter 7 Surveys and Sampling.
7.1 Literary Digest-Sampling Trumps "All Data" -- 7.2 Simple Random Samples -- 7.3 Margin of Error: Sampling Distribution for a Proportion -- 7.3.1 The Confidence Interval -- 7.3.2 A More Manageable Box: Sampling with Replacement -- 7.3.3 Summing Up -- 7.4 Sampling Distribution for a Mean -- 7.4.1 Simulating the Behavior of Samples from a Hypothetical Population -- 7.5 The Bootstrap -- 7.5.1 Resampling Procedure (Bootstrap) -- 7.6 Rationale for the Bootstrap -- 7.6.1 Let's Recap -- 7.6.2 Formula‐based Counterparts to Resampling -- 7.6.2.1 FORMULA: The Z‐interval -- 7.6.2.2 Proportions -- 7.6.3 For a Mean: T‐interval -- 7.6.4 Example-Manual Calculations -- 7.6.5 Example-Software -- 7.6.6 A Bit of History-1906 at Guinness Brewery -- 7.6.7 The Bootstrap Today -- 7.6.8 Central Limit Theorem -- 7.7 Standard Error -- 7.7.1 Standard Error via Formula -- 7.8 Other Sampling Methods -- 7.8.1 Stratified Sampling -- 7.8.2 Cluster Sampling -- 7.8.3 Systematic Sampling -- 7.8.4 Multistage Sampling -- 7.8.5 Convenience Sampling -- 7.8.6 Self‐selection -- 7.8.7 Nonresponse Bias -- 7.9 Absolute vs. Relative Sample Size -- 7.10 Python: Random Sampling Strategies -- 7.10.1 Implement Simple Random Sample (SRS) -- 7.10.2 Determining Confidence Intervals -- 7.10.3 Bootstrap Sampling to Determine Confidence Intervals for a Mean -- 7.10.4 Advanced Sampling Techniques -- 7.10.4.1 Stratified Sampling for Categorical Variables -- 7.10.4.2 Stratified Sampling of Continuous Variables -- Exercises -- Chapter 8 More than Two Samples or Categories -- 8.1 Count Data-R × C Tables -- 8.2 The Role of Experiments (Many Are Costly) -- 8.2.1 Example: Marriage Therapy -- 8.3 Chi‐Square Test -- 8.3.1 Alternate Option -- 8.3.2 Testing for the Role of Chance -- 8.3.3 Standardization to the Chi‐Square Statistic -- 8.3.4 Chi‐Square Example on the Computer -- 8.4 Single Sample-Goodness‐of‐Fit.
8.4.1 Resampling Procedure -- 8.5 Numeric Data: ANOVA -- 8.6 Components of Variance -- 8.6.1 From ANOVA to Regression -- 8.7 Factorial Design -- 8.7.1 Stratification and Blocking -- 8.7.2 Blocking -- 8.8 The Problem of Multiple Inference -- 8.9 Continuous Testing -- 8.9.1 Medicine -- 8.9.2 Business -- 8.10 Bandit Algorithms -- 8.10.1 Web Testing -- 8.11 Appendix: ANOVA, the Factor Diagram, and the F‐Statistic -- 8.11.1 Decomposition: The Factor Diagram -- 8.11.2 Constructing the ANOVA Table -- 8.11.3 Inference Using the ANOVA Table -- 8.11.4 The F‐Distribution -- 8.11.5 Different Sized Groups -- 8.11.5.1 Resampling Method -- 8.11.5.2 Formula Method -- 8.11.6 Caveats and Assumptions -- 8.12 More than One Factor or Variable-From ANOVA to Statistical Models -- 8.13 Python: Contingency Tables and Chi‐square Test -- 8.13.1 Example: Marriage Therapy -- 8.13.2 Example: Imanishi‐Kari Data -- 8.14 Python: ANOVA -- 8.14.1 Visual Comparison of Groups -- 8.14.2 ANOVA Using Resampling Test -- 8.14.3 ANOVA Using the F‐Statistic -- Exercises -- Chapter 9 Correlation -- 9.1 Example: Delta Wire -- 9.2 Example: Cotton Dust and Lung Disease -- 9.3 The Vector Product Sum Test -- 9.3.1 Example: Baseball Payroll -- 9.3.1.1 Resampling Procedure -- 9.4 Correlation Coefficient -- 9.4.1 Inference for the Correlation Coefficient-Resampling -- 9.4.1.1 Hypothesis Test-Resampling -- 9.4.1.2 Example: Baseball Again -- 9.4.1.3 Inference for the Correlation Coefficient: Formulas -- 9.5 Correlation is not Causation -- 9.5.1 A Lurking External Cause -- 9.5.2 Coincidence -- 9.6 Other Forms of Association -- 9.7 Python: Correlation -- 9.7.1 Vector Operations -- 9.7.2 Resampling Test for Vector Product Sums -- 9.7.3 Calculating Correlation Coefficient -- 9.7.4 Calculate Correlation with numpy, pandas -- 9.7.5 Hypothesis Tests for Correlation -- 9.7.6 Using the t Statistic.
9.7.7 Visualizing Correlation.
Record no.: UNINA-9910880800503321
Held at: Univ. Federico II