|
sample from the initial data""; ""Using CHAID stumps when interviewing an SME""; ""Using a single cluster K-means as an alternative to anomaly detection""; ""Using an @NULL multiple Derive to explore missing data""; ""Creating an outlier report to give to SMEs"" |
""Detecting potential model instability early using the Partition node and Feature Selection""""Chapter 2: Data Preparation � Select""; ""Introduction""; ""Using the Feature Selection node creatively to remove, or decapitate, perfect predictors""; ""Running a Statistics node on anti-join to evaluate potential missing data""; ""Evaluating the use of sampling for speed""; ""Removing redundant variables using correlation matrices""; ""Selecting variable using the CHAID modeling node""; ""Selecting variables using the Means node"" |
""Selecting variables using single-antecedent association rules""""Chapter 3: Data Preparation � Clean""; ""Introduction""; ""Binning scale variables to address missing data""; ""Using a full data model/partial data model approach to address missing data""; ""Imputing in-stream mean or median""; ""Imputing missing values randomly from uniform or normal distributions""; ""Using random imputation to match a variable's distribution""; ""Searching for similar records using a neural network for inexact matching""; ""Using neuro-fuzzy searching to find similar names"" |
""Producing longer Soundex codes""""Chapter 4: Data Preparation � Construct""; ""Introduction""; ""Building transformations with multiple Derive nodes""; ""Calculating and comparing conversion rates""; ""Grouping categorical values""; ""Transforming high skew and kurtosis variables with a multiple Derive node""; ""Creating flag variables for aggregation""; ""Using Association Rules for interaction detection/feature creation""; ""Creating time-aligned cohorts""; ""Chapter 5: Data Preparation � Integrate and Format""; ""Introduction"" |
""Speeding up merge with caching and optimization settings""""Merging a look-up table""; ""Shuffle-down (nonstandard aggregation)""; ""Cartesian product merge using key-less merge by key""; ""Multiplying out using Cartesian product merge, user source, and derive dummy""; ""Changing large numbers of variable names without scripting""; ""Parsing nonstandard dates""; ""Parsing and performing a conversion on a complex stream""; ""Sequence processing""; ""Chapter 6: Selecting and Building a Model""; ""Introduction""; ""Evaluating balancing with the Auto Classifier"" |
""Building models with and without outliers"" |