
Record no.

UNINA9910522999103321

Author

Montesinos López, Osval Antonio

Title

Multivariate Statistical Machine Learning Methods for Genomic Prediction

Publication/distribution

Cham : Springer Nature, 2022

Cham : Springer International Publishing AG, 2022

©2022

ISBN

3-030-89010-4

Edition

[1st ed.]

Physical description

1 online resource (707 pages)

Classification

MED090000 SCI011000 SCI070000 SCI086000 TEC003000

Other authors (persons)

Montesinos López, Abelardo

Crossa, José

Subjects

Agricultural science

Life sciences: general issues

Botany & plant sciences

Animal reproduction

Probability & statistics

Machine learning

Plant genetics

Mathematical statistics

Multivariate analysis

Data processing

Electronic books

Language of publication

English

Format

Printed material

Bibliographic level

Monograph

Contents note

Intro -- Foreword -- Preface -- Acknowledgments -- Contents -- Chapter 1: General Elements of Genomic Selection and Statistical Learning -- 1.1 Data as a Powerful Weapon -- 1.2 Genomic Selection -- 1.2.1 Concepts of Genomic Selection -- 1.2.2 Why Is Statistical Machine Learning a Key Element of Genomic Selection? -- 1.3 Modeling Basics -- 1.3.1 What Is a Statistical Machine Learning Model? -- 1.3.2 The Two Cultures of Model Building: Prediction Versus Inference -- 1.3.3 Types of Statistical Machine Learning Models and Model Effects -- 1.3.3.1 Types of Statistical Machine Learning Models -- 1.3.3.2 Model Effects -- 1.4 Matrix Algebra Review -- 1.5 Statistical Data Types -- 1.5.1 Data Types -- 1.5.2 Multivariate Data Types -- 1.6 Types of Learning -- 1.6.1 Definition and Examples of Supervised Learning -- 1.6.2 Definitions and Examples of Unsupervised Learning -- 1.6.3 Definition and Examples of Semi-Supervised Learning -- References -- Chapter 2: Preprocessing Tools for Data Preparation -- 2.1 Fixed or Random Effects -- 2.2 BLUEs and BLUPs -- 2.3 Marker Depuration -- 2.4 Methods to Compute the Genomic Relationship Matrix -- 2.5 Genomic Breeding Values and Their Estimation -- 2.6 Normalization Methods -- 2.7 General Suggestions for Removing or Adding Inputs -- 2.8 Principal Component Analysis as a Compression Method -- Appendix 1 -- Appendix 2 -- References -- Chapter 3: Elements for Building Supervised Statistical Machine Learning Models -- 3.1 Definition of a Linear Multiple Regression Model -- 3.2 Fitting a Linear Multiple Regression Model via the Ordinary Least Square (OLS) Method -- 3.3 Fitting the Linear Multiple Regression Model via the Maximum Likelihood (ML) Method -- 3.4 Fitting the Linear Multiple Regression Model via the Gradient Descent (GD) Method -- 3.5 Advantages and Disadvantages of Standard Linear Regression Models (OLS and MLR).

3.6 Regularized Linear Multiple Regression Model -- 3.6.1 Ridge Regression -- 3.6.2 Lasso Regression -- 3.7 Logistic Regression -- 3.7.1 Logistic Ridge Regression -- 3.7.2 Lasso Logistic Regression -- Appendix 1: R Code for Ridge Regression Used in Example 2 -- References -- Chapter 4: Overfitting, Model Tuning, and Evaluation of Prediction Performance -- 4.1 The Problem of Overfitting and Underfitting -- 4.2 The Trade-Off Between Prediction Accuracy and Model Interpretability -- 4.3 Cross-validation -- 4.3.1 The Single Hold-Out Set Approach -- 4.3.2 The k-Fold Cross-validation -- 4.3.3 The Leave-One-Out Cross-validation -- 4.3.4 The Leave-m-Out Cross-validation -- 4.3.5 Random Cross-validation -- 4.3.6 The Leave-One-Group-Out Cross-validation -- 4.3.7 Bootstrap Cross-validation -- 4.3.8 Incomplete Block Cross-validation -- 4.3.9 Random Cross-validation with Blocks -- 4.3.10 Other Options and General Comments on Cross-validation -- 4.4 Model Tuning -- 4.4.1 Why Is Model Tuning Important? -- 4.4.2 Methods for Hyperparameter Tuning (Grid Search, Random Search, etc.) -- 4.5 Metrics for the Evaluation of Prediction Performance -- 4.5.1 Quantitative Measures of Prediction Performance -- 4.5.2 Binary and Ordinal Measures of Prediction Performance -- 4.5.3 Count Measures of Prediction Performance -- References -- Chapter 5: Linear Mixed Models -- 5.1 General of Linear Mixed Models -- 5.2 Estimation of the Linear Mixed Model -- 5.2.1 Maximum Likelihood Estimation -- 5.2.1.1 EM Algorithm -- E Step -- M Step -- 5.2.1.2 REML -- 5.2.1.3 BLUPs -- 5.3 Linear Mixed Models in Genomic Prediction -- 5.4 Illustrative Examples of the Univariate LMM -- 5.5 Multi-trait Genomic Linear Mixed-Effects Models -- 5.6 Final Comments -- Appendix 1 -- Appendix 2 -- Appendix 3 -- Appendix 4 -- Appendix 5 -- Appendix 6 -- Appendix 7 -- References.

Chapter 6: Bayesian Genomic Linear Regression -- 6.1 Bayes Theorem and Bayesian Linear Regression -- 6.2 Bayesian Genome-Based Ridge Regression -- 6.3 Bayesian GBLUP Genomic Model -- 6.4 Genomic-Enabled Prediction BayesA Model -- 6.5 Genomic-Enabled Prediction BayesB and BayesC Models -- 6.6 Genomic-Enabled Prediction Bayesian Lasso Model -- 6.7 Extended Predictor in Bayesian Genomic Regression Models -- 6.8 Bayesian Genomic Multi-trait Linear Regression Model -- 6.8.1 Genomic Multi-trait Linear Model -- 6.9 Bayesian Genomic Multi-trait and Multi-environment Model (BMTME) -- Appendix 1 -- Appendix 2: Setting Hyperparameters for the Prior Distributions of the BRR Model -- Appendix 3: R Code Example 1 -- Appendix 4: R Code Example 2 -- Appendix 5 -- R Code Example 3 -- R Code for Example 4 -- References -- Chapter 7: Bayesian and Classical Prediction Models for Categorical and Count Data -- 7.1 Introduction -- 7.2 Bayesian Ordinal Regression Model -- 7.2.1 Illustrative Examples -- 7.3 Ordinal Logistic Regression -- 7.4 Penalized Multinomial Logistic Regression -- 7.4.1 Illustrative Examples for Multinomial Penalized Logistic Regression -- 7.5 Penalized Poisson Regression -- 7.6 Final Comments -- Appendix 1 -- Appendix 2 -- Appendix 3 -- Appendix 4 (Example 4) -- Appendix 5 -- Appendix 6 -- References -- Chapter 8: Reproducing Kernel Hilbert Spaces Regression and Classification Methods -- 8.1 The Reproducing Kernel Hilbert Spaces (RKHS) -- 8.2 Generalized Kernel Model -- 8.2.1 Parameter Estimation Under the Frequentist Paradigm -- 8.2.2 Kernels -- 8.2.3 Kernel Trick -- 8.2.4 Popular Kernel Functions -- 8.2.5 A Two Separate Step Process for Building Kernel Machines -- 8.3 Kernel Methods for Gaussian Response Variables -- 8.4 Kernel Methods for Binary Response Variables -- 8.5 Kernel Methods for Categorical Response Variables.

8.6 The Linear Mixed Model with Kernels -- 8.7 Hyperparameter Tuning for Building the Kernels -- 8.8 Bayesian Kernel Methods -- 8.8.1 Extended Predictor Under the Bayesian Kernel BLUP -- 8.8.2 Extended Predictor Under the Bayesian Kernel BLUP with a Binary Response Variable -- 8.8.3 Extended Predictor Under the Bayesian Kernel BLUP with a Categorical Response Variable -- 8.9 Multi-trait Bayesian Kernel -- 8.10 Kernel Compression Methods -- 8.10.1 Extended Predictor Under the Approximate Kernel Method -- 8.11 Final Comments -- Appendix 1 -- Appendix 2 -- Appendix 3 -- Appendix 4 -- Appendix 5 -- Appendix 6 -- Appendix 7 -- Appendix 8 -- Appendix 9 -- Appendix 10 -- Appendix 11 -- References -- Chapter 9: Support Vector Machines and Support Vector Regression -- 9.1 Introduction to Support Vector Machine -- 9.2 Hyperplane -- 9.3 Maximum Margin Classifier -- 9.3.1 Derivation of the Maximum Margin Classifier -- 9.3.2 Wolfe Dual -- 9.4 Derivation of the Support Vector Classifier -- 9.5 Support Vector Machine -- 9.5.1 One-Versus-One Classification -- 9.5.2 One-Versus-All Classification -- 9.6 Support Vector Regression -- Appendix 1 -- Appendix 2 -- Appendix 3 -- References -- Chapter 10: Fundamentals of Artificial Neural Networks and Deep Learning -- 10.1 The Inspiration for the Neural Network Model -- 10.2 The Building Blocks of Artificial Neural Networks -- 10.3 Activation Functions -- 10.3.1 Linear -- 10.3.2 Rectifier Linear Unit (ReLU) -- 10.3.3 Leaky ReLU -- 10.3.4 Sigmoid -- 10.3.5 Softmax -- 10.3.6 Tanh -- 10.4 The Universal Approximation Theorem -- 10.5 Artificial Neural Network Topologies -- 10.6 Successful Applications of ANN and DL -- 10.7 Loss Functions -- 10.7.1 Loss Functions for Continuous Outcomes -- 10.7.2 Loss Functions for Binary and Ordinal Outcomes -- 10.7.3 Regularized Loss Functions -- 10.7.4 Early Stopping Method of Training.

10.8 The King Algorithm for Training Artificial Neural Networks: Backpropagation -- 10.8.1 Backpropagation Algorithm: Online Version -- 10.8.1.1 Feedforward Part -- 10.8.1.2 Backpropagation Part -- 10.8.2 Illustrative Example 10.1: A Hand Computation -- 10.8.3 Illustrative Example 10.2: By Hand Computation -- References -- Chapter 11: Artificial Neural Networks and Deep Learning for Genomic Prediction of Continuous Outcomes -- 11.1 Hyperparameters to Be Tuned in ANN and DL -- 11.1.1 Network Topology -- 11.1.2 Activation Functions -- 11.1.3 Loss Function -- 11.1.4 Number of Hidden Layers -- 11.1.5 Number of Neurons in Each Layer -- 11.1.6 Regularization Type -- 11.1.7 Learning Rate -- 11.1.8 Number of Epochs and Number of Batches -- 11.1.9 Normalization Scheme for Input Data -- 11.2 Popular DL Frameworks -- 11.3 Optimizers -- 11.4 Illustrative Examples -- Appendix 1 -- Appendix 2 -- Appendix 3 -- Appendix 4 -- Appendix 5 -- References -- Chapter 12: Artificial Neural Networks and Deep Learning for Genomic Prediction of Binary, Ordinal, and Mixed Outcomes -- 12.1 Training DNN with Binary Outcomes -- 12.2 Training DNN with Categorical (Ordinal) Outcomes -- 12.3 Training DNN with Count Outcomes -- 12.4 Training DNN with Multivariate Outcomes -- 12.4.1 DNN with Multivariate Continuous Outcomes -- 12.4.2 DNN with Multivariate Binary Outcomes -- 12.4.3 DNN with Multivariate Ordinal Outcomes -- 12.4.4 DNN with Multivariate Count Outcomes -- 12.4.5 DNN with Multivariate Mixed Outcomes -- Appendix 1 -- Appendix 2 -- Appendix 3 -- Appendix 4 -- Appendix 5 -- References -- Chapter 13: Convolutional Neural Networks -- 13.1 The Importance of Convolutional Neural Networks -- 13.2 Tensors -- 13.3 Convolution -- 13.4 Pooling -- 13.5 Convolutional Operation for 1D Tensor for Sequence Data -- 13.6 Motivation of CNN.

13.7 Why Are CNNs Preferred over Feedforward Deep Neural Networks for Processing Images?

Summary/abstract

This book is open access under a CC BY 4.0 license. It brings together the latest genome-based prediction models currently being used by statisticians, breeders, and data scientists. It provides an accessible way to understand the theory behind each statistical learning tool, the required pre-processing, the basics of model building, how to train statistical learning methods, the basic R scripts needed to implement each statistical learning tool, and the output of each tool. To do so, for each tool the book provides the background theory, some elements of the R statistical software for its implementation, the conceptual underpinnings, and at least two illustrative examples with data from real-world genomic selection experiments. Lastly, worked-out examples help readers check their own comprehension. The book will greatly appeal to readers in plant (and animal) breeding, as well as to geneticists and statisticians, as it provides, in a very accessible way, the necessary theory, the appropriate R code, and illustrative examples for a complete understanding of each statistical learning tool. In addition, it weighs the advantages and disadvantages of each tool.