LEADER 05791nam 22006135 450
001 9910552748103321
005 20250324142835.0
010 $a9784431569220
010 $a4431569227
024 7 $a10.1007/978-4-431-56922-0
035 $a(MiAaPQ)EBC6925877
035 $a(Au-PeEL)EBL6925877
035 $a(CKB)21399639400041
035 $a(PPN)26152464X
035 $a(DE-He213)978-4-431-56922-0
035 $a(EXLCZ)9921399639400041
100 $a20220314d2022 u| 0
101 0 $aeng
135 $aurcnu||||||||
181 $ctxt$2rdacontent
182 $cc$2rdamedia
183 $acr$2rdacarrier
200 10$aMinimum Divergence Methods in Statistical Machine Learning $eFrom an Information Geometric Viewpoint /$fby Shinto Eguchi, Osamu Komori
205 $a1st ed. 2022.
210 1$aTokyo :$cSpringer Japan :$cImprint: Springer,$d2022.
215 $a1 online resource (224 pages)
300 $aIncludes index.
311 08$aPrint version: Eguchi, Shinto Minimum Divergence Methods in Statistical Machine Learning Tokyo : Springer Japan,c2022 9784431569206
327 $aInformation geometry -- Information divergence -- Maximum entropy model -- Minimum divergence method -- Unsupervised learning algorithms -- Regression model -- Classification.
330 $aThis book explores minimum divergence methods of statistical machine learning for estimation, regression, prediction, and so forth, engaging information geometry to elucidate the intrinsic properties of the corresponding loss functions, learning algorithms, and statistical models. One of the most elementary examples is Gauss's least squares estimator in a linear regression model, obtained by minimizing the sum of squared differences between the response vector and a vector in the linear subspace spanned by the explanatory vectors. This extends to Fisher's maximum likelihood estimator (MLE) for an exponential model, in which the estimator is obtained by minimizing an empirical analogue of the Kullback-Leibler (KL) divergence between the data distribution and a parametric distribution in the exponential model. Thus, we envisage a geometric interpretation of such minimization procedures, in which a right triangle obeys a Pythagorean identity in the sense of the KL divergence. This understanding reveals a dualistic interplay between statistical estimation and the statistical model, which requires dual geodesic paths, called m-geodesics and e-geodesics, in the framework of information geometry. We extend this dualistic structure of the MLE and the exponential model to that of the minimum divergence estimator and the maximum entropy model, with applications to robust statistics, maximum entropy, density estimation, principal component analysis, independent component analysis, regression analysis, manifold learning, boosting algorithms, clustering, dynamic treatment regimes, and so forth. We consider a variety of information divergence measures, typified by the KL divergence, that express the departure of one probability distribution from another. An information divergence decomposes into a cross-entropy and a (diagonal) entropy: the entropy is associated with a generative model as a family of maximum entropy distributions, while the cross-entropy is associated with a statistical estimation method via minimization of its empirical analogue on given data. Thus any statistical divergence carries an intrinsic pairing of a generative model and an estimation method. Typically, the KL divergence leads to the exponential model and maximum likelihood estimation. It is shown that any information divergence induces a Riemannian metric and a pair of dual linear connections in the framework of information geometry (the decomposition and the Pythagorean identity are restated symbolically below).
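[Editorial aside, not part of the record: the decomposition and the Pythagorean identity described above, restated symbolically; the notation p, q, r for densities is ours. Writing the cross-entropy as C(p, q) = -\int p(x) \log q(x) \, dx,

    D_{\mathrm{KL}}(p, q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx = C(p, q) - C(p, p),

so the (diagonal) entropy is the cross-entropy evaluated on the diagonal, H(p) = C(p, p). If the m-geodesic from p to q and the e-geodesic from q to r meet orthogonally at q, then

    D_{\mathrm{KL}}(p, r) = D_{\mathrm{KL}}(p, q) + D_{\mathrm{KL}}(q, r).]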
We focus on a class of information divergences generated by an increasing and convex function U, called the U-divergence. Any generator function U gives rise to a U-entropy and a U-divergence, between which there is a dualistic structure linking the minimum U-divergence method and the maximum U-entropy model. We observe that a specific choice of U leads to a robust statistical procedure via the minimum U-divergence method. If U is selected as an exponential function, then the corresponding U-entropy and U-divergence reduce to the Boltzmann-Shannon entropy and the KL divergence, and the minimum U-divergence estimator is equivalent to the MLE. For robust supervised learning to predict a class label, we observe that the U-boosting algorithm performs well under contamination by mislabeled examples if U is appropriately selected. We present such maximum U-entropy and minimum U-divergence methods, in particular selecting a power function as U to provide flexible performance in statistical machine learning (a numerical sketch of the power case follows the record).
606 $aStatistics
606 $aComputer science$xMathematics
606 $aMathematical statistics
606 $aStatistics in Engineering, Physics, Computer Science, Chemistry and Earth Sciences
606 $aStatistical Theory and Methods
606 $aProbability and Statistics in Computer Science
615 0$aStatistics.
615 0$aComputer science$xMathematics.
615 0$aMathematical statistics.
615 14$aStatistics in Engineering, Physics, Computer Science, Chemistry and Earth Sciences.
615 24$aStatistical Theory and Methods.
615 24$aProbability and Statistics in Computer Science.
676 $a006.31
700 $aEguchi$b Shinto$0781822
702 $aKomori$b Osamu
801 0$bMiAaPQ
801 1$bMiAaPQ
801 2$bMiAaPQ
906 $aBOOK
912 $a9910552748103321
996 $aMinimum divergence methods in statistical machine learning$92961028
997 $aUNINA
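[Editorial sketch, not part of the record and not the authors' code: a minimal Python illustration of the robustness claimed for a power-function U, using the beta-power (density power) divergence, which arises from a power-type generator in the U-divergence family. The data set, the beta values, and all identifiers below are invented for illustration.

# Minimum beta-power divergence estimation of a Gaussian location.
import numpy as np

rng = np.random.default_rng(0)
# 95 inliers from N(0, 1) contaminated by 5 gross outliers near 10.
x = np.concatenate([rng.normal(0.0, 1.0, 95), rng.normal(10.0, 1.0, 5)])

def power_loss(mu, beta, sigma=1.0):
    # Empirical beta-power loss for the N(mu, sigma^2) model.  For fixed
    # sigma the model term  int f_mu(x)^(1+beta) dx  is constant in mu,
    # so it is dropped: minimizing this maximizes mean(f_mu(x_i)^beta).
    f = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return -np.mean(f ** beta)

grid = np.linspace(-5.0, 15.0, 4001)   # crude but reliable 1-D search
for beta in (1e-6, 0.5, 1.0):          # beta -> 0 approximates the MLE
    mu_hat = grid[int(np.argmin([power_loss(m, beta) for m in grid]))]
    print(f"beta={beta:g}: mu_hat = {mu_hat:+.3f}")
print(f"sample mean (the MLE): {x.mean():+.3f}")

Here beta near 0 reproduces the sample mean, which the outliers drag upward, while beta = 0.5 or 1.0 downweights outliers through the factor f(x_i)^beta, reflecting the robustness the abstract attributes to a power choice of U.]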