LEADER 04232nam 22005175 450
001 9910299296903321
005 20200710083146.0
010 $a981-13-1199-4
024 7 $a10.1007/978-981-13-1199-4
035 $a(CKB)4100000006674885
035 $a(DE-He213)978-981-13-1199-4
035 $a(MiAaPQ)EBC6311643
035 $a(PPN)230535631
035 $a(EXLCZ)994100000006674885
100 $a20180922d2018 u| 0
101 0 $aeng
135 $aurnn|008mamaa
181 $ctxt$2rdacontent
182 $cc$2rdamedia
183 $acr$2rdacarrier
200 10$aLaboratory Experiments in Information Retrieval $eSample Sizes, Effect Sizes, and Statistical Power /$fby Tetsuya Sakai
205 $a1st ed. 2018.
210 1$aSingapore :$cSpringer Singapore :$cImprint: Springer,$d2018.
215 $a1 online resource (IX, 150 p. 53 illus., 43 illus. in color.)
225 1 $aThe Information Retrieval Series,$x1871-7500 ;$v40
311 $a981-13-1198-6
327 $a1 Preliminaries -- 2 t-tests -- 3 Analysis of Variance -- 4 Multiple Comparison Procedures -- 5 The Correct Ways to Use Significance Tests -- 6 Topic Set Size Design Using Excel -- 7 Power Analysis Using R -- 8 Conclusions.
330 $aCovering aspects from principles and limitations of statistical significance tests to topic set size design and power analysis, this book guides readers to statistically well-designed experiments. Although classical statistical significance tests are to some extent useful in information retrieval (IR) evaluation, they can harm research unless they are used appropriately with the right sample sizes and statistical power and unless the test results are reported properly. The first half of the book is mainly targeted at undergraduate students, and the second half is suitable for graduate students and researchers who regularly conduct laboratory experiments in IR, natural language processing, recommendations, and related fields. Chapters 1–5 review parametric significance tests for comparing system means, namely, t-tests and ANOVAs, and show how easily they can be conducted using Microsoft Excel or R. These chapters also discuss a few multiple comparison procedures for researchers who are interested in comparing every system pair, including a randomised version of Tukey's Honestly Significant Difference test. The chapters then deal with known limitations of classical significance testing and provide practical guidelines for reporting research results regarding comparison of means. Chapters 6 and 7 discuss statistical power. Chapter 6 introduces topic set size design to enable test collection builders to determine an appropriate number of topics to create. Readers can easily use the author's Excel tools for topic set size design based on the paired and two-sample t-tests, one-way ANOVA, and confidence intervals. Chapter 7 describes power-analysis-based methods for determining an appropriate sample size for a new experiment based on a similar experiment done in the past, detailing how to utilize the author's R tools for power analysis and how to interpret the results. Case studies from IR for both Excel-based topic set size design and R-based power analysis are also provided.
410 0$aThe Information Retrieval Series,$x1871-7500 ;$v40
606 $aInformation storage and retrieval
606 $aStatistics
606 $aInformation Storage and Retrieval$3https://scigraph.springernature.com/ontologies/product-market-codes/I18032
606 $aStatistics for Engineering, Physics, Computer Science, Chemistry and Earth Sciences$3https://scigraph.springernature.com/ontologies/product-market-codes/S17020
615 0$aInformation storage and retrieval.
615 0$aStatistics.
615 14$aInformation Storage and Retrieval.
615 24$aStatistics for Engineering, Physics, Computer Science, Chemistry and Earth Sciences.
676 $a025.04
700 $aSakai$b Tetsuya$4aut$4http://id.loc.gov/vocabulary/relators/aut$0859931
801 0$bMiAaPQ
801 1$bMiAaPQ
801 2$bMiAaPQ
906 $aBOOK
912 $a9910299296903321
996 $aLaboratory Experiments in Information Retrieval$91918855
997 $aUNINA