LEADER 05530nam 2200673 450 001 9910790909603321 005 20230803221341.0 010 $a90-272-7013-9 035 $a(CKB)2550000001280557 035 $a(EBL)1682185 035 $a(SSID)ssj0001194238 035 $a(PQKBManifestationID)11627557 035 $a(PQKBTitleCode)TC0001194238 035 $a(PQKBWorkID)11150621 035 $a(PQKB)11629857 035 $a(MiAaPQ)EBC1682185 035 $a(Au-PeEL)EBL1682185 035 $a(CaPaEBR)ebr10866742 035 $a(CaONFJC)MIL601841 035 $a(OCoLC)878920007 035 $a(EXLCZ)992550000001280557 100 $a20140516h20142014 uy 0 101 0 $aeng 135 $aur|n|---||||| 181 $ctxt 182 $cc 183 $acr 200 10$aLiterary detective work on the computer /$fMichael P. Oakes 210 1$aAmsterdam (Netherlands) ;$aPhiladelphia, Pennsylvania :$cJohn Benjamins Publishing Company,$d2014. 210 4$d©2014 215 $a1 online resource (293 p.) 225 1 $aNatural Language Processing ;$vVolume 12 300 $aDescription based upon print version of record. 311 $a90-272-4999-7 311 $a1-306-70590-8 320 $aIncludes bibliographical references and index. 327 $aLiterary Detective Work on the Computer; Editorial page; Title page; LCC data; Table of contents; Preface; 1. Author identification; 1. Introduction; 2. Feature selection; 2.1 Evaluation of feature sets for authorship attribution; 3. Inter-textual distances; 3.1 Manhattan distance and Euclidean distance; 3.2 Labbé and Labbé's measure; 3.3 Chi-squared distance; 3.4 The cosine similarity measure; 3.6 Burrows' Delta; 3.5 Kullback-Leibler Divergence (KLD); 3.7 Evaluation of feature-based measures for inter-textual distance; 3.8 Inter-textual distance by semantic similarity 327 $a3.9 Stemmatology as a measure of inter-textual distance; 4. Clustering techniques; 4.1 Introduction to factor analysis; 4.2 Matrix algebra; 4.3 Use of matrix algebra for PCA; 4.4 PCA case studies; 4.5 Correspondence analysis; 5. Comparisons of classifiers; 6. Other tasks related to authorship; 6.1 Stylochronometry; 6.2 Affect dictionaries and psychological profiling; 6.3 Evaluation of author profiling; 7. Conclusion; 2. Plagiarism and spam filtering; 1. 
Introduction; 2. Plagiarism detection software; 2.1 Collusion and plagiarism, external and intrinsic 327 $a2.2 Preprocessing of corpora and feature extraction; 2.3 Sequence comparison and exact match; 2.4 Source-suspicious document similarity measures; 2.5 Fingerprinting; 2.6 Language models; 2.7 Natural Language Processing; 2.8 Intrinsic plagiarism detection; 2.9 Plagiarism of program code; 2.10 Distance between translated and original text; 2.11 Direction of plagiarism; 2.12 The search engine-based approach used at PAN-13; 2.13 Case study 1: Hidden influences from printed sources in the Gaelic tales; 2.14 Case study 2: General George Pickett and related writings; 2.15 Evaluation methods 327 $a2.16 Conclusion; 3. Spam filters; 3.1 Content-based techniques; 3.2 Building a labelled corpus for training; 3.3 Exact matching techniques; 3.4 Rule-based methods; 3.5 Machine learning; 3.5.1 Naïve Bayes; 3.5.2 Logistic regression; 3.5.3 Boosting; 3.6 Unsupervised machine learning approaches; 3.7 Other spam-filtering problems; 3.8 Evaluation of spam filters; 3.9 Non-linguistic techniques; 3.9.1 Safelists; 3.9.2 Human challenges; 3.9.3 Reputation analysis; 3.9.4 Networking considerations; 3.9.5 Web harvesting; 3.9.6 Payment and legislation; 3.10 Conclusion; 4. Recommendations for further reading 327 $a3. Computer studies of Shakespearean authorship; 1. Introduction; 2. Shakespeare, Wilkins and Pericles; 2.1 Correspondence analysis for "Pericles" and related texts; 3. Shakespeare, Fletcher and The Two Noble Kinsmen; 4. King John; 5. The Raigne of King Edward III; 5.1 Neural networks in stylometry; 5.2 Cusum charts in stylometry; 5.3 Burrows' Zeta and Iota; 6. 
Hand D in "Sir Thomas More"; 6.1 Elliott, Valenza and the Earl of Oxford; 6.2 Elliott and Valenza: Hand D; 6.3 Bayesian approach to questions of Shakespearian authorship; 6.4 Bayesian analysis of Shakespeare's second-person pronouns 327 $a6.5 Vocabulary differences, LDA and the authorship of Hand D 330 $aComputational linguistics can be used to uncover mysteries in text which are not always obvious to visual inspection. For example, the computer analysis of writing style can show who might be the true author of a text in cases of disputed authorship or suspected plagiarism. The theoretical background to authorship attribution is presented in a step by step manner, and comprehensive reviews of the field are given in two specialist areas, the writings of William Shakespeare and his contemporaries, and the various writing styles seen in religious texts. The final chapter looks at the progress com 410 0$aNatural language processing ;$vVolume 12. 606 $aComputational linguistics$xResearch 606 $aImitation in literature 606 $aPlagiarism 615 0$aComputational linguistics$xResearch. 615 0$aImitation in literature. 615 0$aPlagiarism. 676 $a410.285 700 $aOakes$b Michael P.$0495462 801 0$bMiAaPQ 801 1$bMiAaPQ 801 2$bMiAaPQ 906 $aBOOK 912 $a9910790909603321 996 $aLiterary detective work on the computer$93748728 997 $aUNINA