02277oam 2200433zu 450
9910145148603321
20241212215544.0
9781509089468
1509089462
(CKB)1000000000695725
(SSID)ssj0000451170
(PQKBManifestationID)12176423
(PQKBTitleCode)TC0000451170
(PQKBWorkID)10459110
(PQKB)11240079
(NjHacI)991000000000695725
(EXLCZ)991000000000695725
20160829d2007 uy
eng
ur|||||||||||
txt
c
cr
Machine Learning and Applications; Proceedings: International Conference on Machine Learning and Applications (6th: 2007: Cincinnati, Ohio)
[Place of publication not identified]
IEEE Computer Society Press
2007
1 online resource
Bibliographic Level Mode of Issuance: Monograph
9780769530697
0769530699
An optical character recognition (OCR) system with a high recognition rate is challenging to develop. One of the major contributors to OCR errors is smeared characters. Several factors, such as bad scanning quality and a poor binarization technique, lead to the smearing of characters. Typical approaches to character segmentation fall into three major categories: image-based, recognition-based, and holistic-based. Among these approaches, the segmentation path can be linear or non-linear. Our paper proposes a non-linear approach to segment characters on grayscale document images. Our method first determines whether characters are smeared together using general character features. The correct segmentation path is found using a shortest path approach. We achieved a segmentation accuracy of 95% over a set of about 2,000 smeared characters.
Machine learning
Congresses
Machine learning
006.31
Wani M. Arif
1006177
PQKB
PROCEEDING
9910145148603321
Machine Learning and Applications; Proceedings: International Conference on Machine Learning and Applications (6th: 2007: Cincinnati, Ohio)
2314962
UNINA