New Method to Correct Kanji OCR Errors

The Nara Institute of Science and Technology (NAIST) hosted the 242nd meeting of the Special Interest Group on Natural Language Processing of the Information Processing Society of Japan (IPSJ-SIGNL 242) on October 25 and 26.

With support from Genial Technology, Inc., Kotaro Sakamoto of Yokohama National University presented a research paper, "A study on the edit distance taking kanji radicals into account contracts’ OCR kanji error correction," in which he proposed a unique method for correcting kanji OCR errors.

As process automation with RPA and OCR spreads, kanji (Chinese character) recognition errors have become a persistent problem in East Asian countries. Languages that use kanji require handling thousands of characters, some of which closely resemble one another. OCR errors are common in these languages because computers have difficulty distinguishing similar characters such as 主, 柱, 注, 住, and 往.

The proposed method uses kanji radicals to calculate the Kanji Damerau-Levenshtein Distance, a specialized edit distance that more accurately accounts for different kanji with a similar appearance.
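
The general idea of a radical-aware edit distance can be illustrated with a minimal Python sketch. This is not the formulation from the presented paper, whose exact cost definition is not given here. It assumes a small hand-written component table (COMPONENTS), a substitution cost based on Jaccard overlap of components, and the optimal string alignment variant of the Damerau-Levenshtein distance. All names in it (COMPONENTS, substitution_cost, kanji_dl_distance, closest_word) are hypothetical.

```python
# Illustrative sketch only: the table and cost function are assumptions,
# not the definition used in the presented paper.

# Hand-written decomposition of a few look-alike kanji into components.
COMPONENTS = {
    "主": {"主"},
    "柱": {"木", "主"},
    "注": {"氵", "主"},
    "住": {"亻", "主"},
    "往": {"彳", "主"},
}


def substitution_cost(x: str, y: str) -> float:
    """Cost in [0, 1]: cheaper when two characters share components."""
    if x == y:
        return 0.0
    # Characters missing from the table are treated as a single component.
    cx = COMPONENTS.get(x, {x})
    cy = COMPONENTS.get(y, {y})
    jaccard = len(cx & cy) / len(cx | cy)
    return 1.0 - jaccard


def kanji_dl_distance(a: str, b: str) -> float:
    """Damerau-Levenshtein (optimal string alignment) with weighted substitution."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = float(i)  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = float(j)  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + 1.0,  # deletion
                d[i][j - 1] + 1.0,  # insertion
                d[i - 1][j - 1] + substitution_cost(a[i - 1], b[j - 1]),
            )
            # Transposition of two adjacent characters costs 1.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1.0)
    return d[-1][-1]


def closest_word(ocr_text: str, vocabulary: list[str]) -> str:
    """Pick the vocabulary entry with the smallest distance to the OCR output."""
    return min(vocabulary, key=lambda word: kanji_dl_distance(ocr_text, word))


if __name__ == "__main__":
    # "往所" is a plausible misreading of "住所" (address).
    print(closest_word("往所", ["場所", "住所"]))
```

Under these assumptions, a misread 往所 is closer to 住所 than to 場所, because 往 and 住 share the component 主. A plain Damerau-Levenshtein distance would score both candidates as one substitution away and could not rank them.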

Genial Technology, Inc. believes that the Kanji Damerau-Levenshtein Distance will benefit process automation in East Asian countries by correcting kanji OCR errors more precisely than existing edit distances.

Kotaro Sakamoto
Jul 2019 – Present: Outsourced Employee, Genial Technology, Inc.
Oct 2016 – Present: Part-time Teacher, Tokyo Metropolitan College of Industrial Technology
May 2014 – Apr 2019: Research Assistant, National Institute of Informatics
Aug 2015 – Aug 2016: Visiting Research Scholar, LTI, Carnegie Mellon University
Apr 2011 – Oct 2015, Oct 2016 – Present: Graduate School of Environment and Information Sciences, Yokohama National University