Topic: Utilizing OCR (Optical Character Recognition) with a PDA
Overview: Although OCR technology was developed for desktop computers working with flat (2D) images, on a PDA it presents the challenge not only of capturing the image but also of processing it. This paper describes methods to optimize the image as well as techniques to decode the characters using neural network processing.
Preliminary OCR/PRT analysis for PDA use
Overview
What are we doing, and how are we going to do it? The balance of this document demonstrates our capabilities and expertise in product development.
The process of automatic date code validation involves several discrete steps: image capture; image enhancement; date stamp localization; de-skewing; character recognition; post-recognition corrections and database entries.
The major processing events occur during image enhancement, date stamp localization and de-skewing. The algorithms for these processes are well known; however, running them in near real time will require mathematical techniques such as integer arithmetic and scaling.
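As an illustration of integer arithmetic with scaling, the sketch below stores fractional values as integers scaled by a power of two (fixed point). The 16-bit fraction and the helper names are assumptions made for this example, not a decided design.

```python
# Fixed-point sketch: a value v is stored as the integer v * 2**FRAC_BITS.
FRAC_BITS = 16          # assumed number of fractional bits
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0


def to_fixed(x):
    """Convert a float to fixed point (done offline, e.g. when building tables)."""
    return int(round(x * ONE))


def fixed_mul(a, b):
    """Multiply two fixed-point numbers; the shift restores the scale.

    In C++ the intermediate product would need a 64-bit integer.
    """
    return (a * b) >> FRAC_BITS


def scale_pixel(value, gain_fixed):
    """Scale an 8-bit pixel by a fixed-point gain, round, and clamp to 0..255."""
    scaled = (value * gain_fixed + (ONE >> 1)) >> FRAC_BITS
    return max(0, min(255, scaled))


gain = to_fixed(1.5)
print(scale_pixel(100, gain))   # 150
print(scale_pixel(200, gain))   # 255 (clamped)
```

Only integer multiplies, adds and shifts run at recognition time; the float conversion happens once, when constants are prepared.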
The preferred image is black and white, which reduces the load on memory and computation. This choice will increase the noise per image, which will require additional filtering to remove single dots, as well as an edge filter.
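A minimal sketch of the black-and-white conversion and the single-dot filter follows. The fixed threshold of 128 and the 8-neighbour rule are assumptions for illustration, and the edge filter is not shown.

```python
def binarize(gray, threshold=128):
    """Map rows of 8-bit grayscale values to 0/1 pixels (1 = dark ink)."""
    return [[1 if p < threshold else 0 for p in row] for row in gray]


def remove_single_dots(img):
    """Clear every ink pixel that has no ink among its 8 neighbours."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1:
                continue
            neighbours = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (dy or dx) and 0 <= ny < h and 0 <= nx < w:
                        neighbours += img[ny][nx]
            if neighbours == 0:
                out[y][x] = 0   # isolated dot: treat as noise
    return out


gray = [
    [255, 255, 255, 255, 255],
    [255,   0, 255, 255, 255],   # isolated dark pixel (noise)
    [255, 255, 255, 255, 255],
    [255, 255, 255,   0,   0],   # two-pixel stroke (kept)
]
clean = remove_single_dots(binarize(gray))
```

Both steps use only comparisons and integer additions, so they fit the integer-only constraint.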
Algorithms will be in place to handle de-skewing. Whether de-skewing runs before or after localization of the date stamp will be decided once sufficient testing has been completed to evaluate processing times.
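One common integer-only way to de-skew is a projection-profile search: shear the image by each candidate amount and keep the shear that concentrates the ink into the fewest rows. The sketch below is an assumed illustration of that idea, not the algorithm selected for this project; all function names are hypothetical.

```python
def column_shift(x, shear, span):
    """Vertical shift for column x; symmetric for positive and negative shear."""
    magnitude = (x * abs(shear)) // span
    return magnitude if shear >= 0 else -magnitude


def shear_image(img, shear):
    """Shift each column vertically so the right edge moves by `shear` rows."""
    h, w = len(img), len(img[0])
    span = max(w - 1, 1)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                ny = y + column_shift(x, shear, span)
                if 0 <= ny < h:
                    out[ny][x] = 1
    return out


def profile_score(img):
    """Sum of squared row counts; large when ink is concentrated in few rows."""
    return sum(sum(row) ** 2 for row in img)


def estimate_shear(img, max_shear):
    """Try every integer shear in [-max_shear, max_shear]; keep the sharpest."""
    best_shear, best_score = 0, profile_score(img)
    for shear in range(-max_shear, max_shear + 1):
        score = profile_score(shear_image(img, shear))
        if score > best_score:
            best_shear, best_score = shear, score
    return best_shear


# A 12x12 test image holding one line that drifts down 3 rows left to right.
skewed = [[0] * 12 for _ in range(12)]
for x in range(12):
    skewed[4 + (x * 3) // 11][x] = 1
correction = estimate_shear(skewed, 5)
straight = shear_image(skewed, correction)
```

The cost grows with the number of candidate shears, which is one reason the ordering of de-skewing and localization matters: a localized date stamp is a much smaller image to search.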
The OCR problem offers some advantages. The number of fonts and characters will be limited, and therefore we may be able to tailor them to improve recognition. We have determined that a hybrid neural net approach is best for character recognition.
The very narrow context for the data should substantially improve recognition. We will be dealing with very simple linguistic rules.
Assumptions
- Low-speed PDA (206 MHz CPU)
- Limited CPU support for floating point functions
- Run-time memory limitation (presently <64 MB)
- 800x600 pixel resolution
- No optical processing
- No analog processing
- Initial recognition set limited to 5 fonts, 20 letters (no G, K, Q, W, X, Z) plus 10 digits
- Extended set for another 5 fonts
Objective
To develop optimized algorithms for recognizing a batch date on a Kraft product. The process will localize the date code and then decode the date text. The initial objective is to scan a box from a distance of one foot. The process will use a local database to improve recognition and speed the process.
Abstract
The PRT program will be launched like any Pocket PC program. The program will be menu driven, supporting queries, reports and processing of a scanned image. A menu selection will trigger the camera. There will also be a programmed "hot button" on the Pocket PC, which will take a snapshot and feed it to the PRT routine. The PRT will display status and progress. Upon completion the program will return one of three results:
Success: Get the record from the local database.
Failure: Will display manual entry windows for the operator.
Other: Will display all possibilities (probably up to ten), so the operator can select one from a pull-down list.
There will be a flight recorder to support the program. Each failure will be logged with all parameters. This will permit Kraft or ABT to analyze what caused the failure and tune or fix the algorithm.
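The three outcomes and the flight recorder could be wired together roughly as follows. The decision rule (exactly one database match means success, several mean "other", none means failure) is an assumption for illustration, as are the names `classify_result` and `flight_log`.

```python
flight_log = []  # hypothetical in-memory flight recorder


def classify_result(candidates, database, params, max_choices=10):
    """Return ("success", record), ("other", choices) or ("failure", None).

    candidates: list of (text, confidence) pairs from the recognizer.
    database:   dict mapping known date codes to records (local database).
    params:     processing parameters to record when recognition fails.
    """
    matches = [(text, conf) for text, conf in candidates if text in database]
    matches.sort(key=lambda pair: pair[1], reverse=True)
    if len(matches) == 1:
        return "success", database[matches[0][0]]
    if matches:
        # Several plausible codes: let the operator pick, best first.
        return "other", [text for text, _ in matches[:max_choices]]
    # Nothing matched: record everything needed to analyze the failure later.
    flight_log.append({"candidates": list(candidates), "params": dict(params)})
    return "failure", None


db = {"DEC 25 2003": "lot A", "DEC 26 2003": "lot B"}
print(classify_result([("DEC 25 2003", 90), ("DFC 25 2003", 40)], db, {}))
```

On failure the caller would open the manual entry windows; the log entry keeps the candidates and parameters for later tuning.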
Approach
- The algorithms will be implemented using integer and/or modulo 2 arithmetic. This approach will reduce processing time because the Pocket PC does not have a floating-point processor.
- Initial development will use Java. Java was selected because it will simulate a slow machine. The final code will be written in C++, with selected functions written in assembly language. The assembly code will reduce processing time for critical calculations.
- We will be using DSP (Digital Signal Processing) techniques that will be ported from the DSP chip. This will let us apply such well-established techniques as the FFT (Fast Fourier Transform), convolution and low/high/band-pass filtering, which will be useful in the pre-recognition phase (image processing). The DSP routines will be written in C++ and assembly language.
- We will be using the hybrid neural net (HNN) technique to find and recognize individual characters. For the NN part, we will be using three layers (input layer, hidden layer and output layer). They will have three weight matrices (one from the input values to the input layer, one from the input layer to the hidden layer and one from the hidden layer to the output layer). The number of inputs will be determined by:
- Character structure (if the structure is 16x12 then we have 192 elements/inputs)
- Normalization
- Pocket PC processing speed
The number of neurons in the hidden layer will be determined by:
-
- The number of outputs will be 30 characters for now (the web capabilities demo will use 20 characters).
- The transfer function will be changed to accommodate the no-floating-point approach.
- For training, we will use the back-propagation algorithm, which will run on a high-speed machine. The Pocket PC will then update the weights any time a new learning process runs. We will also update, if needed, the number of neurons in the hidden layer.
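The recognition-time forward pass through the three weight matrices can be sketched with integers only. The piecewise-linear transfer function, the 8-bit fixed-point scale and the tiny 2-2-2 network below are assumptions chosen for illustration; the actual transfer function and layer sizes are still to be decided.

```python
FRAC_BITS = 8            # assumed fixed-point scale for weights and activations
ONE = 1 << FRAC_BITS     # fixed-point representation of 1.0


def hard_sigmoid(x):
    """Integer stand-in for the sigmoid: 0.5 + x/4, clamped to [0, 1]."""
    y = (ONE >> 1) + (x >> 2)
    return max(0, min(ONE, y))


def layer(inputs, weights):
    """One fully connected layer; weights[j][i] links input i to neuron j."""
    out = []
    for row in weights:
        acc = 0
        for w, v in zip(row, inputs):
            acc += w * v                     # integer multiply-accumulate
        out.append(hard_sigmoid(acc >> FRAC_BITS))
    return out


def forward(pixels, w_in, w_hidden, w_out):
    """Run 0/1 pixels through the input, hidden and output layers."""
    x = [ONE if p else 0 for p in pixels]
    for weights in (w_in, w_hidden, w_out):
        x = layer(x, weights)
    return x


def classify(pixels, w_in, w_hidden, w_out):
    """Return (index of the strongest output, its fixed-point score)."""
    scores = forward(pixels, w_in, w_hidden, w_out)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy weights (hypothetical); real weights would come from offline training.
W_IN = [[512, 0], [0, 512]]
W_HID = [[512, -512], [-512, 512]]
W_OUT = [[512, -512], [-512, 512]]
print(classify([1, 0], W_IN, W_HID, W_OUT))   # (0, 192)
```

For the 16x12 character structure mentioned above, `pixels` would hold 192 values and the output layer would have 30 neurons; the weights would be quantized to integers on the training host before being loaded onto the Pocket PC.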
Enhanced
- Input Special Confidence
- Image Features Values
Preprocessing
- Smoothing (FFT)
- Normalization (DDA, Deskewing, Size, delta correlator)
Neural Network
- Training*
- Feature Extraction
- Identification
Postprocessing
- Thresholding
- Decision Making
- Output Identification
Fig. 1.1 Schematic diagram of HNN-based pattern recognition system.
*Training will run on a high-speed host, not on the handheld device. It will be run only when updating or fine-tuning the algorithm.
- We will develop an algorithm or a weight array to further improve the recognition process, for example by weighting each character according to its location: a "D" in location 0 will have a higher weight than a "P" in location 0, while a "D" in location 2 will have a lower weight than a "P" in location 2, because "D" is the first letter of DECEMBER and "P" is the third letter of SEPTEMBER.
- We will have a virtual associative array against which each recognition is compared at each step. This array will contain all possible dates in all possible formats. For example, if we go back five (5) years and there are one hundred formats, then (assuming a date length of 20 characters) the array length will be:
5 years x 365 dates/year x 20 characters/date x 100 formats = 3,650,000 characters
We call this array virtual because we can store it in a dictionary format that consumes less than 10% of the space above. During use, however, it will look like a real array to the processor.
- We have not yet determined whether to normalize the input data or to normalize the test set to match the input data. Normalization will help us deal with different font sizes without increasing the font sets.
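The virtual array idea can be sketched as an object that answers index lookups by generating the date string on demand instead of storing it. The three formats, the class name `VirtualDateArray` and the mismatch-count comparison are assumptions for illustration; the dictionary format actually used may differ.

```python
from datetime import date, timedelta

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")
# Hypothetical date formats standing in for the full format list.
FORMATS = ("{mon} {d:02d} {y}", "{d:02d}{mon}{y}", "{m:02d}/{d:02d}/{y}")


class VirtualDateArray:
    """Looks like an array of date strings but stores only its parameters."""

    def __init__(self, start, days, formats=FORMATS):
        self.start = start
        self.days = days
        self.formats = formats

    def __len__(self):
        return self.days * len(self.formats)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        day, fmt = divmod(index, len(self.formats))
        when = self.start + timedelta(days=day)
        return self.formats[fmt].format(
            mon=MONTHS[when.month - 1], d=when.day, m=when.month, y=when.year)


def closest(array, text):
    """Return (entry, mismatches) for the same-length entry nearest to text."""
    best, best_dist = None, len(text) + 1
    for i in range(len(array)):
        cand = array[i]
        if len(cand) != len(text):
            continue
        dist = sum(a != b for a, b in zip(cand, text))
        if dist < best_dist:
            best, best_dist = cand, dist
    return best, best_dist


dates = VirtualDateArray(date(2003, 1, 1), 365)
print(closest(dates, "DFC 25 2003"))   # ('DEC 25 2003', 1)
```

Here a single misread character ("F" for "E") is corrected because only one valid date lies within one mismatch of the recognized text. A linear scan is used for clarity; a production version would need a faster lookup.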
To further increase performance, we will investigate using Digital Differential Analyzers (DDAs) and digital correlators to implement some of the image processing.
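As one example of where a DDA could help, size normalization can be done with an error accumulator that steps through the source pixels using only integer additions and comparisons. This is a sketch under that assumption (nearest-neighbour resampling); the function names are hypothetical.

```python
def dda_resample(row, new_len):
    """Nearest-neighbour resample of a sequence using only adds and compares."""
    old_len = len(row)
    out = []
    src = 0      # current source index
    err = 0      # accumulated error, in units of 1/new_len source steps
    for _ in range(new_len):
        out.append(row[src])
        err += old_len
        while err >= new_len:   # advance the source index when error overflows
            err -= new_len
            src += 1
    return out


def scale_glyph(img, new_w, new_h):
    """Resize a 0/1 glyph by resampling its rows, then each row's pixels."""
    rows = dda_resample(img, new_h)
    return [dda_resample(r, new_w) for r in rows]


print(dda_resample(["a", "b", "c", "d"], 8))
print(scale_glyph([[1, 0], [0, 1]], 4, 4))
```

The same routine could bring glyphs of different font sizes to a common structure such as 16x12 before they reach the network, with no multiplication or division in the inner loop.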
This document was written and prepared by:
Charles Bibas, founder and CEO of Advanced Barcode Technology, was educated at the Technion in Haifa, Israel, which awarded him a Bachelor of Science in Computer Engineering (BSCE).
Stephen A. Bauman, vice president of engineering for Advanced Barcode Technology, leads the company's integrated programming activities. Mr. Bauman received both BSEE and MSEE degrees from MIT, where he later taught computer engineering. His extensive programming and engineering background includes the software design of the Apollo navigation computer and development of the PageNet Network for delivering paging information over the Internet.
Imyoul Kim, director of software development for Advanced Barcode Technology, heads the company's development and deployment activities. Ms. Kim received her BS in Mathematics from Cheju National University in South Korea and her MSCS from Queen's College in New York. Ms. Kim joined the company in 2000 and has rapidly demonstrated her prowess in tackling unique programming issues as well as her ability to interact effectively with clients on all critical issues.