Scanning in a Production Imaging Environment
Written and Produced by The Rheinner Group
The Document Capture Process

1. Document Preparation: The first step is document preparation for scanning. There are two basic kinds of preparation:
2. Scanning: Scanner throughput can be estimated by testing the scanner with the types of documents it normally handles. Several things can be done to speed up scanning. Scanners work faster in landscape orientation than in the normal lengthwise (portrait) orientation, because a page fed long edge first has a shorter distance to travel past the scan head. A common technique is therefore to feed documents in landscape and use the scanning software to batch-rotate the images back to their normal orientation after scanning. High-volume systems should keep a second (twin) scanner in operation as a backup, to avoid delays caused by jams or problem documents on the main scanner. When it is not handling jams or paper problems, the twin can perform rescans requested by quality control, so that normal production scanning continues uninterrupted.

2a. Quality Control: Quality control is the inspection of each scanned image to make sure that the scanner's image settings have produced an image of acceptable quality. The quality control operator can make some adjustments to the image, for example deskewing slightly skewed documents or rotating upside-down ones. However, quality control operators cannot correct a stretched image, low contrast, or incorrect resolution. These documents must be rejected and rescanned with adjusted scanner settings.

3. Indexing: Indexing, the most critical and most time-consuming step in the capture process, identifies the document and its contents to the image management system. A document index is akin to a library's card catalog and serves a similar purpose. Someone looking for a book first goes to the catalog to look up the book by author, publisher, date of publication, or subject; the catalog card then reveals where the book is kept on the library shelves. Indexing images works the same way: the images themselves are stored as files on very high-capacity storage devices and therefore cannot easily be found without an index.
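To make the card-catalog analogy concrete, here is a minimal Python sketch of an index that maps field values to image file locations. The `ImageIndex` class, the field names (`account`, `doc_type`), and the storage paths are hypothetical illustrations, not part of any particular image management system; a production system would normally keep this index in a database rather than in memory.

```python
from collections import defaultdict


class ImageIndex:
    """Hypothetical in-memory image index: field value -> image file locations."""

    def __init__(self) -> None:
        # field name -> field value -> set of image paths carrying that value
        self._by_field = defaultdict(lambda: defaultdict(set))

    def add(self, image_path: str, fields: dict[str, str]) -> None:
        """Record an image's location under each of its index field values."""
        for name, value in fields.items():
            self._by_field[name][value].add(image_path)

    def find(self, **criteria: str) -> set[str]:
        """Return paths of images whose index fields match every criterion."""
        results = None
        for name, value in criteria.items():
            # .get() avoids creating empty entries for unknown fields or values
            matches = self._by_field.get(name, {}).get(value, set())
            results = set(matches) if results is None else results & matches
        return results or set()


# Hypothetical example: the storage paths play the role of shelf locations.
index = ImageIndex()
index.add("/vol01/batch0042/img0001.tif", {"account": "A-1001", "doc_type": "invoice"})
index.add("/vol01/batch0042/img0002.tif", {"account": "A-1001", "doc_type": "contract"})
index.add("/vol01/batch0043/img0007.tif", {"account": "B-2002", "doc_type": "invoice"})

print(index.find(account="A-1001", doc_type="invoice"))  # {'/vol01/batch0042/img0001.tif'}
```

Keying the index by field value, rather than scanning every record, mirrors the catalog: the lookup goes straight from "author" or "account number" to a location.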
A document image index typically has two kinds of indexes:
3a. Data Extraction: Data can be extracted from images either automatically or manually. In the manual process, a human operator reads the image and fills in an attached set of database fields; this is how most indexes are created. Other information that is not needed for the index may also have to be extracted at this time. Information such as amount paid, customer address, or bank account numbers may be extracted from the image and placed in a database other than the index database. Depending on the type of document, automatic extraction may make the most sense. Forms and other highly structured documents are well suited to automatic recognition, leaving the operator free to process only those images the OCR engine cannot read. The OCR engine may also be instructed to look in specific areas of the form to find and read the information and transfer it into the corresponding index field.

4. Information Release: Information release is the last step of the capture process. Once the document has been prepared, delivered to the scanner, scanned, checked by quality control, indexed, and had its data extracted, it is ready to be released to the image system. Depending on the application, the image may have no further use once its data has been extracted, in which case it is simply deleted. More likely, the image will need to be available to others in the business process or will need to be archived at this point. In either case, it can now be made available to the rest of the system.

The net effective throughput of the image system can be determined from the time required for each of the above steps. Reducing these steps to a minimum commitment of time and resources requires considerable planning, technical assistance, and attention to performance.
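As a rough illustration of that calculation, the Python sketch below estimates throughput two ways under stated assumptions: one operator performing every step on each document in turn (the step times add up), or separate stations for each step working in parallel (the slowest step limits the whole line). The per-step times in `STEP_SECONDS` are invented placeholders, not measurements; replace them with timings from tests on your own documents.

```python
from typing import Dict, Optional, Tuple

# Hypothetical per-document handling times in seconds (placeholders, not measurements).
STEP_SECONDS = {
    "preparation": 6.0,
    "scanning": 1.0,
    "quality_control": 2.0,
    "indexing": 12.0,
    "release": 0.5,
}


def sequential_docs_per_hour(step_seconds: Dict[str, float]) -> float:
    """One operator performs every step on a document before starting the next one."""
    return 3600 / sum(step_seconds.values())


def pipelined_docs_per_hour(
    step_seconds: Dict[str, float],
    stations: Optional[Dict[str, int]] = None,
) -> Tuple[float, str]:
    """Each step runs at its own station(s) in parallel; the slowest step limits output.

    Returns (documents per hour, name of the bottleneck step).
    """
    stations = stations or {}
    # Hourly capacity of each step = stations for that step * 3600 / seconds per document
    per_step_rate = {
        step: 3600 * stations.get(step, 1) / seconds
        for step, seconds in step_seconds.items()
    }
    bottleneck = min(per_step_rate, key=per_step_rate.get)
    return per_step_rate[bottleneck], bottleneck


print(round(sequential_docs_per_hour(STEP_SECONDS)))           # 167
print(pipelined_docs_per_hour(STEP_SECONDS))                   # (300.0, 'indexing')
print(pipelined_docs_per_hour(STEP_SECONDS, {"indexing": 3}))  # (600.0, 'preparation')
```

With these placeholder numbers indexing is the bottleneck, and adding indexing stations helps only until another step, here preparation, becomes the limit.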