PDF Research Capture Process Options... |
||
If a file is saved in the PDF format, it enjoys all of the advantages associated with the format - portability, cross-platform access, page independence, etc. However, files that are scanned for conversion to PDF can take various forms depending on the specific needs and resources of the organization ordering the conversion. Virtually any project that is converted from paper originals via Adobe Capture requires an evaluation of which PDF document mode is the best option. Factors that must be weighed include: distribution method, image literacy, file size, accuracy, text searchability, scalability, and archivability. |
||
1. Scanning to PDF Image Only |
||
|
The PDF Image Only (full image) mode is simply a bitmap scan (typically a TIFF file) that has been saved as a PDF file or imported into Acrobat without any conversion. Either way, the user ends up with little more than the scan bitmap with PDF file code wrapped around it. In general, the file is larger than the original scan, but it will open in any application that reads PDF files. |
||
2. Converting to PDF Image + Text |
||
|
The PDF Image + Text (image + hidden text) mode is a quick solution for organizations that want a file that is literally a scan replica with text searchability. The OCR conversion of the bitmap is hidden under the bitmap characters in such a way that when you select text in the Reader component, the bitmaps are highlighted. A search will highlight the bitmaps whose converted text matches the search term. The file is generally 20% larger than a PDF Image Only file, so it is not typically acceptable for publishing large files over the web. It can be used on intranet servers, which typically run at a much higher speed. As bandwidth increases, more documents will be suitable candidates for image + hidden text solutions. File size aside, this mode is ideal for legal or financial applications. Why? Legal documents may have signatures on them, and a scan replica cannot be easily faulted for operator error. Relative to PDF Normal and PDF Text Only, this mode is fast to create because it needs no time-consuming proofing. In financial applications, the need for number accuracy is so great that the difference between having to meticulously proof text or not is a substantial cost factor. Besides, tabular data is not usually searched, so the accuracy of the hidden text numbers is not a significant issue. |
||
3. Editing in Reviewer |
||
|
Adobe Capture has a proofing and editing component called Reviewer. One of the scan conversion options is to save to an .acd file type. These files can be opened in Reviewer, which is specifically designed to highlight "suspects" (misinterpreted bitmaps), misspellings, and numbers. While proofing can take place in Acrobat itself, Reviewer is a better application for the task because it shows a large blowup of the questionable bitmap as you jump from suspect to suspect. The operator can also replace graphic bitmaps with TIFF images (b&w, grayscale, or color). The completed file can then be saved in one of the PDF modes, typically either PDF Normal or PDF Text Only. |
||
4. Saving as PDF Normal |
||
|
A PDF Normal file is significantly smaller than an Image Only or Image + Hidden Text file. That is because most of the bitmap component is stripped out of the file. What is left is the OCR text in the approximate font, size, style, and position, plus bitmapped suspects and images. PDF Normal files can be processed further to remove more bitmap elements, reducing the size again. But further processing takes time and may not be worth the slight percentage of accuracy improvement. PDF Normal files are small (typically half the size of image + hidden text), fully text searchable, and appropriate for web distribution. |
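The size relationships quoted so far can be turned into a rough back-of-the-envelope estimate. The sketch below is only an illustration: the two ratios are the "generally 20% larger" and "typically half the size" figures from the text, while the function name `estimate_sizes` and the 1000 KB baseline are assumptions made up for the example. Real results will vary with the originals.

```python
# Rough size estimates relative to a PDF Image Only file.
# The ratios are the rules of thumb quoted in the text, not measured values.

IMAGE_TEXT_OVERHEAD = 0.20    # Image + Text: "generally 20% larger" than Image Only
NORMAL_VS_IMAGE_TEXT = 0.50   # PDF Normal: "typically half the size" of Image + Text


def estimate_sizes(image_only_kb):
    """Return estimated sizes in KB for three modes (hypothetical helper)."""
    image_text_kb = image_only_kb * (1 + IMAGE_TEXT_OVERHEAD)
    normal_kb = image_text_kb * NORMAL_VS_IMAGE_TEXT
    return {
        "Image Only": round(image_only_kb, 1),
        "Image + Text": round(image_text_kb, 1),
        "Normal": round(normal_kb, 1),
    }


if __name__ == "__main__":
    # 1000 KB is an arbitrary example baseline, not a figure from the text.
    for mode, size_kb in estimate_sizes(1000).items():
        print(f"{mode}: about {size_kb} KB")
```

Under these rules of thumb, a PDF Normal file comes out at roughly 60% of the Image Only size. No ratio is given for PDF Text Only, so it is left out of the estimate.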
||
5. Refining to PDF Text Only |
||
|
The PDF Text Only (full text) mode is the cleanest and smallest conversion of a bitmap a client can aspire to. Virtually all bitmap elements (except inserted graphics) are eliminated. Depending on the original, these conversions can require real artistry on the part of the proofer. But the client gets the ideal digital conversion from the paper original - very small, highest quality, most flexibility.
|
||
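The trade-offs described in the sections above can be summarized as a small decision helper. This is a deliberately simplified sketch, not a rule from Adobe Capture: the function `suggest_mode` and its three yes/no parameters are hypothetical, and a real evaluation would also weigh distribution method, file size, and archivability.

```python
def suggest_mode(needs_search, needs_scan_replica, can_afford_proofing):
    """Suggest a PDF mode from three yes/no questions (hypothetical helper).

    needs_search        -- must the text be searchable?
    needs_scan_replica  -- must the page look exactly like the scan
                           (e.g. signed legal or financial documents)?
    can_afford_proofing -- is there time and budget for careful proofing?
    """
    if not needs_search:
        # A plain bitmap wrapped in PDF; no OCR text at all.
        return "PDF Image Only"
    if needs_scan_replica:
        # Scan replica with hidden OCR text; quick, but larger files.
        return "PDF Image + Text"
    if can_afford_proofing:
        # Smallest and cleanest result, at the cost of proofing effort.
        return "PDF Text Only"
    # Small, searchable, suitable for web distribution.
    return "PDF Normal"


if __name__ == "__main__":
    # Example: a signed contract that must remain searchable.
    print(suggest_mode(needs_search=True, needs_scan_replica=True,
                       can_afford_proofing=False))
```

The order of the checks reflects one reading of the text: fidelity to the scan is treated as overriding file size, and proofing effort is the main cost separating PDF Normal from PDF Text Only.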