The vendor should use auto-rotate and voting when generating OCR. Most OCR
software offers an auto-rotate option. When auto-rotate is enabled, the software
OCRs each image four times, rotating it 90 degrees each time, then determines
the best result and publishes that content to the load file. The majority of
documents share the same orientation, portrait, and can yield good results even
without auto-rotate. Other documents may be designed for a landscape layout,
such as an HR chart. Still others may have been scanned upside-down, resulting
in garbage OCR. OCR voting is a process in which the results of multiple OCR
programs are compared to determine the best result.
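The auto-rotate and voting steps described above can be sketched as follows. This is a minimal illustration, not any particular product's implementation: the `engine` callable, its `(text, confidence)` return value, and the whole-page majority vote are all assumptions made for the example. Real OCR software typically votes at the word or character level.

```python
from collections import Counter

def auto_rotate_ocr(image, engine):
    """OCR the image at 0, 90, 180 and 270 degrees and keep the best result.

    `engine` is a hypothetical callable: engine(image, angle) -> (text, confidence).
    """
    results = [(angle, *engine(image, angle)) for angle in (0, 90, 180, 270)]
    # Pick the rotation whose OCR result has the highest confidence score.
    angle, text, _confidence = max(results, key=lambda r: r[2])
    return angle, text

def vote(texts):
    """Simplified voting: return the text that the most OCR programs agree on."""
    return Counter(texts).most_common(1)[0][0]

# Stand-in engine for demonstration: pretends the page was scanned upside-down.
def fake_engine(image, angle):
    return ("HELLO WORLD", 0.95) if angle == 180 else ("p1JoM o113H", 0.10)

best_angle, best_text = auto_rotate_ocr("page.tif", fake_engine)
winner = vote(["HELLO WORLD", "HELL0 WORLD", "HELLO WORLD"])
```

Here the upside-down page is recovered at 180 degrees, and the vote selects the reading that two of the three programs produced.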
Quality Check
The OCR text should approximate, as closely as possible, the formatting found
on the original image. The OCR field should never be just the words in one
long string.
No text at the top, bottom, or either side of the page should be clipped.
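One rough way to screen for OCR that has been collapsed into a single long string is to look for unusually long lines. The sketch below is a heuristic only; the function name and the 300-character threshold are assumptions chosen for illustration and are not part of this standard.

```python
def looks_flattened(text, max_line_length=300):
    """Return True if any line of OCR text is longer than max_line_length.

    The threshold is an assumed value; tune it to the document set.
    """
    return any(len(line) > max_line_length for line in text.splitlines())

flattened = "word " * 200                 # 1000 characters, no line breaks
formatted = "Line one\nLine two\n"       # line breaks preserved
flags = (looks_flattened(flattened), looks_flattened(formatted))
```

A check like this cannot detect clipped margins; that still requires comparing the OCR against the image.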
Multi-Page Text Files
There should be exactly one OCR text file per document. The OCR filename must
match the document image key. So, a 10-page document with the image key AA001
should have a corresponding file AA001.TXT that contains the OCR for AA001
through AA010.
Each page of OCR should have a line identifying the page number, or Bates
number. This way, people can search for any Bates number and find the correct
document. Please include space between the OCR text and the page marker.
The following shows sample OCR:
<< AA001 >>
Text for first page

<< AA002 >>
Text for second page
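A multi-page text file in the layout shown above could be assembled along these lines. This is a sketch under stated assumptions: the function names are hypothetical, the `<< ... >>` marker format is taken from the sample, and a single blank line is used as the "space" between one page's text and the next page marker.

```python
def build_ocr_file(pages):
    """Join per-page OCR into one document-level text.

    `pages` is an ordered list of (bates_number, page_text) pairs.
    Only surrounding newlines are trimmed, so indentation inside a page is kept.
    """
    blocks = [f"<< {bates} >>\n" + text.strip("\n") for bates, text in pages]
    return "\n\n".join(blocks) + "\n"

def ocr_filename(image_key):
    """The OCR filename must match the document image key."""
    return f"{image_key}.TXT"

document = build_ocr_file([
    ("AA001", "Text for first page"),
    ("AA002", "Text for second page"),
])
name = ocr_filename("AA001")
```

Because every page carries its own marker, a text search for `AA002` lands inside the file for the document that begins at AA001.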
The following chart shows a sample database and corresponding OCR files:
| IMAGE KEY   | BEGBATES   | ENDBATES | PATH                  | FILENAME       |
| AA001       | AA001      | AA0010   | D:\[VOLUME NAME]\OCR\ | AA001.TXT      |
| AA011       | AA011      | AA0011   | D:\[VOLUME NAME]\OCR\ | AA011.TXT      |
| AA012       | AA012      | AA0038   | D:\[VOLUME NAME]\OCR\ | AA012.TXT      |
| AA039.0001* | AA039.0001 | AA0100   | D:\[VOLUME NAME]\OCR\ | AA039.0001.TXT |
* Please refer to Bates prefix and suffix conventions.
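A delivery can be spot-checked against the rules above with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, the case-insensitive filename comparison is an assumption, and the marker pattern is inferred from the sample OCR rather than defined by this standard.

```python
import re

def filename_matches_key(image_key, filename):
    """Check that the OCR filename is the image key plus .TXT.

    Comparison ignores case (an assumption, since Windows paths are
    case-insensitive).
    """
    return filename.upper() == f"{image_key}.TXT".upper()

def page_markers(ocr_text):
    """Return the Bates numbers found in << ... >> page marker lines."""
    return re.findall(r"<<\s*(\S+?)\s*>>", ocr_text)

# (image key, filename) pairs as they might appear in a load file.
rows = [
    ("AA001", "AA001.TXT"),
    ("AA011", "AA011.TXT"),
    ("AA039.0001", "AA039.0001.TXT"),
]
mismatches = [key for key, filename in rows
              if not filename_matches_key(key, filename)]
markers = page_markers("<< AA001 >>\nText\n\n<< AA002 >>\nText\n")
```

Comparing the list returned by `page_markers` with the expected Bates range for a document shows whether any page is missing its marker.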
(C)2005 Ad Litem Consulting, Inc.