This section was provided by Dataflight.
Concordance Database Load Files
The most reliable format for
Concordance data delivery is a Concordance Database. This will ensure that the vendor has the
correct fields, and that the data will load without a hitch. Rolling productions, delivered as
Concordance databases, can be merged into the working set utilizing the
standard “Import Concordance Database” option. Additional fields of data can be
imported into existing records in the same fashion, allowing for initial base
level coding to be done, and then more detailed coding for a subset of
“Key” documents, identified through initial review. In instances where this is not an
option, data can be delivered utilizing standard delimited files for coded
data, and TXT or RTF files for OCR data. Refer to the load files section of
this document to see the firm's preference.
Delimited Load Files
The first line of the
delimited text database load file should be the field names.
Concordance allows users to
specify delimiters; however, the best practice is to use the “Concordance
Standard Delimiter” characters, which are:
· Comma (020)
· Quote (254)
· Newline (174)
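As an illustration of the delimiters listed above, the following Python sketch reads a delimited load file whose first line holds the field names. It is not part of Concordance; the function names parse_line and parse_load_file are hypothetical. It assumes the text has already been decoded so that the delimiters appear as character codes 20, 254 and 174, that every value is wrapped in the quote character, and that the comma delimiter never occurs inside a value.

```python
COMMA = chr(20)     # field separator
QUOTE = chr(254)    # qualifier wrapped around each field value
NEWLINE = chr(174)  # stand-in for a line break inside a field value


def parse_line(line):
    """Split one load-file line into a list of field values."""
    fields = []
    for raw in line.rstrip("\r\n").split(COMMA):
        # Drop the surrounding qualifier pair when it is present.
        if len(raw) >= 2 and raw.startswith(QUOTE) and raw.endswith(QUOTE):
            raw = raw[1:-1]
        # Restore embedded line breaks.
        fields.append(raw.replace(NEWLINE, "\n"))
    return fields


def parse_load_file(text):
    """Return one dict per record, keyed by the field names on line one."""
    # Split on "\n" only, so other control characters in the data are kept.
    lines = [ln for ln in text.split("\n") if ln.strip("\r")]
    header = parse_line(lines[0])
    return [dict(zip(header, parse_line(ln))) for ln in lines[1:]]


if __name__ == "__main__":
    sample = (
        QUOTE + "BEGBATES" + QUOTE + COMMA + QUOTE + "TITLE" + QUOTE + "\r\n"
        + QUOTE + "MSC000001" + QUOTE + COMMA
        + QUOTE + "Line one" + NEWLINE + "Line two" + QUOTE + "\r\n"
    )
    for record in parse_load_file(sample):
        print(record)
```

A check like this can be run on a delivery before import to confirm that every record has the same number of fields as the header line.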
OCR Load Files
OCR is loaded into
Concordance through the READOCR CPL (Concordance Programming Language) script,
which is designed to import document level OCR (one database record represents
one document). Your text files should be on the document level to import
properly with this CPL.
Choosing multi-page OCR
files, or "Document level" files, means that the full document,
including all pages, resides within a single file. If the database has five
records, then there are five documents and five OCR text files, each containing
all of the pages of its document. Most vendors will delineate between OCR pages
by adding text such as << ABC0000001 >>.
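The page markers described above can be used to recover individual pages from a document-level file. The Python sketch below is illustrative only: the function name split_pages is hypothetical, and it assumes each marker has the shape << ABC0000001 >> and appears before the text of its page. Vendors may use a different marker or place it after the page, so the pattern would need adjusting.

```python
import re

# Assumed marker shape, taken from the example "<< ABC0000001 >>".
PAGE_MARKER = re.compile(r"<<\s*([A-Za-z0-9_-]+)\s*>>")


def split_pages(ocr_text):
    """Return (page_id, page_text) pairs from document-level OCR text."""
    # With one capture group, re.split yields
    # [text_before_first_marker, id1, text1, id2, text2, ...].
    parts = PAGE_MARKER.split(ocr_text)
    pages = []
    for i in range(1, len(parts), 2):
        pages.append((parts[i], parts[i + 1].strip()))
    return pages


if __name__ == "__main__":
    text = "<< MSC000001 >>\nfirst page\n<< MSC000002 >>\nsecond page\n"
    for page_id, page_text in split_pages(text):
        print(page_id, repr(page_text))
```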
The OCR text filename must be
unique. Otherwise the READOCR program may import that text into multiple
records. The filename, therefore, should match the image key field for the
associated document in the database (IMAGEKEY.TXT). The script will scan selected volume
directories for the filename that matches the value of the
“IMAGEKEY” field.
Example:
Two documents have been
OCR’d for import into a Concordance database. The first begins at
MSC000001 and contains 3 pages. The second begins at MSC000004
and contains 2 pages. The corresponding OCR text files are named MSC000001.TXT
and MSC000004.TXT.
| BEGBATES* | ENDBATES  | PATH                  | FILENAME      |
| MSC000001 | MSC000003 | D:\[VOLUME_NAME]\OCR\ | MSC000001.TXT |
| MSC000004 | MSC000005 | D:\[VOLUME_NAME]\OCR\ | MSC000004.TXT |
* Image key - unique value.
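Because each OCR filename must match exactly one image key, it can help to check a delivery before running the import. The Python sketch below is a hypothetical pre-flight check, not a description of how READOCR itself works. The function names find_ocr_files and report_problems are illustrative, and the sketch assumes the OCR files use a .TXT extension and that matching is case-insensitive.

```python
from pathlib import Path


def find_ocr_files(image_keys, volume_dirs):
    """Map each image key to the IMAGEKEY.TXT files found under the volumes."""
    matches = {key: [] for key in image_keys}
    wanted = {key.upper(): key for key in image_keys}
    for volume in volume_dirs:
        for path in Path(volume).rglob("*"):
            if path.is_file() and path.suffix.upper() == ".TXT":
                key = wanted.get(path.stem.upper())
                if key is not None:
                    matches[key].append(path)
    return matches


def report_problems(matches):
    """Return (keys with no OCR file, keys with more than one OCR file)."""
    missing = sorted(k for k, found in matches.items() if not found)
    duplicated = sorted(k for k, found in matches.items() if len(found) > 1)
    return missing, duplicated
```

For the example above, calling find_ocr_files(["MSC000001", "MSC000004"], [volume_root]) on the delivered volume directory should return exactly one path per key, and report_problems should return two empty lists.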
Opticon OPT (Load) Files
The Opticon load file details
the link between documents in Concordance and their corresponding images. Each
line reference defines the image key (the reference from the database), its
volume label (for identification purposes), and the associated image (with its
full file path). The load file entries also define the document breaks and,
optionally, page counts.
The Opticon load file format
is a text-delimited file containing all information necessary to link the
imagebase with the database. There is one line entry per image file, whether it
is a single-page or multi-page image file. The load file consists of seven
delimited entries as follows:
ALIAS,VOLUME,PATH,DOC_BREAK,FOLDER_BREAK,BOX_BREAK,PAGES
Example:
The following is a 5-image
load file example. It details 2 documents; the first relates to the image key
MSC000001 and contains 3 pages. The second begins at MSC000004 and contains 2
pages.
MSC000001,MSC001,D:\IMAGES\001\MSC000001.TIF,Y,,,3
MSC000002,MSC001,D:\IMAGES\001\MSC000002.TIF,,,,
MSC000003,MSC001,D:\IMAGES\001\MSC000003.TIF,,,,
MSC000004,MSC001,D:\IMAGES\001\MSC000004.TIF,Y,,,2
MSC000005,MSC001,D:\IMAGES\001\MSC000005.TIF,,,,
| Value        | Description |
| ALIAS        | Should match your image key from the Concordance database. Concordance stores this key in order to reference the image. |
| VOLUME       | This entry is the name of the volume where the image resides. This is typically the volume name of a CD or server. (Optional) |
| PATH         | This is the full path and file name (and extension) of the image. |
| DOC_BREAK    | Enter a ‘Y’ to denote that this image marks the beginning of a document. |
| FOLDER_BREAK | Enter a ‘Y’ to denote that this image marks the beginning of a folder. (Optional) |
| BOX_BREAK    | Enter a ‘Y’ to denote that this image marks the beginning of a box. (Not Currently Supported) |
| PAGES        | This entry is the number of pages associated with the image. (Optional) |
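To show how the seven entries fit together, the following Python sketch reads Opticon load file lines and groups the images into documents using the DOC_BREAK entry. It is a minimal illustration, not Opticon's own logic. The names Document, parse_opt and page_count_mismatches are hypothetical, and the sketch assumes that paths contain no commas. The page-count comparison is only meaningful when every image file is a single page.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Document:
    image_key: str                      # ALIAS of the first image
    volume: str                         # VOLUME of the first image
    images: List[str] = field(default_factory=list)
    declared_pages: Optional[int] = None  # PAGES on the first image, if given


def parse_opt(lines):
    """Group load-file lines into documents, starting a new one at each 'Y'."""
    documents = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        parts = [p.strip() for p in line.split(",")]
        parts += [""] * (7 - len(parts))  # pad short lines to seven entries
        alias, volume, path, doc_break, _folder, _box, pages = parts[:7]
        if doc_break.upper() == "Y" or not documents:
            declared = int(pages) if pages else None
            documents.append(Document(alias, volume, [], declared))
        documents[-1].images.append(path)
    return documents


def page_count_mismatches(documents):
    """Return image keys whose declared PAGES differs from the image count."""
    return [
        d.image_key
        for d in documents
        if d.declared_pages is not None and d.declared_pages != len(d.images)
    ]


if __name__ == "__main__":
    sample = [
        r"MSC000001,MSC001,D:\IMAGES\001\MSC000001.TIF,Y,,,3",
        r"MSC000002,MSC001,D:\IMAGES\001\MSC000002.TIF,,,,",
        r"MSC000003,MSC001,D:\IMAGES\001\MSC000003.TIF,,,,",
        r"MSC000004,MSC001,D:\IMAGES\001\MSC000004.TIF,Y,,,2",
        r"MSC000005,MSC001,D:\IMAGES\001\MSC000005.TIF,,,,",
    ]
    for doc in parse_opt(sample):
        print(doc.image_key, len(doc.images), doc.declared_pages)
```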
Opticon currently supports
the following image types:
• TIFF files, single and multi-page (.TIF)
• JPEG files (.JPG)
• GIF files (.GIF)
• Bitmap files (.BMP)
• PCX files (.PCX)
• CALS files (.CAL, .MIL)
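The list above can be turned into a simple check on the PATH entries of a load file. This Python sketch is illustrative only. The function name unsupported_images is hypothetical, and the extension set contains only the extensions named in the list, so variants such as .TIFF or .JPEG would be flagged even if Opticon happens to accept them.

```python
from pathlib import PureWindowsPath

# Extensions taken from the list of supported image types above.
SUPPORTED_EXTENSIONS = {".TIF", ".JPG", ".GIF", ".BMP", ".PCX", ".CAL", ".MIL"}


def unsupported_images(paths):
    """Return the image paths whose extension is not in the supported list."""
    return [
        p for p in paths
        if PureWindowsPath(p).suffix.upper() not in SUPPORTED_EXTENSIONS
    ]


if __name__ == "__main__":
    paths = [
        r"D:\IMAGES\001\MSC000001.TIF",
        r"D:\IMAGES\001\MSC000002.png",
    ]
    print(unsupported_images(paths))
```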
To learn more about
Concordance and Opticon, please visit http://www.dataflight.com.
(C)2005 Ad Litem Consulting, Inc.