Tuning ABBYY FlexiCapture Layouts and Document Definitions

So you have spent many hours analyzing and creating the layouts and definitions for the documents you need processed through ABBYY.  You should be almost ready for production, except that you still need to tune.  Many samples of the documents in question need to be run through, and the results checked very carefully, to find and fix all the little issues that will be present.


Vetting ABBYY ‘Keen Eye’ FlexiCapture at ImageSource

First off, ABBYY means “keen eye”, an apt name for a product that dynamically and automatically captures and processes widely disparate documents.  Powerful document recognition separates and classifies docs, and state-of-the-art optical character recognition rips the data from the images.  I like the motto that pops up on screen – “take the data, leave the paper”.  I love doing just that, sending paper briskly off to start its next recycled life.  It’s the greenest thing to do, especially when compared to filling endless cabinets and long-term off-site storage facilities.

When you want to recommend, sell, support, and solve major customer problems with ECM software at ImageSource, due diligence mandates a thorough feature review and testing.  I’ll describe some of the steps I was involved with in this process for ABBYY FlexiCapture – but mine is only a single slice of the vet team pie.  Development teams and other engineering teams performed specific examinations to answer questions about integration, APIs, and narrower capabilities needed to solve unique problems faced by eager customers.  Also, ImageSource staff with a variety of titles took a week-long training course with intensive labs.  Unfortunately I missed the class, but I was given the opportunity to spin up for a pre-sales demo last year, which was a lot of fun.

So here’s a peek at our process:

 Laptop Install

First things first!  I like to be able to run new software on my laptop whenever possible.  This frees me from all bandwidth and location constraints.  I can easily focus on the vet effort on a plane, down by the river, wherever and whenever.  ABBYY FlexiCapture has a convenient ‘Standalone Installation’ which gives you access to all the key components on one box.

 Obtain Sample Images from Client

In this case we gathered dozens of hardcopy invoices from a large international corporation.  The images were not pretty and included originals, copies, printed faxes, you name it.

 Ascertain Server Needs

After reviewing the ABBYY documentation we set the requirements for our labs – memory per server, disk space, software required, scan station requirements, scanner requirements, and required operating systems.

 Spin Up VMs

Thanks to Mike Peterson we had three servers up in no time.

 Convening the Team, Locking Down the ‘War Room’

Gene Eckhart, Jeff Doyle and I met in our Olympia office for a week.  Gene secured the war room where we periodically met with developers, project managers, engineers, and principals.  Most of the time it was the three of us banging away.

College Transcript Processing

College Transcript Processing refers to converting a paper-based transcript into an electronic transcript via software that OCRs the scanned paper version, locates specific data within the transcript and saves that data for later use.  The reason for processing a transcript via software is to move the data into another system for storage and retrieval faster than a data entry specialist could key it in manually.  This is a somewhat difficult task for the following reasons:

  1. Every college presents similar data in a very different format.
  2. Almost all colleges attempt to prevent the copying of the paper transcript through various copy protection methods.  Most of these methods render the data on the scanned transcript almost unreadable.

The data that is similar on a transcript falls into several main areas:

  1. College Identifying Information
  2. Student Identifying Information
  3. Session/Course Information
  4. Previous Colleges Attended Information
  5. Degrees Awarded Information

The data is similar but not the same on each college transcript.  In addition, the layout of a transcript varies greatly between the various colleges.  Session/Course data could take up the entire width of the paper for one college, but be formatted as multiple columns of data for another college.  There are many, many variations that need to be taken into consideration when attempting to OCR to find and extract the data.

So far the ABBYY FlexiCapture 9.x software has been able to handle most of these issues out of the box.  One of its most powerful features, I am finding out, is the scripting language used to write rule scripts, custom scripts and export scripts.  These can correct OCR issues and assist the Verification Operator, improving efficiency and throughput.

The scripts for rules, custom scripts or export can be written in VBScript or JScript.  There is some documentation on the ABBYY classes and objects, but not a whole lot.  Most of what I have done has been through trial and error or, in specific cases, from examples provided by Tech Support.  However, the scripts developed so far work well for correcting OCR issues and providing automated checks of extracted field data.  Through custom scripts there is even the option to run a database lookup on extracted data and return other fields from the database, which helps provide a complete set of validated information.
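To give a feel for the kind of correction and check such a script performs, here is a minimal sketch in plain Python.  It is not FlexiCapture script code and does not use any ABBYY classes or objects; the character map, the unit format, the function names and the in-memory lookup table (standing in for a database query) are all assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# Assumed map of letters that OCR commonly confuses with digits.
OCR_DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"}
)


def clean_numeric_field(raw: str) -> str:
    """Swap look-alike letters for digits and drop stray spaces."""
    return raw.translate(OCR_DIGIT_FIXES).replace(" ", "")


def check_units(raw: str) -> Tuple[str, bool]:
    """Return the cleaned value and whether it looks like a unit count (3, 4.5, 12.00)."""
    cleaned = clean_numeric_field(raw)
    is_valid = re.fullmatch(r"\d{1,2}(\.\d{1,2})?", cleaned) is not None
    return cleaned, is_valid


# Hypothetical lookup table standing in for a database of college codes.
COLLEGE_BY_CODE = {"001234": "Example State College"}


def lookup_college(raw_code: str) -> Optional[str]:
    """Clean the extracted code, then return the matching college name if any."""
    return COLLEGE_BY_CODE.get(clean_numeric_field(raw_code))
```

A field that still fails the check after cleaning is the kind of value you would flag for the Verification Operator rather than pass through silently.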

This has been a learning experience, but it is proving to be well worth the effort.  Getting the data off the paper and into the system used to evaluate a student for enrollment cuts down on the man hours required under the old manual data entry.

Kofax 9.x – They’ve finally done it… Almost

I have been working with the Kofax Capture product for over ten years now. To prove that, let me tell you the configuration on one of my first installs. I remember setting up a Bell and Howell 3338 scanner (you know, the one that required a cherry picker to get out of the box and onto the desk) with the Kofax KF board and Kofax Capture version 2.x. Ah yes, I look back fondly on the old days of deploying a scanner with the Kofax card and software. I know it has been out for a while now, but I recently started working with version 9 of Kofax Capture and I am pleased to say that they have finally addressed some of the Kofax gotchas that have been plaguing us for years.

For starters, they made client deployment 100% easier by creating the MSI package. I can’t tell you how many conversations I have had with client admins that go like this:

Me: No, we don’t have an SMS or other type of deployment package you can use, but you can make your own.

Client Admin: (Furrows brow) Huh?

I will be much happier when those conversations are a little less embarrassing. Now the workstations can be deployed using Microsoft SMS, Group Policy, IBM Tivoli, Symantec Altiris, HP OpenView, or whatever deployment suite you use. Kofax has only tested SMS, but…

ABBYY FlexiCapture For Transcript Processing – A More Detailed Review

Last time we took a broad look at using the ABBYY FlexiCapture product for College Transcript processing.  This time I would like to start looking at some lower-level details of the product that show where FlexiLayouts end and Project Level Document Definitions begin.

Let’s start with some basic definitions.  A Layout helps the Recognition Engine identify a document in a batch as belonging to a particular Document Definition.  A Layout also helps the Recognition Engine find the locations of the data to extract and place in fields, which the user can then see and modify if necessary.  A Document Definition determines the type of processing to perform on the document, the fields contained in the document and the type of data those fields should have.
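One way to picture the Document Definition half of that split is as a list of named fields, each with an expected data type, against which extracted values are checked.  The sketch below is plain Python written only to illustrate that idea; the class names, the field kinds and the date format are assumptions and do not reflect how FlexiCapture stores a Document Definition internally.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class FieldSpec:
    name: str
    kind: str  # assumed kinds: "text", "number" or "date"


@dataclass
class DocumentDefinition:
    name: str
    fields: List[FieldSpec]


def _parses(kind: str, value: str) -> bool:
    """Return True if the value can be read as the given kind."""
    try:
        if kind == "number":
            float(value)
        elif kind == "date":
            datetime.strptime(value, "%m/%d/%Y")  # assumed date format
        return True
    except ValueError:
        return False


def fields_needing_review(definition: DocumentDefinition,
                          extracted: Dict[str, str]) -> List[str]:
    """List fields that are missing or whose value does not match its kind."""
    return [
        spec.name
        for spec in definition.fields
        if spec.name not in extracted
        or not _parses(spec.kind, extracted[spec.name])
    ]


transcript = DocumentDefinition(
    name="Transcript",
    fields=[
        FieldSpec("StudentName", "text"),
        FieldSpec("UnitsEarned", "number"),
        FieldSpec("DegreeDate", "date"),
    ],
)
```

In this picture the Layout's job would be to produce the `extracted` dictionary from the page image, and the definition's job would be to say what each value should look like.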

Student Enrollment Transcript Processing

Many Colleges and Universities must handle transcripts received from other Colleges and Universities for Student Enrollment processing. The receiving College goes through several steps to process a Student Application for Enrollment and any associated transcripts, and those steps may require the application to pass through several different people.  Usually a folder is created to hold all the documents received in support of the Application.  Once all required documents are received, this folder is passed on to Evaluators, who evaluate the application and make an Admit or Deny decision.

How a transcript is processed varies by College.  In one case the information on the transcript is manually entered, line by line, into an ERP system for Student Processing.  This is very labor intensive and slows the processing of a Student’s Application.  In addition, the Evaluator must review the transcript, mark those lines that cannot be transferred, manually add up the Units Attempted and Units Earned/Completed, and calculate a GPA to see if the Student qualifies for admission.  A lot of manual processing is done on a single transcript, and a single Student may have several transcripts from previous institutions, all of which have to be evaluated in the same way to determine admission to the school.

There are a couple of products available that can help to automate this Student Enrollment Processing: Oracle I/PM and ABBYY FlexiCapture.  Oracle I/PM can provide both the image storage for all the documents received and a workflow to route the document sets electronically through the various stages of processing.  This removes paper from the process, along with the associated issues of losing track of documents or folders and the time-consuming searches to find a document.

Since a paper transcript differs from one institution to the next, reading the data from the transcript and placing it into consistent fields that can be uploaded into an ERP system requires a product that is flexible about formats.  The ABBYY FlexiCapture product allows for the capture of information from a free-format form like a transcript.  It has a module called FlexiLayout that allows the developer to specify where on a page a specific data set may reside.  It can handle table data like Session/Course data, which can be repeated multiple times on a single transcript.  It can handle multi-page transcripts and multiple columns of data on a single page that continue on the next page.  This product is very flexible in the design stages, allowing the developer to handle almost all the common issues that come up when extracting data from a transcript.

By using the ABBYY FlexiCapture product and releasing the extracted transcript data and the image into I/PM, Enrollment Processing gains time and saves labor in several ways.

  • Almost all manual routing of paper is eliminated.  This saves time in both the movement of folders from one desk to another and also saves time in searching for the correct folder to place newly acquired documents.
  • Manual line-by-line data entry of transcripts is reduced.  Even with the ABBYY product some labor is still required to review the extraction results and ensure the data is correct.  However, this Validation step takes a lot less time and effort than manual line-by-line data entry.  The data can then be uploaded electronically into an ERP/Student Processing system.
  • Since the extracted data is now in the I/PM repository it is easy to develop a form that can allow the evaluator to select the Session/Course lines to include in a Total Summary and then press a button so the totals are calculated automatically.  This sure beats the manual method of using a hand calculator.
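The totals behind that button are simple arithmetic over the selected Session/Course lines.  A minimal Python sketch follows; the record layout, the 4.0 grade scale and the GPA rule (grade points weighted by Units Attempted) are assumptions for illustration, and each College will have its own rules for which lines count.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CourseLine:
    course: str
    units_attempted: float
    units_earned: float
    grade_points: float  # assumed 4.0 scale, e.g. A = 4.0, C = 2.0, F = 0.0


def summarize(lines: List[CourseLine]) -> Dict[str, float]:
    """Total the selected lines and compute a unit-weighted GPA."""
    attempted = sum(line.units_attempted for line in lines)
    earned = sum(line.units_earned for line in lines)
    quality_points = sum(line.units_attempted * line.grade_points for line in lines)
    # Guard against an empty selection to avoid dividing by zero.
    gpa = round(quality_points / attempted, 2) if attempted else 0.0
    return {"units_attempted": attempted, "units_earned": earned, "gpa": gpa}


selected = [
    CourseLine("ENGL 101", 3, 3, 4.0),
    CourseLine("MATH 120", 4, 4, 2.0),
    CourseLine("HIST 110", 3, 0, 0.0),
]
summary = summarize(selected)
```

For the three lines above, the summary comes to 10 units attempted, 7 units earned and a GPA of 2.0.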

Using both of these products helps lower the cost of processing Student Applications for Enrollment and reduces the time-consuming effort of transcript processing.