Indexing Tables in Kofax-Based Environments

We recently had a customer who needed to migrate off an aging, highly customized, one-off capture/indexing/workflow solution. At the center of many of their form types in this system was a repeatable field collection object that behaved much as you would expect a .NET DataTable control to behave: values could be added horizontally to the current “row”, and at the end of it you could hit Enter and a new “row” would be added. As you moved through, you could also validate the line item as a whole. In other words, nothing too out of the ordinary.

Unfortunately, this stood out as a red flag for both my coworker and me when we first saw it, since we were migrating the client to Kofax Capture. There’s nothing inherently wrong with Kofax’s flagship product; in fact, it is an excellent tool for getting content where it needs to be, often in record time. One thing it doesn’t do well out of the box, however, is table fields. Defining one looks normal enough, but when you actually index it, each column ends up being a standard index field. Needless to say, turning the table 90 degrees counter-clockwise and forcing keyers to manually delimit values is not an ideal experience, especially when 99% of your form is tables that need to be indexed.
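To make the rotated layout concrete, here is a minimal sketch of what happens to the keyed data downstream: each column arrives as one delimited string and has to be zipped back into rows. This is not Kofax code. The semicolon delimiter, the field names, and the `columns_to_rows` helper are all assumptions made up for illustration.

```python
DELIMITER = ";"  # assumed delimiter; the real one depends on how keyers are told to separate values


def columns_to_rows(column_fields):
    """Rebuild table rows from {column name: delimited string of values}.

    Hypothetical helper for illustration only; not part of any Kofax API.
    """
    names = list(column_fields)
    # Split every column into its individual cell values.
    split = [
        [value.strip() for value in column_fields[name].split(DELIMITER)]
        for name in names
    ]
    # Every column must hold the same number of values, or rows cannot line up.
    lengths = {len(column) for column in split}
    if len(lengths) > 1:
        raise ValueError("columns have different numbers of values")
    # Zip the columns back together, one dict per row.
    return [dict(zip(names, values)) for values in zip(*split)]
```

A single missed delimiter in any one column shifts or breaks every row, which is a large part of why this is hard on keyers.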

Two Cents on Implementing Advanced Capture Solutions

I have implemented numerous Advanced Capture solutions over the years and have identified a number of sticking points that are easily overcome, but can also be a real pain. I would classify an Advanced Capture solution as any document capture implementation that goes beyond standard heads-up indexing or very basic zonal OCR of fixed forms based on X and Y coordinates. That definition leaves a lot of room for what could be designated an ‘advanced capture solution’, but I think it fits. Once you move out of the realm of basic capture, you start to encounter a lot of the same problems. There is one problem I seem to encounter every time we implement a solution.


KTM TDS Model Building

Are you tired of separator sheets? Tired of wasted paper, and of countless hours spent flipping through pages to insert a barcode sheet at the start of each new document, only to take it out after the batch is scanned or leave it in and have more paper to store? Why not have the computer do the work for you? That’s the idea behind the Project Planner module in KTM. The standard separation functionality built into KTM works very well on structured and semi-structured documents, but when you have more complex separation rules, the Project Planner component of KTM is what you need.


Tuning Abbyy FlexiCapture Layouts and Document Definitions

So you have spent many hours analyzing your documents and creating the layouts and definitions needed to process them through Abbyy. You should be almost ready for production, except that you still need to tune. That means running many samples of the documents in question through the system and checking the results very carefully to find and fix all the little issues that will be present.


When handwriting is your only option… (Peter Lang)

When researching Enterprise Content Management capture projects, the question of handwriting recognition comes up again and again, and many people aren’t sure what to expect. More often, their expectations are unrealistic. Some think there is no hope at all, ever. At the other end of the spectrum, some think that tiny, fevered cursive scribblings from a rushed meeting can be scanned (or even faxed) and read with accuracy. To help people think about their forms and the viability of capturing handwriting, I have a few simple guidelines that seem to apply in the majority of cases.

  • Are handwritten forms really the only option? If the form is available online, can it be made “fillable” and then submitted directly to your database tables? Can you let the user fill in the form online and print it, thus producing machine print and eliminating handwriting? How about taking the data that a user entered and bar coding it (if the form must be printed rather than submitted)? Also helpful and sometimes overlooked: prefilling form data from your database through a merge process, with a bar code index for retrieval of that same data.
  • Does your Capture software support ICR?  Intelligent Character Recognition (ICR) is what you need to read handwriting.  Optical Character Recognition (OCR) is much more common and is designed to read machine print.  Please don’t try to make it read handwriting – you won’t like the results!
  • Make sure the handwriting is constrained. Annoying? Perhaps. But making the person filling out the form write in boxes sets you up for the most successful ICR results. The catch phrase here could be “Curse the cursive”. Joining one character to the next makes writing faster, but the ICR software then struggles to figure out where one character stops and the next starts, and that is where recognition tanks. With the real-world example below, we can generally expect 100% recognition.

  • Ask for all caps handwriting. You can often tell your ICR engine to look for upper case characters only. This really


College Transcript Processing

College Transcript Processing refers to converting a paper-based transcript into an electronic transcript via software that OCRs the scanned paper version, locates specific data within the transcript, and saves that data for later use. The reason for processing a transcript via software is to improve the rate of data transfer to another system for storage and retrieval, compared with manual data entry by a data entry specialist. This is a somewhat difficult task for the following reasons:

  1. Every college presents similar data in a very different format.
  2. Almost all colleges attempt to prevent the copying of the paper transcript through various copy protection methods. Most of these methods render the data on the transcript almost unreadable.

The data that is similar on a transcript falls into several main areas:

  1. College Identifying Information
  2. Student Identifying Information
  3. Session/Course Information
  4. Previous Colleges Attended Information
  5. Degrees Awarded Information

The data is similar but not the same on each college transcript.  In addition, the layout of a transcript varies greatly between the various colleges.  Session/Course data could take up the entire width of the paper for one college, but be formatted as multiple columns of data for another college.  There are many, many variations that need to be taken into consideration when attempting to OCR to find and extract the data.
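As a rough sketch of the target that all of these layouts have to be mapped onto, the five data areas could be modeled as one common record. The class and field names below are hypothetical and are not taken from FlexiCapture or any real transcript schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Course:
    """One line of Session/Course Information (hypothetical fields)."""
    session: str
    code: str
    title: str
    credits: float
    grade: str


@dataclass
class Transcript:
    """Common record covering the five data areas, whatever the paper layout."""
    college_name: str                      # College Identifying Information
    student_name: str                      # Student Identifying Information
    student_id: str
    courses: List[Course] = field(default_factory=list)
    previous_colleges: List[str] = field(default_factory=list)
    degrees_awarded: List[str] = field(default_factory=list)

    def total_credits(self) -> float:
        # Sum the credits of every extracted course line.
        return sum(course.credits for course in self.courses)
```

Whether a college prints its courses across the full page width or in multiple columns, the extraction step has to end up filling the same `courses` list.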

So far the Abbyy FlexiCapture 9.x software has been able to handle most of these issues out of the box. One of its most powerful features, I am finding, is the scripting support for writing rule scripts, custom scripts, and export scripts that can correct OCR issues and assist the Verification Operator, improving efficiency and throughput.

The scripts for rules, custom scripts, or export can be written in VBScript or JScript. There is some documentation on the Abbyy classes and objects, but not a whole lot. Most of what I have done has been through trial and error or, in specific cases, from examples provided by Tech Support. However, the scripts that have been developed work well for correcting OCR issues and providing automated checks of extracted field data. Through custom scripts there is even the option to run a database lookup on extracted data and return other fields from the database, to assist in providing a complete set of validated information.
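To illustrate the kind of logic such a rule carries, here is a minimal sketch written in Python rather than in the product's own scripting languages. It does not use any Abbyy classes or objects. The character substitutions, the credit range, and the function names are assumptions chosen for the example.

```python
# Letters that OCR commonly returns in place of digits (assumed mapping).
OCR_DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"}
)


def clean_numeric(raw):
    """Trim a numeric field and replace look-alike letters with digits."""
    return raw.strip().translate(OCR_DIGIT_FIXES)


def check_credits(raw, low=0.0, high=12.0):
    """Return (value, None) if the field passes, else (None, message).

    The low/high limits are hypothetical; a real rule would use the
    range that makes sense for the college in question.
    """
    cleaned = clean_numeric(raw)
    try:
        value = float(cleaned)
    except ValueError:
        return None, "not a number: %r" % raw
    if not low <= value <= high:
        return None, "out of range: %s" % value
    return value, None
```

A field that fails the check would be flagged for the Verification Operator instead of being silently exported.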

This has been a learning experience, but it is proving to be well worth the effort in getting the data off the paper and into the system used to evaluate a student for enrollment, cutting down on the man-hours that manual data entry used to require.

Making TeleForm and LiquidOffice Work Together

Scanning and capturing data via OCR can save lots of time over manual indexing. Linking these same forms and metadata to an eForm workflow process takes it all to the next level of efficiency – processes that took hours and days can be reduced to minutes or seconds. Using out-of-the-box connectors and settings, form templates that were created in Cardiff TeleForm can be exported to Cardiff LiquidOffice. With a given form residing in both LiquidOffice and TeleForm, capture choices are broadened and the benefits of workflow are within easy reach.

Here’s how it is done.  TeleForm forms can be saved in a file exchange format that LiquidOffice Designer can read.  Once a TeleForm form is complete and tested, it is ready to go.  The form can be a traditional form that was created from scratch in TeleForm Designer, or it can be an ‘existing form’ created initially outside of TeleForm.  These forms then have TeleForm fields placed in appropriate data entry locations.

Once exported to the file exchange format, the file can be opened in LiquidOffice Designer.  It will look just like the TeleForm form.  An important step here is to save the file to the native LiquidOffice .XFM file format, then close LiquidOffice Designer.  Reopen LiquidOffice Designer and open the .XFM file – this quick step ensures the form can be previewed as an HTML or PDF form, or published to the LiquidOffice server, without error.  The fields in LiquidOffice retain important settings that were created in TeleForm such as field name, maximum length, and valid entry characters. If the form is a traditional TeleForm form, ID and reference marks can be optionally retained.  (This allows a LiquidOffice user to fill a form, print it, and fax or scan it in TeleForm.)

Now that we have identical forms in TeleForm and LiquidOffice, TeleForm can be configured to export metadata to LiquidOffice. Open the TeleForm form in Designer and add the included LiquidOffice Export connect agent. When configuring this connect agent, you supply the LiquidOffice form GUID or workflow process GUID. The GUID uniquely identifies the form or process, and can be located via the LiquidOffice Management Console. You also identify the LiquidOffice Virtual Submit Directory and routing information. Since a TeleForm form can have many export agents attached to it, any existing exports may also be retained – they will coexist without complaint.
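Before typing these values into the connect agent, it can help to sanity-check them. The sketch below is a standalone check, not part of TeleForm or LiquidOffice; the `check_export_settings` function and its messages are hypothetical. It only confirms that the GUID is well formed and that the submit directory exists on the machine running the check.

```python
import os
import uuid


def check_export_settings(guid_text, submit_dir):
    """Return a list of problems found; an empty list means both values look usable.

    Hypothetical pre-flight check, not a TeleForm or LiquidOffice API.
    """
    problems = []
    try:
        # uuid.UUID accepts the usual forms, with or without braces.
        uuid.UUID(guid_text)
    except ValueError:
        problems.append("GUID is not well formed: %r" % guid_text)
    if not os.path.isdir(submit_dir):
        problems.append("submit directory not found: %r" % submit_dir)
    return problems
```

A well-formed GUID can still point at the wrong form or process, so the value should still be compared against the one shown in the LiquidOffice Management Console.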

When the form is processed in TeleForm – scanned, faxed, or even created using TeleForm Verifier’s NonForm Data Entry feature – the form will appear in the appropriate inbox or queue in LiquidOffice, filled and ready for approval, ad-hoc routing, or workflow processing. Now that the form exists in LiquidOffice, you may also direct users to fill in the form directly in LiquidOffice when appropriate, bypassing TeleForm altogether. The form has now grown wings and is ready to fly.