The Case for 11g

March 28, 2011

Oracle Fusion Middleware 11g

Welcome to the Jungle

A Look Back

Oracle IPM 11g is the latest version of the venerable Image and Process Management product, but the product has a long history. IPM was developed by Optika in the late 1990s under the name eMedia, as a workflow-enabled replacement for an imaging solution named FilePower. The eMedia brand was phased out at version 2.0 and replaced with the Acorde name. We still have some clients who are successfully running Acorde installations to this day.

Optika was bought by a company called Stellent, and the product went through another rebranding, this time as Imaging and Business Process Management (IBPM). At this point the version number jumped from Acorde 4.0 to Stellent IBPM 7.5 to bring the product in line with Stellent’s overall product versioning.

Finally, Oracle bought Stellent and brought the Stellent Content Server and IBPM products under its umbrella. Content Server became Universal Content Management, and IBPM became Oracle Image and Process Management 10gR3.

So we’ve finally arrived at IPM. A key thing to remember is that during this period of development and rebranding, the product remained essentially the same. It operated on the same principles and was architected in the same manner. That is not to say there weren’t improvements between eMedia 1.0 and IPM 10gR3, but those improvements reflected the natural evolution of the product.

This is true no longer; IPM 11g has changed the game.

In KTM there is a nifty feature to search the entire document for a date field. It recognizes all the dates on the form, and with some other snazzy logic you can find the date you are looking for. If it is near the word “received”, then you probably have a receive date. Easy, right?

Okay, sometimes dates get a little trickier. “3/17/2011”, “3-17-2011” and “MAR 17, 2011” are all valid date strings, and any of those formats could appear on your document. KTM can also search for the string “MAR” and replace it with a “3” when looking for dates; you use this in your locator’s regular expression. You can set up your own dictionary of months to look for “March” or “Mar” (or “Marzo” if you need internationalization).

Here’s the gotcha. I recently found text in an OCR’d document like this: “19 NOV2008”. It’s a bit of an odd string: the OCR engine didn’t think there was enough space between “NOV” and “2008” to put an actual space character in the OCR’d text. So I can read it, but KTM can’t. The dictionary search for the string “NOV” fails because it only looks at whole words, those with whitespace on either side. Unfortunately, there is no option in the KTM dictionary setup to change this.
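The gotcha is easy to reproduce with any whole-word lookup. The Python sketch below is not KTM’s implementation; the `MONTHS` table and `normalize_months` function are hypothetical stand-ins for a month dictionary, with `\b` word boundaries modeling the whole-word rule and case-insensitive matching assumed.

```python
import re

# Hypothetical month dictionary: each spelling maps to a month number.
MONTHS = {
    "mar": 3, "march": 3, "marzo": 3,
    "nov": 11, "november": 11, "noviembre": 11,
}

# \b on both sides means an entry only matches as a whole word,
# which models the whole-word behavior of the dictionary lookup.
WHOLE_WORD = re.compile(
    r"\b(" + "|".join(sorted(MONTHS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def normalize_months(text: str) -> str:
    """Replace whole-word month names with their month numbers."""
    return WHOLE_WORD.sub(lambda m: str(MONTHS[m.group(1).lower()]), text)


for sample in ("MAR 17, 2011", "19 NOV 2008", "19 NOV2008"):
    print(repr(sample), "->", repr(normalize_months(sample)))
```

`MAR 17, 2011` becomes `3 17, 2011`, but `19 NOV2008` comes back unchanged because there is no word boundary between `NOV` and `2008`.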

Here’s the fix. Modify the default KTM regular expression from this:

[0-3]?\d§English_Months_Abr§([12]\d{3}|\d{2})

to this:

[0-3]?\d\s*(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\s*([12]\d{3}|\d{2})

The space characters around the month are now optional in your regular expression search. You are no longer using the month dictionary, but that’s okay: this logic only locates the date, it does not translate the string into a date when found.
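If you want to check the revised expression outside of KTM, the Python sketch below compares it with an approximation of the dictionary version. Two assumptions: the whole-word dictionary lookup is modeled as required whitespace (`\s+`) around the month, and matching is case-insensitive (Python’s `re.IGNORECASE`; check how your own locator handles case). Both patterns end with the year group from the original expression, `([12]\d{3}|\d{2})`.

```python
import re

MONTHS = "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"
YEAR = r"([12]\d{3}|\d{2})"  # year group from the original KTM pattern

# Approximates the dictionary-based pattern (an assumption): whitespace
# is required around the month, as with a whole-word dictionary match.
strict = re.compile(rf"[0-3]?\d\s+({MONTHS})\s+{YEAR}", re.IGNORECASE)

# Revised pattern: \s* makes the whitespace around the month optional.
relaxed = re.compile(rf"[0-3]?\d\s*({MONTHS})\s*{YEAR}", re.IGNORECASE)


def first_date(pattern, text):
    """Return the first date string the pattern locates, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


for text in ("Received 19 NOV 2008", "Received 19 NOV2008"):
    print(repr(text), "strict:", first_date(strict, text),
          "relaxed:", first_date(relaxed, text))
```

With these inputs, the strict pattern finds nothing in `Received 19 NOV2008`, while the relaxed one returns `19 NOV2008`.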

Problem solved.

I have been working with Kofax Capture for over ten years now. To prove it, let me describe the configuration of one of my first installs. I remember setting up a Bell and Howell 3338 scanner (you know, the one that required a cherry picker to get out of the box and onto the desk) with the Kofax KF board and Kofax Capture version 2.x. Ah yes, I look back fondly on the old days of deploying a scanner with the Kofax card and software. I know it has been out for a while now, but I recently started working with version 9 of Kofax Capture, and I am pleased to say that Kofax has finally addressed some of the gotchas that have been plaguing us for years.

For starters, they made client deployment far easier by providing an MSI package. I can’t tell you how many conversations I have had with client admins that go like this:

Me: No, we don’t have an SMS or other deployment package you can use, but you can make your own.

Client Admin: (Furrows brow) Huh?

I am much happier now that those conversations will be a little less embarrassing. The workstations can now be deployed using Microsoft SMS, Group Policy, IBM Tivoli, Symantec Altiris, HP OpenView, or whatever deployment suite you use. Kofax has only tested SMS, but since it is a standard MSI package, it should work with any suite.

You can now have multiple instances of the Administration module open in an environment at the same time. Kofax finally figured out how to manage concurrent changes to its database, and I am glad it did. There are some caveats to this new functionality:

  • In a KCNS install the Administration module can only be opened at the Central Site.
  • Two users cannot be modifying the same object (Batch Class, Document Class, etc.) at the same time. This is a good thing, though.
  • If you deplore change, you can disable this new feature in the ACConfig.xml file.

You can now retroactively update a Batch Class. THIS IS BIG! If you have ever had to export 50 batches and re-import them, all because a checkbox was inadvertently checked in the Release Script setup, you know what I mean. Basically, you make your change, publish the Batch Class, and then update the existing batches in Batch Manager. There are some caveats to this functionality as well, but Kofax had to start somewhere. Here are some of the things you can’t add, remove, or update:

  • Queues
  • Form types
  • Folder classes
  • Batch fields
  • Document or folder index fields

Release Scripts are now known as Export Connectors. Sure, it is still Release.exe, but I am much happier with the Export Connector name than I ever was saying ‘Release Script’, especially when we are selling productized versions. In addition to the out-of-the-box Database and Text Export Connectors, there are now Email and Fax Export Connectors. The Email Connector works with Exchange and SMTP, while the Fax Connector works with Biscom, RightFax, and the Kofax Communication Server. Another nice touch is that Kofax includes the source code for all its Export Connectors, so we can tweak and modify them as needed.

One capability Kofax Capture has sorely lacked is batch workflow, and I am glad they finally added it. Nothing is out-of-the-box, but they give you the source code to a custom module called CMSplit that can split batches apart by Form Type. Think of the possibilities: with Kofax Capture you can capture documents at one site, split them into child batches by form type, and send those batches to different sites for processing. It is just a framework and will require custom coding, but it is a step in the right direction.
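At its core, splitting a batch by form type is a grouping step. The Python sketch below is not the Kofax API and not the CMSplit source; the `Document` type, the form type names, the `ROUTES` table, and `DEFAULT_SITE` are all hypothetical, and it only illustrates the split-and-route idea described above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: int
    form_type: str


# Hypothetical routing table: which site processes each form type.
ROUTES = {"Invoice": "Accounting", "Claim": "Claims"}
DEFAULT_SITE = "Central"  # hypothetical site for form types with no route


def split_batch(documents):
    """Split one batch into child batches keyed by form type.

    Returns {form_type: (destination_site, [documents])}.
    """
    children = defaultdict(list)
    for doc in documents:
        children[doc.form_type].append(doc)
    return {
        form_type: (ROUTES.get(form_type, DEFAULT_SITE), docs)
        for form_type, docs in children.items()
    }


batch = [Document(1, "Invoice"), Document(2, "Claim"),
         Document(3, "Invoice"), Document(4, "Memo")]
for form_type, (site, docs) in split_batch(batch).items():
    print(form_type, "->", site, [d.doc_id for d in docs])
```

Keeping a fallback site means an unexpected form type still lands somewhere visible instead of being dropped.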

There are a number of other enhancements in Kofax Capture 9, but the ones above are the ones that actually made me say ‘Yes’ when I found out about them. Some of the other new or enhanced features are:

  • OCR is now limited to the ABBYY engine, but you can request a patch for the old engine if functionality you relied on is no longer available.
  • Enhanced .NET support for Validation scripting.
  • Centralized scanner profiles have been improved.
  • More options for PDF generation.
  • Custom Modules can now be deployed centrally with the Kofax Capture Deployment Service. This is a must, considering any workflow will be done with a custom module.

If you are on an older version of Kofax Capture, I encourage you to look at what improvements you could make to your processes or management with an upgrade to version 9.

I was recently asked to help with a client’s KTM (Kofax Transformation Modules) project because they were not pleased with the percentage of valid and correct extracted fields. My first question was, “Are you using subclasses?” The answer was, “No.” Subclassifying your top forms is an easy way to greatly improve your extraction results.

What I mean is that instead of trying to use a single locator to find data on all of your documents with a “one size fits all” approach, you can use subclasses to first classify the document and then tune locators specific to that form to look in a precise location for the information. For example, let’s say you need to find a “Case Number” on all of your forms. Some forms might have the words “Case Number” above the text you need to extract. Others might have the word “Number” to the left of the data. Another might not have any text around the data to key off of at all. It’s difficult to add enough rules to one locator to catch all the possible scenarios. Furthermore, adding rules to help find data on one form can actually make results worse on another. Subclasses help by letting you create a specific locator that zeroes in on the information you are looking for.

How many subclasses are enough? I like to use the 80/20 rule: when you list all of your documents by volume, roughly 80% of your volume should come from 20% of your forms. I know there are exceptions to the rule, but this is a good place to start. On projects here at ImageSource we have subclassified the top 10, 20, or 50 forms. When forms are subclassified, extraction averages go way up because you can use locators like the Advanced Zone Locator on structured forms. This locator is very helpful because once you draw a box around the data, you can set it to run its own cleanup and OCR on that zone rather than relying on the original full-text OCR results. However, this is only really useful on subclassified forms, since there you know exactly where the data is on the page. Format Locators are also very helpful because you know how the data is structured on the form, and you can write a regular expression to look for it. This reduces the number of incorrect alternative results. For the forms that are not subclassified, you still need to create the miscellaneous locators, but the idea is that the majority of your documents are subclassified and coming through with very high extraction rates.
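To apply the 80/20 rule, sort your forms by volume and take them in order until they cover the target share. The Python sketch below shows that selection; the form names and counts are made up for illustration, and in practice you would pull the counts from your own production reports.

```python
def forms_to_subclass(volumes, target_percent=80):
    """Pick the highest-volume forms until they cover target_percent of volume.

    volumes: {form_name: document_count}
    Returns (selected_forms, covered_count, total_count).
    """
    total = sum(volumes.values())
    selected, covered = [], 0
    for form, count in sorted(volumes.items(), key=lambda kv: kv[1], reverse=True):
        # Integer comparison avoids floating-point rounding at the threshold.
        if covered * 100 >= target_percent * total:
            break  # target reached; remaining forms use the general locators
        selected.append(form)
        covered += count
    return selected, covered, total


# Hypothetical monthly volumes per form.
volumes = {"Form A": 5000, "Form B": 2500, "Form C": 900,
           "Form D": 300, "Form E": 200, "Form F": 100}
selected, covered, total = forms_to_subclass(volumes)
print(selected, f"{covered / total:.0%} of volume")
```

With this sample data, two of the six forms cover 7,500 of 9,000 documents (about 83%), so those two would be the first candidates for subclasses.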

The other nice thing about KTM is that you can define locators at the parent class, and each subclass will inherit them unless you specifically change them. This is helpful, for example, for fields that use a Database Locator with a fuzzy lookup that applies to all forms, where you don’t want to create a specific locator for every subclass. In addition, you can still use the incredible training power that KTM provides: when you use specific learning, the training is applied to the particular subclass. I have found that with KTM there are many different ways to “skin a cat,” and this is just one method that can dramatically improve your extraction results.
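Locator inheritance behaves much like a layered lookup: a subclass checks its own locators first and falls back to the parent class for anything it doesn’t override. The Python sketch below models that with `collections.ChainMap`; it is an analogy, not KTM’s internal design, and the field names and locator descriptions are hypothetical.

```python
from collections import ChainMap

# Parent class locators that every subclass inherits (hypothetical).
parent_locators = {
    "CaseNumber": "Format Locator (generic case-number regex)",
    "ClientName": "Database Locator (fuzzy lookup)",
}

# Form A overrides only CaseNumber; Form B overrides nothing.
form_a = ChainMap({"CaseNumber": "Advanced Zone Locator (zone drawn on Form A)"},
                  parent_locators)
form_b = ChainMap({}, parent_locators)

for name, subclass in [("Form A", form_a), ("Form B", form_b)]:
    for field in ("CaseNumber", "ClientName"):
        print(name, field, "->", subclass[field])
```

Form A gets its zone locator for the case number but still shares the parent’s fuzzy database lookup, and overriding a field in one subclass leaves the parent and every other subclass untouched.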

Brandon Konen
Systems Engineer
ImageSource, Inc.
