From time to time I receive questions about large file uploads with ILINX Capture. ILINX Capture can upload files of any size. The limitation lies within Internet Information Services (IIS) and/or the amount of memory installed in the web server. This is true not only for ILINX Capture but for any ASP or ASP.NET application.

Depending on the architecture of the ASP or ASP.NET application, files uploaded to the web server are typically streamed into the web server’s memory during the upload process before being written to disk. The number of users concurrently uploading files and the size of those files determine how much physical memory should be installed in the server. By default, IIS limits an ASP request, and therefore a single file upload, to 200 KB. This limit can be increased, but not any higher than necessary, or you risk overconsuming the web server’s memory.
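As a rough illustration of that sizing rule, the worst case is every concurrent upload being buffered in memory at the same time. The Python sketch below multiplies the two figures. The `overhead_factor` parameter and the example numbers are assumptions chosen for illustration. They are not measured ILINX Capture or IIS values.

```python
MB = 1024 * 1024  # bytes per megabyte

def estimate_upload_memory_bytes(concurrent_uploads: int,
                                 max_file_bytes: int,
                                 overhead_factor: float = 1.0) -> int:
    """Worst-case RAM needed if every concurrent upload is fully buffered.

    overhead_factor is a hypothetical allowance for per-request overhead;
    1.0 means "count the file bytes only".
    """
    if concurrent_uploads < 0 or max_file_bytes < 0:
        raise ValueError("counts and sizes must be non-negative")
    return int(concurrent_uploads * max_file_bytes * overhead_factor)

# Example: 25 users each uploading a 100 MB file at the same moment.
needed = estimate_upload_memory_bytes(25, 100 * MB)
print(f"{needed // MB} MB")  # 2500 MB
```

Real servers rarely hit the exact worst case, but the estimate shows why the upload limit should not be set higher than the largest file you actually expect.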

Configuring File Upload Size in IIS 6

1. Open Internet Information Services Manager by clicking the Windows Start Menu and Run. Type inetmgr and click OK.

2. Once IIS Manager opens, navigate the tree, right-click the server name, and click Properties.

3. From the server properties window check the Enable Direct Metabase Edit checkbox and click OK.

4. Browse to the C:\windows\system32\inetsrv directory and edit the Metabase.xml file with a text editor such as Notepad.

5. Search for the attribute AspMaxRequestEntityAllowed and edit the value to the size in bytes that you want to allow for a maximum upload size. Save and close the Metabase.xml file.

AspMaxRequestEntityAllowed="204800"

6. Open the Registry editor and navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSSOAP\30\SOAPISAP.

7. Modify the MaxPostSize key. Set the decimal value to the maximum upload size in bytes and click OK.

8. Reboot the web server to ensure the changes have taken effect.
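Steps 4 and 5 above edit Metabase.xml by hand. If you script the change instead, a minimal Python sketch could look like the one below. It assumes you have already read the file into a string, that Direct Metabase Edit is enabled (step 3), and that you write the result back to a backed-up copy yourself. The function name is hypothetical, and the sample element is a trimmed-down fragment rather than a complete metabase entry.

```python
import re

# Matches the attribute as it appears in Metabase.xml, e.g.
# AspMaxRequestEntityAllowed="204800"
_ATTR_RE = re.compile(r'(AspMaxRequestEntityAllowed\s*=\s*")(\d+)(")')

def set_asp_max_request_entity(metabase_text: str, limit_bytes: int) -> str:
    """Return metabase_text with every AspMaxRequestEntityAllowed set to limit_bytes."""
    if limit_bytes <= 0:
        raise ValueError("limit must be a positive number of bytes")
    # Keep the attribute name and quotes (groups 1 and 3); swap only the number.
    new_text, count = _ATTR_RE.subn(rf"\g<1>{limit_bytes}\g<3>", metabase_text)
    if count == 0:
        raise ValueError("AspMaxRequestEntityAllowed not found")
    return new_text

sample = '<IIsWebService AspMaxRequestEntityAllowed="204800" />'
print(set_asp_max_request_entity(sample, 100 * 1024 * 1024))
```

The attribute can appear on more than one node, so the sketch replaces every occurrence. Narrow the pattern if you only want to raise the limit for one site.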

Configuring File Upload Size in IIS 7

1. Open Internet Information Services Manager by clicking the Windows Start Menu and Run. Type inetmgr and click OK.

2. Navigate the tree to the Virtual Directory on which you would like to enable large file uploads.

3. In the Features View pane double click ASP.

4. In the ASP settings pane, expand Limits Properties and edit the Maximum Requesting Entity Body Limit and Response Buffering Limit values. Set these to the maximum file upload size in bytes and click Apply.

5. Open the Windows Command Prompt and enter the following command, changing the maxAllowedContentLength value to your maximum file upload size in bytes. Press Enter to execute the command.

C:\Windows\System32\inetsrv\appcmd set config "Default Web Site" -section:requestFiltering -requestLimits.maxAllowedContentLength:104857600

6. Open the Registry editor and navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSSOAP\30\SOAPISAP.

7. Modify the MaxPostSize key. Set the decimal value to the maximum upload size in bytes and click OK.

8. Reboot the web server to ensure the changes have taken effect.
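The appcmd command from the IIS 7 steps can also be driven from a script. The Python sketch below builds the same arguments for a chosen site and byte limit. Passing the site name as a separate list element lets `subprocess` handle the quoting, which avoids problems like typographic quotes sneaking into "Default Web Site". Two points are assumptions here: the full appcmd.exe path, and the 4,294,967,295-byte ceiling, which reflects our understanding that maxAllowedContentLength is a 32-bit unsigned value. The helper names are hypothetical.

```python
import subprocess

# Default appcmd location on IIS 7; adjust if Windows is installed elsewhere.
APPCMD = r"C:\Windows\System32\inetsrv\appcmd.exe"
MAX_UINT32 = 0xFFFFFFFF  # assumed upper bound for maxAllowedContentLength

def request_limit_command(site: str, limit_bytes: int) -> list[str]:
    """Build the appcmd argument list used in the IIS 7 steps."""
    if not 0 < limit_bytes <= MAX_UINT32:
        raise ValueError("limit must be between 1 and 4294967295 bytes")
    return [
        APPCMD, "set", "config", site,
        "-section:requestFiltering",
        f"-requestLimits.maxAllowedContentLength:{limit_bytes}",
    ]

def apply_request_limit(site: str, limit_bytes: int) -> None:
    """Run appcmd; only meaningful on the web server, from an elevated session."""
    subprocess.run(request_limit_command(site, limit_bytes), check=True)

cmd = request_limit_command("Default Web Site", 100 * 1024 * 1024)
print(cmd[-1])  # -requestLimits.maxAllowedContentLength:104857600
```

Building the list separately from running it makes it easy to review, or log, the exact command before it touches the server.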

Bryan Wilhelm
Senior Systems Engineer
ImageSource, Inc.

The feature set in ILINX Capture is vast, and it can be a drag reviewing and interpreting feature lists in software documentation. Those of you not familiar with ILINX Capture can visit www.ilinxcapture.com, or feel free to leave a comment and we can provide additional information and/or a hands-on demonstration. In short, ILINX Capture is a web-based capture platform that excels in distributed capture and custom capture workflow environments. It can scale from a single workstation up to an enterprise-wide global standard for capture in your organization.

I wanted to use this post to touch on a couple of the features that I see being used more and more in ILINX Capture. These features became part of the product based on customer feedback, industry direction, and internal vision for the product. All of the following features can be added at any point in your process flow map, so they provide not only the functionality but also the flexibility to adapt to the business needs of the processes you have in place today.

  1. 2D Barcode Support – This feature adds the ability to read metadata, classify and separate documents, and provide quality control checks through the recognition of 2D barcodes. Through a GUI, the user can parse the barcode data and map it to fields, separate and identify the type of document, and validate that the number of pages in the document matches what was captured through the scanning or electronic import process.
  2. Web Service Integration  – This feature provides ILINX Capture with the ability to integrate with any existing web service.  Most commonly, we see this used to perform database lookups or validations against existing line of business systems.  Another way this is being utilized is to interact with different organization processes, for example, you can create a support ticket in an organization’s support system every time a process exception occurs in their fully automated capture workflow.
  3. Queue Thresholds & Triggers – Work queues in ILINX Capture are areas where human interaction is required to process data or documents through the workflow. The thresholds and triggers provide the ability to monitor the batches or documents in a queue and execute a function when a threshold or trigger is met. This is useful for monitoring escalations or the processing of high-priority documents. For example, if a fax comes into the system for an auto loan or stock trade, in most cases this is a time-sensitive process that needs to move rapidly through the workflow. Between the notification features and the thresholds/triggers, ILINX Capture can ensure that 1) a user is notified that there is high-priority work to process, 2) the documents are processed within a defined time frame, and 3) if the documents are not processed, the system can notify a manager or route the documents to another user group.
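Item 1 above describes parsing 2D barcode data into fields and checking the page count. ILINX Capture does this through its GUI. The plain-Python sketch below only illustrates the idea. The `KEY=VALUE;KEY=VALUE` payload layout, the `PAGES` key, and the function names are all hypothetical, because real barcode payloads are defined by whoever prints the form.

```python
def parse_barcode(payload: str, sep: str = ";") -> dict[str, str]:
    """Split a hypothetical 'KEY=VALUE;KEY=VALUE' payload into a field dictionary."""
    fields: dict[str, str] = {}
    for pair in payload.split(sep):
        key, has_eq, value = pair.partition("=")
        if has_eq and key.strip():
            fields[key.strip()] = value.strip()
    return fields

def page_count_ok(fields: dict[str, str], captured_pages: int) -> bool:
    """True when the page count encoded in the barcode matches what was captured."""
    try:
        return int(fields["PAGES"]) == captured_pages
    except (KeyError, ValueError):
        return False  # missing or malformed count: treat as a QC exception

doc = parse_barcode("DOCTYPE=LOAN_APP;ACCOUNT=12345;PAGES=3")
print(doc["DOCTYPE"], page_count_ok(doc, 3))  # LOAN_APP True
```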

These are just a few of the features that have been added to extend the functionality of this product.  Stay tuned to this blog for additional information on other features that help shape this product to provide value to its customer community. 
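Item 3 in the list above, queue thresholds and triggers, boils down to comparing how long high-priority work has waited against configured limits. The Python sketch below shows that logic in isolation. It is not the ILINX Capture configuration model. The `QueuedBatch` class, the two thresholds, and the "notify"/"escalate" actions are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class QueuedBatch:
    name: str
    high_priority: bool
    queued_at: datetime

def evaluate_queue(batches, now, notify_after, escalate_after):
    """Return (action, batch name) pairs for high-priority batches past a threshold."""
    actions = []
    for batch in batches:
        if not batch.high_priority:
            continue  # routine work is left alone in this sketch
        age = now - batch.queued_at
        if age >= escalate_after:
            actions.append(("escalate", batch.name))  # e.g. alert a manager or reroute
        elif age >= notify_after:
            actions.append(("notify", batch.name))    # remind the assigned user
    return actions

now = datetime(2011, 3, 1, 9, 0)
queue = [
    QueuedBatch("auto-loan-fax", True, now - timedelta(minutes=5)),
    QueuedBatch("stock-trade-fax", True, now - timedelta(minutes=45)),
    QueuedBatch("routine-mail", False, now - timedelta(hours=2)),
]
print(evaluate_queue(queue, now, timedelta(0), timedelta(minutes=30)))
# [('notify', 'auto-loan-fax'), ('escalate', 'stock-trade-fax')]
```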

Ryan Keller
ImageSource, Inc.

When researching Enterprise Content Management capture projects, the question of handwriting recognition comes up again and again — and many people aren’t sure what to expect.  More commonly, their expectations are unrealistic. They think there is no hope at all, ever. On the other end of the spectrum, some think that tiny fevered cursive scribblings from a rushed meeting can be scanned (or even faxed) and read with accuracy. In helping people think about their forms and the viability of capturing handwriting, I have a few simple guidelines to consider which seem to apply in a majority of cases.

  • Are handwritten forms really the only option? If the form is available online, can the data be made “fillable” and then submitted directly to your database tables? Can you let the user fill in the form online and print it, thus producing machine print and eliminating handwriting? How about taking the data that a user entered and barcoding it (if the form must be printed rather than submitted)? Also helpful and sometimes overlooked: prefilling form data from your database through a merge process, with a barcode index for retrieval of that same data.
  • Does your Capture software support ICR?  Intelligent Character Recognition (ICR) is what you need to read handwriting.  Optical Character Recognition (OCR) is much more common and is designed to read machine print.  Please don’t try to make it read handwriting – you won’t like the results!
  • Make sure the handwriting is constrained. Annoying? Perhaps. But making the person filling the form write in boxes sets you up for the most successful ICR results. The catch phrase here could be “Curse the cursive”. When a character is joined to another character it is faster to write. However, the ICR software really struggles to figure out where one character starts and another stops, and that is where recognition tanks. With neatly constrained print on real-world forms, we can generally expect 100% recognition.

  • Ask for all caps handwriting. You can often tell your ICR engine to look for upper case characters only. This really increases accuracy. And when the form filler forgets to write AS IF SHOUTING, you can often get OK results anyway.
  • Show them how! I know it may seem condescending, but consider this a helpful reminder to those who would blow through the blocks in a mad dash. Show users an example of the way to write in constrained print fields.  And here’s where you can tell them to use all-caps, and show it in your example.

  • Use key index values and database lookups! If there is an employee number, unique phone number, SSN/TaxID, or other unique ID for the person filling the form, use it whenever you can. Then perform a database lookup to confirm identity and optionally populate any other fields that you may need that happen to exist already in your database.
  • Less is More. People burn out on filling lengthy forms using constrained print fields. Try to minimize the amount they need to write, and careless handwriting will decrease.
  • Comb fields can work too. If you think all those constrained print boxes are just too hideous looking, try using comb fields instead. But remember, as soon as people ignore the combs and write cursively or sloppily, ICR results plummet.

  • Use Drop Out Colors for the boxes. If your scanner and ICR software support color dropout technology, you make the ICR engine’s job easier. The boxes aren’t recognized by the scanner, but the handwriting is. So now the constrained print box lines (which make sure each handwritten character is isolated in a target area) don’t have to be considered during ICR.
  • Use OMR bubbles if you really, really need perfect index values from handwriting. Remember filling in page one of a standardized test? This painful process might be worth it. This is called Optical Mark Recognition (OMR). Since the engine just needs to confirm whether a bubble is filled or not, this is easier and more accurate than OCR or ICR.

  • Faxing? Well, OK. But recognition levels will go down.
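The key-index tip above pairs well with the all-caps tip. Once ICR returns a value, you can normalize it and confirm it against your database before trusting any other field. The Python sketch below uses an in-memory SQLite table to stand in for your line-of-business database. The table, column names, and the allowed character set are hypothetical, and the upper-casing is a post-recognition check, not the ICR engine's own upper-case-only setting.

```python
import sqlite3

ALLOWED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")

def clean_id_field(icr_text: str):
    """Upper-case an ICR-read ID and reject it if it holds unexpected characters."""
    value = icr_text.strip().upper()
    return value if value and set(value) <= ALLOWED else None

def lookup_person(conn, employee_id):
    """Confirm identity via the unique ID and pull back fields we already know."""
    row = conn.execute(
        "SELECT name, department FROM employees WHERE employee_id = ?",
        (employee_id,),
    ).fetchone()
    return None if row is None else {"name": row[0], "department": row[1]}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id TEXT PRIMARY KEY, name TEXT, department TEXT)")
conn.execute("INSERT INTO employees VALUES ('E1001', 'JANE DOE', 'CLAIMS')")

emp_id = clean_id_field(" e1001 ")
print(lookup_person(conn, emp_id))  # {'name': 'JANE DOE', 'department': 'CLAIMS'}
```

A failed lookup is a good signal to route the form to a person for review instead of guessing.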

With these hints in mind, you can look forward to results that are perhaps short of miraculous, that is, less accurate than OCR. Even so, the results are worthwhile and produce great time savings when properly implemented. There are more tricks to describe, which I may save for a later blog post. Please contact ImageSource if you have any questions about capturing handwriting in forms.

First off, ABBYY means “keen eye”, an apt name for a product that dynamically and automatically captures and processes widely disparate documents. Powerful document recognition separates and classifies docs, and state-of-the-art optical character recognition rips the data from the images. I like the motto that pops up on screen: “take the data, leave the paper”. I love doing just that, sending paper briskly off to start its next recycled life. It’s the greenest thing to do, especially when compared to filling endless cabinets and long-term off-site storage facilities.

When you want to recommend, sell, support, and solve major customer problems with ECM software at ImageSource, due diligence mandates a thorough feature review and testing.  I’ll describe some of the steps I was involved with in this process for ABBYY FlexiCapture – but mine is but a single slice of the vet team pie.  Development teams and other engineering teams performed specific examinations to answer questions about integration, APIs, and more narrow capabilities to solve unique problems faced by eager customers.  Also, ImageSource staff with a variety of titles took a week-long training course with intensive labs.  Unfortunately I missed the class but was given the opportunity to spin up for a pre-sales demo last year, which was a lot of fun.

So here’s a peek at our process:

Laptop Install

First things first!  I like to be able to run new software on my laptop whenever possible.  This frees me from all bandwidth and location constraints.  I can easily focus on the vet effort on a plane, down by the river, wherever and whenever.  ABBYY FlexiCapture has a convenient ‘Standalone Installation’ which gives you access to all the key components on one box.

Obtain Sample Images from Client

In this case we gathered dozens of hardcopy invoices from a large international corporation.  The images were not pretty and included originals, copies, printed faxes, you name it.

Ascertain Server Needs

After reviewing the ABBYY documentation we set the requirements for our labs – memory per server, disk space, software required, scan station requirements, scanner requirements, and required operating systems.

Spin Up VMs

Thanks to Mike Peterson we had three servers up in no time.

Convening the Team, Locking Down the ‘War Room’

Gene Eckhardt, Jeff Doyle and I met in our Olympia office for a week. Gene secured the war room, where we periodically met with developers, project managers, engineers, and principals. Most of the time it was the three of us banging away.

Lab Software Install

Next we installed ILINX Capture on one server, ABBYY’s ‘Distributed Installation’ on another, and SQL Server on the last. This architecture mimics what we encounter in the field. Besides, the standalone install wouldn’t cut it, as it doesn’t scale and uses SQL Express as its support database. As installed, we can easily add more servers for high-volume stress testing. By running a WebEx session all week, we were able to record every moment of each day’s work, easily pass the focus from machine to machine, and give remote colleagues a view of what we were doing. We involved ABBYY tech support when we had a question and felt we could speed up an installation process. Turns out we could, and it was great to have the technician join our session without delay and see what was up. Also, as we installed we meticulously kept a running log of any issues we encountered, however minor. At the end of each day Gene led a review session where we discussed and polished the invaluable ‘Lessons’ doc.

End-To-End Test

This was our ‘Hello World’ moment – we set up communication between ILINX Capture and ABBYY, and created an appropriate ILINX Capture workflow.  Then we created a simple FlexiLayout, exported it, imported it into FlexiCapture, and created a document definition and an export.  We configured the scanner and the scan station and established we had end-to-end connectivity.

Building Generic FlexiLayouts

One of the many goals of our week was to share baseline knowledge as well as advanced techniques for capturing documents. We picked two forms that were relatively easy to identify and made up a large share of the total paper volume. In short order we had FlexiLayouts and document definitions configured. Then it was time to tweak and refine. The ability to chain elements together worked outstandingly: find a keyword, then find the nearest ZIP code with the help of regular expressions. Then, using out-of-the-box settings, we could find the state, city, address, and addressee. Wow, powerful.
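The "find a keyword, then the nearest ZIP code" chain above is built inside FlexiLayout Studio, not in Python. As a rough plain-Python analogue, anchoring the regular expression search at the keyword is what keeps an unrelated five-digit number earlier on the page from being picked up. The invoice text and the function name are made up for illustration.

```python
import re
from typing import Optional

ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")  # US ZIP or ZIP+4

def zip_after_keyword(text: str, keyword: str) -> Optional[str]:
    """Find a keyword, then return the first ZIP code that follows it."""
    start = text.find(keyword)
    if start == -1:
        return None
    # Begin the regex search just past the keyword rather than at the top of the page.
    match = ZIP_RE.search(text, start + len(keyword))
    return match.group(0) if match else None

invoice = "Invoice 48213\nRemit To:\nACME Supply Co.\n123 Main St\nOlympia, WA 98501\n"
print(zip_after_keyword(invoice, "Remit To:"))  # 98501
```

Searching the whole page without the keyword anchor would return the invoice number 48213 instead, which is exactly the kind of mistake chaining avoids.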

Building an Uber FlexiLayout

Now it was time to roll up our sleeves and build a smarter FlexiLayout that could capture invoices from a variety of sources. We used advanced features such as FlexiLayout alternatives, element groups, object collection elements, and other settings to start recognizing semi-structured forms from a wide variety of sources. Then we added a little FlexiLayout language code to help us “crawl” around the identified forms to find dates and monetary amounts that could sometimes be below a keyword, sometimes to its right, and so on. We didn’t need to script any validation rules for our purposes, but I showed some script I had created prior to our meeting. A quick unit test showed great results: we had now stepped away from a model where each form needed its own FlexiLayout.
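The "crawl" described above was written in the FlexiLayout language. The Python sketch below is a stand-in that shows the same search order on a list of OCR words with bounding boxes: look right on the keyword's line first, then look below it. The `Word` class, the coordinates, and the amount pattern are hypothetical. They are not FlexiCapture objects.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Word:
    """One OCR word with its bounding box in page pixels (top-left origin)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int

AMOUNT_RE = re.compile(r"^\$?\d{1,3}(?:,\d{3})*\.\d{2}$")

def find_value_near(words: list[Word], keyword: str, pattern: re.Pattern) -> Optional[str]:
    """Look to the right of a keyword on the same line, then below it."""
    anchor = next((w for w in words if w.text == keyword), None)
    if anchor is None:
        return None
    candidates = [w for w in words if w is not anchor and pattern.match(w.text)]
    # 1) Same line: overlaps the keyword vertically and starts to its right.
    right = [w for w in candidates
             if w.left > anchor.right and w.top < anchor.bottom and w.bottom > anchor.top]
    if right:
        return min(right, key=lambda w: w.left).text
    # 2) Below: starts under the keyword and overlaps it horizontally.
    below = [w for w in candidates
             if w.top >= anchor.bottom and w.left < anchor.right and w.right > anchor.left]
    if below:
        return min(below, key=lambda w: w.top).text
    return None

words = [
    Word("Total:", 100, 500, 150, 515), Word("$1,234.56", 170, 500, 240, 515),
    Word("Amount", 400, 100, 460, 115), Word("98.00", 405, 125, 450, 140),
]
print(find_value_near(words, "Total:", AMOUNT_RE))  # $1,234.56
print(find_value_near(words, "Amount", AMOUNT_RE))  # 98.00
```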

Running Recognition Tests

We traded our lab coats for testing hazmat suits and ran many batches of documents, both ones we had used in development and ones we had never looked at before.

Recording Results

While never a thrill, this step benefited from a spreadsheet created by Jeff Martin, Gene Eckhardt and Brandon Konen that allowed easy entry of recognition results. It is known as our “Advanced Capture Analysis and Comparison Tool”, highly regarded in our ranks. The data was automatically crunched, allowing us to quickly establish baselines, compare our scan results with other products, share our results with coworkers and principals, etc.
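The core arithmetic behind a recognition baseline is a per-field accuracy rate: how often the extracted value matched the verified value. The Python sketch below is not our comparison spreadsheet. It is only a minimal illustration of that calculation, and the sample rows are made up.

```python
from collections import defaultdict

def field_accuracy(results):
    """results: iterable of (field, extracted_value, expected_value) tuples.

    Returns {field: fraction of documents where extraction matched the truth}.
    """
    counts = defaultdict(lambda: [0, 0])  # field -> [correct, total]
    for field, extracted, expected in results:
        counts[field][1] += 1
        if extracted == expected:
            counts[field][0] += 1
    return {field: correct / total for field, (correct, total) in counts.items()}

# Made-up rows: (field, what the engine returned, what a person verified)
rows = [
    ("InvoiceNumber", "48213", "48213"),
    ("InvoiceNumber", "4821", "48219"),
    ("Total", "1234.56", "1234.56"),
    ("Total", "98.00", "98.00"),
]
print(field_accuracy(rows))  # {'InvoiceNumber': 0.5, 'Total': 1.0}
```

Running the same rows through each product under test gives numbers that can be compared side by side.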

Lessons Learned Doc Revisited

It’s a privilege to be able to work with industry veterans such as Jeff Doyle and Gene Eckhardt on a project such as this. They brought years of experience with them to improve every process we covered. While evaluating the Lessons Learned doc, they were able to extrapolate possible impacts in environments and scenarios they have seen in the field. They also added fresh mitigation alternatives for working through the problems we encountered. Our Lessons Learned docs are part of a large and valuable knowledge base that ImageSource has added to year after year.

Findings and Conclusions Write-Up

After a demonstration to some coworkers who needed to ramp up on our configuration, we collaborated on a summary document, and here Gene took the lead. We were able to draw on the Lessons Learned doc, the Advanced Capture Analysis and Comparison Tool, and meeting notes to piece together our findings and quantify our conclusions. The summary outlined the scope of our efforts, including excluded activities, our environment and products tested, results, conclusions, general observations, and Best Practice recommendations.

It’s one thing to kick the tires on a car before purchase.  But a methodical, thorough and thoughtful approach is the norm for analogous software tasks at ImageSource.

I have been working with the Kofax Capture product for over ten years now. To prove that, let me tell you about the configuration on one of my first installs. I remember setting up a Bell and Howell 3338 scanner (you know, the one that required a cherry picker to get it out of the box and onto the desk) with the Kofax KF board and Kofax Capture version 2.x. Ah yes, I look back fondly on the old days of deploying a scanner with the Kofax card and software. I know it has been out for a while now, but I recently started working with version 9 of Kofax Capture, and I am pleased to say that they have finally addressed some of the Kofax gotchas that have been plaguing us for years.

For starters, they made client deployment 100% easier by creating an MSI package. I can’t tell you how many conversations I have had with client admins that go like this:

Me: No, we don’t have an SMS or other type of deployment package you can use, but you can make your own.

Client Admin: (Furrows brow) Huh?

I will be much happier when those conversations are a little less embarrassing. Now the workstations can be deployed using Microsoft SMS, Group Policy, IBM Tivoli, Symantec Altiris, HP OpenView, or whatever deployment suite you use. Kofax has only tested SMS, but with the MSI package it should work with any suite.

You are now allowed to have multiple instances of the Administration module in an environment at one time. They finally figured out how to manage their database and I am glad that they did. There are some caveats to this new functionality:

  • In a KCNS install the Administration module can only be opened at the Central Site.
  • Two users cannot be modifying the same object (Batch Class, Document Class, etc.) at the same time. This is a good thing, though.
  • If you deplore change, you can disable this new feature in the ACConfig.xml file.

You can now retroactively update a Batch Class. THIS IS BIG! If you have ever had to export out 50 batches and re-import them in all because a checkbox was inadvertently checked in the Release Script setup, you know what I mean. Basically you can make your change, publish the Batch Class, and then update the existing batches in Batch Manager. There are some caveats to this new functionality as well, but they had to start somewhere. Here are some of the things you can’t add/remove/update:

  • Queues
  • Form types
  • Folder classes
  • Batch fields
  • Document or folder index fields

Release Scripts are now known as Export Connectors. Sure it is still Release.exe, but I am much happier with the Export Connector name than I ever was saying ‘Release Script’.  This is especially true when we are selling productized versions. In addition to the out of the box Database and Text Export Connectors, we now have Email and Fax Export Connectors. The Email Connector works with Exchange and SMTP, while the Fax Connector works with Biscom, RightFax, and the Kofax Communication Server. Another nice thing that Kofax does is include the source code for all their Export Connectors. This way we can tweak and modify these things as needed.

Batch Workflow is one capability Kofax Capture has sorely lacked, and I am glad they have finally added it. Nothing is out of the box, but they give you the source code to a custom module called CMSplit that can split batches apart depending on the Form Type. Think of the possibilities. With Kofax Capture you can capture documents at one site, split them into child batches by form type, and send those batches to different sites for processing. It is just a framework and will require custom coding, but it is a step in the right direction.
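Conceptually, the split described above groups a parent batch's documents by form type and hands each group to a different site. The Python sketch below shows only that grouping and routing idea. It is not CMSplit's source and does not use the Kofax Capture API. The `Document` class, the `ROUTES` table, and the site names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str
    form_type: str

# Hypothetical routing table: which site processes which form type.
ROUTES = {"Invoice": "Accounts Payable", "Claim": "Claims Center"}

def split_by_form_type(documents):
    """Group a parent batch's documents into child batches keyed by form type."""
    children = defaultdict(list)
    for doc in documents:
        children[doc.form_type].append(doc)
    return dict(children)  # preserves first-seen order of form types

batch = [Document("d1", "Invoice"), Document("d2", "Claim"), Document("d3", "Invoice")]
for form, docs in split_by_form_type(batch).items():
    site = ROUTES.get(form, "Central Site")  # unknown types stay at the central site
    print(f"B001-{form}: {[d.doc_id for d in docs]} -> {site}")
# B001-Invoice: ['d1', 'd3'] -> Accounts Payable
# B001-Claim: ['d2'] -> Claims Center
```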

There are a number of other enhancements new to Kofax Capture with version 9, but above I discussed all the ones that actually made me say ‘Yes’ when I found out about them. Some of the other new or enhanced features are:

  • Limited their OCR engine to ABBYY only, but you can request a patch for the old engine if functionality you used is no longer available.
  • Enhanced .NET support for Validation scripting.
  • Centralized scanner profiles have been improved.
  • More options for PDF generation now.
  • Custom Modules can now be deployed centrally with the Kofax Capture Deployment Service. This is a must, considering any workflow will be done with a custom module.

If you are on an older version of Kofax Capture, I encourage you to look at what improvements you could make to your processes or management with an upgrade to version 9.

I recently was asked to help with a client’s KTM (Kofax Transformation Modules) project, because they were not pleased with the percentages of valid and/or correct extraction fields. My first question was, “Are you using subclasses?”  The answer was, “No.”  Subclassifying your top forms is an easy way to greatly improve your extraction results.

What I mean by that is instead of trying to use a single locator to find data from all of your documents with a “one size fits all” approach, you can use subclasses to first classify the document and then tune your locators specific to that form to look in a precise location for the information. For example, let’s say you need to find a “Case Number” off of all of your forms. Some forms might have the word “Case Number” above the text you need to extract. Others might have the word “Number” to the left of the data. Another might not have any text around the data to key off of at all. It’s difficult to add enough rules in one locator to catch all the possible scenarios. Furthermore, there are times when adding rules to help find data on one form will actually give you negative results from another. Subclasses can help by allowing you to create a specific locator to zero in on the information that you are looking for.
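In KTM, locators are configured in Project Builder rather than written in Python. Still, a small Python analogy makes the point. Once the subclass is known, each form gets one narrow rule instead of one locator that tries to cover every layout. The three form names and their patterns below are hypothetical, and they mirror the three "Case Number" scenarios just described.

```python
import re
from typing import Optional

# Hypothetical per-subclass patterns for the same "Case Number" field.
SUBCLASS_LOCATORS = {
    "FormA": re.compile(r"Case Number\s*\n\s*(\S+)"),    # label sits above the value
    "FormB": re.compile(r"Number\s*[:#]?\s*(\S+)"),      # label sits to the left
    "FormC": re.compile(r"\b(\d{2}-[A-Z]{2}-\d{5})\b"),  # no label; rely on the format
}

def locate_case_number(form_type: str, text: str) -> Optional[str]:
    """Run only the rule tuned for the already-classified subclass."""
    pattern = SUBCLASS_LOCATORS.get(form_type)
    if pattern is None:
        return None  # unclassified form: fall back to the generic locators
    match = pattern.search(text)
    return match.group(1) if match else None

print(locate_case_number("FormA", "Case Number\n2011-0042\n"))    # 2011-0042
print(locate_case_number("FormB", "Number: 77123"))               # 77123
print(locate_case_number("FormC", "Filed 11-WA-00042 in county")) # 11-WA-00042
```

Because each rule only ever sees its own form, tightening one pattern cannot hurt extraction on the others.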

How many subclasses are enough? I like to use the 80/20 rule: when you rank all of your documents by volume, 80% of your volume should come from 20% of your forms. I know that there are exceptions to the rule, but this is a good place to start. I have done projects here at ImageSource where we subclassified the top 10, 20 or 50 forms. When forms are subclassified, extraction averages go way up because you can use locators like the Advanced Zone Locator on structured forms. This locator is very helpful because once you draw a box around the data, you can set it to run its own cleanup and OCR of that zone rather than relying on the original full-text OCR results. However, this is only really useful on forms that have been subclassified, since you know exactly where the data is on the page. Format Locators are also very helpful because you know how the data is structured on the form, so you can create a regular expression to look for the text. This helps reduce the number of incorrect alternative results. For the rest of the forms that are not subclassified, you still need to create the miscellaneous locators, but the idea is that the majority of your documents are subclassified and come through with very high extraction rates.
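The 80/20 starting point is easy to compute from a volume report. Sort forms by volume and keep taking the largest until they cover the target share. The Python sketch below does exactly that. The form names and volumes are invented for illustration, and `target_share` is a parameter so you can try other cut-offs.

```python
def forms_to_subclass(volume_by_form, target_share=0.8):
    """Pick the highest-volume forms until they cover target_share of all volume."""
    total = sum(volume_by_form.values())
    chosen, covered = [], 0
    for form, volume in sorted(volume_by_form.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= target_share * total:
            break
        chosen.append(form)
        covered += volume
    return chosen

volumes = {"Form A": 5000, "Form B": 3300, "Form C": 300, "Form D": 250, "Form E": 200,
           "Form F": 200, "Form G": 150, "Form H": 150, "Form I": 100, "Form J": 100}
print(forms_to_subclass(volumes))  # ['Form A', 'Form B']
```

Here two of ten forms (20%) carry about 85% of the volume, so they are the first candidates for subclasses.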

The other nice thing about KTM is that you can define locators at the parent class, and each of the subclasses will inherit them unless you specifically change them. An example of where this is helpful is a field that uses a database locator with a fuzzy lookup that applies to all the forms, where you don’t want to create a specific locator for every subclass. In addition, you can still use the incredible training power that KTM provides: when you use specific learning, the training is applied to the particular subclass. I have found that with KTM there are many different ways to “skin a cat,” and this is just one of the methods that can dramatically improve your extraction results.
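The inherit-unless-overridden behavior described above can be modeled with Python's `collections.ChainMap`. Lookups check the subclass's own settings first, then fall back to the parent class. This is only an analogy for how the inheritance resolves. The field names and locator descriptions are hypothetical, not KTM objects.

```python
from collections import ChainMap

# Hypothetical locator assignments on the parent class.
parent_locators = {
    "CaseNumber": "generic format locator",
    "Customer": "database locator with fuzzy lookup",
}

def subclass_locators(overrides):
    """Subclass view: its own overrides first, then anything inherited from the parent."""
    return ChainMap(overrides, parent_locators)

form_a = subclass_locators({"CaseNumber": "advanced zone locator (Form A)"})
print(form_a["CaseNumber"])  # advanced zone locator (Form A)
print(form_a["Customer"])    # database locator with fuzzy lookup
```

Because `ChainMap` holds a reference to the parent dictionary rather than a copy, a change to the parent's locator shows up in every subclass that has not overridden it.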

Brandon Konen
Systems Engineer
ImageSource, Inc.
