KTM TDS Model Building

October 13, 2011

Are you tired of separator sheets? Tired of wasting paper and spending countless hours flipping through pages to insert a barcode sheet at the start of each new document, only to pull it out after the batch is scanned or leave it in and have more paper to store? Why not have the computer do the work for you? That’s the idea behind the Project Planner module in KTM. KTM has standard separation functionality built in that works very well on structured and semi-structured documents, but when you have more complex separation rules, the Project Planner component is what you need. It is designed to create a template, or “model,” for automatic separation, which the KTM Server then uses during the normal batch workflow. This is why you might also hear the process referred to as “model building.” I want to give you a brief look at how to set up a TDS (Trainable Document Separation) model and integrate it with a KTM project.

The first thing you need to do is collect lots and lots of samples. The program requires at least 50 samples for each class or document type. Each document needs to be in TIFF format and have its own folder. Furthermore, multi-page documents should be split into single-page TIFF images, with all of a document’s pages placed in that document’s folder. The next step is to take your collection of document folders and group them into folders by document type (these will become the classes in your KTM project). This is a very time-consuming process, but it pays off when you import the documents, as you will see.
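Before importing, it can help to sanity-check the sample folders with a short script. The sketch below is not part of KTM or Project Planner; it assumes a hypothetical layout of root/class/document/page.tif and the function name check_sample_tree is made up for illustration. It only counts folders and checks file extensions, so it cannot tell whether a TIFF is really a single page.

```python
from pathlib import Path

MIN_SAMPLES = 50  # minimum samples per class, as stated above
TIFF_SUFFIXES = {".tif", ".tiff"}


def check_sample_tree(root, min_samples=MIN_SAMPLES):
    """Return a list of problems found under root.

    Assumed (hypothetical) layout: root/<class>/<document>/<page>.tif
    """
    problems = []
    class_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        # Each sub-folder of a class folder counts as one sample document.
        doc_dirs = sorted(p for p in class_dir.iterdir() if p.is_dir())
        if len(doc_dirs) < min_samples:
            problems.append(
                f"{class_dir.name}: {len(doc_dirs)} samples, need {min_samples}"
            )
        for doc_dir in doc_dirs:
            pages = sorted(p for p in doc_dir.iterdir() if p.is_file())
            if not pages:
                problems.append(f"{class_dir.name}/{doc_dir.name}: no pages")
            for page in pages:
                # Extension check only; the image content is not inspected.
                if page.suffix.lower() not in TIFF_SUFFIXES:
                    problems.append(
                        f"{class_dir.name}/{doc_dir.name}/{page.name}: not a TIFF"
                    )
    return problems
```

Calling check_sample_tree on the top-level samples folder returns an empty list when every class has enough document folders and every page file has a TIFF extension.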

Another thing you should be aware of is that Project Planner requires an additional license and is not installed with the normal KTM install. After the standard KTM install, you can find the Project Planner setup.exe on the install media under “Kofax Transformation Modules” and then “Project Planner.”

After you have Project Planner installed and have created a new project, you need to import the documents that you just sorted. As part of the import, a handy tool lets you select where the separation for each class, document, and page is. This allows the system to automatically create the classes and import the appropriate documents into each class.

After the files are imported you will see the classes automatically created for this model.  The next step is to run all of the documents through the OCR engine in order for the system to be able to read the documents.  This process can take hours for larger sample sets so there have been times that I just let it run overnight.

When all the documents have finished running through the OCR engine, you can begin the cleanup process. This is simply a matter of confirming whether each document is part of a class by checking the checkbox on that document. You only need to confirm enough documents for the system to be confident in the classification based on the samples provided. The bar across the bottom of the window is color-coded to show the confidence for a particular class.

Blue marks the documents that you have confirmed, green means confident, and red means unconfident. As documents are confirmed, the red bar will get smaller and eventually go away. The cleanup process is complete when enough documents have been confirmed for each document class that all of the red is gone.

The next step is to compile the information into a TDS model which the KTM project can use for separation.  This is done by creating two files in Project Planner.  The first is a classification file, or the mod file, that the system will use to distinguish what class each page belongs to.  You can either use a text classification or image classification.  The second file that needs to be compiled is the document separation file, or the ads file.  This allows the KTM Server to use the training provided in the cleanup step to know where to separate each document.

The final step is to link the model to your KTM project. Open your project in Project Builder and go to the project settings. On the Document Separation tab, one of the options is “Trainable Document Separation (TDS).” Select it and browse to the folder containing the mod and ads files.

When you click OK you should get a message that tells you that “The TDS project was successfully imported.  New classes were created according to the definition of the document separation model.”  If not already there, classes will be automatically created in your project.  You’re now ready to synchronize the project within Kofax Administration and publish the batch class.

In summary, by combining KTM with a TDS model you will save time and money by reducing the amount of document preparation required when scanning. For example, in a recent install I worked with a company that had a whole room of employees (about 20) doing manual separation. We installed KTM and used a TDS model for separation, and now they have only 4-5 people handling the same volume of documents in less time. This is a very powerful tool that I would suggest to anyone who needs automatic separation of semi-structured and unstructured documents.

 

Brandon Konen
Systems Engineer
ImageSource Inc.

From time to time I receive questions about large file uploads with ILINX Capture. ILINX Capture can upload files of any size. The limitation lies within Internet Information Services (IIS) and/or the amount of memory installed in the web server. This is true not only for ILINX Capture, but for any ASP or ASP.NET application.

Depending on the architecture of the ASP or ASP.NET application, files being uploaded to the web server are typically streamed into the web server’s memory during the upload process before being written to disk. The number of users uploading files concurrently and the size of those files determine how much physical memory should be installed in the server. By default, IIS has a 200KB size limit for uploading a single file. This can be increased, but do not set it any higher than necessary or you risk overconsumption of the web server’s memory.
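As a rough illustration of the sizing argument above, the following sketch multiplies the number of concurrent uploads by the maximum file size. It assumes the worst case, in which every upload is held in memory in full at the same moment; the function name and the example numbers are hypothetical, and real memory use also depends on the application and everything else running on the server.

```python
def peak_upload_memory_bytes(concurrent_uploads, max_file_bytes):
    """Worst-case estimate: every concurrent upload is fully buffered in memory."""
    return concurrent_uploads * max_file_bytes


# Hypothetical load: 25 users each uploading a 100 MB file at the same time.
estimate = peak_upload_memory_bytes(25, 100 * 1024 * 1024)
print(f"{estimate / 1024**3:.2f} GiB")  # prints "2.44 GiB"
```

An estimate like this is only an upper bound for the upload buffers, but it shows why raising the limit further than needed can be costly.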

Configuring File Upload Size in IIS 6

1. Open Internet Information Services Manager by clicking the Windows Start Menu and Run. Type inetmgr and click OK.

2. Once IIS Manager opens, navigate the tree, right click the server name, and click Properties.

3. From the server properties window check the Enable Direct Metabase Edit checkbox and click OK.

4. Browse to the C:\windows\system32\inetsrv directory and edit the Metabase.xml file with a text editor such as Notepad.

5. Search for the attribute AspMaxRequestEntityAllowed and edit the value to the size in bytes that you want to allow for a maximum upload size. Save and close the Metabase.xml file.

AspMaxRequestEntityAllowed="204800"

6. Open the Registry editor and navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSSOAP\30\SOAPISAP.

7. Modify the MaxPostSize key. Set the decimal value to the maximum upload size in bytes and click OK.

8. Reboot the web server to ensure the changes have taken effect.
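Both AspMaxRequestEntityAllowed and MaxPostSize in the steps above take their values in bytes, so it is easy to be off by a factor of 1024. A minimal sketch of the conversion follows; the helper names are made up for illustration and are not part of IIS.

```python
def kilobytes_to_bytes(kilobytes):
    """Convert kilobytes (1 KB = 1024 bytes) to bytes."""
    return kilobytes * 1024


def megabytes_to_bytes(megabytes):
    """Convert megabytes (1 MB = 1024 * 1024 bytes) to bytes."""
    return megabytes * 1024 * 1024


print(kilobytes_to_bytes(200))  # 204800, the value shown in step 5
print(megabytes_to_bytes(100))  # 104857600, a 100 MB limit
```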

Configuring File Upload Size in IIS 7

1. Open Internet Information Services Manager by clicking the Windows Start Menu and Run. Type inetmgr and click OK.

2. Navigate the tree to the Virtual Directory for which you would like to enable large file uploads.

3. In the Features View pane double click ASP.

4. In the ASP settings pane, edit the Maximum Requesting Entity Body Limit and Response Buffering Limit values. Set each to the maximum file upload size in bytes and click Apply.

 

5. Open the Windows Command Prompt and enter the following command. Change the maxAllowedContentLength to your maximum file upload size in bytes and hit enter to execute the command.

C:\Windows\System32\inetsrv\appcmd set config "Default Web Site" -section:requestFiltering -requestLimits.maxAllowedContentLength:104857600

6. Open the Registry editor and navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSSOAP\30\SOAPISAP.

7. Modify the MaxPostSize key. Set the decimal value to the maximum upload size in bytes and click OK.

8. Reboot the web server to ensure the changes have taken effect.
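If you apply the IIS 7 request filtering change to several sites, the command from step 5 can be generated rather than typed by hand. The sketch below only builds the command line and does not run it; the helper name build_request_limit_command is hypothetical, and the appcmd path, section name, and attribute are taken from the command shown in step 5.

```python
import subprocess

# Path to appcmd as used in step 5.
APPCMD = r"C:\Windows\System32\inetsrv\appcmd"


def build_request_limit_command(site_name, max_bytes):
    """Build the appcmd argument list that sets maxAllowedContentLength."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be a positive number of bytes")
    return [
        APPCMD,
        "set",
        "config",
        site_name,
        "-section:requestFiltering",
        f"-requestLimits.maxAllowedContentLength:{max_bytes}",
    ]


args = build_request_limit_command("Default Web Site", 100 * 1024 * 1024)
# list2cmdline quotes arguments that contain spaces, Windows style.
print(subprocess.list2cmdline(args))
```

The printed line can be pasted into an elevated command prompt on the web server, or the argument list can be passed to subprocess.run when the script itself runs there.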

Bryan Wilhelm
Senior Systems Engineer
ImageSource, Inc.
