Kofax KTM Dictionary Gotcha on Dates
March 18, 2011
In KTM there is a nifty feature to search the entire document for a date field. It will recognize all dates existing on the form, and with some other snazzy logic you can find the date you are looking for. If it is near the word “received”, then you probably have a received date. Easy, right?
Okay, sometimes dates get a little more tricky. “3/17/2011”, “3-17-2011” and “MAR 17, 2001” are all valid date strings. Any of those formats could be found on your document. In KTM there is a nifty feature to search for the string “MAR” and replace it with a “3” when searching for dates. You use it in your locator’s regular expression. You can set up your own dictionary of months to look for “March” or “Mar” (or “Marzo” if you need internationalization).
Here’s the gotcha. I recently found text in an OCR’d document like this: “19 NOV2008”. It’s a bit of an odd string. The OCR engine didn’t think there was enough space between the “NOV” and the “2008” to put an actual space character in the OCR’d text. So, I can read it, but KTM can’t. The nifty feature to search for the string “NOV” fails because it is only looking at whole words, those with whitespace on either side. Unfortunately, there is no option in the KTM dictionary setup to change this.
Here’s the fix. Modify the default KTM regular expression from this:
[0-3]?\d§English_Months_Abr§([12]\d{3}|\d{2})
to this:
[0-3]?\d\s*(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\s*([12]\d{3}|\d{2})
You can now make the space character optional in your regular expression search. You are no longer using the month dictionary, but that’s okay. This logic is only to locate a date, not translate it from a string to a date when found.
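As a rough illustration of the optional-whitespace idea, here is a minimal sketch in Python. It is not KTM code: Python's `re` module stands in for the KTM locator, the year part reuses the `([12]\d{3}|\d{2})` group from the default expression, and the case-insensitive flag and the `find_dates` helper are assumptions made for this example.

```python
import re

# Month abbreviations written inline, standing in for the KTM month dictionary.
MONTHS = "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"

# Day, optional whitespace, month abbreviation, optional whitespace, year.
# The \s* on either side of the month is what lets "NOV2008" match.
DATE_PATTERN = re.compile(
    rf"[0-3]?\d\s*(?:{MONTHS})\s*(?:[12]\d{{3}}|\d{{2}})",
    re.IGNORECASE,  # assumption: OCR text may be upper or lower case
)


def find_dates(text):
    """Return every substring of text that looks like a day-month-year date."""
    return [match.group(0) for match in DATE_PATTERN.finditer(text)]


print(find_dates("Received 19 NOV2008 at the dock"))
print(find_dates("Received 19 NOV 2008 at the dock"))
```

Both calls find the date, with or without the space the OCR engine dropped. As in the post, this only locates the string; turning it into an actual date value is a separate step.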
Problem solved.
Oracle IPM 11g Released!
March 27, 2010
For those of you who have not heard, Oracle has released the next generation of their Enterprise Content Management software, Imaging and Process Management (IPM) 11g. This version is the first major step that Oracle has taken to tightly integrate the product into Oracle’s overall software architecture: IPM 11g has been completely overhauled to be part of the Fusion Middleware (FMW) tech stack. From the ECM perspective, Oracle now has a complete, seamlessly integrated end-to-end offering that includes the storage repository, document management, business process management, library services, web publishing, records management, reporting/monitoring and application integration. This creates many advantages for customers that use or plan to use other Oracle products in their workplace, as well as for those integrating and leveraging existing investments in non-Oracle software.
I have been working as a Systems Engineer and Project Manager with the IPM software base for over 8 years, through the Stellent IBPM acquisition, all the way back to the Optika Acorde and eMedia days. A couple of major differences in implementing the latest Oracle 11g version are the requirements for Oracle Universal Content Management (UCM) for the storage repository and Oracle WebLogic Server for the application/web server. I look at both of these requirements in a positive light. UCM and WebLogic Server are powerful, robust products that provide standard approaches to managing content storage and applications, respectively, from the FMW perspective. With that said, if you do not have experience with either UCM or WebLogic, you will need to get up to speed with them to succeed in an IPM implementation. Neither of these products can be installed through the “Next, Next, Next, Finished!” approach, so careful upfront planning and architecting is required to ensure a successful implementation.
Let’s talk about the new user interface a little bit. Oracle has followed suit with the rest of the major players in the ECM world by creating a complete web-based interface for performing all administrative and end user functions. This makes administration of the system much easier than in past versions, which required administration to be done through the “thick” client. Also, by moving to the WebLogic Server, the full-featured web interface is now much more browser agnostic than in the past. The image viewer comes in two flavors that support over 400 file formats: a zero-footprint, view-only version and a rewritten Java applet that allows for full annotations, annotation security, and server-based conversion/rendering for access speed. The following are a couple of screen captures of the user interface from IPM 11g:
The Client Interface
The Zero Footprint Viewer
The Java Applet Viewer
Lastly, I would like to touch on a feature that is often overlooked when implementing ECM solutions: application integration. Oracle has done a great job in IPM 11g of providing some powerful capabilities for leveraging investments in Oracle and non-Oracle applications through integration. For a process where users are assigning metadata to a record in their business application, application integration allows this data to be pushed to and associated with the document stored in IPM 11g. Another example of integration would be the image enablement of a business application. In this case, while a user is accessing records in their ERP system, a hotkey, menu item, or button in the application screen can retrieve and display the document from IPM without the user ever having to leave their business application. These capabilities can create significant efficiencies in an organization through increased user productivity, the reduction of training and the simplification of support and administration.
All in all I see the changes that Oracle made in IPM 11g as great additions to an already strong platform. Oracle has a product that not only adheres to their architecture model, but also will provide many benefits to the customers that use it. Stay tuned to this blog for more information related to our experiences with Oracle IPM 11g.
Ryan Keller Project Manager ImageSource, Inc.
Business Process Optimization
November 14, 2009
For those of you who attended my breakout session at the NEXUS ECM Conference on automating business processes, this topic will be familiar. If you missed the session, this blog will provide a glimpse into the world of automating and optimizing business processes.
There are many different ways to approach process automation and optimization and the purpose of this blog topic is to provide information based on my industry experience. I will discuss identifying processes within an organization and then automating those processes utilizing a number of valuable implementation strategies.
Why Automate?
From my experience in the Enterprise Content Management industry, I have found the main reasons to automate or optimize a business process are as follows:
- Gain Process Efficiencies
- Process Quality Improvement
- Improve Reporting, Tracking & Auditing
Process Identification
Let’s take a look at identifying a business process that could be automated. When looking at processes to automate or optimize, the starting point is to identify a process and then extensively research the process to get a clear understanding of the current state. A good place to start with this research is to look at all of the inputs and outputs of the current process. This can include documents, data and communication associated with accomplishing tasks in a process.
Next, we will want to evaluate the identified process to determine what manual steps in the process can be automated. From identifying the steps we then can determine which ones will provide the best return for the business and/or user.
The last key to identifying and evaluating business processes is the inclusion of the user community in the analysis of the current process to determine: 1) what is currently working well, 2) what could use improving, 3) what the major deficiencies are and 4) what is on the users’ wish list for the process.
By following these steps in identifying and evaluating a business process you will set yourself up for success when architecting and implementing a solution for automation or optimization.
Implementation Strategies
Now that we have discussed identifying business processes let’s take a look at some implementation strategies to assist you in automating/optimizing the process.
- Understand the Business Process: As discussed earlier in the post, it is critical to fully understand the process that you are automating.
  - Evaluate current bottlenecks
  - Determine the user interaction with the current process
- Require Ownership at All Levels: In order to get full acceptance of the solution you are implementing you should ensure that the entire team is on board and understands the benefits to them and the organization. This includes:
  - Executive Level
  - Departmental Management
  - End Users
- Know What to Automate: Don’t automate every manual process for the sake of automation. Determine the return value associated with the re-engineering of the process. In some cases it will make more sense to keep the process manual. For example, in a customer service organization, it may be more beneficial to provide human interaction to a customer instead of sending an automatically generated email.
- Educate Yourself on Existing Systems: Understanding the current infrastructure in place can be critical when determining the return on investment and initial cost of the process re-engineering.
  - If there is already an Enterprise Content Management system in place, you should be able to leverage this for tasks associated with document capture, document management/archival, workflow, etc…
  - Line of Business systems (Oracle E-Business, JD Edwards, PeopleSoft, MSFT Great Plains, etc…) can be leveraged for storing metadata associated with the documents you are capturing. Using software like ILINX Integrate, these LOB systems can then be image enabled to retrieve documents directly from your document management system without ever leaving the LOB system.
- Promote Ongoing Analysis & Optimization: This strategy is key to creating and maintaining truly efficient and optimized processes within an organization. Let’s take the following example:
  - A manual process is identified to automate
  - The process is automated with success using the above implementation strategies
  - Everyone is happy and uses the new and improved process
  - Now that the process has been improved it is common to call the project a success and never look back. This may work for some time, but eventually the process will need to be evaluated again to determine if additional automation or optimization needs to take place. Over time business processes evolve and technology changes, so this step can be imperative to keep your business process streamlined.
In summary, we have taken a quick look at identifying business processes to automate and optimize, as well as some strategies for success when taking on the task of business process re-engineering. Please feel free to post comments related to this information or your own experiences related to this topic.
Ryan S. Keller Project Manager ImageSource, Inc.
ECM Best Practices: Document Capture & Metadata Collection
October 24, 2009
The term “Data Strategy” can be used to describe the strategic requirements for enterprise metadata concepts and best practices. Why is this important for an ECM solution? The simple answer: why contribute to the disarray of an organization’s data management when implementing an ECM solution?
The key to a successful ECM system is the ability for users to easily and quickly find information within the repository. If specific information cannot be easily and quickly found, users may not be enthusiastic about using the system.
The biggest stumbling block in the way of finding specific or relevant information is a search that returns too many documents, which the user then has to look through one by one before finding the specific document or set of documents that exactly meets their requirements.
In order to prevent this impediment to effective document retrieval, organizations have to be able to refine the population of documents that are searched so that irrelevant information is not selected and returned in the query result set.
In a well-designed ECM system, this is accomplished by using metadata (or taxonomies) to limit the search result set to items which meet the user’s selected parameters (e.g. date range, case number, document type, etc.). Metadata is used to describe the content that the ECM system takes in.
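The narrowing effect of metadata can be sketched in a few lines of Python. This is a minimal sketch, not the query interface of any particular ECM product: the `Document` fields, the `search` function, and the sample repository are all hypothetical and exist only to show how each supplied parameter shrinks the result set.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    title: str
    doc_type: str
    case_number: str
    issued: date


def search(documents, doc_type=None, case_number=None, start=None, end=None):
    """Return only the documents that satisfy every supplied metadata parameter."""
    results = []
    for doc in documents:
        if doc_type is not None and doc.doc_type != doc_type:
            continue
        if case_number is not None and doc.case_number != case_number:
            continue
        if start is not None and doc.issued < start:
            continue
        if end is not None and doc.issued > end:
            continue
        results.append(doc)
    return results


# Hypothetical repository contents, for illustration only.
REPOSITORY = [
    Document("Invoice 1001", "invoice", "C-17", date(2009, 3, 2)),
    Document("Invoice 1002", "invoice", "C-42", date(2009, 9, 15)),
    Document("Contract A", "contract", "C-17", date(2009, 9, 20)),
]

hits = search(
    REPOSITORY,
    doc_type="invoice",
    start=date(2009, 9, 1),
    end=date(2009, 9, 30),
)
print([doc.title for doc in hits])
```

With no parameters the search returns everything in the repository; adding a document type and a date range leaves only the documents the user actually asked for.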
Metadata not only provides a way to index (and therefore retrieve) content but also provides the means of managing content throughout its lifecycle. For a document or item of content, this means data about it such as its author, its title, the issue date, and other information which can usefully be associated with it.
Whether you are using ILINX Capture, Kofax Ascent Capture, or Captovation, users will have to enter some metadata, though as much as possible should be created and collected automatically, given the document sources, types, and available databases linked to documents. It is important to get the balance right: too few mandatory elements may result in little metadata being entered, while too many mandatory elements may be seen as a tedious chore.
Consistency and accuracy of metadata values is crucial to the value of metadata. Controlled vocabularies, pick lists, default values and inheritance are all important tools and techniques in this context.
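To make these techniques concrete, here is a minimal Python sketch of how default values, inherited values, a pick list, and mandatory fields might fit together at indexing time. The field names, the pick list contents, and the `build_metadata` helper are hypothetical and not taken from any capture product mentioned here.

```python
# Hypothetical pick list (controlled vocabulary) for the document type field.
DOC_TYPES = {"invoice", "contract", "correspondence"}

# Hypothetical default values applied when nothing else supplies a field.
DEFAULTS = {"department": "unassigned"}

# Hypothetical mandatory fields; everything else is optional.
MANDATORY = ("doc_type", "case_number")


def build_metadata(entered, inherited=None):
    """Merge defaults, inherited values and user-entered values, then validate.

    Later sources win: user entries override inherited values (for example
    from a batch or folder), which in turn override the defaults.
    """
    metadata = {**DEFAULTS, **(inherited or {}), **entered}
    missing = [field for field in MANDATORY if not metadata.get(field)]
    if missing:
        raise ValueError(f"missing mandatory fields: {', '.join(missing)}")
    if metadata["doc_type"] not in DOC_TYPES:
        raise ValueError(f"doc_type {metadata['doc_type']!r} is not in the pick list")
    return metadata


# The user only picks a document type; the case number is inherited.
print(build_metadata({"doc_type": "invoice"}, inherited={"case_number": "C-17"}))
```

Keeping the mandatory list short and letting inheritance and defaults fill in the rest is one way to strike the balance between too little metadata and a tedious indexing chore.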
When designing a system, the recommendation is to apply metadata to all documents that are expected to be found easily in a query and all documents that are stored as “final” archived records in the ECM repository. This process will be essential to user adoption and the success of the system. For more information please visit the Northwest’s premier conference at http://nexusecm.com and register for one of the ECM breakout sessions.
Jon Sutherland
Sr. Systems Engineer

Distributed Capture for the Enterprise
September 4, 2009
Distributed Capture (for Scanning and Indexing) has been gaining ground in the last several years. Used to be that documents were sent to the basement where a dedicated scan operator “fed the dragon” by scanning hundreds if not thousands of documents a day. This was known as “Centralized Capture”. Problem was, how to relay the vital index information necessary for search and retrieval to the scan operator?
The last few years have seen technology leap forward with the advent of inexpensive “desktop scanners”, browser based software such as ImageSource’s ILINX Capture http://www.ilinxcapture.com/index.htm and integration tools that “image enable” or “document enable” the existing Line of Business Systems used by most all departments such as ImageSource’s ILINX AIK Toolkit. Line of Business Systems are those account driven applications dedicated to the function of a department. SAP, Oracle Financials and JD Edwards for Finance. Salesforce.com, Great Plains and Siebel for Sales. PeopleSoft for Human Resources… just to name a few. Since these systems manage the events that create documents, why not enable them to manage the documents directly? The productivity gain may not be so obvious.
Most of us associate document enablement with the immediate benefit of displaying a scanned image on the screen with the click of a button in our Line of Business application. Productivity gains result from not having to perform a secondary search in the document management system for the documents that match the record in the Line of Business application. This in itself is a powerful capability. But what gains can be realized on the capture side by reversing the process in a Distributed Capture solution that uses desktop scanners and Line of Business Applications to index the documents into the system?
Here are some real benefits of distributed capture:
1.) Distributed Capture eliminates the need to transport documents from the cubicle farm to the basement.
2.) Distributed Capture eliminates barcodes/patchcodes necessary for the proper functioning of a “batch oriented” centralized scanning system.
3.) Distributed Capture eliminates the need for a dedicated scanner operator and/or scanning department.
4.) Distributed Capture can capitalize on SDK toolkits that integrate line of business applications for more accurate indexing.
5.) Scanner costs can be reduced by relying on inexpensive desktop scanners rather than “the big iron” in the basement.
Distributed Capture used to be an unrealized capability in the enterprise. Today, with more advanced scanner hardware, more technologically “in-tune” workers and more advanced toolkits for document management, it can be a reality.
Managing Multiple ECM Systems at an Organization
July 10, 2009
A trend that I have been noticing more and more is the presence of multiple Enterprise Content Management (ECM) Systems at organizations. There are many reasons why the scenario of multiple ECM Systems can occur: 1) mergers & acquisitions, 2) strengths of the ECM products, 3) lack of internal communication and/or understanding of existing systems in the organization, 4) division of dollars at departmental levels, and so forth and so on…
One issue that can arise from having multiple ECM solutions within an organization is the difficulty of accessing information stored in the different systems. Let’s look at an example of an organization that has two ECM systems used for content archiving and retrieval. The first ECM system is used by the Human Resources department for storing documents and data related to employee on-boarding and personnel management. The second ECM system was installed at a later date by the Accounts Payable department. This system is used for managing documents related to payroll, as well as acting as a portal for employees to find information related to their pay. The presence of multiple systems within the organization creates a number of headaches (e.g. managing of storage, security/permissions, what information is where, etc…), but for now let’s focus on the fact that there are users that need access to the documents and metadata stored in both systems. The users have the ability to log into the first system and run a search to find HR documents, and then to find Payroll documents they have to switch over to the second system, log in, and run another search. This requires the users to have knowledge and an understanding of both systems to perform their job functions, and the laborious nature of this task creates inefficiencies within the departments.
There are a number of ways to get around the issue of having documents and data that users need access to in multiple locations:
- Consolidation of the systems. This can be an arduous task, but in the end there will be only one system to manage. Consolidation is most commonly the option when multiple ECM solutions are the result of mergers & acquisitions. This may not always be the best solution because of business requirements and product strengths.
- Link the data and documents between the systems. This functionality varies between products, but a good example of how this can be done is through the utilization of the ILINX SharePoint Connector software. This software gives users the ability to search multiple content management systems through SharePoint. The users have a single access point to all ECM related content using SharePoint, which alleviates the required system knowledge and consumption of time associated with searching through multiple systems.
- Leveraging the investment in one of the ECM systems. Almost any top tier ECM system can be used to retrieve content from an external system. For example, using Oracle Imaging and Process Management, linked servers can be created to access content from external data sources. Keep in mind that most native functionality for linking content within ECM systems has limitations. When this is the case, the use of middleware products, like the ILINX AIK, can alleviate these limitations.
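The single access point idea behind the linking options above can be sketched generically in Python. This is only an illustration of the pattern: it does not show how the ILINX SharePoint Connector, the ILINX AIK, or Oracle IPM linked servers actually work, and the `make_system` and `federated_search` helpers and the sample data are hypothetical.

```python
def make_system(name, documents):
    """Build a stand-in search function for one ECM system (hypothetical)."""
    def search(employee_id):
        return [
            {"system": name, "title": title}
            for owner, title in documents
            if owner == employee_id
        ]
    return search


# Two stand-in systems, mirroring the HR and Payroll example above.
hr_search = make_system("HR", [("E100", "Offer letter"), ("E200", "Onboarding form")])
payroll_search = make_system("Payroll", [("E100", "Pay stub 2009-06")])


def federated_search(employee_id, systems):
    """Query every system in turn and return one combined result list."""
    results = []
    for search in systems:
        results.extend(search(employee_id))
    return results


for hit in federated_search("E100", [hr_search, payroll_search]):
    print(hit["system"], "-", hit["title"])
```

The user runs one search and gets HR and Payroll results together, without needing to know which system holds which document.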
It is rare that companies have the horsepower to take on these tasks themselves, so there are solution integrators out there, like ImageSource, which can assist with this work. Advantages of going with an experienced ECM integrator are that they can evaluate the current business requirements, assist in streamlining the current process, provide recommendations (if needed) on optimizing the solution, and provide risk mitigation throughout the configuration/redesign process.
The utilization of multiple ECM solutions within an organization can be a cumbersome endeavor. By evaluating the overall business requirements and optimizing the organization’s ECM architecture these pains can be easily overcome and will make Enterprise Content Management a more valuable asset to the organization.
Ryan Keller
Project Manager / Sr. Systems Engineer
ImageSource, Inc.
To find out about the latest trends in the industry and interact with industry experts, business leaders, and the user community, check out NEXUS 2009!







