Vetting ABBYY ‘Keen Eye’ FlexiCapture at ImageSource
April 29, 2011
First off, ABBYY means “keen eye”, an apt name for a product that dynamically and automatically captures and processes widely disparate documents. Powerful document recognition separates and classifies docs, and state-of-the-art optical character recognition rips the data from the images. I like the motto that pops up on screen – “take the data, leave the paper”. I love doing just that, sending paper briskly off to start its next recycled life. It’s the greenest thing to do, especially when compared to filling endless cabinets and long-term off-site storage facilities.
When you want to recommend, sell, support, and solve major customer problems with ECM software at ImageSource, due diligence mandates a thorough feature review and testing. I’ll describe some of the steps I was involved with in this process for ABBYY FlexiCapture, but mine is only a single slice of the vet team pie. Development and other engineering teams performed specific examinations to answer questions about integration, APIs, and narrower capabilities needed to solve unique problems faced by eager customers. Also, ImageSource staff with a variety of titles took a week-long training course with intensive labs. Unfortunately, I missed the class, but I was given the opportunity to spin up on the product for a pre-sales demo last year, which was a lot of fun.
So here’s a peek at our process:
Laptop Install
First things first! I like to be able to run new software on my laptop whenever possible. This frees me from all bandwidth and location constraints. I can easily focus on the vet effort on a plane, down by the river, wherever and whenever. ABBYY FlexiCapture has a convenient ‘Standalone Installation’ which gives you access to all the key components on one box.
Obtain Sample Images from Client
In this case we gathered dozens of hardcopy invoices from a large international corporation. The images were not pretty and included originals, copies, printed faxes, you name it.
Ascertain Server Needs
After reviewing the ABBYY documentation we set the requirements for our labs – memory per server, disk space, software required, scan station requirements, scanner requirements, and required operating systems.
Spin Up VMs
Thanks to Mike Peterson we had three servers up in no time.
Convening the Team, Locking Down the ‘War Room’
Gene Eckhardt, Jeff Doyle and I met in our Olympia office for a week. Gene secured the war room where we periodically met with developers, project managers, engineers, and principals. Most of the time it was the three of us banging away.
Lab Software Install
Now we installed ILINX Capture on one server, ABBYY’s ‘Distributed Installation’ on another, and SQL Server on the last. This architecture mimics what we’d encounter in the field; besides, the standalone install wouldn’t cut it, because it doesn’t scale and it uses SQL Express as its support database. With this setup we can easily add more servers for high-volume stress testing. By running a WebEx session all week, we were able to record every moment of each day’s work, easily pass the focus from machine to machine, and let remote colleagues watch what we were doing. We brought in ABBYY tech support when we had a question and felt we could speed up an installation step. It turned out we could, and it was great to have the technician join our session without delay and see what was up. As we installed, we also kept a meticulous running log of every issue we encountered, however minor. At the end of each day Gene led a review session where we discussed and polished the invaluable ‘Lessons’ doc.
End-To-End Test
This was our ‘Hello World’ moment – we set up communication between ILINX Capture and ABBYY and created an appropriate ILINX Capture workflow. Then we created a simple FlexiLayout, exported it, imported it into FlexiCapture, and created a document definition and an export. We configured the scanner and the scan station and confirmed that we had end-to-end connectivity.
Building Generic FlexiLayouts
One of the many goals of our week was to share baseline knowledge as well as advanced techniques for capturing documents. We picked two forms that were relatively easy to identify and made up a large share of the total paper volume. In short order we had FlexiLayouts and document definitions configured. Then it was time to tweak and refine. The ability to chain elements together worked extremely well: find a keyword, then find the nearest ZIP code with the help of regular expressions. Then, using out-of-the-box settings, we could find the state, city, address, and addressee. Wow, powerful.
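The keyword-then-ZIP chaining is easiest to picture on plain text. The Python sketch below is not FlexiLayout syntax; it assumes the OCR output is available as a flat string, and the function name `zip_after_keyword` and the sample page are made up for illustration. It anchors on a keyword, then searches forward from that point with a regular expression for a US ZIP or ZIP+4 code.

```python
import re
from typing import Optional

# US ZIP or ZIP+4, e.g. 98501 or 98501-1234
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")


def zip_after_keyword(text: str, keyword: str) -> Optional[str]:
    """Return the first ZIP code appearing after `keyword` in `text`, or None.

    Hypothetical illustration of chaining: locate an anchor keyword,
    then search forward from it with a regular expression.
    """
    start = text.lower().find(keyword.lower())
    if start == -1:
        return None  # anchor keyword not found on the page
    match = ZIP_RE.search(text, start + len(keyword))
    return match.group(0) if match else None


# Invented sample page; the PO number before the keyword is ignored.
page = (
    "PO 55512\n"
    "REMIT TO: Acme Supply Co.\n"
    "123 Main St\n"
    "Olympia, WA 98501\n"
    "Invoice 10023\n"
)
print(zip_after_keyword(page, "remit to"))  # 98501
```

Because the search starts at the keyword, the five-digit PO number above it is skipped, and the invoice number below is never reached once the ZIP code matches.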
Building an Uber FlexiLayout
Now it was time to roll up our sleeves and build a smarter FlexiLayout that could capture invoices from a variety of sources. We used advanced features such as FlexiLayout alternatives, element groups, object collection elements, and other settings to start recognizing semi-structured forms from a wide variety of sources. Then we added a little FlexiLayout language code to help us “crawl” around the identified forms to find dates and monetary amounts that could sometimes be below keywords, sometimes to the right, and so on. We didn’t need to script any validation rules for our purposes, but I showed the team a script I had written before our meeting. A quick unit test showed great results: we had now moved away from a model where each form needed its own FlexiLayout.
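The “crawl” idea can be illustrated in the same spirit. This is a hypothetical Python sketch, not FlexiLayout language code: it assumes each recognized word comes with a bounding box (`Word` and `amount_near_keyword` are invented names), looks for a monetary amount to the right of a keyword on the same line, and falls back to the nearest amount directly below it.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# A monetary amount such as 987.00 or $1,234.50 (simplified; no currency codes)
AMOUNT_RE = re.compile(r"^\$?\d{1,3}(?:,\d{3})*\.\d{2}$")


@dataclass
class Word:
    """One recognized word and its bounding box in page pixels (hypothetical model)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def _overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    return a_start < b_end and b_start < a_end


def amount_near_keyword(words: List[Word], keyword: str) -> Optional[str]:
    """Find an amount to the right of `keyword` on the same line, else directly below it."""
    anchor = next((w for w in words if w.text.lower() == keyword.lower()), None)
    if anchor is None:
        return None
    amounts = [w for w in words if w is not anchor and AMOUNT_RE.match(w.text)]

    # 1) Same line: vertical extents overlap and the amount starts right of the keyword.
    right = [w for w in amounts
             if w.left >= anchor.right and _overlap(w.top, w.bottom, anchor.top, anchor.bottom)]
    if right:
        return min(right, key=lambda w: w.left - anchor.right).text

    # 2) Same column: horizontal extents overlap and the amount starts below the keyword.
    below = [w for w in amounts
             if w.top >= anchor.bottom and _overlap(w.left, w.right, anchor.left, anchor.right)]
    if below:
        return min(below, key=lambda w: w.top - anchor.bottom).text
    return None


side_by_side = [Word("Total", 100, 500, 150, 520), Word("$1,234.50", 300, 502, 380, 518)]
stacked = [Word("Total", 400, 100, 450, 120), Word("987.00", 405, 130, 460, 150)]
print(amount_near_keyword(side_by_side, "total"))  # $1,234.50
print(amount_near_keyword(stacked, "total"))       # 987.00
```

Trying “right” before “below” is just one reasonable search order; multi-word keywords such as “Amount Due” and skewed scans would need extra handling.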
Running Recognition Tests
We traded our lab coats for testing hazmat suits and ran many batches of documents, both ones we had used in development and ones we had never looked at before.
Recording Results
Recording results is never a thrill, but here we benefited from a spreadsheet created by Jeff Martin, Gene Eckhardt, and Brandon Konen that allowed easy entry of recognition results. Known as our “Advanced Capture Analysis and Comparison Tool”, it is highly regarded in our ranks. The data was crunched automatically, allowing us to quickly establish baselines, compare our recognition results with those from other products, share our results with coworkers and principals, and so on.
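The spreadsheet’s actual formulas aren’t described here, but the core calculation behind a recognition baseline is simple. As a hypothetical Python sketch, assume each recorded row holds the product, document, field, the expected value keyed by hand, and the value the product recognized; field-level accuracy per product is then the share of exact matches. The rows below are invented, not our test results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (product, document_id, field_name, expected_value, recognized_value)
Row = Tuple[str, str, str, str, str]


def accuracy_by_product(rows: Iterable[Row]) -> Dict[str, float]:
    """Share of fields per product whose recognized value exactly matches the expected value."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for product, _doc, _field, expected, recognized in rows:
        total[product] += 1
        if recognized.strip() == expected.strip():
            correct[product] += 1
    return {product: correct[product] / total[product] for product in total}


# Invented rows for illustration only.
rows = [
    ("Product A", "inv-001", "InvoiceNumber", "10023", "10023"),
    ("Product A", "inv-001", "Total", "1,234.50", "1,234.50"),
    ("Product A", "inv-002", "Total", "987.00", "937.00"),
    ("Product B", "inv-001", "Total", "1,234.50", "1,234.50"),
]
print({p: round(a, 3) for p, a in accuracy_by_product(rows).items()})
# {'Product A': 0.667, 'Product B': 1.0}
```

Exact matching is deliberately strict; a comparison tool might also track near misses, such as a single misread digit, separately.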
Lessons Learned Doc Revisited
It’s a privilege to work with industry veterans such as Jeff Doyle and Gene Eckhardt on a project like this. They brought years of experience to improve every process we covered. While evaluating the Lessons Learned doc, they were able to extrapolate possible impacts in environments and scenarios they have seen in the field. They also added fresh mitigation alternatives for working through the problems we encountered. Our Lessons Learned docs are part of a large and valuable knowledge base that ImageSource has added to year after year.
Findings and Conclusions Write-Up
After a demonstration to some coworkers who needed to ramp up on our configuration, we collaborated to create a summary document, and here Gene took the lead. We were able to draw on the Lessons Learned doc, the Advanced Capture Analysis and Comparison Tool, and meeting notes to piece together our findings and quantify our conclusions. The summary outlined the scope of our efforts, including excluded activities, our environment and the products tested, results, conclusions, general observations, and Best Practice recommendations.
It’s one thing to kick the tires on a car before a purchase. For the analogous software task, though, a methodical, thorough and thoughtful approach is the norm at ImageSource.
Implementing Content Management Systems with Multiple Environments
September 11, 2009
A common recommendation we make when designing Enterprise Content Management systems is the use of multiple environments. I am referring to the use of Development, Test, and/or QA environments to complement a Production environment. There are many advantages to deploying systems with multiple environments, and I would like to discuss the role of multiple environments and the advantages of implementing them for your ECM system.
Depending on the size and complexity of the solution, different supporting environments are recommended. For example, for a smaller departmental solution with little or no custom development, we commonly recommend only one supporting environment, used for both development and testing. Now let’s take another example, where a customer has an enterprise-level ECM system with custom development and a requirement for minimal system downtime. The following is a common layout for this type of system:
- Development Environment – Used for custom development and for preparing changes to the ECM system for testing. This environment is usually much smaller than the Production Environment and commonly runs on virtual servers/machines.
- Test Environment – Used for end-to-end testing of changes to the system. Changes are certified in this environment before moving to the QA or Production Environment. This environment is usually smaller than Production, but it is imperative that its functionality be consistent with Production to ensure proper testing and certification of the changes.
- Quality Assurance Environment – This environment serves a couple of purposes and it closely mirrors the architecture of the Production Environment. Performance load testing and client acceptance are performed in this environment. In some instances, this environment can also serve as a disaster recovery environment in the event of a Production outage.
- Production Environment – Used for the ECM Production System.
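One way to make the promotion order concrete is to encode it. The Python sketch below is a hypothetical illustration, not a feature of any particular ECM product: a change may move into an environment only after it has passed every environment that precedes it on the path. The path is a parameter, so the smaller departmental layout with a single combined supporting environment fits as well.

```python
from typing import Sequence, Set

# Hypothetical promotion path matching the enterprise layout above.
ENTERPRISE_PATH = ("Development", "Test", "QA", "Production")


def can_promote(passed: Set[str], target: str,
                path: Sequence[str] = ENTERPRISE_PATH) -> bool:
    """Allow a change into `target` only if it has passed every earlier environment."""
    if target not in path:
        raise ValueError(f"unknown environment: {target!r}")
    earlier = path[: path.index(target)]
    return all(env in passed for env in earlier)


print(can_promote({"Development", "Test"}, "QA"))    # True
print(can_promote({"Development"}, "Production"))    # False

# A smaller departmental layout with one combined supporting environment.
DEPARTMENTAL_PATH = ("Development/Test", "Production")
print(can_promote({"Development/Test"}, "Production", DEPARTMENTAL_PATH))  # True
```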
This environment configuration is representative of a common layout for multiple environments, but depending on the organization and solution it can vary. The ECM solution architects play a valuable role in recommending the optimal configuration. At ImageSource, we have extensive knowledge and experience with ECM architecture and take a great deal of pride in designing the correct layout for the customer and the solution.
Now that we have discussed the role that different environments can play, I would like to touch on some key points to implementing multiple environments for your ECM System. These points are based on my many years of experience implementing these types of solutions.
- It is imperative to put in place and enforce the procedures for implementing changes in these environments. It is very common for us to find ECM solutions implemented with multiple environments, but all of the changes are being implemented directly into Production and the other environment(s) are out of date with Production. It takes additional time to follow the procedures and keep all environments in sync, but it is the key to mitigating the risk of Production issues, such as downtime.
- It is not a requirement to match the horsepower and server structure of your Production Environment with your Development and Test Environments. There are scenarios where this makes sense, but for the most part a scaled down version is sufficient.
- The time and investment it takes to implement and support multiple environments will pay for itself by mitigating risk and system downtime. Many items can adversely affect a Production system, e.g. operating system patches, ECM software patches or upgrades, the addition of new functionality, and the modification of existing functionality. Being able to test and certify that changes will not adversely affect the Production system before implementation can greatly reduce a company’s exposure to security or compliance issues, as well as avoid the cost of having the Production system go down.
- The Development or Test level Environments can provide valuable training for your technical staff. Having environments that will not affect the Production Environment gives a company’s ECM technical workers the ability to hone their skills and make mistakes without the negative impact of bringing down the Production system.
In summary, I would highly recommend the use of multiple environments when implementing or supporting an Enterprise Content Management system. It is an investment of time and money to implement and maintain multiple environments, but the payoff will quickly be realized through risk mitigation. If you already have an ECM solution implemented without multiple environments, I would recommend evaluating what Production issues cost your business and using that information to make the case for implementing an additional environment or environments. If you have multiple environments but they are not synchronized with, or being used to complement, the Production Environment, now is the time to re-evaluate your procedures and leverage your company’s investment to support the health of the system.
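Keeping environments in sync is easier to enforce when drift is visible. As a hypothetical sketch, assume each environment’s components and patch levels are recorded as a simple name-to-version inventory; comparing a supporting environment against Production then lists everything that has fallen out of step. The component names and versions below are invented.

```python
from typing import Dict, Optional, Tuple

Inventory = Dict[str, str]  # component -> installed version or patch level (hypothetical)


def drift_from_production(
    production: Inventory, other: Inventory
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return components whose versions differ, as (production, other) pairs.

    A component missing from one side shows up with None on that side.
    """
    components = sorted(set(production) | set(other))
    return {
        c: (production.get(c), other.get(c))
        for c in components
        if production.get(c) != other.get(c)
    }


prod = {"OS patch level": "2009-08", "ECM server": "5.2", "Custom workflow": "1.4"}
test_inventory = {"OS patch level": "2009-08", "ECM server": "5.1"}
print(drift_from_production(prod, test_inventory))
# {'Custom workflow': ('1.4', None), 'ECM server': ('5.2', '5.1')}
```

An empty result means the supporting environment matches Production for every recorded component, which is the state the change procedures are meant to maintain.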
Ryan S. Keller
Project Manager
ImageSource, Inc.

