As integrators, there is a great deal we can do to make an install a success, but very little we can do to guarantee its continuing success. One of the most common problems is the handoff of an installation to the client, in particular the maintenance of the database.

I will talk in particular about installations using Microsoft SQL Server. More often than not it is left to us as the integrators to create the database (and sometimes to install the SQL Server instance as well). We choose the path for the database filegroups and accept the defaults for everything else. This works for the operational purposes of our software, but the client’s DBA should make changes and put additional procedures in place to keep the newly implemented system performing well.

When we visit these systems at a later date we often find that the log files have never been truncated, which calls into question whether any backups have been done at all. The log file is the record of changes that have taken place since the last log truncation; it allows the database to be restored to a point in time after the last full backup, to reverse an adverse action or recover from a system failure. On these systems the log files are often much larger than the data files. In a maintained system a DBA would have established a regular backup strategy that takes full database backups and backs up the log at intervals through the day; under the full recovery model it is the log backups that allow the log to be truncated.

The other common finding is that file growth for the database is still set to a preset autogrowth amount. These presets are typically either a small fixed increment such as 10 MB or a percentage such as 10%. When the database was first established, an estimate was made to arrive at the initial file size for the data and log files. For databases that need to grow, these presets can be very inefficient; among other things, the growth is likely to happen during business hours. Take a 200 GB database that started at 150 GB: if it grows by only 10 MB each time it needs space, it will have taken roughly 5,000 small, inefficient file allocations to make up the 50 GB. If told to grow by 10 percent, each allocation is 15 GB or more, and it takes place during business hours, when it can affect users. A database whose maintenance has been taken up by a DBA will have been monitored, and autogrowth will have been turned off. The DBA will have scheduled data file growth based on measurements of the database’s past growth rate, and the new space allocation will most likely have taken place outside business hours.
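To make the arithmetic in the 150 GB to 200 GB example concrete, here is a rough Python sketch. It is only an illustration of the numbers above, not a tool that talks to SQL Server; the function names are made up for this example, and it assumes binary units (1 GB = 1024 MB).

```python
import math
from typing import List

MB_PER_GB = 1024  # assumption: binary units


def fixed_growth_events(start_gb: float, target_gb: float, step_mb: float) -> int:
    """Number of fixed-size growth events needed to get from start to target."""
    needed_mb = (target_gb - start_gb) * MB_PER_GB
    return math.ceil(needed_mb / step_mb)


def percent_growth_steps(start_gb: float, target_gb: float, percent: float) -> List[float]:
    """Size in GB of each percentage-based growth event until target is reached."""
    size = start_gb
    steps = []
    while size < target_gb:
        step = size * percent / 100  # each growth is a percentage of the current size
        steps.append(step)
        size += step
    return steps


# Hypothetical example values taken from the paragraph above.
print(fixed_growth_events(150, 200, 10))
print(len(percent_growth_steps(150, 200, 10)))
```

With a fixed 10 MB increment the count runs into the thousands; with 10 percent there are only a handful of growth events, but each one is 15 GB or larger.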

There may be myriad reasons why this handoff fails, including the lack of a DBA in the organization. I do not purport to know the answer, only that on a follow-on visit to a client it is worth inspecting the database for these settings so they can be brought to the client’s attention and, hopefully, the IT department can be engaged to take on the maintenance that needs to happen.

 

Jeff Doyle
Senior System Architect
ImageSource, Inc.

ILINX Product Suite

July 31, 2010

I am not usually out to promote specific products on this blog, but I have been getting really excited about the latest advancements in the ILINX Product Suite. It is an area where I, along with other experienced ECM technologists, have applied our expertise to create and refine solutions that provide real-world value for businesses implementing or using ECM solutions. Take a minute to read this quick post and judge for yourself the value that ILINX Products can provide for your organization.

You may be hearing the word ILINX used in Enterprise Content Management circles more and more these days. From the humble beginnings of a simple release script connecting a document capture system to an ECM repository, the ILINX Product Suite has grown into a set of powerful, easy-to-use products that provide quick ROI. There are multiple levels to the ILINX Product Suite, ranging from a full-blown, web-client-based document capture system (ILINX Capture) and an ECM repository (ILINX Content Store) to a variety of middleware products, like ILINX Integrate, that can save time and boost productivity.

If you are not familiar with all that the Product Suite has to offer, check out the ILINX website for the details and product demos.

-Ryan Keller

ILINX Integrate Redux

July 26, 2010

ILINX Integrate has been nicely summarized by John Linehan in his December 19, 2009 blog.  I saw Shad White and John’s ILINX Integrate demonstration at last year’s Nexus and was really impressed.  Simply stated, this application allows you to take data from one application and paste it into another without modifying either application.  And to avoid confusion Integrate has also been known as ILINK AIK or Application Extender Kit.  Works for me!

Now that I have worked with the Integrate program I have come up with some tips and tricks that will allow you to get up to speed with this tool a little faster.  This document assumes you’ve at least partially perused some ILINX Integrate documentation as I’ll refer to components without describing them.

Always budget sufficient time for your project! Not every project is right the first time. With some testing and massaging, you’ll get there. But remember that taking your time and really testing your project will pay big dividends. Are your cut and paste results consistent? Do you need to insert any delays? Have you tested using different logins? Have you moved your target and source windows around in testing? Have you accessed browser-based screens from all possible user links? Have you kept your eye on the Integrate log? Did you test from the Studio and the Client? Multiple machines? Multiple OSes? Multiple browsers? How about under various phases of the moon? OK, scratch the last one as it is (for sure) unnecessary.

Read the documentation! The Designer Guide PDF file is your ticket but the on-line help is also very good.  There’s a lot of functionality packed in and you may find some project shortcuts.  More likely you’ll find the solutions to problems you weren’t thinking Integrate could solve (ok this is a long-cut!).  There’s a mail task component, an FTP task, a script task, an XSLT task, a screen capture task, and many more.  This is not a steroidal snipping tool but rather a feature-rich application extension environment.

Consider starting with a simple thick client application. I’ve used Windows calculator as a handy target application for testing. Make your connection, and define the screen which holds the fields you wish to work with. It helps to already have data in the field you wish to define. When you map the field you should see the contents of the field in the Integrate Value field. In this example the field value is ‘brian eno’.

It is also useful to add a dialog task. The dialog task lets you run quick tests to confirm that the data you are trying to grab is obtainable. You can add an event to this task; events are essentially task triggers. You can configure this event as a Koolbar button (the Koolbar is a taskbar containing buttons you configure). When executing a project, click the button to see the values you are grabbing.

And remember you can add many buttons, associated with many tasks in your project. Label your buttons well!

Is your data not pasting when it should?  Here’s the first thing to do: change the default value of your connection’s Field Refresh Option.

Set this to ‘Read Fields Prior To Task Execution’. Then check the log file. You may need to redefine the screen that encapsulates your target fields.

Is your Koolbar not popping up when it should? See the last entry above. And for browser applications there is a Browser Window property field. Here you may substitute portions of the window caption or URL text with wildcard characters (*,?,#). This can make all the difference when trying to identify a screen! You may also wildcard HTML element index values and HTML FRAME element URL values. To do this, click on ‘Manage Hierarchy’.
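If wildcard matching on window captions is new to you, this little Python sketch shows the general idea. It is not Integrate code. The meanings I give the three characters here (* for any run of characters, ? for exactly one character, # for exactly one digit) are my assumption for illustration, so check the Designer Guide for what Integrate actually does. The function names and captions are made up.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a caption pattern into a compiled regular expression.

    ASSUMED semantics (not taken from the Integrate documentation):
      *  any run of characters
      ?  exactly one character
      #  exactly one digit
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        elif ch == "#":
            parts.append(r"\d")
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts))


def caption_matches(pattern: str, caption: str) -> bool:
    """True if the whole caption matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(caption) is not None


# Hypothetical captions: the invoice number changes from window to window.
print(caption_matches("Invoice ### - *", "Invoice 123 - Windows Internet Explorer"))
print(caption_matches("Invoice ### - *", "Invoice 12A - Windows Internet Explorer"))
```

The point is that one pattern can identify a screen whose caption changes with every record, which is exactly when a literal caption match fails.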

I’ve found it helpful to keep a record of the hierarchy by copying and pasting the hierarchy AND screenshotting the screen element hierarchy (like the example below). This record can speed debugging efforts.

It’s easy to add a web URL task.  Use input parameters to map field values like this: http://localhost/search.asp?query=${parm1}+${parm2}.  You can then map parm1 and parm2 to fields in your project using the Data Mapping editor. Easy-schmeezy!
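To picture what the parameter mapping does, here is a small Python sketch of the same kind of substitution. It is not how Integrate works internally, just an analogy: Python’s string.Template happens to use the same ${name} placeholder style. The build_url helper is made up for this example, and URL-encoding the values is my own precaution; I don’t know whether Integrate encodes them for you.

```python
from string import Template
from urllib.parse import quote_plus

# The URL from the example above, with its two input parameters.
URL_TEMPLATE = "http://localhost/search.asp?query=${parm1}+${parm2}"


def build_url(template: str, fields: dict) -> str:
    """Fill ${name} placeholders with URL-encoded field values."""
    encoded = {name: quote_plus(str(value)) for name, value in fields.items()}
    return Template(template).substitute(encoded)


# Hypothetical field values mapped to parm1 and parm2.
print(build_url(URL_TEMPLATE, {"parm1": "brian", "parm2": "eno"}))
# prints http://localhost/search.asp?query=brian+eno
```

If a mapped field can contain spaces or characters like &, test those cases early so you know what actually reaches the target page.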

Hopefully this is enough to whet your appetite and save some time getting started. Doing careful and well-tested development work will result in a smooth-running application that will pay for itself time and time again. Reminds me of the prep work adage I try to never forget: “Give me six hours to chop down a tree and I will spend the first four sharpening the axe.” (Abraham Lincoln, 1809-65). I like that this quote came from Abe and not Jack Torrance (remember Nicholson in The Shining?).

Last time we took a look at the Abbyy FlexiCapture product to perform College Transcript processing in a broad overview.  This time I would like to start looking at some lower level details of the product that show where FlexiLayouts end and Project Level Document Definitions begin.

Let’s start with some basic definitions.  A Layout is used to help the Recognition Engine to identify the document in a batch as belonging to a particular Document Definition.   A Layout is also used to help the Recognition Engine to find the locations of the data to extract and place in fields the user can then see and modify if necessary.  A Document Definition is used to determine the type of processing to perform on the document, the fields contained in the document and the type of data those fields should have.

Now for some details on the FlexiLayout Design Studio. The studio can load a sample image or document, OCR it, and let the designer start identifying specific elements on the image such as text, separator lines, white space, or pictures. These identified elements can then be used to locate the regions from which field data is extracted. Fields can be either single fields or table fields and can be marked as repeatable if they occur more than once. On a transcript, Student Name, Date of Birth and Student Identifier are usually single fields located near the top of a page, while Sessions and Course information are more likely to be a table of fields that repeats on the page.

To capture this data into the correct field, a fixed identifier such as Static Text must be used to limit the search for the actual data to a specific region. For Student Name it might be a static text label to the left of the student’s name, which then serves as an anchor. Once the anchor is found, the field region is defined relative to it: an x and y offset from the anchor location is used to draw a box around the information to extract. The same steps are used to locate the data for all the other fields to capture from the image, including the table fields.
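The anchor-plus-offset idea is easier to see with numbers. The Python sketch below is only an illustration of the geometry; it is not FlexiLayout Studio code, and the Rect type, the field_region function, and the pixel values are all made up for this example. It assumes the field sits to the right of the label, so the offset is measured from the anchor’s top-right corner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """A box on the page image, in pixels."""
    left: int
    top: int
    right: int
    bottom: int


def field_region(anchor: Rect, dx: int, dy: int, width: int, height: int) -> Rect:
    """Box for the field, offset (dx, dy) from the anchor's top-right corner."""
    left = anchor.right + dx
    top = anchor.top + dy
    return Rect(left, top, left + width, top + height)


# Hypothetical location of the static text label "Student Name:".
label = Rect(left=100, top=200, right=260, bottom=230)

# The name is expected just to the right of the label, on the same line.
name_box = field_region(label, dx=10, dy=0, width=400, height=30)
print(name_box)
# prints Rect(left=270, top=200, right=670, bottom=230)
```

Because the box is computed from wherever the label is found, the field still lands in the right place when a scan is shifted or the layout moves between pages.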

Now that the layout identifies the location of the fields to extract data, it is transferred over and serves as the basis for a Document Definition. The fields identified in the layout are created in the Document Definition with the same names and data types from the layout. At this point, you can use the Document Definition as is or modify it to add additional fields and/or data validation scripts. This turns out to be a very useful feature, since not all transcripts contain exactly the same data, but releasing this data to a backend system or database does require some consistency in the names of the fields and their types. So if one Document Definition for a transcript has the Student SSN, then all Document Definitions for other transcripts should have a Student SSN even if the actual transcript image does not contain such a value.

In addition, data validation scripts written in either VB or Java can correct the data the Recognition Engine extracts so the operator does not have to perform this work. For example, when reading the Course data for Units Attempted the value read should be 3.00 but in a lot of cases is read as 3. 00 with a space in the middle. A data validation script can be written to automatically remove the space to get to the correct value. A data validation script can also help to split up a field into multiple values. For instance, the Student Name is most likely defined as a single field that contains the whole name of the student. But many backend systems or databases like to have the name broken down into its parts, like First Name, Middle Name and Last Name. Therefore a data validation script can be written to split the name into its parts and assign the data to the separate fields.
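To show the kind of logic such a script contains, here is a sketch of both cleanups in Python. This is not FlexiCapture script code and it does not use the product’s scripting objects; the function names are made up, and the name split assumes a simple “First Middle Last” order, which real transcripts will not always follow (for example “Last, First”).

```python
import re


def fix_units(raw: str) -> str:
    """Remove stray whitespace inside a numeric value, e.g. '3. 00' -> '3.00'."""
    return re.sub(r"\s+", "", raw)


def split_name(full_name: str) -> dict:
    """Split 'First [Middle ...] Last' into parts using a naive whitespace split."""
    parts = full_name.split()
    if not parts:
        return {"first": "", "middle": "", "last": ""}
    if len(parts) == 1:
        return {"first": parts[0], "middle": "", "last": ""}
    return {
        "first": parts[0],
        "middle": " ".join(parts[1:-1]),  # empty when there is no middle name
        "last": parts[-1],
    }


print(fix_units("3. 00"))
print(split_name("Jane Q Public"))
```

In a real Document Definition the results would be assigned back to the Units Attempted field and to separate First Name, Middle Name and Last Name fields.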

Again, we have only just started to scratch the surface of the capabilities and features of the Abbyy FlexiCapture product. To really get a feel for this product requires a week-long training course that is way beyond this blog. If you have any interest in how this product can help you, then by all means contact an ImageSource Sales Representative through our web site at www.imagesourceinc.com.

  

Oracle IPM 11g Released!

March 27, 2010

For those of you who have not heard, Oracle has released the next generation of their Enterprise Content Management Software, Imaging and Process Management (IPM) 11g. This version is the first major step that Oracle has taken to tightly integrate the product into Oracle’s overall software architecture: IPM 11g has been completely overhauled to be part of the Fusion Middleware (FMW) tech stack. From the ECM perspective, Oracle now has a complete, seamlessly integrated, end-to-end offering that includes the storage repository, document management, business process management, library services, web publishing, records management, reporting/monitoring and application integration. This creates many advantages for customers that use or plan to use other Oracle products in their workplace, as well as for integrating and leveraging existing investments in non-Oracle software.

I have been working as a Systems Engineer and Project Manager with the IPM software base for over 8 years, through the Stellent IBPM acquisition, all the way back to the Optika Acorde and eMedia days. A couple of major differences in implementing the latest Oracle 11g version are the requirements for Oracle Universal Content Management (UCM) for the storage repository and Oracle WebLogic Server for the application/web server. I look at both of these requirements in a positive light. UCM and WebLogic Server are powerful, robust products that provide standard approaches to managing content storage and applications, respectively, from the FMW perspective. With that said, if you do not have experience with either UCM or WebLogic, you will need to get up to speed with them to succeed in an IPM implementation. Neither of these products can be installed through the “Next, Next, Next, Finished!” approach, so careful upfront planning and architecting is required to ensure a successful implementation.

Let’s talk about the new user interface a little bit. Oracle has followed suit with the rest of the major players in the ECM world by creating a completely web-based interface for performing all administrative and end-user functions. This makes administering the system much easier than in past versions, which required administration to be done through the “thick” client. Also, by moving to WebLogic Server, the full-featured web interface is now much more browser agnostic than in the past. The image viewer comes in two flavors that support over 400 file formats: a zero-footprint, view-only version and a rewritten Java applet that allows for full annotations, annotation security, and server-based conversion/rendering for access speed. The following are a couple of screen captures of the user interface from IPM 11g:

The Client Interface

The Zero Footprint Viewer

The Java Applet Viewer

Lastly, I would like to touch on a feature that is often overlooked when implementing ECM solutions: application integration. Oracle has done a great job in IPM 11g of providing some powerful capabilities for leveraging investments in Oracle and non-Oracle applications through integration. For a process where users are assigning metadata to a record in their business application, application integration allows this data to be pushed to and associated with the document stored in IPM 11g. Another example of integration would be the image enablement of a business application. In this case, while a user is accessing records in their ERP system, a hotkey, menu item, or button in the application screen can retrieve and display the document from IPM without the user ever having to leave their business application. These capabilities can create significant efficiencies in an organization through increased user productivity, the reduction of training and the simplification of support and administration.

All in all I see the changes that Oracle made in IPM 11g as great additions to an already strong platform.  Oracle has a product that not only adheres to their architecture model, but also will provide many benefits to the customers that use it.  Stay tuned to this blog for more information related to our experiences with Oracle IPM 11g.

Ryan Keller
Project Manager
ImageSource, Inc.
  