College Transcript Processing refers to converting a paper-based transcript into an electronic transcript using software that runs OCR on the scanned paper version, locates specific data within the transcript, and saves that data for later use. The reason for processing a transcript with software is to move the data into another system for storage and retrieval faster than manual entry by a data entry specialist allows. This is a somewhat difficult task for the following reasons:
Every College presents similar data in a very different format.
Almost all colleges attempt to prevent the copying of the paper transcript through various copy protection methods. Most of these methods render the data on the transcript almost unreadable.
The data that is similar on a transcript falls into several main areas:
College Identifying Information
Student Identifying Information
Previous Colleges Attended Information
Degrees Awarded Information
The data is similar but not the same on each college transcript. In addition, the layout of a transcript varies greatly from college to college. Session/Course data could take up the entire width of the paper for one college, but be formatted as multiple columns of data for another. Many variations need to be taken into consideration when using OCR to find and extract the data.
So far the ABBYY FlexiCapture 9.x software has been able to handle most of these issues out of the box. One of its most powerful features, I am finding, is the scripting language for writing rule scripts, custom scripts, and export scripts that can correct OCR issues and assist the Verification Operator, improving efficiency and throughput.
The rule, custom, and export scripts can be written in VBScript or JScript. There is some documentation on the ABBYY classes and objects, but not a whole lot. Most of what I have done has been through trial and error or, in specific cases, from examples provided by Tech Support. However, the scripts that have been developed work well for correcting OCR issues and providing automated checks of extracted field data. Through custom scripts there is even the option to run a database lookup on extracted data and return other fields from the database to help provide a complete set of validated information.
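To illustrate the kind of correction and checking such scripts perform, here is a minimal sketch in Python rather than in the product's own scripting languages. It does not use any ABBYY classes or objects. The character substitution table, the two patterns, and the function names are all assumptions made up for this example.

```python
import re

# Hypothetical table of letters that OCR commonly confuses with digits.
OCR_DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
)

# Assumed formats: units like "3" or "4.5", grades like "A", "B+", "NP".
UNITS_PATTERN = re.compile(r"^\d{1,2}(?:\.\d{1,2})?$")
GRADE_PATTERN = re.compile(r"^(?:[A-D][+-]?|F|P|NP|W|I)$")


def fix_numeric_field(raw: str) -> str:
    """Replace letters that OCR commonly substitutes for digits."""
    return raw.strip().translate(OCR_DIGIT_FIXES)


def check_course_line(units_raw: str, grade_raw: str) -> dict:
    """Correct the units field, then flag values that fail a format check."""
    units = fix_numeric_field(units_raw)
    grade = grade_raw.strip().upper()
    errors = []
    if not UNITS_PATTERN.match(units):
        errors.append(f"units not numeric: {units_raw!r}")
    if not GRADE_PATTERN.match(grade):
        errors.append(f"unknown grade: {grade_raw!r}")
    return {"units": units, "grade": grade, "errors": errors}


if __name__ == "__main__":
    print(check_course_line("3.O", "b+"))
    print(check_course_line("x", "Z"))
```

A line that still has entries in `errors` after correction is the kind of field one would want flagged for the Verification Operator instead of passed through silently.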
This has been a learning experience, but it is proving to be well worth the effort. Getting the data off the paper and into the system used to evaluate a student for enrollment now takes fewer man-hours than it did under the old manual data entry.
Last time we took a broad look at using the ABBYY FlexiCapture product to perform College Transcript processing. This time I would like to start looking at some lower-level details of the product that show where FlexiLayouts end and Project Level Document Definitions begin.
Let’s start with some basic definitions. A Layout is used to help the Recognition Engine to identify the document in a batch as belonging to a particular Document Definition. A Layout is also used to help the Recognition Engine to find the locations of the data to extract and place in fields the user can then see and modify if necessary. A Document Definition is used to determine the type of processing to perform on the document, the fields contained in the document and the type of data those fields should have.
Many Colleges and Universities must handle transcripts received from other Colleges and Universities for Student Enrollment processing. The receiving College goes through several steps to process a Student Application for Enrollment and any transcripts associated with it, and those steps may require the Application to pass through several different people. Usually a folder is created to hold all the documents received in support of the Application. Once all required documents are received, this folder is passed on to Evaluators, who evaluate the Application and make an Admit or Deny decision.
How a transcript is processed varies by College. In one case the information on the transcript is manually entered line by line into an ERP system for Student Processing. This is very labor intensive and slows the processing of a Student’s Application. In addition, the Evaluator must review the transcript, mark the lines that cannot be transferred, manually add up the Units Attempted and Units Earned/Completed, and calculate a GPA to see if the Student qualifies for admission. A lot of manual processing is done on a single transcript, and a Student may have one or more transcripts from previous institutions, all of which have to be evaluated in the same way to determine admission to the school.
Two products can help to automate this Student Enrollment Processing: Oracle I/PM and ABBYY FlexiCapture. Oracle I/PM can provide both image storage for all the documents received and a workflow to route the document sets electronically through the various stages of processing. This removes paper from the process, along with the associated problems of losing track of documents or folders and time-consuming searches to find a document.
Since paper transcripts differ from one institution to the next, reading the data from a transcript and placing it into common fields that can be uploaded into an ERP system requires a product that can handle many different formats. The ABBYY FlexiCapture product allows for the capture of information from a free-format form like a transcript. It has a module called FlexiLayout that allows the developer to specify where on a page a specific data set may reside. It can handle table data like Session/Course data, which can be repeated multiple times on a single transcript. It can handle multi-page transcripts and multiple columns of data on a single page that continue on the next page. The product is flexible enough in the design stages to let the developer handle almost all the common issues that come up when extracting data from a transcript.
Using the ABBYY FlexiCapture product and releasing the extracted transcript data and the image into I/PM yields several time and labor savings in Enrollment Processing:
Almost all manual routing of paper is eliminated. This saves time both in moving folders from one desk to another and in searching for the correct folder in which to place newly acquired documents.
Manual line-by-line data entry of transcripts is reduced. Even with the ABBYY product, some labor is still required to review the extraction results and ensure the data is correct. However, this Validation step takes a lot less time and effort than manual line-by-line data entry. The data can then be uploaded electronically into an ERP/Student Processing system.
Since the extracted data is now in the I/PM repository, it is easy to develop a form that lets the evaluator select the Session/Course lines to include in a Total Summary and then press a button to have the totals calculated automatically. This sure beats the manual method of using a hand calculator.
Using both of these products helps lower the cost of processing Student Applications for Enrollment and reduces the time-consuming effort of transcript processing.
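The calculation behind such a Total Summary button can be sketched in a few lines of Python. This is an illustration only and not I/PM form code. The 4.0 grade point scale, the rule that any grade above F earns its units, and the `CourseLine` and `summarize` names are assumptions. Each institution's own grading rules would replace them.

```python
from dataclasses import dataclass

# Assumed 4.0 scale; lines with grades outside this table are left out of the totals.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


@dataclass
class CourseLine:
    course: str
    units: float
    grade: str
    selected: bool = True   # set by the evaluator on the form


def summarize(lines):
    """Total the evaluator's selected lines and compute a GPA."""
    chosen = [l for l in lines if l.selected and l.grade in GRADE_POINTS]
    attempted = sum(l.units for l in chosen)
    earned = sum(l.units for l in chosen if GRADE_POINTS[l.grade] > 0)
    points = sum(l.units * GRADE_POINTS[l.grade] for l in chosen)
    gpa = round(points / attempted, 2) if attempted else 0.0
    return {"units_attempted": attempted, "units_earned": earned, "gpa": gpa}


if __name__ == "__main__":
    transcript = [
        CourseLine("ENG 101", 3, "A"),
        CourseLine("MATH 120", 4, "B"),
        CourseLine("HIST 110", 3, "F"),
        CourseLine("PE 10", 1, "A", selected=False),  # marked non-transferable
    ]
    print(summarize(transcript))
```

Here the three selected lines give 10 units attempted, 7 units earned, and 24 grade points, for a GPA of 2.4.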