1 — Choose Your Battles

Just because you have purchased a great new scanning/capture/data entry automation application doesn’t mean that it makes sense to automate every type of document under the sun. Sure, you may feel empowered to spend the time or money required to automate the indexing of that quarterly report that is generated only four times per year, but that would be like hunting quail with a bazooka. Look at the feasibility and return on investment before jumping into a project. Always take the automation projects with the highest and fastest ROI first, and pass on projects with a low or negative net present value.
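As a rough illustration of that screening step, the sketch below computes a net present value and a simple payback period for each candidate project, drops the ones that do not pay for themselves, and orders the rest by fastest payback. The project names, dollar figures, three-year horizon, and 8% discount rate are all made-up assumptions; substitute your own numbers.

```python
from dataclasses import dataclass

@dataclass
class Project:
    name: str
    upfront_cost: float    # one-time cost to build the automation (assumed figure)
    annual_savings: float  # indexing labor saved per year (assumed figure)

def npv(project, years=3, rate=0.08):
    """Net present value: discounted yearly savings minus the upfront cost."""
    discounted = sum(
        project.annual_savings / (1 + rate) ** t for t in range(1, years + 1)
    )
    return discounted - project.upfront_cost

def payback_years(project):
    """Years of savings needed to recover the upfront cost (undiscounted)."""
    if project.annual_savings <= 0:
        return float("inf")
    return project.upfront_cost / project.annual_savings

def prioritize(projects, years=3, rate=0.08):
    """Keep only positive-NPV projects, fastest payback first."""
    worthwhile = [p for p in projects if npv(p, years, rate) > 0]
    return sorted(worthwhile, key=payback_years)

# Hypothetical candidates, for illustration only.
candidates = [
    Project("quarterly report", upfront_cost=5000, annual_savings=200),
    Project("vendor invoices", upfront_cost=20000, annual_savings=30000),
    Project("timesheets", upfront_cost=4000, annual_savings=12000),
]

for p in prioritize(candidates):
    print(f"{p.name}: NPV {npv(p):,.0f}, payback {payback_years(p):.2f} years")
```

With these invented figures, the quarterly report never recovers its setup cost, so it falls off the list entirely.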

2 — Choose the Most Accurate Recognition Technology

Obviously, your choice may be limited by the records that you are trying to automate. However, if you do have a choice, follow this simple rule: barcode/patch code recognition is the most accurate, followed by OCR (machine-printed text recognition), then constrained ICR (handwriting recognition), and lastly unconstrained ICR.

3 — Test For Recognition Accuracy Early

Even when using barcode recognition or OCR, the accuracy of recognition will likely be less than 100% in the long run. Test the accuracy of the recognition component of your capture/automation design early in the process, using a relatively large sample. This greatly reduces the chance of surprises down the road.

Additionally, if you are in the evaluation stage of selecting an application, make sure that the supplier of the product performs the demonstration with a large sample of your documents. Avoid demonstrations that use standard documents prepared by the vendor. Why? You want to ensure that the automated indexing procedure they have developed works with a high degree of accuracy on your documents, not only on the documents they prepared for the demo. A good trick to throw at vendors is to provide 10 samples of the document type to be automated for the demonstration. Then, at the time of the demo, give them 100 more documents (of the same type) that they have never seen before. This gives you a far more honest measure of the accuracy of the application and the automation process.
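To make the point about sample size concrete, here is a minimal sketch that scores recognized index values against hand-keyed truth and puts a 95% Wilson confidence interval around the result. The function names and sample values are illustrative assumptions, and the Wilson interval is simply one common way to express the uncertainty, not something prescribed by any capture product.

```python
import math

def field_accuracy(recognized, truth):
    """Return (correct, total) for recognized values versus hand-keyed truth."""
    if len(recognized) != len(truth):
        raise ValueError("recognized and truth must be the same length")
    if not truth:
        raise ValueError("sample is empty")
    correct = sum(r == t for r, t in zip(recognized, truth))
    return correct, len(truth)

def wilson_interval(correct, total, z=1.96):
    """95% Wilson score interval (by default) for the true accuracy rate."""
    p = correct / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical results: every document in each sample was indexed correctly.
for sample_size in (10, 110):
    low, high = wilson_interval(sample_size, sample_size)
    print(f"{sample_size} of {sample_size} correct: true accuracy likely "
          f"between {low:.1%} and {high:.1%}")
```

Under this interval, a perfect score on 10 documents is still consistent with a true accuracy as low as roughly 72%, while a perfect score on all 110 (the 10 samples plus the 100 unseen documents) raises that floor to roughly 97%.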

4 — Key on Documents You Control

In many capture applications, the logic used to automate indexing and separate an individual document (set of pages) from a batch is to key off of some identification page.  In most cases, it is easier to achieve full automation with a high level of accuracy if your identification page is one that you control.

Assume that you need to scan and index all of your vendor bills into your DM system. Automating the indexing for these documents can be difficult, since you have no control over them and there are many different formats of vendor invoices. For example, 1000 different vendors could mean 1000 or more different invoice formats. Creating an automated indexing process would be very time-consuming in this case. Furthermore, your vendors could change their invoice formats without any notice, which can result in constant reworking of your data entry automation scheme. Additionally, automating vendor invoices typically requires human quality control, which will increase your overall costs.

As an alternative, explore automating the input using records you control as the identification page. Using our vendor invoices example, you can use the checks you cut to pay the invoices as the identification page. Your bank checks, in conjunction with your accounting system’s database, can typically provide an automation process that is nearly 100% accurate and fully automated. The key here is a change in the process. Rather than having each individual vendor invoice in your DM system as the process output, you would have a check packet in your system as the output. The check packet would consist of the check (or check stub) followed by all of the invoices the check paid for. If you ever need to retrieve a specific invoice, you can search your accounting system for the check number that paid it and then pull up the check packet in your DM system.

Sure, this process does add an extra step to retrieval, but it cuts down dramatically on the input process costs and would provide a greater ROI due to reduced input costs related to quality assurance and the like.
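The check packet approach can be sketched in a few lines. The example below assumes each scanned page has already been classified as a check or an invoice and that the check number has been read from the check page; the page dictionaries, the `invoice_to_check` lookup standing in for the accounting system, and the function names are all hypothetical.

```python
def split_into_packets(pages):
    """Group a scanned batch into packets; each check page starts a new packet."""
    packets = {}
    current = None
    for page in pages:
        if page["kind"] == "check":
            current = page["check_number"]
            if current in packets:
                raise ValueError(f"check {current} appears twice in the batch")
            packets[current] = [page]
        elif current is None:
            raise ValueError("batch must start with a check page")
        else:
            # Any page after a check belongs to that check's packet.
            packets[current].append(page)
    return packets

def find_packet(invoice_id, invoice_to_check, packets):
    """Two-step retrieval: invoice -> check number -> check packet."""
    check_number = invoice_to_check[invoice_id]  # accounting system lookup
    return packets[check_number]                 # DM system lookup

# Hypothetical batch: check 1001 paid two invoices, check 1002 paid one.
batch = [
    {"kind": "check", "check_number": "1001"},
    {"kind": "invoice"},
    {"kind": "invoice"},
    {"kind": "check", "check_number": "1002"},
    {"kind": "invoice"},
]
packets = split_into_packets(batch)
packet = find_packet("INV-7", {"INV-7": "1002"}, packets)
print(f"INV-7 is in a packet of {len(packet)} pages")
```

Only the check page has to be recognized, so the variety of vendor invoice layouts never enters the automation logic.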

5 — Quality Assurance

Any index automation process is prone to some level of error. Therefore, it is best practice to establish a quality assurance procedure, even if it is a very brief one. Even though today’s scanning devices have features to detect multi-feeds and auto-threshold scanned images, you will want to verify image quality, even if only on a random basis.
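A random spot check can be as simple as the sketch below, which picks a fraction of each batch for a person to review. The 5% default rate, the function name, and the document IDs are assumptions for illustration; choose a rate that fits your volume and error tolerance.

```python
import math
import random

def pick_qa_sample(document_ids, rate=0.05, seed=None):
    """Randomly choose documents for manual review (at least one per batch)."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be greater than 0 and at most 1")
    ids = list(document_ids)
    # Round up so small batches still get checked; never ask for more than exist.
    k = min(len(ids), max(1, math.ceil(len(ids) * rate)))
    return random.Random(seed).sample(ids, k)

# Hypothetical batch of 40 document IDs, reviewing one in four.
to_review = pick_qa_sample([f"DOC-{n:03d}" for n in range(40)], rate=0.25)
print(f"Review {len(to_review)} documents: {sorted(to_review)}")
```

Passing a fixed `seed` makes the selection repeatable, which can help if you need to show later which documents were checked.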

6 — Pre and Post-Verification

It is important to ensure that you track what was intended to be processed and what was actually processed.  At a minimum, simple page counts and record (individual documents in the batch) counts should be employed and verified with the output.  Even the most thorough index automation process can come across an unexpected file that will throw the process off.
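A minimal version of that check compares the counts recorded before scanning with the counts found in the output. The `BatchCounts` structure and the figures below are assumptions for illustration; the expected counts would come from your batch cover sheet or prep log, and the actual counts from whatever your capture application exports.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BatchCounts:
    pages: int    # total pages in the batch
    records: int  # individual documents in the batch

def verify_batch(expected, actual):
    """Return a list of discrepancies; an empty list means the batch reconciles."""
    problems = []
    if actual.pages != expected.pages:
        problems.append(
            f"page count: expected {expected.pages}, got {actual.pages}"
        )
    if actual.records != expected.records:
        problems.append(
            f"record count: expected {expected.records}, got {actual.records}"
        )
    return problems

# Hypothetical batch: one separator page was missed, merging two documents.
issues = verify_batch(BatchCounts(pages=250, records=40),
                      BatchCounts(pages=250, records=39))
for issue in issues:
    print("MISMATCH:", issue)
```

A matching page count with a lower record count, as in this example, usually points to a missed identification page rather than a lost sheet.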

7 — Documentation

There are thousands of technical writers working for thousands of software companies. Some are definitely better than others. However, regardless of how good these technical writers are, boilerplate documentation is never the best fit for a specific process. Take the time to document the process (with screenshots and videos if possible) for your scanning/indexing staff. The time spent documenting the process will pay off tenfold down the road.

8 — Outsource

Last, and certainly not least, outsource any manual indexing processes that make sense to outsource. All too often, firms spend time and money staffing people to perform tasks that are not part of the firm’s distinctive competence. Outsourcing makes sense in many situations, even for small companies and small projects. Keep in mind that you can lower costs through ‘hybrid’ outsourcing, where only part of the process is outsourced.
