How often have you found yourself looking at a PDF on the screen, scratching your head and wondering why a word or phrase you can plainly see on the monitor or printout just can’t be found by searching the PDF? You are not alone. Very recently, EID, Inc. identified the issue during a client’s implementation of an ECM system and reached out to the PDF Association for insight. Duff Johnson, Executive Director of the PDF Association, wrote an excellent article, “What you may be missing when you search PDF documents,” explaining a common cause of the problem and what should be done to address it within content and records management solutions.
This is an important issue because many organizations rely on search features, especially when searching for or within documents in litigation matters, and public agencies rely on them when complying with Freedom of Information or public records requests (FOIA or PRA).
An organization will often assume that its searches of PDF files are working just fine when they locate 500, or 5,000, or 50,000 responsive documents. Now imagine that, after staff have identified these documents and legal has reviewed them, the organization realizes that the searches it was relying on did not produce accurate results. Worse, what if the problem is discovered during litigation? The impact could range from extreme embarrassment to potentially adverse instructions or fines.
Organizations that rely on, create, and/or receive PDF files should consider evaluating whether their text extraction and search software uses PDF structures correctly, and whether an ANSI 25 Trustworthy Assessment should be prepared to evaluate all content management practices. The search problem does not occur in every environment or with every PDF, but it does point to the importance of evaluating content ingested from outside your organization, where you have no control over the software used or the content generated. That is the case for most groups. The evaluation becomes more important as you use more of the advanced features available in the various PDF standards, features that were not available in the older formats originally used for document scanning.
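One simple way to begin such an evaluation is a spot check: have a person confirm that certain phrases are visible on the rendered page, then verify that the text extraction feeding your search index actually returns them. The sketch below is only an illustration of that idea, not a method taken from the article mentioned above. The function names, file names, and the stand-in extractor are all hypothetical; in practice you would pass in whatever extraction routine your own system uses.

```python
from typing import Callable, Dict, Iterable, List


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so line breaks do not hide a match."""
    return " ".join(text.lower().split())


def find_missing_phrases(
    samples: Dict[str, Iterable[str]],
    extract_text: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Return, per document, the visible phrases the extractor failed to return.

    samples maps a document path to phrases a person confirmed are visible
    on the rendered page. extract_text is the extraction routine under test
    (hypothetical here; plug in the one your search index relies on).
    """
    missing: Dict[str, List[str]] = {}
    for path, phrases in samples.items():
        extracted = normalize(extract_text(path))
        not_found = [p for p in phrases if normalize(p) not in extracted]
        if not_found:
            missing[path] = not_found
    return missing


# Stand-in extractor for illustration only: it pretends the second file
# yields text that does not match what is shown on the page.
def fake_extract(path: str) -> str:
    fake_output = {
        "contract.pdf": "This Agreement is made\nbetween the parties",
        "invoice.pdf": "Tot al due: 1 5 0",
    }
    return fake_output[path]


report = find_missing_phrases(
    {
        "contract.pdf": ["agreement is made between"],
        "invoice.pdf": ["total due"],
    },
    fake_extract,
)
print(report)
```

Any document that appears in the report contains text a reader can see but a search would miss, which makes it a candidate for closer review of how the file was produced and how your software extracts its text.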
As you consider this, you should also consider whether it is time to store documents in your content or records management system upon receipt, rather than after completion as we all have done for many, many years. The reasoning behind this approach is that whenever we create or receive an electronic file today, we need to save it somewhere (unless we decide to delete it), so why not save it in a trustworthy content or records management environment? After all, if you cannot reliably store information in a system and later retrieve it exactly as you saved it, why would you want to keep your important documents there?
This and other articles are prepared by the Chair of the AIIM C27 Document Management Standards Committee (including C27.3 Trustworthy Assessments) and the Advanced Data Storage Committee.
For more information on ECM industry standards and best practices, contact:
Mr. Robert Blatt, MIT, LIT
C27 Committee Chair
C21 Committee Chair
Ms. Betsy Fanning
AIIM International Director of Standards
ISO/TC 171 Secretariat