Optical Character Recognition (OCR)
When it comes to innovative document management solutions, Optical Character Recognition (OCR) is at the very forefront. Optical Character Recognition is a process in which an image file of a document is scanned and the text is “read” and converted into electronic characters. This makes the document “searchable”: both the text within the document and its filename can be searched.
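A searchable archive can be pictured as an index that maps each word to the files containing it. The minimal Python sketch below assumes the OCR step has already produced plain text for each scanned file. The filenames and text in `ocr_text` are hypothetical, and a real document management system would use an OCR engine and a proper search database rather than an in-memory dictionary.

```python
from collections import defaultdict

# Hypothetical OCR output: filename -> text recognised from the scanned image.
# In a real system this text would come from an OCR engine, not be hard-coded.
ocr_text = {
    "invoice_0412.tif": "Invoice 0412 Gross Total £1,250.00 VAT Reg No GB123456789",
    "delivery_note_77.tif": "Delivery Note 77 Received by J Smith",
}


def build_index(docs):
    """Map each lower-case word to the set of files whose OCR text or filename contains it."""
    index = defaultdict(set)
    for filename, text in docs.items():
        # Index every whitespace-separated word of the recognised text.
        for word in text.lower().split():
            index[word].add(filename)
        # Index the filename too, split on underscores, e.g. "delivery", "note", "77".
        stem = filename.rsplit(".", 1)[0]
        for part in stem.lower().split("_"):
            index[part].add(filename)
    return index


def search(index, query):
    """Return, sorted, the files that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))


index = build_index(ocr_text)
print(search(index, "gross total"))  # ['invoice_0412.tif']
print(search(index, "delivery"))     # ['delivery_note_77.tif']
```

Because both the page text and the filename feed the same index, a query such as "delivery" finds the file whether the word appears on the scanned page, in its name, or both.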
Optical Character Recognition can save your business valuable time and money, as it speeds up the retrieval of documents from within a highly efficient document management system. This allows you to optimise your business workflows and implement long-term solutions that fit within your wider digital transformation.
The type of Optical Character Recognition solution recommended by our Sales team will depend on your individual requirements and on whether the data will primarily be used for searching, for populating index fields, or for extraction into another system altogether.
Please contact Document Data Group for more information about how we can help and what solutions work for your business.
There are four main ways that Optical Character Recognition can be used in the context of electronic document management (EDM):
Full text OCR – This is a process in which the entire document is recognised and all of the text found is stored with the image file, enabling the user to search for any word, number or string of characters.
Zonal OCR – When the documents being processed have a consistent format, the data found in fixed areas of the page can be extracted and used to populate the index fields that identify the document. This is commonly used for delivery notes and feedback forms, and it greatly reduces the time required to index high volumes of repetitive documents.
Profiled OCR – Here each document type is profiled to create a “map” of where particular data will be found on the page. The software identifies which “map” to use from one or more unique identifiers, such as a VAT registration number, then applies that map to the document to extract the key fields as required. This is commonly used to process purchase invoices, where the extracted data can then be imported into a financial software package such as Sage, reducing the time spent on manual transaction keying.
Context OCR – This is similar to Profiled OCR, but instead of mapping the areas of the document that will contain the required data, the software “learns” where the data is based on the surrounding text. For example, if the system finds the words “gross total”, the number to the right with a £ sign is very likely to be the gross total value for the invoice. This method takes time and still relies on human input to deal with exceptions.
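The three extraction methods above (zonal, profiled and context OCR) can be sketched in Python, assuming the OCR engine has already returned each recognised word with the pixel position of its top-left corner. The `Word` class, the sample page, the zone coordinates and the `PROFILES` map are hypothetical illustrations, not the interface of any particular product. The context example uses a fixed, hand-written rule standing in for the learned behaviour described above; it is not a learning system.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: int  # left edge in pixels (hypothetical coordinates)
    y: int  # top edge in pixels


# Hypothetical OCR result for one invoice page, listed in reading order.
page = [
    Word("VAT", 50, 40), Word("Reg", 90, 40), Word("GB123456789", 130, 40),
    Word("Invoice", 50, 100), Word("No", 130, 100), Word("INV-0412", 170, 100),
    Word("Gross", 50, 500), Word("Total", 110, 500), Word("£1,250.00", 300, 500),
]


def zonal_extract(words, zone):
    """Zonal OCR: join the words whose top-left point lies inside zone (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = zone
    return " ".join(w.text for w in words if x0 <= w.x < x1 and y0 <= w.y < y1)


# Profiled OCR: one hypothetical "map" of named zones per supplier,
# keyed by a unique identifier such as a VAT registration number.
PROFILES = {
    "GB123456789": {"invoice_no": (160, 90, 400, 120), "gross_total": (250, 490, 500, 520)},
}


def profiled_extract(words, profiles):
    """Pick the map whose identifier appears on the page, then read each of its zones."""
    texts = {w.text for w in words}
    for identifier, zones in profiles.items():
        if identifier in texts:
            return {field: zonal_extract(words, zone) for field, zone in zones.items()}
    return None  # no matching map: the document needs manual indexing


def context_extract(words, label):
    """Context rule: find the label, then return the nearest £ amount to its right on the same line."""
    parts = label.lower().split()
    if not parts:
        return None
    for i in range(len(words) - len(parts) + 1):
        window = words[i:i + len(parts)]
        if [w.text.lower() for w in window] == parts:
            last = window[-1]
            # "Same line" is approximated as a vertical offset of at most 10 pixels.
            candidates = [w for w in words
                          if abs(w.y - last.y) <= 10 and w.x > last.x and w.text.startswith("£")]
            if candidates:
                return min(candidates, key=lambda w: w.x).text
    return None


print(zonal_extract(page, (160, 90, 400, 120)))  # INV-0412
print(profiled_extract(page, PROFILES))          # {'invoice_no': 'INV-0412', 'gross_total': '£1,250.00'}
print(context_extract(page, "gross total"))      # £1,250.00
```

The difference between the approaches shows in what has to be configured: zonal and profiled extraction need coordinates set up in advance for each layout, while the context rule only needs the label text, so it can cope with an invoice where the total appears in a different place.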