What are the existing solutions to process invoices?

This post presents the existing solutions available for invoice recognition, and explains how artificial intelligence can help with template-free invoice parsing.

Introduction

Invoice processing has long been a labor-intensive task that requires humans in the loop, from the collection of invoices through the extraction and validation of data to the payment process. New efforts to improve this workflow have involved sending and receiving invoices digitally, yet according to the Billentis Market Report for eInvoicing, 70% of all invoice processing globally is still paper-based. At the forefront of the invoice digitisation revolution, paper scans and image-based PDF invoices are being processed with OCR (Optical Character Recognition).

Types of invoice formats

Nowadays, invoices can be sent and received in various formats: on paper, electronically as PDFs, or directly via Electronic Data Interchange. However, paper invoices, and indeed the majority of invoices, come in unstructured formats: their data cannot be automatically read from the document and entered into accounting systems. In fact, even digital invoices can be unstructured when they exist solely as Visual Digital Format Invoices, in other words, digital images of scanned invoices. These digital images come in picture formats such as JPG, PNG, and GIF; scanned formats such as TIF; or, most commonly, as a PDF containing a scanned image.

It is important to distinguish between the unstructured PDF and the structured Data PDF, which contains a data layer that reflects the fields of an invoice in a structured way. While the latter allows a computer to process the information contained in the document, a common PDF only allows a human reader to see its content. Apart from Data PDF, examples of structured invoice formats include XML, EDI, HTML, CSV, and template spreadsheets.

  • Unstructured Digital: Visual Digital Format Invoices - the primary aspect is visualisation of the content, which assumes human processing.
  • Structured Digital: Data Format Invoices - documents with a data layer that are either standardized (based on publicly shared specifications such as EN 16931, UBL, CII, EDIFACT, etc.) or unstandardized (not following publicly shared specifications) - the primary aspect is computer processing of the data, but some formats may also provide for visualisation.
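To make the distinction concrete, here is a minimal Python sketch of why a structured invoice needs no recognition step: the fields are already labelled, so a program can read them directly. The XML layout, element names, and values below are invented for illustration; real standards such as UBL or CII use namespaces and a far richer schema.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical, simplified invoice XML (NOT a real UBL/CII document).
INVOICE_XML = """
<Invoice>
  <InvoiceNumber>INV-001</InvoiceNumber>
  <IssueDate>2020-01-15</IssueDate>
  <Supplier>Example Supplier Ltd</Supplier>
  <VatAmount currency="EUR">210.00</VatAmount>
  <TotalAmount currency="EUR">1210.00</TotalAmount>
</Invoice>
"""


def read_structured_invoice(xml_text):
    """Read invoice fields straight from the data layer; no OCR involved."""
    root = ET.fromstring(xml_text.strip())
    total = root.find("TotalAmount")
    return {
        "invoice_number": root.findtext("InvoiceNumber"),
        "issue_date": root.findtext("IssueDate"),
        "supplier": root.findtext("Supplier"),
        # Decimal avoids floating-point rounding on monetary amounts.
        "vat": Decimal(root.findtext("VatAmount")),
        "total": Decimal(total.text),
        "currency": total.get("currency"),
    }


invoice = read_structured_invoice(INVOICE_XML)
print(invoice["invoice_number"], invoice["total"], invoice["currency"])
```

A scanned image of the same invoice carries none of these labels, which is why the unstructured case needs either a human reader or a recognition step.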

Where OCR comes in

OCR (Optical Character Recognition) was hailed as the solution for processing unstructured documents, and since a large share of the invoices being exchanged are unstructured, many companies have adopted online OCR for invoice scanning and processing. According to Billentis, more than 40% of large businesses use front-end scanning and OCR for invoice processing, and the trend is growing robustly. Since paper invoices and invoices sent as images are unstructured documents with no data layer, the data needs to be either entered into the accounting systems manually by a clerk, or scanned and extracted using traditional OCR with a set of rules. OCR transforms the images into searchable, machine-readable text that is used to populate the defined invoice fields that need to be detected. The output is structured data that can be automatically processed by accounting systems. In another article, we will dive deeper into what OCR is and how it is used in combination with other cognitive technologies to extract invoice data.

What are the alternatives for invoice data extraction?

When it comes to extracting fields from invoices (VAT, invoice line, amount, etc.), companies have several options, which are not all equally beneficial in terms of cost, speed, and accuracy. Currently, there are multiple ways in which companies process incoming invoices:

  1. Manually: The invoices are sent and received on paper or by email, and a clerk enters the invoice fields into the accounting system.
  2. Digital PDF Capture: The supplier is asked to manually fill in a PDF with a data layer that reflects the fields of an invoice in a structured way. These PDFs can then be read and processed by the receiving company's accounting systems. The cost and onboarding efforts for suppliers are very high in that case.
  3. Invoice portal: The supplier manually fills in a form on a portal with their invoice information, resulting in invoices output as structured documents sent to the company. This works similarly to the Digital PDF Capture but is used exclusively between the specific supplier and company. The cost and onboarding efforts for suppliers are also very high in that case.
  4. Private digital billing channels: The supplier sends invoices in XML or EDI formats as structured documents containing all the invoice fields. This method requires suppliers to adapt their billing applications to send invoices in such formats.
  5. Traditional OCR with template-based rules: Scanned invoices are processed automatically by an online OCR tool and the retrieved text is classified as certain fields based on a set of rules. The output is structured invoice data that can be read and processed by accounting systems. However, this method requires setup time for each company and does not scale to other types of documents.
  6. Template-free invoice recognition: Thanks to Machine Learning, invoices can be read and understood whatever their format. No setup is needed. However, some of these solutions are in their infancy and produce uneven results. We observe that some of them are still double-checked by humans, which explains their high unit price. Compared to other solutions, our solution Fyn.ai offers very high accuracy and speed at a low cost.
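As a rough illustration of option 5, the Python sketch below shows what template-based rules over OCR text can look like: one regular expression per field, written for one supplier's layout. The template, field labels, and sample texts are hypothetical and do not represent any particular OCR product; the OCR step itself is assumed to have already produced plain text.

```python
import re

# Hypothetical template for one supplier's layout: one regex rule per field.
SUPPLIER_A_TEMPLATE = {
    "invoice_number": re.compile(r"Invoice No:\s*(\S+)"),
    "vat": re.compile(r"VAT:\s*([\d.]+)"),
    "total": re.compile(r"Total:\s*([\d.]+)"),
}


def extract_fields(ocr_text, template):
    """Apply each rule to the OCR text; a field is None when its rule does not match."""
    fields = {}
    for name, pattern in template.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1) if match else None
    return fields


# Text as it might come out of OCR for the layout the template was written for.
KNOWN_LAYOUT = "Invoice No: INV-001\nVAT: 210.00\nTotal: 1210.00\n"

# Another supplier's layout, using different labels for the same information.
UNSEEN_LAYOUT = "Bill #2020-17\nTax 38.00\nAmount due 238.00\n"

known = extract_fields(KNOWN_LAYOUT, SUPPLIER_A_TEMPLATE)
unseen = extract_fields(UNSEEN_LAYOUT, SUPPLIER_A_TEMPLATE)
print(known)
print(unseen)
```

In this sketch the rules recover every field from the layout they were written for and nothing from the unseen one, so each new supplier layout would need its own template. That is the setup cost mentioned in option 5.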

So what's the best option for accounting professionals?

The end of basic OCR

Many believe that OCR software with template-based rules (see no. 5 above) has solved the manual data entry chores required to bring unstructured invoices (paper or images) into an electronic workflow for processing. This is far from true: the output is only highly accurate for formats that the software has already seen. Companies spend large sums on off-the-shelf OCR tools every year in the hope of managing piles of paper invoices better, yet they are discovering that it may not be the best strategy. The cost of setting up templates for different invoices and having them processed at an acceptable level of accuracy simply outweighs the expected benefits of workflow automation.

This is why many bookkeeping professionals we spoke with don’t trust online OCR solutions, and prefer to do the work themselves.

Why Artificial Intelligence is a game changer

Artificial Intelligence makes it possible to overcome the limitations of the solutions listed above by combining high reliability, low cost, and high speed. While some solutions are starting to embed Artificial Intelligence, none has yet managed to deliver a reliable, template-free, user-friendly AI solution.

That is why we launched Fyn.ai. Fyn uses AI to tackle the weaknesses of existing solutions. In short, Fyn is a smart invoice parsing solution that works with high accuracy precisely because, unlike current OCR solutions, it does not depend on templates, while remaining easy to use for users who are not expected to have a technical background.

Thanks to artificial intelligence, Fyn enables bookkeeping professionals to not only decrease booking costs, but also to truly automatically transform all invoices into structured data. No more templates, no more tinkering and verification, only results that you can rely on, at a very competitive price.

Try it for free soon and see what Fyn can do for your business.

Discover what Fyn can do