Document Processing Flow
Metamaze automates the processing of documents and e-mails in a range of formats, including, but not limited to, PDF files, Word files, images and scans. Every document that is processed goes through a number of steps:
- Ingesting the document
- Pre-processing of a document
- Document creation (page management and document classification)
- Information extraction (entity extraction)
- Object recognition (such as signatures, stamps and logos)
- Validation of business rules
- Output of processed values
Metamaze has several services (APIs) that let you ingest files programmatically. These API services are secured through different security mechanisms, which you configure in the API security settings: mutual SSL authentication, basic authentication with email and password, bearer token authentication, and so on. It is also possible to upload files manually through the user interface.
These services also let you send metadata from your own data sources along with a file, so that business rules can compare this metadata with the information extracted from the documents.
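As a minimal sketch of what a client might prepare before calling an ingestion service, the snippet below builds the `Authorization` header for basic and bearer token authentication and bundles a file name with metadata. The header formats are standard HTTP. The function `build_upload_request`, its field names and the metadata keys are hypothetical and are not taken from the Metamaze API.

```python
import base64
import json


def basic_auth_header(email: str, password: str) -> dict:
    """Build a standard HTTP Basic Authorization header."""
    raw = f"{email}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def bearer_auth_header(token: str) -> dict:
    """Build a standard HTTP Bearer Authorization header."""
    return {"Authorization": f"Bearer {token}"}


def build_upload_request(file_name: str, metadata: dict, auth_headers: dict) -> dict:
    """Describe an upload: hypothetical structure, for illustration only."""
    return {
        "headers": dict(auth_headers),
        "file_name": file_name,
        # Metadata from your own data source, serialised as JSON.
        "metadata": json.dumps(metadata, sort_keys=True),
    }


request = build_upload_request(
    "loan_application.pdf",
    {"net_wage": 2150.0},  # hypothetical metadata key
    bearer_auth_header("example-token"),
)
```

The metadata travels with the file so that it is available later when business rules are evaluated.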
If the textual content of the ingested file is not in a machine-readable format, as with a scanned file, an image or a PDF, the document must first be converted to text. For this purpose Metamaze uses OCR services provided by Amazon, Google or Azure. These services recognise the parts of a document, such as words, lines and paragraphs, so that they can be used as text for document classification and information extraction.
The document classification and page management process first splits an uploaded file into separate pages. These pages are then merged back into the appropriate documents (page management model), and the type and language of each document are detected automatically (document classification).
For example, a loan application can be uploaded as a single PDF file containing multiple documents, such as pay slips, a purchase offer and more. During this step, Metamaze splits the PDF file into several documents, each with its corresponding type. These separate documents are then ready for the next steps of the Metamaze processing pipeline, such as recognising and processing information from them.
If you only want to process one document type, you don't need document classification. If each uploaded file always represents one document that does not need to be split or merged with any other files, or if each page in an upload should become a separate document, you do not need page management.
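The idea of merging pages back into documents can be sketched as follows. This is an illustration only, not Metamaze's page management model: it assumes each page already carries a predicted document type and a flag marking whether it starts a new document. The names `PagePrediction`, `Document` and `group_pages` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PagePrediction:
    page_number: int
    document_type: str
    starts_new_document: bool


@dataclass
class Document:
    document_type: str
    pages: List[int]


def group_pages(predictions: List[PagePrediction]) -> List[Document]:
    """Merge consecutive pages into documents based on start-of-document flags."""
    documents: List[Document] = []
    for page in predictions:
        if page.starts_new_document or not documents:
            # Open a new document typed after its first page.
            documents.append(Document(page.document_type, [page.page_number]))
        else:
            # Continue the document that is currently open.
            documents[-1].pages.append(page.page_number)
    return documents


# A loan application uploaded as one four-page PDF.
upload = [
    PagePrediction(1, "pay slip", True),
    PagePrediction(2, "pay slip", False),
    PagePrediction(3, "pay slip", True),
    PagePrediction(4, "purchase offer", True),
]
documents = group_pages(upload)
```

Here the upload yields three documents: a two-page pay slip, a one-page pay slip and a one-page purchase offer.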
In this step, information is extracted from each document. Each piece of extracted information is formatted according to your format configuration. From a pay slip, for example, the following can be extracted: the date of the document; the name, address and national register number of the employee; the employer name and address; the gross wage, the net wage and the day of payment. Suppose you have configured dates to use the format DD/MM/YYYY, as in 01/12/2020. If the text '1 December 2020' is recognised in the document as the day of payment, this value is converted to 01/12/2020.
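The date conversion described above can be sketched with Python's standard library. This is a minimal illustration, not Metamaze's implementation: the list of accepted input patterns and the function name `normalise_date` are assumptions, and month names are parsed in English (the default locale).

```python
from datetime import datetime
from typing import Optional

# Input patterns tried in order; an assumed, illustrative list.
INPUT_PATTERNS = ["%d %B %Y", "%d/%m/%Y", "%Y-%m-%d"]


def normalise_date(raw: str, output_format: str = "%d/%m/%Y") -> Optional[str]:
    """Convert a recognised date string to the configured output format."""
    for pattern in INPUT_PATTERNS:
        try:
            return datetime.strptime(raw.strip(), pattern).strftime(output_format)
        except ValueError:
            continue  # Try the next pattern.
    return None  # The text could not be parsed as a date.


print(normalise_date("1 December 2020"))  # prints 01/12/2020
```

Returning `None` for unparseable text leaves it to the caller to decide how to treat a value that does not fit the configured format.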
If you don't want to recognise information and only want to split files into documents and/or predict document types, you don't need this step.
Object recognition makes it possible to recognise signatures or other objects that are not text.
If you don't want to recognise objects, you don't need this step.
Business rules are used to validate the information extracted from the document through conditions you can create.
Metamaze provides the settings needed to create different conditions, which can be combined via boolean operators such as AND and OR. These conditions enable you to compare different elements with each other:
- The value of an entity extracted from a document, e.g. the net salary from the pay slip.
- How many times an entity is present in the document, e.g. two signatures must be present.
- The number of pages of a document
- Metadata coming from external data sources and sent along with the document. For example, when a customer enters their net wages in a web form in addition to uploading a loan application, this value can be sent along and validated against the net wages Metamaze recognises on the pay slip.
- A previously set fixed value
- A regular expression
Elements can be compared using comparison operators such as smaller than, larger than and equal to. The outcome of the business rule validation is sent along with the output of the document and is also visible in the 'production pipeline module'.
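To make the combination of conditions concrete, here is a minimal sketch of how such rules could be evaluated in code. In Metamaze itself, business rules are configured through settings. The document structure, entity names and rule names below are hypothetical.

```python
import operator
import re

# Comparison operators named in the text: smaller than, larger than, equal to.
COMPARATORS = {"<": operator.lt, ">": operator.gt, "==": operator.eq}


def compare(left, op, right):
    """Compare two elements with one of the supported comparison operators."""
    return COMPARATORS[op](left, right)


def evaluate_rules(document: dict) -> dict:
    """Evaluate illustrative business rules and return each outcome."""
    entities = document["entities"]
    outcomes = {
        # Entity value compared with metadata sent along with the upload.
        "net_wage_matches_form": compare(
            entities["net_wage"][0], "==", document["metadata"]["net_wage"]
        ),
        # Number of times an entity is present in the document.
        "two_signatures": compare(len(entities["signature"]), "==", 2),
        # Number of pages compared with a previously set fixed value.
        "fewer_than_four_pages": compare(document["page_count"], "<", 4),
        # Entity value checked against a regular expression.
        "payment_date_format": re.fullmatch(
            r"\d{2}/\d{2}/\d{4}", entities["payment_date"][0]
        ) is not None,
    }
    # Conditions combined with AND and OR.
    outcomes["valid"] = (
        outcomes["net_wage_matches_form"] and outcomes["two_signatures"]
    ) and (outcomes["fewer_than_four_pages"] or outcomes["payment_date_format"])
    return outcomes


pay_slip = {
    "entities": {
        "net_wage": [2150.0],
        "signature": ["signature-1", "signature-2"],
        "payment_date": ["01/12/2020"],
    },
    "page_count": 2,
    "metadata": {"net_wage": 2150.0},
}
result = evaluate_rules(pay_slip)
```

Returning every individual outcome alongside the combined result mirrors the idea that the validation outcome is sent along with the document output.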
If you don't want to validate the extracted information with business rules, you don't need this step.
When all steps have been completed, the result is sent to your own service, application or data source. Using the project settings, you can select the desired configuration to get the information into your system.