Model Predictions

Metamaze has several AI models that are used when processing a document:

  • OCR model - This model reads words and objects, together with their coordinates, from non-text file formats, so that the document can be used in the software and its text can serve as input for the other models.

  • Page management model - This model splits a file that contains several documents into individual documents. There is one page management model per language.

  • Document classification model - This model determines the document type of each document. There is one document classification model per language.

  • Object detection model - This model can recognise objects such as signatures. There is one object detection model per language.

  • Entity extraction model - This model extracts information from the document. There is one entity extraction model per language per document type.

Which models will or will not be used in your project depends on the steps you have activated in the project settings.
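The scoping rules above (a generic OCR model, one model per language for page management, document classification and object detection, and one entity extraction model per language per document type) can be sketched as a lookup key. This is only an illustration: `ModelKey`, `model_key` and the model kind names are hypothetical and are not part of the Metamaze software or its API.

```python
from typing import NamedTuple, Optional


class ModelKey(NamedTuple):
    """Hypothetical identifier for a model (not a Metamaze type)."""
    kind: str
    language: Optional[str] = None
    document_type: Optional[str] = None


# Model kinds that exist once per language (names are assumptions).
PER_LANGUAGE_KINDS = {"page_management", "document_classification", "object_detection"}


def model_key(kind: str, language: str, document_type: Optional[str] = None) -> ModelKey:
    """Return the key of the model that would handle a document."""
    if kind == "ocr":
        # The OCR model is generic, so language and document type are ignored.
        return ModelKey("ocr")
    if kind in PER_LANGUAGE_KINDS:
        # One model per language; the document type plays no role.
        return ModelKey(kind, language)
    if kind == "entity_extraction":
        # One model per language per document type.
        if document_type is None:
            raise ValueError("entity extraction needs a document type")
        return ModelKey(kind, language, document_type)
    raise ValueError(f"unknown model kind: {kind}")
```

With this sketch, an English invoice and an English contract would share the same document classification key but resolve to different entity extraction keys.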

The OCR model is a generic model that is maintained and improved by the Metamaze technical team, so you do not have to maintain it yourself. The other models are specific to each project, and you maintain them yourself through the following activities:

  • Add training data

  • Annotate data

  • Manage training data

  • Train and deploy models

  • Expand training data with production data

Once your models are ready, you can start processing documents. Whenever a model makes a prediction, it also returns a certainty score (confidence). For example: entity A has the value 'BBB' in the document, with a certainty score of X%.

You can use the manual intervention module to add a human validation check. If manual intervention is enabled, a document is flagged for review in the following cases:

  • The document type is recognised by the model with a confidence score lower than the threshold set for that document type.

  • An entity is recognised by the model with a confidence score lower than the threshold set for that entity type.

  • An entity cannot be validated or converted to the desired format (e.g. type is date but the value found is not a date).

  • An entity has been marked as mandatory but was not found by the model.
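The four cases above can be sketched as a single check that collects the reasons a document would be sent for review. This is a minimal illustration, not the Metamaze implementation: `EntityPrediction`, `review_reasons`, the threshold dictionaries and the ISO date format are all assumptions made for the example, and confidence is assumed to be expressed as a value between 0.0 and 1.0.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Set


@dataclass
class EntityPrediction:
    """Hypothetical shape of one extracted entity."""
    entity_type: str
    value: str
    confidence: float  # assumed to lie between 0.0 and 1.0


def is_valid_date(value: str) -> bool:
    """Assumed validation rule: the value must be an ISO date (YYYY-MM-DD)."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def review_reasons(
    doc_type: str,
    doc_confidence: float,
    entities: List[EntityPrediction],
    doc_thresholds: Dict[str, float],
    entity_thresholds: Dict[str, float],
    date_entities: Set[str],
    mandatory_entities: Set[str],
) -> List[str]:
    """Return every reason for manual review; an empty list means none."""
    reasons = []
    # Case 1: document type confidence below its threshold.
    if doc_confidence < doc_thresholds[doc_type]:
        reasons.append("document type: confidence below threshold")
    for entity in entities:
        # Case 2: entity confidence below the threshold for its type.
        if entity.confidence < entity_thresholds[entity.entity_type]:
            reasons.append(f"{entity.entity_type}: confidence below threshold")
        # Case 3: the value cannot be validated in the desired format.
        if entity.entity_type in date_entities and not is_valid_date(entity.value):
            reasons.append(f"{entity.entity_type}: value is not a valid date")
    # Case 4: a mandatory entity was not found at all.
    found = {entity.entity_type for entity in entities}
    for name in sorted(mandatory_entities - found):
        reasons.append(f"{name}: mandatory entity missing")
    return reasons
```

Returning the full list of reasons, rather than a single yes or no, lets a reviewer see at a glance which parts of the document need attention.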

You can make manual intervention mandatory for each step (document prediction and/or entity extraction), so that every document is checked. If you do not have historical documents for training your models, you can also start with manual intervention only. The Metamaze software provides all the tools to label documents quickly, which is faster than fully manual work. In the meantime, you can label your data and, after a certain amount of time, switch to an automated process.