Model Training

Before documents can be processed automatically, you need to upload training data to train your AI models, such as the page management, document classification and/or entity extraction models. Training data consists of representative examples of the documents you expect to process.

These documents must be uploaded to the system and labelled:

  • Split files into the correct documents (page management)

  • Indicate the correct document type (document classification)

  • Indicate the correct language (language recognition)

  • Label the entity values in the documents (entity extraction)

Which of these actions are required depends on the flow settings:

  • If each file you want to process contains exactly one document, you don't need a page management model and don't have to train one.

  • If you have created only one document type, you should not train the document classification model or indicate any document types.

  • If you only process documents in one language, you do not have to indicate the language.

  • If you don't want to recognize information (entities) in documents, you shouldn't label any entities.
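The rules above amount to a small decision table. The sketch below shows one way to derive the required labelling tasks from a flow's settings. It assumes hypothetical setting names: FlowSettings, its fields and required_labelling_tasks are illustrative only and are not part of Metamaze's interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FlowSettings:
    # Hypothetical settings, named for illustration only (not a Metamaze API).
    files_contain_multiple_documents: bool
    document_type_count: int
    language_count: int
    extract_entities: bool


def required_labelling_tasks(settings: FlowSettings) -> List[str]:
    """Return the labelling tasks a flow needs, following the rules above."""
    tasks = []
    # Page management is only needed when one file can hold several documents.
    if settings.files_contain_multiple_documents:
        tasks.append("page management")
    # Classification only makes sense with more than one document type.
    if settings.document_type_count > 1:
        tasks.append("document classification")
    # Language labels only matter when documents come in several languages.
    if settings.language_count > 1:
        tasks.append("language recognition")
    # Entity labels are only needed when information should be extracted.
    if settings.extract_entities:
        tasks.append("entity extraction")
    return tasks


# Example: one document per file, three document types, one language, with extraction.
example = FlowSettings(
    files_contain_multiple_documents=False,
    document_type_count=3,
    language_count=1,
    extract_entities=True,
)
print(required_labelling_tasks(example))
# → ['document classification', 'entity extraction']
```

Each rule is checked independently, so adding a document type or a second language only switches on the corresponding task without affecting the others.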

After uploading the files and completing the necessary labelling, you can train and test the models. Once their predictions reach the desired accuracy, you can roll the models out (deploy them) to production to start processing documents automatically. You can always roll back by redeploying an older model.

Metamaze also provides a module for quality checks on the training data, in which you can modify, delete and/or add data.

This chapter covers each of these steps in more detail:

  • Uploading training data

  • Annotating training data

  • Managing training data

  • Training models and rolling them out to production

  • Checking the quality of training data and/or adjusting it