Train your models

To start automating your document processes, you need to train the relevant models. You can do this once you have sufficient labeled data (see below for more details).

Model management

Go to Menu > Training > Model management.

When enough documents are labeled as training documents, you can train or retrain the model.

You can start the following trainings:

  • Page management for a document type

  • Entity extraction for a document type

  • Document classification for the project

Note that page management and entity extraction trainings are always linked to a specific document type, while document classification trainings always cover the project as a whole.

In the overview of trainings, you can see the total number of documents and the number of new documents. New documents are documents that have not yet been used to train the model. This helps you decide whether a new training is worthwhile: if there are very few new documents, the model won't learn anything new, and triggering a training is unnecessary.

Training a model

Page management

You can only trigger a page management training if your training data contains at least 10 documents of the relevant document type that have more than one page. If your data contains more than one language, you need at least 5 documents with more than one page per language. Languages with fewer documents will be discarded, so the model will not make accurate predictions for those languages.
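The thresholds above can be sketched as a small pre-check. This is only an illustration, not the product's implementation: the function name and the `(language, page_count)` input shape are hypothetical, and two points the text leaves open are filled in as assumptions (all multi-page documents count toward the minimum of 10, and at least one language must remain after discarding).

```python
from collections import Counter

MIN_MULTIPAGE_DOCS = 10     # minimum multi-page documents for the document type
MIN_DOCS_PER_LANGUAGE = 5   # minimum per language when several languages are present


def page_management_check(docs):
    """Hypothetical pre-check; docs is a list of (language, page_count) tuples.

    Returns (can_train, kept_languages, discarded_languages).
    """
    # Only documents with more than one page count toward the thresholds.
    multipage_languages = [lang for lang, pages in docs if pages > 1]
    per_language = Counter(multipage_languages)

    if len(per_language) > 1:
        # Several languages: each one needs its own minimum.
        kept = {lang for lang, n in per_language.items()
                if n >= MIN_DOCS_PER_LANGUAGE}
    else:
        kept = set(per_language)
    discarded = set(per_language) - kept

    # Assumption: the total is counted before discarding languages,
    # and at least one language must remain.
    can_train = len(multipage_languages) >= MIN_MULTIPAGE_DOCS and bool(kept)
    return can_train, kept, discarded
```

For example, with 6 multi-page English documents and 4 multi-page French documents, this sketch reports that a training can be triggered, but French falls below the per-language minimum and would be discarded.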

When you click the train button for page management trainings, a pop-up window opens.

In the overview, you can see the total number of documents for each document type. Choose the document type(s) for which you want to launch a training. You also have the option to choose which projects' documents to use for the training.

When you click the next button, you get an overview of your training. If everything looks correct, click the start training button to start a new training.

Document classification

You can trigger a document classification training if there are at least 10 documents with status "Processed" in your training data. Additionally, you need a minimum of 5 documents for each of at least 2 different document types. If your data contains more than one language, you need at least 5 documents per language, per type. Languages and document types with fewer documents will be discarded, so the model will not make accurate predictions for those documents.
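As a rough illustration of these rules, the sketch below checks a list of documents against the thresholds. The function name and the `(status, doc_type, language)` tuple layout are hypothetical and do not reflect the product's API. How the per-type and per-language rules interact is also an assumption here: the sketch checks the overall conditions first and separately lists the type and language combinations that would be discarded.

```python
from collections import Counter

MIN_PROCESSED_DOCS = 10   # minimum "Processed" documents overall
MIN_DOCS_PER_GROUP = 5    # minimum per type, and per (type, language) pair
MIN_TYPES = 2             # minimum number of document types that qualify


def classification_check(docs):
    """Hypothetical pre-check; docs is a list of (status, doc_type, language).

    Returns (can_train, eligible_types, discarded_pairs).
    """
    # Only documents with status "Processed" are considered.
    processed = [(doc_type, lang) for status, doc_type, lang in docs
                 if status == "Processed"]

    per_type = Counter(doc_type for doc_type, _ in processed)
    eligible_types = {t for t, n in per_type.items() if n >= MIN_DOCS_PER_GROUP}

    # With more than one language, each (type, language) pair needs its own minimum.
    languages = {lang for _, lang in processed}
    discarded_pairs = set()
    if len(languages) > 1:
        per_pair = Counter(processed)
        discarded_pairs = {pair for pair, n in per_pair.items()
                           if n < MIN_DOCS_PER_GROUP}

    can_train = (len(processed) >= MIN_PROCESSED_DOCS
                 and len(eligible_types) >= MIN_TYPES)
    return can_train, eligible_types, discarded_pairs
```

In this sketch, a project with 6 English invoices, 5 English receipts and 2 French invoices (all "Processed") can be trained, but the French invoices would be discarded.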

When you click next, you get an overview of the training you are about to launch. When you click 'Start Training', a new model is trained based on the current training data.

If you want to exclude a document type from the document type prediction model, you need to specify this in the document type settings, see Configure a document type. You cannot exclude a document type through the model management module.

Entity extraction

To trigger a training for entity extraction, you need at least 10 documents with status "Processed" for the relevant document type, and at least one entity with at least 10 annotations. The number of documents per language is irrelevant. Entities with fewer than 10 annotations will be discarded.
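These requirements can be summarized in a short sketch. The function name and its inputs (a count of "Processed" documents and a mapping from entity name to annotation count) are hypothetical and only serve to illustrate the thresholds stated above.

```python
MIN_PROCESSED_DOCS = 10   # minimum "Processed" documents for the document type
MIN_ANNOTATIONS = 10      # minimum annotations for an entity to be kept


def entity_extraction_check(processed_doc_count, annotation_counts):
    """Hypothetical pre-check for one document type.

    annotation_counts maps an entity name to its number of annotations.
    Returns (can_train, kept_entities, discarded_entities).
    """
    kept = {entity for entity, n in annotation_counts.items()
            if n >= MIN_ANNOTATIONS}
    discarded = set(annotation_counts) - kept

    # At least one entity must reach the annotation minimum.
    can_train = processed_doc_count >= MIN_PROCESSED_DOCS and bool(kept)
    return can_train, kept, discarded
```

For instance, with 12 processed documents, 15 annotations for one entity and 4 for another, a training can be triggered, and only the first entity is included in it.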

When clicking the "Train" button for entity extraction, a pop-up window will open.

Select the document type(s) for which you want to launch a training and the type of training you want.

You can choose to do a full or an incremental training. A full training includes all the documents of the relevant document type with status "Processed" in your training data, and trains a model from scratch. Depending on your training data size, a full training can take a long time (even more than a day).

An incremental training only includes the documents that have been added since the last training, and simply updates the previous model version. Incremental trainings are much faster than full trainings, and should be selected whenever possible.

Full trainings are recommended only if the training is the first training ever, if your previous model version had a very low accuracy (F1 score < 70%), or if you notice that incremental trainings ceased to improve the quality of the predictions. In any other scenario, trigger an incremental training.
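The recommendation above amounts to a simple decision rule, sketched here for illustration. The function and parameter names are hypothetical, and the F1 score is assumed to be expressed as a fraction between 0 and 1 (so 70% is 0.70).

```python
LOW_F1_THRESHOLD = 0.70   # below this, the previous model counts as low accuracy


def choose_training_mode(has_previous_model, previous_f1=None,
                         incremental_stalled=False):
    """Return "full" or "incremental" following the guidance in the text."""
    # First training ever: there is no model to update incrementally.
    if not has_previous_model:
        return "full"
    # The previous model version had a very low accuracy.
    if previous_f1 is not None and previous_f1 < LOW_F1_THRESHOLD:
        return "full"
    # Incremental trainings no longer improve prediction quality.
    if incremental_stalled:
        return "full"
    # In any other scenario, prefer the faster incremental training.
    return "incremental"
```

Whether incremental trainings have "ceased to improve" the predictions is a judgment you make yourself; the sketch simply takes it as a flag.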

You can also choose which projects' documents to include when training a new model. When you click next, you get an overview of your training. When you click start training, a new model is trained.

Testing and improving your models

At the end of a training, some suggested tasks will be added to the tasks module (see Tasks). These tasks allow you to do two things:

  • get an idea of how well your models perform by looking at real model predictions, without having to deploy your model to production

  • improve your training data and model accuracy by executing the tasks

The cycle of annotating data, reviewing annotations and retraining your models can be repeated until the predictions of your models reach the desired accuracy. At that moment, you can roll them out (deploy) to production to start automatic document processing. It is always possible to roll back and deploy an older model (see Model management for more details).

Alternatively, you can decide to go straight to production, even though your models are not very accurate yet or even when you do not have any models. Thanks to the human intervention module (see Validation), your team will be able to correct model predictions or add missing predictions to documents before the output is sent back to your system, allowing you to keep the data in your system clean. The annotations and corrections that are added in human intervention will be used to improve your models, and over time, your document process will be fully automated.
