Page Management

If page management is enabled for your project, you can analyse the training metrics in the Page management tab of the Model Management module. Select the document type you want to see the metrics for in the dropdown menu.

A series of graphs shows the main metrics over time: the F1 score, precision, recall and the number of pages in the dataset, for each training run performed in the past. All reported metrics relate to the validation dataset, which consists of 10% of all relevant training data and is automatically split off at the start of each training. If your training dataset is very small, be careful when interpreting the metrics: it is not recommended to generalise from metrics calculated on a handful of documents.
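For illustration, the sketch below shows how precision, recall and the F1 score can be computed on a 90/10 split using scikit-learn. The labels, split and predictions are invented for this example and do not reflect the platform's actual training pipeline.

```python
# Minimal sketch, assuming made-up page labels and predictions.
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical ground-truth page labels for the whole training set.
labels = [1, 1, 2, 1, 2, 2, 1, 1, 2, 1] * 10

# 10% of the data is set aside as a validation set, mirroring the 90/10 split.
train_labels, val_labels = train_test_split(labels, test_size=0.1, random_state=0)

# Pretend the trained model predicted these labels for the validation pages,
# flipping one label so the metrics are not trivially perfect.
predicted = list(val_labels)
predicted[0] = 2 if predicted[0] == 1 else 1

print("Precision:", precision_score(val_labels, predicted, average="macro"))
print("Recall:   ", recall_score(val_labels, predicted, average="macro"))
print("F1 score: ", f1_score(val_labels, predicted, average="macro"))
```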

The graphs can be exported in several formats by clicking on the hamburger menu:

To see more details for a particular training, select the date of that training in the dropdown menu under the graphs. The confusion matrix for the training will be visualised. In the confusion matrix, you can inspect the model's mispredictions on the validation dataset. The y-axis shows the actual page number (as annotated in the training data), and the x-axis shows the predicted page number. Green areas correspond to correct predictions, red areas to wrong ones. For instance, in the example below, 36 examples that should have been identified as the first page of a document were wrongly predicted as another page, while only 26 were correctly identified.
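If you want to reproduce this kind of matrix outside the platform, the following sketch builds one with scikit-learn. The page labels and predictions are made up for illustration; only the orientation (actual pages on the rows, predicted pages on the columns) mirrors the UI.

```python
# Illustrative sketch with invented data; rows = actual page numbers,
# columns = predicted page numbers, as in the Page management tab.
from sklearn.metrics import confusion_matrix
import pandas as pd

actual    = [1, 1, 1, 2, 2, 3, 1, 2, 3, 3]
predicted = [1, 2, 1, 2, 2, 3, 1, 3, 3, 1]

page_labels = [1, 2, 3]
cm = confusion_matrix(actual, predicted, labels=page_labels)
print(pd.DataFrame(cm,
                   index=[f"actual {p}" for p in page_labels],
                   columns=[f"predicted {p}" for p in page_labels]))
# Diagonal cells are correct predictions (shown green in the UI);
# off-diagonal cells are mispredictions (shown red).
```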

You can drill down further into the metrics by selecting one of the document types (page numbers in the case of page management) at the bottom of the page.

A more detailed confusion matrix for the document type will be shown, in which "Positive" corresponds to the selected document type, and "Negative" to all other document types.
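This binary view can be thought of as collapsing the full confusion matrix into a one-vs-rest form. The sketch below shows the idea with invented labels, treating a hypothetical page number 1 as the selected document type.

```python
# Sketch of the one-vs-rest view: "Positive" is the selected document type
# (here, hypothetically, page 1), "Negative" is everything else.
# Labels and data are assumptions for illustration only.
from sklearn.metrics import confusion_matrix
import pandas as pd

actual    = [1, 1, 1, 2, 2, 3, 1, 2, 3, 3]
predicted = [1, 2, 1, 2, 2, 3, 1, 3, 3, 1]
selected  = 1  # the document type / page number picked at the bottom of the page

def to_binary(pages):
    return ["Positive" if p == selected else "Negative" for p in pages]

cm = confusion_matrix(to_binary(actual), to_binary(predicted),
                      labels=["Positive", "Negative"])
print(pd.DataFrame(cm,
                   index=["actual Positive", "actual Negative"],
                   columns=["predicted Positive", "predicted Negative"]))
```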

The Precision-recall graph allows you to inspect how accurate your model will be at a given prediction threshold. The x-axis represents the prediction threshold, the red line the precision and the blue line the recall. The legend for the graph becomes visible when you hover over it. In the example below, if the threshold is set to 87, precision will be 39% and recall 37%. Depending on the use case, you can set the threshold to favour a higher precision (when it is very important to avoid false positives, even at the cost of missing some true positives) or a higher recall (when it is very important to find all examples of a certain type, even if this means more false positives).
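The sketch below illustrates this trade-off: raising the threshold keeps only high-confidence predictions, which tends to increase precision and decrease recall. The confidence scores and labels are invented for illustration and are not the values from the graph above.

```python
# Minimal sketch of precision/recall at different thresholds, with made-up data.
from sklearn.metrics import precision_score, recall_score

# Hypothetical confidence scores (0-100) and whether each page really is
# the selected document type (1) or not (0).
scores = [95, 91, 88, 86, 80, 72, 65, 55, 40, 20]
actual = [1,  1,  1,  1,  0,  1,  0,  0,  0,  0]

for threshold in (50, 70, 87):
    predicted = [1 if s >= threshold else 0 for s in scores]
    p = precision_score(actual, predicted, zero_division=0)
    r = recall_score(actual, predicted, zero_division=0)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```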

A succinct explanation of all the relevant metrics can be found at the bottom of the page:
