Document Classification
If your project has multiple document types, you can analyse the training metrics in the Document classification tab of the Model Management module.
A series of graphs shows the main metrics over time: F1 Score, Precision, Recall, and the number of documents in the dataset, for every training run carried out in the past. All reported metrics relate to the validation dataset, which consists of 10% of all relevant training data and is automatically split off at the start of each training run. If your training dataset is very small, be careful when interpreting the metrics: it is not advisable to generalise from metrics calculated on a handful of documents.
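As an illustration of how these metrics behave on a held-out split, the sketch below computes precision, recall and F1 with scikit-learn on a hypothetical 90/10 split. The document texts, labels and the use of scikit-learn are assumptions made for the example; this is not the platform's internal implementation.

```python
# Illustrative sketch only: documents, labels and the use of scikit-learn are
# assumptions for this example, not the platform's internal implementation.
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

docs = [f"document {i}" for i in range(100)]        # placeholder documents
labels = ["Invoice"] * 60 + ["Contract"] * 40       # placeholder document types

# Mirrors the automatic 90/10 split made at the start of a training run.
train_docs, val_docs, y_train, y_val = train_test_split(
    docs, labels, test_size=0.10, stratify=labels, random_state=42
)

# In reality y_pred comes from the trained classifier; here we fake one misprediction.
y_pred = list(y_val)
y_pred[0] = "Contract" if y_pred[0] == "Invoice" else "Invoice"

print("Precision:", precision_score(y_val, y_pred, average="weighted"))
print("Recall:   ", recall_score(y_val, y_pred, average="weighted"))
print("F1 Score: ", f1_score(y_val, y_pred, average="weighted"))
```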
The graphs can be exported in several formats by clicking on the hamburger menu:
To see more details for a particular training run, select its date in the dropdown menu under the graphs. The confusion matrix for that training will then be visualised. In the confusion matrix, you can inspect the model's mispredictions on the validation dataset: the y-axis shows the actual document type (as annotated in the training data) and the x-axis shows the predicted document type. Green cells correspond to correct predictions, red cells to wrong ones. In the example below, 406 documents were correctly predicted to be of type "Compatibilité", while 9 documents labeled with this type were predicted to be of type "Other/Info". If the model often mixes up two classes, this may mean that the two classes are not labeled consistently, or that they are too similar for the model to find properties that distinguish them.
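The sketch below shows how such a matrix is read, using scikit-learn and made-up counts for the two class names mentioned above; it is an illustration of the layout (rows are actual types, columns are predicted types), not the product's code.

```python
# Hypothetical counts for the two class names used in the example above;
# scikit-learn is used purely for illustration.
from sklearn.metrics import confusion_matrix

y_true = ["Compatibilité"] * 10 + ["Other/Info"] * 5          # actual types (y-axis)
y_pred = (["Compatibilité"] * 8 + ["Other/Info"] * 2          # 2 mispredictions
          + ["Other/Info"] * 4 + ["Compatibilité"] * 1)       # 1 misprediction

cm = confusion_matrix(y_true, y_pred, labels=["Compatibilité", "Other/Info"])
# Rows = actual document type, columns = predicted document type:
#   cm[0, 0] -> "Compatibilité" documents predicted correctly
#   cm[0, 1] -> "Compatibilité" documents mispredicted as "Other/Info"
print(cm)
```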
You can drill down further into the metrics by selecting one of the document types at the bottom of the page.
A more detailed confusion matrix for the document type will be shown, in which "Positive" corresponds to the selected document type, and "Negative" to all other document types.
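The following sketch illustrates, with hypothetical counts (reusing the 406 and 9 from the example above), how a multi-class confusion matrix collapses into this one-vs-rest view for a selected document type.

```python
# Hypothetical multi-class confusion matrix; rows = actual type, columns = predicted type.
import numpy as np

cm = np.array([[406,   9,   3],
               [  7, 120,   5],
               [  2,   4,  88]])
selected = 0                            # index of the selected document type ("Positive")

tp = cm[selected, selected]             # selected type, predicted correctly
fn = cm[selected, :].sum() - tp         # selected type, predicted as another type
fp = cm[:, selected].sum() - tp         # other types, predicted as the selected one
tn = cm.sum() - tp - fn - fp            # remaining "Negative" documents predicted as "Negative"

print(f"TP={tp}  FP={fp}  FN={fn}  TN={tn}")
```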
The Precision-recall graph lets you inspect how accurate your model will be at a given threshold. The x-axis represents the prediction threshold, the red line the precision and the blue line the recall; the legend becomes visible when you hover over the graph. In the example below, if the threshold is set to 90, precision will be 91% and recall 69%. Depending on the use case, you can set the threshold for a higher precision (when it is very important to avoid false positives, even if that means missing some documents) or for a higher recall (when it is very important to find all examples of a certain type, even if this produces more false positives).
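To make the trade-off concrete, the sketch below computes precision and recall at a few thresholds over a handful of hypothetical confidence scores (written on the same 0–100 scale as the graph); the scores, labels and threshold values are invented for illustration only.

```python
# Hypothetical confidence scores (0-100 scale, as in the graph) and true labels;
# illustrates how precision and recall move with the prediction threshold.
import numpy as np

scores = np.array([99, 95, 92, 88, 75, 60, 40])   # model confidence per document
labels = np.array([ 1,  1,  1,  0,  1,  0,  1])   # 1 = actually the selected type

def precision_recall_at(threshold):
    predicted_positive = scores >= threshold
    tp = np.sum(predicted_positive & (labels == 1))
    fp = np.sum(predicted_positive & (labels == 0))
    fn = np.sum(~predicted_positive & (labels == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for t in (50, 70, 90):
    p, r = precision_recall_at(t)
    print(f"threshold {t}: precision {p:.2f}, recall {r:.2f}")
```

Raising the threshold here trades recall for precision: fewer documents clear the bar, but those that do are more likely to be correct.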
A succinct explanation of all the relevant metrics can be found at the bottom of the page: