Validation

A key component of any IDP platform is validation of predictions with low confidence scores.

The validation module in Metamaze allows for human verification after a prediction of an AI model. In Metamaze, there are two steps in which an AI model performs a prediction:

  • Document creation: the model predicts which pages in an uploaded file belong together and form a document of a certain document type. It splits an uploaded file into different documents (page management model) and predicts the document type of each individual document (document classification model). If the page management model is not used, we only talk about document classification.

  • Entity extraction: the model extracts relevant information (one or more words, also called entities) from the document.

Each prediction is accompanied by a prediction score. The document creation model predicts with x% confidence that a document belongs to a certain document type. The entity extraction model predicts with x% confidence that one or more words in the document are an entity of a certain type.

If human validation is enabled, an upload of a series of files or documents ends up in human validation when at least one of the following conditions is met:

  • The document creation prediction confidence score is lower than the threshold set for a certain document type.

  • The entity prediction confidence score is lower than the threshold set for that entity.

  • An entity set as 'required' was not found.

  • An entity could not be converted to a particular format correctly. E.g.: the AI model finds a value for the entity of type 'date' that does not represent a date. The system tries to convert the text to a date of a certain format and fails.

  • One or more business rules fail.

It is possible to enforce human validation at each step as an additional quality control (see Human validation).
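
The routing conditions listed above can be illustrated with a minimal Python sketch. It is not Metamaze's implementation: the class names, field names, the `force` flag, and the use of `strptime` for the date conversion are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class EntityPrediction:
    # Hypothetical structure; not the Metamaze data model.
    name: str
    value: Optional[str]               # None means the entity was not found
    confidence: float                  # prediction score between 0.0 and 1.0
    threshold: float                   # threshold configured for this entity
    required: bool = False
    date_format: Optional[str] = None  # set only for entities of type 'date'


@dataclass
class DocumentPrediction:
    # Hypothetical structure; not the Metamaze data model.
    document_type: str
    confidence: float
    threshold: float                   # threshold configured for this document type
    entities: list = field(default_factory=list)
    failed_business_rules: list = field(default_factory=list)


def validation_reasons(doc: DocumentPrediction, force: bool = False) -> list:
    """Return the reasons a document would go to human validation (empty list = none)."""
    reasons = []
    if force:
        reasons.append("human validation enforced")
    if doc.confidence < doc.threshold:
        reasons.append(f"document type '{doc.document_type}' below threshold")
    for entity in doc.entities:
        if entity.value is None:
            if entity.required:
                reasons.append(f"required entity '{entity.name}' not found")
            continue
        if entity.confidence < entity.threshold:
            reasons.append(f"entity '{entity.name}' below threshold")
        if entity.date_format is not None:
            try:
                datetime.strptime(entity.value, entity.date_format)
            except ValueError:
                reasons.append(f"entity '{entity.name}' could not be converted to a date")
    for rule in doc.failed_business_rules:
        reasons.append(f"business rule failed: {rule}")
    return reasons


# Example: a confident classification, but one invalid date and one missing required entity.
doc = DocumentPrediction(
    document_type="invoice",
    confidence=0.97,
    threshold=0.90,
    entities=[
        EntityPrediction("invoice_date", "31/02/2024", 0.99, 0.80, date_format="%d/%m/%Y"),
        EntityPrediction("total_amount", None, 0.0, 0.80, required=True),
    ],
)
reasons = validation_reasons(doc)
```

In this sketch a document is sent to human validation whenever `validation_reasons` returns a non-empty list, which mirrors the "at least one condition" behaviour described above.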

All documents treated in human validation are automatically sent to training with the status 'input required'. Once a suggested task has been created on those documents and any erroneous annotations have been reviewed and corrected, the documents are promoted to golden training data (status 'input required' -> 'done') and included in the next training. These quality checks are important: mistakes in the annotations negatively impact model accuracy.

The validation module is available for most roles (check out: Roles & permissions).

Queue System

When you open the validation module, you will get two lists (queues) of uploads that require human validation:

  1. A list of all uploads for document creation (either document classification or page management, depending on your project settings)

  2. A list of all documents for entity extraction

In this example you can see that the queue for document creation is empty and there are 8 documents in the queue for entity extraction.

You can also set up filters, for example to process only items of one or more document types and/or languages. In addition, you can filter on uploadId or filename.
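
As a rough illustration of how these queue filters combine, here is a small Python sketch. The `QueueItem` fields and the `filter_queue` function are hypothetical, and matching the filename by case-insensitive substring is an assumption; the source does not say how Metamaze matches filenames.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueItem:
    # Hypothetical representation of one item in a validation queue.
    upload_id: str
    filename: str
    document_type: str
    language: str


def filter_queue(
    items: list,
    document_types: Optional[set] = None,
    languages: Optional[set] = None,
    upload_id: Optional[str] = None,
    filename_contains: Optional[str] = None,
) -> list:
    """Keep only items matching every filter that is set (None = filter not used)."""
    result = []
    for item in items:
        if document_types is not None and item.document_type not in document_types:
            continue
        if languages is not None and item.language not in languages:
            continue
        if upload_id is not None and item.upload_id != upload_id:
            continue
        # Assumption: filename filtering is a case-insensitive substring match.
        if filename_contains is not None and filename_contains.lower() not in item.filename.lower():
            continue
        result.append(item)
    return result


queue = [
    QueueItem("u1", "invoice_jan.pdf", "invoice", "en"),
    QueueItem("u2", "contract.pdf", "contract", "nl"),
    QueueItem("u3", "invoice_feb.pdf", "invoice", "nl"),
]
dutch_invoices = filter_queue(queue, document_types={"invoice"}, languages={"nl"})
```

Filters that are left unset do not restrict the result, so the filters that are set combine with a logical AND.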

Parking documents

You can filter for all documents that have been parked, go over them, and provide feedback to the user who parked them. The filter can target a specific user or any user who has parked a document. It is also possible to show only documents that have not been parked. You can also find all the documents you parked yourself, so that it is easy to reach out to your manager or a colleague about them.

Understanding upload assignments

As a user, you can assign an upload to yourself by clicking on it. When you start a validation, the upload will automatically be removed from the queue for all users to prevent conflicts.

At this time, Metamaze does not support the assignment of uploads to specific users. The assignment of uploads is designed for individual users to claim ownership of uploads.

Viewing and taking over uploads assigned to other users

By using the Assigned filter in the queues, you can find uploads that are currently assigned to other users. By default, this filter is set to "Unassigned" so that you only see uploads that are not currently being worked on.

When opening one of those uploads, you automatically assign it to yourself. The original user will receive a pop-up warning that you took over the document and, to prevent conflicts, will no longer be able to make changes.
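
The claim-and-take-over behaviour described in this section can be sketched as follows. This is an illustrative model only: the `Upload` class, the function names, and the filter values other than "unassigned" are assumptions, not part of a Metamaze API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Upload:
    # Hypothetical model of an upload in a validation queue.
    upload_id: str
    assigned_to: Optional[str] = None  # None means nobody has claimed it


def open_upload(upload: Upload, user: str) -> Optional[str]:
    """Assign the upload to `user`; return the previous owner if this was a takeover."""
    previous_owner = upload.assigned_to
    upload.assigned_to = user
    if previous_owner is not None and previous_owner != user:
        return previous_owner  # this user would be shown the takeover warning
    return None


def can_edit(upload: Upload, user: str) -> bool:
    """Only the current assignee may change the upload."""
    return upload.assigned_to == user


def queue_view(uploads: list, assigned: str = "unassigned") -> list:
    """Apply the Assigned filter; the default shows only unclaimed uploads."""
    if assigned == "unassigned":
        return [u for u in uploads if u.assigned_to is None]
    if assigned == "assigned":  # assumed filter value
        return [u for u in uploads if u.assigned_to is not None]
    return list(uploads)        # assumed "show everything" fallback


upload = Upload("u1")
first_claim = open_upload(upload, "alice")  # alice claims a free upload
takeover = open_upload(upload, "bob")       # bob takes over from alice
```

After the takeover in this sketch, `can_edit(upload, "alice")` is false, which corresponds to the original user no longer being able to make changes.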
