Metamaze
Quality control of training data
The application contains a task module that lets you quickly examine the training documents and adapt them if necessary. For example, you can check all documents for one or more entities, and the application will present, step by step, the pages where those entities occur so that you can check them. This module is available at project level as well as at document type level.
You can view and create these tasks in the menu item below.
This module shows two panels. In the first panel you can create either a suggested task or a custom task. Suggested tasks are proposed by Metamaze and have their properties already filled in. Custom tasks are created from scratch, with all properties set by the user.

Suggested tasks

Metamaze will suggest the creation of a task for both training and production data:

Suggested production tasks

With suggested tasks for production data, the following columns are shown:
    Status - blue still needs to be created, orange in progress
    Model type - entity extraction or document classification
    Language - languages enabled in your project
    Document type - corresponds with the document types that are identified in project settings
    #Int-docs - number of documents manually sent from production plus documents automatically sent from human validation
    Total - all documents that have been processed in the production pipeline for this document type and language since the last training
    Ratio - #Int-docs / Total
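
As an illustration of the Ratio column, here is a minimal sketch of the calculation. The function name and the handling of a zero total are assumptions made for this example; they are not taken from the Metamaze application.

```python
def intervention_ratio(int_docs: int, total: int) -> float:
    """Return #Int-docs / Total as a fraction between 0 and 1.

    Assumption: when no documents have been processed since the last
    training (total == 0), the ratio is reported as 0.0.
    """
    if int_docs < 0 or total < 0:
        raise ValueError("document counts must be non-negative")
    if total == 0:
        return 0.0
    return int_docs / total


# 12 documents sent to training out of 200 processed in production
print(f"{intervention_ratio(12, 200):.1%}")  # prints "6.0%"
```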

Suggested training tasks

With suggested tasks for training data, the following columns are shown:
    Status - blue still needs to be created, orange in progress, green is done
    Model type - entity extraction or document classification
    Language - languages enabled in your project
    Document type - corresponds with the document types that are identified in project settings
    Annotation confidence - how confident the model is that the annotation is correct
    Doc confidence - how confident the model is in recognising this document; a lower value means the model is less familiar with it, so training on it can add value
    Task type - review or annotation
    Total - number of documents available for this suggested task

Suggested annotation tasks

Suggested annotation tasks help you speed up the annotation process. Here Metamaze bundles unlabeled documents that would add value to model training once annotated. Note that this only works for unlabeled documents that were uploaded before the model was trained.
By default, the following is enabled:
    Grouping of similar documents
    Documents that add the most value are ranked first (based on document confidence score)
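
The ranking idea can be sketched as follows. This is an illustrative example only: the `UnlabeledDoc` class, its field names, and the 0 to 1 confidence scale are hypothetical, and the sketch assumes that a lower document confidence means more value for training, as described for the Doc confidence column above. It does not reproduce how Metamaze groups similar documents.

```python
from dataclasses import dataclass


@dataclass
class UnlabeledDoc:
    name: str
    doc_confidence: float  # hypothetical scale: 0.0 (unfamiliar) to 1.0 (familiar)


def rank_for_annotation(docs: list[UnlabeledDoc]) -> list[UnlabeledDoc]:
    """Order documents so the least familiar ones (lowest confidence) come first."""
    return sorted(docs, key=lambda d: d.doc_confidence)


docs = [
    UnlabeledDoc("invoice_a.pdf", 0.92),
    UnlabeledDoc("invoice_b.pdf", 0.35),
    UnlabeledDoc("invoice_c.pdf", 0.61),
]
for doc in rank_for_annotation(docs):
    print(doc.name, doc.doc_confidence)
```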

Suggested review tasks

Suggested review tasks help you improve the annotation quality of documents that are already annotated. Only documents that had the status DONE before the last training can be included in this task. Out of those documents, only the ones that are likely to be annotated incorrectly are part of the suggested tasks.
By default, the following is enabled:
    Grouping of similar documents
    Autocorrect (based on annotation confidence)
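
The selection rule described above can be sketched like this. The class, the field names, and the `threshold` value are assumptions made for this example; the source does not state how Metamaze decides that a document is "likely to be annotated incorrectly", so a simple cut-off on annotation confidence stands in for that logic.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AnnotatedDoc:
    name: str
    status: str                    # e.g. "DONE"
    done_at: datetime              # when the document reached its status
    annotation_confidence: float   # hypothetical scale: 0.0 to 1.0


def review_candidates(docs, last_training, threshold=0.8):
    """Keep documents that were DONE before the last training and whose
    annotation confidence is below the (hypothetical) threshold."""
    return [
        d for d in docs
        if d.status == "DONE"
        and d.done_at < last_training
        and d.annotation_confidence < threshold
    ]


last_training = datetime(2023, 6, 1)
docs = [
    AnnotatedDoc("a.pdf", "DONE", datetime(2023, 5, 20), 0.55),   # candidate
    AnnotatedDoc("b.pdf", "DONE", datetime(2023, 5, 25), 0.97),   # confident enough
    AnnotatedDoc("c.pdf", "DONE", datetime(2023, 6, 10), 0.40),   # done after training
    AnnotatedDoc("d.pdf", "FAILED", datetime(2023, 5, 1), 0.30),  # not DONE
]
print([d.name for d in review_candidates(docs, last_training)])  # prints "['a.pdf']"
```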

Suggested vs custom tasks

When you create a new task via the 'create task' button, a pop-up window appears, shown here first for a suggested task and then for a custom task.
When scrolling down:
In this window you determine the different properties of the task:
    A task description
    Filter options to select a group of documents for checking
      The language of the documents
      The source of the documents
        Uploaded via the training module
        Sent from production to training dates
      What do you want to check
        Entities or document types (page management will come in a future release)
    If you chose entities in the previous field
      Which entities of the document you want to examine
      Which entities are not relevant, i.e. which will be excluded
      Which user labelled the entities
In addition, you have the option to start the task immediately after it has been created.
You can also first calculate how many documents meet these conditions.
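
To make the filter options concrete, the sketch below models a task filter and counts the documents that meet its conditions. All class names, field names, and values such as `"training_upload"` are hypothetical and chosen for illustration; they do not reflect the Metamaze data model or API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Doc:
    language: str
    source: str  # hypothetical values: "training_upload" or "production"
    entities: dict[str, str] = field(default_factory=dict)  # entity name -> labelling user


@dataclass
class TaskFilter:
    language: Optional[str] = None
    source: Optional[str] = None
    include_entities: set[str] = field(default_factory=set)
    exclude_entities: set[str] = field(default_factory=set)
    labelled_by: Optional[str] = None

    def matches(self, doc: Doc) -> bool:
        if self.language and doc.language != self.language:
            return False
        if self.source and doc.source != self.source:
            return False
        # Drop excluded entities, then narrow to the ones selected for checking.
        relevant = {n: u for n, u in doc.entities.items()
                    if n not in self.exclude_entities}
        if self.include_entities:
            relevant = {n: u for n, u in relevant.items()
                        if n in self.include_entities}
            if not relevant:
                return False
        # At least one remaining entity must be labelled by the given user.
        if self.labelled_by and self.labelled_by not in relevant.values():
            return False
        return True


def count_matching(docs: list[Doc], task_filter: TaskFilter) -> int:
    """Count the documents that meet the task's conditions."""
    return sum(task_filter.matches(d) for d in docs)


docs = [
    Doc("en", "training_upload", {"invoice_number": "alice", "total": "bob"}),
    Doc("en", "production", {"invoice_number": "bob"}),
    Doc("nl", "training_upload", {"total": "alice"}),
]
task = TaskFilter(language="en", include_entities={"invoice_number"}, labelled_by="alice")
print(count_matching(docs, task))  # prints "1"
```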

Once a task is created

From the second panel, a task can be started (if it has been assigned to you) or edited (if you created it). This panel gives an overview of all tasks with their properties and their current progress.
When the task is started, the first document is opened in the labelling view and you can examine the different entities.
Note that the entity is immediately shown in the document. This view has two panels:
    Document information
    The document
The document information has a number of sections:
    Task Information - Information such as name, progress and the number of documents
    Metadata - The metadata of the document such as document type, language, name, ...
    Labels - All labels defined for that document type in the project settings.
    Labelled - The labels added to the document
    Actions
      Pattern - Activating a label pattern
      Cross - Set the document to failed status so that it is no longer present in the training data.
      Checkmark - Approve the document
      Park - If you are unsure, park the document and ask a colleague
      Previous - Go to the previous document
      Next - Go to the next document
On the document itself, you can perform all the same actions as in labelling: you can add, modify and delete entities. If you use the right and left arrow keys, the application will step through the entities you selected for checking.