📋Tasks
Tasks is the way in Metamaze to efficiently label and review your training data.
Metamaze contains a task module that makes it possible to quickly examine the documents in the training data and correct them if necessary. For example, you can select all documents for review of one or more entities, and the module then presents, step by step, the pages where those entities occur so you can check them.
This module has two panels: the first one shows the created tasks, the second one shows suggestions for creating new tasks.
Metamaze suggests tasks both for data uploaded in the training module and for data uploaded in the production module that needed human intervention.
The suggested tasks show the following columns:
Task type - review or annotation task
Documents - the number of documents in the task
Source - documents uploaded in training or production
Model type - entity extraction or document classification
Document type - corresponds to the document types defined in the project settings
Language - the language of the documents in the task
Suggested annotation tasks help speed up the annotation process by choosing the most useful data to annotate from scratch. At the end of a model training, Metamaze will select, among the unlabelled documents present in the training module at that moment, those documents from which the model can learn the most. Annotating these documents will allow the model to become more accurate more quickly (active learning). It is thus recommended to have sufficient unlabelled data present in the training module when triggering a training. You will notice that the documents in the suggested annotation task already have some suggested annotations to speed up labelling.
If you haven't trained a model yet, Metamaze still bundles unlabelled documents for annotation. However, the resulting task won't include model-assisted labelling or any pre-selection of particularly relevant documents.
By default, the following is enabled:
Grouping of similar documents
Documents that add the most value are ranked first (based on document confidence score)
Model-assisted labelling (predictions are loaded based on the previous training of the model)
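Metamaze does not document exactly how it picks these documents. Purely as an illustration, the sketch below assumes least-confidence sampling, a common active-learning strategy: unlabelled documents with the lowest document confidence score are ranked first, because the model is least sure about them. The `Document` class and the `rank_for_annotation` function are hypothetical names, not part of a Metamaze API.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical, minimal representation of an unlabelled document.
    doc_id: str
    confidence: float  # document confidence score from the last trained model (0..1)


def rank_for_annotation(docs: list[Document], limit: int) -> list[Document]:
    """Least-confidence sampling: return the `limit` documents the model is least sure about.

    Assumption: a lower document confidence means annotating the document adds more value.
    """
    return sorted(docs, key=lambda d: d.confidence)[:limit]


if __name__ == "__main__":
    pool = [
        Document("invoice-1", 0.97),
        Document("invoice-2", 0.41),
        Document("invoice-3", 0.73),
    ]
    for doc in rank_for_annotation(pool, limit=2):
        print(doc.doc_id, doc.confidence)
```

With the sample pool above, `invoice-2` is ranked first and `invoice-3` second, while the confidently predicted `invoice-1` is left out.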
Suggested review tasks help you improve the annotation quality of documents that are already annotated. These documents can either already be in the training set or come from production.
Only documents that had the status PROCESSED before the last training can be included in this task. Out of those documents, only the ones that are likely to contain annotation errors are part of the suggested tasks.
By default, the following is enabled:
Grouping of similar documents
Misannotation hints (based on annotation confidence score): in this section Metamaze will give suggestions about fields or document types that are likely to be misannotated. As a user you still need to validate those hints since they are merely suggestions.
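The selection rules above can be sketched as follows, assuming each annotation carries a confidence score from the last training and that an annotation counts as a likely error when its score falls below a threshold. The classes, field names, and the default threshold of 0.5 are hypothetical; Metamaze's actual scoring is not described here.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Annotation:
    label: str
    confidence: float  # hypothetical annotation confidence score (0..1)


@dataclass
class AnnotatedDocument:
    doc_id: str
    status: str             # e.g. "PROCESSED"
    processed_at: datetime  # when the document reached that status
    annotations: list[Annotation] = field(default_factory=list)


def misannotation_hints(doc: AnnotatedDocument, threshold: float) -> list[Annotation]:
    """Flag annotations whose confidence is below the threshold as likely errors."""
    return [a for a in doc.annotations if a.confidence < threshold]


def suggest_review(docs: list[AnnotatedDocument], last_training_at: datetime,
                   threshold: float = 0.5) -> list[AnnotatedDocument]:
    """Keep PROCESSED documents from before the last training that have at least one hint."""
    return [
        d
        for d in docs
        if d.status == "PROCESSED"
        and d.processed_at < last_training_at
        and misannotation_hints(d, threshold)
    ]
```

The hints only select documents for review; as noted above, you still validate each hint yourself.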
The following documents are included:
Documents that required human validation and for which the human validation was completed
Documents that were manually sent to training from production
In both cases, these documents are included only if their production status is PROCESSED but their training status is Input required, because they have not yet been approved as training data. By performing this task, you approve these production documents as training data.
This task resets after each model training, because its results would otherwise be outdated. As a result, only documents that were uploaded after the last model training are included.
By default, the following is enabled:
Grouping of similar documents
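The inclusion criteria for production documents can be expressed as a single predicate. This is a minimal sketch: the `ProductionDocument` fields are hypothetical stand-ins for the statuses named above, and the status strings simply reuse the labels shown in the interface.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ProductionDocument:
    # Hypothetical fields mirroring the criteria described above.
    doc_id: str
    production_status: str       # e.g. "PROCESSED"
    training_status: str         # e.g. "Input required"
    human_validation_done: bool  # human validation was required and completed
    sent_to_training: bool       # manually sent to training from production
    uploaded_at: datetime


def include_in_production_review(doc: ProductionDocument, last_training_at: datetime) -> bool:
    """True if the document belongs in the suggested production review task."""
    came_via_validation_or_manual = doc.human_validation_done or doc.sent_to_training
    return (
        came_via_validation_or_manual
        and doc.production_status == "PROCESSED"
        and doc.training_status == "Input required"  # not yet approved as training data
        and doc.uploaded_at > last_training_at       # the task resets after each training
    )
```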
Clicking a suggested task opens a pop-up window.
The documents for the task have already been added; all that is left is to add operators and, if you want, fill in the optional information. Note that you can split a task among several operators to keep each share short.
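How Metamaze divides a task between operators is not described here. A minimal sketch, assuming contiguous and near-equal shares, shows the idea; `split_among_operators` is a hypothetical helper.

```python
def split_among_operators(doc_ids: list[str], operators: list[str]) -> dict[str, list[str]]:
    """Divide documents into contiguous, near-equal shares, one per operator.

    When the split is uneven, the first operators each receive one extra document.
    """
    if not operators:
        raise ValueError("at least one operator is required")
    base, extra = divmod(len(doc_ids), len(operators))
    shares: dict[str, list[str]] = {}
    start = 0
    for i, operator in enumerate(operators):
        size = base + (1 if i < extra else 0)
        shares[operator] = doc_ids[start:start + size]
        start += size
    return shares
```

For example, splitting a 400-document task among four operators gives each of them 100 documents.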
When you create a new task via the '+ Task' button, a pop-up window opens.
In this window, you fill in the following fields for the task:
Validation type
Entity type - Review or add entity annotations on documents
Document type - Review or assign document types to documents
Languages
Select which languages you want to allow for the task
Filter options - Select a group of documents for checking
These allow you to apply additional filters, such as the user who did the annotations or specific document statuses:
Annotated by user - Filter based on annotations done by specific users
Document status - Filter based on document status
Documents in tasks - Filter based on documents in other tasks
Entities list - Filter based on whether entities exist or not
Occurrences of entity - Filter based on the number of times an entity occurs
Source - Filter based on the source of the documents: training or production
Upload date - Filter based on the upload date of a document
Validation date - Filter based on the validation date of a document
Value of entity - Filter based on the value of an entity
As a final step, you can calculate how many documents meet these conditions. As with suggested tasks, you can then assign operators and fill in the optional information.
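Conceptually, each filter option acts as a predicate on a document, and the calculated count is the number of documents that pass all of them. The sketch below assumes that selected filters are combined with AND and implements only four of the filters listed above; the `Doc` class and the filter functions are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable


@dataclass
class Doc:
    doc_id: str
    source: str  # "training" or "production"
    status: str
    upload_date: date
    annotated_by: set[str] = field(default_factory=set)
    entities: dict[str, int] = field(default_factory=dict)  # entity name -> occurrences


Filter = Callable[[Doc], bool]


def by_source(source: str) -> Filter:
    return lambda d: d.source == source


def annotated_by(user: str) -> Filter:
    return lambda d: user in d.annotated_by


def min_occurrences(entity: str, n: int) -> Filter:
    return lambda d: d.entities.get(entity, 0) >= n


def uploaded_between(start: date, end: date) -> Filter:
    return lambda d: start <= d.upload_date <= end


def count_matching(docs: list[Doc], filters: list[Filter]) -> int:
    """Count documents that satisfy every filter (assumed AND semantics)."""
    return sum(1 for d in docs if all(f(d) for f in filters))
```

With no filters selected, every document matches, because `all()` over an empty list is true.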
Once a task is created (suggested or custom), it appears in the active tasks overview, which shows all tasks with their current progress.
Each task can be expanded to show the progress for each individual operator assigned to the task.
When you are assigned as an operator to a task, you get an extra link that allows you to start or resume the task.
When the task is started, the first document will be opened in the labelling view and you can examine the different entities and/or the document type and language.
The task details give you a high-level overview of the part of the task that is relevant to you. For example, if a task of 400 documents is created and 100 of them are assigned to you, you only see those 100 documents.
The task section displays information about the task:
Progress
Your current progress, shown as a percentage and a progress bar.
Source
The source of the documents in the task:
Empty for custom tasks
Training
Production
Task type
Review - Review documents that already have annotations
Annotation - Annotate documents that have no annotations yet
Deadline
The deadline by when the task should be completed.
Description
The description of the task
Assignee
The assignee of the current task
Documents
The documents in the task, represented as colored, numbered buttons:
Grey - Document still needs to be processed
Green - Document marked as processed
Yellow - Parked document
Red - Document marked as failed
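A progress percentage can be derived from these button states. The sketch assumes that green (processed) and red (failed) documents count as finished while grey and parked yellow documents do not; Metamaze may compute its progress bar differently, and `TaskDocState` and `progress_percentage` are hypothetical names.

```python
from enum import Enum


class TaskDocState(Enum):
    # Button colors in the task details.
    TO_PROCESS = "grey"
    PROCESSED = "green"
    PARKED = "yellow"
    FAILED = "red"


# Assumption: processed and failed documents are finished; parked ones still need work.
FINISHED = {TaskDocState.PROCESSED, TaskDocState.FAILED}


def progress_percentage(states: list[TaskDocState]) -> float:
    """Share of finished documents, as a percentage rounded to one decimal."""
    if not states:
        return 0.0
    done = sum(1 for s in states if s in FINISHED)
    return round(100 * done / len(states), 1)
```

With 50 processed, 10 failed, 15 parked, and 25 unprocessed documents, this returns 60.0.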
The 'metadata' shows all the properties of the document:
Status
Processing - Document is being processed
Input required - Document has been processed by Metamaze AI but requires human input to be completed
Processed - Document has been processed
Failed - Processing of the document failed
Type
The document type
Language
The language of the document, either set manually or predicted by the OCR step
Pages
The number of pages in the document
Name
The name of the document
ID
The unique ID of the document, with a handy copy button
Upload date
The date and time the document was uploaded in Metamaze
Actions
Copy URL - Copy the URL for easy sharing with colleagues, or with support when you have a problem
Edit settings - Edit the document settings, see below
Download PDF - Download a PDF version of the document
The document settings allow you to change the name of the document, the language and the document type. Changing the document type will delete all the annotations because the entities are different for each document type.
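The rule that changing the document type wipes the annotations can be sketched as below. `DocumentSettings` and `edit_settings` are hypothetical names, and annotations are plain strings here only to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DocumentSettings:
    # Hypothetical model of the editable document settings.
    name: str
    language: str
    document_type: str
    annotations: list[str] = field(default_factory=list)


def edit_settings(doc: DocumentSettings, name: Optional[str] = None,
                  language: Optional[str] = None,
                  document_type: Optional[str] = None) -> DocumentSettings:
    """Apply the given changes; a new document type clears all annotations."""
    if name is not None:
        doc.name = name
    if language is not None:
        doc.language = language
    if document_type is not None and document_type != doc.document_type:
        # Entities are defined per document type, so existing annotations no longer apply.
        doc.annotations.clear()
        doc.document_type = document_type
    return doc
```

Renaming the document or changing its language leaves the annotations untouched.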
At the top of the document processing screen you have a number of buttons that help you navigate between documents and uploads.
Back - When in human validation, this button puts the document back in the queue. When in training or production data modules, or in tasks, it will go back to the overview of uploads, documents or tasks.
Upload done - This button is only available in human validation. You use this button when you have successfully finished processing all documents from an upload. The upload will then go to the next step 'output' to send the result to your system.
Done - When you are done with your intervention, you can mark the document as done.
Park - This button is only available in human intervention and in the tasks module. Use it to park the document so you can handle it later. A pop-up asks you to fill in a reason for parking. This reason reminds you why you parked this specific document and helps you later when asking your colleagues or manager for feedback about it.
Reject - Use this button if there is a problem with the document. Besides some standard errors such as 'bad OCR' or 'irrelevant document', you can define your own errors in the project settings, see Custom errors.
This list shows each entity defined for this document together with the number of times this entity was found in the document.
The list shows the following data:
Type
The type of the suggestion: one of suggested annotation, missing annotations, wrong indices, wrong label, or wrong composite group.
Suggested
The suggested entity with its value in the document
Actions
Click on a row - Selects the entity in the document view, giving you the option to apply or ignore the suggestion.
The suggestion detail in the document view allows you to edit the annotation, validate or reject it.
The following actions are possible:
Edit - Allows you to edit an annotation by dragging the start and end cursor
Apply - Applies the suggestion and adds the annotation
Reject - The suggestion will be ignored and not added as an annotation
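The suggestion types and the Edit, Apply, and Reject actions can be modelled as a small data structure. This sketch is hypothetical: the character offsets and the `resolve` function are illustrative assumptions, not Metamaze's internal representation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SuggestionType(Enum):
    SUGGESTED_ANNOTATION = "suggested annotation"
    MISSING_ANNOTATIONS = "missing annotations"
    WRONG_INDICES = "wrong indices"
    WRONG_LABEL = "wrong label"
    WRONG_COMPOSITE_GROUP = "wrong composite group"


@dataclass(frozen=True)
class Annotation:
    entity: str
    start: int  # hypothetical character offsets of the annotated text
    end: int


@dataclass(frozen=True)
class Suggestion:
    kind: SuggestionType
    annotation: Annotation


def resolve(suggestion: Suggestion, annotations: list[Annotation], apply: bool,
            edited: Optional[Annotation] = None) -> list[Annotation]:
    """Apply a suggestion (optionally with an edited span) or reject it."""
    if not apply:
        # Reject: the suggestion is ignored and nothing is added.
        return list(annotations)
    # Edit: a span adjusted by dragging the start and end cursor replaces the suggested one.
    chosen = edited if edited is not None else suggestion.annotation
    # Apply: the annotation is added.
    return [*annotations, chosen]
```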
The list gives an overview of all entities found in the document, with their values, whether they were predicted by a model or added manually by a person. Each entity has a color: if the entity was found by the model, the color is lighter and transparent; if it was added manually by a person, the color is darker.
The list shows the following data:
Name
The name of the entity
Value
The value of the entity
Parsed
The parsed value of the entity
User
The user who did the annotation. The value AI indicates that the annotation was predicted by the model.
Score
The confidence score of entities that were predicted by the model. A red color indicates that the score is lower than the threshold.
Page
The page on which the entity appears
Actions
Click on a row - Selects the entity in the document view, giving you actions to perform on the entity
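The display rules for this list (lighter shading for model predictions, which show the user AI, and a red score when it is below the threshold) can be sketched as a styling function. The `EntityRow` class, `row_style`, and the style keys are hypothetical, and the threshold is passed in as a parameter because where it is configured is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntityRow:
    name: str
    value: str
    user: str               # "AI" means the annotation was predicted by the model
    score: Optional[float]  # confidence score, only present for model predictions
    page: int


def row_style(row: EntityRow, threshold: float) -> dict[str, str]:
    """Return hypothetical display hints for one row of the entity list."""
    predicted = row.user == "AI"
    style = {"shade": "light-transparent" if predicted else "dark"}
    if predicted and row.score is not None and row.score < threshold:
        style["score_color"] = "red"
    return style
```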
You can click a row in this list to see the value in the document. You can add more entity values in the document by clicking the first and the last word of an entity, or by dragging across it, and then selecting the correct label in the dropdown menu that opens. More info on how to perform entity annotation can be found in Annotation of training data.
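Selecting an entity by clicking its first and last word amounts to taking every word between the two clicks. The sketch below assumes words carry character offsets produced by a simple whitespace tokenizer; `tokenize` and `annotate_span` are hypothetical helpers, not the OCR output format Metamaze uses.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: int  # offset of the first character in the page text
    end: int    # offset just past the last character


def tokenize(text: str) -> list[Word]:
    """Split page text on whitespace, keeping character offsets."""
    return [Word(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def annotate_span(words: list[Word], first: int, last: int, label: str) -> dict:
    """Build an annotation covering every word from the first to the last click (inclusive)."""
    lo, hi = sorted((first, last))  # clicks may come in either order
    selected = words[lo:hi + 1]
    return {
        "label": label,
        "start": selected[0].start,
        "end": selected[-1].end,
        "value": " ".join(w.text for w in selected),
    }


page = "Total amount: EUR 1 250,00"
words = tokenize(page)
print(annotate_span(words, 4, 2, "total_amount"))
```

Clicking word 4 and then word 2 yields the value `EUR 1 250,00` at offsets 14 to 26 of the page text.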
Selecting one or more entities with the checkboxes allows you to perform the following actions:
Delete annotations - Removes the selected annotations
Enrich annotation - Can only be used on 1 annotation at a time and allows you to link an enrichment to the entity.
The document preview shows what the document looks like, including all the entities that were found. This view has two modes:
Hybrid - the original document is shown
Formatted - the text is displayed as the OCR model has recognised it
If certain words are not readable in one mode, you can always switch to the other. You can add entities in both modes.
It is also possible to enlarge or reduce the display or to open it on a second screen.