Metamaze

Annotation of training data

The annotation of data can consist of several steps depending on the project flow settings (see chapters 1 and 2):
  • Page management - if a file can contain multiple documents and therefore needs to be split up.
  • Document classification - if multiple document types must be recognised.
  • Entity extraction - if words in a document are to be recognised as entity values.
To support labeling as a team, a queue system is available so that each team member can take a document out of the queue for labeling. When someone takes a document out of the queue, it disappears from the queue of all other users.

ANNOTATION MODULE

Via the button below you can open the label module.

ANNOTATION QUEUES

When you open the label module, you will see one or two queues, depending on the flow settings:
  • Queue for document creation (page management and/or document classification).
  • Queue for entity extraction.
The first panel (the first queue) contains a list of all the documents that still require page management and/or document classification. Together we call this document creation. The second panel shows all already created documents in which you can label the different entities.
You can filter the lists by document type and language if you only want to label for a certain document type or language.
When a user takes a task from the queue, the task is removed from the list so that it is no longer visible to other users and each task is performed by a single user. Take a task from a queue by clicking on it.
You can always place a task back by clicking on the button below, but all changes already made will be lost.
When you complete the task or place it back in the queue, you can always check the option to automatically get the next task without having to re-enter the queues. This way you can efficiently go through the entire queue.
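The queue behaviour described above can be pictured with a small Python sketch. It is an illustration only, not how Metamaze is implemented: the class AnnotationQueue and its methods take, put_back and complete are hypothetical names, and placing a returned task at the end of the queue is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    name: str
    changes: list = field(default_factory=list)  # edits made by the current user


class AnnotationQueue:
    """Hypothetical model of a shared labeling queue; not the Metamaze API."""

    def __init__(self, names):
        self.waiting = [Task(n) for n in names]  # tasks visible to every user
        self.taken = {}                          # task name -> user working on it
        self.done = []                           # names of completed tasks

    def take(self, user: str) -> Optional[Task]:
        """Give the first waiting task to `user` and hide it from everyone else."""
        if not self.waiting:
            return None
        task = self.waiting.pop(0)
        self.taken[task.name] = user
        return task

    def put_back(self, task: Task, user: str, next_task: bool = False) -> Optional[Task]:
        """Return a task to the queue; the changes made so far are lost."""
        del self.taken[task.name]
        task.changes.clear()
        self.waiting.append(task)  # assumption: returned tasks go to the end
        return self.take(user) if next_task else None

    def complete(self, task: Task, user: str, next_task: bool = False) -> Optional[Task]:
        """Finish a task and, if requested, hand out the next one directly."""
        del self.taken[task.name]
        self.done.append(task.name)
        return self.take(user) if next_task else None
```

With next_task set, the user receives the next task directly without going back to the queue overview, which mirrors the option to automatically get the next task.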

PAGE MANAGEMENT

This section only applies if you want to process files that consist of multiple documents, e.g. a PDF file of a wage request that consists of different document types such as wage slips, a quotation or a loan request.
To train the page management model you will have to split the files that have been uploaded into separate documents and choose the right type and language for each document.
Clicking on the first item of the queue "document creation" will open the first task.
Then two panels are opened:
  • Files & Documents (1)
    • Files - a list of the uploaded files, with the pages of each file.
    • Documents - the different documents. This is initially empty because no documents have been created yet.
  • Preview (2) - preview of the currently selected page.
You can then create documents by dragging page by page, or multiple pages at once to the middle panel. When you start the first task, the software automatically selects the first page of the first document so you can start right away.
The page selection shown as a preview in the second panel is indicated by a wide blurry blue shadow as a border.
Now select all the pages that make up one document. You can select or deselect one or more pages in different ways:
  • Hold down the CTRL button and click multiple pages
  • Hold down the SHIFT button and click on the first and last page to select a list of consecutive pages.
  • Press the spacebar to select a page, go to the next page with the right arrow key and press the spacebar again to select that page too.
The selected pages have a fixed thinner blue border and a blue page number.
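As a rough illustration of the first two selection methods, the Python sketch below treats pages as numbers. The function names ctrl_click and shift_click are hypothetical and only model the described behaviour; they are not part of Metamaze.

```python
def ctrl_click(selected: set, page: int) -> set:
    """Toggle a single page in or out of the current selection."""
    return selected ^ {page}


def shift_click(first: int, last: int) -> set:
    """Select every page between the two clicked pages, both included."""
    low, high = sorted((first, last))
    return set(range(low, high + 1))
```

For example, shift_click(2, 5) selects pages 2, 3, 4 and 5, and ctrl_click on an already selected page deselects it.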
Drag and drop these pages onto the box with a + icon to turn them into a new document.
For the new document you need to set the type and the language. You can also change the name of the new document.
Another way to add the selected pages to a new document is to use the SHIFT+d key combination. You will get an extra screen to create a document in which these pages will appear. Then choose the document type and language.

Deleting page(s) from a document

You can delete one or more selected pages from a document using the backspace key. You can also delete an individual page with the red cross at the top right of the page.
If you have selected multiple pages and you only want to delete the page that is shown as a preview (blurry border) you can use the key combination SHIFT+backspace.
Deleted pages will reappear in the list of uploaded files. If you delete the last page of a document, you will be asked to delete the document as well.
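The relation between files, documents and pages can be summarised in a short Python sketch. It is a simplified model under stated assumptions, not the actual Metamaze data model: the names PageManager, Document, create_document and delete_pages are hypothetical, and pages are represented by their page numbers.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    doc_type: str
    language: str
    pages: list = field(default_factory=list)


class PageManager:
    """Hypothetical model of one uploaded file being split into documents."""

    def __init__(self, file_pages):
        self.file_pages = list(file_pages)  # pages not yet assigned to a document
        self.documents = []                 # initially empty

    def create_document(self, selected, name, doc_type, language):
        """Move the selected pages out of the file into a new document."""
        missing = [p for p in selected if p not in self.file_pages]
        if missing:
            raise ValueError(f"pages not available in the file: {missing}")
        self.file_pages = [p for p in self.file_pages if p not in selected]
        doc = Document(name, doc_type, language, list(selected))
        self.documents.append(doc)
        return doc

    def delete_pages(self, doc, pages):
        """Remove pages from a document; they reappear in the file list.

        Returns True when the document is now empty, which is the point at
        which the user would be asked whether to delete the document as well.
        """
        for p in pages:
            doc.pages.remove(p)
        self.file_pages = sorted(self.file_pages + list(pages))
        return not doc.pages
```

Creating a document from pages 1 and 2 of a five-page file leaves pages 3 to 5 in the file list; deleting both pages again restores the full file and signals that the empty document can be removed.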

Move page(s) to another existing document

When you select multiple pages, you can move them to another document at once by pressing the button below for that other document.
When you delete the created document by clicking the button below, all pages will be placed back in the original file in the left panel.
You can also drag pages between documents using your left mouse button.

Move page(s) to a new document

You can move one or more selected pages of a document to a new document using the key combination SHIFT+d. Choose a document name, type and language.
If you delete the last page of a document, you will be prompted to delete the document as well.

Undo page(s) selection

If you have selected multiple pages via double click or the spacebar, you can undo the selection with the Esc key.

Undo & Redo

You can undo actions such as delete and move via SHIFT+z. With SHIFT+y you can perform a 'redo' action, which reapplies the action you have just undone.
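Undo and redo are commonly built on two stacks. The Python sketch below shows that general technique; it is an assumption for illustration and says nothing about how Metamaze implements it. The class name History is hypothetical, and clearing the redo history after a new action is a common convention rather than documented behaviour.

```python
class History:
    """Two-stack undo/redo, shown as a general technique."""

    def __init__(self, state):
        self.state = state
        self._undo = []  # earlier states, most recent last
        self._redo = []  # undone states, most recent last

    def apply(self, new_state):
        """Record a new action such as deleting or moving a page."""
        self._undo.append(self.state)
        self._redo.clear()  # assumption: a new action discards the redo history
        self.state = new_state

    def undo(self):
        """Step back one action (SHIFT+z); does nothing if there is no history."""
        if self._undo:
            self._redo.append(self.state)
            self.state = self._undo.pop()
        return self.state

    def redo(self):
        """Reapply the last undone action (SHIFT+y), if there is one."""
        if self._redo:
            self._undo.append(self.state)
            self.state = self._redo.pop()
        return self.state
```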

Preview of a page

The right pane shows you the preview of the selected page. You can enlarge and reduce this preview (1) or open it on a second screen (2).
You can use the arrows to walk through the different pages. Using the rotation buttons you can rotate the page if necessary. The last button resets the page to its actual size and location.
Using the slider you can increase or decrease the size of the different panels.
Metamaze provides shortcuts to simplify the most important actions in the software. In each form you can navigate via the tab key to the next form element and press a button via the enter key. Each dropdown list supports auto-complete for easier search and selection.
You can retrieve the shortcuts of a module at any time by using the key combination SHIFT+h.
When all the pages from the different files have been added, the newly created documents must be saved. When the option to open the next task is checked, you will be assigned a new task; if not, the queues will be shown.

DOCUMENT CLASSIFICATION

If page management is not enabled but you work with different document types, the screen will look different. You only need to specify the correct document type and language.

ENTITY EXTRACTION

In the second queue the different documents are shown where you can label the entities.
You can apply a filter to this overview. You can filter based on language, document type, upload identifier and filename.
Clicking on a job from the queue will open the label window.

Label guide

Before you start labeling, it is very important that you analyze the documents and that the labeling instructions are clearly explained, so that labeling in your project is correct and consistent from the start. Correct and consistent labeling has a direct impact on model quality, so clear labeling instructions are critical for data labelers. More information can be found in Guidelines to annotate correctly.
A complete overview of the label guidelines is available via the menu on the left side of the application.
For each document type you can set the label guidelines for each entity. The label guidelines consist of two parts:
  • General guidelines - You can manage this list yourself. You can add your own items.
  • Entity guidelines for a document type - Here you can define guidelines for each entity per document type.
If you click on the + icon in the list of general guidelines you can create an item.
It will then be added to the list.
The 'edit page' button on the top right allows you to add and save the label guidelines.
These guidelines are always available in the application. You can right-click on an entity at any time to retrieve the description from the guidelines of that specific entity.

Displaying a document

The labeling screen consists of two parts:
  • Data of the document, consisting of three parts:
    • Metadata
    • Labels
    • My Labels
  • Preview of the document to be labeled.

Document data

This view consists of three parts: the metadata of the document, the labels and My Labels.
  • Metadata - Here the properties of the document are shown. You can edit them by pressing the 'edit' button.
  • Labels - Shows all entities that may appear in the document, with the number of values that you have designated as this entity (0 by default).
  • My Labels - an overview of the entities and their values that you have indicated on the document.

Labelling of a document

There are two types of entities that can be identified in a document
  • A text entity
  • An image entity
To switch between these two types, you need to select the appropriate mode using the buttons below, found in the little collapsible sidebar on the right of the screen.
The labeling of an entity depends on its entity-type:
  • For a text entity, select the first word with a left mouse click and then select the last word with a left mouse click.
  • For an image (or object) such as handwritten text or signatures, draw a rectangle around what you want to label. Indicate the top left corner with a left mouse click, drag to draw the rectangle and release the mouse button. Use the hybrid view of the document.
Then a pop-up window will appear where you can select the correct entity type.
Labelling is always done on the document itself. Three views are provided for this purpose:
  • Hybrid - the document is shown as it originally looks
  • Processed - the text is displayed as the OCR model has recognized it
  • Plain - the text is presented sequentially.
When certain words are not readable in one view, you can always switch to another view. Entities can be added in all three views.
After an entity is labeled, it is added to "my labels", where you can edit or delete it afterwards.
The table consists of the following columns:
  • Label - the name or type of the entity
  • Value - the value of the entity
  • Parsed - the converted value to the desired format
  • Page - the page number where the entity appears
  • Security score (R) - the score with which the model found the entity (only if you used the model via the lightning icon)
  • Class entity - text or image entity
  • Delete an entity
In this table you can click on a row. The label will be shown on the document. With the right and left arrow on your keyboard you can walk through the labels.
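To make the columns concrete, the Python sketch below models one row of the table and the arrow-key navigation. It is only an illustration: LabelRow and next_index are hypothetical names, and stopping at the first and last row (instead of wrapping around) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabelRow:
    label: str              # name or type of the entity
    value: str              # value as highlighted on the document
    parsed: str             # value converted to the desired format
    page: int               # page number where the entity appears
    score: Optional[float]  # model score; None for a manually added label
    entity_class: str       # "text" or "image"


def next_index(current: int, total: int, key: str) -> int:
    """Move through the rows with the left/right arrow keys.

    Assumption: navigation stops at the first and last row.
    """
    step = {"left": -1, "right": 1}.get(key, 0)
    return max(0, min(total - 1, current + step))
```

A manually drawn image entity, for example a signature, would be stored with score set to None because no model was involved.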
You can also delete entities on the document by clicking the red cross that appears when you hover your mouse over an entity.
If a model has already been trained, you can have an extraction done based on the model. The model will then display the entities it found, each with a certainty score. Afterwards you can add the missing entities yourself.
An entity always has a color. If the entity was found by the model, it has a lighter, transparent color; if it was manually indicated by a user, it has a darker color.
It is also possible to enlarge or reduce the display and open it on a second screen.
If a document is not readable or it is impossible to label entities, you can classify this document as failed.
For this, you need to give a reason type explaining why the document does not suffice and a description of the problem. The document will then be excluded from the training data and its status will be set to failed.
It is also possible to look up and label multiple entities of the same entity type at once.

Shortcuts

During labeling, a variety of shortcuts are available. View the overview via the key combination SHIFT+h.
For labeling, you can also create shortcuts for each entity in your own profile settings (see last chapter). When you press this shortcut, the software will ask you to highlight that entity.

Label pattern

To label the training documents, you can use label patterns. In this way, the application will guide you in labeling the document based on a predefined pattern. This will speed up the labeling process.
A manager can create a label pattern in the project settings. More info can be found in project settings.
So if you often want to label the same type of documents, it is advisable to create a label pattern for this.
Activate a label pattern by clicking on the 'pattern' button and selecting the desired label pattern.
From any label view (training data management module, labeling module, manual intervention module), as a manager you can add a label pattern based on the order of your labels. To do so, click on the + icon and choose one of the two options:
  • Create Pattern My Labels - a pattern is created based on the order of your labels.
  • Create Pattern Result - a pattern is created based on the label order of the user who labeled the document. (Only applicable in the training data management module, part documents)
When you select a label pattern, the pane guides the user through each consecutive entity so the user only needs to highlight the words.
This pane has a number of parts:
  • Cross - stop the label pattern
  • Document type
  • Go to previous entity in the pattern
  • Entity type for which you must indicate the words
  • Question mark - Open the label guidelines for that entity
  • Go to the next entity in the pattern
If you indicate the words for that entity, the software will ask you to indicate the next entity in the pattern.
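The way a label pattern walks through its entities can be sketched in Python as follows. This is a hypothetical model for illustration, not Metamaze code: the class LabelPattern, its methods and the example entity names are all assumptions.

```python
class LabelPattern:
    """Hypothetical walk through a predefined order of entities."""

    def __init__(self, document_type, entities):
        self.document_type = document_type
        self.entities = list(entities)
        self.position = 0
        self.labels = {}  # entity -> highlighted words

    @property
    def current(self):
        """Entity the user is asked to highlight, or None when finished."""
        if self.position < len(self.entities):
            return self.entities[self.position]
        return None

    def label(self, words):
        """Store the highlighted words and move on to the next entity."""
        if self.current is None:
            raise IndexError("pattern already completed")
        self.labels[self.current] = words
        self.position += 1

    def previous(self):
        """Go back to the previous entity in the pattern."""
        self.position = max(0, self.position - 1)

    def next(self):
        """Skip ahead without labeling, e.g. when the entity is absent."""
        self.position = min(len(self.entities), self.position + 1)
```

A pattern such as LabelPattern("invoice", ["supplier", "date", "total"]) would first ask for the supplier and, once words are highlighted, move on to the date.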

End labeling

When a document has been labeled successfully, you need to confirm this using the button below. This will be recorded for the training of the model.