Metamaze
Training data management
Metamaze provides an overview of all uploads and documents in the training data so you can see who labeled what. Only managers and administrators have access to this module. Training documents can also be managed at document type level.

UPLOAD OVERVIEW

When you select the overview of the uploads, a view of all existing uploads will be shown.
Each line in this overview shows an upload with its properties. An upload can consist of several documents, for example when you selected several files in one upload. These documents are listed in a sub table in the row of the upload.

OVERVIEW

The overview consists of several columns:
  • Status
  • Upload ID
  • Upload date
  • Upload time
  • The user who performed the upload:
    • If it has been uploaded via the API, the value is 'system'
    • If it is uploaded into the management module via the upload button in the software, the value is the name of the user.
    • If it was sent from the production data to the training data, the value is the name of the user who has performed this action.
  • The user who created the documents (page management):
    • If a user has performed page management in the label module the value is the name of this user.
    • If a user has performed page management in manual intervention and the upload was sent from production to the training data, the value is the name of this user.
    • If page management is done by the page management model and the upload is sent from production to the training data, the value is 'system'.
  • Source
    • If the document was loaded into the training data via the software or the API, the value is 'TRAINING'.
    • If the upload was sent from production to the training data, the value is 'PRODUCTION'.
  • The number of uploaded files
  • The number of uploaded pages
  • The number of documents created
  • A button to change the upload
  • A button to delete the upload
You can customize the view by adding or removing columns: click the '+' icon and check or uncheck the column names in the list that opens.
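The rules for the uploader and 'Source' columns described above can be summarised in a short sketch. This is only an illustration of the documented rules, not Metamaze code: the `Upload` class, the `channel` values ("api", "ui", "production") and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Upload:
    """Hypothetical record of how an upload reached the training data."""
    channel: str                     # assumed values: "api", "ui" or "production"
    user_name: Optional[str] = None  # user who uploaded or forwarded the upload


def uploaded_by(upload: Upload) -> str:
    """Value of the 'user who performed the upload' column."""
    if upload.channel == "api":
        return "system"
    # Upload button or production-to-training transfer: name of the acting user.
    if upload.user_name is None:
        raise ValueError("a user name is required for non-API uploads")
    return upload.user_name


def source(upload: Upload) -> str:
    """Value of the 'Source' column."""
    return "PRODUCTION" if upload.channel == "production" else "TRAINING"
```

For example, an upload received through the API is shown with user 'system' and source 'TRAINING', while an upload that a user sent over from production is shown with that user's name and source 'PRODUCTION'.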
The following upload statuses are possible:
  • Queued
    • Files are ingested and queued for treatment by the pre-processor.
  • Preprocessing
    • The files are optimised to increase the quality of the OCR step. This includes page rotation, brightness optimisation, stain removal, font optimisation, ...
  • OCR extraction failed
    • The OCR model has not been able to read one or more pages of the file correctly.
  • Type of prediction
    • The document type of each page is determined by the document classification model
  • Page managing
    • The different pages are merged into a logical document by the page management model
  • Page management
    • The prediction of the page management model needs to be checked. With this status, the upload will be present in the label module, in the queue document creation / classification so that labellers can check the upload.
  • To label
    • At least one document from the upload is ready to be labeled. With this status, the documents will be present in the label module, in the queue entity extraction so labelers can label the document.
  • Labeling
    • One or more documents from the upload are being labeled by a user.
  • Done
    • All the documents from the upload are labeled and ready. With this status, the uploads and documents are used as training data.
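As a reading aid, the upload statuses listed above can be written down as an enumeration, together with the two statuses that place an upload in a queue of the label module. This is a minimal sketch: the names `UploadStatus`, `label_queue` and `used_as_training_data` are hypothetical and are not part of any Metamaze API.

```python
from enum import Enum
from typing import Optional


class UploadStatus(Enum):
    """Upload statuses as they appear in the overview."""
    QUEUED = "Queued"
    PREPROCESSING = "Preprocessing"
    OCR_EXTRACTION_FAILED = "OCR extraction failed"
    TYPE_OF_PREDICTION = "Type of prediction"
    PAGE_MANAGING = "Page managing"
    PAGE_MANAGEMENT = "Page management"
    TO_LABEL = "To label"
    LABELING = "Labeling"
    DONE = "Done"


# Statuses in which the upload is present in a queue of the label module.
LABEL_QUEUES = {
    UploadStatus.PAGE_MANAGEMENT: "document creation / classification",
    UploadStatus.TO_LABEL: "entity extraction",
}


def label_queue(status: UploadStatus) -> Optional[str]:
    """Return the label-module queue for this status, or None if there is none."""
    return LABEL_QUEUES.get(status)


def used_as_training_data(status: UploadStatus) -> bool:
    """Only uploads with status 'Done' are used as training data."""
    return status is UploadStatus.DONE
```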
Clicking on a row opens the page management module to manage the different documents that were created based on the opened upload.
In addition to the table, the screen of the page management module consists of two panels:
  • The left panel has two frames
    • A list of the original files from the upload.
    • A list of the documents created by the labeler by grouping pages as a document of a certain document type.
  • The right panel shows the selected 'preview' page.
You can show or hide each of the two panels via the menu on the right or with the shortcuts Shift+1 / Shift+2.
You can also show the selected page on a second screen by pressing the icon on the top right.
In this module, you can modify the created documents and save them again. Documents that have been modified lose their entities and have to be relabeled. They re-enter the entity extraction queue in the label module. How to change documents, drag and drop pages, and so on is described in the page management section (2.3.2).

DOCUMENTS IN UPLOADS

When you open the row of an upload by clicking on the "+" sign at the beginning of the row, the different documents created from this upload will be shown.
Each row below this upload contains one document with the corresponding properties:
  • Status
  • Name of the document
  • Type of document
  • The language of document
  • User who labeled the document
  • The progress
  • The available actions, such as deleting or updating the document.
When you click on the row of a document, the application opens the label module and you can manage the different indicated entities on the selected document and see who labeled what.
In this overview you can also filter on certain data:
  • Status of an upload
  • ID of an upload
  • Date of upload
  • Users who have labeled
  • Entities that are included or excluded:
    • Include an entity by clicking the checkbox.
    • Exclude an entity by double clicking the checkbox.
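The include/exclude entity filter can be pictured as follows. This is a sketch under assumptions: the documentation does not say whether a document must contain all included entities or just one of them, so the example assumes all of them; the function name and the data layout (document name mapped to the set of entity names labeled on it) are hypothetical.

```python
from typing import Dict, List, Set


def filter_by_entities(
    documents: Dict[str, Set[str]],
    include: Set[str],
    exclude: Set[str],
) -> List[str]:
    """Return the names of documents that match the entity filter.

    Assumption: a document matches when it contains every included
    entity and none of the excluded entities.
    """
    return [
        name
        for name, entities in documents.items()
        if include <= entities and not (exclude & entities)
    ]


documents = {
    "invoice_a": {"iban", "total"},
    "invoice_b": {"total"},
    "invoice_c": {"iban"},
}
# Include 'total' (single click), exclude 'iban' (double click).
matches = filter_by_entities(documents, include={"total"}, exclude={"iban"})
```

With the sample data above, only `invoice_b` matches: it contains 'total' and does not contain 'iban'.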
The following statuses of a document are possible:
  • Not started
    • No user has started labeling this document yet.
  • Todo
    • The 4-eyes principle is enabled in the project settings and only one user has started labeling this document, so a second user still has to label it.
  • Ongoing
    • At least one user has already started labeling the document
  • Done
    • The document is fully labeled by at least one user, if the 4-eyes principle is not enabled.
    • The document is fully labeled by at least 2 people, if the 4-eyes principle is enabled.
  • Incomplete
    • A mandatory property of the document, such as the document type or the language, is missing.
  • In conflict
    • When the 4-eyes principle has been activated and the two users have not labeled the document in exactly the same way
  • Failed
    • The document could not be processed by the application or was marked as failed by a user.
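The way these document statuses depend on the 4-eyes setting can be sketched as a small function. This is an interpretation of the list above, not the application's actual logic: the function name, its parameters and the order in which the checks are applied (for example, that 'Failed' and 'Incomplete' take precedence) are assumptions.

```python
def document_status(
    started: int,
    finished: int,
    four_eyes: bool,
    labels_agree: bool = True,
    missing_mandatory: bool = False,
    failed: bool = False,
) -> str:
    """Derive a document status from labeling progress.

    started:  number of users who have started labeling the document
    finished: number of users who have fully labeled the document
    """
    if failed:
        return "Failed"
    if missing_mandatory:
        return "Incomplete"
    if started == 0:
        return "Not started"
    required = 2 if four_eyes else 1  # 4-eyes needs two complete labelings
    if finished >= required:
        if four_eyes and not labels_agree:
            return "In conflict"
        return "Done"
    if four_eyes and started == 1:
        return "Todo"  # a second user still has to label the document
    return "Ongoing"
```

For example, with the 4-eyes principle enabled, `document_status(started=1, finished=0, four_eyes=True)` gives 'Todo', and two complete labelings that differ give 'In conflict'.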

DOCUMENT OVERVIEW

When you select the second menu item 'documents', you will get an overview of all the documents that are currently present in the training data.
The columns are the same as in the subtable of an upload containing the documents from that upload. This table simply gives you the list of documents and provides filter and search fields for searching for one or more documents.
This view is available at both project and document type level. At document type level, the table also contains the columns organisation and project.

Displaying a document

The screen for adapting the entities of a document consists of two parts:
  • On the left, an overview with the data and the different entities of the document.
  • On the right, the document that is being labeled.

Data of a document

This view consists of three parts: Overview, My Labels and Result.
  • Overview
    • Metadata - Here the properties of the document are shown. You can change these by unlocking the document.
    • Labels - An overview of all entities defined for this document type and how many of them are labelled in total.
    • Labelled By - An overview of all users who have labelled the document.
  • My Labels - Your labels will be shown here if there are any.
  • Result - The result is shown here. This data is used as training data. If a user has labeled the document, their labels will appear in the result.

Label Author Table

This table shows which users labeled the document. This table has a number of fields:
  • A plus button to see which labels a user has added. This sub table also has a number of fields:
    • Label - the name of the entity
    • Value - the value of the entity
    • Parsed - the converted value to the desired format
    • Page number - the page number where the entity appears
    • Security score - the confidence score that the model assigned to the entity (only if you used the model via the lightning icon)
    • Type of entity - text or image entity
  • The name of the user
  • The date on which the user started labeling
  • The number of labels found
  • The number of labels that have not yet been labeled
  • An eye icon to see the entity of that user in the preview document
  • A delete icon to remove the user's labels.

Customise labels

In this module you can overwrite the labels of a document by labelling the document itself and saving the document. To do this, go to 'My Labels', identify the entities and save the document. Your labels will overwrite the other ones and end up in the result. If you only want to modify one entity of a document or if you want to check and modify one or more entities in one or more documents, it is best to use the task module used for quality control. (See chapter 3.4)