Metamaze
Human Validation Entity Extraction
When you take an item from the 'entity extraction' queue, the human validation screen for entity extraction opens for that document.

SHORTCUTS

As on all other pages, you can perform actions using shortcuts. Press SHIFT+H to get an overview of all shortcuts.

PANELS

The human validation screen for entity extraction consists of three panels:
    A panel with an overview of the upload with all documents coming out of the document creation step and the result of the business rules validation.
    A panel with all the information and items that need to be checked.
    A panel with the document you need to check.
You can activate or deactivate the panels via the menu on the right or via the shortcuts SHIFT+1 / SHIFT+2 / SHIFT+3.
You can also show the third panel on a second screen. To do so, click the icon at the top right. This is useful if you have an additional, larger screen, and it frees up space to show more information in the other panels.

EVALUATION OF DOCUMENTS

Overview of upload and documents

The first panel shows the upload with one or more documents you need to handle. The first document is selected automatically, and the result of the business rules validation for that document is shown.
The upload panel consists of several rows:
    The first row always shows the original upload. Clicking it shows the result of the page management and document classification model. If you notice while processing documents that a document contains the wrong pages, click this upload row and correct the page assignment. The modified documents then go through the entity extraction model again and re-enter the entity extraction queue. This row contains three fields:
      Time of upload
      Identifier of the upload
      Manual intervention status (full upload treatment is ready or not ready)
    The other rows show all documents resulting from the previous step 'page management and document classification'. By default, the first document is selected (indicated by a blue background color). These rows contain four fields:
      Name of the document
      Document type
      Document language
      Status of the document (ready or not)
You can select other documents by clicking on a row.
The business rules panel gives an overview of all business rules and their validation result (red or green dot) for the selected document.

Processing a single document

The second panel consists of several parts:
    Metadata - the properties of the document
    Labels - the entities defined for this document with the number of times they appear in the document
    Issues - a list of all manual intervention conflicts you need to resolve
    Labelling - a list of all entities found or manually placed in the document. Entities have a transparent color if they are found by the model or a dark color if they are applied manually in the document.
The third panel shows the document.

Metadata

The 'metadata' section shows all the properties of the document:
    The name of the document
    The user who performed the upload. Via the API, the value will be 'system'.
    The language of the document
    The number of pages of the document
    The document type
    The date of the upload
You can change this information. If you change the language or the document type, the upload disappears from the human validation screen, goes through the 'entity extraction' step again, and then re-enters the entity extraction queue.

Labels

This list shows each entity defined for this document together with the number of times a value is present in the document for this entity.
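As an illustration only, the counts in this list can be thought of as a tally of labelled values per defined entity, where entities without any value show a count of zero. The entity names below are hypothetical and not taken from any real project configuration.

```python
from collections import Counter

# Hypothetical entities defined for a document type.
defined_entities = ["invoice_number", "invoice_date", "total_amount"]

# Hypothetical entity names of the values currently labelled in the document.
labelled = ["invoice_date", "total_amount", "total_amount"]

# Tally the labelled values; Counter returns 0 for entities that never occur.
counts = Counter(labelled)
label_counts = {entity: counts[entity] for entity in defined_entities}

for entity, count in label_counts.items():
    print(f"{entity}: {count}")
```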

Issues

This list shows an overview of all possible conflicts:
    An entity is recognised by the model with a confidence score lower than the threshold set for the document type.
    An entity cannot be validated (e.g. its type is a date but the value is not a date) or cannot be converted to the desired format.
    An entity is marked as mandatory but was not found by the model.
The intention is that you look at each conflict and take the right action. By default, the first conflict is selected so that you can immediately see the value in the document in the third panel.
The list consists of 4 columns:
    The entity with its value in the document
    The threshold set for that entity
    The confidence score with which the entity value was found by the entity extraction model. A red color indicates that the confidence score is lower than the threshold.
    A number of actions
      Cross icon - Do not approve entity value, e.g. due to a misprediction of the model. The entity will be deleted.
      Tick icon - Approve entity value. The entity will be added to the 'labelling' list
      Eye icon - View the entity value in the document. Clicking the row has the same result. The entity will be shown in the document with its value and converted value.
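The three conflict types described in this section can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the platform's actual implementation: the `Prediction` class, the `find_issues` function, the entity names, the thresholds and the date format are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Prediction:
    """Hypothetical shape of one entity value found by the model."""
    entity: str
    value: str
    score: float  # model score between 0 and 1


def is_valid_date(value: str, fmt: str = "%d/%m/%Y") -> bool:
    """Return True if the value parses as a date in the assumed format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def find_issues(predictions, thresholds, date_entities, mandatory):
    """Return (entity, reason) pairs for the three conflict types."""
    issues = []
    for p in predictions:
        # Conflict 1: score is lower than the threshold for this entity.
        if p.score < thresholds.get(p.entity, 0.0):
            issues.append((p.entity, "score below threshold"))
        # Conflict 2: value does not validate against its type (dates only here).
        if p.entity in date_entities and not is_valid_date(p.value):
            issues.append((p.entity, "value cannot be validated"))
    # Conflict 3: a mandatory entity was not found at all.
    found = {p.entity for p in predictions}
    for entity in sorted(mandatory - found):
        issues.append((entity, "mandatory entity not found"))
    return issues


predictions = [
    Prediction("invoice_date", "31/02/2023", 0.95),  # not a real calendar date
    Prediction("total_amount", "120.50", 0.40),      # low score
]
issues = find_issues(
    predictions,
    thresholds={"invoice_date": 0.80, "total_amount": 0.80},
    date_entities={"invoice_date"},
    mandatory={"invoice_date", "total_amount", "invoice_number"},
)
for entity, reason in issues:
    print(f"{entity}: {reason}")
```

In this sketch, each returned pair corresponds to one row in the issues list that a reviewer would approve, reject or inspect.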

Labelling

At the top, there are input fields to enter entities manually. For example, if the OCR model could not convert part of the document correctly (e.g. due to a coffee stain) and the value cannot be marked in the document, you can add an entity value manually: select the entity in the first input field, enter the value, and optionally enter the page number. You can enter multiple manual entity values using the 'plus' button.
The list gives an overview of all entities with their values found in the document or manually indicated or added:
    The name of the entity
    The value of the entity
    The converted value of the entity
    The confidence score of the entity value, if the value was found by the model. A red color indicates that the score is lower than the threshold.
    The page in which the entity appears
    The class of the entity (text, image or composite)
    A delete icon to remove the label.
You can click a row in this list to see the value in the document. You can use the left and right arrow keys to step through the entities.
You can add additional entity values in the document by clicking the first and then the last word of the value, and then choosing the entity type for the selected words. Another option is to use label patterns to speed up labelling.
More info on how to perform entity annotation can be found in Labelling of training data.
When you have finished processing the document, click the checkmark. If you are not sure what to do with the document and want to check with a colleague or manager first, click the parking sign. This lets you take the next document in the queue for now and come back later by filtering on 'parked by users'. If the document cannot be processed, click the cross. Afterwards, the next document is selected automatically. If it was the last document, you are asked to submit the upload, and the next manual intervention item is started.
If you click the parking sign, a pop-up screen appears in which you need to fill in a reason for parking. The reason reminds you why you parked this specific document and helps you later when asking your colleague or manager for feedback on all documents you have parked. You can find the reason in the second panel (the panel with all the information and items that need to be checked), between metadata and annotations.

FINISH EVALUATION

At the bottom left of the human validation document processing panel you have a number of actions to complete your evaluation:
    Put the human validation document back in the queue - Use this button to return the upload to the queue, for example when you are not sure how to handle it.
    Batch ready - You use this button when you have finished successfully processing all documents from this upload. The upload will then go to the next step 'output' to send the result to your system.
    Send as failed - Use this button if there is a problem with the upload. Besides some standard errors like 'bad OCR' or 'irrelevant document', you can define your own errors in the project settings, see chapter 'project management'.
    Option to immediately get the next item out of the queue - When enabled, the next item is taken from the queue immediately, taking into account your queue filter settings. If this option is disabled, you return to the queue page.