Entity extraction

When you take an item from the 'entity extraction' queue, the entity extraction screen for that document opens.

Screen information

The validation screen for entity extraction consists of three panels:

  1. A panel with an overview of the upload with all documents coming out of the document creation step

  2. A panel with all entities, business rules and issues

  3. A panel with the preview of the selected document

You can collapse the first panel by choosing the partial view option.

You can collapse each section to reduce the visible information. Sections stay collapsed across sessions until you expand them again.

Evaluation of documents

Overview of upload and documents

The first panel shows the upload with one or more documents you need to handle and the metadata of the currently selected document.

Metadata document

The 'metadata' shows all the properties of the document:

  • Status - One of the following:

    • Processing - Document is being processed

    • Input required - Document has been processed by Metamaze AI but requires human input to be completed

    • Processed - Document has been processed

    • Failed - Processing of the document failed

  • Type - The document type

  • Language - The language of the document, either set manually or predicted by the OCR step

  • Pages - The number of pages in the document

  • Name - The name of the document

  • ID - The unique ID of the document, with a copy button

  • Upload date - The date and time the document was uploaded to Metamaze

  • Actions

    • Copy URL - Copy the URL for easy sharing with colleagues or support when you have a problem

    • Edit settings - Edit the document settings, see Edit document settings

    • Download PDF - Download a PDF version of the document

Edit document settings

The document settings allow you to change the name, the language and the document type of the document. An extra option lets you stay on the document to annotate entities manually. This is handy for documents with only a few entities that you want to process immediately.

Changing the document type will delete all the annotations because the entities are different for each document type.

Documents

The first panel shows the upload with one or more documents you need to handle. The document you opened from the Uploads or Documents view is selected automatically. You can select another document by clicking its row.

The following properties of the document list are shown:

  • Status - Icons for the following statuses:

    • Processing - Document is being processed

    • Input required - Document has been processed by Metamaze AI but requires human input to be completed

    • Processed - Document has been processed

    • Failed - Processing of the document failed

  • Number - The position of the document in the upload

  • Actions

    • Page management - Opens the page management screen for the upload. If you notice when processing documents that pages are wrong for a document, you can correct these in the page management screen. Afterwards, these modified documents will receive new predictions from the entity extraction model and will be queued for verification again if applicable.

    • Reject upload - Rejects the whole upload. This action is only available in human validation.

Processing a single document

The second panel consists of several parts:

  • Entities - the entities defined for this document with the number of times they appear in the document

  • Issues - a list of all manual intervention conflicts you need to resolve (entities and business rules)

  • Annotations - a list of all entities found or manually added to the document. Entities have a transparent color if they were found by the model or a dark color if they were manually added to the document

  • Business rules - a list of all blocking and non-blocking business rules

The third panel shows the document.

Entities

This list shows each entity defined for this document together with the number of times a value is present in the document for this entity.

Issues

This list shows an overview of all possible issues for entities and business rules.

The list can contain any of the following possible issues:

  • Parsing failed (e.g. type of an entity is a date but the value is not a date or the entity couldn't be converted into the desired output format)

  • An entity type is recognized by the model with a confidence score lower than the configured threshold.

  • No entities are defined

  • A business rule is failing

  • An entity has a restriction configured and fails it, e.g.:

    • an entity is mandatory but was not found by the model

    • a required entity couldn't be found

    • the number of maximum unique occurrences was exceeded

    • no unique value could be found

  • An unsupported language was found

  • An enrichment failed

  • A required enrichment was not found

  • Couldn't find text

  • Unknown error

  • Model was not trained yet
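
To make the restriction and parsing issues above more concrete, here is a minimal Python sketch of how such checks could be expressed. It is not Metamaze's implementation: the `Restriction` class, its field names, the issue messages and the `%d/%m/%Y` date format are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Restriction:
    """Hypothetical restriction settings for one entity (not the Metamaze schema)."""
    mandatory: bool = False
    max_unique: Optional[int] = None  # maximum number of unique values allowed


def restriction_issues(entity: str, values: list[str], rule: Restriction) -> list[str]:
    """Return illustrative issue messages for the values found for one entity."""
    issues = []
    unique_values = set(values)
    if rule.mandatory and not unique_values:
        issues.append(f"{entity}: mandatory entity was not found")
    if rule.max_unique is not None and len(unique_values) > rule.max_unique:
        issues.append(f"{entity}: maximum number of unique occurrences exceeded")
    return issues


def parse_date_issue(entity: str, value: str, fmt: str = "%d/%m/%Y") -> Optional[str]:
    """Return a 'parsing failed' message if the value is not a date in the assumed format."""
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return f"{entity}: parsing failed for value {value!r}"
    return None
```

For example, `restriction_issues("total", ["10", "12"], Restriction(max_unique=1))` reports one issue, while the same call with `["10", "10"]` reports none because both values are identical.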

The list shows the following data:

  • Name - The entity with its value in the document

  • Issue - An explanation of the issue for the entity

  • Threshold - The threshold set for that entity

  • Score - The confidence score with which the entity value was found by the entity extraction model. Red color shows that the score is lower than the threshold.

  • Actions

    • Click on a row - If the issue relates to an entity that was found, the entity will be shown in the document with its value and converted value.

All issues need to be solved before you can mark a document as Done.
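
The relation between score, threshold and the Done action can be sketched as follows. This is only an illustration under stated assumptions: the `Prediction` class and both function names are hypothetical, and the score is assumed to lie between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical model prediction for one entity value."""
    entity: str
    value: str
    score: float      # confidence score from the model, assumed to be in [0, 1]
    threshold: float  # threshold configured for this entity


def below_threshold(predictions: list[Prediction]) -> list[Prediction]:
    """Return the predictions whose score is lower than the configured threshold."""
    return [p for p in predictions if p.score < p.threshold]


def can_mark_done(open_issues: list) -> bool:
    """A document can only be marked as Done once no issues remain."""
    return len(open_issues) == 0
```

A prediction with score 0.62 against a threshold of 0.80 would be flagged, so `can_mark_done` stays false until that issue is validated or rejected.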

Issue detail

The issue detail in the document view allows you to edit, validate or reject the annotation.

The following actions are possible:

  • Edit - Allows you to edit the annotation by dragging the start or end cursor

  • Validate - Approve entity value. The entity will be added to the annotations list.

  • Reject - Do not approve entity value, e.g. due to a misprediction of the model. The entity will be deleted.

Annotations

The list gives an overview of all entities with their values found in the document or manually added:

  • The name of the entity

  • The value of the entity

  • The parsed value of the entity

  • The user who added the annotation (AI if the entity was predicted)

  • The confidence score of the value of the entity if found by the model

  • The page on which the entity appears

You can click a row in this list to see the value in the document. Use the right and left arrow keys to step through the entities.

You can add additional entity values in the document by clicking the first and then the last word of the value, and selecting the correct label in the drop-down menu. More info on how to perform entity annotation can be found in Annotation of training data.

Bulk actions

By selecting the checkboxes next to annotations, you can delete the selected entities in one action.

Manual annotation

At the top you can find the "+ Manual annotation" button. Clicking it opens a pop-up window with input fields to enter entities manually. This is useful when the OCR model could not convert part of the document correctly (e.g. due to a stamp on the page) and the value cannot be indicated on the document. To add an entity value manually, select the entity label in the first input field, enter the value and optionally enter the page number. You can enter multiple manual entity values using the "plus" button.

For composites and groups, you can also add manual entities from the tooltip. Click on the composite or group in the document and a tooltip appears that allows you to add a manual annotation directly. The composite or group is preselected for you.

Moving entities into or between composites/groups

You can simply move entities by dragging them in the annotations list.

Business rules

The business rules panel gives an overview of all business rules and their validation result (red or green dot) for the selected document.

When you expand a business rule, you can see in detail why it is failing.
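
As a rough illustration of a business rule and its red or green result, consider the Python sketch below. The `BusinessRule` class, the `evaluate` function and the example rule are hypothetical and do not reflect how rules are configured in Metamaze. The sketch only carries the `blocking` flag along and does not model its effect.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BusinessRule:
    """Hypothetical business rule: a named check over the extracted entity values."""
    name: str
    check: Callable[[dict], bool]
    blocking: bool


def evaluate(rules: list[BusinessRule], entities: dict) -> list[tuple[str, str, bool]]:
    """Return (rule name, 'green' or 'red', blocking) for each rule."""
    return [
        (rule.name, "green" if rule.check(entities) else "red", rule.blocking)
        for rule in rules
    ]


# Example rule (an assumption for illustration): the total must equal net plus VAT.
rules = [
    BusinessRule(
        name="total = net + vat",
        check=lambda e: abs(e["net"] + e["vat"] - e["total"]) < 0.005,
        blocking=True,
    )
]
```

Calling `evaluate(rules, {"net": 100.0, "vat": 21.0, "total": 120.0})` marks the rule red, which corresponds to the failing state you would then inspect by expanding the rule.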

Enrichments

Enrichments are only visible in human validation. They are used to add additional external information to either a specific entity or to the document as a whole. The enrichments section is only visible for document types that have enrichments configured (see Enrichments).

In case no enrichments were found, the section will be empty.

If a certain enrichment is required for processing, there will be an issue related to it in the Issues section.

The list shows the following data:

  • Name - Name of the enrichment as specified in the project settings

  • Enrichment Value - The value of the enrichment, or an error message in case the enrichment failed. Error messages are displayed in red.

  • Linked with - In case the enrichment is linked to the document as a whole, this column contains a document icon. If it is linked with a specific entity, it displays the label and the value of the relevant entity.

  • Actions

    • Click on a row - Allows you to modify the enrichment

    • Selecting multiple rows - Allows you to delete or retry enrichments

These actions are only possible if the document has status "Input required".

You can add an enrichment manually by clicking "Link enrichment". A pop-up window opens where you can select the name of the relevant enrichment and enter its value.

Finish evaluation

At the top you have a number of actions to end your evaluation:

  • Back - You can use this button to put the document back in the queue if, for example, you are not sure how to handle this document.

  • Park - Use this if you are not sure what to do with the document and want to check with a colleague or manager first. You can take the next document in the queue for now and come back later by filtering on 'parked by users'. Fill in a reason for parking: it reminds you why you parked this specific document and helps when you ask for feedback. The reason for parking is shown in a yellow bar at the top of the second panel.

  • Reject - Use this if there is a problem with the document. The next document will be loaded automatically. Besides standard errors such as 'bad OCR' or 'irrelevant document', you can define your own errors in the project settings, see Custom errors.

  • Done - Mark the document as done when you have finished processing the document
