Entity extraction

When you take an item from the 'entity extraction' queue, the entity extraction screen for that document opens.

Screen information

The validation screen for entity extraction consists of three panels:

  1. A panel with an overview of the upload with all documents coming out of the document creation step

  2. A panel with all entities, business rules and issues

  3. A panel with the preview of the selected document

You can collapse the first panel by choosing the partial view option.

You can also collapse each section to reduce the visible information. Collapsed sections stay collapsed across sessions until you make them visible again.

Evaluation of documents

Overview of upload and documents

The first panel shows the upload with one or more documents you need to handle and the metadata of the currently selected document.

Metadata document

The 'metadata' shows all the properties of the document:

Edit document settings

The document settings allow you to change the name of the document, the language and the document type. An extra option lets you stay on the document to annotate entities manually, which is handy for documents with only a few entities that you want to process immediately.

Changing the document type will delete all the annotations because the entities are different for each document type.

Documents

The document list shows the documents in the upload that you need to handle. The document you opened from the Uploads or Documents view is selected automatically. You can select another document by clicking its row.

The following properties of the document list are shown:

Processing a single document

The second panel consists of several parts:

  • Entities - the entities defined for this document with the number of times they appear in the document

  • Issues - a list of all manual intervention conflicts you need to resolve (entities and business rules)

  • Annotations - a list of all entities found or manually added to the document. Entities have a transparent color if they were found by the model or a dark color if they were manually added to the document

  • Business rules - a list of all blocking and non-blocking business rules

The third panel shows the document.

Entities

This list shows each entity defined for this document together with the number of times a value is present in the document for this entity.

Issues

This list shows an overview of all possible issues for entities and business rules.

The list can contain any of the following possible issues:

  • Parsing failed (e.g. type of an entity is a date but the value is not a date or the entity couldn't be converted into the desired output format)

  • An entity type is recognized by the model with a confidence score lower than the configured threshold.

  • No entities are defined

  • A business rule is failing

  • An entity has a restriction and fails it, e.g.:

    • an entity is mandatory but was not found by the model

    • a required entity couldn't be found

    • the maximum number of unique occurrences was exceeded

    • no unique value could be found

  • An unsupported language was found

  • An enrichment failed

  • A required enrichment was not found

  • Couldn't find text

  • Unknown error

  • Model was not trained yet

The list shows the following data:

All issues need to be solved before you can mark a document as Done.
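To make a few of the issue types listed above more concrete, here is a minimal Python sketch of how three of them (parsing failed, confidence below the threshold, mandatory entity not found) could be detected. It is an illustration only and does not describe the product's actual implementation: the `Prediction` class, the `find_issues` function, the `%d/%m/%Y` date format and the default threshold of 0.8 are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Prediction:
    """One entity value predicted by the model (hypothetical structure)."""
    entity: str        # entity name, e.g. "invoice_date"
    value: str         # raw text found in the document
    confidence: float  # model confidence between 0 and 1


def find_issues(predictions, date_entities, mandatory, threshold=0.8):
    """Return a list of issue descriptions for the given predictions.

    date_entities: names of entities whose value must parse as a date.
    mandatory:     names of entities that must be present.
    threshold:     assumed confidence threshold, not a product default.
    """
    issues = []
    for p in predictions:
        # Confidence score lower than the configured threshold.
        if p.confidence < threshold:
            issues.append(f"{p.entity}: confidence below threshold")
        # Parsing failed: the entity is a date but the value is not a date.
        if p.entity in date_entities:
            try:
                datetime.strptime(p.value, "%d/%m/%Y")
            except ValueError:
                issues.append(f"{p.entity}: parsing failed")
    # Mandatory entities that the model did not find.
    found = {p.entity for p in predictions}
    for name in sorted(mandatory - found):
        issues.append(f"{name}: mandatory entity not found")
    return issues
```

In this sketch, a value such as "31/02/2024" for a date entity produces a "parsing failed" issue because it is not a valid calendar date, even when the confidence score is high.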

Issue detail

The issue detail in the document view allows you to edit, validate or reject the annotation.

The following actions are possible:

  • Edit - Edit the annotation by dragging the start or end cursor.

  • Validate - Approve the entity value. The entity will be added to the annotations list.

  • Reject - Do not approve the entity value, e.g. due to a misprediction of the model. The entity will be deleted.

Annotations

The list gives an overview of all entities with their values found in the document or manually added:

  • The name of the entity

  • The value of the entity

  • The parsed value of the entity

  • The user who added the annotation (AI if the entity was predicted)

  • The confidence score of the value of the entity if found by the model

  • The page on which the entity appears

You can click a row in this list to see the value in the document. Use the left and right arrow keys to walk through the entities.

You can add additional entity values in the document by clicking on the first and then the last word of the value, and selecting the correct label in the drop-down menu. More info on how to perform entity annotation can be found in Annotation of training data.
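As a rough mental model, one row of the annotations list can be pictured as a record with the fields listed above. The Python sketch below is hypothetical: the `Annotation` class and its field names are assumptions for illustration and are not an export format or API of the product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    """One row of the annotations list (hypothetical structure)."""
    entity: str                  # name of the entity
    value: str                   # value as it appears in the document
    parsed_value: Optional[str]  # parsed value, if parsing succeeded
    added_by: str                # user name, or "AI" if predicted by the model
    confidence: Optional[float]  # only present when found by the model
    page: int                    # page on which the entity appears

    @property
    def is_manual(self) -> bool:
        # Anything not added by the model counts as manually added.
        return self.added_by != "AI"

    @property
    def color_style(self) -> str:
        # Model predictions are shown in a transparent color,
        # manually added entities in a dark color.
        return "dark" if self.is_manual else "transparent"
```

For example, an annotation with `added_by="AI"` has a confidence score and a transparent color, while an annotation added by a user has no confidence score and a dark color.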

Bulk actions

By selecting the checkboxes next to the annotations, you can delete several entities at once.

Manual annotation

At the top you can find the "+ Manual annotation" button. Clicking it opens a pop-up window with input fields for entering entities manually. This is useful when, for example, the OCR model could not convert part of the document correctly (e.g. due to a stamp on the page) and the value cannot be indicated on the document. To add an entity value manually, select the entity label in the first input field, enter the value and optionally enter the page number. You can enter multiple manual entity values using the "plus" button.

For composites and groups, you can add manual entities directly from the document. Click the composite or group in the document and a tooltip appears that lets you add a manual annotation. The composite or group is preselected for you.

Dragging entities into and between composites/groups

You can simply move entities by dragging them in the annotations list.

Business rules

The business rules panel gives an overview of all business rules and their validation result (red or green dot) for the selected document.

When expanding the business rule you can see in detail why a business rule is failing.
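The idea of a business rule with a red or green validation result can be sketched in Python as follows. This is an illustration under assumptions, not the way rules are configured in the product: the `BusinessRule` class, the `evaluate` function and the example rule "total equals net plus VAT" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class BusinessRule:
    """A named check on the extracted values (hypothetical structure)."""
    name: str
    blocking: bool                           # blocking or non-blocking rule
    check: Callable[[Dict[str, float]], bool]


def evaluate(rules: List[BusinessRule],
             values: Dict[str, float]) -> List[Tuple[str, str, bool]]:
    """Return (rule name, dot color, blocking flag) for each rule."""
    results = []
    for rule in rules:
        passed = rule.check(values)
        # Green dot when the rule passes, red dot when it fails.
        results.append((rule.name, "green" if passed else "red", rule.blocking))
    return results


# Example rule: the total must equal net amount plus VAT (within rounding).
total_rule = BusinessRule(
    name="total equals net plus VAT",
    blocking=True,
    check=lambda v: abs(v["net"] + v["vat"] - v["total"]) < 0.005,
)
```

Calling `evaluate([total_rule], {"net": 100.0, "vat": 21.0, "total": 121.0})` gives a green result for the rule, while a total of 120.0 gives a red one.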

Enrichments

Enrichments are only visible in human validation. They are used to add additional external information to either a specific entity or to the document as a whole. The enrichments section is only visible for document types that have enrichments configured (see Enrichments).

In case no enrichments were found, the section will be empty.

If a certain enrichment is required for processing, there will be an issue related to it in the Issues section.

The list shows the following data:

You can manually add an enrichment by clicking "Link enrichment". A pop-up window opens, where you can select the name of the relevant enrichment and its value.

Finish evaluation

At the top you have a number of actions to end your evaluation:

  • Back - You can use this button to put the document back in the queue if, for example, you are not sure how to handle this document.

  • Reject - Use this if there is a problem with the document. The next document will be loaded automatically. Besides some standard errors like 'bad OCR' or 'irrelevant document', you can define your own errors in the project settings, see Custom errors.

  • Done - Mark the document as done when you have finished processing it.
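The three actions can be summarized as a small decision function. The Python sketch below is a simplified model with assumed names: the `finish` function, the status strings, and the requirement that Reject always carries a reason are illustrative assumptions. The only rule taken from this page is that all issues must be solved before a document can be marked as Done.

```python
from typing import Optional


def finish(action: str, open_issues: int,
           reject_reason: Optional[str] = None) -> str:
    """Return the resulting document status for an action (illustrative)."""
    if action == "back":
        # The document goes back to the queue, untouched.
        return "in queue"
    if action == "reject":
        # Assumption: a reason such as 'bad OCR' or a custom error is given.
        if not reject_reason:
            raise ValueError("a reject reason is required")
        return f"rejected: {reject_reason}"
    if action == "done":
        # All issues need to be solved before marking a document as Done.
        if open_issues > 0:
            raise ValueError("solve all issues before marking as done")
        return "done"
    raise ValueError(f"unknown action: {action}")
```

For example, `finish("reject", 3, "bad OCR")` returns `"rejected: bad OCR"`, while `finish("done", 1)` raises an error because one issue is still open.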
