Entity extraction
When you take an item from the queue 'entity extraction', the document screen for entity extraction of that document opens.
Screen information
The validation screen for entity extraction consists of three panels:
A panel with an overview of the upload with all documents coming out of the document creation step
A panel with all entities, business rules and issues
A panel with the preview of the selected document
You can collapse the first panel by choosing the partial view option.
You can also collapse each section to reduce the visible information. Sections stay collapsed across sessions until you expand them again.
Evaluation of documents
Overview of upload and documents
The first panel shows the upload with one or more documents you need to handle and the metadata of the currently selected document.
Metadata document
The 'metadata' shows all the properties of the document:
Field | Description |
---|---|
Status | |
Type | The document type |
Language | The language of the document which was set manually or predicted by the OCR step |
Pages | The number of pages in the document |
Name | The name of the document |
ID | The unique ID of the document, with a copy button |
Upload date | The date and time the document was uploaded to Metamaze |
Actions | |
Edit document settings
The document settings allow you to change the name of the document, the language and the document type. There is an extra option to allow you to stay on the document to manually annotate entities. This is handy for documents you want to process immediately with only a few entities.
Changing the document type will delete all the annotations because the entities are different for each document type.
Documents
The first panel shows the upload with one or more documents you need to handle. The document you selected in the Uploads or Documents view is selected automatically. You can select another document by clicking its row.
The following properties of the document list are shown:
Field | Description |
---|---|
Status | Icons for the document statuses |
Number | The position of the document in the upload |
Actions | |
Processing a single document
The second panel consists of several parts:
Entities - the entities defined for this document with the number of times they appear in the document
Issues - a list of all manual intervention conflicts you need to resolve (entities and business rules)
Annotations - a list of all entities found or manually added to the document. Entities have a transparent color if they were found by the model or a dark color if they were manually added to the document
Business rules - a list of all blocking and non-blocking business rules
The third panel shows the document.
Entities
This list shows each entity defined for this document together with the number of times a value is present in the document for this entity.
Issues
This list shows an overview of all possible issues for entities and business rules.
The list can contain any of the following possible issues:
Parsing failed (e.g. type of an entity is a date but the value is not a date or the entity couldn't be converted into the desired output format)
An entity type is recognized by the model with a confidence score lower than the configured threshold.
No entities are defined
A business rule is failing
An entity has been restricted and fails the restriction, e.g.:
an entity is mandatory but was not found by the model
a required entity couldn't be found
the number of maximum unique occurrences was exceeded
no unique value could be found
An unsupported language was found
An enrichment failed
A required enrichment was not found
Couldn't find text
Unknown error
Model was not trained yet
The list shows the following data:
Field | Description |
---|---|
Name | The entity with its value in the document |
Issue | An explanation of the issue for the entity |
Threshold | The threshold set for that entity |
Score | The confidence score with which the entity value was found by the entity extraction model. Red color shows that the score is lower than the threshold. |
Actions | |
All issues need to be solved before you can mark a document as Done.
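As an illustration only, the threshold and mandatory-entity checks described above can be sketched in Python. This is not Metamaze code: the `Prediction` and `Issue` classes, the function names, and the sample entity names are all hypothetical, and the sketch assumes an issue is raised when the score is strictly lower than the threshold.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Prediction:
    # Hypothetical record for one predicted entity value (not a Metamaze type).
    name: str
    value: str
    score: float      # confidence score from the entity extraction model
    threshold: float  # threshold configured for this entity


@dataclass
class Issue:
    # Mirrors the columns of the issues list: name, issue, threshold, score.
    name: str
    issue: str
    threshold: Optional[float] = None
    score: Optional[float] = None
    resolved: bool = False


def low_confidence_issues(predictions: List[Prediction]) -> List[Issue]:
    """Create an issue for every prediction scoring below its threshold."""
    return [
        Issue(p.name, "Confidence score below threshold", p.threshold, p.score)
        for p in predictions
        if p.score < p.threshold
    ]


def missing_mandatory_issues(
    mandatory: List[str], predictions: List[Prediction]
) -> List[Issue]:
    """Create an issue for every mandatory entity that has no prediction."""
    found = {p.name for p in predictions}
    return [Issue(n, "Mandatory entity not found") for n in mandatory if n not in found]


def can_mark_done(issues: List[Issue]) -> bool:
    """A document can only be marked as Done once every issue is solved."""
    return all(i.resolved for i in issues)


# Sample data, invented for this sketch.
predictions = [
    Prediction("invoice_number", "INV-001", score=0.95, threshold=0.80),
    Prediction("total_amount", "120.00", score=0.55, threshold=0.80),
]
issues = low_confidence_issues(predictions) + missing_mandatory_issues(
    ["invoice_number", "due_date"], predictions
)
```

In this sketch, `total_amount` is flagged because 0.55 is below 0.80, `due_date` is flagged because it is mandatory but missing, and `can_mark_done(issues)` stays false until both issues are resolved.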
Issue detail
The issue detail in the document view allows you to edit the annotation, validate or reject it.
The following actions are possible:
Edit - Allows you to edit the annotation by dragging the start or end cursor
Validate - Approve entity value. The entity will be added to the annotations list.
Reject - Do not approve entity value, e.g. due to a misprediction of the model. The entity will be deleted.
Annotations
The list gives an overview of all entities with their values found in the document or manually added:
The name of the entity
The value of the entity
The parsed value of the entity
The user who added the annotation (AI if the entity was predicted)
The confidence score of the value of the entity if found by the model
The page on which the entity appears
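To make the columns above concrete, one row of the annotations list could be modelled as follows. This is a minimal sketch under stated assumptions: the `Annotation` class and its field names are hypothetical and do not reflect the actual Metamaze data model or API, and the sample values are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    # Hypothetical model of one row in the annotations list.
    name: str                    # name of the entity
    value: str                   # value as it appears in the document
    parsed_value: Optional[str]  # value after parsing, if parsing succeeded
    added_by: str                # user who added it, or "AI" if predicted
    score: Optional[float]       # confidence score, only for model predictions
    page: int                    # page on which the entity appears

    @property
    def is_predicted(self) -> bool:
        """True when the annotation came from the model rather than a user."""
        return self.added_by == "AI"


# Sample rows, invented for this sketch.
predicted = Annotation("invoice_date", "01/02/2024", "2024-02-01", "AI", 0.91, 1)
manual = Annotation("supplier_name", "Acme", "Acme", "jane.doe", None, 2)
```

Here `predicted.is_predicted` is true and `manual.is_predicted` is false, which corresponds to the distinction the list makes between values found by the model and values added manually.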
You can click a row in this list to see the value in the document. You can use the keys 'right/left arrow' to walk through the entities.
You can add additional entity values in the document by clicking on the first and then last word, and selecting the correct label in the drop-down menu. More info on how to perform entity annotation can be found in Annotation of training data.
Bulk actions
By selecting the checkboxes next to the annotations, you can delete several entities at once.
Manual annotation
At the top you can find the "+ Manual annotation" button. Clicking it will open a pop-up window with input fields to enter entities manually. For example, if the OCR model could not convert part of the document correctly (e.g. due to a stamp on the page) and the value cannot be indicated on the document, you can manually add an entity value by selecting the entity label in the first input field, then enter the value and optionally enter the page number. You can enter multiple manual entity values using the “plus” button.
For composites and groups, a tooltip makes it easier to add manual entities. Click the composite or group in the document and a tooltip appears that lets you add a manual annotation directly, with the composite or group already preselected.
Moving entities into or between composites/groups
You can move entities into another composite or group, or between composites and groups, by dragging them in the annotations list.
Business rules
The business rules panel gives an overview of all business rules and their validation result (red or green dot) for the selected document.
When expanding the business rule you can see in detail why a business rule is failing.
Enrichments
Enrichments are only visible in human validation. They are used to add additional external information to either a specific entity or to the document as a whole. The enrichments section is only visible for document types that have enrichments configured (see Enrichments).
In case no enrichments were found, the section will be empty.
If a certain enrichment is required for processing, there will be an issue related to it in the Issues section.
The list shows the following data:
Field | Description |
---|---|
Name | Name of the enrichment as specified in the project settings |
Enrichment Value | The value of the enrichment, or an error message in case the enrichment failed. Error messages are displayed in red. |
Linked with | In case the enrichment is linked to the document as a whole, this column contains a document icon. If it is linked with a specific entity, it displays the label and the value of the relevant entity |
Actions | These actions are only possible if the document has status "Input required" |
It is possible to manually add an enrichment, by clicking "Link enrichment". A pop-up window will open, where you can select the name of the relevant enrichment, and the value.
Finish evaluation
At the top you have a number of actions to end your evaluation:
Back - You can use this button to put the document back in the queue if, for example, you are not sure how to handle this document.
Park - Use this if you are not sure what to do with the document and want to check with a colleague or manager first. You can then take the next document in the queue and come back later by filtering on 'parked by users'. Fill in a reason for parking: it reminds you why you parked this document and helps when you ask for feedback. The reason is shown in a yellow bar at the top of the second panel.
Reject - Use this if there is a problem with the document; the next document is loaded automatically. Besides standard errors such as 'bad OCR' or 'irrelevant document', you can define your own errors in the project settings, see Custom errors.
Done - Mark the document as done when you have finished processing the document