Entity extraction
When you take an item from the 'entity extraction' queue, the entity extraction screen for that document opens.
The validation screen for entity extraction consists of three panels:
A panel with an overview of the upload with all documents coming out of the document creation step
A panel with all entities, business rules and issues
A panel with the preview of the selected document
You can collapse the first panel by choosing the partial view option.
You can also collapse each section to reduce the visible information. Sections stay collapsed across sessions until you make them visible again.
The first panel shows the upload with one or more documents you need to handle and the metadata of the currently selected document.
The 'metadata' shows all the properties of the document:
Status
Processing - Document is being processed
Input required - Document has been processed by Metamaze AI but requires human input to be completed
Processed - Document has been processed
Failed - Processing of the document failed
Type
The document type
Language
The language of the document, either set manually or predicted by the OCR step
Pages
The number of pages in the document
Name
The name of the document
ID
The unique ID of the document, with a handy copy button
Upload date
The date and time the document was uploaded to Metamaze
Actions
Copy URL - Copy the URL for easy sharing with colleagues or support when you have a problem
Download PDF - Download a PDF version of the document
The document settings allow you to change the name of the document, the language and the document type. There is an extra option to allow you to stay on the document to manually annotate entities. This is handy for documents you want to process immediately with only a few entities.
Changing the document type will delete all the annotations because the entities are different for each document type.
The first panel also lists the documents in the upload. The document you opened from the Uploads or Documents view is selected automatically. To select another document, click its row.
The following properties of the document list are shown:
Status
An icon indicating the status of the document (Processing, Input required, Processed or Failed)
Number
The position of the document in the upload
Actions
Page management - Opens the page management screen for the upload. If you notice while processing that a document contains the wrong pages, you can correct this in the page management screen. The modified documents then receive new predictions from the entity extraction model and are queued for verification again if applicable.
Reject upload - Rejects the whole upload. This action is only available in human validation.
The second panel consists of several parts:
Entities - the entities defined for this document with the number of times they appear in the document
Issues - a list of all manual intervention conflicts you need to resolve (entities and business rules)
Annotations - a list of all entities found or manually added to the document. Entities have a transparent color if they were found by the model or a dark color if they were manually added to the document
Business rules - a list of all blocking and non-blocking business rules
The third panel shows the document.
The entities list shows each entity defined for this document, together with the number of times a value for that entity is present in the document.
The issues list shows an overview of all possible issues for entities and business rules.
The list can contain any of the following possible issues:
Parsing failed (e.g. type of an entity is a date but the value is not a date or the entity couldn't be converted into the desired output format)
An entity type is recognized by the model with a confidence score lower than the configured threshold.
No entities are defined
A business rule is failing
An entity has been restricted and fails the restriction, e.g.:
an entity is mandatory but was not found by the model
a required entity couldn't be found
the number of maximum unique occurrences was exceeded
no unique value could be found
An unsupported language was found
An enrichment failed
A required enrichment was not found
Couldn't find text
Unknown error
Model was not trained yet
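To make two of these issue types more concrete, the following is a minimal Python sketch of how a parsing failure and a failed restriction could be detected. It is an illustration only, not Metamaze's implementation: the function names, the `mandatory` and `max_unique` parameters and the date format are all assumptions made for this example.

```python
from datetime import datetime


def parse_date(value, fmt="%d/%m/%Y"):
    """Return the ISO date for `value`, or None when parsing fails.

    The input format is an assumption for this sketch; a None result
    corresponds to a "Parsing failed" issue.
    """
    try:
        return datetime.strptime(value, fmt).date().isoformat()
    except ValueError:
        return None


def restriction_issues(label, values, mandatory=False, max_unique=None):
    """List the restriction issues for one entity label (hypothetical helper).

    `values` holds every value found in the document for this label.
    """
    issues = []
    if mandatory and not values:
        issues.append(f"{label}: mandatory entity was not found")
    if max_unique is not None and len(set(values)) > max_unique:
        issues.append(f"{label}: maximum number of unique occurrences exceeded")
    return issues


print(parse_date("31/12/2023"))   # a valid date in the assumed format
print(parse_date("tomorrow"))     # not a date, so parsing fails
print(restriction_issues("total", ["10", "12"], max_unique=1))
```

Note that repeated identical values count as one unique occurrence in this sketch, so the same total found twice would not raise an issue.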
The list shows the following data:
Name
The entity with its value in the document
Issue
An explanation of the issue for the entity
Threshold
The threshold set for that entity
Score
The confidence score with which the entity value was found by the entity extraction model. A score shown in red is lower than the threshold.
Actions
Click on a row - If the issue relates to an entity that was found, the entity will be shown in the document with its value and converted value.
All issues need to be solved before you can mark a document as Done.
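The relation between score, threshold and the Done action can be sketched as follows. This is a simplified illustration under stated assumptions: the `Prediction` class and both function names are hypothetical and do not reflect Metamaze's internal data model or API.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One predicted entity value (hypothetical structure for this sketch)."""
    label: str
    value: str
    score: float      # confidence score from the entity extraction model
    threshold: float  # threshold configured for this entity


def below_threshold(predictions):
    """Return the predictions that would be listed as low-confidence issues."""
    return [p for p in predictions if p.score < p.threshold]


def can_mark_done(open_issues):
    """A document can only be marked as Done once no issues remain open."""
    return len(open_issues) == 0


predictions = [
    Prediction("iban", "BE68 5390 0754 7034", score=0.62, threshold=0.80),
    Prediction("total", "100.00", score=0.95, threshold=0.80),
]
issues = below_threshold(predictions)
print([p.label for p in issues])  # labels that still need validation
print(can_mark_done(issues))
```

In this sketch, validating or rejecting the flagged annotation would remove it from the open issues, after which the document could be marked as Done.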
The issue detail in the document view allows you to edit the annotation, validate or reject it.
The following actions are possible:
Edit - Allows you to edit the annotation by dragging the start or end cursor
Validate - Approve entity value. The entity will be added to the annotations list.
Reject - Do not approve entity value, e.g. due to a misprediction of the model. The entity will be deleted.
The annotations list gives an overview of all entities with their values found in the document or manually added:
The name of the entity
The value of the entity
The parsed value of the entity
The user who added the annotation (AI if the entity was predicted)
The confidence score of the value of the entity if found by the model
The page on which the entity appears
You can click a row in this list to see the value in the document. Use the left and right arrow keys to step through the entities.
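As a mental model, each row in the annotations list can be thought of as a record with the fields listed above. The sketch below is an assumption made for illustration; the class and field names are hypothetical and are not taken from a Metamaze API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    """One row of the annotations list (hypothetical structure)."""
    entity: str                    # name of the entity
    value: str                     # value as it appears in the document
    parsed_value: Optional[str]    # converted value, None if parsing failed
    added_by: str                  # user name, or "AI" for predicted entities
    page: int                      # page on which the entity appears
    score: Optional[float] = None  # confidence, only set for model predictions

    @property
    def predicted(self) -> bool:
        """True when the annotation was found by the model."""
        return self.added_by == "AI"


rows = [
    Annotation("invoice_date", "31/12/2023", "2023-12-31", "AI", page=1, score=0.97),
    Annotation("total", "100,00", "100.00", "jane.doe", page=2),
]
print([row.entity for row in rows if row.predicted])
```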
You can add additional entity values in the document by clicking on the first and then last word, and selecting the correct label in the drop-down menu. More info on how to perform entity annotation can be found in Annotation of training data.
By selecting the checkbox next to the annotations, you can delete entities.
At the top you can find the "+ Manual annotation" button. Clicking it opens a pop-up window with input fields to enter entities manually. For example, if the OCR model could not convert part of the document correctly (e.g. due to a stamp on the page) and the value cannot be indicated on the document, you can manually add an entity value by selecting the entity label in the first input field, then entering the value and, optionally, the page number. You can enter multiple manual entity values using the “plus” button.
For composites and groups, you can add manual entities directly from the tooltip. Click a composite or group in the document, and a tooltip appears that lets you add a manual annotation. The composite or group you clicked is preselected.
You can move entities by dragging them in the annotations list.
The business rules panel gives an overview of all business rules and their validation result (red or green dot) for the selected document.
Expand a business rule to see in detail why it is failing.
Enrichments are only visible in human validation. They are used to add additional external information to either a specific entity or to the document as a whole. The enrichments section is only visible for document types that have enrichments configured (see Enrichments).
In case no enrichments were found, the section will be empty.
If a certain enrichment is required for processing, there will be an issue related to it in the Issues section.
The list shows the following data:
Name
Name of the enrichment as specified in the project settings
Enrichment Value
The value of the enrichment, or an error message in case the enrichment failed. Error messages are displayed in red.
Linked with
In case the enrichment is linked to the document as a whole, this column contains a document icon. If it is linked with a specific entity, it displays the label and the value of the relevant entity
Actions
Click on a row - Allows you to modify the enrichment
Selecting multiple rows - Allows you to delete or retry enrichments
These actions are only possible if the document has status "Input required"
To add an enrichment manually, click "Link enrichment". A pop-up window opens where you can select the name of the relevant enrichment and enter its value.
At the top you have a number of actions to end your evaluation:
Back - You can use this button to put the document back in the queue if, for example, you are not sure how to handle this document.
Reject - Use this if there is a problem with the document. The next document will be loaded automatically. Besides some standard errors like 'bad OCR' or 'irrelevant document', you can define your own errors in the project settings, see Custom errors.
Done - Mark the document as done when you have finished processing it.
Edit settings Edit the document settings, see
Park - Use this if you are not sure what to do with the document and want to check with a colleague or manager first. Parking lets you take the next document in the queue for now and come back later by filtering on 'parked by users'. Fill in a reason for parking: it reminds you why you parked this specific document and helps when you ask your colleagues or manager for feedback. The reason for parking is shown in a yellow bar at the top of the second panel.