📄Documents
When you use the toggle to switch between upload and documents view, you will get an overview of all the documents that are currently present in the production data.
This table simply gives you the list of documents and provides filter and search fields for searching for one or more documents.
You can quickly find specific documents by making use of the filtering functionality.
Filtering by "Entity Value" functions akin to a "starts with" criterion.
Example: If you filter on "Entity Value" with the term "ABC," the results will include documents where an entity value starts with "ABC," such as "ABC123" or "ABCxyz". Documents with entity values that contain "ABC" but don't start with it will be ignored, such as "123ABC" or "asdABC123".
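The "starts with" rule can be sketched in a few lines of Python. This only illustrates the matching rule described above and is not how Metamaze implements it. The function name `filter_by_entity_value` and the document structure (a dict with an `entity_values` list) are assumptions, and the sketch matches case-sensitively, which the text above does not specify.

```python
def filter_by_entity_value(documents, term):
    """Keep documents that have at least one entity value starting with `term`."""
    return [
        doc for doc in documents
        if any(value.startswith(term) for value in doc["entity_values"])
    ]


# Hypothetical documents, reusing the values from the example above.
documents = [
    {"name": "doc-1", "entity_values": ["ABC123"]},
    {"name": "doc-2", "entity_values": ["ABCxyz", "other"]},
    {"name": "doc-3", "entity_values": ["123ABC"]},
    {"name": "doc-4", "entity_values": ["asdABC123"]},
]

matches = filter_by_entity_value(documents, "ABC")
print([doc["name"] for doc in matches])  # ['doc-1', 'doc-2']
```

Here "doc-3" and "doc-4" are left out because their values contain "ABC" but do not start with it.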
Selecting documents allows you to perform an action on multiple documents:
Send to training - Sends the selected documents that have been processed to the training data so that your models can be enriched with production data. Uploads that required input from the user are sent to the training data automatically. Note that the annotations on these documents still have to be reviewed, and a new training has to be triggered, before the model is updated!
Retry upload(s) - Retries the document classification or entity extraction for the selected documents.
When you click on a document row, you can view extra details such as when it was uploaded, which steps of the pipeline were performed and what the current status is of the selected document.
By clicking "View document", you can inspect the document more closely.
The 'metadata' shows all the properties of the document:
Field | Description |
---|---|
Status | The current status of the document: Processing, Input required, Processed or Failed |
Type | The document type |
Language | The language of the document which was set manually or predicted by the OCR step |
Pages | The number of pages in the document |
Name | The name of the document |
ID | The unique ID of the document, with a handy copy button |
Upload date | The date and time the document was uploaded to Metamaze |
Actions | |
The first panel shows the upload with one or more documents you need to handle. The document you opened from the Uploads or Documents view is selected automatically. You can select other documents by clicking on a row.
The following properties of the document list are shown:
Field | Description |
---|---|
Status | Icon indicating the document status: Processing, Input required, Processed or Failed |
Number | The position of the document in the upload |
Actions | |
At the top of the document processing screen you have a number of buttons that help you navigate between documents and uploads.
Back - When in human validation, this button puts the document back in the queue. When in training or production data modules, or in tasks, it will go back to the overview of uploads, documents or tasks.
Upload done - This button is only available in human validation. You use this button when you have successfully finished processing all documents from an upload. The upload will then go to the next step 'output' to send the result to your system.
Done - When you are done with your intervention, you can mark the document as done.
Park - This button is only available in human intervention and in the tasks module. Use it to park the document and handle it at a later time. A popup asks you to fill in a reason for parking. The reason reminds you why you parked this specific document and helps later when you ask your colleagues or manager for feedback about it.
Reject - Use this button if there is a problem with the document. Besides some standard errors like 'bad OCR' or 'irrelevant document', you can define your own errors in the project settings, see Custom errors.
This list shows each entity defined for this document together with the number of times this entity was found in the document.
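As a rough illustration of what this list contains, the sketch below counts occurrences per entity. The function name `entity_counts`, the entity names and the annotation structure are hypothetical, and the sketch assumes that entities defined for the document but not found in it are listed with a count of 0.

```python
from collections import Counter


def entity_counts(defined_entities, annotations):
    """Count how often each defined entity occurs among the annotations."""
    found = Counter(annotation["name"] for annotation in annotations)
    # A Counter returns 0 for names that never occurred.
    return {name: found[name] for name in defined_entities}


# Hypothetical entity definitions and annotations for one document.
defined_entities = ["invoice_number", "total_amount", "due_date"]
annotations = [
    {"name": "invoice_number", "value": "INV-001"},
    {"name": "total_amount", "value": "100.00"},
    {"name": "total_amount", "value": "121.00"},
]

counts = entity_counts(defined_entities, annotations)
print(counts)  # {'invoice_number': 1, 'total_amount': 2, 'due_date': 0}
```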
The list gives an overview of all entities with their values found in the document which have been predicted by a model or manually added by a person. An entity always has a colour but if the entity was found by the model it has a lighter transparent colour. If the entity was manually added by a person it has a darker colour.
The list shows the following data:
Field | Description |
---|---|
Name | The name of the entity |
Value | The value of the entity |
Parsed | The parsed value of the entity |
User | The user who did the annotation. The value AI indicates that the annotation was predicted by the model. |
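Score | The confidence score of entities that were predicted by the model. A red colour indicates that the score is lower than the threshold. |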
Page | The page on which the entity appears |
Actions | |
You can click a row in this list to see the value in the document. You can add additional entity values in the document by clicking the first and the last word of an entity, or by selecting it with dragging, and then selecting the correct label in the dropdown menu that will open. More info on how to perform entity annotation can be found in Annotation of training data.
The table view shows the annotations in a table format, which is very handy for viewing composites. The icon in the top right corner lets you customize the table:
Which entities to show
Which enrichments to show
Which values to show for enrichments when dealing with entries
The order of entities and enrichments
With these customization options, you can set the order of the columns and show only the relevant ones, for example when you need to compare entity values with what the enrichments returned.
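The effect of these options can be pictured as a small column configuration. The sketch below is purely illustrative: the `Column` class, the `visible_columns` function and the column names are hypothetical and do not reflect how Metamaze stores table settings.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    kind: str  # "entity" or "enrichment"
    visible: bool = True


def visible_columns(columns, order):
    """Return the names of the visible columns in the order chosen by the user."""
    by_name = {column.name: column for column in columns}
    return [
        name for name in order
        if name in by_name and by_name[name].visible
    ]


# Hypothetical configuration: one entity is hidden, the enrichment is shown first.
columns = [
    Column("invoice_number", "entity"),
    Column("supplier", "entity", visible=False),
    Column("supplier_lookup", "enrichment"),
]
order = ["supplier_lookup", "invoice_number", "supplier"]

print(visible_columns(columns, order))  # ['supplier_lookup', 'invoice_number']
```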
Selecting 1 or more entities through the checkboxes allows you to perform the following actions:
Delete annotations - Removes the selected annotations
Enrich annotation - Can only be used on 1 annotation at a time and allows you to link an enrichment to the entity.
You can "Shift" click to select a range of annotations to make bulk selection easier.
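A range selection of this kind usually works as sketched below. This is an assumption about the behaviour and not taken from the Metamaze code: the function name `shift_click_selection` is hypothetical, rows are identified by their position in the list, and both the previously clicked row and the Shift-clicked row are assumed to be included.

```python
def shift_click_selection(last_clicked, shift_clicked):
    """Return the row positions covered by a click followed by a Shift-click."""
    start, end = sorted((last_clicked, shift_clicked))
    return list(range(start, end + 1))


# Clicking row 5 and then Shift-clicking row 2 selects rows 2 to 5.
selected = shift_click_selection(5, 2)
print(selected)  # [2, 3, 4, 5]
```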
The document preview allows you to see what the document looks like. It shows all the entities that were found. This view has 2 modes:
Hybrid - the original document is shown
Formatted - the text is displayed as the OCR model has recognised it
When certain words are not readable in one mode, you can switch to the other. Entities can be added in both modes.
It is also possible to enlarge or reduce the display or to open it on a second screen.
A document can have one of the following statuses:
Processing - Document is being processed
Input required - Document has been processed by Metamaze AI but requires human input to be completed
Processed - Document has been processed
Failed - Processing of the document failed
The score is the confidence score of entities that were predicted by the model. A red colour indicates that the score is lower than the threshold. If OCR scores are enabled, you can view the prediction score and the OCR score as separate values.
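The threshold rule amounts to a simple comparison. In the sketch below, the function name `score_colour`, the threshold value of 0.8, the 0 to 1 score scale and the "default" label for scores that are not flagged are all assumptions made for illustration.

```python
def score_colour(score, threshold):
    """Flag a score in red when it is lower than the threshold."""
    return "red" if score < threshold else "default"


# Hypothetical threshold on an assumed 0 to 1 scale.
threshold = 0.8

print(score_colour(0.65, threshold))  # red
print(score_colour(0.80, threshold))  # default
print(score_colour(0.97, threshold))  # default
```

A score equal to the threshold is not flagged here, because the text only mentions scores that are lower than the threshold.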