Entity extraction
Once a document has a type assigned, you can add annotations to train the entity extraction model.
Before you annotate, analyse the documents and establish clear, unambiguous labeling guidelines so that labeling in your project is correct and consistent right from the start. Because the quality of the annotations has a direct impact on model accuracy, we strongly recommend reading Guidelines to annotate correctly before you start annotating entities for your first project.
More information on how to add annotation guidelines to your project can be found in the Project guidelines.
To label documents for training the entity extraction model, create an entity annotation task (see Tasks). Once you start the task, you will arrive at the following screen:
For more information about the different sections, you can check out the Document overview page.
With handy automation buttons to:
annotate a table easily (which will result in composites or groups for each row of a table)
To switch between the modes, you need to select the appropriate one using the buttons at the top of the document preview.
How you label an entity depends on its entity type:
For a text, composite or paragraph entity, click on the first and the last word of the entity. Alternatively, activate the "Drag" option by clicking the corresponding button at the top right. With "Drag" active, press and hold the mouse button, drag until the desired text is selected, then release.
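The two-click selection described above can be pictured as choosing an inclusive range of words. The following is only an illustrative sketch, not the product's implementation: the names `EntitySpan` and `span_from_clicks`, the word-index representation, and the example entity type are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EntitySpan:
    """Hypothetical record for one labeled entity (not a product data model)."""
    entity_type: str
    start: int  # index of the first word, inclusive
    end: int    # index of the last word, inclusive
    text: str


def span_from_clicks(words: List[str], first_click: int, last_click: int,
                     entity_type: str) -> EntitySpan:
    """Build an entity span from the indices of the two clicked words."""
    # Assumption: the clicks are accepted in either order; the source only
    # describes clicking the first word and then the last word.
    start, end = sorted((first_click, last_click))
    if start < 0 or end >= len(words):
        raise IndexError("clicked word is outside the document")
    return EntitySpan(entity_type, start, end, " ".join(words[start:end + 1]))


# Example: clicking "2024" and then "0042" selects both words as one entity.
words = "Invoice number 2024 0042 issued by Acme Corp".split()
span = span_from_clicks(words, 2, 3, "invoice_number")
print(span.text)
```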
After selecting the text or image, a pop-up window will appear where you can select the correct entity type. Keep in mind that the list is filtered by the type you selected at the top.
You can delete an entity on the document by clicking the garbage bin icon that appears when you hover your mouse over the entity.
You can also click on the entity (in the annotations section or on the document preview) and click the garbage bin in the pop-up window.
If the tooltip is in the way of text, you can simply drag it away.
An entity always has a color. If the entity was found by the model, it has a lighter, transparent color; if it was manually annotated by a user, it has a darker color.
If a document is not readable or it is impossible to label entities, you can classify this document as failed.
Once you click reject, you need to specify the error that occurred for this document and a reason for rejecting it. The status of the document will change to "Failed", and the document will not be included in the training data when a training is triggered.
Annotating data in a table-like structure can be time-consuming, more so when there are a lot of rows to annotate. To speed up this process significantly, you can use the table annotation feature.
For more information and a step by step guide, check out the following page:
When all entities have been labeled, click "Done" to finalise the document. The status will change to "Processed" and the document will be included in the next training.
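The effect of "Done" and reject on training can be summarised in a small sketch. This is a hypothetical model for illustration only: the class and function names, the `PENDING` starting status, and the field names are assumptions. Only the "Processed" and "Failed" statuses and their effect on training come from the text above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class DocumentStatus(Enum):
    PENDING = "Pending"      # hypothetical status before annotation is finished
    PROCESSED = "Processed"  # set when the annotator clicks "Done"
    FAILED = "Failed"        # set when the document is rejected


@dataclass
class Document:
    name: str
    status: DocumentStatus = DocumentStatus.PENDING
    error: Optional[str] = None   # error specified when rejecting
    reason: Optional[str] = None  # reason given for rejecting


def mark_done(doc: Document) -> None:
    """Finalise a fully labeled document."""
    doc.status = DocumentStatus.PROCESSED


def reject(doc: Document, error: str, reason: str) -> None:
    """Classify an unreadable or unlabelable document as failed."""
    doc.status = DocumentStatus.FAILED
    doc.error = error
    doc.reason = reason


def training_set(docs: List[Document]) -> List[Document]:
    """Return only the documents that would be used in the next training."""
    return [d for d in docs if d.status is DocumentStatus.PROCESSED]


docs = [Document("a.pdf"), Document("b.pdf"), Document("c.pdf")]
mark_done(docs[0])
reject(docs[1], error="unreadable", reason="scan too dark")
print([d.name for d in training_set(docs)])
```

In this sketch, a rejected document and a document that has not been finalised are both left out of the training set.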
There are multiple types of entities that can be identified and annotated on a document (check out the section)
For an image (or object) such as handwritten text or signatures, draw a rectangle around what you want to label. You start by clicking the top left corner and dragging your mouse to the bottom right corner, without releasing it. Images can only be annotated in the hybrid view of the document (see for more details).
After an entity is labeled, it is added to the annotations section. When you click a row in this table, the preview scrolls to the location of the corresponding annotation on the document, and a pop-up window with details about the entity opens.