Metamaze
Project Settings
Once you have selected a project, use the 'Project settings' menu item to manage the settings for that project.
This module consists of a number of panes. From top left to bottom right, these are:
    Summary
    I/O settings
    Workflow
    Users
    Document types
    Entities
    Business rules
    Label patterns

SUMMARY

In the summary tab you can find the document flow overview.

INPUT AND OUTPUT SETTINGS

Choose the security settings for connecting to the Metamaze services to upload documents and to send the output to your backend systems.
To upload files, you can connect to the Metamaze API via token authentication or mutual SSL authentication. For technical implementation details, see the Metamaze API documentation.
To send the output to your systems, you can choose between bearer authentication, basic authentication with email and password, or mutual SSL authentication. You also need to enter the domain name of your service.
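For orientation, bearer and basic authentication both travel in the standard HTTP `Authorization` header. The sketch below shows what those header values look like, so you know what your receiving service should expect to validate. It assumes Metamaze sends the standard header forms; the function names are illustrative only and are not part of the Metamaze API.

```python
import base64


def bearer_header(token: str) -> dict[str, str]:
    """Standard bearer authentication header (hypothetical helper name)."""
    return {"Authorization": f"Bearer {token}"}


def basic_header(email: str, password: str) -> dict[str, str]:
    """Standard basic authentication header: base64 of 'email:password'."""
    credentials = f"{email}:{password}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}
```

Mutual SSL is configured at the connection level rather than in a header, so it is not shown here.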

WORKFLOW

The workflow pane contains the document flow settings.
These settings deal with manual intervention and document labeling for training your models. To change them, click on "setup workflow".
There are four settings:
1. Manual Intervention Behaviour - These settings let you make manual intervention mandatory for document classification and/or entity extraction. There can be two reasons for this:
a. Quality control - You always want a human validation check at a certain step in the processing pipeline.
b. No initial training data for your models - If you do not have historical documents to train your document classification and entity extraction models, you can start from the manual intervention module. Your colleagues process the documents through the manual intervention module; meanwhile the documents are labeled and the training data grows, so that you can scale towards automation over time.
2. Document creation thresholds - You can set a general threshold for all document types (a score from 0 to 100) or an individual threshold for each document type. If the document classification model classifies a document as a certain document type with a certainty score lower than this threshold, the document is placed in the manual intervention module for a human validation check.
3. Entity extraction thresholds - You can set a general threshold for all entities (a score from 0 to 100) or an individual threshold for each entity type. If the entity extraction model recognizes one or more words in the document as a certain entity type with a certainty score lower than the threshold, the document ends up in the manual intervention module for a human validation check.
4. Custom errors - Custom errors can be used in the manual intervention module to flag specific problems for which you want to perform certain actions. A custom error consists of a code and a message. You can choose between document classification and entity extraction errors. When an operator flags such a problem in manual intervention, the code and message are sent to your service.

USERS

The user management module allows you to add or remove users from a project.
More information about the different user roles can be found in the chapter ‘General settings – User Management’.

DOCUMENT TYPES

Create a document type for each type you aim to recognise in this project.
This section provides an overview of all your document types and allows you to create new document types or add existing ones.
The table has a number of columns:
    The check box in the first column filters the following project setup modules so that you only see the entities and business rules that belong to this document type. You can check more than one checkbox.
    The model type column shows which model was chosen in the create or edit document type screen: text, or layout and text.
    The number of entities and business rules of each document type. Via the plus button you can add an entity or business rule to that specific document type.
    You can change a document type by clicking on the pencil icon. This opens a modal in which you can set a regular expression (keywords which, when matched, change the document type) or delete the document type from the project.
    You can update the settings of a document type by clicking on the wheel icon.
You can create a new document type or use an existing one by clicking on the plus button.

ENTITIES

Entities are the words you want to extract from documents, such as employee name, street, house number, postal code, municipality, net wage, ...
The settings provide an overview of the entities per document type.

With an existing document type

If you have added an existing document type, the shared entities are loaded automatically. You can choose whether or not to use them with the toggle at the right-hand side.
Click on the pencil icon to change the settings of an entity.
Part of the settings are managed by the document owner. When your user is part of this organization, you can edit them in the document type settings. Clicking the gear icon next to "settings managed by" takes you to this page, where you can edit an entity by clicking on the pencil icon next to it.
If your user is not part of the organization mentioned, you will need to contact this organization to discuss changes at this level.

With a new document type

When you have created a new document type, the entities list will initially be empty. You can start creating new entities by clicking on the plus button.
After filling in the entity name and selecting the correct document type, one of the following entity classes can be chosen:
    1. Text - A text entity is an entity that has a value in a textual form. When labelling documents, you will be able to select one or more words to indicate a value for this entity.
    2. Image - An image entity is for recognizing objects such as handwritten text, signatures, ... Labeling an object in a document is done by drawing a rectangle around the object.
    3. Composite - A composite entity is a group of other entities, e.g. an order line consisting of different entities such as the product name, product number, quantity, price per unit, .... When creating a composite entity you can select the entities that belong to it in the next step.
If you choose a text entity, you can also set an entity type.
This type is used for validating the value and converting it to a certain format. For example, if you choose the type 'date', Metamaze validates the value found by the model for this entity and converts it to the format you define yourself. If Metamaze detects a value for this entity that is not a date (conversion and validation failed), the document is put into the manual intervention module for checking (if this step is enabled).
There are different types for text entities:
    Regular - This is a text type entity. There is no validation or conversion to a particular format.
    Number - This is a numerical entity. Choose the desired input format for decimals and thousands.
    Currency - This is an entity of the currency type.
    Date - This is an entity of the date type. Choose the desired date output format. For a complete list of supported format strings, see this link.
    Search - This is a special entity that is found by searching the document for a match with a regular expression you define. No AI model is used for this.
After choosing the appropriate entity type, you can optionally indicate which composite entity it belongs to under "part of composite".
Next it is possible to indicate the following:
    Remove punctuation - This setting removes punctuation from the value. License plates, for instance, typically contain punctuation, as in 1-ABC-123.
    Remove spaces - This setting removes redundant spaces when they are found.
    Required - This setting determines whether the entity must be identified. If a required entity is not found by the entity extraction model, the document ends up in the manual intervention module.
    Manual Input validation regex - If you want to validate the value of an entity against a certain format, you can define a regular expression. If the value does not match, the document ends up in manual intervention. Note that this regex is not case sensitive.
    Max unique occurrences - Here you can indicate how many times this entity may occur, for instance only once in a document.
    Color - The color in which the entity is marked in a document. Click on the square to change the color.
    Override threshold - Sets an individual threshold for this entity.
An entity will be marked in the chosen color if the entity was tagged in the training/labeling module or the manual intervention module by a user. If the entity value was recognized by the entity extraction AI model, the same color will be displayed in a more transparent styling.
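The clean-up and validation options above can be sketched as follows. This is an illustration under stated assumptions, not Metamaze's implementation: "redundant spaces" is interpreted here as collapsing repeated whitespace, the regex is assumed to be matched against the whole value, and the function names are hypothetical.

```python
import re
import string


def normalise(value: str, remove_punctuation: bool = False,
              remove_spaces: bool = False) -> str:
    """Apply the optional clean-up steps to an extracted value."""
    if remove_punctuation:
        # Drop ASCII punctuation such as the dashes in a license plate.
        value = value.translate(str.maketrans("", "", string.punctuation))
    if remove_spaces:
        # Assumption: collapse runs of whitespace and trim the ends.
        value = " ".join(value.split())
    return value


def is_valid(value: str, pattern: str) -> bool:
    """Case-insensitive validation of the whole value against a regex."""
    return re.fullmatch(pattern, value, flags=re.IGNORECASE) is not None
```

For example, `normalise("1-ABC-123", remove_punctuation=True)` yields `"1ABC123"`, which a pattern such as `\d[A-Z]{3}\d{3}` would then accept regardless of letter case.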

ENRICHMENTS

What are enrichments?

Data enrichments allow you to embed custom code, custom logic and additional data sources into your processing pipeline by integrating an API call to an external system. Instead of applying the custom logic after Metamaze extraction, enrichments let you apply human validation to the enriched values where necessary, so your operators can stay in the same view.
Examples of when it makes sense to use an enrichment:
    For external data lookup
      Lookup the BIC code for a given IBAN number
      Lookup of product category or accounting code based on the extracted product description
      Lookup of contract type based on contract number
    For intelligent decision making
      Interpreting extracted text and classifying it
      After extracting the relevant clause, deciding whether a marriage contract is of type "separate" or "joint"
      After extracting a formula for interest rates, deciding whether it is of type "fixed", "mixed" or "floating"
    For data validation. (Note that for simple logic a Business Rule can be used too.)
      Validate IBAN number
      Validate VAT number
    For embedding custom machine learning models
      Sentiment analysis
      Existing or 3rd party extraction models
    For custom parsing or standardisation
      Geocoding of addresses
      Standardising non-standard entities like product codes.

Developing your own enrichment

For the API requirements, we refer to the API documentation.
For a full code example of an enrichment, please see our GitHub page.

Configuring enrichments

There are two types of enrichments:
    1. Document enrichments add information to the whole document, without the need to link it to a specific entity.
    2. Entity enrichments can be linked to a specific entity value (simple or composite).
Enrichments can fail for the following reasons:
    Response HTTP code is anything other than 200
    Data format is incorrect
    Enrichment name is incorrect or is not exactly the same as in the project settings
    Enrichment value has an incorrect type
    No response is received within the timeout (in seconds)
When an enrichment fails, the following situations can occur:
    1. If the enrichment is required but missing from the output, the document goes to human validation.
    2. If the enrichment fails, it is either skipped or sent to human validation, as configured.
If a document is being handled in human validation, the enrichment is called every time an entity is added or removed. An enrichment whose value was overwritten manually is no longer replaced.

BUSINESS RULES

Business rules can be used to verify custom business logic, such as:
    setting a certain static comparison for the value for an entity (e.g. require validation for invoice amount > 1000 euro)
    comparing expected metadata with extracted data (e.g. a bill of lading number vs the reference number on file)
    implementing a check on the number of pages of a document
    custom logic via an API call
In this panel you can create a business rule or edit existing ones.
The list shows all business rules, grouped by document type. The pencil icon allows you to modify a rule, and the trash icon deletes it. You can also view a business rule via the plus icon.
You can create a new business rule by clicking on the plus button.
Give the business rule a name and choose the document type to which it applies.

Soft vs hard business rules

When creating or editing a rule, you can also define it as either a soft or a hard business rule. If you choose 'hard', documents are automatically sent to human validation when this business rule does not succeed.

Business rule operands

Metamaze supports setting conditions that can be combined via boolean operators such as AND and OR. Using these conditions you can compare different elements (operands) with each other. The following operands are available:
    The value of an entity extracted from a document, e.g. the net salary from the pay slip.
    How many times the entity is present in the document, e.g. two signatures must be present.
    The number of pages of a document.
    Information from metadata from your own external data sources that can be sent along with the document. For example, when a client uploads a loan application document and also enters information such as net wages in a web form, this information can be sent along and compared with the net wages on the wage sheet.
    A fixed value of your own choosing (static value).
    A regular expression for comparing text.
After choosing your first operand, you choose an operator.
You can add and combine multiple conditions using OR and AND operators.
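As a mental model, a business rule can be seen as a set of operand/operator/operand conditions joined by AND or OR. The sketch below illustrates this idea only; the class, the helper names and the set of operators are hypothetical and do not reflect the operators offered in the Metamaze interface.

```python
import operator
from dataclasses import dataclass
from typing import Any

# Hypothetical operator set, chosen for illustration.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass
class Condition:
    left: Any   # e.g. an extracted entity value or an occurrence count
    op: str     # one of the keys in OPERATORS
    right: Any  # e.g. a metadata value or a static value

    def holds(self) -> bool:
        return OPERATORS[self.op](self.left, self.right)


def all_of(*conditions: Condition) -> bool:
    """Combine conditions with AND."""
    return all(c.holds() for c in conditions)


def any_of(*conditions: Condition) -> bool:
    """Combine conditions with OR."""
    return any(c.holds() for c in conditions)
```

For instance, a rule could require that the extracted net wage equals the net wage sent as metadata AND that at least two signatures are present.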

API business rules

Sometimes you need additional external validation in your business rules that requires advanced custom logic, external data or custom code. To do so, choose API as the operand of your business rule and set the endpoint that accepts incoming POST or GET requests.
For every action that is taken on a document, Metamaze sends the appropriate request to this endpoint with the following JSON body:
{
  "document": {
    "name": str,
    "language": str, // culture like nl-nl
    "matchedEntityValues": [
      {
        // For TEXT entities
        "entityId": objectid,
        "entityName": str,
        "text": str,
        "isApproved": bool,
        "isManuallyAdded": bool
      },
      {
        // For IMAGE entities
        "entityId": objectid,
        "entityName": str,
        "coords": {
          "x0": float,
          "x1": float,
          "y0": float,
          "y1": float
        },
        "isApproved": bool,
        "isManuallyAdded": bool
      }
    ],
    "metadata": {} // if metadata is added to API input
  }
}
Your API should return a JSON body in the format
{
  "result": ...
}
The type of the result will be compared with the value of the other operand in the business rule on a best effort basis.
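To make the request and response shapes concrete, here is a minimal sketch of the logic an API business rule endpoint might run. It is a hypothetical example, not a reference implementation: the entity name `reference_number` and the metadata key of the same name are invented for illustration, and because the exact position of `metadata` in the body is not certain from the schema above, the sketch checks both the top level and the `document` object.

```python
import json


def evaluate(body: dict) -> dict:
    """Check whether the reference number sent as metadata was extracted.

    Returns a dict in the {"result": ...} shape expected by Metamaze.
    """
    document = body.get("document", {})
    # Assumption: metadata may sit inside "document" or at the top level.
    metadata = document.get("metadata") or body.get("metadata") or {}
    extracted = [
        value["text"]
        for value in document.get("matchedEntityValues", [])
        # Only TEXT entities carry a "text" field; IMAGE entities have "coords".
        if value.get("entityName") == "reference_number" and "text" in value
    ]
    expected = metadata.get("reference_number")
    return {"result": expected is not None and expected in extracted}


# Example request body, reduced to the fields this check uses.
example_body = {
    "document": {
        "name": "bill_of_lading.pdf",
        "language": "nl-nl",
        "matchedEntityValues": [
            {"entityName": "reference_number", "text": "BL-2024-001",
             "isApproved": True, "isManuallyAdded": False}
        ],
        "metadata": {"reference_number": "BL-2024-001"},
    }
}
response_json = json.dumps(evaluate(example_body))
```

In a real service, this function would be wrapped in whatever web framework you use, with the returned dictionary serialized as the JSON response body.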
If your API requires authentication, this can be arranged by opening a support ticket with Metamaze support.