Enrichments

What are enrichments?

Enrichments let you embed custom code, custom logic, and external data sources into your processing pipeline through an API call. Instead of running custom logic after Metamaze extraction, you run it as an enrichment, so its results can be validated by humans inside Metamaze. That way, users only have to validate in one application instead of several. Enrichments can also be self-learning by using the callback.

Examples of when it makes sense to use an enrichment

  • For external (fuzzy) data lookup (Python example and TypeScript example)

    • Lookup the BIC code for a given IBAN number

    • Lookup of product category or accounting code based on the extracted product description

    • Lookup of contract type based on contract number

    • Try to match a supplier first by VAT number, then by supplier name if there is no match, then by address, and so on. Return multiple matches and let the user choose from them, or search in a table if needed.

  • For intelligent decision-making (Python example)

    • Interpreting extracted snippets of text and classifying them, for example for payment terms or Incoterms

    • Standardizing free-form descriptions (like inspection tasks) into a pre-defined, standardized list, taxonomy or hierarchy.

    • Interpreting paragraphs of text, for example, legal clauses to decide if a marriage contract is of type "separate" or "joint".

  • For applying business logic (TypeScript example)

    • For a purchase order, use business logic to decide whether the order can be fulfilled automatically. This logic can check whether the stock level is sufficient, whether the lead time is sufficient, whether incoming deliveries are expected, ... and return different messages to the customer: "Order accepted", "Order accepted but next time take into account lead time", "Can not fulfil order", ...

  • For data validation. (Note that for simple logic a Business Rule can be used too.)

    • Validate IBAN number

    • Validate VAT number

  • For embedding custom machine learning models

    • Sentiment analysis

    • Existing or 3rd party extraction models

  • For custom parsing or standardisation

    • Geocoding of addresses

    • Standardising non-standard entities like product codes.
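As an illustration of the data validation case listed above, the check at the heart of an IBAN validation enrichment could look like the sketch below. It applies the standard mod-97 check digit test only. The function name `is_valid_iban` is hypothetical, country-specific lengths are not checked, and the code that wraps the result into an enrichment response is left out.

```python
def is_valid_iban(iban: str) -> bool:
    """Return True if `iban` passes the generic mod-97 check digit test."""
    # Drop spaces and normalise case so "gb82 west ..." is accepted.
    compact = iban.replace(" ", "").upper()
    # IBANs are 15 to 34 ASCII letters and digits; per-country lengths are not checked.
    if not (15 <= len(compact) <= 34):
        return False
    if not (compact.isascii() and compact.isalnum()):
        return False
    # Move the country code and the two check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # Replace letters with numbers (A=10, B=11, ..., Z=35); digits stay as they are.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    # A valid IBAN leaves a remainder of 1 when divided by 97.
    return int(digits) % 97 == 1
```

An enrichment built around this could return the normalised IBAN as its value when the check passes and flag the document for human validation when it does not.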

Developing enrichments

For the API requirements, we refer to the API documentation.

For a full code example of an enrichment, please see our GitHub page.

Standard enrichment: Invoice recognition

Deciding whether a document is an invoice or a credit note can now be easily configured through a standard enrichment. You can configure how you want this recognition to be done.

General settings

The general settings section allows you to configure various settings for your enrichment. These settings include:

Enabled - This setting lets you enable or disable the enrichment

Required - You can set whether the enrichment is required or not.

Name - You can set a name for your enrichment to identify it easily.

Human Validation - This setting lets you enable or disable human validation for enrichments:

  • Skip - Will skip validation for this enrichment completely (be aware that this means failed enrichments will not cause uploads to go into validation)

  • Human validation - When an enrichment raises an error the document will be set to "Input required" and validation has to be done

Document Types - You can select the document types for which you want the enrichment to work.

Recognition rules

The recognition section allows you to configure the way an invoice should be recognized.

Regex suggestion - an easy to copy regex for your regex entity

Result when recognized as credit - the value the enrichment should return when recognized as a credit

Result when recognized as debit - the value the enrichment should return when recognized as a debit

Recognition rules that can be added:

Find text

This rule will try and find a piece of text (based on an entity) to recognize if the document is a credit note.

  • Find based on text in an annotation - the entity that should be used

  • Located on page - the page number that the entity should be present in

  • Position within a percentage of the page (dark area will be ignored) - the part of the page to search, measured from the top: a value of 50 searches the top half of the page, a value of 100 searches the full page

Negative amount

This rule will try and find a negative amount for the entity to recognize if the document is a credit note.

  • Recognize based on negative amount in an annotation - the entity that should be used to find a negative value

Standard enrichment: OpenAI GPTx

Leveraging the power of OpenAI GPTx to ask your documents questions can now be as easy as configuring some simple fields for an enrichment. You can choose between 2 options:

  • entire document text (for when you want to summarize text, ask questions about the whole document, ...)

  • entity text (for when you want to ask questions or give instructions on a specific entity)

General settings

The general settings section allows you to configure various settings for your enrichment. These settings include:

Enabled - This setting lets you enable or disable the enrichment

Required - You can set whether the enrichment is required or not.

Name - You can set a name for your enrichment to identify it easily.

Human Validation - This setting lets you enable or disable human validation for enrichments:

  • Skip - Will skip validation for this enrichment completely (be aware that this means failed enrichments will not cause uploads to go into validation)

  • Human validation - When an enrichment raises an error the document will be set to "Input required" and validation has to be done

Document Types - You can select the document types for which you want the enrichment to work.

Instructions (entire document text)

The instructions section allows you to give instructions to GPT on what you want it to do with the document text.

OpenAI api key - API Key (see Prerequisites: OpenAI credentials)

OpenAI api organisation id - Organisation ID (see Prerequisites: OpenAI credentials)

Prompt - The prompt you want to send to GPT. You can be very creative here!

Limit possible answers to the following list - Adding answers forces GPT to respond only with values from the list, e.g. for yes/no questions.

Instructions (entity text)

The instructions section allows you to give instructions to GPT on what you want it to do with the text of an entity.

OpenAI api key - API Key (see Prerequisites: OpenAI credentials)

OpenAI api organisation id - Organisation ID (see Prerequisites: OpenAI credentials)

Prompt - The prompt you want to send to GPT. You can be very creative here!

Limit possible answers to the following list - Adding answers forces GPT to respond only with values from the list, e.g. for yes/no questions.

Configuring custom enrichments

Enrichments can fail for the following reasons

  • Response HTTP code is anything other than 200

  • The data format is incorrect

  • Enrichment name is incorrect or is not exactly the same as in the project settings

  • The enrichment value has an incorrect type

  • No response is received within the configured timeout (in seconds).

When an enrichment fails, the following situations can occur

  1. If the enrichment is required but missing from the output, the document will go to human validation.

  2. If the enrichment fails, it is either skipped or goes to human validation, as configured.

If a document is being handled in human validation, the enrichment is called every time an entity is added or removed. An enrichment whose value was overwritten manually is no longer replaced by these calls.
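When developing an enrichment, it can help to run the same kind of checks on your own responses before pointing Metamaze at the endpoint. The sketch below mirrors the failure reasons listed above, except the timeout. It is not Metamaze's implementation: the function name `find_failure_reason`, the `"string"` / `"entries"` labels, and the exact rules are assumptions made for illustration.

```python
from typing import Any, Optional


def find_failure_reason(
    status_code: int,
    body: Any,
    expected_name: str,
    value_type: str,  # "string" or "entries" (hypothetical labels)
) -> Optional[str]:
    """Return a readable failure reason, or None if the response looks valid."""
    if status_code != 200:
        return f"unexpected HTTP status {status_code}"
    if not isinstance(body, dict) or not isinstance(body.get("enrichments"), list):
        return "data format is incorrect"
    # String enrichments carry a str value, Entries enrichments a list of rows.
    expected_python_type = str if value_type == "string" else list
    for item in body["enrichments"]:
        if not isinstance(item, dict) or "name" not in item or "value" not in item:
            return "data format is incorrect"
        if item["name"] != expected_name:
            return "enrichment name does not match the project settings"
        if not isinstance(item["value"], expected_python_type):
            return "enrichment value has an incorrect type"
    return None
```

Calling it with the status code and parsed JSON body of a test request gives a quick indication of which of the listed reasons would apply.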

General settings

The general settings section allows you to configure various settings for your enrichment. These settings include:

Enabled - This setting lets you enable or disable the enrichment

Required - You can set whether the enrichment is required or not.

Name - You can set a name for your enrichment to identify it easily.

Allow empty values - When enabled, validators are allowed to submit empty values for this enrichment

Type - There are different types of enrichments available, including:

  • Enrichment - This type of enrichment can be linked to another enrichment, allowing you to chain multiple enrichments and process them sequentially.

  • Entity - This type of enrichment can be linked to a specific entity value (simple or composite).

  • Document - This type of enrichment lets you add information to the whole document, without the need to link it to a specific entity.

Human Validation - This setting lets you enable or disable human validation for enrichments:

  • Skip - Will skip validation for this enrichment completely (be aware that this means failed enrichments will not cause uploads to go into validation)

  • Human validation - When an enrichment raises an error the document will be set to "Input required" and validation has to be done

Version - Depending on the API contract you use, you should select the appropriate version here. We recommend using the most recent version at all times, as it provides the most features.

Document Types - You can select the document types for which you want the enrichment to work.

Metadata - This setting lets you add fixed metadata to be sent with the enrichment.

Triggers

In this section, you can choose when to trigger enrichments. Enrichments can be triggered based on the following:

  • After another enrichment

  • After entity extraction is completed

  • After labeling an entity

  • After validation (updating or creating a linked enrichment)

  • After document classification

Depending on the trigger you choose, you'll need to set an additional option. For 'After another enrichment', you'll need to select an enrichment. For 'After entity extraction is completed' and 'After labeling an entity', you'll need to choose a document type.

Value type

Enrichments can return either a String or Entries. Strings are simple to set up. If you need multiple columns or fields, choose Entries.

Value type: returning Strings

For strings, a text/select box is used to choose values in the validation interface. The enrichment API expects a response like

{
    "enrichments": [
        {
            "name": ..., 
            "link": ...,
            "value": "MY_OPTION"
        }
    ]
}

If you want to display a multi-select component for the user, you can use the options API call which expects a response like

[
    {
        "value": "MY_OPTION", 
        "label": "User-friendly label for option 1"
    },
    {
        "value": "second_option", 
        "label": "Different user-friendly label for option 2"
    }
    ... // more options
]
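A minimal sketch of producing that options list in Python, assuming the options live in a simple value-to-label mapping. The helper name `build_string_options` is hypothetical, and the values and labels are the ones from the sample above.

```python
import json
from typing import Dict, List


def build_string_options(labels_by_value: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn a {value: label} mapping into a list of {"value", "label"} objects."""
    return [
        {"value": value, "label": label}
        for value, label in labels_by_value.items()
    ]


options = build_string_options(
    {
        "MY_OPTION": "User-friendly label for option 1",
        "second_option": "Different user-friendly label for option 2",
    }
)
# Serialise the list as the JSON body of the options endpoint.
print(json.dumps(options, indent=4))
```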

If you want the enrichment to show an exception, an exception field can be added to the response

{
    "enrichments": [
        {
            "name": ..., 
            "link": ...,
            "value": "MY_OPTION",
            "exception": "MY_EXCEPTION"
        }
    ]
}

Adding this field will highlight the exception in red in the UI. The value passed in the response will then be filled in as a placeholder in the text box of the validation interface.
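The two String responses above differ only in the exception field, so one helper can build both. This is a sketch, not part of any Metamaze SDK: `build_string_enrichment` is a hypothetical name, and `link` is passed through untouched because its contents are defined by the API documentation rather than by this page.

```python
from typing import Any, Dict, Optional


def build_string_enrichment(
    name: str,
    link: Any,
    value: str,
    exception: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a response body holding one String enrichment."""
    enrichment: Dict[str, Any] = {"name": name, "link": link, "value": value}
    if exception is not None:
        # Only add the field when there is something to flag in the UI.
        enrichment["exception"] = exception
    return {"enrichments": [enrichment]}


# "payment_terms" and "LINK_PLACEHOLDER" are made-up example values.
plain = build_string_enrichment("payment_terms", "LINK_PLACEHOLDER", "MY_OPTION")
flagged = build_string_enrichment(
    "payment_terms", "LINK_PLACEHOLDER", "MY_OPTION", exception="MY_EXCEPTION"
)
```

Leaving the exception field out entirely when there is nothing to report keeps the normal response identical to the first sample.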

Value type: returning Entries

If you want to return database records or objects containing multiple fields, you can select the value type Entries.

In the enrichment API response, you need to return a list of dictionaries as the value, each including the mandatory id key. For example:

{
    "enrichments": [
        {
            "name": ..., 
            "link": ...,
            "value": [
                {
                    "id": "123",
                    "column1": "foo",
                    "column2": "bar"
                }
            ]
        }
    ]
}

In validation, a table will be shown where each row represents an entry in the value response list. To configure this table, you must specify the columns.

The keys of the dictionaries contained in the value should be the same as the configured column names. For example, if you have a column called company_name, you should have a key called company_name in the value of the enrichment in the API response too. Additionally, every entry must have an id field, which is displayed in the linked enrichments section during human validation.
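Because a mismatch between row keys and configured columns is easy to introduce, it can be worth checking rows before returning them. The sketch below is one strict interpretation of the rules above: every row needs an id, and its other keys must match the configured columns exactly. Whether Metamaze tolerates missing or extra keys is not stated here, so treat that strictness, and the name `build_entries_value`, as assumptions.

```python
from typing import Any, Dict, List, Sequence


def build_entries_value(
    rows: Sequence[Dict[str, Any]],
    columns: Sequence[str],
) -> List[Dict[str, Any]]:
    """Check each row against the configured columns and return the entries list."""
    expected_keys = set(columns) | {"id"}
    entries = []
    for index, row in enumerate(rows):
        if "id" not in row:
            raise ValueError(f"row {index} has no 'id' key")
        missing = set(columns) - set(row)
        unexpected = set(row) - expected_keys
        if missing or unexpected:
            raise ValueError(
                f"row {index} does not match the columns: "
                f"missing={sorted(missing)}, unexpected={sorted(unexpected)}"
            )
        # Copy the row so later changes to the input do not leak into the response.
        entries.append(dict(row))
    return entries
```

The returned list can be used directly as the value of the enrichment in the response.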

If you want the user to be able to search through all records in the All tab, you can configure the options API call. The options must follow the same structure and have the same keys as defined in the columns in the Value type section.

Example of options API call response:

[
    {
        "id": "row_id_1",
        "column1": "foo",
        "column2": "bar"
    },  
    {
        "id": "row_id_2",
        "column1": "fizz",
        "column2": "buzz"
    },  
    ... // more options
]

Similarly to the String value type enrichment, adding an exception field to the response will highlight the exception in the validation interface with the returned value filled in as a placeholder.

For a full example including code, please see Tutorial for creating a new enrichment.

Webhook

To properly calculate linked enrichments, the webhook must adhere to the API contract outlined in the documentation. This webhook specifies the exact logic that will be executed. The following fields can be configured:

  • URL - the URL that will be triggered

  • Timeout - The time in seconds after which a timeout is forced.

  • HTTP method - The method that will be used to call the URL

  • Authentication type

    • Bearer - A token will be needed

    • Basic authentication - A username and password will be needed

    • None
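On the receiving side, your endpoint will typically want to verify the credentials configured here. The sketch below assumes that the Bearer and Basic options translate to the standard HTTP Authorization header, which this page does not spell out. It builds the header value your endpoint would then expect to see. The function name and the lowercase type labels are hypothetical.

```python
import base64
from typing import Dict, Optional


def build_auth_headers(
    auth_type: str,
    token: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> Dict[str, str]:
    """Return the Authorization header for "bearer", "basic" or "none"."""
    if auth_type == "none":
        return {}
    if auth_type == "bearer":
        if not token:
            raise ValueError("bearer authentication needs a token")
        return {"Authorization": f"Bearer {token}"}
    if auth_type == "basic":
        if username is None or password is None:
            raise ValueError("basic authentication needs a username and password")
        # Basic authentication is base64("username:password").
        raw = f"{username}:{password}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    raise ValueError(f"unknown authentication type: {auth_type}")
```

Comparing the incoming header against this expected value is enough for a test setup. A production endpoint would normally use a constant-time comparison such as `hmac.compare_digest`.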

Options

When validating data manually, users can pick from predefined choices. If the value type is String, a select box is displayed with all available options. If the value type is Entries, a table with all possible options is shown, using the columns that you have defined.

  • URL - the URL that will be triggered

  • Timeout - The time in seconds after which a timeout is forced.

  • HTTP method - The method that will be used to call the URL

  • Authentication type

    • Bearer - A token will be needed

    • Basic authentication - A username and password will be needed

    • None

Callback

When you create, delete, or modify linked enrichments, you have the option to execute a callback to receive notifications on certain changes. This feature can be particularly useful for enrichments that require user input data (self-learning enrichments).

  • URL - the URL that will be triggered

  • Timeout - The time in seconds, after which a timeout error is raised

  • HTTP method - The method that will be used to call the URL

  • Authentication type

    • Bearer - A token will be needed

    • Basic authentication - A username and password will be needed

    • None
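One way to picture a self-learning enrichment is a store that the callback endpoint writes to and the webhook endpoint reads from. The sketch below keeps it in memory and keys it by normalised input text. The class `LearnedLookup` and its methods are hypothetical, and so is the mapping from callback events to method calls, since the callback payload is defined in the API documentation rather than here.

```python
from typing import Dict, Optional


class LearnedLookup:
    """In-memory store of user corrections, keyed by normalised input text."""

    def __init__(self) -> None:
        self._corrections: Dict[str, str] = {}

    @staticmethod
    def _normalise(text: str) -> str:
        # Lowercase and collapse whitespace so small variations still match.
        return " ".join(text.lower().split())

    def record(self, input_text: str, validated_value: str) -> None:
        """Store a value that a user created or modified during validation."""
        self._corrections[self._normalise(input_text)] = validated_value

    def forget(self, input_text: str) -> None:
        """Drop a stored value, for example when a linked enrichment is deleted."""
        self._corrections.pop(self._normalise(input_text), None)

    def lookup(self, input_text: str) -> Optional[str]:
        """Return a previously validated value, or None if nothing was learned."""
        return self._corrections.get(self._normalise(input_text))


store = LearnedLookup()
store.record("Net  30 days", "NET30")
learned = store.lookup("net 30 DAYS")
```

The webhook handler would call `lookup` first and fall back to its regular logic when it returns None. A real deployment would replace the dictionary with a database.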

Testing enrichments as a developer

During the development of an enrichment, you want to be able to test your enrichment with some example data.

This can be easily achieved with the "Playground" section. You need to:

  1. select an enrichment

  2. fill in a document id (from training or production)

When you press the "Generate" button, an example request body based on the document will be generated and stored in the clipboard. You can now paste this request body into a tool like Postman for example.
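If you prefer a script over Postman, the copied request body can also be sent from Python's standard library. This is a sketch under assumptions: the endpoint URL is a made-up local address, the HTTP method is assumed to be POST, and the body is whatever the "Generate" button produced.

```python
import json
import urllib.request


def build_test_request(url: str, request_body: str) -> urllib.request.Request:
    """Wrap the pasted Playground request body in a JSON POST request."""
    # Parsing first catches a truncated or mis-pasted body before anything is sent.
    payload = json.loads(request_body)
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout_seconds: float = 10.0) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder body and URL; paste the generated request body here instead.
request = build_test_request("http://localhost:8000/enrich", '{"example": true}')
```

Calling `send(request)` against a locally running enrichment then returns the parsed response, which you can compare with the response formats described above.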
