Tutorial for creating a new enrichment

Supplier look-up in Python

In this tutorial, we'll define, step by step, a new enrichment that looks up a supplier based on a provided VAT number.

The complete code of this tutorial can be found on GitHub.

1. Create and configure a simple test project

For this tutorial, we will assume you are familiar with working in the Python programming language.

  1. Create a new project in Metamaze with a clear name like Development project for testing enrichments.

  2. In that project, create a new document type and give it a name, like Purchase Order.

  3. Add an entity to that document type called VAT number. Make this entity required so that all documents go to human validation, which is helpful for testing purposes.

Next, we will configure the enrichments API.

2. Create a new API endpoint for matching a supplier

First, we define a boilerplate Flask server with Bearer token authentication:


import logging
from os import environ as env

# Basic Flask imports and configuration
from flask import Flask, jsonify, request
from flask_httpauth import HTTPTokenAuth

app = Flask(__name__)
auth = HTTPTokenAuth(scheme="Bearer")

# Define your Bearer token
BEARER_TOKEN = env.get("BEARER_TOKEN", "[optionally set bearer token here]")

@auth.verify_token
def verify_token(token):
    return bool(token == BEARER_TOKEN)
    
##### ADD API CALLS HERE LATER

if __name__ == "__main__":
    app.run(host="0.0.0.0", debug=True)

    # to debug locally, use the following command:
    # FLASK_APP=server.py FLASK_ENV=development flask run --port 5001
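As an optional hardening of the token check above, the comparison could be done in constant time with the standard library's hmac.compare_digest. The sketch below is a minimal illustration, not part of the tutorial's reference code; the function name tokens_match and the token value are assumptions made for this example.

```python
import hmac

# Hypothetical token value, for illustration only
EXPECTED_TOKEN = "my-secret-token"


def tokens_match(provided, expected=EXPECTED_TOKEN):
    """Return True when the provided token equals the expected one."""
    if not provided:
        # Missing or empty tokens never match
        return False
    # compare_digest is designed to reduce timing differences between
    # matching and non-matching inputs
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))
```

Inside verify_token, you could then return tokens_match(token, BEARER_TOKEN) instead of comparing with ==.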

Then, let's add some example data.

EXAMPLE_SUPPLIERS_DB = [
    ("ABC Company", "Kerkstraat 1, 1000 Brussel", "BE0123456789"),
    ("XYZ Corporation", "123 Main St, New York", "US987654321"),
    ("PQR Enterprises", "456 Elm St, Los Angeles", "US123456789"),
    ("LMN Corporation", "789 Oak St, Miami", "US543216789"),
    ("DEF Ltd", "10 Baker St, London", "GB987654321"),
    ("GHI SARL", "20 Rue de la Paix, Paris", "FR123456789"),
    ("JKL Srl", "Via Roma 1, Rome", "IT987654321"),
    ("MNO GmbH", "Hauptstraße 10, Berlin", "DE123456789"),
    ("RST S.L.", "Calle Mayor 5, Madrid", "ES987654321"),
    ("UVW Sp. z o.o.", "ul. Główna 15, Warsaw", "PL123456789"),
]

In real life, this data would be populated by reading from a database or a reference file.
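As a rough sketch of the reference-file option, the same tuples could be read from a CSV file with the standard csv module. The column names (company_name, address, vat_number) and the function name load_suppliers are assumptions for this example; your own reference file will likely look different.

```python
import csv
import io


def load_suppliers(file_obj):
    """Read (company_name, address, vat_number) tuples from a CSV file object."""
    reader = csv.DictReader(file_obj)
    return [
        (row["company_name"], row["address"], row["vat_number"])
        for row in reader
    ]


# Hypothetical file contents, held in memory so the example runs on its own.
# With a real file you would pass open("suppliers.csv", newline="") instead.
sample = io.StringIO(
    "company_name,address,vat_number\n"
    'ABC Company,"Kerkstraat 1, 1000 Brussel",BE0123456789\n'
    'DEF Ltd,"10 Baker St, London",GB987654321\n'
)
suppliers_from_file = load_suppliers(sample)
```

The result has the same shape as EXAMPLE_SUPPLIERS_DB, so the rest of the tutorial code would work unchanged.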

We'll do some transformations on the data to make it a bit easier to work with. Let's create simple dictionaries from the data and store it in a new SUPPLIERS list.

SUPPLIERS = [
    {
        "id": vat_number,
        "company_name": company_name,
        "company_address": address,
        "company_vat_number": vat_number,
    }
    for company_name, address, vat_number in EXAMPLE_SUPPLIERS_DB
]

Great! Now, we are ready to define our first API call. We are going to use a GET request to the /api/find-supplier route, and re-use the token-based authentication we defined earlier.

@app.route("/api/find-supplier", methods=["GET"])
@auth.login_required
def find_supplier():
    content = request.json

The body of the API call will contain the whole document (reference). In this case, we want to match based on the found VAT number, so let's first find the correct value:

    # Find the first value of the entity with name "VAT number"
    vat_number = None
    for annotation in content["annotations"]:
        if annotation["entity"]["name"] == "VAT number":
            vat_number = annotation["text"]
            link = annotation["link"]
            break  # Stop at first match of VAT number

    # Stop if no VAT number was found
    if vat_number is None:
        return jsonify({"enrichments": []})

Make sure that the entity name exactly matches the entity name you configured in Metamaze, including the casing.
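To try the extraction logic without running the server, it can be pulled into a small helper and called on a hand-written payload. The payload below is hypothetical: it only contains the fields the tutorial code reads (annotations, entity.name, text, link), and the values, including the link, are made up. The real request body contains more than this.

```python
def first_entity_value(content, entity_name):
    """Return (text, link) of the first annotation for entity_name, or (None, None)."""
    for annotation in content.get("annotations", []):
        # The comparison is case-sensitive, just like in the route above
        if annotation["entity"]["name"] == entity_name:
            return annotation["text"], annotation.get("link")
    return None, None


# Hypothetical payload, reduced to the fields used in this tutorial
example_content = {
    "annotations": [
        {"entity": {"name": "Order number"}, "text": "PO-001", "link": "a1"},
        {"entity": {"name": "VAT number"}, "text": "BE 0123.456.789", "link": "a2"},
    ]
}

vat_number, link = first_entity_value(example_content, "VAT number")
```

Calling the helper with "vat number" (lower case) returns (None, None), which illustrates why the casing has to match.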

To improve matching rates, it often makes sense to make the lookup more robust than exact string matching. In real life, you would often even use fuzzy matching on multiple fields to find the correct record. For an example of fuzzy matching, see the example FuzzyPurchaseOrderEnrichment (TypeScript). In this case, we'll ignore all non-alphanumeric characters by adding:

    vat_number = "".join([c for c in vat_number if c.isalnum()])
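The clean-up step can also be wrapped in a function so it is easy to test on its own. The sketch below additionally upper-cases the result, which is an assumption on our side (the tutorial code does not do this); it helps when a document contains "be0123456789" while the reference data uses "BE0123456789".

```python
def normalize_vat_number(raw):
    """Keep only letters and digits, then upper-case the result."""
    # Upper-casing is an extra step that the tutorial code does not perform
    return "".join(c for c in raw if c.isalnum()).upper()
```

With this helper, "BE 0123.456.789" and "be-0123 456 789" both become "BE0123456789".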

We'll find the first match in our reference data by using the next function:

    supplier = next(
        (supplier for supplier in SUPPLIERS if supplier["company_vat_number"] == vat_number),
        None,
    )
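If exact matching misses too often (for example because of OCR errors), a fuzzy fallback is one option. The sketch below uses difflib from the standard library; the function name, the shortened list of VAT numbers, and the 0.9 cutoff are assumptions for illustration. Fuzzy matching on identifiers can return the wrong supplier, so treat the result as a suggestion for human validation.

```python
import difflib

# A few VAT numbers from the example data, repeated here so the sketch runs on its own
KNOWN_VAT_NUMBERS = ["BE0123456789", "US987654321", "GB987654321"]


def closest_vat_number(vat_number, candidates=KNOWN_VAT_NUMBERS, cutoff=0.9):
    """Return the most similar known VAT number, or None if nothing is close enough."""
    # n=1 keeps only the best candidate; cutoff is the minimum similarity ratio
    matches = difflib.get_close_matches(vat_number, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

You could call this only when the exact lookup returns None, and then look up the supplier with the suggested VAT number.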

Finally, all we need to do is return the found supplier object, if it exists

    if supplier:
        return jsonify({"enrichments": [{"name": "Find supplier", "value": [supplier], "link": link}]})
    else:
        return jsonify({"enrichments": []})

We're only returning one match here. Later, you can extend that by returning multiple potential matches, or custom exception codes.
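One way to keep the route easy to test is to build the response body in a plain function and only call jsonify on its result. This is a sketch; the function name build_enrichment_response is an assumption, and the response shape is the one used in the route above.

```python
def build_enrichment_response(supplier, link, name="Find supplier"):
    """Build the response body; an empty list signals that nothing was found."""
    if supplier is None:
        return {"enrichments": []}
    # "value" is a list, so more candidates could be added to it later
    return {"enrichments": [{"name": name, "value": [supplier], "link": link}]}
```

In the route, the final lines would then become return jsonify(build_enrichment_response(supplier, link)).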

Now, let's start up the Flask server by opening a terminal and running:

FLASK_APP=server.py FLASK_ENV=development flask run --port 5001
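Before involving Metamaze, you may want to check the endpoint yourself. The sketch below builds a GET request with a JSON body and a Bearer token using only the standard library. The URL, the token, and the payload are hypothetical placeholders, and the payload contains only the fields this tutorial reads.

```python
import json
import urllib.request


def build_find_supplier_request(base_url, token, content):
    """Build a GET request with a JSON body and a Bearer token header."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/find-supplier",
        data=json.dumps(content).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="GET",
    )


def call_find_supplier(req):
    """Send the request and decode the JSON response (needs the server running)."""
    with urllib.request.urlopen(req) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical values: adjust the URL, token, and payload to your own set-up
request_obj = build_find_supplier_request(
    "http://localhost:5001",
    "my-secret-token",
    {"annotations": [{"entity": {"name": "VAT number"}, "text": "BE0123456789", "link": "a1"}]},
)
# print(call_find_supplier(request_obj))  # uncomment once the Flask server is running
```

If the token does not match, the server should answer with a 401 status, which urlopen raises as an HTTPError.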

To make sure Metamaze can access the little debug server running on your local machine, you can use a free tool like ngrok. For example, you could run ngrok http 5001 and get output like this:

Session Status                online
...
Forwarding                    https://7032-109-135-42-38.ngrok-free.app -> http://localhost:5001

Grab the public URL (in this case https://7032-109-135-42-38.ngrok-free.app) and store it somewhere. We'll need it to configure the enrichment endpoint in the next step.

Note that ngrok tunnels are temporary and for debugging purposes only. If the session times out and you restart the tunnel, it will be on a different URL. You will need to change the URL in the enrichment settings too.

Remember, if you want the full code example, you can find it here.

3. Configure the Find Supplier enrichment

  1. In Metamaze, navigate to the Project Settings > Enrichments and click on the blue + Create button to add a new enrichment.

  2. Configure the General settings

    1. We'll give it the name Find supplier. Note that this needs to be exactly the same name as you are returning in the API call from before.

    2. Let's enable Human validation, and make the enrichment required. This will make it easier to debug.

    3. We'll link the document type "Purchase Order" we created in the set-up.

Then, we will define the Triggers.

  1. Add a trigger "After entity extraction" of the document type "Purchase Order". This will make sure that the enrichment is triggered when entities are predicted automatically. Note that we didn't train a model in this tutorial.

  2. Add a trigger "After labeling" of the entity VAT number. This will make sure that when we change an annotation manually, the enrichment will be re-triggered.

In the section "Value types", we will take Entries since we are returning full objects, not just simple strings. We can define the columns that we want to show. On the left side, enter exactly the same key names as the API returns (for example, company_name). On the right side, we can give them user-friendly labels.

Finally, we'll define the webhook. If you are using ngrok, make sure you are using the live tunnel URL, and appending the route (/api/find-supplier) we defined in our code. Also make sure you are using the same Bearer token as you are expecting.

Click the Create button to finish your enrichment.

Upload a document and test

Navigate to your Production Uploads, and upload a new file. For example, we can use the test file.

Since we have not trained any model, there will be no predictions at the start, and the document will look empty:

Add an annotation for the VAT number by clicking on the document and choosing the entity VAT number.

The enrichment will be triggered automatically, and you will see the result:

By clicking on the enrichment line, you can see all the details of the object too:

If you look in the "All" tab, you'll notice that you can't search for suppliers here. We'll configure that in the next section.

(Optional) Add a second API call to list all suppliers

Add a new route to the Flask server

We can also add a second API call to list all the suppliers. In the code, add a new route on /api/list-suppliers by adding


@app.route("/api/list-suppliers", methods=["GET"])
@auth.login_required
def list_suppliers():
    return jsonify(SUPPLIERS)

The keys of the dictionaries contained in the suppliers list should be the same as the configured column names. These column names are set in the enrichment settings in the Metamaze platform.

For example, if you have a column called company_name, you should have a key called company_name in the dictionary too.

Here's an example object that we are returning

[
    {
        "id": "BE0123456789",
        "company_name": "ABC Company",
        "company_address": "Kerkstraat 1, 1000 Brussel",
        "company_vat_number": "BE0123456789",
    },
    ... // other objects
]
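A small check like the one below can catch a mismatch between the returned keys and the configured columns before you test in the platform. It is a sketch: the column list is assumed to mirror what you entered in the enrichment settings, and the function name missing_columns is made up for this example.

```python
# Column names as entered in the enrichment settings (assumed to match this tutorial)
CONFIGURED_COLUMNS = ["company_name", "company_address", "company_vat_number"]


def missing_columns(records, columns=CONFIGURED_COLUMNS):
    """Map the index of each record to the configured columns it lacks."""
    problems = {}
    for index, record in enumerate(records):
        missing = [column for column in columns if column not in record]
        if missing:
            problems[index] = missing
    return problems
```

An empty dictionary means every record has all configured columns; running missing_columns(SUPPLIERS) at start-up would be one place to use it.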

Configure the options in the enrichment

Navigate back to the "Find supplier" enrichment in the Project Settings and go to the "Options" section. Fill in the new API route on the correct URL with the correct Bearer token.

Click Update.

Test the options

Now, in the All tab, you will be able to search and select from a list of all suppliers.
