📃 Parsing

When extracting information from documents, data types such as dates and numbers can be parsed in Metamaze. This has the advantages of:

  • not needing to parse dates & numbers in your own application

  • parsing rules that can be configured for each of your projects separately

  • output provided by Metamaze being formatted the way you need it to be

General behavior

Under the hood, the parser always tries to use context within a document to parse ambiguous dates and numbers. It looks for unambiguous dates within the document to learn the format, and then applies that format to the ambiguous dates and numbers within that same document.
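
The idea of learning a format from context can be illustrated with a minimal Python sketch. This is not Metamaze's actual implementation: the function names `infer_day_first` and `parse_with_context`, the regular expression, and the simple voting scheme are all assumptions made for illustration. A date whose first or second part is larger than 12 can only be read one way, so it reveals the order used in the rest of the document.

```python
import re
from datetime import date

# Hypothetical pattern: numeric dates such as 25/12/2022 or 01-03-2023.
DATE_RE = re.compile(r"\b(\d{1,2})[-/.](\d{1,2})[-/.](\d{4})\b")

def infer_day_first(text):
    """Return True (day first), False (month first), or None if unknown."""
    votes = {True: 0, False: 0}
    for a, b, _ in DATE_RE.findall(text):
        a, b = int(a), int(b)
        if a > 12 and 1 <= b <= 12:
            votes[True] += 1    # first part can only be a day
        elif b > 12 and 1 <= a <= 12:
            votes[False] += 1   # second part can only be a day
    if votes[True] == votes[False]:
        return None             # no evidence, or conflicting evidence
    return votes[True] > votes[False]

def parse_with_context(raw, text):
    """Parse an ambiguous date using the order learned from the document."""
    day_first = infer_day_first(text)
    match = DATE_RE.fullmatch(raw.strip())
    if match is None or day_first is None:
        return None
    a, b, year = (int(g) for g in match.groups())
    day, month = (a, b) if day_first else (b, a)
    try:
        return date(year, month, day)
    except ValueError:
        return None             # e.g. month 13 or day 32
```

In this sketch, a document that contains 25/12/2022 leads 01-03-2023 to be read as 1 March 2023, while a document that contains 12/25/2022 leads it to be read as 3 January 2023.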

Date parsing

Date formats can differ from project to project, which can make parsing dates a challenge. Metamaze allows you to configure how dates are parsed.

Missing data

When parts of a date are missing, you can define default rules on how to handle the situation.

When missing the day part of a date you can choose to:

  • Go to human validation

  • Use the first day of the month

  • Use the last day of the month

When missing the year part of a date you can choose to:

  • Go to human validation

  • Use the closest year

  • Use the current year

  • Use the next year

  • Use the previous year
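
The rules above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the product's code: the rule names (`"first_day"`, `"closest_year"`, and so on) and both functions are hypothetical, and "closest year" is assumed here to mean the year that puts the date nearest to a reference date. Returning `None` stands in for sending the entity to human validation.

```python
import calendar
from datetime import date

def fill_missing_day(year, month, rule):
    """Complete a month-year value; None means 'go to human validation'."""
    if rule == "first_day":
        return date(year, month, 1)
    if rule == "last_day":
        # monthrange returns (weekday of first day, number of days in month)
        return date(year, month, calendar.monthrange(year, month)[1])
    return None

def _safe_date(year, month, day):
    try:
        return date(year, month, day)
    except ValueError:          # e.g. 29 February in a non-leap year
        return None

def fill_missing_year(day, month, rule, today):
    """Complete a day-month value relative to a reference date."""
    offsets = {
        "current_year": [0],
        "next_year": [1],
        "previous_year": [-1],
        "closest_year": [-1, 0, 1],   # assumption: pick the nearest candidate
    }
    if rule not in offsets:
        return None             # unknown rule or human validation
    candidates = [_safe_date(today.year + o, month, day) for o in offsets[rule]]
    candidates = [c for c in candidates if c is not None]
    if not candidates:
        return None
    return min(candidates, key=lambda c: abs((c - today).days))
```

For example, with a reference date of 5 January 2023, "20 December" resolves to 20 December 2022 under the closest-year rule but to 20 December 2023 under the current-year rule.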

AI parsing

You can enable AI as a parsing fallback. It can be used when parsing fails and/or when parsing stops on an ambiguous date.

When the AI functionality is enabled, a text field is shown where you can give extra instructions for parsing.

Failed parsing

Sometimes the parsing just fails. Here you can decide how to deal with that situation:

  • Go to human validation

  • Make the entity value blank

  • Remove the entity
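
To make the difference between the three options concrete, here is a small Python sketch. The entity structure (a dictionary with `value` and `needs_review` keys), the `OnFailure` enum, and the `apply_failure_policy` function are all hypothetical and do not reflect the actual Metamaze output format.

```python
from enum import Enum

class OnFailure(Enum):
    HUMAN_VALIDATION = "human_validation"
    BLANK = "blank"
    REMOVE = "remove"

def apply_failure_policy(entities, name, policy):
    """Return a copy of the entities with the policy applied to one entity."""
    result = dict(entities)     # shallow copy, the input is left untouched
    if policy is OnFailure.REMOVE:
        result.pop(name, None)  # the entity disappears from the output
    elif policy is OnFailure.BLANK:
        result[name] = {"value": "", "needs_review": False}
    else:
        # Keep the raw value and flag it for a human to check.
        result[name] = {**result.get(name, {}), "needs_review": True}
    return result
```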

Parsing ambiguous two-part dates

Metamaze allows you to configure how to deal with two-part dates. You can interpret these dates using one of the following options:

  • day - month

  • month - year

  • year - month

  • month - day

  • week - year

  • year - week

  • Closest to upload date

There is a special option "Stop" that allows you to exclude parsing options. When the parsing stops this way you can:

  • Go to human validation

  • Make the entity value blank

Closest to upload date: chooses the candidate date that is closest to the upload date. For example:

  • Upload date: 01-01-2023

  • Date on document: 01-03-2023

This date is ambiguous because it can be read as 1 March 2023 (day - month - year) or 3 January 2023 (month - day - year). This rule chooses 3 January 2023, as it is the closer of the two possible dates to the upload date.

Parsing ambiguous three-part dates

Metamaze allows you to configure how to deal with three-part dates. You can interpret these dates using one of the following options:

  • day - month - year

  • month - day - year

  • year - month - day

  • Closest to upload date

There is a special option "Stop" that allows you to exclude parsing options. When the parsing stops this way you can:

  • Go to human validation

  • Make the entity value blank

Closest to upload date: chooses the candidate date that is closest to the upload date. For example:

  • Upload date: 01-01-2023

  • Date on document: 01-03-2023

This date is ambiguous because it can be read as 1 March 2023 (day - month - year) or 3 January 2023 (month - day - year). This rule chooses 3 January 2023, as it is the closer of the two possible dates to the upload date.
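
The "Closest to upload date" rule can be sketched in a few lines of Python. The function name `closest_to_upload` and its signature are hypothetical, and the sketch only considers the day - month - year and month - day - year readings; it is an illustration of the rule, not Metamaze's implementation.

```python
from datetime import date

def closest_to_upload(parts, upload_date):
    """Pick the valid reading of (a, b, year) nearest to the upload date."""
    a, b, year = parts
    candidates = []
    for day, month in ((a, b), (b, a)):   # day-month, then month-day
        try:
            candidates.append(date(year, month, day))
        except ValueError:
            pass                          # this reading is not a real date
    if not candidates:
        return None
    return min(candidates, key=lambda c: abs((c - upload_date).days))

# The example above: uploaded on 1 January 2023, document says 01-03-2023.
print(closest_to_upload((1, 3, 2023), date(2023, 1, 1)))  # 2023-01-03
```

When only one reading is a valid date, such as 25-12-2022, there is nothing to choose between and that reading is returned.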

Test parser

You can test your parser configuration. A default set of examples is provided, but you can also fill in your own date value and press the "Test value" button to see how the parser handles your input.

Number parsing

Number formats can differ from project to project, which can make parsing numbers a challenge. Metamaze allows you to configure how numbers are parsed.

AI parsing

You can enable AI as a parsing fallback. It can be used when parsing fails and/or when parsing stops on an ambiguous number.

When the AI functionality is enabled, a text field is shown where you can give extra instructions for parsing.

Failed parsing

Sometimes the parsing just fails. Here you can decide how to deal with that situation:

  • Go to human validation

  • Make the entity value blank

  • Remove the entity

Parsing ambiguous numbers with decimals

Metamaze allows you to configure how to deal with ambiguous number formats. You can do the following:

  • Treat decimal signs always as decimals

  • Treat decimal signs always as thousand separators

  • Go to human validation

  • Make the entity value blank

  • Set default settings

    • Use one of the following as a thousand separator

      • Dot

      • Comma

    • Use one of the following as a decimal separator

      • Dot

      • Comma
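
The effect of the default separator settings can be illustrated with a short Python sketch. The `is_ambiguous` check and the `parse_number` function are hypothetical helpers written for this example; in particular, treating "a single separator followed by exactly three digits" as the ambiguous case is an assumption, not a description of Metamaze's rules.

```python
import re
from decimal import Decimal, InvalidOperation

# Assumption: "1.234" or "1,234" could be one thousand two hundred
# thirty-four, or one point two three four.
AMBIGUOUS_RE = re.compile(r"^\d{1,3}[.,]\d{3}$")

def is_ambiguous(raw):
    return bool(AMBIGUOUS_RE.match(raw.strip()))

def parse_number(raw, thousand_sep=".", decimal_sep=","):
    """Parse a number using the configured default separators."""
    if thousand_sep == decimal_sep:
        raise ValueError("separators must differ")
    cleaned = raw.strip().replace(" ", "")
    cleaned = cleaned.replace(thousand_sep, "")   # drop grouping characters
    cleaned = cleaned.replace(decimal_sep, ".")   # normalise the decimal sign
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None                               # parsing failed
```

With dot as the thousand separator and comma as the decimal separator, "1.234" becomes 1234; with the separators swapped, the same input becomes 1.234.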

Test parser

You can test your parser configuration. A default set of examples is provided, but you can also fill in your own number value and press the "Test value" button to see how the parser handles your input.

The parser currently supports parsing natural language dates and numbers in English, French, and Dutch. If you require natural language parsing in other languages, please contact us via Getting support.
