📃Parsing
When extracting information from documents, data types such as dates and numbers can be parsed in Metamaze. The advantages are:
not needing to parse dates & numbers in your own application
parsing rules that can be configured for each of your projects separately
output provided by Metamaze being formatted the way you need it to be
Under the hood, the parser always tries to use the context within a document to parse ambiguous dates and numbers. It looks for unambiguous dates within the document, learns their format, and applies that format to the ambiguous dates and numbers within the same document.
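The idea of learning the format from unambiguous dates can be illustrated with a minimal Python sketch. This is not Metamaze's implementation; the function names, the regular expression, and the restriction to numeric day/month/year dates are assumptions made for the example.

```python
import re
from datetime import date

# Hypothetical pattern: numeric dates such as 25/12/2023 or 01-03-2024.
DATE_RE = re.compile(r"\b(\d{1,2})[-/.](\d{1,2})[-/.](\d{4})\b")


def infer_day_first(text):
    """Return True (day first), False (month first), or None if no
    unambiguous date in the text reveals the order."""
    for a, b, _ in DATE_RE.findall(text):
        a, b = int(a), int(b)
        if a > 12 and b <= 12:
            return True   # first part can only be a day
        if b > 12 and a <= 12:
            return False  # second part can only be a day
    return None


def parse_with_context(raw, text):
    """Parse `raw`, using other dates in `text` to resolve ambiguity.
    Returns None when the date stays ambiguous or is invalid."""
    match = DATE_RE.fullmatch(raw)
    if match is None:
        return None
    a, b, year = (int(g) for g in match.groups())
    if a > 12 or b > 12:
        day_first = a > 12  # the value itself is unambiguous
    else:
        day_first = infer_day_first(text)
    if day_first is None:
        return None  # no context available: still ambiguous
    day, month = (a, b) if day_first else (b, a)
    try:
        return date(year, month, day)
    except ValueError:
        return None
```

With the document text "Invoice date: 25/12/2023", the ambiguous value "01/03/2024" is read as 1 March 2024, because 25/12/2023 can only be day-first. Without such context the sketch returns None, which corresponds to the situation where one of the configured fallback rules would take over.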
Parsing dates can be a challenge, and the right approach differs from project to project. Metamaze allows you to configure how dates are parsed.
When parts of a date are missing, you can define default rules on how to handle the situation.
When missing the day part of a date you can choose to:
Go to human validation
Use the first day of the month
Use the last day of the month
When missing the year part of a date you can choose to:
Go to human validation
Use the closest year
Use the current year
Use the next year
Use the previous year
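The default rules for missing date parts can be sketched in Python as follows. The rule names (`"first_day"`, `"closest_year"`, and so on) are hypothetical labels for the options listed above, and the use of a `reference` date for the year rules is an assumption. Returning None stands in for "Go to human validation".

```python
import calendar
from datetime import date


def _safe_date(year, month, day):
    """Build a date, returning None for impossible dates (e.g. 29 Feb 2023)."""
    try:
        return date(year, month, day)
    except ValueError:
        return None


def fill_missing_day(year, month, rule):
    """Complete a month/year value according to the configured rule."""
    if rule == "first_day":
        return date(year, month, 1)
    if rule == "last_day":
        last = calendar.monthrange(year, month)[1]  # number of days in month
        return date(year, month, last)
    return None  # hypothetical stand-in for human validation


def fill_missing_year(day, month, rule, reference):
    """Complete a day/month value relative to a reference date (assumed)."""
    offsets = {"current_year": 0, "next_year": 1, "previous_year": -1}
    if rule in offsets:
        return _safe_date(reference.year + offsets[rule], month, day)
    if rule == "closest_year":
        candidates = [_safe_date(reference.year + o, month, day) for o in (-1, 0, 1)]
        candidates = [c for c in candidates if c is not None]
        if not candidates:
            return None
        # Pick the candidate with the smallest distance to the reference date.
        return min(candidates, key=lambda c: abs((c - reference).days))
    return None  # hypothetical stand-in for human validation
```

For example, with a reference date of 10 January 2024, "15 December" resolves to 15 December 2023 under the closest-year rule and to 15 December 2024 under the current-year rule.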
You can enable AI as a parsing fallback. It can be used when parsing fails and/or when parsing stops on an ambiguous date.
When the AI functionality is enabled, a text field is shown where you can give extra instructions for parsing.
Sometimes the parsing just fails. Here you can decide how to deal with that situation:
Go to human validation
Make the entity value blank
Remove the entity
Metamaze allows you to configure how to deal with two-part dates. You can treat the dates in the following formats:
day - month
month - year
year - month
month - day
week - year
year - week
Closest to upload date
There is a special option "Stop" that allows you to exclude parsing options. When the parsing stops this way you can:
Go to human validation
Make the entity value blank
Closest to upload date: chooses the candidate date that is closest to the upload date. For example:
Upload date: 01-01-2023
Date on document: 01-03-2023
The date is ambiguous because it can be read as 1 March 2023 (day - month - year) or 3 January 2023 (month - day - year). This rule chooses 3 January 2023, as it is the closer of the two possible dates to the upload date.
Metamaze allows you to configure how to deal with three-part dates. You can treat the dates in the following formats:
day - month - year
month - day - year
year - month - day
Closest to upload date
There is a special option "Stop" that allows you to exclude parsing options. When the parsing stops this way you can:
Go to human validation
Make the entity value blank
Closest to upload date: chooses the candidate date that is closest to the upload date. For example:
Upload date: 01-01-2023
Date on document: 01-03-2023
The date is ambiguous because it can be read as 1 March 2023 (day - month - year) or 3 January 2023 (month - day - year). This rule chooses 3 January 2023, as it is the closer of the two possible dates to the upload date.
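The "Closest to upload date" rule can be sketched in a few lines of Python. The function name and signature are hypothetical. The sketch builds both readings of the first two parts of the date and keeps the one with the smallest distance to the upload date.

```python
from datetime import date


def closest_to_upload(first, second, year, upload_date):
    """Interpret `first`-`second`-`year` both as day-month and month-day
    and return the valid candidate closest to the upload date."""
    candidates = []
    for day, month in ((first, second), (second, first)):
        try:
            candidates.append(date(year, month, day))
        except ValueError:
            pass  # this reading is not a valid calendar date
    if not candidates:
        return None
    return min(candidates, key=lambda d: abs((d - upload_date).days))
```

Calling `closest_to_upload(1, 3, 2023, date(2023, 1, 1))` reproduces the example above: the candidates are 1 March 2023 and 3 January 2023, and the function returns 3 January 2023.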
You are able to test your parser configuration. A default set of examples is provided, but you can fill in your own date value and press the "Test value" button to see how the parser parses your input.
Parsing numbers can be a challenge, and the right approach differs from project to project. Metamaze allows you to configure how numbers are parsed.
You can enable AI as a parsing fallback. It can be used when parsing fails and/or when parsing stops on an ambiguous number.
When the AI functionality is enabled, a text field is shown where you can give extra instructions for parsing.
Sometimes the parsing just fails. Here you can decide how to deal with that situation:
Go to human validation
Make the entity value blank
Remove the entity
Metamaze allows you to configure how to deal with ambiguous number formats. You can do the following:
Treat decimal signs always as decimals
Treat decimal signs always as thousand separators
Go to human validation
Make the entity value blank
Set default settings
Use one of the following as a thousand separator
Dot
Comma
Use one of the following as a decimal separator
Dot
Comma
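To show why the separator settings matter, here is a minimal Python sketch of parsing a number with configurable thousand and decimal separators. The function names are hypothetical, and the ambiguity check (a single separator followed by exactly three digits, as in "1.234") is an assumption about what counts as ambiguous, not Metamaze's definition.

```python
import re
from decimal import Decimal, InvalidOperation

# Assumed ambiguity pattern: "1.234" or "12,345" could be either
# a thousands grouping or a decimal fraction.
AMBIGUOUS_RE = re.compile(r"^\d{1,3}[.,]\d{3}$")


def is_ambiguous(raw):
    """Return True if the separator's role cannot be derived from the value."""
    return AMBIGUOUS_RE.match(raw.strip()) is not None


def parse_number(raw, thousand_sep=".", decimal_sep=","):
    """Parse `raw` using the configured separators.
    Returns None when the value cannot be parsed."""
    if thousand_sep == decimal_sep:
        raise ValueError("thousand and decimal separators must differ")
    # Drop thousand separators, then normalise the decimal separator to a dot.
    cleaned = raw.strip().replace(thousand_sep, "").replace(decimal_sep, ".")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None
```

With dot as thousand separator and comma as decimal separator, "1.234,56" parses to 1234.56 and the ambiguous "1.234" parses to 1234. With the separators swapped, the same "1.234" parses to 1.234, which is why a default has to be chosen for ambiguous values.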
You are able to test your parser configuration. A default set of examples is provided, but you can fill in your own number value and press the "Test value" button to see how the parser parses your input.
The parser currently supports parsing natural language dates and numbers in English, French, and Dutch. If you require natural language parsing in other languages, please contact us via Getting support.