Before you start
What you need to know before starting a new Metamaze project.
Before creating a Metamaze project, you need a clear view of the process you want to automate and how you want to automate it. Specifically, you need answers to the following questions:
| Decision | Typical options |
| --- | --- |
| How many document types are there? Is there an exhaustive list available? Is document type prediction needed? | |
| Does text information need to be extracted with entity extraction? | |
| Are image detection models needed? | |
| Which type of page management is needed? | |
| How will the documents be sent to Metamaze? Where should the results be sent to? | |
| Which languages are in scope? | |
| Do you have annotated data? Describe the format and size of the data you have. | |
To prepare for a scoping meeting, it helps to bring an overview of:
- A list of all document types that you need to extract information from
- Per document type, a list of entities (fields) you want to extract
- Per entity, basic information such as:
  - Is it required?
  - Minimum number of occurrences
  - Maximum number of unique occurrences
  - Is parsing / standardisation required?
- Annotated examples of all document types and entities. Preferably bring a couple of examples per document type, so that the diversity of the documents can be estimated.
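If it helps to keep this overview in a structured form, the checklist above could be captured roughly as follows. This is only a sketch for organising your own notes: the class names, field names, example document types and the `open_points` helper are all hypothetical and are not part of any Metamaze API or import format. The threshold of two annotated examples is likewise an assumption standing in for "a couple of examples".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EntitySpec:
    """One entity (field) to extract. All attribute names are illustrative."""
    name: str
    required: bool = False
    min_occurrences: int = 0
    max_unique_occurrences: Optional[int] = None  # None: no upper limit agreed yet
    needs_parsing: bool = False  # e.g. dates or amounts that must be standardised


@dataclass
class DocumentTypeSpec:
    """One document type with its entities and available annotated examples."""
    name: str
    entities: List[EntitySpec] = field(default_factory=list)
    annotated_examples: int = 0


def open_points(doc_types: List[DocumentTypeSpec], min_examples: int = 2) -> List[str]:
    """List gaps in the overview that are worth resolving before the meeting."""
    points = []
    for doc in doc_types:
        if not doc.entities:
            points.append(f"{doc.name}: no entities listed")
        if doc.annotated_examples < min_examples:
            points.append(f"{doc.name}: only {doc.annotated_examples} annotated example(s)")
        for entity in doc.entities:
            # Assumption: a required entity should occur at least once.
            if entity.required and entity.min_occurrences < 1:
                points.append(f"{doc.name}.{entity.name}: required but minimum occurrences is 0")
    return points


# Hypothetical example data, not taken from a real project.
overview = [
    DocumentTypeSpec(
        name="invoice",
        entities=[
            EntitySpec("invoice_number", required=True, min_occurrences=1,
                       max_unique_occurrences=1),
            EntitySpec("invoice_date", required=True, min_occurrences=1,
                       max_unique_occurrences=1, needs_parsing=True),
            EntitySpec("line_item"),
        ],
        annotated_examples=5,
    ),
    DocumentTypeSpec(name="credit_note", annotated_examples=1),
]

for point in open_points(overview):
    print(point)
```

With the sample data, the helper flags the `credit_note` type because it has no entities and only one annotated example. Those are the kind of gaps worth closing before the scoping meeting.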