Going to production
This page describes the process of taking a project to production.
After you have fully configured your project and completed user acceptance testing, there are a few things you should verify before going to production.
- Make sure your labelling guidelines are complete and adapted for production usage
- Train your operators to:
  - use the Metamaze human validation module, including handling exceptions such as adding handwritten or misrecognised text, performing page management, ...
  - follow the labelling guidelines accurately
  - know and understand the business process
- Have a clearly defined process, with assigned responsibilities, for following up on human validation, promoting production data to training data, pipeline analysis, creating suggested tasks, model training, and model deployment. Below is a recommended overview of roles and responsibilities.
The following responsibilities need to be taken up when running in production. In the initial weeks of production usage, we recommend promoting data, performing detailed pipeline analyses, creating at least suggested tasks (and, if time allows, a custom task as well), and retraining weekly to reach higher automation rates quickly. When you see that accuracy is plateauing, you can switch to monthly reviews.
If your models perform poorly during production usage, here are a few tips to help you identify the cause:
- Poor-quality scans cause OCR errors. These issues have to be fixed at the source and cannot be fixed in Metamaze. Train the people scanning to deliver clean, correctly oriented, straight, high-resolution scans (minimum 150 dpi, preferably 300 dpi).
- Irrelevant documents. Inform the people scanning or sending documents that they should only submit relevant documents.
- Low-quality annotations on production data degrade the model. We recommend creating suggested and/or custom tasks on production data before retraining the model.
- Document content or layout that differs in production from the training data can cause the models to underperform. In that case, we recommend adding a sufficient amount of recent production documents to the training data. For example, if you have trained a date entity only on values from October but go to production in January, recognition may suffer; adding some production data easily resolves this.
The analytics module can help you identify problematic document types, entities, or languages.
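To catch low-resolution scans before they reach the pipeline, you can check the DPI recorded in the file itself. The sketch below is an illustrative example, not part of Metamaze: it reads the density fields from a JPEG's JFIF header using only the Python standard library, and flags files below the 150 dpi minimum mentioned above. The helper names `jpeg_dpi` and `scan_ok` are hypothetical.

```python
import struct

def jpeg_dpi(data: bytes):
    """Read the resolution from a JPEG's JFIF APP0 header.

    Returns (x_dpi, y_dpi), or None if the file carries no
    physical density information."""
    # A standard JFIF file starts with the SOI marker (FFD8),
    # immediately followed by an APP0 segment (FFE0) tagged "JFIF".
    if data[:2] != b"\xff\xd8" or data[2:4] != b"\xff\xe0":
        return None
    if data[6:11] != b"JFIF\x00":
        return None
    # Byte 13: units (0 = aspect ratio only, 1 = dpi, 2 = dots/cm),
    # then two big-endian 16-bit density values.
    units, x_density, y_density = struct.unpack(">BHH", data[13:18])
    if units == 1:
        return (x_density, y_density)
    if units == 2:
        return (round(x_density * 2.54), round(y_density * 2.54))
    return None

def scan_ok(data: bytes, minimum: int = 150) -> bool:
    """True when the scan meets the minimum resolution requirement."""
    dpi = jpeg_dpi(data)
    return dpi is not None and min(dpi) >= minimum
```

A check like this could run wherever scans are collected, rejecting files before they are sent for processing. Note that DPI metadata is only as reliable as the scanner that wrote it, so it complements, rather than replaces, training the people who scan.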