
Using AI Confidence Thresholds for Automation in Rossum

The confidence score indicates how confident the AI Engine is that it captured both the text and the location of a field correctly. Confidence scores range from 0 to 1.

When starting with automation, we offer the option to automatically process fields where the AI Engine's confidence score is higher than a chosen threshold.

By default, when you enable Confident Automation, all data fields captured in the selected queue have a confidence threshold of 0.975. This threshold is the confidence bar the AI needs to pass before data can be exported automatically. Setting the bar this high allows very little, if any, automation by default, and ensures that unless the AI is extremely confident about every single data field it captures, the document stops for manual review and confirmation.

But what does the threshold mean numerically? We calibrate the confidence scores to correspond to the probability of a given value being correct. This means that the 0.975 threshold expresses a requirement for 97.5% accuracy, i.e. documents sent to accounts-payable software automatically should have at most a 2.5% error rate on average. (Note that this calibration process is currently experimental; as of 2020 we are still improving the confidence intervals.)
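To make the rule concrete, here is a minimal Python sketch of the decision logic described above. This is not Rossum's actual implementation, just the rule restated as code: a document qualifies for automatic export only when every captured field clears the threshold.

```python
# Minimal sketch of the confident-automation rule (illustration only):
# a document is auto-exported only when every field clears the threshold.

THRESHOLD = 0.975  # calibrated: ~97.5% expected accuracy per field

def can_auto_export(field_scores: dict[str, float], threshold: float = THRESHOLD) -> bool:
    """Return True if every field's confidence score passes the threshold."""
    return all(score >= threshold for score in field_scores.values())

# Example: one low-confidence field stops the whole document for review.
scores = {"invoice_id": 0.998, "date_issue": 0.991, "amount_total": 0.941}
print(can_auto_export(scores))  # False -> document stops for manual review
```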

Setting confidence score in the app

You can set the confidence score threshold for a target Queue in the app. This threshold will be applied to all fields extracted on the Queue:

  1. Click on settings "gear" icon
  2. Select the "Automation" tab
  3. Pick the target Queue
  4. Select "Confident" Automation
  5. Scroll to the 'Field automation - Score threshold' part
  6. Define the necessary confidence score threshold
Updating confidence score threshold.
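If you manage queues programmatically, the same setting can be changed over the REST API. A hedged sketch, assuming the queue object exposes `automation_level` and `default_score_threshold` fields (check the API reference for your API version); the token and queue id are placeholders:

```python
# Hypothetical sketch: updating a queue's score threshold over the API.
# Field names and endpoint follow Rossum API v1 conventions; verify them
# against the API reference before relying on this.
import requests

API = "https://api.elis.rossum.ai/v1"
TOKEN = "..."     # your API token (placeholder)
QUEUE_ID = 12345  # hypothetical queue id

response = requests.patch(
    f"{API}/queues/{QUEUE_ID}",
    headers={"Authorization": f"token {TOKEN}"},
    json={"automation_level": "confident", "default_score_threshold": 0.975},
)
response.raise_for_status()
print(response.json().get("default_score_threshold"))
```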

Different score thresholds among fields

There are situations where some fields require different confidence score thresholds, for example when you are not particularly concerned about the error rate of a specific field. A per-field confidence score threshold can be set in the extraction schema.

To do so, follow the steps below:

  1. Click on settings "gear" icon
  2. Select the "Automation" tab
  3. Pick the target Queue
  4. Select "Confident" Automation
  5. Scroll to the Automation data part
  6. Update the score threshold for the fields that you see in the Automation data table
Setting up a confidence score threshold for a specific field.
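The same change can be made in the extraction schema over the API. A sketch under stated assumptions: schema datapoints accept a `score_threshold` key (see the schema reference for your API version), and the schema id and the `document_id` field are hypothetical placeholders.

```python
# Hypothetical sketch: relaxing the threshold for one low-risk field
# directly in the extraction schema.
import requests

API = "https://api.elis.rossum.ai/v1"
TOKEN = "..."      # your API token (placeholder)
SCHEMA_ID = 67890  # hypothetical schema id

schema = requests.get(
    f"{API}/schemas/{SCHEMA_ID}",
    headers={"Authorization": f"token {TOKEN}"},
).json()

# Walk the schema sections and override the threshold for one datapoint.
for section in schema["content"]:
    for datapoint in section.get("children", []):
        if datapoint.get("id") == "document_id":  # hypothetical field id
            datapoint["score_threshold"] = 0.8    # tolerate more errors here

requests.patch(
    f"{API}/schemas/{SCHEMA_ID}",
    headers={"Authorization": f"token {TOKEN}"},
    json={"content": schema["content"]},
).raise_for_status()
```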

How does this help to automate the documents?

If a field's confidence score is higher than the score threshold, you will see a "score" key in the datapoint's validation sources:

"validation_sources": ["score"]
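You can inspect this programmatically. A sketch, assuming the annotation content endpoint returns a tree of nodes whose leaf datapoints carry `validation_sources` and `schema_id` (adjust to your API version); the annotation id is a placeholder:

```python
# Sketch: list which datapoints of an annotation were validated by score.
import requests

API = "https://api.elis.rossum.ai/v1"
TOKEN = "..."            # your API token (placeholder)
ANNOTATION_ID = 314159   # hypothetical annotation id

content = requests.get(
    f"{API}/annotations/{ANNOTATION_ID}/content",
    headers={"Authorization": f"token {TOKEN}"},
).json()

def score_validated(nodes):
    """Yield schema ids of datapoints whose validation_sources include 'score'."""
    for node in nodes:
        if "score" in node.get("validation_sources", []):
            yield node.get("schema_id")
        yield from score_validated(node.get("children", []))

print(list(score_validated(content.get("content", []))))
```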

📘

Combining confidence scores with other automation components

Learn how the different automation components interact to automate documents, and how automation works with hidden and required fields.

What is the correct threshold?

When looking for the right threshold, there is a trade-off between how many documents stop for manual review and how many documents are automatically exported with an error.

A low threshold might lead to many automatically exported documents, but many of them may contain errors. Conversely, a high threshold stops most documents for manual review, which may require more manual labor.

In our experience, customers looking for a high level of automation need a Dedicated AI Engine. In that case, the Rossum AI team provides the in-depth analysis and fine-tuning of automation score thresholds that your business requires.

However, you can also run your own analysis of confidence score thresholds and the corresponding error rates. Consider adjusting confidence thresholds to match your requirements and error-rate tolerance, and customizing the platform with custom business rules.
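A sketch of such an analysis, assuming you have a manually verified sample of (score, correct) pairs from your own data: sweep candidate thresholds and compare the automation rate against the error rate among automated values.

```python
# Sweep candidate thresholds over a labeled sample (illustrative data only)
# and print the automation-rate / error-rate trade-off for each.
sample = [(0.99, True), (0.97, True), (0.96, False), (0.92, True),
          (0.88, False), (0.995, True), (0.94, True), (0.81, False)]

for threshold in (0.90, 0.95, 0.975):
    automated = [correct for score, correct in sample if score >= threshold]
    automation_rate = len(automated) / len(sample)
    error_rate = 1 - sum(automated) / len(automated) if automated else 0.0
    print(f"threshold={threshold:.3f}  automated={automation_rate:.0%}  "
          f"errors among automated={error_rate:.0%}")
```

Raising the threshold shrinks both the share of automated values and the error rate among them; the right setting is wherever that trade-off matches your tolerance.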

🚧

Confidence scores vs. accuracy estimates

As mentioned above, the AI Engine's confidence scores are estimates of the accuracy of any given extraction. The confidence score threshold should therefore correspond to the desired average accuracy level.

In practice, estimating this accuracy correctly in all cases is still an unsolved problem in artificial intelligence at large. Our research team is working hard on improving the accuracy of our estimates. Nevertheless, we recommend conducting your own experiments on your particular set of data to make sure the confidence scores match your needs.

In general, you may find Rossum's AI Engine a little pessimistic, with real accuracy higher than the typical scores suggest, though this may vary field by field.
