Data Capture Automation with Rossum

Data Capture Automation means that document import, processing, validation, and export happen seamlessly without waiting for human review in most cases.

This is the most advanced stage of implementing a Cognitive Data Capture process. Consider a typical flow: you import a document into Rossum, and Rossum's AI processes it right away. Rossum then decides whether the extracted data can be sent to the downstream system automatically or whether a human needs to review and confirm the results.

To make this happen, Rossum offers automation on two levels:

  • Per-field automation allows operators to validate only a subset of fields; fields that were confirmed automatically do not need to be revisited by a human. Automatically validated fields are marked with a grey tick, while fields without the tick still require review. The TAB key moves through every field, whereas the ENTER key jumps only between fields that still require manual validation. A basic level of per-field automation is active by default, and further configuration can extend it.
  • Whole-document automation allows operators to skip the manual validation phase for an entire document. The basic idea is that a document is automated if all its captured fields are automated; the exact mechanism depends on the configuration of the particular queue. Rossum currently offers two types of whole-document automation: Confident and Always.
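To make the relationship between the two levels concrete, here is a minimal Python sketch. It assumes a hypothetical `Field` record whose `automated` flag stands in for the grey tick (this is not Rossum's API schema). It lists the fields an operator would still reach with ENTER and applies the basic whole-document rule; the mode-specific logic of Confident and Always automation is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Field:
    # Hypothetical, simplified field record; not Rossum's API schema.
    schema_id: str
    value: str
    automated: bool  # True where the UI would show a grey tick


def fields_needing_review(fields: List[Field]) -> List[str]:
    """Per-field automation: only fields without a grey tick need a human."""
    return [f.schema_id for f in fields if not f.automated]


def is_document_automated(fields: List[Field]) -> bool:
    """Whole-document automation, basic idea: every captured field is automated."""
    return all(f.automated for f in fields)


fields = [
    Field("invoice_id", "INV-001", automated=True),
    Field("date_issue", "2024-03-01", automated=True),
    Field("amount_total", "121.00", automated=False),
]
print(fields_needing_review(fields))  # ['amount_total']
print(is_document_automated(fields))  # False
```

A single unconfirmed field is enough to send the whole document to review, which is why per-field automation rates translate into much lower whole-document rates.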

Example of an automated document.

Automation Requirements

While whole-document automation (Confident or Always) is the ultimate goal, reaching it is typically a gradual, long-term process. Intermediate phases can already reduce overall effort considerably compared to manual data entry.

The main prerequisite for automation is a high level of AI accuracy. The required level of accuracy depends entirely on how the data is post-processed and used: some users opt for full automation at 75% accuracy, while others demand 99%.

If the system's out-of-the-box accuracy is sufficient, the automation rollout is fairly straightforward: automate all documents by default, and optionally set up exception checks that keep specific cases for human review.

However, even if the general AI accuracy is lower than needed, Rossum can be configured to automate only the subset of documents that reaches the required accuracy. Whether a document is accurate enough to be automated is determined from the AI output, the platform's built-in checks, historical values, and custom rules defined in your integration.
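One way to reason about automating only an accurate subset is to calibrate a threshold on documents that humans have already reviewed. The sketch below is an illustrative technique, not how Rossum computes its decision. It assumes each historical document has a single score (for example, its lowest field confidence) and a flag saying whether the extraction was fully correct, and it finds the lowest threshold whose automated subset still reaches the target accuracy.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    history: List[Tuple[float, bool]], target_accuracy: float
) -> Optional[Tuple[float, float]]:
    """Lowest score threshold whose automated subset meets target_accuracy.

    history: (document score, was the extraction fully correct?) pairs from
    documents that humans already reviewed. Returns (threshold, coverage),
    where coverage is the share of documents that would be automated, or None.
    """
    for t in sorted({score for score, _ in history}):
        # Documents that would be automated at this threshold.
        selected = [ok for score, ok in history if score >= t]
        if sum(selected) / len(selected) >= target_accuracy:
            # Coverage only shrinks as t grows, so the first hit maximizes it.
            return t, len(selected) / len(history)
    return None


history = [(0.99, True), (0.97, True), (0.95, True),
           (0.90, False), (0.85, True), (0.60, False)]
print(pick_threshold(history, 0.95))  # (0.95, 0.5)
print(pick_threshold(history, 0.50))  # (0.6, 1.0)
```

With this made-up sample, a 95% accuracy target automates half of the documents, while a 50% target automates all of them.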

Key components of automation

To decide whether a document meets your business expectations and should therefore be sent to a downstream system automatically, Rossum needs a reliable "compass." Because use cases for data extraction automation vary widely, Rossum gives you the flexibility to control automation levels and parameters. Automation currently relies on a combination of these components:

  1. Built-in validation checks
  2. Extraction confidence scores
  3. History-based data checks
  4. Custom and database validation checks

Applied in this order, these components form a framework that helps you reach the level of automation you need.

Turn the automation ON

Remember that automation must first be enabled on the queue if you want documents to be exported automatically, without human touch, once all their fields are automatically validated. Read more about configuring the automation framework.

Per-field automation (a grey tick shown next to a field when the AI considers its value safe to automate) is turned ON by default.


A grey tick is shown next to fields that can be automated.

Built-in validation checks

As the first step of the automation pipeline, Rossum automatically runs several basic checks on some of the extracted values. These checks apply to fields that Rossum can find on the document and that take part in rules with other fields on the same document. An example of such a rule is the equation "total amount == total amount base + total tax".

A built-in check can automate a field when at least one of the rules involving that field is fulfilled (positive check). Conversely, a violated rule can block the automation of the fields involved (negative check).
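To illustrate how a single rule can act as both a positive and a negative check, here is a minimal sketch of the sum rule above. The schema IDs in `RULE_FIELDS` are assumptions, and exact `Decimal` equality is a simplification: Rossum's actual built-in checks may handle rounding differently.

```python
from decimal import Decimal, InvalidOperation
from typing import Dict, Optional, Tuple

# Hypothetical schema IDs for the three fields taking part in the rule.
RULE_FIELDS = ("amount_total", "amount_total_base", "amount_total_tax")


def run_sum_rule(values: Dict[str, Optional[str]]) -> Tuple[str, Tuple[str, ...]]:
    """Check 'total amount == total amount base + total tax'.

    Returns ("positive", fields) when the rule holds, ("negative", fields) when
    it is violated, and ("skipped", ()) when a value is missing or not a number.
    """
    try:
        total, base, tax = (Decimal(values[f]) for f in RULE_FIELDS)
    except (KeyError, TypeError, InvalidOperation):
        return "skipped", ()
    # Decimal avoids binary floating-point surprises such as 0.1 + 0.2 != 0.3.
    if total == base + tax:
        return "positive", RULE_FIELDS
    return "negative", RULE_FIELDS


ok = {"amount_total": "121.00", "amount_total_base": "100.00", "amount_total_tax": "21.00"}
print(run_sum_rule(ok)[0])                                  # positive
print(run_sum_rule({**ok, "amount_total_tax": "20.00"})[0])  # negative
print(run_sum_rule({**ok, "amount_total_tax": None})[0])     # skipped
```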

Find out more about the positive and negative built-in validation checks.

Extraction confidence scores

On top of the built-in validation checks, Rossum tries to automate fields based on their confidence scores. A confidence score indicates how confident the AI Engine is that it extracted both the text and the location of the field correctly.

You can set a score threshold on the queue or per field; values whose confidence score is over the threshold are then automated. Keep in mind that confidence-based automation is attempted only for fields that were not blocked by negative built-in checks.
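The threshold logic described above can be sketched as follows. The sketch assumes scores in the 0 to 1 range, a per-field threshold that overrides the queue-level one, and a `blocked` set produced by negative built-in checks; the function, parameter names, and example values are hypothetical, and "over the threshold" is read as a strict comparison.

```python
from typing import Dict, Set


def confidence_automated(
    scores: Dict[str, float],
    queue_threshold: float,
    field_thresholds: Dict[str, float],
    blocked: Set[str],
) -> Set[str]:
    """Return schema IDs automated by confidence score alone (simplified)."""
    automated = set()
    for schema_id, score in scores.items():
        if schema_id in blocked:
            continue  # a negative built-in check already blocked this field
        # Assumption: a per-field threshold, if set, overrides the queue-level one.
        threshold = field_thresholds.get(schema_id, queue_threshold)
        if score > threshold:  # "over the threshold", read as a strict comparison
            automated.add(schema_id)
    return automated


scores = {"invoice_id": 0.99, "date_due": 0.80, "amount_total": 0.995}
print(sorted(confidence_automated(scores, 0.975, {"date_due": 0.75}, {"amount_total"})))
# ['date_due', 'invoice_id']
```

Note that `amount_total` stays unautomated despite the highest score, because the negative check takes precedence.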

Read more about the confidence scores.

History-based data checks

Besides the data captured on the document itself, Rossum can base automation decisions on previously confirmed documents. For static values such as vendor name, vendor ID, and IBAN, it makes sense to compare the values on the document with historical data and use the result as an input for automation.
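A history-based check for a static value such as IBAN could look like the following sketch. It is a hypothetical heuristic, not Rossum's algorithm: it counts how often each (vendor ID, IBAN) pair was confirmed in the past and treats the captured pair as trustworthy once it reaches `min_occurrences`. The field names and sample values are made up.

```python
from collections import Counter
from typing import Dict, Iterable


def build_history(confirmed_docs: Iterable[Dict[str, str]],
                  key_field: str, value_field: str) -> Counter:
    """Count how often each (key, value) pair appeared on confirmed documents."""
    return Counter(
        (doc[key_field], doc[value_field])
        for doc in confirmed_docs
        if doc.get(key_field) and doc.get(value_field)
    )


def history_supports(history: Counter, key: str, value: str,
                     min_occurrences: int = 2) -> bool:
    """Treat a value as trustworthy if it was confirmed for this key often enough."""
    return history[(key, value)] >= min_occurrences


confirmed = [
    {"vendor_id": "V-100", "iban": "IBAN-A"},
    {"vendor_id": "V-100", "iban": "IBAN-A"},
    {"vendor_id": "V-200", "iban": "IBAN-B"},
]
history = build_history(confirmed, "vendor_id", "iban")
print(history_supports(history, "V-100", "IBAN-A"))  # True
print(history_supports(history, "V-100", "IBAN-B"))  # False
```

Requiring more than one prior confirmation is a design choice that guards against a single mistaken confirmation being treated as ground truth.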

Read more about the history-based data checks.

Custom and database validation checks

As the final step of the automation pipeline, you can customize the automation behavior to your needs. Built-in checks are necessarily limited to a small number of generalizable rules, so to extend the automation logic you can build your own business-specific validation checks. Like built-in checks, custom checks may prevent documents with errors from being passed to the downstream system automatically (negative checks). Alternatively, custom checks may confirm that values are correct, leaving no reason to stop automation (positive checks).

Examples of such custom automation checks could be:

  • Mark the value as validated if "Total amount" on the invoice matches the sum of the "Total amount" column in the line items.
  • Mark the value as validated if "Total amount" equals the sum of "Total base" and "Total tax".
  • Stop automation if "Due date" is before "Issue date".
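The three example checks above could be written as a single function. The sketch assumes a hypothetical document dictionary with ISO-formatted dates and amounts stored as strings; it is not Rossum's annotation format or extension API. Positive checks add the field to `validated`, and the negative check adds a message to `errors`, which would stop automation.

```python
from datetime import date
from decimal import Decimal
from typing import Any, Dict, List, Set, Tuple


def custom_checks(doc: Dict[str, Any]) -> Tuple[Set[str], List[str]]:
    """Run the three example checks; returns (validated fields, blocking errors)."""
    validated: Set[str] = set()
    errors: List[str] = []
    total = Decimal(doc["amount_total"])

    # Positive check 1: header total equals the sum of line-item totals.
    line_sum = sum((Decimal(item["amount_total"]) for item in doc["line_items"]), Decimal("0"))
    if line_sum == total:
        validated.add("amount_total")

    # Positive check 2: total equals total base plus total tax.
    if Decimal(doc["amount_total_base"]) + Decimal(doc["amount_total_tax"]) == total:
        validated.add("amount_total")

    # Negative check: a due date before the issue date stops automation.
    if date.fromisoformat(doc["date_due"]) < date.fromisoformat(doc["date_issue"]):
        errors.append("Due date is before issue date")

    return validated, errors


doc = {
    "amount_total": "121.00", "amount_total_base": "100.00", "amount_total_tax": "21.00",
    "line_items": [{"amount_total": "60.50"}, {"amount_total": "60.50"}],
    "date_issue": "2024-03-01", "date_due": "2024-03-31",
}
print(custom_checks(doc))  # ({'amount_total'}, [])
late = {**doc, "date_due": "2024-02-15"}
print(custom_checks(late)[1])  # ['Due date is before issue date']
```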

The most powerful checks in this class are based on databases, such as vendor or purchase order (PO) databases, and they enable very high automation rates. They work by automatically validating captured data that matches database records, for example the address of the vendor with the given name, or the amounts and line items of the matching purchase order.
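A database-backed check can be sketched in the same spirit. The example assumes a hypothetical in-memory vendor index keyed by normalized vendor name and hypothetical schema IDs `sender_name` and `sender_address`; a production integration would query your actual vendor or PO database and likely use more robust matching.

```python
from typing import Dict, Iterable, Set


def normalize(text: str) -> str:
    """Case- and whitespace-insensitive form used for comparisons."""
    return " ".join(text.lower().split())


def build_vendor_index(records: Iterable[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Index vendor records by normalized name (hypothetical record layout)."""
    return {normalize(r["name"]): r for r in records}


def match_vendor(index: Dict[str, Dict[str, str]],
                 vendor_name: str, vendor_address: str) -> Set[str]:
    """Fields to mark as validated when the captured vendor matches a DB record."""
    record = index.get(normalize(vendor_name))
    if record is not None and normalize(record["address"]) == normalize(vendor_address):
        return {"sender_name", "sender_address"}
    return set()


index = build_vendor_index([{"name": "Acme s.r.o.", "address": "Main Street 1, Prague"}])
print(sorted(match_vendor(index, "ACME  s.r.o.", "main street 1,  Prague")))
# ['sender_address', 'sender_name']
print(match_vendor(index, "Acme s.r.o.", "Other Street 5, Brno"))  # set()
```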

Read about how to set up custom validation checks.

Automation of fields not found on document

Rossum does not yet return a confidence score expressing that a value is absent from a document. However, with Confident automation set up, fields that are not required and for which no value is found on the document will be automated.

You can prevent the automation of such fields by marking them as required. If this feature is critical for your workflow, you can vote for it on our feature portal.


Automated fields without any value.

Making the automation components work together

Combining the results of the automation components listed above is the key to success. A good practice is to:

  1. Decide which fields should always have some value filled. Such fields should be set as required.
  2. Fine-tune the automation components to your needs.
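To show how the components can be combined per field and then per document, here is a minimal sketch under stated assumptions. The `FieldSignals` record and the precedence order (negative checks first, then the empty-field behavior of Confident automation, then positive signals, then the confidence score) are illustrative choices, not Rossum's documented evaluation order.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FieldSignals:
    # Hypothetical per-field outputs of the four automation components.
    value: Optional[str]
    required: bool
    confidence: float
    builtin: Optional[str] = None  # "positive", "negative" or None
    history_ok: bool = False
    custom: Optional[str] = None   # "positive", "negative" or None


def field_automated(s: FieldSignals, threshold: float) -> bool:
    if s.builtin == "negative" or s.custom == "negative":
        return False                 # any negative check blocks automation
    if not s.value:
        return not s.required        # empty fields are automated unless required
    if s.builtin == "positive" or s.custom == "positive" or s.history_ok:
        return True                  # an explicit positive signal is enough
    return s.confidence > threshold  # otherwise fall back to the confidence score


def document_automated(fields: Dict[str, FieldSignals], threshold: float) -> bool:
    return all(field_automated(s, threshold) for s in fields.values())


doc = {
    "invoice_id": FieldSignals("INV-001", required=True, confidence=0.99),
    "iban": FieldSignals("IBAN-A", required=False, confidence=0.60, history_ok=True),
    "order_id": FieldSignals(None, required=False, confidence=0.0),
}
print(document_automated(doc, 0.975))  # True
doc["order_id"].required = True        # step 1 above: mark the field as required
print(document_automated(doc, 0.975))  # False
```

Marking `order_id` as required is enough to route the document to review when that value is missing, which is the effect the first practice above aims for.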

You may also be unsure how the automation workflow behaves in specific setups (hidden fields, required fields, etc.). For such cases, we have compiled a list of the most common scenarios.