Rossum Developer Hub

Rossum Data Capture for Developers and Integrators

Welcome to the Rossum developer hub. You'll find comprehensive guides and documentation to help you implement Rossum as quickly as possible, as well as support if you get stuck.

Let's jump right in!


Data Capture Automation with Rossum

Data Capture Automation means that document import, processing, validation, and export happen seamlessly without waiting for human review in most cases.

This represents the most advanced stage of a Cognitive Data Capture process implementation. Consider this: you import a document to Rossum. Right away, Rossum's AI processes the document. Once done, Rossum decides whether the extracted data is good enough to send automatically to the downstream system, or whether a human needs to review and confirm the results.

To make this happen, Rossum offers automation on two levels:

  • Per-field automation allows operators to validate only a subset of fields; fields that were confirmed automatically need not be revisited by a human. The presence or absence of a grey tick next to a field shows whether it was validated automatically. While the TAB key moves field by field, the ENTER key jumps only between fields that still require review. A basic level of per-field automation is active by default, and additional configuration can expand on it.
  • Whole-document automation allows operators to entirely skip the manual validation phase for an entire document. The basic idea is that a document is automated if all its captured fields are automated. The detailed mechanism depends on the configuration of a particular queue.

Example of automated document.
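
The basic rule for whole-document automation described above can be sketched in a few lines. This is a minimal illustration only: `CapturedField` and `document_is_automated` are hypothetical names, and the real decision also depends on the queue configuration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CapturedField:
    schema_id: str   # hypothetical identifier of the captured field
    automated: bool  # True if the field was validated automatically


def document_is_automated(fields: Iterable[CapturedField]) -> bool:
    """Basic idea only: a document is automated if all its fields are."""
    # Note: all() returns True for an empty sequence, so a real
    # implementation would need to decide how to treat empty documents.
    return all(field.automated for field in fields)


invoice = [
    CapturedField("invoice_id", automated=True),
    CapturedField("amount_total", automated=False),
]
print(document_is_automated(invoice))  # False: one field still needs review
```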

Automation Requirements

While whole-document automation is ultimately desirable, achieving it is typically a gradual long-term process, and intermediate phases can already bring a lot of value in terms of overall effort reduction when compared to manual data entry.

The main prerequisite for automation is a high level of AI accuracy. The required level of accuracy depends entirely on how the data is post-processed and used: some users opt for full automation at 75% accuracy, while others demand 99%.

If the out-of-the-box accuracy of the system is sufficient, automation rollout is fairly straightforward: automate all documents by default, and optionally set up exception checks to keep specific cases for human review.

However, even if the general AI accuracy is lower than needed, Rossum can be configured to automate a subset of documents that reaches a sufficiently high level of accuracy. Whether a document is accurate enough to automate is determined from the AI output, the platform's built-in checks, historical values and custom rules defined in your integration.

Key components of Automation

To decide if a document would meet the business expectations and, therefore, should be automatically sent to a downstream system, Rossum needs a strong "compass". We consider various use cases and perspectives on data extraction automation, so it's important for us to give you the flexibility to control automation levels and parameters. Currently, we work with a combination of these tools for automation:

  1. Built-in validation checks
  2. Extraction confidence scores
  3. History based data checks
  4. Custom and database validation checks

Applied in this sequence, the automation components form a framework that gets you on track toward the level of automation you want.

Turn the automation ON

Keep in mind that automation must first be enabled on the queue of your choice if you want documents to be exported automatically, without human touch, once all their fields have been automatically validated. Read more about configuring the automation framework.
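
As a rough illustration, the queue settings involved could be assembled like this. The attribute names `automation_enabled`, `automation_level` and `default_score_threshold`, and the level values, are assumptions about the queue object, so check them against the API Reference before use. The helper `automation_patch` is hypothetical and only builds a request body; it does not call the API.

```python
import json


def automation_patch(level: str = "confident", default_threshold: float = 0.8) -> dict:
    """Build the body of a hypothetical PATCH request to a queue."""
    allowed_levels = {"never", "confident", "always"}  # assumed values
    if level not in allowed_levels:
        raise ValueError(f"unknown automation level: {level!r}")
    if not 0.0 <= default_threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return {
        "automation_enabled": level != "never",       # assumed attribute name
        "automation_level": level,                    # assumed attribute name
        "default_score_threshold": default_threshold, # assumed attribute name
    }


print(json.dumps(automation_patch(), indent=2))
```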

Per-field automation, in which a grey tick is shown next to a field when the AI considers its value safe to automate, is turned ON by default.

Grey tick is shown next to fields that can be automated.

Built-in validation checks

As the first step of the automation pipeline, Rossum automatically runs a number of basic checks on some of the extracted values. These checks apply to fields that are found on the document and are related by rules to other fields on the document. An example of such a rule is the equation "total amount == total amount base + total tax".

The built-in checks can automate a field when at least one of the rules for that field is fulfilled (positive checks). In other cases, they can block automation of the field (negative checks).
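
To make the example rule concrete, here is a minimal sketch of the totals equation. It is not Rossum's actual implementation: the function name and the optional rounding tolerance are assumptions made for illustration.

```python
from decimal import Decimal


def totals_are_consistent(
    total: Decimal,
    base: Decimal,
    tax: Decimal,
    tolerance: Decimal = Decimal("0.00"),  # hypothetical rounding allowance
) -> bool:
    """Check the rule: total amount == total amount base + total tax."""
    return abs(total - (base + tax)) <= tolerance


# Decimal avoids binary floating-point rounding errors on money amounts.
print(totals_are_consistent(Decimal("121.00"), Decimal("100.00"), Decimal("21.00")))  # True
print(totals_are_consistent(Decimal("120.00"), Decimal("100.00"), Decimal("21.00")))  # False
```

When the rule holds, the three fields involved corroborate each other, which is what makes it usable as a positive check.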

Find out more about the positive and negative built-in validation checks.

Extraction confidence scores

On top of the built-in validation checks, Rossum will try to automate fields based on the confidence score. That score indicates how confident the AI Engine is that it got both the text and the location of the field right.

The confidence score is compared with the score threshold set on the queue or per field: values whose confidence exceeds the user-defined threshold are automated. Keep in mind that confidence-based automation is attempted only for fields that were not blocked by the negative built-in checks.
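
The threshold logic can be sketched as follows. The function and parameter names are hypothetical, and two details are assumptions: that a per-field threshold takes precedence over the queue threshold, and that a score exactly equal to the threshold passes.

```python
from typing import Optional


def passes_confidence(
    score: float,
    queue_threshold: float,
    field_threshold: Optional[float] = None,
    blocked_by_negative_check: bool = False,
) -> bool:
    """Decide whether a field value can be automated by its confidence."""
    # Confidence is only considered for fields that no negative
    # built-in check has blocked.
    if blocked_by_negative_check:
        return False
    # Assumption: a per-field threshold overrides the queue default.
    threshold = field_threshold if field_threshold is not None else queue_threshold
    # Assumption: the comparison is inclusive at the boundary.
    return score >= threshold


print(passes_confidence(0.93, queue_threshold=0.8))                        # True
print(passes_confidence(0.93, queue_threshold=0.8, field_threshold=0.95))  # False
```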

Read more about the confidence scores.

History based data checks

Besides deciding on automation based on the data captured from the document, Rossum has access to historically confirmed documents. For static values, such as vendor name, vendor ID and IBAN, it makes sense to compare the values on the document with the historical data and use the result as an input for automation.
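
One way to picture a history-based check is a lookup of the captured value among values that humans confirmed earlier. This sketch is only an illustration of the idea: `matches_history`, the `min_occurrences` parameter and the sample values are all hypothetical, and the platform's actual matching rules may differ.

```python
from collections import Counter
from typing import Iterable


def matches_history(
    value: str,
    confirmed_values: Iterable[str],
    min_occurrences: int = 2,  # hypothetical evidence requirement
) -> bool:
    """True if the value was confirmed often enough in past documents."""
    return Counter(confirmed_values)[value] >= min_occurrences


# Placeholder IBAN values confirmed on earlier documents of one vendor.
history = ["IBAN-A", "IBAN-A", "IBAN-B", "IBAN-A"]
print(matches_history("IBAN-A", history))  # True: seen three times
print(matches_history("IBAN-B", history))  # False: seen only once
```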

Read more about the history based data checks.

Custom and database validation checks

As a final step of the automation pipeline, you can customize the automation behavior to your needs. Our built-in checks are naturally limited to a small number of generalizable rules. You can build your own business-specific validation checks to augment the automation logic. Like built-in checks, custom checks may prevent documents with errors from being automatically passed to the downstream system (negative checks). Alternatively, custom checks may verify that values are certain to be correct, so that there is no reason to stop automation (positive checks).

Examples of such custom automation checks could be:

  • Mark value as validated if the “Total amount” on the invoice matches the sum of the “Total amount” column in line items
  • Mark value as validated if “Total amount” equals the sum of “Total base” and “Total tax”
  • Stop Automation if “Due date” is before “Issue date”
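
The three example checks above could be expressed as plain functions like the ones below. This is a sketch of the logic only: the function names are hypothetical, and wiring the results into Rossum (marking a value as validated or stopping automation) is left out.

```python
from datetime import date
from decimal import Decimal
from typing import Iterable


def total_matches_line_items(total: Decimal, line_totals: Iterable[Decimal]) -> bool:
    """Positive check: invoice total equals the sum of line item totals."""
    return sum(line_totals, Decimal("0")) == total


def total_matches_base_plus_tax(total: Decimal, base: Decimal, tax: Decimal) -> bool:
    """Positive check: total amount equals total base plus total tax."""
    return total == base + tax


def due_date_blocks_automation(issue_date: date, due_date: date) -> bool:
    """Negative check: stop automation if the due date precedes the issue date."""
    return due_date < issue_date


print(total_matches_line_items(Decimal("30.00"), [Decimal("10.00"), Decimal("20.00")]))  # True
print(due_date_blocks_automation(date(2024, 1, 31), date(2024, 1, 10)))                  # True
```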

The most powerful checks in this class are based on databases, such as a vendor database or a purchase order (PO) database. These checks enable very high automation rates by automatically validating captured data if it matches database objects, such as the vendor address for the vendor of the given name, or the amounts and line items of the matching purchase order.
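
A database-backed check can be sketched with an in-memory dictionary standing in for the vendor database. Everything here is hypothetical: the `VENDORS` sample data, the function names and the simple whitespace and case normalization. A real integration would query your own master data.

```python
from typing import Dict

# Hypothetical stand-in for a vendor master database, keyed by vendor name.
VENDORS: Dict[str, Dict[str, str]] = {
    "Acme Ltd": {"vendor_id": "V-001", "address": "1 Main St, Springfield"},
}


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def vendor_address_matches(name: str, captured_address: str) -> bool:
    """Positive check: captured address equals the one stored for the vendor."""
    record = VENDORS.get(name)
    if record is None:
        return False  # unknown vendor: nothing to validate against
    return normalize(record["address"]) == normalize(captured_address)


print(vendor_address_matches("Acme Ltd", "1 Main St,  SPRINGFIELD"))  # True
print(vendor_address_matches("Acme Ltd", "2 Side St, Springfield"))   # False
```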

Read about how to set up custom validation checks.

Making the automation components work together

Combining the results of the automation components listed above is the key to success. A good practice is to:

  1. Decide which fields should always have some value filled. Such fields should be set as required.
  2. Fine-tune the automation components to your needs.
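
To show how the components might interact for a single field, here is a simplified sketch. It assumes that a negative check always blocks automation and that any one positive signal is otherwise enough; the actual precedence rules are configurable and may differ. `FieldSignals` and `field_is_automated` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FieldSignals:
    blocked_by_negative_check: bool = False  # built-in or custom negative check
    positive_check_passed: bool = False      # built-in or custom positive check
    confidence_ok: bool = False              # score reached its threshold
    history_match: bool = False              # value matches confirmed history


def field_is_automated(signals: FieldSignals) -> bool:
    """Combine the signals for one field (assumed precedence, see lead-in)."""
    if signals.blocked_by_negative_check:
        return False
    return (
        signals.positive_check_passed
        or signals.confidence_ok
        or signals.history_match
    )


print(field_is_automated(FieldSignals(confidence_ok=True)))  # True
print(field_is_automated(FieldSignals(confidence_ok=True, blocked_by_negative_check=True)))  # False
```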

Moreover, you may be unsure how the automation workflow behaves in some specific setups (hidden fields, required fields, etc.). For that reason, we wrote down a list of the most common scenarios you might be asking about.

Updated 4 months ago
