Implementing Custom Rules for Automation in Rossum

To achieve high-quality automation, implement custom checks that take advantage of your particular setup. Typical cases include:

  • matching an extracted value against expected formats and values (e.g., a PO number matching a particular regular expression, or a date falling within a certain range)
  • comparing an extracted value to master data (e.g., a list of products)
  • checking an extracted value mathematically against other values (e.g., amounts, or dates vs. terms)
  • automatically computing and setting values for some fields
  • showing a message to operators for fields with low confidence scores
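
As an illustration of the first kind of check, here is a minimal sketch of a format check and a range check. The PO number pattern, the ISO date format, and the function names are assumptions made for this example, not part of Rossum; adapt them to your own data.

```python
import re
from datetime import date

# Hypothetical rule: a PO number is "PO-" followed by exactly six digits.
PO_NUMBER_PATTERN = re.compile(r"PO-\d{6}")


def is_valid_po_number(value: str) -> bool:
    """Return True if the whole value matches the expected PO number format."""
    return PO_NUMBER_PATTERN.fullmatch(value.strip()) is not None


def is_date_in_range(value: str, earliest: date, latest: date) -> bool:
    """Return True if value is an ISO date (YYYY-MM-DD) within the given range."""
    try:
        parsed = date.fromisoformat(value)
    except ValueError:
        # Not a parseable ISO date, so the check fails.
        return False
    return earliest <= parsed <= latest
```

Checks like these return a plain boolean, which keeps the rule itself separate from whatever the hook does with the result.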

Hook extension is what you need

Rossum can send the data captured from a document, via the Rossum API, to a webhook or a serverless function as soon as Rossum's AI Engine finishes the automatic data capture.

At that moment, the default components of the Rossum platform have already attempted to mark fields as automatically validated (based either on AI confidence scores or on built-in checks). The hook can then mark additional fields as validated based on your custom logic. Crucially, this happens before the Rossum platform determines whether the document will be confirmed automatically or left for human review.

Checking values after Rossum's AI Engine extraction

When you upload a document to Rossum, it is switched to the "importing" status. Rossum's AI Engine starts processing the document and tries to locate and extract all the necessary fields. Once the automatic extraction is done, the document is switched to the "to_review" status. If you have a hook in place, the data is first sent to the hook when the document switches from the "importing" to the "to_review" status.

The extracted data is sent to your endpoint, which listens for the "initialize" action. When the hook call returns, Rossum decides whether the document can be automated or whether it will be switched to the "to_review" status by checking the status of individual fields:

  • In the always automation level, each field must have a valid value set (non-empty if required).
  • In the confident automation level, each field must have at least one validation source.

An abbreviated example of the payload sent to the hook:

{
  "request_id": "ae7bc8dd-73bd-489b-a3d2-f5214b209591",
  "timestamp": "2020-01-01T00:00:00.000000Z",
  "hook": "https://example.rossum.app/api/v1/hooks/781",
  "action": "initialize",
  "event": "annotation_content",
  "annotation": {
    "document": "https://example.rossum.app/api/v1/documents/314621",
    "id": 314521,
    "queue": "https://example.rossum.app/api/v1/queues/8236",
    "schema": "https://example.rossum.app/api/v1/schemas/223",
    "pages": [
      "https://example.rossum.app/api/v1/pages/551518"
    ],
    "modifier": null,
    "modified_at": null,
    "confirmed_at": null,
    "exported_at": null,
    "assigned_at": null,
    "status": "to_review",
    "previous_status": "importing",
    "rir_poll_id": "54f6b91cfb751289e71ddf12",
    "messages": null,
    "url": "https://example.rossum.app/api/v1/annotations/314521",
    "content": [
        {
            "id": 1123123,
            "url": "https://example.rossum.app/api/v1/annotations/314521/content/1123123",
            "schema_id": "basic_info",
            "category": "section",
            "children": [
              {
                "id": 20456864,
                "url": "https://example.rossum.app/api/v1/annotations/1/content/20456864",
                "content": {
                  "value": "1000.0",
                  "normalized_value": "1000.0",
                  "page": 2
                },
                "schema_id": "amount_total"

In the hook, you might implement logic that takes the "amount_total" value ("1000.0") and compares it to the sum of the "amount_total_base" and "amount_total_tax" fields.
If the values match, you can fill in "validation_sources", which is taken into account when deciding whether the document can be automated.
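
The comparison described above could look roughly like the sketch below. It assumes that the payload's "content" is a tree in which containers carry a "children" list and leaf datapoints carry "schema_id" and "content", as in the example payload; the helper names and the tolerance are hypothetical.

```python
from decimal import Decimal, InvalidOperation


def iter_datapoints(nodes):
    """Yield leaf datapoints from the nested annotation content tree."""
    for node in nodes:
        children = node.get("children")
        if isinstance(children, list):
            # Containers (such as sections) are walked recursively.
            yield from iter_datapoints(children)
        else:
            yield node


def get_value(content, schema_id):
    """Return the value of the first datapoint with this schema_id, or None."""
    for datapoint in iter_datapoints(content):
        if datapoint.get("schema_id") == schema_id:
            return (datapoint.get("content") or {}).get("value")
    return None


def totals_match(content, tolerance=Decimal("0.01")):
    """Check that amount_total equals amount_total_base + amount_total_tax."""
    try:
        total = Decimal(get_value(content, "amount_total"))
        base = Decimal(get_value(content, "amount_total_base"))
        tax = Decimal(get_value(content, "amount_total_tax"))
    except (InvalidOperation, TypeError):
        # A missing or non-numeric value means the rule cannot pass.
        return False
    return abs(total - (base + tax)) <= tolerance
```

Decimal is used instead of float so that amounts such as "0.1" and "0.2" add up exactly. Values with thousands separators or currency symbols would need to be cleaned first.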

The hook could return the following data as part of its response:

"operations": [
    {
      "op": "replace",
      "id": "197467",
      "value": {
        "content": {
          "value": "1000.0",
          "position": [916, 168, 1190, 222],
          "rir_position": [916, 168, 1190, 222],
          "rir_confidence": 0.80,
          "page": 1
        },
        "hidden": false,
        "validation_sources": ["connector"]
      }
    }
]

The response sets "validation_sources" to "connector", which means that the connector was responsible for validating the correctness of the value.
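
A small helper can build such an operation from a datapoint received in the payload. This is a sketch only: the function name is hypothetical, and it assumes that the datapoint carries "id", "content", and possibly an existing "validation_sources" list that should be preserved.

```python
def build_validation_operation(datapoint, source="connector"):
    """Build a "replace" operation that adds a validation source to a datapoint."""
    # Copy so that the incoming payload is not modified.
    content = dict(datapoint.get("content") or {})
    sources = list(datapoint.get("validation_sources") or [])
    if source not in sources:
        sources.append(source)
    return {
        "op": "replace",
        "id": datapoint["id"],
        "value": {
            "content": content,
            "validation_sources": sources,
        },
    }
```

For example, a datapoint already validated by another source keeps that source, and "connector" is appended after it.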

Additionally, you can show the "data_matching" validation source to users if your extension validated the data against a master data database.

[Figure: Master data match validation source.]
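
A master data check might be sketched as follows. The set of known product codes is a stand-in invented for this example (in practice it would be loaded from your own database or file), and the function name is hypothetical.

```python
# Hypothetical master data; replace with a lookup in your own system.
KNOWN_PRODUCT_CODES = {"P-1001", "P-1002", "P-2040"}


def match_against_master_data(datapoint, known_values):
    """Return a "replace" operation if the value is found in master data, else None."""
    content = dict(datapoint.get("content") or {})
    value = (content.get("value") or "").strip()
    if value not in known_values:
        # No match: leave the field for other checks or for human review.
        return None
    return {
        "op": "replace",
        "id": datapoint["id"],
        "value": {
            "content": content,
            "validation_sources": ["data_matching"],
        },
    }
```

Returning None for a non-match lets the caller decide whether to attach a message or simply skip the field.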

If all fields are automated, the whole document may be automated. But even if this is not achieved, validating fields automatically means less work for the human reviewers, who can use the check marks in the user interface and the ENTER key to focus only on the fields that require manual validation.

Combining custom rules with other automation components

Learn how the different automation components interact to automate documents, and how automation works with hidden and required fields.

Setting values after Rossum's AI Engine extraction

If you want to replace an extracted value with one computed on your side, or fill an empty field with a new value, change the "value" to, say, "1100.0" and set the field's "position" to null, since the value is no longer extracted from the original document.

The hook could return the following data as part of its response:

"operations": [
    {
      "op": "replace",
      "id": "197467",
      "value": {
        "content": {
          "value": "1100.0",
          "position": null,
          "rir_position": [916, 168, 1190, 222],
          "rir_confidence": 0.80,
          "page": 1
        },
        "hidden": false,
        "validation_sources": ["connector"]
      }
    }
]
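
The same operation can be assembled in code. The sketch below uses a hypothetical helper name and assumes the datapoint's "content" dictionary has the shape shown above; it overwrites "value", clears "position", and leaves the remaining keys (such as "rir_position") untouched.

```python
def build_set_value_operation(datapoint, new_value, source="connector"):
    """Build a "replace" operation that sets a computed value on a datapoint."""
    # Copy so that the incoming payload is not modified.
    content = dict(datapoint.get("content") or {})
    content["value"] = new_value
    # The value no longer comes from the document, so it has no position.
    content["position"] = None
    return {
        "op": "replace",
        "id": datapoint["id"],
        "value": {
            "content": content,
            "validation_sources": [source],
        },
    }
```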

Showing messages to the operator

The last case is when you only want to show the user a message in the UI asking them to double-check a specific field. For this, use the "messages" part of the hook response. You can choose between the "info", "warning", and "error" message types.

Note that each validate call must return the complete list of messages, as this list is not persistent but is overwritten each time the hook is called.

"messages": [
    {
      "content": "Invalid invoice number format",
      "id": "197467",
      "type": "error"
    }
]
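
Messages for low-confidence fields could be generated along these lines. This is a sketch under assumptions: the threshold is an arbitrary example value, the function name is hypothetical, and the input is a flat list of datapoints whose "content" includes "rir_confidence" as in the operation examples above.

```python
# Hypothetical threshold; tune it to your own data.
CONFIDENCE_THRESHOLD = 0.9


def low_confidence_messages(datapoints, threshold=CONFIDENCE_THRESHOLD):
    """Return a warning message for every datapoint below the confidence threshold."""
    messages = []
    for datapoint in datapoints:
        confidence = (datapoint.get("content") or {}).get("rir_confidence")
        # Fields without a confidence score are skipped.
        if confidence is not None and confidence < threshold:
            messages.append({
                "id": datapoint["id"],
                "type": "warning",
                "content": f"Low confidence ({confidence:.2f}), please double-check this value.",
            })
    return messages
```

Because the message list is overwritten on every call, the function rebuilds the full list each time rather than tracking what was sent before.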

Ideally, combine messages with the automation check. If a rule succeeds, mark the field as validated. If it fails, attach a warning to the data point to draw the user's attention to the field.
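
Put together, a single rule could produce either an operation or a message. The following is a minimal sketch, assuming a response with top-level "operations" and "messages" lists as in the fragments above; the invoice number pattern and the function name are made up for this example.

```python
import re

# Hypothetical format: two uppercase letters followed by six digits.
INVOICE_NUMBER_PATTERN = re.compile(r"[A-Z]{2}\d{6}")


def check_invoice_number(datapoint):
    """Validate one datapoint; return the hook response parts it contributes."""
    content = dict(datapoint.get("content") or {})
    value = content.get("value") or ""
    if INVOICE_NUMBER_PATTERN.fullmatch(value):
        # Rule passed: mark the field as validated by the extension.
        operation = {
            "op": "replace",
            "id": datapoint["id"],
            "value": {"content": content, "validation_sources": ["connector"]},
        }
        return {"operations": [operation], "messages": []}
    # Rule failed: leave the value alone and warn the operator.
    message = {
        "id": datapoint["id"],
        "type": "warning",
        "content": "Invalid invoice number format",
    }
    return {"operations": [], "messages": [message]}
```

With several rules, the "operations" and "messages" lists from each check would be concatenated into one response.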