Rossum Developer Hub

Rossum Data Capture for Developers and Integrators

How To Change Schema During Extraction

Typical Use Case

Within a single queue, different documents need different data captured. You want to show and export a different set of fields for each document, based on some dynamic condition.

Your first thought might be to send a PATCH request to the schema from your webhook.
However, that would be wrong: the UI wouldn't change, and the extracted data wouldn't reflect the schema change. Furthermore, every document in the queue would be affected by the change.

How To Change Extraction Schema Shape on the Fly

While it's not possible to dynamically add or remove fields from a schema, each field has the attributes hidden and can_export, which can be set through the standard endpoint of your webhook.

The idea is to add all possible fields to the schema and turn those datapoints on and off with hidden and can_export, based on your business logic.

Let's go through the steps together, following an example use-case of products in line items.

📘

Example use-case: Products in Line Items

You have 2 shops. In the first shop, you sell balloons. In the second shop, you sell baboons. You're using Rossum to process Purchase Orders from both shops. All the header fields and most of the columns in line items are the same for both shops.

You have 1 custom header field, stating whether the document is from Baboon shop or Balloon shop.

For all POs from the balloon shop a line item should include the following columns:
Description, Quantity, Material

For all POs from the baboon shop a line item should include the following columns:
Description, Quantity, Sex

Let's use the hidden and can_export flags to avoid having to fill in the sex of a balloon or the material of a baboon.

Step 1: Create a webhook

Follow this guide on How To Set Up An Extension

Step 2: Create a condition inside the /validate method

This condition is a part of your business logic. It can be literally anything. For reference on the find_by_schema_id function look here.

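from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route('/change_schema_shape', methods=['POST'])
def change_schema_shape():
    annotation_tree = request.json['annotation']['content']
    operations = []

    # The custom header field tells us which shop the document came from.
    shop_type = find_by_schema_id(annotation_tree, 'shop_type')['value']

    is_balloon_shop = shop_type == 'balloon shop'
    is_baboon_shop = shop_type == 'baboon shop'

    # Steps 3 and 4 fill in the operations based on these two flags.
    return jsonify({'operations': operations})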
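The find_by_schema_id helper is only referenced above, so here is a minimal sketch of what it might look like. It assumes the annotation content is a list of nested dictionaries in which every node carries a schema_id and container nodes keep their descendants under children. The sample tree, its ids, and the section name are invented for illustration, and the value key simply mirrors how the snippet above reads the field.

```python
def find_by_schema_id(nodes, schema_id):
    """Return the first node with the given schema_id (depth-first), or None."""
    for node in nodes:
        if node.get("schema_id") == schema_id:
            return node
        # Descend into container nodes; leaf nodes have no children.
        found = find_by_schema_id(node.get("children", []), schema_id)
        if found is not None:
            return found
    return None


# Hypothetical, heavily trimmed annotation content for illustration only.
sample_tree = [
    {
        "id": 1,
        "schema_id": "basic_info_section",
        "children": [
            {"id": 2, "schema_id": "shop_type", "value": "balloon shop"},
        ],
    },
]

shop_type = find_by_schema_id(sample_tree, "shop_type")["value"]
print(shop_type)  # balloon shop
```

Returning None for a missing field keeps the helper simple, but it means a caller that indexes the result directly will raise a TypeError if the schema id is absent, so a real webhook may want an explicit check.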

Step 3: Get shop-specific fields

Here we create a simple helper function that has the appropriate schema ids hard-coded. These are the fields whose hidden and can_export attributes will be toggled in the next step.

When hiding an entire column, as we are here, we need to set the attributes for each field (in each row) with the given schema id. That's why we use find_all_by_schema_id, which does the same as find_by_schema_id except that it returns all occurrences of that field.

Note: We presume that schema_id of Material column is line_item_material and for Sex column it's line_item_sex.

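def get_shop_specific_fields(annotation_tree):
    # One datapoint per table row is returned for each column.
    sex_fields = find_all_by_schema_id(annotation_tree, 'line_item_sex')
    material_fields = find_all_by_schema_id(annotation_tree, 'line_item_material')
    # Order matters: balloon (material) fields first, baboon (sex) fields second.
    return material_fields, sex_fields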
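To make the "each field in each row" point concrete, here is a rough sketch of how find_all_by_schema_id could work on a table with two rows. As before, the nested children layout is an assumption, and the line_items and line_item container names and all ids are hypothetical.

```python
def find_all_by_schema_id(nodes, schema_id):
    """Collect every node with the given schema_id, in document order."""
    matches = []
    for node in nodes:
        if node.get("schema_id") == schema_id:
            matches.append(node)
        # Keep searching below this node so that every table row is visited.
        matches.extend(find_all_by_schema_id(node.get("children", []), schema_id))
    return matches


# Hypothetical line-items table with two rows; ids are invented.
line_items = [
    {
        "id": 10,
        "schema_id": "line_items",
        "children": [
            {"id": 11, "schema_id": "line_item", "children": [
                {"id": 12, "schema_id": "line_item_material", "value": "latex"},
                {"id": 13, "schema_id": "line_item_sex", "value": ""},
            ]},
            {"id": 14, "schema_id": "line_item", "children": [
                {"id": 15, "schema_id": "line_item_material", "value": "foil"},
                {"id": 16, "schema_id": "line_item_sex", "value": ""},
            ]},
        ],
    },
]

material_ids = [f["id"] for f in find_all_by_schema_id(line_items, "line_item_material")]
print(material_ids)  # [12, 15]
```

A column with N rows therefore yields N datapoints, and each of them needs its own operation in the next step.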

Step 4: Hide/Show shop specific fields

To separate our beautiful baboon/balloon logic from the rest of the webhook, we put the code inside of the toggle_shop_specific_fields function. Here we collect the shop specific fields using get_shop_specific_fields function and add attributes based on our condition.

from flask import Flask, jsonify, request

# find_by_schema_id is the helper referenced in Step 2; it is assumed here
# to live in helpers.py next to get_shop_specific_fields.
from helpers import find_by_schema_id, get_shop_specific_fields

app = Flask(__name__)


def toggle_shop_specific_fields(annotation_tree):
    operations = []
    shop_type = find_by_schema_id(annotation_tree, 'shop_type')['value']

    is_balloon_shop = shop_type == 'balloon shop'
    is_baboon_shop = shop_type == 'baboon shop'

    balloon_fields, baboon_fields = get_shop_specific_fields(annotation_tree)

    # Material column: shown and exported only for the balloon shop.
    operations.extend([
        {
            'op': 'replace',
            'id': field['id'],
            'hidden': not is_balloon_shop,
            'can_export': is_balloon_shop,
        }
        for field in balloon_fields
    ])

    # Sex column: shown and exported only for the baboon shop.
    operations.extend([
        {
            'op': 'replace',
            'id': field['id'],
            'hidden': not is_baboon_shop,
            'can_export': is_baboon_shop,
        }
        for field in baboon_fields
    ])

    return operations


@app.route('/change_schema_shape', methods=['POST'])
def change_schema_shape():
    annotation_tree = request.json['annotation']['content']
    operations = toggle_shop_specific_fields(annotation_tree)

    return jsonify({'operations': operations})
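To check the toggling logic without running a web server, the two list comprehensions can be factored into a small pure function. The sketch below is one possible refactoring rather than part of the guide's code: build_toggle_operations and the sample ids are hypothetical, and the operation shape is copied from the snippet above.

```python
def build_toggle_operations(fields, visible):
    """Build one 'replace' operation per field, showing or hiding it.

    A visible field is also exported; a hidden field is neither shown
    nor exported.
    """
    return [
        {
            "op": "replace",
            "id": field["id"],
            "hidden": not visible,
            "can_export": visible,
        }
        for field in fields
    ]


# Hypothetical datapoints from a two-row table (ids are invented).
material_fields = [{"id": 12}, {"id": 15}]
sex_fields = [{"id": 13}, {"id": 16}]

shop_type = "baboon shop"
operations = (
    build_toggle_operations(material_fields, visible=shop_type == "balloon shop")
    + build_toggle_operations(sex_fields, visible=shop_type == "baboon shop")
)

for operation in operations:
    print(operation)
```

For a baboon shop document, the two material datapoints come out hidden and not exported, while the two sex datapoints come out visible and exported. Deriving both attributes from a single visible flag also makes it harder to mix up the shop flags between the two columns.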

Step 5 (bonus): Optimise

Right now, however, our logic runs after every single update the user makes. That's unnecessary!

We want to change the shape of the table only:

  • when /validate is called for the first time
  • when the shop_type field changes

@app.route('/change_schema_shape', methods=['POST'])
def change_schema_shape():
    annotation_tree = request.json['annotation']['content']
    # Default to an empty list in case the key is missing from the payload.
    updated_datapoint_ids = request.json.get('updated_datapoint_ids', [])
    operations = []

    shop_type_id = find_by_schema_id(annotation_tree, 'shop_type')['id']

    shop_type_changed = shop_type_id in updated_datapoint_ids
    is_first_call = request.json['action'] == 'initialize'

    # Recompute the table shape only when it can actually have changed.
    if is_first_call or shop_type_changed:
        operations.extend(toggle_shop_specific_fields(annotation_tree))

    return jsonify({'operations': operations})
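The two trigger conditions can also be expressed as a small predicate, which makes them easy to try against sample payloads. This is only a sketch: should_toggle is a hypothetical helper, the payloads are trimmed to the two keys used above, and "user_update" is an assumed name for a non-initial action.

```python
def should_toggle(payload, shop_type_id):
    """Return True on the first call or when the shop_type datapoint changed."""
    is_first_call = payload.get("action") == "initialize"
    shop_type_changed = shop_type_id in payload.get("updated_datapoint_ids", [])
    return is_first_call or shop_type_changed


SHOP_TYPE_ID = 2  # hypothetical datapoint id of the shop_type field

# Hypothetical, trimmed payloads; "user_update" is an assumed action name.
first_call = {"action": "initialize", "updated_datapoint_ids": []}
shop_changed = {"action": "user_update", "updated_datapoint_ids": [2]}
other_change = {"action": "user_update", "updated_datapoint_ids": [12]}

print(should_toggle(first_call, SHOP_TYPE_ID))    # True
print(should_toggle(shop_changed, SHOP_TYPE_ID))  # True
print(should_toggle(other_change, SHOP_TYPE_ID))  # False
```

Edits to any other datapoint, such as a line item description, fall into the last case and return an empty list of operations.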

Now, we can be sure our table has an appropriate shape every time!

In the example, we were hiding an entire column of a table, which is the more complex case. The same hidden and can_export attributes can be set on any field, including a single header field.

Be sure to look at our docs regarding field attributes.

Updated about a month ago


What's Next

Do you want to learn about some more advanced use cases for your webhook?

Vendor Matching In Master Database Using API
