How To Change Schema During Extraction

Typical Use Case

In a single Queue, you need to have some variation in what data to capture from different documents. You would like to show and export a different set of fields in each document based on some dynamic condition.

Your first thought could lead you to a PATCH request of the Schema from your webhook.
However, that would be wrong: the UI wouldn't change and the extracted data wouldn't reflect the schema change. Furthermore, all documents in the queue would be affected by this change.

How To Change Extraction Schema Shape on the Fly

While it's not possible to dynamically add or remove fields from a schema, each field has a hidden attribute, which can be set through the standard endpoint of your webhook.

The idea is to add all possible fields to the schema and turn those datapoints on and off with hidden, based on your business logic.

Let's go through the steps together, following an example use-case of products in line items.

📘

Example use-case: Products in Line Items

You run a shop that sells balloons, and you use Rossum to process its Purchase Orders. You have one custom header field for an extra note on the PO, and whether it is needed depends on the PO number. If the PO number starts with 'A', the order needs an extra note for your colleagues. If it doesn't, the note is not needed.

Let's use the hidden flag so that the field is shown only when it is needed.

❗️

Feature limitations

This feature is not available for tables, only for header fields.

Step 1: Create a webhook

Follow this guide on How To Set Up An Extension

Step 2: Create a condition inside the webhook endpoint

This condition is a part of your business logic. It can be literally anything. For reference on the find_by_schema_id function look here.

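from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route('/change_schema_shape', methods=['POST'])
def change_schema_shape():
    # The annotation content tree sent in the webhook payload
    annotation_tree = request.json['annotation']['content']
    operations = []

    po_number = find_by_schema_id(annotation_tree, 'order_id')['value']
    note_field = find_by_schema_id(annotation_tree, 'order_note')

    # Business rule: only PO numbers starting with "A" need a note
    needs_note = po_number.startswith("A")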
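
The find_by_schema_id helper is only linked above, not shown. The following is a minimal sketch of what such a helper might look like, not the official implementation. It assumes the annotation content is a list of dicts, where each node carries a "schema_id" key and container nodes carry a "children" list. The sample_content data is hypothetical and heavily trimmed.

```python
def find_by_schema_id(content, schema_id):
    """Return the first node whose schema_id matches, or None if absent.

    Assumption: `content` is a list of dicts; container nodes (sections,
    tables, rows) keep their nested nodes in a "children" list.
    """
    for node in content:
        if node.get("schema_id") == schema_id:
            return node
        children = node.get("children")
        if isinstance(children, list):
            found = find_by_schema_id(children, schema_id)
            if found is not None:
                return found
    return None


# Hypothetical, trimmed-down annotation content used only for illustration.
sample_content = [
    {
        "id": 1,
        "schema_id": "basic_info_section",
        "children": [
            {"id": 2, "schema_id": "order_id", "value": "A1234"},
            {"id": 3, "schema_id": "order_note", "value": ""},
        ],
    }
]

po_field = find_by_schema_id(sample_content, "order_id")
# Guard against a missing field or an empty value before calling startswith.
needs_note = po_field is not None and (po_field.get("value") or "").startswith("A")
```

Returning None for a missing field, rather than raising, lets the webhook decide how to react when a document has no PO number at all.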

Step 3: Hide/Show PO note field

To separate the purchase orders that need a note from the rest, we create a hidden variable that tells us whether the note field should be hidden. Then we pass this value to operations, a list that applies the change through our response.

@app.route('/change_schema_shape', methods=['POST'])
def change_schema_shape():
    annotation_tree = request.json['annotation']['content']
    operations = []

    po_field = find_by_schema_id(annotation_tree, 'order_id')
    note_field = find_by_schema_id(annotation_tree, 'order_note')

    needs_note = po_field["value"].startswith("A")
    # Hide the note field unless the PO number requires it
    hidden = not needs_note

    operations.extend([
        {
            "op": "replace",
            "id": note_field["id"],
            "value": {
                "hidden": hidden
            }
        }
    ])

    return jsonify({"messages": [], "operations": operations})


def get_shop_specific_fields(annotation_tree):
    sex_fields = find_all_by_schema_id(annotation_tree, 'line_item_sex')
    material_fields = find_all_by_schema_id(annotation_tree, 'line_item_material')
    return material_fields, sex_fields
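
The operation payload can also be built by a small pure function, which makes the hide/show rule easy to test without running a web server. This is a sketch only: build_hidden_operation and note_operations are hypothetical helper names, and the dict shape simply mirrors the "replace" operation used in this step.

```python
def build_hidden_operation(datapoint_id, hidden):
    """Build one "replace" operation that sets a datapoint's hidden flag."""
    return {
        "op": "replace",
        "id": datapoint_id,
        "value": {"hidden": bool(hidden)},
    }


def note_operations(po_value, note_datapoint_id):
    """Return the operations list for the balloon shop rule.

    The note field is shown only when the PO number starts with "A".
    A missing or empty PO number is treated as "no note needed".
    """
    needs_note = (po_value or "").startswith("A")
    return [build_hidden_operation(note_datapoint_id, hidden=not needs_note)]


# Usage: the ids below are made up for illustration.
shown = note_operations("A100", 3)
hidden = note_operations("B100", 3)
```

Keeping the rule separate from the Flask handler means the handler only has to look up the two fields and return the resulting list.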

Step 4 (bonus): Optimise

Right now, our logic runs after every single update the user makes, which is unnecessary.

We want to change the shape of the schema only when the purchase order field is updated. For this purpose, we need to make the webhook listen only to the annotation_content.initialize and annotation_content.user_update events. We can make this change in the UI. Afterwards, we update our code accordingly:

@app.route('/change_schema_shape', methods=['POST'])
def change_schema_shape():
    annotation_tree = request.json['annotation']['content']
    messages, operations = [], []
    action = request.json["action"]
    is_initial = action == "initialize"
    # Fall back to an empty list if the key is missing from the payload
    updated_datapoints = request.json.get("updated_datapoints", [])
    po_field = find_by_schema_id(annotation_tree, 'order_id')

    # Nothing to do unless this is the first load or the PO number changed
    if not is_initial and po_field["id"] not in updated_datapoints:
        return jsonify({"messages": messages, "operations": operations})

    note_field = find_by_schema_id(annotation_tree, 'order_note')
    needs_note = po_field["value"].startswith("A")
    hidden = not needs_note

    operations.extend([
        {
            "op": "replace",
            "id": note_field["id"],
            "value": {
                "hidden": hidden
            }
        }
    ])

    return jsonify({"messages": messages, "operations": operations})
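
The early-return condition is easy to get backwards, so it can help to isolate it in a pure function and check it against the cases we care about. This is a sketch under assumptions: should_recompute is a hypothetical helper name, and updated_datapoints is assumed to be a list of datapoint ids that may be missing from the payload.

```python
def should_recompute(action, updated_datapoints, po_datapoint_id):
    """Decide whether the hide/show logic needs to run for this request.

    It runs on the initial load of the annotation, or when the PO number
    datapoint is among the datapoints the user just updated.
    """
    return action == "initialize" or po_datapoint_id in (updated_datapoints or [])


# Usage with made-up datapoint ids:
on_first_load = should_recompute("initialize", [], 2)
on_po_edit = should_recompute("user_update", [2], 2)
on_other_edit = should_recompute("user_update", [5], 2)
```

The handler would return the empty messages and operations lists whenever this function returns False.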

Now, we can be sure our schema has an appropriate shape every time!

Be sure to look at our docs regarding field attributes.


What’s Next

Do you want to learn more about some advanced use cases for your webhook?