How To Change Schema During Extraction
Typical Use Case
In a single Queue, you need to have some variation in what data to capture from different documents. You would like to show and export a different set of fields in each document based on some dynamic condition.
Your first thought might be to send a PATCH request to the Schema from your webhook.
However, that would be wrong: the UI wouldn't change and the extracted data wouldn't reflect the schema change. Furthermore, all documents in the queue would be affected by this change.
How To Change Extraction Schema Shape on the Fly
While it's not possible to dynamically add/remove fields from a schema, each field has an attribute hidden, which can be set through the standard endpoint of your webhook.
The idea is to add all possible fields to the schema and turn those datapoints on and off with hidden, based on some business logic.
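As a minimal sketch of that idea, the helper below builds the kind of operation a webhook response can carry to toggle hidden on a single datapoint. The function name build_hidden_operation and the sample datapoint ID are hypothetical; the op/id/value shape follows the example used later in this guide.

```python
def build_hidden_operation(datapoint_id, hidden):
    """Return a 'replace' operation that sets the hidden flag on one datapoint."""
    return {
        "op": "replace",
        "id": datapoint_id,
        "value": {"hidden": bool(hidden)},
    }


# Hypothetical datapoint ID, used only for illustration.
operation = build_hidden_operation(12345, hidden=True)

# The webhook response carries the operation alongside any messages.
response_body = {"messages": [], "operations": [operation]}
```

Keeping the operation in a small helper means the business logic only has to decide on a boolean, while the payload shape lives in one place.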
Let's go through the steps together, following an example use-case of products in line items.
Example use-case: Products in Line Items
You run a shop that sells balloons, and you use Rossum to process Purchase Orders from this shop. You have one custom header field for an extra note on the PO, and whether that note is needed depends on the PO number. If the PO number starts with 'A', the balloon needs an extra note for your colleagues. If it doesn't, the note is not needed.
Let's use the hidden flag to avoid having the field present every time, even when it is not necessary.
Feature limitations
This feature is not available for tables, only for header fields.
Step 1: Create a webhook
Follow this guide on How To Set Up An Extension
Step 2: Create a condition inside of /validate method
This condition is part of your business logic and can be anything you need. For reference on the find_by_schema_id function, look here.
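from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/change_schema_shape', methods=['POST'])
def change_schema_shape():
    annotation_tree = request.json['annotation']['content']
    operations = []
    # Look up the PO number field and the note field by their schema IDs.
    po_field = find_by_schema_id(annotation_tree, 'order_id')
    note_field = find_by_schema_id(annotation_tree, 'order_note')
    # Business rule: only POs whose number starts with "A" need the note.
    needs_note = po_field['value'].startswith("A")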
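The find_by_schema_id helper is not shown in this guide. A minimal sketch of what such a helper could look like follows, assuming each node of the annotation content is a dict with a schema_id key and, for containers such as sections, a children list. The sample tree and its IDs are made up for illustration and are not real Rossum output.

```python
def find_by_schema_id(nodes, schema_id):
    """Return the first node whose schema_id matches, searching depth-first.

    Assumes each node is a dict with a 'schema_id' key and, for containers,
    a 'children' list. Returns None when nothing matches.
    """
    for node in nodes:
        if node.get("schema_id") == schema_id:
            return node
        found = find_by_schema_id(node.get("children") or [], schema_id)
        if found is not None:
            return found
    return None


# Made-up annotation content, shaped like the fields used in this guide.
annotation_tree = [
    {
        "id": 1,
        "schema_id": "order_section",
        "children": [
            {"id": 2, "schema_id": "order_id", "value": "A-1001"},
            {"id": 3, "schema_id": "order_note", "value": ""},
        ],
    }
]

po_field = find_by_schema_id(annotation_tree, "order_id")
needs_note = po_field["value"].startswith("A")
```

Because this sketch returns None for a missing field, a production webhook would probably check the result before reading its value.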
Step 3: Hide/Show PO note field
To separate the purchase orders that need a note from the rest, we compute a hidden value that tells us whether the note field should be hidden or not. Then we pass this value to operations, a list of changes returned in our response that applies the update.
@app.route('/change_schema_shape', methods=['POST'])
def change_schema_shape():
    annotation_tree = request.json['annotation']['content']
    operations = []
    po_field = find_by_schema_id(annotation_tree, 'order_id')
    note_field = find_by_schema_id(annotation_tree, 'order_note')
    needs_note = po_field["value"].startswith("A")
    # Hide the note field unless this PO needs a note.
    hidden = not needs_note
    operations.extend([
        {
            "op": "replace",
            "id": note_field["id"],
            "value": {
                "hidden": hidden
            }
        }
    ])
    return jsonify({"messages": [], "operations": operations})


def get_shop_specific_fields(annotation_tree):
    sex_fields = find_all_by_schema_id(annotation_tree, 'line_item_sex')
    material_fields = find_all_by_schema_id(annotation_tree, 'line_item_material')
    return material_fields, sex_fields
Step 4 (bonus): Optimise
Right now, however, our logic runs after every single update the user makes, which is unnecessary.
We want to update the note field's visibility only when the purchase order field changes. For this purpose, we need to make the webhook listen only to the annotation_content.initialize and annotation_content.user_update events. We can make this change in the UI. Afterwards, we will update our code accordingly:
@app.route('/change_schema_shape', methods=['POST'])
def change_schema_shape():
    annotation_tree = request.json['annotation']['content']
    messages, operations = [], []
    action = request.json["action"]
    is_initial = action == "initialize"
    updated_datapoints = request.json["updated_datapoints"]
    po_field = find_by_schema_id(annotation_tree, 'order_id')
    # Nothing to do unless this is the initial event or the PO field changed.
    if not is_initial and po_field["id"] not in updated_datapoints:
        return jsonify({"messages": messages, "operations": operations})
    note_field = find_by_schema_id(annotation_tree, 'order_note')
    needs_note = po_field["value"].startswith("A")
    # Hide the note field unless this PO needs a note.
    hidden = not needs_note
    operations.extend([
        {
            "op": "replace",
            "id": note_field["id"],
            "value": {
                "hidden": hidden
            }
        }
    ])
    return jsonify({"messages": messages, "operations": operations})
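The intent of the early return is to skip the work unless the event is the initial one or the PO field was among the datapoints the user just changed. A minimal sketch of that decision as a pure function, which can be checked without running a web server, is given here. The function name should_update_note and the datapoint IDs are hypothetical.

```python
def should_update_note(action, updated_datapoints, po_field_id):
    """Decide whether the note field's visibility needs recomputing.

    Recompute on the initial event, or when the PO field was among the
    datapoints the user just changed; skip every other update.
    """
    if action == "initialize":
        return True
    return po_field_id in updated_datapoints


# Hypothetical datapoint ID of the PO number field.
PO_FIELD_ID = 2

on_initialize = should_update_note("initialize", [], PO_FIELD_ID)
on_po_edit = should_update_note("user_update", [2], PO_FIELD_ID)
on_other_edit = should_update_note("user_update", [3], PO_FIELD_ID)
```

Only the third case, an update that does not touch the PO field, should lead to an empty response.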
Now, we can be sure our schema has an appropriate shape every time!
Be sure to look at our docs regarding field attributes.
Do you want to learn more about some advanced use cases for your webhook?