Updating the Extraction Schema Using the CLI

In the article about configuring the captured fields, we learned that any non-trivial change to the sidebar (and to the information exported from the platform) requires altering the extraction schema. The schema object specifies the set of data points that are extracted from a document, organized into sections. Each queue has exactly one schema associated with it.

The best way to manage a queue schema is the in-app extraction schema editor.

However, it is sometimes helpful to edit the schema through the API, particularly when automating schema changes or managing the schema as an Excel spreadsheet. In those cases, use the rossum command line tool.

First, you need to find out the id of the schema in question by listing the queues using the rossum queue list command:

$ rossum queue list
  id  name                 workspace  inbox                      schema    users  connector
----  -----------------  -----------  -----------------------  --------  -------  -----------
8962  Received invoices         8277  [email protected]           46697    11829

In this case, the schema id is 46697.

Download the schema first:

$ rossum schema get 46697 --format xlsx -O mySchema.xlsx

Modify the Excel file as needed: for example, update the field labels, change which fields are required, or add a new custom field. If you prefer to work with a JSON file rather than an Excel spreadsheet, replace xlsx with json when executing the Rossum commands (in the filename extension and, for the download, in the --format option).

Finally, upload the schema back to the queue:

$ rossum schema update 46697 --rewrite mySchema.xlsx

After a successful update, the new schema will be applied to all the documents in your queue.
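
The download and upload steps can be wrapped in two small shell functions so that the same schema id and filename are reused consistently. This is only a sketch: the function names and the ROSSUM_BIN variable are hypothetical helpers, not part of the rossum tool, and deriving the --format value from the filename extension assumes that --format accepts json in the same way it accepts xlsx.

```shell
#!/bin/sh
# Hypothetical helpers around the two commands shown in this article.
# ROSSUM_BIN is an assumed override (not a rossum feature); setting it to
# "echo" prints the command instead of running it.

# download_schema SCHEMA_ID FILE
download_schema() {
    schema_id="$1"
    file="$2"
    # Take the export format from the filename extension (xlsx or json).
    "${ROSSUM_BIN:-rossum}" schema get "$schema_id" --format "${file##*.}" -O "$file"
}

# upload_schema SCHEMA_ID FILE
upload_schema() {
    # --rewrite updates the existing schema object in place.
    "${ROSSUM_BIN:-rossum}" schema update "$1" --rewrite "$2"
}
```

With these functions defined, download_schema 46697 mySchema.xlsx fetches the schema, and upload_schema 46697 mySchema.xlsx uploads it again once the file has been edited. Running with ROSSUM_BIN=echo first is a cheap way to check the exact commands before touching a live queue.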

You can check out the article on how to add a custom field to your schema.

📘

In the past, we recommended creating a new schema object every time a datapoint id was added or deleted, which meant that each such change produced a schema object with a new id. Following this recommendation is no longer necessary.
If you need to create a new schema object, you can still do so by omitting the --rewrite option. A schema object created this way has a new schema id, so make sure to look up the current id before performing any further actions on the schema.
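
Creating a new schema object as described in this note might look like the sketch below. The function name and the ROSSUM_BIN variable are hypothetical and not part of the rossum tool. The sketch assumes that the command takes the same arguments as before, only without --rewrite, and that the new schema id then appears in the queue listing.

```shell
#!/bin/sh
# Hypothetical helper: create a new schema object instead of rewriting the
# existing one, then list the queues so the new schema id can be read off.
# ROSSUM_BIN is an assumed override; set it to "echo" for a dry run.

# create_schema_from SCHEMA_ID FILE
create_schema_from() {
    # No --rewrite here, so a new schema object with a new id is created.
    "${ROSSUM_BIN:-rossum}" schema update "$1" "$2" &&
        # Look up the current schema id before any further schema actions.
        "${ROSSUM_BIN:-rossum}" queue list
}
```

After calling create_schema_from 46697 mySchema.xlsx, use the schema id shown in the queue listing in later commands rather than 46697.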