The high-level overview of configuring the captured fields explained that any non-trivial change to the sidebar (and to the information exported from the platform) requires modifying the extraction schema. The schema object specifies the set of datapoints that are extracted from a document, organized into sections. Each queue always has exactly one schema associated with it.
The best way to manage a queue schema is the in-app extraction schema editor.
However, it is sometimes useful to edit the schema through the API, in particular when automating schema changes or when managing the schema as an Excel spreadsheet. In those cases, use the rossum command line tool.
First, find the id of the schema in question by listing the queues with the rossum queue list command:
$ rossum queue list
id    name               workspace    inbox                    schema    users    connector
----  -----------------  -----------  -----------------------  --------  -------  -----------
8962  Received invoices  8277         [email protected]        46697     11829
In this case, the schema id is 46697.
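When the lookup is automated, the schema id can be read from the listing programmatically. The following is a minimal sketch that assumes the table layout shown above, that is, a header line, a rule line of dashes, and cell values that stay within the width of their column's dashes. The function names `parse_queue_table` and `schema_id_for_queue` are hypothetical helpers, not part of the rossum tool.

```python
import re


def parse_queue_table(text: str) -> list[dict[str, str]]:
    """Parse the fixed-width table printed by `rossum queue list` into row dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    # The rule line (only dashes and spaces) marks where each column starts and ends.
    rule_index = next(
        i for i, line in enumerate(lines) if set(line.strip()) <= {"-", " "}
    )
    spans = [match.span() for match in re.finditer(r"-+", lines[rule_index])]
    header = lines[rule_index - 1]
    names = [header[start:end].strip() for start, end in spans]
    rows = []
    for line in lines[rule_index + 1:]:
        # Assumption: every cell fits inside the span of its column's dashes.
        cells = [line[start:end].strip() for start, end in spans]
        rows.append(dict(zip(names, cells)))
    return rows


def schema_id_for_queue(text: str, queue_name: str) -> int:
    """Return the schema id of the queue with the given name."""
    for row in parse_queue_table(text):
        if row["name"] == queue_name:
            return int(row["schema"])
    raise LookupError(f"no queue named {queue_name!r}")
```

Passing the captured output of the command together with the queue name, for example "Received invoices", returns the value of the schema column for that queue as an integer.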
Download the schema first:
rossum schema get 46697 --format xlsx -O mySchema.xlsx
Modify the Excel file as needed, for example by updating the field labels, changing which fields are required, or adding a new custom field. Depending on your preferences, you may find it easier to work with a JSON file rather than an Excel spreadsheet. Simply replace the filename extension xlsx with json when executing the rossum commands.
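If you work with the JSON variant, simple edits such as relabeling a field or marking it as required can be scripted. The sketch below assumes that the downloaded file contains a list of sections, that each node carries an `id` and a `label`, that nested fields live under `children` (a list, or a single object for multivalue fields), and that the required flag is stored under `constraints.required`. Check these assumptions against your own downloaded file before relying on it; the helper names are hypothetical.

```python
import json
from pathlib import Path


def iter_nodes(node):
    """Yield every schema node (sections, multivalues, tuples, datapoints) depth-first."""
    if isinstance(node, list):
        for item in node:
            yield from iter_nodes(item)
    elif isinstance(node, dict):
        yield node
        # Assumption: `children` is a list, or a single object for multivalue fields.
        yield from iter_nodes(node.get("children", []))


def update_field(schema, field_id, label=None, required=None):
    """Change the label and/or the required flag of the field with the given id."""
    for node in iter_nodes(schema):
        if node.get("id") == field_id:
            if label is not None:
                node["label"] = label
            if required is not None:
                node.setdefault("constraints", {})["required"] = required
            return node
    raise KeyError(field_id)


def edit_schema_file(path, field_id, **changes):
    """Load a downloaded schema JSON file, apply the change, and write it back."""
    file = Path(path)
    schema = json.loads(file.read_text(encoding="utf-8"))
    update_field(schema, field_id, **changes)
    file.write_text(
        json.dumps(schema, indent=2, ensure_ascii=False), encoding="utf-8"
    )
```

For instance, `edit_schema_file("mySchema.json", "invoice_id", label="Invoice No.", required=True)` would rewrite the file in place, where `invoice_id` stands for whatever field id appears in your schema.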
Finally, upload the schema back to the queue:
rossum schema update 46697 --rewrite mySchema.xlsx
After a successful update, the new schema will be applied to all the documents in your queue.
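For automated schema changes, the download and upload steps can be wrapped in a small script. The sketch below only assembles the two commands shown in this article and runs them with `subprocess`. It assumes that the rossum tool is installed and already configured with your credentials, and that `--format` takes the same value as the file extension (xlsx or json). The function names are hypothetical.

```python
import subprocess


def get_command(schema_id: int, path: str) -> list[str]:
    """Build the `rossum schema get` command for the given schema and file."""
    # Assumption: the --format value matches the file extension ("xlsx" or "json").
    file_format = path.rsplit(".", 1)[-1]
    return [
        "rossum", "schema", "get", str(schema_id),
        "--format", file_format, "-O", path,
    ]


def update_command(schema_id: int, path: str, rewrite: bool = True) -> list[str]:
    """Build the `rossum schema update` command; --rewrite is included by default."""
    command = ["rossum", "schema", "update", str(schema_id)]
    if rewrite:
        command.append("--rewrite")
    return command + [path]


def run(command: list[str]) -> None:
    """Run a command and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(command, check=True)
```

A typical round trip would call `run(get_command(46697, "mySchema.json"))`, edit the file, and then call `run(update_command(46697, "mySchema.json"))`. Building the argument list separately from running it makes it easy to print or log the exact command before executing it.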
You can check out the article on how to add a custom field to your schema.
In the past, it was recommended to create a new schema object every time a datapoint id was added or deleted, which resulted in a new schema object with a new id. It is no longer necessary to follow this recommendation.
If you do need to create a new schema object, you can still do so by omitting the --rewrite option. A schema object created this way will have a new schema id, so make sure to look up its current id before performing any further actions on the schema.