Rossum Developer Hub


Getting Duplicate Documents over API

Rossum performs simple duplicate detection based on the MD5 hash of the document, so identical documents are recognized and linked together. See the linked article to find out how duplicate documents look in the UI.

Getting duplicates for a single annotation

At the API level, you can find out whether an annotation is a duplicate of another document by sending a GET request to the following endpoint: https://elis.rossum.ai/api/v1/annotations?id=2707700&sideload=relations.
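As a minimal sketch, the request above could be prepared in Python with the standard library only. The helper name build_annotation_request and the placeholder token are hypothetical, and the Bearer authorization scheme is an assumption; check the API reference for the authentication your account uses.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://elis.rossum.ai/api/v1"


def build_annotation_request(annotation_id: int, token: str) -> Request:
    """Prepare a GET request for one annotation with its relations sideloaded."""
    query = urlencode({"id": annotation_id, "sideload": "relations"})
    return Request(
        f"{API_BASE}/annotations?{query}",
        # Assumption: token-based auth sent as a Bearer header.
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# "YOUR_API_TOKEN" is a placeholder, not a real credential.
request = build_annotation_request(2707700, token="YOUR_API_TOKEN")
print(request.full_url)
```

Passing the prepared request to urllib.request.urlopen and decoding the body with json.load would then give a dictionary shaped like the response shown in this article.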

The response contains a single annotation with ID 2707700 in the results. The annotation object has a key called "relations", which holds a list of URLs of its relations to other annotations. One of the possible relation types is "duplicate".

Because the request includes the "sideload=relations" parameter, the response also carries the full relation objects in the top-level "relations" list. You can match them against the relation URLs listed on the annotation without sending a second request.

{
    "pagination": {
        "total": 1,
        "total_pages": 1,
        "next": null,
        "previous": null
    },
    "results": [
        {
            "document": "https://elis.rossum.ai/api/v1/documents/2709836",
            "id": 2707700,
            "queue": "https://elis.rossum.ai/api/v1/queues/26191",
            "schema": "https://elis.rossum.ai/api/v1/schemas/207141",
            "relations": [
                "https://elis.rossum.ai/api/v1/relations/9209"
            ],
            "pages": [
                "https://elis.rossum.ai/api/v1/pages/5997566"
            ],
            "modifier": "https://elis.rossum.ai/api/v1/users/33131",
            "modified_at": "2020-10-12T14:59:29.645351Z",
            "confirmed_at": null,
            "exported_at": null,
            "assigned_at": "2020-10-12T14:59:29.645351Z",
            "status": "reviewing",
            "rir_poll_id": "32528119ac264cd2a4dc5319",
            "messages": [],
            "url": "https://elis.rossum.ai/api/v1/annotations/2707700",
            "content": "https://elis.rossum.ai/api/v1/annotations/2707700/content",
            "time_spent": 19,
            "metadata": {},
            "automated": false
        }
    ],
    "relations": [
        {
            "id": 9209,
            "type": "duplicate",
            "key": "3afc2a362803b1cb95cd5e372b18f74f",
            "parent": null,
            "annotations": [
                "https://elis.rossum.ai/api/v1/annotations/972010",
                "https://elis.rossum.ai/api/v1/annotations/2707652",
                "https://elis.rossum.ai/api/v1/annotations/3185419"
            ]
        }
    ]
}
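The matching step described above can be sketched in Python as follows. The function names id_from_url and duplicates_by_annotation are hypothetical. Because the sideloaded relation objects in the sample expose an "id" but no URL, the sketch assumes a relation URL always ends with that numeric ID. The sample data is a trimmed copy of the response above.

```python
def id_from_url(url: str) -> int:
    """Return the trailing numeric ID of an API object URL."""
    return int(url.rstrip("/").rsplit("/", 1)[-1])


def duplicates_by_annotation(response: dict) -> dict[int, list[str]]:
    """Map each annotation ID to the annotation URLs linked to it as duplicates."""
    # Index only the sideloaded relations whose type is "duplicate".
    duplicate_relations = {
        relation["id"]: relation
        for relation in response.get("relations", [])
        if relation["type"] == "duplicate"
    }
    linked_by_id = {}
    for annotation in response.get("results", []):
        linked = []
        for relation_url in annotation.get("relations", []):
            relation = duplicate_relations.get(id_from_url(relation_url))
            if relation is not None:
                # Skip the annotation itself in case the relation lists it.
                linked.extend(
                    url for url in relation["annotations"] if url != annotation["url"]
                )
        linked_by_id[annotation["id"]] = linked
    return linked_by_id


# Trimmed copy of the sample response shown above.
sample_response = {
    "results": [
        {
            "id": 2707700,
            "url": "https://elis.rossum.ai/api/v1/annotations/2707700",
            "relations": ["https://elis.rossum.ai/api/v1/relations/9209"],
        }
    ],
    "relations": [
        {
            "id": 9209,
            "type": "duplicate",
            "annotations": [
                "https://elis.rossum.ai/api/v1/annotations/972010",
                "https://elis.rossum.ai/api/v1/annotations/2707652",
                "https://elis.rossum.ai/api/v1/annotations/3185419",
            ],
        }
    ],
}

duplicates = duplicates_by_annotation(sample_response)
```

An annotation that maps to an empty list has no duplicate relation among the sideloaded objects.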

If you do not know how to easily test our API, read our article about using Postman.

Getting duplicates for all annotations in "To Review"

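To get all the duplicate documents in the to_review status, send a GET request to https://elis.rossum.ai/api/v1/annotations?queue=26191&sideload=relations&status=to_review. The response lists every to_review annotation in the queue together with the sideloaded relations; the duplicates are the annotations that reference a relation of type "duplicate".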
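A list query like this one may span several pages. The sketch below follows the "next" link from the "pagination" object and keeps only the annotations that reference a relation of type "duplicate". It rests on several assumptions: "next" holds the URL of the following page, each page sideloads the relations of its own results, and fetch_json is a hypothetical callable you supply that performs the authenticated GET and returns the parsed JSON. The two fake pages, their "page=2" URL, and the "attachment" relation are invented purely for illustration.

```python
from typing import Callable, Iterator


def iter_pages(first_url: str, fetch_json: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every page of a list response, following pagination.next."""
    url = first_url
    while url:
        page = fetch_json(url)
        yield page
        url = page["pagination"]["next"]  # None on the last page


def annotations_with_duplicates(
    first_url: str, fetch_json: Callable[[str], dict]
) -> list[dict]:
    """Collect annotations that reference at least one duplicate relation."""
    flagged = []
    for page in iter_pages(first_url, fetch_json):
        duplicate_ids = {
            relation["id"]
            for relation in page.get("relations", [])
            if relation["type"] == "duplicate"
        }
        for annotation in page["results"]:
            # Assumption: a relation URL ends with the relation's numeric ID.
            relation_ids = {
                int(url.rstrip("/").rsplit("/", 1)[-1])
                for url in annotation.get("relations", [])
            }
            if relation_ids & duplicate_ids:
                flagged.append(annotation)
    return flagged


# Illustration with two fake pages instead of real HTTP calls.
FIRST = "https://elis.rossum.ai/api/v1/annotations?queue=26191&sideload=relations&status=to_review"
SECOND = FIRST + "&page=2"  # hypothetical next-page URL
fake_pages = {
    FIRST: {
        "pagination": {"next": SECOND},
        "results": [
            {"id": 101, "relations": ["https://elis.rossum.ai/api/v1/relations/1"]},
            {"id": 102, "relations": []},
        ],
        "relations": [{"id": 1, "type": "duplicate"}],
    },
    SECOND: {
        "pagination": {"next": None},
        "results": [
            {"id": 103, "relations": ["https://elis.rossum.ai/api/v1/relations/2"]},
        ],
        # A non-duplicate relation, present only to show that it is ignored.
        "relations": [{"id": 2, "type": "attachment"}],
    },
}

flagged = annotations_with_duplicates(FIRST, fake_pages.__getitem__)
flagged_ids = [annotation["id"] for annotation in flagged]
```

Injecting fetch_json keeps the paging and filtering logic independent of any particular HTTP client.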

Additionally, to get the statuses and modifiers of the related annotations, you can fetch them by ID with a GET request to https://elis.rossum.ai/api/v1/annotations?id=972010,2707652,3185419&sideload=document,modifiers.
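The ID list in that URL can be derived from the "annotations" field of a duplicate relation. The following sketch uses a hypothetical helper, related_annotations_url, and assumes each annotation URL ends with the numeric annotation ID. The sideload value is copied from the URL above.

```python
from urllib.parse import urlencode


def related_annotations_url(relation: dict) -> str:
    """Build the query URL that fetches all annotations listed on a relation."""
    ids = [url.rstrip("/").rsplit("/", 1)[-1] for url in relation["annotations"]]
    query = urlencode(
        {"id": ",".join(ids), "sideload": "document,modifiers"},
        safe=",",  # keep commas readable instead of encoding them as %2C
    )
    return f"https://elis.rossum.ai/api/v1/annotations?{query}"


# Relation object taken from the sample response in this article.
relation = {
    "id": 9209,
    "type": "duplicate",
    "annotations": [
        "https://elis.rossum.ai/api/v1/annotations/972010",
        "https://elis.rossum.ai/api/v1/annotations/2707652",
        "https://elis.rossum.ai/api/v1/annotations/3185419",
    ],
}

url = related_annotations_url(relation)
```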

Export endpoint does not offer duplicate document information

Currently, the /export endpoint does not support sideloading information about duplicate documents.
