Getting Duplicate Documents over API

Rossum performs simple duplicate detection based on the MD5 hash of the document, so identical documents are recognized and linked together. Read this article to find out how duplicated documents look in the UI.

Getting duplicates for a single annotation

On the API level, you can find out whether an annotation is a duplicate of another document by sending a GET request to the following endpoint: https://example.rossum.app/api/v1/annotations?id=2707700&sideload=relations.

The response contains a single annotation with ID 2707700 in the results. The annotation object has a key called “relations”, which holds a list of URLs pointing to relations with other annotations. One of the relation types is “duplicate.”

Because the request specifies the “sideload=relations” parameter, the relation objects are returned in the same response, and we can match them with the relation URLs listed in the annotation.

{
    "pagination": {
        "total": 1,
        "total_pages": 1,
        "next": null,
        "previous": null
    },
    "results": [
        {
            "document": "https://example.rossum.app/api/v1/documents/2709836",
            "id": 2707700,
            "queue": "https://example.rossum.app/api/v1/queues/26191",
            "schema": "https://example.rossum.app/api/v1/schemas/207141",
            "relations": [
                "https://example.rossum.app/api/v1/relations/9209"
            ],
            "pages": [
                "https://example.rossum.app/api/v1/pages/5997566"
            ],
            "modifier": "https://example.rossum.app/api/v1/users/33131",
            "modified_at": "2020-10-12T14:59:29.645351Z",
            "confirmed_at": null,
            "exported_at": null,
            "assigned_at": "2020-10-12T14:59:29.645351Z",
            "status": "reviewing",
            "rir_poll_id": "32528119ac264cd2a4dc5319",
            "messages": [],
            "url": "https://example.rossum.app/api/v1/annotations/2707700",
            "content": "https://example.rossum.app/api/v1/annotations/2707700/content",
            "time_spent": 19,
            "metadata": {},
            "automated": false
        }
    ],
    "relations": [
        {
            "id": 9209,
            "type": "duplicate",
            "key": "3afc2a362803b1cb95cd5e372b18f74f",
            "parent": null,
            "annotations": [
                "https://example.rossum.app/api/v1/annotations/972010",
                "https://example.rossum.app/api/v1/annotations/2707652",
                "https://example.rossum.app/api/v1/annotations/3185419"
            ]
        }
    ]
}
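
The matching step described above can be sketched in Python. This is only a minimal illustration that works on the already-parsed JSON response; the function name find_duplicates is hypothetical and not part of any Rossum client, and it assumes that the last path segment of a relation URL is the relation ID, as in the sample response.

```python
def find_duplicates(response: dict, annotation_id: int) -> list[str]:
    """Return URLs of annotations linked to the given one by a duplicate relation."""
    # Locate the requested annotation in the "results" list.
    annotation = next(
        (a for a in response.get("results", []) if a.get("id") == annotation_id),
        None,
    )
    if annotation is None:
        return []

    # Index the sideloaded relation objects by their numeric ID.
    relations_by_id = {r["id"]: r for r in response.get("relations", [])}

    duplicates = []
    for relation_url in annotation.get("relations", []):
        # Assumption: the relation ID is the last path segment of the URL.
        relation_id = int(relation_url.rstrip("/").rsplit("/", 1)[-1])
        relation = relations_by_id.get(relation_id)
        if relation is None or relation.get("type") != "duplicate":
            continue
        for url in relation.get("annotations", []):
            # Skip the annotation itself and avoid repeated entries.
            if url != annotation.get("url") and url not in duplicates:
                duplicates.append(url)
    return duplicates
```

Calling find_duplicates(response, 2707700) on the response above would return the three annotation URLs listed in relation 9209.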

If you do not know how to easily test our API, read our article about using Postman.

Getting duplicates for all annotations in "To Review"

To get all the duplicated documents in the to_review status, send a GET request to https://example.rossum.app/api/v1/annotations?queue=26191&sideload=relations&status=to_review.
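
As a rough sketch of how such a list response could be processed, the Python below builds the query URL and picks out the annotations that have at least one duplicate relation. The helper names to_review_url and annotations_with_duplicates are hypothetical, and the code assumes the response has the same shape as the single-annotation example above. If the response spans several pages, the "next" link in "pagination" would have to be followed as well, which this sketch does not do.

```python
from urllib.parse import urlencode

BASE_URL = "https://example.rossum.app/api/v1"


def to_review_url(queue_id: int) -> str:
    """Build the list URL for annotations in to_review with relations sideloaded."""
    query = urlencode(
        {"queue": queue_id, "sideload": "relations", "status": "to_review"}
    )
    return f"{BASE_URL}/annotations?{query}"


def annotations_with_duplicates(response: dict) -> list[int]:
    """Return IDs of annotations that reference at least one duplicate relation."""
    # Collect IDs of the sideloaded relations whose type is "duplicate".
    duplicate_ids = {
        r["id"] for r in response.get("relations", []) if r.get("type") == "duplicate"
    }
    flagged = []
    for annotation in response.get("results", []):
        # Assumption: the relation ID is the last path segment of each URL.
        relation_ids = {
            int(url.rstrip("/").rsplit("/", 1)[-1])
            for url in annotation.get("relations", [])
        }
        if relation_ids & duplicate_ids:
            flagged.append(annotation["id"])
    return flagged
```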

To also get the statuses and modifiers of the related annotations, fetch them by ID with a GET request to https://example.rossum.app/api/v1/annotations?id=972010,2707652,3185419&sideload=document,modifiers.
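
The ID list for this request can be derived from the annotation URLs in a duplicate relation. The following Python sketch shows one way to do that; related_annotations_url is a hypothetical helper, and it assumes that the annotation ID is the last path segment of each URL.

```python
from urllib.parse import urlencode


def related_annotations_url(base_url: str, annotation_urls: list[str]) -> str:
    """Build a URL that fetches the given annotations by ID in one request."""
    # Assumption: the annotation ID is the last path segment of each URL.
    ids = [url.rstrip("/").rsplit("/", 1)[-1] for url in annotation_urls]
    # safe="," keeps the commas readable instead of encoding them as %2C.
    query = urlencode(
        {"id": ",".join(ids), "sideload": "document,modifiers"}, safe=","
    )
    return f"{base_url}/annotations?{query}"
```

Passing the three annotation URLs from relation 9209 together with https://example.rossum.app/api/v1 as the base URL reproduces the request URL shown above.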

Export endpoint does not offer duplicate document information

Currently, the /export endpoint does not support sideloading information about duplicate documents.