
Extracting Data from Email Content with Custom Function

When you use Rossum's email inbox, your vendors might send values for specific fields in the subject or body of the email. Such values could be the invoice ID, PO number, or category of the invoice.

You can extend Rossum's default behavior to capture these values from the email content and pre-fill the corresponding fields in the extraction schema.

Define the mapping of email fields to the extraction schema

Rossum can use several values from the email to pre-fill fields on the validation screen. Have a look at how to import such documents via email and use the fields in the extraction schema.

Rossum can also be extended with a custom extension that listens to the email.received event action, receives the email metadata, and tries to extract values from the email content. The example below shows how to tell Rossum's extraction schema that a field should be filled with a value from the email: list the email field (here email:category) in rir_field_names.

{
  "rir_field_names": [
    "email:category"
  ],
  "constraints": {
    "required": false
  },
  "default_value": null,
  "category": "datapoint",
  "id": "category",
  "label": "Category",
  "type": "string"
}
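
The example function in the next section also emits email:invoice_id, so the schema would need a matching datapoint for that field too. As a rough sketch, such datapoints could be generated with a small helper. Note that email_datapoint is a hypothetical name, not part of any Rossum library, and the invoice_id schema id and its label are assumptions chosen for illustration; the output mirrors the JSON above.

```python
import json


def email_datapoint(field_id, label):
    """Build a string datapoint pre-filled from the email value 'email:<field_id>'."""
    return {
        "rir_field_names": ["email:{0}".format(field_id)],
        "constraints": {"required": False},
        "default_value": None,
        "category": "datapoint",
        "id": field_id,
        "label": label,
        "type": "string",
    }


# One datapoint per email field the custom function is expected to fill.
datapoints = [
    email_datapoint("category", "Category"),
    email_datapoint("invoice_id", "Invoice ID"),  # assumed id and label
]

print(json.dumps(datapoints, indent=2))
```

The printed objects can be pasted into the relevant section of the extraction schema and adjusted as needed.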

Example function for parsing values from email content

To extract custom values from the email content and propagate them to the annotation content:

  • Create a custom function with the code shown below
  • Assign the function to the email.received event action
  • Assign the function to a specific queue

import re

# Each email field is filled with every match of its regular expressions.
settings = {
    "email_fields": [
        {"id": "email:category", "regexps": ["HEX[0-9]+"]},
        {"id": "email:invoice_id", "regexps": ["ID-[0-9]+"]}
    ]
}


def rossum_hook_request_handler(payload):
    """
    The rossum_hook_request_handler is an obligatory main function that accepts
    input and produces output of the rossum custom function hook.
    :param payload: see https://api.elis.rossum.ai/docs/#annotation-content-event-data-format
    :return: dict with files to be processed
    """
    # Other events fall through and return None; this function is meant to be
    # attached to the email.received event action only.
    if payload["event"] == "email" and payload["action"] == "received":
        try:
            files = main(payload)
        except Exception as e:
            # On failure, pass the files through unchanged, in the same
            # {"files": [...]} shape as the success path.
            print("Serverless function exception: {0}".format(e))
            return {"files": payload["files"]}

        return {"files": files}


def main(payload):
    """
    Try to pass parsed values from email content to each of the documents to be processed.
    :param payload: dict representing the payload
    :return: list of files, each extended with the parsed "values" if any were found
    """
    incoming_files = payload["files"]
    email_subject = payload["headers"].get("subject") or ""
    email_body = payload["body"].get("body_text_plain") or ""
    # Join subject and body with a newline so a match cannot span both.
    email_text = email_subject + "\n" + email_body

    # The email text is the same for every file, so parse it only once.
    parsed_fields = []
    for field in settings["email_fields"]:
        print("Looking for field: {0}".format(field["id"]))
        parsed_values = parse_values_from_text(email_text, field["regexps"])
        if parsed_values:
            print("Parsed values: {0}".format(parsed_values))
            parsed_fields.append({"id": field["id"], "value": ",".join(parsed_values)})

    accepted_files = []
    for file in incoming_files:
        print("Processing file: {0}".format(file))
        if parsed_fields:
            # Copy each entry so the files do not share the same dict objects.
            file.setdefault("values", []).extend(dict(item) for item in parsed_fields)
        accepted_files.append(file)

    print(accepted_files)

    return accepted_files


def parse_values_from_text(text, regexps):
    """
    Find all occurrences of the field's values defined by regular expressions.
    :param text: Text to be searched
    :param regexps: list of regular expression patterns to search for
    :return: list of found values.
    """
    values = []

    for regexp in regexps:
        values.extend(re.findall(regexp, text))

    return values
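
To see what the regular expressions from the settings pick up, the matching step can be tried in isolation. The snippet below is a minimal sketch, not part of the function itself: find_all is a hypothetical stand-in for parse_values_from_text, the subject and body are copied from the testing input in this guide, and joining them with a newline is an assumption made here so that a pattern cannot match across the subject/body boundary.

```python
import re

# Subject and body copied from the sample testing input in this guide.
subject = "Invoice ABC from email"
body = "This is my invoice for categories HEX10, HEX30. And the Invoice ID is ID-123456 "

# Assumption: a newline separator keeps matches from spanning subject and body.
text = subject + "\n" + body


def find_all(patterns, text):
    """Return every match of every pattern, in pattern order."""
    found = []
    for pattern in patterns:
        found.extend(re.findall(pattern, text))
    return found


categories = find_all(["HEX[0-9]+"], text)
invoice_ids = find_all(["ID-[0-9]+"], text)

print(",".join(categories))   # HEX10,HEX30
print(",".join(invoice_ids))  # ID-123456
```

Multiple matches for one field are joined with a comma into a single string, which is why the category value for this email would be HEX10,HEX30.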

Testing Input

You can use the sample input below to test your custom function in Rossum's developer UI.

{
  "request_id": "ae7bc8dd-73bd-489b-a3d2-f5214b209591",
  "timestamp": "2020-01-01T00:00:00.000000Z",
  "hook": "https://api.elis.rossum.ai/v1/hooks/781",
  "action": "received",
  "event": "email",
  "files": [
    {
      "id": "1",
      "filename": "image.png",
      "mime_type": "image/png",
      "n_pages": 1,
      "height_px": 50,
      "width_px": 150
    },
    {
      "id": "2",
      "filename": "MS word.docx",
      "mime_type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
      "n_pages": 30,
      "height_px": null,
      "width_px": null
    },
    {
      "id": "3",
      "filename": "agreement pdf.pdf",
      "mime_type": "application/pdf",
      "n_pages": 3,
      "height_px": 3510,
      "width_px": 2480
    },
    {
      "id": "4",
      "filename": "unknown_file",
      "mime_type": "application/pdf",
      "n_pages": 1,
      "height_px": null,
      "width_px": null
    }
  ],
  "headers": {
    "from": "[email protected]",
    "to": "[email protected]",
    "subject": "Invoice ABC from email",
    "date": "Mon, 04 May 2020 11:01:32 +0200",
    "message-id": "15909e7e68e4b5f56fd78a3b4263c4765df6cc4d"
  },
  "body": {
    "body_text_plain": "This is my invoice for categories HEX10, HEX30. And the Invoice ID is ID-123456 "
  }
}
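
For a quick check outside the developer UI, a local harness along the following lines may help. It is only a sketch: check_response and echo_handler are hypothetical names, the payload is a trimmed version of the sample input above, and the expected response shape (a dict with a files list whose entries carry an id and optional values) is inferred from the example function in this guide rather than taken from the API reference.

```python
def check_response(payload, response):
    """Return a list of problems in a hook response; an empty list means it looks fine."""
    if not isinstance(response, dict) or not isinstance(response.get("files"), list):
        return ["response must be a dict with a 'files' list"]

    problems = []
    known_ids = {item["id"] for item in payload["files"]}
    for item in response["files"]:
        if item.get("id") not in known_ids:
            problems.append("unknown file id: {0}".format(item.get("id")))
        for value in item.get("values", []):
            # Each value is expected to be {"id": <str>, "value": <str>}.
            if not isinstance(value.get("id"), str) or not isinstance(value.get("value"), str):
                problems.append("malformed value in file {0}".format(item.get("id")))
    return problems


# Trimmed version of the sample input above (fewer files, shorter body).
payload = {
    "event": "email",
    "action": "received",
    "files": [
        {"id": "1", "filename": "image.png"},
        {"id": "3", "filename": "agreement pdf.pdf"},
    ],
    "headers": {"subject": "Invoice ABC from email"},
    "body": {"body_text_plain": "Categories HEX10, HEX30. Invoice ID is ID-123456"},
}


def echo_handler(payload):
    """Stand-in handler that accepts every file unchanged; swap in the real one."""
    return {"files": payload["files"]}


good = check_response(payload, echo_handler(payload))
bad = check_response(payload, payload["files"])  # a bare list instead of a dict

print(good)
print(bad)
```

Replacing echo_handler with rossum_hook_request_handler lets the same check run against the real function before it is deployed.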

Updated 5 months ago
