ixp

latest

false

Communications Mining user guide

Elasticsearch integration

Communications Mining™ offers a rich set of built-in analytics tools. However, sometimes it is necessary to join the predictions from Communications Mining with data that can't be uploaded as part of Communications Mining comments. In these cases a common solution is to index the Communications Mining predictions and any additional data into Elasticsearch and use a tool like Kibana to drive analytics.

This tutorial describes how to import Communications Mining data into Elasticsearch and visualize it in Kibana.

The data used in the examples throughout this tutorial is generated dummy emails from the insurance domain.

Storing data in Elasticsearch

First, let's define the data that we want to import into Elasticsearch. Communications Mining™ API provides the comment text, comment metadata, predicted labels and predicted general fields in a nested JSON object. The following is an example of a raw comment provided by the Communications Mining API.

Note:

You may notice different metadata fields depending on how your data was ingested into Communications Mining. To learn more about comment object fields, check Comments.

{
  "comment": {
    "id": "c7a1c529-3f57-4be6-9102-c9f892b81ae51",
    "uid": "49ba2c56a945386c.c7a1c529-3f57-4be6-9102-c9f892b81ae51",
    "timestamp": "2021-03-29T08:36:25.607Z",
    "messages": [
      {
        "body": {
          "text": "The policyholder has changed their address to the new address: 19 Essex Gardens, SW17 2UL"
        },
        "subject": {
          "text": "Change of address - Policy SFG48807871"
        },
        "from": "CPX8460080@broker.com",
        "to": ["underwriter@insurer.com"],
        "sent_at": "2021-03-29T08:36:25.607Z"
      }
    ]
    // (... more properties ...)
  },
  "labels": [
    {
      "name": ["Admin"],
      "probability": 0.9995054006576538
    },
    {
      "name": ["Admin", "Change of address"],
      "probability": 0.9995054006576538
    }
  ],
  "entities": [
    {
      "name": "address-line-1",
      "formatted_value": "19 Essex Gardens",
      "span": {
        "content_part": "body",
        "message_index": 0,
        "char_start": 63,
        "char_end": 79,
        "utf16_byte_start": 126,
        "utf16_byte_end": 158
      }
    },
    {
      "name": "post-code",
      "formatted_value": "SW17 2UL",
      "span": {
        "content_part": "body",
        "message_index": 0,
        "char_start": 81,
        "char_end": 89,
        "utf16_byte_start": 162,
        "utf16_byte_end": 178
      }
    },
    {
      "name": "policy-number",
      "formatted_value": "SFG48807871",
      "span": {
        "content_part": "subject",
        "message_index": 0,
        "char_start": 27,
        "char_end": 38,
        "utf16_byte_start": 54,
        "utf16_byte_end": 76
      }
    }
  ]
}
{
  "comment": {
    "id": "c7a1c529-3f57-4be6-9102-c9f892b81ae51",
    "uid": "49ba2c56a945386c.c7a1c529-3f57-4be6-9102-c9f892b81ae51",
    "timestamp": "2021-03-29T08:36:25.607Z",
    "messages": [
      {
        "body": {
          "text": "The policyholder has changed their address to the new address: 19 Essex Gardens, SW17 2UL"
        },
        "subject": {
          "text": "Change of address - Policy SFG48807871"
        },
        "from": "CPX8460080@broker.com",
        "to": ["underwriter@insurer.com"],
        "sent_at": "2021-03-29T08:36:25.607Z"
      }
    ]
    // (... more properties ...)
  },
  "labels": [
    {
      "name": ["Admin"],
      "probability": 0.9995054006576538
    },
    {
      "name": ["Admin", "Change of address"],
      "probability": 0.9995054006576538
    }
  ],
  "entities": [
    {
      "name": "address-line-1",
      "formatted_value": "19 Essex Gardens",
      "span": {
        "content_part": "body",
        "message_index": 0,
        "char_start": 63,
        "char_end": 79,
        "utf16_byte_start": 126,
        "utf16_byte_end": 158
      }
    },
    {
      "name": "post-code",
      "formatted_value": "SW17 2UL",
      "span": {
        "content_part": "body",
        "message_index": 0,
        "char_start": 81,
        "char_end": 89,
        "utf16_byte_start": 162,
        "utf16_byte_end": 178
      }
    },
    {
      "name": "policy-number",
      "formatted_value": "SFG48807871",
      "span": {
        "content_part": "subject",
        "message_index": 0,
        "char_start": 27,
        "char_end": 38,
        "utf16_byte_start": 54,
        "utf16_byte_end": 76
      }
    }
  ]
}

The schema of the raw comments returned by the Communications Mining API is inconvenient for filtering and querying this data in Elasticsearch, so you should change the schema before ingesting the data into Elasticsearch. The following is an example flattened schema you can use. You should add all fields that you need for your use-case.

{
  "id": "c7a1c529-3f57-4be6-9102-c9f892b81ae51",
  "uid": "49ba2c56a945386c.c7a1c529-3f57-4be6-9102-c9f892b81ae51",
  "timestamp": "2021-03-29T08:36:25.607Z",
  "subject": "Change of address - Policy SFG48807871",
  "body": "The policyholder has changed their address to the new address: 19 Essex Gardens, SW17 2UL",
  // (... more fields ...)
  "labels": ["Admin", "Admin > Change of address"],
  "entities": {
    "policy_number": ["SFG48807871"],
    "address-line-1": ["19 Essex Gardens"],
    "post-code": ["SW17 2UL"]
  }
}
{
  "id": "c7a1c529-3f57-4be6-9102-c9f892b81ae51",
  "uid": "49ba2c56a945386c.c7a1c529-3f57-4be6-9102-c9f892b81ae51",
  "timestamp": "2021-03-29T08:36:25.607Z",
  "subject": "Change of address - Policy SFG48807871",
  "body": "The policyholder has changed their address to the new address: 19 Essex Gardens, SW17 2UL",
  // (... more fields ...)
  "labels": ["Admin", "Admin > Change of address"],
  "entities": {
    "policy_number": ["SFG48807871"],
    "address-line-1": ["19 Essex Gardens"],
    "post-code": ["SW17 2UL"]
  }
}

Note:

A comment can have zero, one, or multiple labels, so the labels field needs to be an array. Additionally, if one or more general field types have been configured for the dataset, a comment will have zero, one, or more general fields of each general field type. The hierarchical label names in the raw API response are themselves arrays (["Admin", "Change of address"]), and should be converted to strings ("Admin > Change of address").

Fetching Data

In order to fetch the data, we recommend using the. For an overview of all available data download methods, check Downloading data. When creating a Stream, you should set the thresholds for each label so that labels with confidence scores below the threshold are discarded.

This is easiest to do from the Communications Mining™ UI by going to the "Streams" page of a dataset. Having used the confidence scores to determine whether a label applies, you can then import just the label names into Elasticsearch. For a discussion on when we recommend dropping or keeping label confidence scores, check Labels for Analytics.

General fields do not have confidence scores, so no special handling is required.

Note:

Model change management When you create a Stream, you specify a model version, which is used to provide predictions when fetching comments from the Stream. Even as you continue to train new model versions in the platform, your Stream will use the model version you specified, providing you with deterministic results.

To upgrade to a new model version, you must create a new Stream that uses that model version, then update your code to use the new Stream. For this reason, we recommend that you make the Stream name configurable in your code.

To make sure that analytics that use predictions stay consistent, you should re-ingest predictions for historical data using the updated model version. You can do that by the Stream to the timestamp before your oldest comment, and re-ingesting the data from the start.

Visualizing Data in Kibana

Once you indexed the data in Elasticsearch, you can start building visualizations. This section provides simple examples for a number of common visualization tools in Kibana.

Timelion

You can use the following expression to produce a plot of top 5 most common labels over time.

Note:

This shows both top-level category and subcategory labels.

.es(index=example-data,split=labels:5,timefield=@timestamp)
    .label("$1", "^.* > labels:(.+) > .*")
.es(index=example-data,split=labels:5,timefield=@timestamp)
    .label("$1", "^.* > labels:(.+) > .*")

Figure 1. Top 5 labels in a dataset plotted over time.

Bar Chart

This bar chart shows the top 20 sender email addresses in the dataset. Sender and recipient email addresses are part of comment metadata in email-based datasets.

Figure 2. Top 20 sender email addresses.

Pie Chart

This pie chart shows subcategory labels under the top-level "Claim" label. The label categories are defined by the user training the model.

Figure 3. Subcategories of the Claim label.

On this page

Storing data in Elasticsearch
Fetching Data
Visualizing Data in Kibana
Timelion
Bar Chart
Pie Chart

Was this page helpful?

PREVIOUSFetching data for Tableau with Python

NEXTGeneral field extraction

Communications Mining user guide

Storing data in Elasticsearch​

Fetching Data​

Visualizing Data in Kibana​

Timelion​

Bar Chart​

Pie Chart​

Was this page helpful?

Storing data in Elasticsearch

Fetching Data

Visualizing Data in Kibana

Timelion

Bar Chart

Pie Chart