Using IP address data in Elasticsearch

Elasticsearch can store and search IP address and network data using dedicated field types.

The two field types commonly used for storing IP address data are:

  • ip for storing a single IP address
  • ip_range for storing IP networks, i.e. ranges of IP addresses

Let’s create a mapping to use both of these field types:

PUT routes
{
  "settings": {
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "destination": {
        "type": "ip_range"
      },
      "nextHop": {
        "type": "ip"
      }
    }
  }
}

Each route has a name, a range of destination IP addresses, and the IP of the next hop for matching traffic.

We can add some data to test with. This will create two routes: Route 1 with the range 192.168.1.0 to 192.168.1.127, and Route 2 with the range 192.168.1.128 to 192.168.1.255. Elasticsearch converts the CIDR notation into the actual range:

POST routes/_doc
{
  "name": "Route 1",
  "destination": "192.168.1.0/25",
  "nextHop": "192.168.2.1"
}

POST routes/_doc
{
  "name": "Route 2",
  "destination": "192.168.1.128/25",
  "nextHop": "192.168.2.2"
}
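
As a quick sanity check on the ranges quoted above, the same CIDR arithmetic can be reproduced outside Elasticsearch. This is a minimal sketch using Python's standard ipaddress module; the helper name cidr_bounds is hypothetical and only used for illustration:

```python
import ipaddress


def cidr_bounds(cidr):
    """Return the first and last address covered by a CIDR block, as strings."""
    network = ipaddress.ip_network(cidr)
    # network[0] is the network address, network[-1] the last address in the block
    return str(network[0]), str(network[-1])


for cidr in ("192.168.1.0/25", "192.168.1.128/25"):
    first, last = cidr_bounds(cidr)
    print(f"{cidr}: {first} to {last}")
```

The two blocks do not overlap, so any address in 192.168.1.0/24 falls into exactly one of the routes.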

To find the next hop for traffic destined for a given IP address, we can match the IP against the destination field using a regular match query. Doing this will find documents where the ip_range contains the IP address:

GET routes/_search
{
  "query": {
    "match": {
      "destination": "192.168.1.15"
    }
  }
}

...

{
  ...
  "hits" : {
    ...
    "hits" : [
      {
        ...
        "_source" : {
          "name" : "Route 1",
          "destination" : "192.168.1.0/25",
          "nextHop" : "192.168.2.1"
        }
      }
    ]
  }
}
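
Conceptually, the query above is a containment test of one address against each stored range. The sketch below mimics that test client-side with Python's ipaddress module. It only illustrates the semantics and is not how Elasticsearch evaluates the query; ROUTES and routes_containing are hypothetical names:

```python
import ipaddress

# The two documents indexed earlier, held as plain dictionaries
ROUTES = [
    {"name": "Route 1", "destination": "192.168.1.0/25", "nextHop": "192.168.2.1"},
    {"name": "Route 2", "destination": "192.168.1.128/25", "nextHop": "192.168.2.2"},
]


def routes_containing(address, routes):
    """Return the routes whose destination network contains the address."""
    ip = ipaddress.ip_address(address)
    return [r for r in routes if ip in ipaddress.ip_network(r["destination"])]


print([r["name"] for r in routes_containing("192.168.1.15", ROUTES)])
```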

A match query will also match against an ip field. To find routes with a given nextHop address:

GET routes/_search
{
  "query": {
    "match": {
      "nextHop": "192.168.2.2"
    }
  }
}

...

{
  ...
  "hits" : {
    ...
    "hits" : [
      {
        ...
        "_source" : {
          "name" : "Route 2",
          "destination" : "192.168.1.128/25",
          "nextHop" : "192.168.2.2"
        }
      }
    ]
  }
}

An ip field can also be searched using a range query to find documents with an IP in a given span of addresses. The bounds of a range query must be plain IP addresses rather than CIDR notation (a term query against an ip field does accept CIDR notation):

GET routes/_search
{
  "query": {
    "range": {
      "nextHop": {
        "gte": "192.168.2.0",
        "lte": "192.168.2.4"
      }
    }
  }
}

...

{
  ...
  "hits" : {
    ...
    "hits" : [
      {
        ...
        "_source" : {
          "name" : "Route 1",
          "destination" : "192.168.1.0/25",
          "nextHop" : "192.168.2.1"
        }
      },
      {
        ...
        "_source" : {
          "name" : "Route 2",
          "destination" : "192.168.1.128/25",
          "nextHop" : "192.168.2.2"
        }
      }
    ]
  }
}
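
When the network is only known in CIDR form, the gte and lte bounds for a range query can be computed on the client before sending the request. The following is a minimal sketch, assuming the request body is built in Python; ip_range_query is a hypothetical helper, not part of any Elasticsearch client library:

```python
import ipaddress
import json


def ip_range_query(field, cidr):
    """Build a range query body covering every address in the CIDR block."""
    network = ipaddress.ip_network(cidr)
    return {
        "query": {
            "range": {
                field: {
                    "gte": str(network[0]),   # first address in the block
                    "lte": str(network[-1]),  # last address in the block
                }
            }
        }
    }


print(json.dumps(ip_range_query("nextHop", "192.168.2.0/24"), indent=2))
```

The printed JSON can be used as the body of a GET routes/_search request.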

One edge case that cropped up recently on the Elastic discussion forum is searching for documents by their exact network, written in CIDR notation. In this example, that would mean looking up a specific route.

Using a match query doesn’t work in this case:

GET routes/_search
{
  "query": {
    "match": {
      "destination": "192.168.1.0/25"
    }
  }
}

...

{
  "error" : {
    "root_cause" : [
      {
        "type" : "query_shard_exception",
        "reason" : "failed to create query: '192.168.1.0/25' is not an IP string literal.",
        "index_uuid" : "FfFYQWtfTQOVxlJM4lUJXg",
        "index" : "routes"
      }
    ],
    ...
  }
}
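
The error says that the query value is parsed as a single IP address, and a CIDR block is not one. Python's ipaddress module draws the same distinction, which makes for a small analogy. This sketch says nothing about Elasticsearch's own parser, and is_ip_literal is a hypothetical helper:

```python
import ipaddress


def is_ip_literal(value):
    """Return True if the string parses as a single IP address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


print(is_ip_literal("192.168.1.15"))    # a single address
print(is_ip_literal("192.168.1.0/25"))  # a network in CIDR notation
```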

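To allow this type of query, add a keyword multi-field to the ip_range field. The keyword sub-field indexes the original string as-is, so the route can then be found using the original CIDR notation. Here we create a new index with the updated mapping, reindex the existing documents into it, and search the new sub-field: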

PUT routes_fixed
{
  "settings": {
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "destination": {
        "type": "ip_range",
        "fields": {
          "raw": {
            "type": "keyword"
          }
        }
      },
      "nextHop": {
        "type": "ip"
      }
    }
  }
}

POST _reindex
{
  "source": {
    "index": "routes"
  },
  "dest": {
    "index": "routes_fixed"
  }
}

GET routes_fixed/_search
{
  "query": {
    "match": {
      "destination.raw": "192.168.1.0/25"
    }
  }
}

...

{
  ...
  "hits" : {
    ...
    "hits" : [
      {
        ...
        "_source" : {
          "name" : "Route 1",
          "destination" : "192.168.1.0/25",
          "nextHop" : "192.168.2.1"
        }
      }
    ]
  }
}
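
Because a keyword field compares strings exactly, two spellings of the same network will not match each other. One possible safeguard, sketched here as an assumption rather than something from the original example, is to normalise CIDR strings to a canonical form before indexing and before querying. canonical_cidr is a hypothetical helper built on Python's ipaddress module:

```python
import ipaddress


def canonical_cidr(value):
    """Return the canonical string form of a CIDR block."""
    return str(ipaddress.ip_network(value))


# IPv4 blocks written in the usual way are already canonical
print(canonical_cidr("192.168.1.0/25"))

# IPv6 blocks can be spelled in several equivalent ways
print(canonical_cidr("2001:0DB8:0000::/32"))
print(canonical_cidr("2001:db8::/32"))
```

Applying the same normalisation on both the indexing and the query side keeps the destination.raw lookup predictable.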

Multi-fields are useful whenever a piece of data needs to be queried in more than one way. Text data is a common use case: a piece of text can be stored in a regular text field with a keyword version as a multi-field, so that the exact text can be queried or used in aggregations. Indexing the same text with multiple analyzers is another common requirement.