Elasticsearch Array Support and Type Mapping

Posted at 2017-10-23

This post covers the things that bothered me while using Elasticsearch/Kibana, along with the procedures and configurations needed to deal with those troubles.

1. Elasticsearch JSON Support

A while ago I sent a POST request to Elasticsearch, and it returned an error like "type":"illegal_argument_exception","reason":"mapper [sys-id] of different type, current_type [long], merged_type [text]".

root@ubuntu:~# curl -XPOST "http://10.62.130.226:9200/test_index/test_type"  -d ' 
{
"sys-id":["50*******", "xbrick", 1]
}
'

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"mapper [sys-id] of different type, current_type [long], merged_type [text]"}],"type":"illegal_argument_exception","reason":"mapper [sys-id] of different type, current_type [long], merged_type [text]"},"status":400}

According to the official document https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html, Elasticsearch does not support arrays that contain multiple types, such as an integer and a string mixed together like [ 10, "some string" ].
The document says,

When adding a field dynamically, the first value in the array determines the field type. All subsequent values must be of the same datatype or it must at least be possible to coerce subsequent values to the same datatype.
Arrays with a mixture of datatypes are not supported: [ 10, "some string" ]
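
For example, the coercion rule means that an array like the one below is accepted under default dynamic mapping: the first value makes the field a long, and the string "2" can be parsed into that type. (A sketch with hypothetical index and field names.)

curl -XPOST "http://elasticsearch_endpoint/example_index/example_type" -H 'Content-Type: application/json' -d '
{
  "ids": [1, "2", 3]
}'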

What bothered me was that the JSON inputs come from various platforms via REST APIs.
If I cautiously parsed each JSON document and picked out values for Elasticsearch, the type inconsistency error could indeed easily be avoided. However, given that there are several platforms with dozens of API endpoints, each returning different key-value pairs in different JSON formats, writing code to parse specific JSON inputs from the REST APIs and re-format them into an Elasticsearch-supported shape did not seem like an efficient way to handle them. (If you write code that parses specific key-value pairs, you have to rewrite it every time some other key-value pair becomes necessary for whatever reason.)
As a result, to deal with this Elasticsearch array support limitation, I simply wrote code that converts every integer value, e.g. 1 and 2, into the strings "1" and "2". Thanks to this simple JSON re-formatting alone, all JSON inputs from the various REST API endpoints are passed to Elasticsearch without errors; the idea is sketched below.
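
My original converter is not reproduced here, but the idea can be sketched with jq (a sketch, assuming jq 1.6 or later for the walk builtin; response.json and the endpoint/index/type names are placeholders): recursively replace every number in the API response with its string representation before posting.

# Stringify every number in the API response, then post the result to Elasticsearch.
cat response.json \
  | jq 'walk(if type == "number" then tostring else . end)' \
  | curl -XPOST "http://elasticsearch_endpoint/index_name/type_name" -H 'Content-Type: application/json' -d @-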

Memo
At the time, the decision to pass all JSON inputs from the REST APIs straight to Elasticsearch did not seem like a bad idea, because it would reduce the cost of future code rewriting, and the JSON re-formatting caused no data loss. It was guaranteed that all data from the APIs was stored in Elasticsearch, at least.
Looking back, on the other hand, I think I should have designed more carefully and picked exactly which values really needed to be retrieved from the API responses. Admittedly, planning ahead for which data you really need, before you have seen it, is hard. However, given that Elasticsearch was not as flexible as I expected (described in the later part of this post), that array-type values are not well supported by Kibana either, that most of the data from the APIs went unused (the APIs return massive amounts of JSON), that some values needed their types changed via mapping, and that too many values displayed on the Kibana GUI hurt usability, the system architecture for JSON formatting and value selection should have been thought through a little more.

2. Type Mapping

https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html

Not all digits in the JSON returned from a REST API are stored as numeric values. (Especially in my case, because of the type inconsistency error in arrays, all digits in the JSON were converted into string format.) Those string-formatted digits (enclosed in quotation marks) are treated as the text type in Elasticsearch. As a result, you need to add a mapping configuration to your Elasticsearch index to use such values as a numeric type.

You can use a PUT request to add a mapping configuration to an index, as below.

curl -XPUT "http://elasticsearch_endpoint/index_name" -d '
{
  "mappings": {
    "type_name": {
      "properties": {
        "value_name_to_change_type_mapping": {
          "type": "value_type_to_convert" 
        },
        "example_data_GB": {
          "type": "double"  <- You can set mapping type into which "example_data_GB" inputs are converted.
        }
      }
    }
  }
} '

This method is simple; however, what you need to be careful about is that you can't update the mapping configuration for existing fields. That is, if "example_data_GB" is already stored as the text type in your index, you can't change the value type to numeric.
So what should you do when you want to use values as numeric but they are already stored as text? In that case, you need to consider re-indexing.
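
Note that the put-mapping API linked above can still be used to add new fields to an existing index; it is only changing the type of an already-mapped field that is rejected. A minimal sketch using the ES 5.x-style typed _mapping endpoint (new_value_GB is a hypothetical field name):

curl -XPUT "http://elasticsearch_endpoint/index_name/_mapping/type_name" -H 'Content-Type: application/json' -d '
{
  "properties": {
    "new_value_GB": {
      "type": "double"
    }
  }
} '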

3. Re-indexing

https://www.elastic.co/guide/en/elasticsearch/reference/5.5/docs-reindex.html

This is one of the reasons why I think Elasticsearch is not so flexible. When some value types in an existing index need to be converted, you have to re-index it.
Re-indexing itself is not a hard procedure; just use the following command. Note that the source and target indices must have different names.

curl -XPOST 'http://elasticsearch_endpoint/_reindex?pretty' -H 'Content-Type: application/json' -d '
{
  "source": {
    "index": "source_index"
  },
  "dest": {
    "index": "target_index"
  }
}
'

You can convert an existing type mapping using re-indexing with the following steps.
(The steps below assume that you don't want the original index's name to change.)
Here, Step 5 is the most important part of converting the type mapping. You need to know that re-indexing just moves data from one index to another; the target index does not inherit the original index's mapping or other configuration. As a result, you need to carefully apply not only the additional mapping configuration but also the existing configurations to the target. A command-level sketch of the whole sequence follows the list.

1. Create a new temp index.
2. Reindex the original index to the temp index.
3. Delete the original index.
4. Create a new index with the same name as the original index.
5. Add mapping and other configurations to the new original index.
6. Reindex from the temp index to the new index.
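
Putting the steps together, a minimal end-to-end sketch looks like this (the index, type, and field names are placeholders; string values such as "12.5" are coerced to double during the final reindex):

# Steps 1-2: create a temp index and copy the data into it.
curl -XPUT "http://elasticsearch_endpoint/temp_index"
curl -XPOST "http://elasticsearch_endpoint/_reindex?pretty" -H 'Content-Type: application/json' -d '
{
  "source": { "index": "original_index" },
  "dest": { "index": "temp_index" }
}'

# Steps 3-5: delete the original, then recreate it with the desired mapping.
curl -XDELETE "http://elasticsearch_endpoint/original_index"
curl -XPUT "http://elasticsearch_endpoint/original_index" -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "type_name": {
      "properties": {
        "example_data_GB": { "type": "double" }
      }
    }
  }
}'

# Step 6: copy the data back into the recreated index.
curl -XPOST "http://elasticsearch_endpoint/_reindex?pretty" -H 'Content-Type: application/json' -d '
{
  "source": { "index": "temp_index" },
  "dest": { "index": "original_index" }
}'

Once the last reindex finishes, the temp index can be deleted.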

4. Pipeline

https://www.elastic.co/guide/en/elasticsearch/reference/master/accessing-data-in-pipelines.html

Elasticsearch and Kibana support time-series data, but most API JSON responses don't contain a timestamp. Therefore, to use the JSON inputs as time-series data, you need to add a timestamp to each document when it is passed to Elasticsearch.
One method is to add the timestamp to the JSON programmatically before passing it to Elasticsearch. But you can also use the ingest pipeline API built into Elasticsearch, in the following steps.

First, define an ingest pipeline.
The command below creates a new pipeline named timestamp.
By calling the pipeline when posting a JSON document to Elasticsearch, a timestamp field is added to the document.

curl -XPUT 'http://elasticsearch_endpoint/_ingest/pipeline/timestamp?pretty' -H 'Content-Type: application/json' -d '
{
  "description" : "Adds a timestamp field",
  "processors" : [
    {
      "set" : {
        "field": "timestamp",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
'

Second, add a mapping configuration to convert the timestamp format. (Optional)

curl -XPUT "http://elasticsearch_endpoint/index_name" -d '
{
  "mappings": {
    "index_type": {
      "properties": {
       "timestamp": {
          "type":   "date",
          "format": "EEE MMM dd HH:mm:ss ZZZ yyyy"
        }
      }
    }
  }
} '

Third, post the JSON to Elasticsearch through the timestamp pipeline.

curl -XPOST 'http://elasticsearch_endpoint/index_name/type_name/?pipeline=timestamp&pretty' -H 'Content-Type: application/json' -d '
<json to post>
'

Through the pipeline, Elasticsearch stores each JSON input with an additional timestamp field, so you can handle the documents as time-series data.
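
To verify, you can search the index and check that each stored document now carries the timestamp field:

curl -XGET "http://elasticsearch_endpoint/index_name/_search?pretty"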
