Pipeline Rule - converted numbers changed to zeros

(Steve Applegate) #1

I’m trying to create a pipeline rule to split a message containing comma-separated lines into individual fields. The first three fields are strings and work fine. The remaining three fields are converted to numbers, but the value is always zero. Help please!

Pipeline Rule:
rule "Split Disk Pool Usage"
when
  // condition not shown in the post; true applies the rule to every message
  true
then
  let x = split(",", to_string($message.message));
  set_field("PoolType", x[1]);
  set_field("PoolName", x[2]);
  set_field("PoolCapacity", to_double(x[3]));
  set_field("PoolFreeSpace", to_double(x[4]));
  set_field("PoolUsed", to_long(x[5]));
end
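For what it’s worth, the split-and-convert logic itself is sound on clean input. A minimal Python sketch of the same steps, using a hypothetical message line modeled on values that appear elsewhere in this thread:

```python
# Hypothetical message line (the real one is only shown as a screenshot);
# the numeric values are modeled on numbers seen later in this thread.
line = "NBUDiskPool,PureDisk,dp_disk_nbu5230-03,36095.97,15791.01,56"

x = line.split(",")            # same idea as split(",", ...) in the rule
pool_type = x[1]               # "PureDisk"
pool_name = x[2]               # "dp_disk_nbu5230-03"
pool_capacity = float(x[3])    # the to_double(x[3]) equivalent
pool_free_space = float(x[4])
pool_used = int(x[5])          # the to_long(x[5]) equivalent

print(pool_type, pool_name, pool_capacity, pool_free_space, pool_used)
```

On plain ASCII input these conversions succeed, which suggests the zeros come from the message content rather than from the rule itself.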

example message:

search results:
NetBackup | PoolCapacity | PoolFreeSpace | PoolName | PoolType | PoolUsed
NBUDiskPool | 0 | 0 | dp_disk_nbu5230-03 | PureDisk | 0

message field from search results:

(Jan Doberstein) #2

Remove the number conversion in the processing pipeline. It does not make sense there!

The type set in the processing pipeline does not have any influence on how the data is stored in Elasticsearch.

(Steve Applegate) #3

Ok, I’m stumped then. Here’s the post I referenced when trying to get the fields stored as numbers: Changing a field type from string to numeric and mid-setup and past string values. I must have misunderstood.

When I removed the number conversion from the rule, the rule failed. I’m assuming that was because those three fields had already been created as numeric (with zero as every value). To test my theory, I renamed the fields, and the rule immediately started working and storing what look like the correct numbers in the new fields. However, when I try to create a chart using one of the fields, the number of field occurrences gets charted instead of the field value. That’s my root problem. To me, this seems to be caused by the field being stored as a string instead of as a number. That’s just a guess, though, because I don’t know how to check the field data type. If my guess is correct, I have no idea why something like 15791.01 would be interpreted as a string.
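For reference, Elasticsearch can be asked directly what type it assigned to a field via its field-mapping API. A sketch, assuming Elasticsearch is listening on localhost:9200 and Graylog’s default graylog_* index naming (the field name here is a placeholder; substitute your own):

```shell
# Query the mapped data type of a field across the graylog_* indices.
# "PoolCapacity" is a placeholder field name from this thread.
curl -s 'http://localhost:9200/graylog_*/_mapping/field/PoolCapacity?pretty'
```

The response shows the concrete type (text/keyword, long, float, …) per index, which settles whether the field was stored as a string or a number.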

(Jan Doberstein) #4

you can force Elasticsearch to store that as a number with a custom mapping (take a look at our docs on how to create one)

(Steve Applegate) #5

I’ve removed the data type conversion from the rule:

rule "Split NBUDiskPool message into fields"
when
  contains(to_string($message.message), "NBUDiskPool", true)
then
  let x = split(",", to_string($message.message));
  set_field("DP_Type", x[1]);
  set_field("DP_Name", x[2]);
  set_field("DP_Capacity", x[3]);
  set_field("DP_FreeSpace", x[4]);
  set_field("DP_PercentUsed", x[5]);
end

I’ve created the custom mapping. Due to the newer version of Elasticsearch, I had to add -H 'Content-Type: application/json' to the end of the line, so it looks like this:

curl -X PUT -d @'graylog-custom-mapping.json' 'http://localhost:9200/_template/graylog-custom-mapping?pretty' -H 'Content-Type: application/json'

I rotated the index and recalculated the index ranges. I can see that Elasticsearch is using the new data types. I then ran the simulation on my pipeline rule, and I can see the message field successfully being broken out into individual fields.


Unfortunately, these new fields are not being added to Elasticsearch. Only the message field is.


(Jan Doberstein) #6

Are you sure?

Do you see no added fields on the left?

(Steve Applegate) #7

I don’t see any new fields in the Search page. The “NetBackup” field gets created by an Extractor on the Input. The rest of the fields should be created by the pipeline rule.


(Jan Doberstein) #8

And did you connect the processing pipeline containing the rule to any stream of messages? And what is your processing order in System > Configurations?

(Steve Applegate) #9

When data comes into the Input, an extractor pulls off the first item of the line and puts it in a field called “NetBackup”. There is then a stream that grabs everything that has a “NetBackup” field. This stream is connected to my pipeline that contains the rule that splits the message into fields. When I generate a new log entry (I use NXLog), I can watch the Rule for my pipeline stage and see the message go through.

(Steve Applegate) #10

New info! I just found these entries in the Graylog server.log file. I can see that Elasticsearch is trying to use the data type that I defined in the custom mapping file. It’s failing, though:

2019-02-13T13:02:53.013-05:00 WARN [Messages] Failed to index message: index=<graylog_1515> id=<94f61040-2fb9-11e9-aa0b-0050569313ff> error=<{"type":"mapper_parsing_exception","reason":"failed to parse field [DP_Capacity] of type [float]","caused_by":{"type":"number_format_exception","reason":"For input string: \"3\u00006\u00000\u00009\u00005\u0000.\u00009\u00007\""}}>
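Those `\u0000` sequences between the digits are NUL characters — the typical signature of UTF-16/UCS-2 encoded text (common for Windows sources shipped via NXLog) being read as single-byte text. That would also explain the original zeros: `to_double()` cannot parse a string with embedded NULs, so it falls back to its default of 0. A Python sketch of the failing string from that log entry:

```python
# The digits from the error above, with the interleaved NUL characters.
raw = "3\u00006\u00000\u00009\u00005\u0000.\u00009\u00007"

try:
    float(raw)                        # what a number parser attempts
except ValueError:
    print("not parseable as-is")      # fails, like Elasticsearch's float parser

cleaned = raw.replace("\u0000", "")   # strip the NULs
print(cleaned, float(cleaned))        # parses fine once the NULs are gone
```

If that is the cause, the fix belongs at the shipper: NXLog can recode input to UTF-8 (e.g. with its xm_charconv extension) so the message arrives as plain text.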

(Jan Doberstein) #11

According to that message, you have defined a float and are trying to input a string.

(Steve Applegate) #12

Yes, this has always been the problem. I have a log file with data in it like this:


I ship it to Graylog as Raw/Plaintext. I want the first three fields stored as text and the last three fields stored as numbers. On the surface this seems like an easy thing to do. I’ve tried GROK extractors, pipeline rules, and custom mappings in Elasticsearch. Still no luck. I just can’t figure out why something like 72191.94 can’t be stored as a number.

(Jan Doberstein) #13

please read on https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html

(Steve Applegate) #14

Thanks Jan. I’ve read that document several times, and as far as I can tell I’m doing things correctly. I read in a string from a raw/Plaintext input. Here’s an example:


I split the line into new fields using a pipeline rule. At this point, the values are strings. If I convert them to numbers in the rule, the values become zeros (see the initial problem description). Here’s the output from the Graylog simulator for that pipeline:


I have created a custom mapping file for Elasticsearch as follows:

{
  "template": "graylog_*",
  "mappings" : {
    "message" : {
      "properties" : {
        "DP_Capacity" : {
          "type" : "long"
        },
        "DP_FreeSpace" : {
          "type" : "long"
        },
        "DP_PercentUsed" : {
          "type" : "long"
        }
      }
    }
  }
}
So as it stands, if I try to store the string representation of a number like 80994 into a numeric field in Elasticsearch, I get a datatype mismatch error. If I try to convert the string to a number in Graylog first, and then store it in Elasticsearch, the value is always zero. Am I missing a step somewhere? I’ve read every document I can find on the subject from both Graylog and Elasticsearch.