Pipeline Rule - converted numbers changed to zeros

(Steve Applegate) #1

I’m trying to create a pipeline rule to split a message containing comma-separated lines into individual fields. The first three fields are strings and work fine. The remaining three fields are converted to numbers, but always have zero as the value. Help please!

Pipeline Rule:
rule "Split Disk Pool Usage"
when
  has_field("NetBackup")
then
  let x = split(",", to_string($message.message));
  set_field("PoolType", x[1]);
  set_field("PoolName", x[2]);
  set_field("PoolCapacity", to_double(x[3]));
  set_field("PoolFreeSpace", to_double(x[4]));
  set_field("PoolUsed", to_long(x[5]));
end

example message:
NBUDiskPool,PureDisk,dp_disk_nbu5230-03,80964.26,15791.01,80

search results:
NetBackup | PoolCapacity | PoolFreeSpace | PoolName | PoolType | PoolUsed
NBUDiskPool | 0 | 0 | dp_disk_nbu5230-03 | PureDisk | 0

message field from search results:
NBUDiskPool,PureDisk,dp_disk_nbu5230-03,80964.26,15791.01,80
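For reference, the transformation the rule is meant to perform works fine on a clean copy of the sample line; here is a minimal Python sketch of the same split-and-convert logic:

```python
line = "NBUDiskPool,PureDisk,dp_disk_nbu5230-03,80964.26,15791.01,80"
x = line.split(",")           # same as split(",", ...) in the rule

pool_type = x[1]              # "PureDisk"
pool_name = x[2]              # "dp_disk_nbu5230-03"
pool_capacity = float(x[3])   # 80964.26
pool_free_space = float(x[4]) # 15791.01
pool_used = int(x[5])         # 80
```

Since the conversions succeed on this text, zeros from to_double() would suggest the string the rule actually receives differs from what the search results display.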

(Jan Doberstein) #2

Remove the number conversion in the processing pipeline rule. It does not make sense there!

The type set in the processing pipeline does not have any influence on how the data is stored in Elasticsearch.

(Steve Applegate) #3

Ok, I’m stumped then. Here’s the post I referenced when trying to get the fields stored as numbers: Changing a field type from string to numeric and mid-setup and past string values. I must have misunderstood.

When I removed the number conversion from the rule, the rule failed. I’m assuming that was because those three fields had already been created as numeric (with zero as every value). To test my theory, I renamed the fields, and the rule immediately started working and storing what look like the correct numbers in the new fields. However, when I try to create a chart using one of the fields, the number of field occurrences gets charted instead of the field value. That’s my root problem. To me, this seems to be caused by the field being stored as a string instead of as a number. That’s just a guess, though, because I don’t know how to check the field data type. If my guess is correct, I have no idea why something like 15791.01 would be interpreted as a string.

(Jan Doberstein) #4

You can force Elasticsearch to store that as a number with a custom mapping (take a look at our docs for how to create one).

(Steve Applegate) #5

I’ve removed the data type conversion from the rule:

rule "Split NBUDiskPool message into fields"
when
    contains(to_string($message.message), "NBUDiskPool", true)
then
  let x = split(",", to_string($message.message));
  set_field("DP_Type", x[1]);
  set_field("DP_Name", x[2]);
  set_field("DP_Capacity", x[3]);
  set_field("DP_FreeSpace", x[4]);
  set_field("DP_PercentUsed", x[5]);
end

I’ve created the custom mapping. Due to the newer version of Elasticsearch, I had to add -H 'Content-Type: application/json', so the command looks like this:

curl -X PUT -d @'graylog-custom-mapping.json' 'http://localhost:9200/_template/graylog-custom-mapping?pretty' -H 'Content-Type: application/json'

I rotated the index and recalculated the index ranges, and I can see that Elasticsearch is using the new data types. I then ran the “simulation” on my pipeline rule, and I can see the message field successfully being broken out into individual fields.

image

Unfortunately, these new fields are not being added to Elasticsearch. Only the message field is.

image

(Jan Doberstein) #6

Are you sure?

Did you see no added fields on the left?

(Steve Applegate) #7

I don’t see any new fields in the Search page. The “NetBackup” field gets created by an extractor on the Input; the rest of the fields should be created by the pipeline rule.

image

(Jan Doberstein) #8

And did you connect the processing pipeline you put the rule in to any stream of messages? And what is your processing order in System > Configurations?

(Steve Applegate) #9

When data comes into the Input, an extractor pulls off the first item of the line and puts it in a field called “NetBackup”. There is then a stream that grabs everything that has a “NetBackup” field. This stream is connected to my pipeline that contains the rule that splits the message into fields. When I generate a new log entry (I use NXLog), I can watch the Rule for my pipeline stage and see the message go through.

(Steve Applegate) #10

New info! I just found these entries in the Graylog server.log file. I can see that Elasticsearch is trying to use the data type that I defined in the custom mapping file. It’s failing, though:

2019-02-13T13:02:53.013-05:00 WARN [Messages] Failed to index message: index=<graylog_1515> id=<94f61040-2fb9-11e9-aa0b-0050569313ff> error=<{"type":"mapper_parsing_exception","reason":"failed to parse field [DP_Capacity] of type [float]","caused_by":{"type":"number_format_exception","reason":"For input string: \"3\u00006\u00000\u00009\u00005\u0000.\u00009\u00007\""}}>
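That reason string is telling: there is a \u0000 (NUL) character between every digit, which is why the number parser rejects it. A quick Python sketch (the string is copied from the error above) shows that the same value parses cleanly once the NULs are stripped:

```python
# the "For input string" value from the error, with literal NUL characters
raw = "3\x006\x000\x009\x005\x00.\x009\x007"

try:
    value = float(raw)  # fails: NULs are not valid in a number
except ValueError:
    value = float(raw.replace("\x00", ""))  # strip NULs, then parse

print(value)  # 36095.97
```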

(Jan Doberstein) #11

According to that message, you have defined a float but are trying to store a string.

(Steve Applegate) #12

Yes, this has always been the problem. I have a log file with data in it like this:

NBUDiskPool,Quantum,dxi6802-01-dp,72191.94,38209.15,47
NBUDiskPool,Quantum,gpdxi31-dp,1791.92,303.69,83
NBUDiskPool,Quantum,dxi6802-02-dp,36095.97,19442.09,46
NBUDiskPool,Quantum,dxi6802-01-DR-dp,72191.94,38209.15,47

I ship it to Graylog as Raw/Plaintext. I want the first three fields stored as text and the last three fields stored as numbers. On the surface, this seems like it would be an easy thing to do. I’ve tried GROK extractors, I’ve tried pipeline rules, and I’ve tried custom mappings in Elasticsearch. Still no luck. I just can’t figure out why something like 72191.94 can’t be stored as a number.

(Jan Doberstein) #13

Please read https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html

(Steve Applegate) #14

Thanks Jan. I’ve read that document several times, and as far as I can tell I’m doing things correctly. I read in a string from a Raw/Plaintext input. Here’s an example:

`NBUDiskPool,PureDisk,dp_disk_nbu5230-03,80994,12072,85`

I split the line into new fields using a pipeline rule. At this point, the values are strings. If I convert them to numbers in the rule, the values become zeros (see the initial problem description). Here’s the output from the Graylog simulator for that pipeline:

image

I have created a custom mapping file for Elasticsearch as follows:

{
  "template": "graylog_*",
  "mappings" : {
    "message" : {
      "properties" : {
        "DP_Capacity" : {
          "type" : "long"
        },
        "DP_FreeSpace" : {
          "type" : "long"
        },
        "DP_PercentUsed" : {
          "type" : "long"
        }
      }
    }
  }
}

So as it stands, if I try to store the string representation of a number like 80994 into a numeric field in Elasticsearch, I get a datatype mismatch error. If I try to convert the string to a number in Graylog first, and then store it in Elasticsearch, the value is always zero. Am I missing a step somewhere? I’ve read every document that I can find on the subject from both Graylog and Elasticsearch.

(Steve Applegate) #15

I’ve tried every possible method of splitting a log entry from a Raw/Plaintext UDP input into fields and then having one of the fields stored as a number. I have created a custom mapping and tried all of the Elasticsearch number types. I’ve tried pipeline rules with and without data type conversions. I’ve tried GROK extractors on the Input. I’ve tried Split & Index extractors on the Input. Nothing works. At this point, I feel fairly confident in saying that it can’t be done. Should I start a new, more precise forum topic about the inability to do this, or should I submit the issue as a bug report?

(Jan Doberstein) #16

You might be holding it wrong - others and I do what you describe as not working on a daily basis …

Sorry to see you struggle with this basic task; I wish I could see where you have taken a wrong turn so I could point you back in the right direction.

(Steve Applegate) #17

Thanks for your reply Jan. I agree that this should be a very basic task. So simple that I really shouldn’t have to do anything: Graylog should know that 71292 is a number when it comes in on an Input. Here’s another (simplified) angle on my problem:

Here’s a data example:

NBUDiskPool PureDisk dp_disk_nbu5230-03 80835 32257 60

This GROK pattern works in the GROK Debugger

%{DATA:NetBackup} %{DATA:DP_Type} %{DATA:DP_PoolName} %{NUMBER:DP_Capacity}

When I put that pattern into an extractor, I get this error:

image

If I remove %{NUMBER:DP_Capacity} from the extractor, the rest of the extractor works fine, because everything else is a string. If I change %{NUMBER:DP_Capacity} to %{DATA:DP_Capacity}, the extractor works fine and gets the additional field, but again… it’s a string
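As a sanity check, a rough regex equivalent of that grok pattern does match a clean copy of the sample line. This Python sketch uses \S+ as a stand-in for DATA and a simple digit class for NUMBER (both approximations, not the real grok definitions):

```python
import re

line = "NBUDiskPool PureDisk dp_disk_nbu5230-03 80835 32257 60"

# approximate regex equivalents of %{DATA:...} and %{NUMBER:...}
pattern = (r"(?P<NetBackup>\S+) (?P<DP_Type>\S+) "
           r"(?P<DP_PoolName>\S+) (?P<DP_Capacity>[0-9.]+)")

m = re.match(pattern, line)
print(m.group("DP_Capacity"))  # 80835
```

So the pattern itself is sound; if the extractor errors on the same text, the bytes the extractor sees may differ from what the debugger was given.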

(Jan Doberstein) #18

I just took your example data and used it in the new Grok tester that ships with 3.0. The only missing option is “use named captures only”; that is why you see the “BASENUM” field, for example.

But anyway, you can see that the values are extracted correctly.

The second part of the story is saving the data in elasticsearch. So we first look into the documentation of Elasticsearch:

https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html

I do not know what values those numbers can reach, so I would consider long the right choice. If you have a fresh index without the fields being present, Elasticsearch should recognize the type and store the numbers as numbers - but if you provide a mapping, it would look like this:

curl -X PUT "localhost:9200/my_index" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "_doc": {
      "properties": {
        "DP_PoolName": {
          "type": "keyword"
        },
        "DP_Type": {
          "type": "keyword"
        },
        "NetBackup": {
          "type": "keyword"
        },
        "DP_Capacity": {
          "type": "long"
        },
        "secondnum": {
          "type": "long"
        },
        "thirdnum": {
          "type": "long"
        }
      }
    }
  }
}
'

You have no need to set the type in a processing pipeline or similar - that is only needed if you want to do something with the value inside the processing pipeline, which has no type awareness of its own, so you have to set it there.

I just tried your case and it is working for me.

(Steve Applegate) #19

Thank you for the examples Jan. I tried using the GROK pattern that you show, but it does not work for me. If I start with %{WORD:NetBackup}, it only extracts the letter “N” (see first image). If I put a space after %{WORD:NetBackup}, I get an error (see second image). I checked my installed GROK patterns, and “WORD” is listed as available (see third image). This is the same behavior I see if I try a NUMBER pattern.

image
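Incidentally, a WORD match that stops after a single letter is exactly what a regex engine produces when a non-word byte such as NUL sits between the characters - the same \u0000 pattern that appeared in the earlier indexing error. A hypothetical Python sketch (the NUL-interleaved string is an assumption about what the input might contain):

```python
import re

# hypothetical NUL-interleaved start of "NBUDiskPool"
raw = "N\x00B\x00U\x00D\x00"

# \w does not match NUL, so the match stops after one character
print(re.match(r"\w+", raw).group())  # N

# stripping the NULs restores the full word match
clean = raw.replace("\x00", "")
print(re.match(r"\w+", clean).group())  # NBUD
```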

(Jan Doberstein) #20
  • What version of Graylog are you using?
  • Could it be possible that the messages contain unprintable characters?