I'm trying to create a pipeline rule to split a message containing comma-separated lines into individual fields. The first three fields are strings and work fine. The remaining three fields are converted to numbers, but always have zero as the value. Help please!
Pipeline Rule:
rule "Split Disk Pool Usage"
when
  has_field("NetBackup")
then
  let x = split(",", to_string($message.message));
  set_field("PoolType", x[1]);
  set_field("PoolName", x[2]);
  set_field("PoolCapacity", to_double(x[3]));
  set_field("PoolFreeSpace", to_double(x[4]));
  set_field("PoolUsed", to_long(x[5]));
end
Example message:
NBUDiskPool,PureDisk,dp_disk_nbu5230-03,80964.26,15791.01,80
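(For reference: Graylog's conversion functions don't throw on bad input. to_double(value) quietly returns 0.0 when the string can't be parsed, and to_double(value, default) returns the supplied default instead - which is one way to tell a parse failure apart from a genuine zero. A minimal variant of the rule above using sentinel defaults; the -1 values are just arbitrary markers:)

rule "Split Disk Pool Usage (sentinel defaults)"
when
  has_field("NetBackup")
then
  let x = split(",", to_string($message.message));
  // to_double/to_long return the supplied default when parsing fails,
  // so -1 here means "the string did not parse" while 0 is a real value
  set_field("PoolCapacity", to_double(x[3], -1.0));
  set_field("PoolFreeSpace", to_double(x[4], -1.0));
  set_field("PoolUsed", to_long(x[5], -1));
end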
When I removed the number conversion from the rule, the rule failed. I'm assuming that was because those three fields had already been created as numeric (with zero as every value). To test my theory, I renamed the fields, and the rule immediately started working and storing what look like the correct numbers in the new fields. However, when I try to create a chart using one of the fields, the number of field occurrences gets charted instead of the field value. That's my root problem. To me, this looks like the field is being stored as a string instead of as a number. That's just a guess, though, because I don't know how to check the field data type. If my guess is correct, I have no idea why something like 15791.01 would be interpreted as a string.
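(Side note, since the question comes up: one way to check how a field was typed is to ask Elasticsearch for the index mapping directly. The host and index alias below are assumptions - adjust them to your setup:)

curl -s 'http://localhost:9200/graylog_deflector/_mapping?pretty'

(Look for the field under mappings > message > properties: "type": "keyword" or "text" means it was stored as a string, while "float" or "long" means a number.)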
I've removed the data type conversion from the rule:
rule "Split NBUDiskPool message into fields"
when
contains(to_string($message.message), "NBUDiskPool", true)
then
let x = split(",", to_string($message.message));
set_field("DP_Type", x[1]);
set_field("DP_Name", x[2]);
set_field("DP_Capacity", x[3]);
set_field("DP_FreeSpace", x[4]);
set_field("DP_PercentUsed", x[5]);
end
I've created the custom mapping. Due to the newer version of Elasticsearch, I had to add -H 'Content-Type: application/json' to the end of the line, so it looks like this:
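(The command itself didn't survive the forum formatting. Based on the Graylog custom-mapping documentation, it was presumably something like the following - the file name, template name, and host are placeholders:)

curl -X PUT -d @'graylog-custom-mapping.json' 'http://localhost:9200/_template/graylog-custom-mapping?pretty' -H 'Content-Type: application/json'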
I rotated the index and recalculated the index ranges, and I can see that Elasticsearch is using the new data types. I then ran the simulation on my pipeline rule, and I can see the message field successfully being broken out into individual fields.
Unfortunately, these new fields are not being added to Elasticsearch. Only the message field is.
I don't see any new fields on the Search page. The "NetBackup" field gets created by an extractor on the Input; the rest of the fields should be created by the pipeline rule.
And did you connect the processing pipeline containing the rule to a stream of messages? What is your message processing order under System > Configurations?
When data comes into the Input, an extractor pulls off the first item of the line and puts it in a field called "NetBackup". A stream then grabs everything that has a "NetBackup" field. This stream is connected to my pipeline, which contains the rule that splits the message into fields. When I generate a new log entry (I use NXLog), I can watch the rule in my pipeline stage and see the message go through.
New info! I just found these entries in the Graylog server.log file. I can see that Elasticsearch is trying to use the data type that I defined in the custom mapping file. It's failing, though:
2019-02-13T13:02:53.013-05:00 WARN [Messages] Failed to index message: index=<graylog_1515> id=<94f61040-2fb9-11e9-aa0b-0050569313ff> error=<{"type":"mapper_parsing_exception","reason":"failed to parse field [DP_Capacity] of type [float]","caused_by":{"type":"number_format_exception","reason":"For input string: \"3\u00006\u00000\u00009\u00005\u0000.\u00009\u00007\""}}>
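(Worth noting about that error: \u0000 is a NUL byte, and a NUL between every digit - 3 6 0 9 5 . 9 7 - is the classic signature of UTF-16 text being read as UTF-8. If that is what's happening here, which is only a guess, the cleanest fix is to re-encode at the shipper. A sketch of an NXLog input doing that, assuming the source file is UTF-16LE; the module names are real NXLog, the path is a placeholder:)

<Extension charconv>
    # xm_charconv provides convert_fields() for character-set conversion
    Module  xm_charconv
</Extension>

<Input netbackup>
    Module  im_file
    File    "C:\\nbu\\diskpool.log"
    # Re-encode from UTF-16LE to UTF-8 before shipping, so Graylog
    # never sees the interleaved NUL bytes
    Exec    convert_fields("UTF-16LE", "UTF-8");
</Input>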
I ship it to Graylog as Raw/Plaintext. I want the first three fields stored as text and the last three fields stored as numbers. On the surface, this seems like it should be easy. I've tried GROK extractors, pipeline rules, and custom mappings in Elasticsearch. Still no luck. I just can't figure out why something like 72191.94 can't be stored as a number.
Thanks, Jan. I've read that document several times, and as far as I can tell I'm doing things correctly. I read in a string from a Raw/Plaintext input. Here's an example:
NBUDiskPool,PureDisk,dp_disk_nbu5230-03,80964.26,15791.01,80
I split the line into new fields using a pipeline rule. At this point, the values are strings. If I convert them to numbers in the rule, the values become zeros (see the initial problem description). Here's the output from the Graylog simulator for that pipeline:
I have created a custom mapping file for Elasticsearch as follows:
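(The file contents were lost in the post. Judging by the mapper_parsing_exception above, which shows [DP_Capacity] mapped as [float], it would have looked roughly like this - the template name and index pattern are assumptions:)

{
  "template": "graylog_*",
  "mappings": {
    "message": {
      "properties": {
        "DP_Capacity":    { "type": "float" },
        "DP_FreeSpace":   { "type": "float" },
        "DP_PercentUsed": { "type": "long" }
      }
    }
  }
}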
So as it stands, if I try to store the string representation of a number like 80994 into a numeric field in Elasticsearch, I get a datatype mismatch error. If I try to convert the string to a number in Graylog first, and then store it in Elasticsearch, the value is always zero. Am I missing a step somewhere? I’ve read every document that I can find on the subject from both Graylog and Elasticsearch.
I've tried every possible method of splitting a log entry from a Raw/Plaintext UDP input into fields and then having one of the fields stored as a number. I have created a custom mapping and tried all of the Elasticsearch number types. I've tried pipeline rules with and without data type conversions. I've tried GROK extractors on the Input. I've tried Split & Index extractors on the Input. Nothing works. At this point, I feel fairly confident in saying that it can't be done. Should I start a new, more precise forum topic about the inability to do this, or should I submit the issue as a bug report?
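(One diagnostic that might shorten the search: the pipeline debug() function writes a value to the Graylog server log, which makes otherwise invisible characters - like the NUL bytes in the indexing error above - show up. A minimal rule, assuming the same stream routing as before:)

rule "Debug NBUDiskPool raw values"
when
  contains(to_string($message.message), "NBUDiskPool", true)
then
  let x = split(",", to_string($message.message));
  // debug() logs the value to the server log (System > Nodes or server.log);
  // stray bytes that break to_double() become visible there
  debug(x[3]);
end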
Thanks for your reply, Jan. I agree that this should be a very basic task - so simple that I really shouldn't have to do anything: Graylog should know that 71292 is a number when it comes in on an Input. Here's another (simplified) angle on my problem:
When I put that pattern into an extractor, I get this error:
If I remove %{NUMBER:DP_Capacity} from the extractor, the rest of the extractor works fine, because everything else is a string. If I change %{NUMBER:DP_Capacity} to %{DATA:DP_Capacity}, the extractor works fine and gets the additional field - but again, it's a string.
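(For completeness, a guess at the kind of pattern in play here - DATA is used for the pool name because it contains a hyphen, which WORD won't match. If I remember the syntax right, Graylog's GROK also accepts a type hint after a semicolon, e.g. %{NUMBER:DP_Capacity;float}, which types the field at extraction time:)

NBUDiskPool,%{WORD:DP_Type},%{DATA:DP_Name},%{NUMBER:DP_Capacity;float},%{NUMBER:DP_FreeSpace;float},%{NUMBER:DP_PercentUsed;int}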
I just took your example data and used it in the new GROK tester that ships with 3.0. The only missing option is "use named captures only"; that is why you see the "BASENUM" field, for example.
But anyway, you can see that the values are extracted correctly.
I do not know what values those numbers can reach, so I would consider long as the choice. If you have a fresh index without the fields being present, Elasticsearch should recognize the type and index the numbers as numbers - but if you provide a mapping, it will look like:
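(Jan's example mapping didn't survive the forum formatting either; with long as the type it would presumably mirror the earlier file, e.g.:)

{
  "template": "graylog_*",
  "mappings": {
    "message": {
      "properties": {
        "DP_Capacity": { "type": "long" }
      }
    }
  }
}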
You don't need to set the type in a processing pipeline or similar - that is only needed if you want to do something with the value inside the pipeline itself; because the pipeline has no type awareness, you have to set the type there.
Thank you for the examples, Jan. I tried using the GROK pattern that you show, but it does not work for me. If I start with %{WORD:NetBackup}, it only extracts the letter "N" (see first image). If I put a space after %{WORD:NetBackup}, I get an error (see second image). I checked my installed GROK patterns, and WORD is listed as available (see third image). This is the same behavior I see if I try a NUMBER pattern.
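(If the UTF-16 hunch from the indexing error above is right, this last symptom fits it exactly - the following is purely illustrative:)

Pattern:  %{WORD:NetBackup}
Input:    N\u0000B\u0000U\u0000D\u0000...   (UTF-16LE bytes read as UTF-8)
Capture:  NetBackup = "N"                   (\w+ stops at the first NUL byte)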