Changing a field type from string to numeric mid-setup, with past string values


(Bronius Motekaitis) #1

A custom script produces a JSON message that is logged into my new Graylog instance, but its numeric fields are interpreted and stored as strings. Consequently, I cannot chart against these values.

Ex. Message:

{"level": "6", "details": {"memorypercent": "0.4954", "load_01": "0.47", "load_05": "0.42", "load_15": "0.35", "disksda3percent": "0.64", "disksdc1percent": "0.64"}}

I fixed this by changing the source message, removing the quotes surrounding the numeric values. Ex. Message:

{"level": 6, "details": {"memorypercent": 0.4938, "load_01": 0.35, "load_05": 0.30, "load_15": 0.31, "disksda3percent": 0.64, "disksdc1percent": 0.64}}
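For anyone whose producing script currently emits quoted numbers, the change described above might look like the following minimal Python sketch. The original script's language is not stated in this thread, so Python and the helper names `to_number` and `unquote_numbers` are assumptions made purely for illustration.

```python
import json
import math


def to_number(value):
    """Return int/float for a numeric-looking string; leave anything else untouched."""
    if not isinstance(value, str):
        return value
    try:
        return int(value)
    except ValueError:
        pass
    try:
        number = float(value)
    except ValueError:
        return value  # e.g. "-" stays a string
    # NaN and Infinity are not valid JSON numbers, so keep those as strings.
    return number if math.isfinite(number) else value


def unquote_numbers(payload):
    """Recursively convert numeric strings inside a nested dict."""
    return {
        key: unquote_numbers(val) if isinstance(val, dict) else to_number(val)
        for key, val in payload.items()
    }


quoted = {"level": "6", "details": {"memorypercent": "0.4954", "load_01": "0.47"}}
message = json.dumps(unquote_numbers(quoted))
print(message)
```

The printed message carries the same keys as before, but the values are serialized without surrounding quotes.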

Following a suggestion I read, I narrowed the search to a time window showing only the recently logged items. Even with only the now-numeric fields showing, I still get the message “Could not create field graph. Field graphs are only available for numeric fields.” How can I get Graylog to recognize these fields as numeric?

Thanks
-Bronius


(Sachin) #2

Use pipeline rules to convert them to numeric with the to_double function.


(Bronius Motekaitis) #3

Is this really necessary? If json presents the data as numeric, Graylog can’t just know it’s numeric? … and do I really have to do this for all json extracted fields I wish to be treated as numeric?

I did create a pipeline rule to convert using to_double as suggested, but it doesn’t seem to resolve my issue. Here’s the rule that I’ve applied to the All Messages stream against which I am querying to try to generate a chart (even though eventually I’d like it in the stream I’ve dedicated to this particular message type):

rule "convert string float to numeric"
when
    has_field("application_name") && $message.application_name == "sysmetrics"
then
//    debug(concat("original: ", to_string($message.details_load_01)));
    set_field("details_load_01", to_double($message.details_load_01));
//    debug(concat("converted: ", to_string(to_double($message.details_load_01))));
end

I have confirmed this pipeline rule is run by tail’ing /var/log/graylog-server/server.log where I see my debug messages.

Here’s the order of processors:

1	GeoIP Resolver	active
2	Message Filter Chain	active
3	Pipeline Processor	active

[edit]
Note: Generate Chart does work on a field if I create a new field based on the old like set_field("details_load_01_converted", to_double($message.details_load_01)); so it’s just not overwriting the original value. :frowning: I wonder what else I could try? And I wonder why the json parsed numeric is not automatically interpreted as such… Maybe I can delete the index and start over?
[/edit]


(Bronius Motekaitis) #4

OK, the pipeline field rewrite rule is not necessary in my case: Graylog/Elasticsearch does treat a numeric JSON value as numeric on store, provided the first value seen for that field is numeric to begin with, and I can then do numeric operations on it like Generate Chart. I’ll have to rewrite all my JSON to not quote numeric fields if I want to take advantage of this most efficiently.

An old thread suggested that I “roll the index” or “cycle the deflector.” I disabled my pipeline rule and, not seeing these exact StarTrek-like options in 2.3, took a guess and selected Rotate Active Index from the index Maintenance menu. After limiting my search to a short window following the rotation, I am able to Generate Chart from the now-numeric values.
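Why rotating helped can be illustrated with a toy model. This is not Elasticsearch code: the class `ToyIndex` and its type names are hypothetical, and it only mimics the behavior described in this thread, where a field's type is fixed by the first value an index sees and a fresh index starts with an empty mapping.

```python
class ToyIndex:
    """Toy stand-in for an index: remembers the type of the first value per field."""

    def __init__(self):
        self.mapping = {}

    def index(self, doc):
        for field, value in doc.items():
            guessed = "string" if isinstance(value, str) else "number"
            # setdefault keeps the existing type once a field has been seen.
            self.mapping.setdefault(field, guessed)


old_index = ToyIndex()
old_index.index({"details_load_01": "0.47"})  # quoted value arrives first
old_index.index({"details_load_01": 0.35})    # later numeric value does not change the type
print(old_index.mapping)

new_index = ToyIndex()                        # stands in for the index created by a rotation
new_index.index({"details_load_01": 0.35})
print(new_index.mapping)
```

In this model the old index keeps treating the field as a string, while the freshly created one picks up the numeric type from its first message.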

Thanks!
-Bronius


(Jan Doberstein) #5

Just as an addition: you can force a field's type using a template, i.e. a custom mapping:

http://docs.graylog.org/en/2.3/pages/configuration/elasticsearch.html#custom-index-mappings

That way you do not depend on how Elasticsearch detects the value types. Otherwise, if the first message happens to contain a string in that field, the field is stored as a string rather than a number for that whole index.
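As a rough sketch of what such a custom mapping could contain, the Python snippet below builds a template and prints it as JSON. It follows the general shape of the example in the linked 2.3 documentation. The index pattern `graylog_*` assumes the default index prefix, and the field name `details_load_01` with type `double` is borrowed from earlier in this thread as an illustration, so adjust both to your setup and Elasticsearch version.

```python
import json

# Assumed template for a Graylog 2.3-era Elasticsearch; adjust names to your setup.
custom_mapping = {
    "template": "graylog_*",  # assumes the default index prefix "graylog"
    "mappings": {
        "message": {
            "properties": {
                # Force this field to be numeric regardless of the first value seen.
                "details_load_01": {"type": "double"},
            }
        }
    },
}

template_json = json.dumps(custom_mapping, indent=2)
print(template_json)
```

Per the linked documentation, the resulting JSON is uploaded to Elasticsearch as an index template. It only applies to indices created afterwards, so the active index has to be rotated once more before the forced type takes effect.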


(Bronius Motekaitis) #6

Excellent tip - thanks @jan. I have at least one field coming from apache json that is a valid user id 90% of the time (Long) but sometimes “-” when not set by the application. Will Graylog crap out on “-” and not process the entire message, the field, or …?


(Jan Doberstein) #7

Hej @texas-bronius,

It will, as it already does today, if the message content does not fit the field's type. But if you have such known fields, you might want to check those fields/messages in a pipeline and remove the fields that do not match the configured/wanted field type.

Just to keep the messages safe and clean.


(Bronius Motekaitis) #8

Sorry to be dense here (and I am also still looking online for clarification): If a field is known to be, say, Long, and message processor encounters a String like “-”, will the field be set to some empty value, will the field be discarded, or will the whole message get discarded? And when this happens, will there be a new error logged each time in graylog-server/server.log?

Thanks
-Bronius

[update]
I think I get it: If a rule has a failing statement (like trying to set_field("remoteIP", to_ip($message.remoteIP)) when remoteIP is a string and not an IP Address Object), then the whole message is skipped. So to answer my question above, if to_long() were to fail, the whole message would be dropped. Except that with to_long(), the default is a 0. In my use case, since 0 is a valid user ID (“Anonymous user” in Drupal), I can’t have both “-” and “0” mean “Anonymous”, but I can override the default for that call with, say, to_long($message.uid, -1) and I’m off to the races!
[/update]
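The sentinel-default idea from the update can be sketched outside Graylog as well. The Python function below is only an analogue of the behavior described above, not Graylog's implementation, and the name `to_long_like` is made up for this example.

```python
def to_long_like(value, default=0):
    """Return value as an int, or the default when it cannot be converted."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


print(to_long_like("42"))     # 42
print(to_long_like("-"))      # 0, indistinguishable from a real uid of 0
print(to_long_like("-", -1))  # -1, a sentinel outside the valid uid range in this use case
```

Choosing a default that can never be a legitimate value keeps “not set” and “Anonymous” apart in later searches and charts.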


(Jan Doberstein) #9

I’m glad that you found the answer before I was able to jump in, and that you shared it! You earn a double star for that!

Thank you for the active work in the community!



