Extracting XML fields

hi,

I am trimming my extractors. I found the following regex extractor for Windows XML log format to consume a bit of resources (due to backtracking, I guess):

regex_value: <Message>(.+?)</Message>

This one sometimes seems to take quite a long time (like 2 seconds) to finish. What would be the most efficient way to extract the contents of the Message field? I did not find an XML extractor in Graylog that would just grab everything within the tags, so I guess some other extractor type or regex would be better.

I am not sure whether the log content can contain the < or > characters within these tags, so I am a bit reluctant to use the regex:

regex_value: <Message>([^<>]*)</Message>
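For comparison, here is a minimal Python sketch of the two patterns side by side. Graylog extractors run on Java's regex engine, so timings from Python's `re` will not carry over, and the sample event below is made up; the sketch only illustrates how the two patterns differ in what they match.

```python
import re

# Hypothetical sample event; real Windows XML events are much longer.
SAMPLE = (
    "<Event><RenderingInfo><Message>Service started</Message>"
    "<Level>Info</Level></RenderingInfo></Event>"
)

# Lazy version: tries the closing tag after every single character.
LAZY = re.compile(r"<Message>(.+?)</Message>")
# Negated class: consumes everything up to the next angle bracket in one run.
NEGATED = re.compile(r"<Message>([^<>]*)</Message>")


def extract(pattern, text):
    """Return the first captured group, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group(1) if m else None


print(extract(LAZY, SAMPLE))
print(extract(NEGATED, SAMPLE))
```

Two differences are worth checking against real data. The lazy pattern does not cross line breaks unless dot-all mode is enabled, while the negated class does, and the negated class fails on a raw `<` or `>` inside the message. In well-formed XML a literal `<` in text is escaped as `&lt;` (CDATA sections aside), but `>` may appear unescaped, so `[^<]*` would be a slightly more permissive variant.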

To be honest, the most efficient way to process XML would be to do it before the message is sent to Graylog.

Regular expressions are a rather abysmal way of “parsing” XML and require way more resources than necessary.

For a humorous take on why it’s a bad idea, see this StackOverflow answer:
http://stackoverflow.com/a/1732454
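As a rough illustration of what parsing before sending could look like, here is a minimal Python sketch using the standard library's XML parser. It assumes each event arrives as one complete, well-formed XML string; the function names and the sample event (including its `urn:example` namespace) are made up for the example and are not taken from any real log source.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def find_message(event_xml):
    """Return the text of the first <Message> element at any depth, or None."""
    root = ET.fromstring(event_xml)
    for elem in root.iter():
        if local_name(elem.tag) == "Message":
            return elem.text
    return None


# Hypothetical event with the field nested inside RenderingInfo.
sample = (
    '<Event xmlns="urn:example">'
    '<RenderingInfo Culture="en-US">'
    "<Message>a &lt; b</Message>"
    "</RenderingInfo>"
    "</Event>"
)
print(find_message(sample))
```

A real parser handles nesting, namespaces, and escaped characters such as `&lt;` without any of the guesswork a regex needs, at the cost of doing the work on the sending side.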

May I ask why you’re processing raw Windows XML logs in Graylog instead of using Winlogbeat? It seems to parse events on the client side and send easy-to-chew data to Graylog.

hi,

I use NXLog, and its xm_xml module does not process nested structures (the interesting field is inside the RenderingInfo field).

I have never used Winlogbeat and do not know how to use it. Looking at the configuration instructions, I cannot find a way to point it at a log file (I cannot use the standard event interface for this log source due to limitations on Windows event logging).

After some testing I concluded I can exclude < and > in this regex, thus it should be sufficiently fast…

Nevertheless, a simple extractor for a single field within <Field>text here</Field> tags, where “Field” is configurable, could be widely applicable.
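To make the idea concrete, here is a minimal Python sketch of such a configurable extractor. The function `extract_tag` is hypothetical and not part of Graylog; it simply builds the negated-class regex from a field name and assumes the field holds plain text with no nested tags.

```python
import re


def extract_tag(field, text):
    """Return the text between <field> and </field>, or None if absent.

    Hypothetical helper: assumes the element has no attributes and
    contains no nested tags or raw '<' characters.
    """
    name = re.escape(field)  # guard against regex metacharacters in the name
    m = re.search(rf"<{name}>([^<]*)</{name}>", text)
    return m.group(1) if m else None


event = "<RenderingInfo><Message>Disk full</Message><Level>Error</Level></RenderingInfo>"
print(extract_tag("Message", event))
print(extract_tag("Level", event))
```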

Hi jtkarvo

Can you share your nxlog.conf file? It would be very helpful for me!

Thanks :slight_smile:

I doubt that. :wink: What do you intend to do?

I’m trying to parse an XML file with NXLog and send it to Graylog. My XML file is complex (it’s a Nessus report), and I was thinking your nxlog.conf file would help me understand how to parse an XML file correctly :slight_smile:

hi,

I don’t parse the XML file in NXLog; I just use xm_multiline to make sure each event is sent to Graylog as one message. Then I use extractors in Graylog to grab just a couple of interesting fields.

In my case, it is not a good idea to parse on the log source, due to performance concerns.