Ideas for GROK pattern, problem: URL containing separators

I’ve got a Graylog system ingesting Palo Alto Firewall logs, including threat logs. For details on the format for PANOS 8.1, see https://docs.paloaltonetworks.com/pan-os/8-1/pan-os-admin/monitoring/use-syslog-for-monitoring/syslog-field-descriptions/threat-log-fields. The fields are basically all nicely separated by commas, making them quite easy to extract using a GROK pattern, even if that pattern turns out to be rather elaborate. I’m running into a problem with the “URL/Filename” field, though, as it can contain commas. Since the field is enclosed in double quotes, I’ve already extended the GROK pattern to match those, which helps the extractor work correctly most of the time. However, it fails on Facebook requests, which look as follows (all session data from the real request replaced with random data):

www.facebook.com/ajax/pagelet/generic.php/LitestandTailLoadPagelet?fb_dtsg_ag=TOtgPxTtsjWfXm3ZCNe-YaH3NezsvalKvVwnd45gYJRlnmf:m1QhALXrMcjLT9DdiMqW5d_Os3eJaxnxEwy7ZOMrdQPeXO&ajaxpipe=1&ajaxpipe_token=QdinG4CekTqAg6Nf&no_script_path=1&data={"“client_stories_count”":2,"“cursor”":"“N2ahp70IeQf32nZaTkgMlCLXvXQsairyrnFtPtKCyMhtJmAFt9oEPwDDgVINwHquNvP2CZqlCMN6Uli2dLw8LPQp”","“pager_config”":""{/"“section_id/”":/"“773823267077810/”",/"“stream_id/”":null,/"“section_type/”":1,/"“most_recent/”":false,/"“ranking_model/”":null,/"“query_context/”":{/"“head_request_time/”":6689969570},/”“use_new_feed/”":true,/"“sequence/”":null}"","“scroll_count”":1,"“story_id”":null,"“client_query_id”":"“2296c0bb-3f33-12a7-e900-86027803efe9"”}&__user=2331088392&__a=1&dyn=wPEGM-NJ5inuWanyRyw43mpSuBn73UN94IRS9qEyDlifBxRhJqEuiHHvUzZPGqBnjAG8g95JyLLa6KKXTe8v7WBwq1eA1714G-YwMJHBndnQajzGABNDmrz8MhY5hDDtgE9f0vSDlMictPWn5J2UeFvhA92MZCsFL_9Y68DQtamO4Sw075gw5lLnumcbL-OhbkdjFp5RUNOxYAAYeB8cvVFxsnhac9C7kC9tF10xzY4bhYejKjH6Suc3QbmEBVQlpWuYH2S9uZtfPtK-PdkZWo65jQOce6djJhzLVH7NQ1Lbn7xu17&"

My GROK pattern ‘,(?:"%{DATA:pafw_url_or_filename}")?,’ extracts the following into pafw_url_or_filename:

www.facebook.com/ajax/pagelet/generic.php/LitestandTailLoadPagelet?fb_dtsg_ag=TOtgPxTtsjWfXm3ZCNe-YaH3NezsvalKvVwnd45gYJRlnmf:m1QhALXrMcjLT9DdiMqW5d_Os3eJaxnxEwy7ZOMrdQPeXO&ajaxpipe=1&ajaxpipe_token=QdinG4CekTqAg6Nf&no_script_path=1&data={"“client_stories_count”":2,"“cursor”":"“N2ahp70IeQf32nZaTkgMlCLXvXQsairyrnFtPtKCyMhtJmAFt9oEPwDDgVINwHquNvP2CZqlCMN6Uli2dLw8LPQp”

I.e. it (correctly) stops when it hits the character sequence ‘",’ after ‘PQp"’. I would like to have it go on until it reaches the real end of the URL. Does anyone here have an idea how to do that?

I have noticed that Palo Alto appear to use doubled double quote characters within the URL and a single double quote character only at its end, so I’d need something like the following to define the end of the GROK pattern: “a double quote character, followed by a comma, but not preceded by a double quote character”.

Is there a way to express that in GROK?
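For what it’s worth, the rule described above (“a double quote followed by a comma, but not preceded by a double quote”) translates to a negative lookbehind in ordinary regex, which GROK patterns can embed. A minimal Python sketch, using a shortened, made-up URL in place of the real log line:

```python
import re

# Made-up sample line, not the real log message. Graylog's DATA is a lazy
# .*?, so the quoted part of the pattern behaves like "(.*?)", and stops at
# the first '",' sequence, even when that quote is one of the doubled
# quotes inside the URL.
line = ',"www.example.com/a?data={""a"":""b"",""c"":1}&x=1",allow,'

naive = re.search(r',"(.*?)",', line)
print(naive.group(1))   # cut off inside the doubled quotes

# '"' followed by ',' but not preceded by '"': a negative lookbehind
fixed = re.search(r',"(.*?)(?<!")",', line)
print(fixed.group(1))   # the full URL
```

This is only a sketch of the terminator rule, not the full threat-log pattern; the same `(?<!")` assertion can be placed before the closing quote in the GROK pattern.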

Many thanks in advance and cheers,
Tobias

PS: Full message and extractor available on request.

my personal advice:

Use the Plugin for Palo Alto provided with the Integrations Plugin ( https://docs.graylog.org/en/3.1/pages/integrations.html )

That will remove the need for basic parsing, because the structure is known.

Jan

Thanks for the suggestion. I’ll give that a try. I wonder if the integration will also support PANOS 9.0, as I’ve got another Firewall cluster with that OS release.

And, in an attempt to answer my own question, I’ll give the pattern ‘[^"]",’ a go, to indicate a ‘"’ that is followed by ‘,’ and preceded by a character other than ‘"’. I’ll see if that works, too.

The Palo Alto Networks input from the integrations package is causing heap memory exhaustion errors, first in Elasticsearch, then in Graylog. I’ve gone back to the syslog input and the GROK extractor for the time being…

The ‘[^"]",’ pattern looks rather promising, though. If I add it to the GROK pattern, which then looks like this: ‘,(?:"%{DATA:pafw_url_or_filename}[^"]")?,’, it does successfully grab the entire URL (except for its last character, which matches the ‘[^"]’ and is therefore outside the assignment clause) and continues to assign the following fields correctly. Those had previously been falsely assigned elements of the URL.

Now I’ll try to get the expression ‘[^"]’ into a new data type, so I can use that instead of ‘DATA’ and get the entire URL into the field assignment.
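One way to sketch that custom-pattern idea in plain regex is to move the final non-quote character inside the capture group itself; this would be the equivalent of defining a custom GROK pattern, say a hypothetical `NOTENDQ .*?[^"]`, and using `%{NOTENDQ:pafw_url_or_filename}`. The sample line and pattern name are made up:

```python
import re

# Made-up sample line. Moving [^"] inside the capture group keeps the
# URL's last character in the field instead of dropping it.
line = ',"www.example.com/a?data={""a"":""b"",""c"":1}&x=1",allow,'

m = re.search(r',"(.*?[^"])",', line)
print(m.group(1))   # last character now included in the capture
```

The doubled quotes inside the URL no longer terminate the match, because the character immediately before the closing quote must not itself be a quote.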

A bit of a setback: keeping ‘[^"]’ in the GROK pattern caused Graylog’s processing to lock up after a while. I have now re-enabled the Palo Alto input. Since that input caused Graylog to cease all processing when I was sending logs from all Palo Alto devices to it, I reconfigured my setup so that only the firewall cluster with PANOS 8.1 sends its logs to this input, while keeping the other cluster, which is already on PANOS 9.0, on the syslog input.

Forget that: I’m getting errors again, with the output queue growing to maximum size and Graylog subsequently running out of heap space…

I’m not sure that the Palo Alto integration actually works that well… I’m reconfiguring my rsyslog again…

@tobiasreckhard

the Palo Alto input is reported to be working. You are the first person to mention that there are issues with it. So please provide some of the errors you actually see. I guess it is related to Elasticsearch field mapping or something similar.

@jan I’m not exactly sure what you mean by your first sentence. The Palo Alto input did not and does not complain, if that’s what you mean. I have renamed the field names in the CSVs to match my own, and when two commas were missing from them, the input did complain and refused to start until I’d fixed the CSVs.

Since I’ve got PANOS 8.1 and PANOS 9.0 systems feeding into Graylog, I had added the extra fields that PANOS 9.0 has over 8.1 to the CSVs. I have removed those again, so they look the way they were originally, but with different field names.

I have now reconfigured my rsyslog to send the 8.1 format logs to the Palo Alto input and those in 9.0 format to the standard syslog/TCP input, the latter having GROK extractors (written by myself) for Palo Alto traffic, threat, system and correlation logs. I had made one change to the threat extractor compared to yesterday, so that it would keep ‘,“web-based-email,low-risk”“news,low-risk”,’ in one field and not separate it into two – to do this, I changed the pattern from ‘,(?:%{DATA:pafw_url_category})?,’ to ‘,(?:"%{DATA:pafw_url_category}")?,’, which looked like it would do the trick in the extractor test.

However, Graylog stopped processing messages with this setup after a while (under an hour). When I restarted it, it processed messages for around three minutes before stopping its output again.

I have since been able to determine with some confidence that it is the PANOS 9.0 threat extractor that is causing the processing lockups. At least, I deleted it almost two hours ago and the system has been running fine since then.

Neither Graylog’s nor Elasticsearch’s logs show any errors (Graylog subsystem logging levels set to info). I do, however, see that no more lines are added to /var/log/elasticsearch/graylog.log, whereas Elasticsearch normally takes no more than three minutes to add a further line to this log. But Elasticsearch is still running, and apparently completely normally.

So, currently:

  • PANOS 8.1 traffic, threat and system logs are processed successfully with the integration
  • PANOS 9.0 traffic, system and correlation logs are processed with my GROK extractors
  • PANOS 9.0 threat log fields are not being extracted, but stored as messages with unstructured data

I will now try to create a second Palo Alto input for the 9.0 logs, extending the CSVs again, but keeping it separate from the 8.1 input, so I don’t incur any side effects. I’ll report my findings here.

Still, I’m somewhat disappointed at how Graylog handles (or fails to handle) problems it’s having with GROK patterns. I don’t appreciate the silent lockup failure mode that much.

We have a few PAs here, running with 9.0.x and 8.1.x.
Due to the changed logging fields between both versions, you can’t mix PANOS 9.0.x and 8.1.x in one input. You need to separate them.
Edit: I did not need to change a lot between the versions. There are a few more fields in 9.0, which I had to add, and I changed some field names to streamline search results.

Update from today: my Graylog system failed to process messages yesterday at around 4PM, accumulating almost 6 million messages in its journal overnight. I had previously created a second “Palo Alto Networks” input for the PANOS 9.0 systems, adding the fields their traffic and threat logs have been extended by to the CSVs, and reconfiguring my rsyslog accordingly.

I tried just restarting Graylog this morning, which did cause it to process a few thousand messages, but then it went to “0 out” again. (BTW, “0 out” does actually happen every now and then in normal operation, but since new messages are steadily coming in, or, as in this case, there are more than enough messages in the journal waiting to be processed, I practically never see “0 out” for more than a few seconds when the system is behaving normally.)

Due to my fraught experiences with extractors in the past, I then took the rather radical step of deleting all extractors. Since then, the system has been chugging along fine, the journal has reached 4 million messages remaining and the indices have caught up until yesterday, 8:40 PM.

So: the Palo Alto Networks inputs appear to work fine. Extractors are, at least in my experience and regarding the GROK type, highly prone to errors in non-elementary messages, and have a tendency to fail silently and irrevocably. I’m heading over to pipelines and I’ll probably be exploring other forms of getting somewhat difficult data, such as Windows logs, into Graylog, so that it comes in as structured data already and doesn’t need GROK extractors.

Hey @shenke - would you mind sharing your PANOS 9.0.x field information?

That would allow us to make them part of the documentation. If you want, you could create a PullRequest to the docs yourself.

@shenke From my comparison between the 8.1 and the 9.0 fields, Palo Alto Networks have merely appended fields. These are:

Traffic Log:
65. UUID for rule
66. HTTP/2 Connection

Threat Log:
75. URL Category List
76. UUID for rule
77. HTTP/2 Connection

They made the following changes to the field names:

  • Source IP -> Source Address
  • Destination IP -> Destination Address

For the transition from 9.0 to 9.1, it appears to be similar. They have changed one field name in the documentation (“UUID for rule” became “Rule UUID”) and they have appended the following fields:

Traffic Log:

  • Link Change Count
  • Policy ID
  • Link Switches
  • SD-WAN Cluster
  • SD-WAN Device Type
  • SD-WAN Site
  • Dynamic User Group Name

Threat Log:

  • Dynamic User Group Name

I wonder if it really is necessary to feed Palo Alto Networks devices with different PANOS versions into separate inputs, as the description of the input says it was tested with a few PANOS versions leading up to 8.1. I haven’t tried feeding my 8.1 systems into the 9.0 input yet, though, as I’m still busy recreating the bare minimum of extractors I need for the dashboards that have management attention. After that, I may give it a go; it’s a quick test compared to the time I’ll need to wrap my head around pipelines and rules…

We had some problems with wrong mappings, so I decided to separate the inputs.

I’ll see if I can create a PullRequest to expand the docs.

I also went for separate inputs, mainly to keep the changes I was making minimal and not to break the existing 8.1 input by extending it. The v8.1 systems that have been feeding into my Graylog are being taken out of service, as I have been told today, so the 8.1 input will probably be superfluous soon and I won’t be able to test if a Palo Alto Networks input geared for 9.0 fields will also handle 8.1 input.

To recap: After having removed all extractors, switching the processing of Palo Alto Networks messages to the dedicated Palo Alto Networks inputs, and reinstating only the bare minimum of hopefully well-behaved extractors (restricted to messages containing rather restrictive patterns) necessary for the currently active dashboards, the system has been running for a week without exhibiting the processing lockup problem. I have since had a technical video call session with Graylog and will be reinstating what I had been doing in extractors with pipelines, keeping clear of GROK patterns as far as possible. I hope that will prove more successful than my previous endeavours.