I’ve got a Graylog system ingesting Palo Alto firewall logs, including threat logs. For details on the format for PAN-OS 8.1, see https://docs.paloaltonetworks.com/pan-os/8-1/pan-os-admin/monitoring/use-syslog-for-monitoring/syslog-field-descriptions/threat-log-fields. The fields are all separated by commas, which makes them easy to extract using a GROK pattern, even if the pattern turns out to be quite elaborate. I’m running into a problem with the “URL/Filename” field, though, as it can itself contain commas. Since the field is enclosed in double quotes, I’ve already extended the GROK pattern to anchor on those, which lets the extractor work correctly most of the time. However, it fails on Facebook requests, which look as follows (all session data from the real request has been replaced with random data):
“www.facebook.com/ajax/pagelet/generic.php/LitestandTailLoadPagelet?fb_dtsg_ag=TOtgPxTtsjWfXm3ZCNe-YaH3NezsvalKvVwnd45gYJRlnmf:m1QhALXrMcjLT9DdiMqW5d_Os3eJaxnxEwy7ZOMrdQPeXO&ajaxpipe=1&ajaxpipe_token=QdinG4CekTqAg6Nf&no_script_path=1&data={"“client_stories_count”":2,"“cursor”":"“N2ahp70IeQf32nZaTkgMlCLXvXQsairyrnFtPtKCyMhtJmAFt9oEPwDDgVINwHquNvP2CZqlCMN6Uli2dLw8LPQp”","“pager_config”":""{/"“section_id/”":/"“773823267077810/”",/"“stream_id/”":null,/"“section_type/”":1,/"“most_recent/”":false,/"“ranking_model/”":null,/"“query_context/”":{/"“head_request_time/”":6689969570},/”“use_new_feed/”":true,/"“sequence/”":null}"","“scroll_count”":1,"“story_id”":null,"“client_query_id”":"“2296c0bb-3f33-12a7-e900-86027803efe9"”}&__user=2331088392&__a=1&dyn=wPEGM-NJ5inuWanyRyw43mpSuBn73UN94IRS9qEyDlifBxRhJqEuiHHvUzZPGqBnjAG8g95JyLLa6KKXTe8v7WBwq1eA1714G-YwMJHBndnQajzGABNDmrz8MhY5hDDtgE9f0vSDlMictPWn5J2UeFvhA92MZCsFL_9Y68DQtamO4Sw075gw5lLnumcbL-OhbkdjFp5RUNOxYAAYeB8cvVFxsnhac9C7kC9tF10xzY4bhYejKjH6Suc3QbmEBVQlpWuYH2S9uZtfPtK-PdkZWo65jQOce6djJhzLVH7NQ1Lbn7xu17&"
My GROK pattern: ,(?:"%{DATA:pafw_url_or_filename}")?, extracts the following into pafw_url_or_filename:
www.facebook.com/ajax/pagelet/generic.php/LitestandTailLoadPagelet?fb_dtsg_ag=TOtgPxTtsjWfXm3ZCNe-YaH3NezsvalKvVwnd45gYJRlnmf:m1QhALXrMcjLT9DdiMqW5d_Os3eJaxnxEwy7ZOMrdQPeXO&ajaxpipe=1&ajaxpipe_token=QdinG4CekTqAg6Nf&no_script_path=1&data={"“client_stories_count”":2,"“cursor”":"“N2ahp70IeQf32nZaTkgMlCLXvXQsairyrnFtPtKCyMhtJmAFt9oEPwDDgVINwHquNvP2CZqlCMN6Uli2dLw8LPQp”
I.e. it (correctly) stops when it hits the character sequence ‘",’ after ‘PQp"’. I would like to have it go on until it reaches the real end of the URL. Does anyone here have an idea how to do that?
I have noticed that Palo Alto appears to double every double quote character inside the URL and to use a single double quote character only at its end, so I’d need something like the following to define the end of the GROK pattern: “a double quote character, followed by a comma, but not preceded by a double quote character”. Or: .
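To make the rule concrete, here is a minimal sketch in plain Python regex rather than GROK. The sample line is a made-up, shortened stand-in for a real threat log entry, and the group name `url` is just a placeholder. The second pattern expresses the "not preceded by a double quote" condition as a negative lookbehind; the third is an alternative that treats the field like a CSV quoted field, where the content is any run of non-quote characters or doubled quotes. Whether the regex engine behind Graylog's GROK accepts the same syntax is an assumption I have not verified.

```python
import re

# Made-up, shortened stand-in for one log line: the quoted URL field contains
# commas and doubled double quotes, and ends with a single double quote.
line = '1,THREAT,"example.com/x?data={""a"":2,""b"":""v"",""c"":null}",next'

# Lazy match up to the first '",' (roughly what my current pattern does).
lazy = re.compile(r',"(?P<url>.*?)",')

# Same, but the closing quote must not be preceded by another quote.
lookbehind = re.compile(r',"(?P<url>.*?)(?<!")",')

# CSV-style alternative: content is non-quote characters or doubled quotes.
csv_style = re.compile(r',"(?P<url>(?:[^"]|"")*)",')

lazy_url = lazy.search(line).group("url")
lookbehind_url = lookbehind.search(line).group("url")
csv_url = csv_style.search(line).group("url")

print(lazy_url)        # cut off inside the JSON-like data
print(lookbehind_url)  # runs to the real end of the field
print(csv_url)         # same result as the lookbehind variant here
```

One caveat with the lookbehind variant: it would still fail if a URL ended in an escaped (doubled) quote directly before the closing quote, or if the quoted field were empty, whereas the CSV-style variant handles both.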
Is there a way to express that in GROK?
Many thanks in advance and cheers,
Tobias
PS: Full message and extractor available on request.