Pipeline rule to pull specific key/value pairs from URI

Question: Is there a way to write a clean Pipeline rule to extract parameters and values from a URI and store them in their own field?

Background

I am implementing Graylog, in part, to monitor performance on a web application that I host. To expose certain functionality and metrics I’ll need to extract some Key/Value pairs from GET requests and store them as fields. The GET strings are already coming into Graylog so it’s just a matter of parsing. I’m using Pipeline rules because I have many applications using the same (Filebeat) input, so I have a little Pipeline logic to sort them out first.

The first solution: Extract them all.

At first I assumed I could just extract ALL parameters from all requests, each to its own field, but I quickly discovered that Elasticsearch limits the number of fields an index can hold. In retrospect, this makes a lot of sense. So now I need a manageable way to specify which parameters to pull.

This is the original Pipeline rule that I used to extract all URI parameters and values, each into their own Elasticsearch field:

rule "KV from HTTP GET"
when
    has_field("http_request")
    && regex("^GET.*\\?.*\\s", to_string($message.http_request)).matches == true
then
    // Extract the GET parameter part of the string by itself
    let get_string = regex("^GET.*\\?(.*)\\s", to_string($message.http_request));
    let get_params = to_string(get_string["0"]);

    // Split the parameter string into a key/value map
    let get_map = key_value(
        value: get_params,
        delimiters: "&",
        kv_delimiters: "=",
        ignore_empty_values: true,
        allow_dup_keys: true,
        handle_dup_keys: "take_first",
        trim_key_chars: ""
    );

    // Write every key/value pair to its own field, prefixed with "http_param_"
    set_fields(fields: get_map, prefix: "http_param_");
end

It worked great. If Graylog received a log of GET /index.php?id=1234&abcd=xyz, this rule would store the parameters in their own fields, named with the http_param_ prefix:

  • http_param_id: 1234
  • http_param_abcd: xyz

That made it easy for me to run queries and stats on just about any part of the application, until some vulnerability scanners came through and flooded my application with junk parameters and values. Elasticsearch started rejecting new fields because the index hit its limit of 1,000 fields, which was way more than I wanted anyway.

Lesson learned: Don’t trust user input. So now I need to pick and choose the GET parameters/values to extract to separate fields. I have maybe 20 that I actually need.

What I have been trying next

While considering a way to accomplish this, my hope was to keep the key_value call that builds get_map the same, but then copy only the keys/values that I define in a list into a new variable and use that new map with set_fields. That way, it would ignore all the other parameters that I don’t care about.

So, is there a function that will let me copy items from a map with one line? Or another clean way to copy only certain parameters/values from a URI string?

On further research, it appears that there are no functions available to manipulate Map data. In the absence of that functionality, I have taken a messier approach as a workaround, and made it as logical as possible. I am still open to suggestions if anyone sees a better way.
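
One option that may get close to this without any map-manipulation function: the map returned by key_value can apparently be read with the same bracket syntax already used on the regex result (get_string["0"]). The sketch below is not the rule I ended up using. It assumes that bracket access works on key_value maps in your Graylog version, and that set_field skips empty string values so a missing parameter creates no field; both assumptions are worth checking in the rule simulator first. The rule name and the choice of sfid and eventid are only illustrative.

```
rule "KV allowlist by map index (sketch)"
when
    has_field("http_request")
    && regex("^[A-Z]+.*\\?.*\\s", to_string($message.http_request)).matches == true
then
    // Same extraction as before: capture everything after the "?"
    let request_string = regex("^[A-Z]+.*\\?(.*)\\s", to_string($message.http_request));

    // Parse ALL parameters into a map, but do NOT pass the map to set_fields
    let all_params = key_value(
        value: to_string(request_string["0"]),
        delimiters: "&",
        kv_delimiters: "=",
        ignore_empty_values: true,
        allow_dup_keys: true,
        handle_dup_keys: "take_first"
    );

    // Copy only the keys we care about, so scanner junk never becomes a field.
    // ASSUMPTION: all_params["key"] yields null for a missing key, to_string
    // turns that into "", and set_field ignores the empty value.
    set_field("http_param_sfid", to_string(all_params["sfid"]));
    set_field("http_param_eventid", to_string(all_params["eventid"]));
end
```

With this shape, adding a parameter is one set_field line, and the number of fields created is bounded by the number of set_field calls rather than by whatever the client sends.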

Here’s the process I’m following now:

  1. Ingest the full URL parameters such as this. It contains some data I want to extract, and some I don’t care about.
    sfid=3555121&studentid=RS09122315Q01&eventid=990197&lang=0&multi=1&dt=32123153
  2. Run a regex pattern match for a parameter I want. NOTE: This tripped me up for a long time. The part of the pattern I want returned MUST be in parentheses. Graylog’s regex function seems to return only capture groups, so a pattern without a group matches but gives back no text:
    let re = "(memberid=[0-9]+)";
    let param = regex(re, request_params);
  3. Concatenate that match into a new string containing only parameters that have matched my rules:
    let params_keep = concat(params_keep, to_string(param["0"]) + "&");
  4. After repeating this for several different parameters, I then use the key_value function like I was using in my earlier attempts. Except now I’m operating on the params_keep string, which only contains the parameters that I specifically extracted.
    let param_map = key_value(value: params_keep, delimiters: "&", kv_delimiters: "=", ignore_empty_values: true, allow_dup_keys: true, handle_dup_keys: "take_first", trim_key_chars: "");
  5. Also like my earlier solution, I use the output from key_value to populate the fields I wanted to keep. The field names are prepended with http_param_ so I know where they came from, and to reduce the risk of a naming collision.
    set_fields(fields: param_map, prefix: "http_param_");
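
The capture-group behaviour from step 2 can be tried in isolation with a small rule like the one below. It is a sketch under a few assumptions, not the rule used in this post: the rule name is made up, the optional group_names argument of regex is used to name the group instead of reading index "0", and the [?&] prefix is an addition so that studentid does not also match inside a longer parameter name such as xstudentid.

```
rule "Regex capture group demo (sketch)"
when
    has_field("http_request")
    && regex("[?&]studentid=[A-Z0-9]+", to_string($message.http_request)).matches == true
then
    // The "when" clause only needs .matches, so no group is required there.
    // Here the parentheses wrap just the value, and group_names gives the
    // group a name so it can be read as m["studentid"] instead of m["0"].
    let m = regex(
        pattern: "[?&]studentid=([A-Z0-9]+)",
        value: to_string($message.http_request),
        group_names: ["studentid"]
    );

    // The "when" clause already confirmed a match, so the group is populated
    set_field("http_param_studentid", to_string(m["studentid"]));
end
```

Because the match is checked in the when clause, this rule does nothing for requests that lack the parameter. The trade-off is one rule per parameter instead of one rule for all of them.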

So in total, below is the pipeline rule I used. I wish there was a cleaner way to do it, but I hope this can help someone else to parse URLs into fields they want to keep.

rule "Limited KV from HTTP Requests"
when
    has_field("http_request")
    && regex("^[A-Z]+.*\\?.*\\s", to_string($message.http_request)).matches == true
    
then
  //Extract the URI part of the string by itself
  let request_string = regex("^[A-Z]+.*\\?(.*)\\s", to_string($message.http_request));
  let request_params = to_string(request_string["0"]);
  
  //Store our list of desired parameters first in a concatenated string
  let params_keep = to_string("");
  
  //
  // Extract specific fields
  
  //Extract memberid
  let re = "(memberid=[0-9]+)";
  let param = regex(re, request_params);
  let params_keep = concat(params_keep, to_string(param["0"]) + "&");
  
  //Extract studentid
  let re = "(studentid=[A-Z0-9]+)";
  let param = regex(re, request_params);
  let params_keep = concat(params_keep, to_string(param["0"]) + "&");

  //Extract sfid
  let re = "(sfid=[0-9]+)";
  let param = regex(re, request_params);
  let params_keep = concat(params_keep, to_string(param["0"]) + "&");
  
  //Extract eventid
  let re = "(eventid=[0-9]+)";
  let param = regex(re, request_params);
  let params_keep = concat(params_keep, to_string(param["0"]) + "&");
  
  //Extract type
  let re = "(type=[^\\s&]+)";
  let param = regex(re, request_params);
  let params_keep = concat(params_keep, to_string(param["0"]) + "&");
  

  //
  //Process the extracted Key/Value pairs into a map
  let param_map = key_value(
    value: params_keep,
    delimiters: "&",
    kv_delimiters: "=",
    ignore_empty_values: true,
    allow_dup_keys: true,
    handle_dup_keys: "take_first",
    trim_key_chars: ""
  );

  //Set fields based on our Key/Value Map.  Add a prefix of "http_param_" to the 
  //field name so we can know where the value came from and avoid collisions.
  set_fields(fields: param_map, prefix: "http_param_");
  
end

The benefit of this format, for my purpose at least, is that I can easily add to the list of fields that I want to extract. I simply need to copy/paste 3 lines and update the regex. It would be better if a map_extract function were available. That would reduce this whole thing down to just a few lines. But we’ll work with what we’ve got here and for now this does the trick.

Thanks for sharing!
