Question: Is there a way to write a clean Pipeline rule to extract parameters and values from a URI and store them in their own field?
Background
I am implementing Graylog, in part, to monitor performance on a web application that I host. To expose certain functionality and metrics, I'll need to extract some key/value pairs from GET requests and store them as fields. The GET strings are already coming into Graylog, so it's just a matter of parsing. I'm using Pipeline rules because I have many applications using the same (Filebeat) input, so I have a little Pipeline logic to sort them out first.
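For context, that sorting logic is nothing fancy. A rough sketch of what one of those routing rules looks like (the field name and value here are placeholders for what my real rules match on):

rule "route webapp logs"
when
  // Placeholder condition: my real rules match on fields that Filebeat adds for each application
  has_field("app_tag") && to_string($message.app_tag) == "webapp"
then
  // Tag the message so later stages only apply the webapp-specific rules
  set_field("app_name", "webapp");
end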
The first solution: Extract them all.
At first I assumed I could just extract ALL parameters from all requests, each to its own field, but I quickly discovered that Elasticsearch limits the number of fields per index. In retrospect, that makes a lot of sense. So now I need a manageable way to specify which parameters to pull.
This is the original Pipeline rule I used to extract all URI parameters and values, each into its own Elasticsearch field:
rule "KV from HTTP GET"
when
  has_field("http_request")
  && regex("^GET.*\\?.*\\s", to_string($message.http_request)).matches == true
then
  // Extract the GET parameter part of the string by itself
  let get_string = regex("^GET.*\\?(.*)\\s", to_string($message.http_request));
  let get_params = to_string(get_string["0"]);
  let get_map = key_value(
    value: get_params,
    delimiters: "&",
    kv_delimiters: "=",
    ignore_empty_values: true,
    allow_dup_keys: true,
    handle_dup_keys: "take_first",
    trim_key_chars: ""
  );
  set_fields(fields: get_map, prefix: "http_param_");
end
It worked great. If Graylog received a log of GET /index.php?id=1234&abcd=xyz, this rule would log the parameters in their own fields:
- http_param_id: 1234
- http_param_abcd: xyz
That made it easy for me to run queries and stats on just about any part of the application, until some vulnerability scanners came through and flooded my logs with junk parameters and values. Elasticsearch started rejecting new fields once the index hit its limit of 1,000, which was far more fields than I wanted anyway.
Lesson learned: don't trust user input. So now I need to pick and choose which GET parameters/values get extracted to separate fields. I have maybe 20 that I actually need.
What I have been trying next
While considering how to accomplish this, my hope was to keep the get_map assignment the same, but copy only the keys/values that I define in a list into a new map, and pass that new map to set_fields. That way, the rule would ignore all the other parameters that I don't care about, roughly like the pseudocode below.
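To be clear, select_keys in this sketch is imaginary and not (as far as I know) a real Graylog function; it is only here to show the shape of what I mean by copying items from a map in one line:

  // Everything up to and including the key_value() call stays exactly as in the rule above, then:
  // select_keys is imaginary: it would keep only the listed keys from get_map
  let wanted_map = select_keys(get_map, ["id", "abcd"]);
  set_fields(fields: wanted_map, prefix: "http_param_");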
So, is there a function that will let me copy selected items from a map in one line? Or another clean way to extract only certain parameters/values from a URI string?
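The only working fallback I can see so far is the verbose one: repeat the extraction and then call set_field once per allow-listed key. It does the job, but it means roughly 20 nearly identical lines, and it assumes map entries can be read back with the same bracket syntax the regex result uses (the parameter names below are placeholders for my real ones):

rule "KV from HTTP GET (allow-listed params only)"
when
  has_field("http_request")
  && regex("^GET.*\\?.*\\s", to_string($message.http_request)).matches == true
then
  let get_string = regex("^GET.*\\?(.*)\\s", to_string($message.http_request));
  let get_params = to_string(get_string["0"]);
  let get_map = key_value(
    value: get_params,
    delimiters: "&",
    kv_delimiters: "=",
    ignore_empty_values: true,
    allow_dup_keys: true,
    handle_dup_keys: "take_first",
    trim_key_chars: ""
  );

  // One set_field call per parameter I actually want (roughly 20 of these lines)
  set_field("http_param_id", get_map["id"]);
  set_field("http_param_abcd", get_map["abcd"]);
end

If there is something cleaner than this, I would love to hear it.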