I’m new to Graylog and I’ve been trying to process Docker Registry log lines. I’ve got something that works, but I’d like input from more experienced users on whether there’s a better way. The Registry outputs two kinds of log lines: Apache-like access lines and Go-style key=value application log lines. I’m using fluentd to capture the output from the container and add information about the container process. I’m also setting ‘document_type’ to ‘docker-registry’ to distinguish it from other input sources.
Here’s an example that shows the two line types:
time="2017-12-22T13:11:21Z" level=info msg="response completed" go.version=go1.7.6 http.request.contenttype="application/vnd.docker.distribution.manifest.v2+json" http.request.host=hip-dockerrh01.truehealthdiag.com http.request.id=755eaa21-887d-49a9-8011-a245d19acc7f http.request.method=PUT http.request.remoteaddr="10.213.213.189:50244" http.request.uri="/v2/ms-harvest-merge/manifests/v40" http.request.useragent="docker/17.09.1-ce go/go1.8.3 git-commit/19e2cf6 kernel/4.9.49-moby os/linux arch/amd64 UpstreamClient(Docker-Client/17.09.1-ce \\(darwin\\))" http.response.duration=181.200608ms http.response.status=201 http.response.written=0 instance.id=b2d432f3-0f86-4c7d-ad52-bea42df52fe1 version=v2.6.2
10.213.213.189 - - [22/Dec/2017:13:11:21 +0000] "PUT /v2/ms-harvest-merge/manifests/v40 HTTP/1.1" 201 0 "" "docker/17.09.1-ce go/go1.8.3 git-commit/19e2cf6 kernel/4.9.49-moby os/linux arch/amd64 UpstreamClient(Docker-Client/17.09.1-ce \\(darwin\\))"
I’m using two separate rules to parse these out:
rule "Parse Docker Registry Access Log Lines"
when
  // Only docker-registry messages whose text starts with an IPv4 address
  has_field("document_type") &&
  to_string($message.document_type) == "docker-registry" &&
  regex("^(?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])", to_string($message.message)).matches == true
then
  let message_field = to_string($message.message);
  // Apache-style access line: client, ident, auth, timestamp, request, status, bytes
  let parsed_fields = grok(pattern: "%{IPORHOST:client_ip} %{USER:ident} %{USER:auth} \\[%{HTTPDATE:apache_timestamp}\\] \"(?:%{WORD:method} /%{NOTSPACE:request_page}(?: HTTP/%{NUMBER:http_version})?|%{DATA:rawrequest})\" %{NUMBER:server_response} (?:%{NUMBER:bytes}|-)", value: message_field);
  set_fields(parsed_fields);
  set_field("type", "access_log");
end
This rule is pretty straightforward. It filters on document_type = 'docker-registry' and checks that the line begins with an IP address. If it does, it groks the line and sets the type field to ‘access_log’.
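To make the "begins with an IP address" check easier to read, here is a small Python sketch of the same test (it is not Graylog rule code). It reuses the octet pattern from the rule above, factored into one piece; OCTET, IPV4_PREFIX and starts_with_ipv4 are names I made up for illustration.

```python
import re

# Same octet alternation as in the rule's regex: 250-255, 200-249, or 0-199.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})"

# Four octets separated by dots at the very start of the line,
# not immediately followed by another digit.
IPV4_PREFIX = re.compile(r"^" + OCTET + r"(?:[.]" + OCTET + r"){3}(?![0-9])")


def starts_with_ipv4(line):
    """Return True if the line begins with a dotted-quad IPv4 address."""
    return IPV4_PREFIX.match(line) is not None
```

With the two sample lines above, the access line should pass this check and the time= line should not.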
It’s the next rule that gets nasty:
rule "Parse Docker Registry Key Value Lines"
when
  // Only docker-registry messages whose text starts with time=
  has_field("document_type") &&
  to_string($message.document_type) == "docker-registry" &&
  regex("^time=", to_string($message.message)).matches == true
then
  let message_field = to_string($message.message);
  // Split on spaces and "=", trimming double quotes from keys and values
  let parsed_fields = key_value(message_field, " ", "=", true, false, "take_last", "\"", "\"");
  set_fields(parsed_fields);
  set_field("original_message", message_field);
  // msg and the user agent contain spaces, so extract them explicitly
  let msg = regex(".*?msg=\"(.*?)\"", message_field, ["message"]);
  set_fields(msg);
  let useragent = regex(".*?http.request.useragent=\"(.*?)\"", message_field, ["http_request_useragent"]);
  set_fields(useragent);
  // Normalize the log level and rename fields
  rename_field("level", "loglevel");
  set_field("loglevel", uppercase(to_string($message.loglevel)));
  rename_field("time", "tx_timestamp");
  remove_field("msg");
end
The ‘when’ clause is simple: it filters by document_type = 'docker-registry' and makes sure the line begins with time=.
Then we bust the line apart using the key_value function. But the function doesn’t support quoted entries, so a quoted value that contains spaces gets split at the first space. We strip the quotes and set the message fields to the results. Side note: I wish I could add a prefix to the keys found.
That’s simple enough, but there are two keys that routinely have spaces in their values: the crucial msg field and the http.request.useragent field. So we regex both of those out and set them explicitly. Then we clean up the level by uppercasing it and renaming it to loglevel, rename time to tx_timestamp, and remove the now-redundant msg field.
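For comparison, this is roughly the behavior I’d want from a quote-aware key/value split, written as a minimal Python sketch rather than as a Graylog rule. PAIR, parse_kv and the prefix parameter are hypothetical names for illustration only; the assumption is that a value is either a double-quoted string (possibly containing spaces and backslash escapes) or a run of non-space characters.

```python
import re

# One key=value pair. The value is either a double-quoted string, which may
# contain spaces and backslash-escaped characters, or a bare run of non-spaces.
PAIR = re.compile(r'([\w.]+)=(?:"((?:[^"\\]|\\.)*)"|(\S*))')


def parse_kv(line, prefix=""):
    """Parse key=value pairs, keeping quoted values intact.

    prefix is a hypothetical option that is prepended to every key.
    If a key repeats, the last occurrence wins.
    """
    fields = {}
    for match in PAIR.finditer(line):
        key, quoted, bare = match.group(1), match.group(2), match.group(3)
        # Exactly one of the two value groups takes part in a match.
        fields[prefix + key] = quoted if quoted is not None else bare
    return fields


SAMPLE = ('time="2017-12-22T13:11:21Z" level=info msg="response completed" '
          'http.request.method=PUT http.response.status=201')
fields = parse_kv(SAMPLE, prefix="registry_")
```

Because each quoted value is consumed as a whole, msg and the user agent would come out in one piece, without the two extra regex calls.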
Any suggestions for making this better?