Parsing Docker Registry Log Entries

(Charles Wise) #1

I’m new to Graylog and I’ve been trying to process Docker Registry log lines. I’ve got something that works, but I’d like input from experienced users on whether there’s a better way. The Registry outputs two kinds of log lines: Apache-like access lines and Go-style key=value application log lines. I’m using fluentd to capture the output from the container and add information about the container process. I’m also setting ‘document_type’ to ‘docker-registry’ to distinguish these messages from other input sources.

Here’s an example that shows the two line types:

time="2017-12-22T13:11:21Z" level=info msg="response completed" go.version=go1.7.6 http.request.contenttype="application/vnd.docker.distribution.manifest.v2+json" http.request.method=PUT http.request.remoteaddr="" http.request.uri="/v2/ms-harvest-merge/manifests/v40" http.request.useragent="docker/17.09.1-ce go/go1.8.3 git-commit/19e2cf6 kernel/4.9.49-moby os/linux arch/amd64 UpstreamClient(Docker-Client/17.09.1-ce \\(darwin\\))" http.response.duration=181.200608ms http.response.status=201 http.response.written=0 version=v2.6.2
- - [22/Dec/2017:13:11:21 +0000] "PUT /v2/ms-harvest-merge/manifests/v40 HTTP/1.1" 201 0 "" "docker/17.09.1-ce go/go1.8.3 git-commit/19e2cf6 kernel/4.9.49-moby os/linux arch/amd64 UpstreamClient(Docker-Client/17.09.1-ce \\(darwin\\))"

I’m using two separate rules to parse these out:

rule "Parse Docker Registry Access Log Lines"
when
    has_field("document_type") && to_string($message.document_type) == "docker-registry" && regex("^(?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])", to_string($message.message)).matches == true
then
    let message_field = to_string($message.message);
    let parsed_fields = grok(pattern: "%{IPORHOST:client_ip} %{USER:ident} %{USER:auth} \\[%{HTTPDATE:apache_timestamp}\\] \"(?:%{WORD:method} /%{NOTSPACE:request_page}(?: HTTP/%{NUMBER:http_version})?|%{DATA:rawrequest})\" %{NUMBER:server_response} (?:%{NUMBER:bytes}|-)", value: message_field);
    set_fields(parsed_fields);
    set_field("type", "access_log");
end

This rule is pretty straightforward. It filters on document_type='docker-registry' and checks that the line begins with an IP address. If it does, it groks the line and marks it as type ‘access_log’.
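To sanity-check the access-line logic outside Graylog, the grok pattern can be approximated with a plain regular expression. The Python sketch below is only an approximation of the pattern in the rule, not what Graylog actually runs: it accepts any non-space token where the rule requires an IP address. The function name parse_access_line is made up, and the client address 192.0.2.10 is a placeholder because the address in the sample line is blank.

```python
import re

# Rough regex equivalent of the grok pattern in the rule (an approximation,
# not Graylog's own IPORHOST/USER/HTTPDATE definitions).
ACCESS_LINE = re.compile(
    r'^(?P<client_ip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<apache_timestamp>[^\]]+)\] '
    r'"(?:(?P<method>\w+) /(?P<request_page>\S+)'
    r'(?: HTTP/(?P<http_version>[\d.]+))?|(?P<rawrequest>[^"]*))" '
    r'(?P<server_response>\d+) (?:(?P<bytes>\d+)|-)'
)


def parse_access_line(line):
    """Return the extracted fields, or None if the line is not an access line."""
    match = ACCESS_LINE.match(line)
    if match is None:
        return None
    # Drop groups that did not take part in the match (e.g. rawrequest).
    fields = {k: v for k, v in match.groupdict().items() if v is not None}
    fields["type"] = "access_log"
    return fields


# 192.0.2.10 is a placeholder address, not taken from the original log.
sample = ('192.0.2.10 - - [22/Dec/2017:13:11:21 +0000] '
          '"PUT /v2/ms-harvest-merge/manifests/v40 HTTP/1.1" 201 0 "" "docker/17.09.1-ce"')
fields = parse_access_line(sample)
```

As in the grok pattern, the leading slash of the request path sits outside the request_page group, so the captured value starts at v2/.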

It’s the next rule that gets nasty:

rule "Parse Docker Registry Key Value Lines"
when
    has_field("document_type") && to_string($message.document_type) == "docker-registry" && regex("^time=", to_string($message.message)).matches == true
then
    let message_field = to_string($message.message);
    // Split on spaces and "=", trimming surrounding quotes from values
    let parsed_fields = key_value(message_field, " ", "=", true, false, "take_last", "\"", "\"");
    set_fields(parsed_fields);
    set_field("original_message", message_field);
    // msg and http.request.useragent contain spaces, so extract them separately
    let msg = regex(".*?msg=\"(.*?)\"", message_field, ["message"]);
    set_fields(msg);
    let useragent = regex(".*?http.request.useragent=\"(.*?)\"", message_field, ["http_request_useragent"]);
    set_fields(useragent);
    rename_field("level", "loglevel");
    set_field("loglevel", uppercase(to_string($message.loglevel)));
    rename_field("time", "tx_timestamp");
end

The ‘when’ clause is simple: it filters on document_type='docker-registry' and makes sure the line begins with time=.

Then we bust the line apart using the key_value function. The function doesn’t support quoted entries, though, so we strip the quotes from the values and set the message fields from the result. Side note: I wish I could add a prefix to the keys found.
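To make the quoting limitation concrete, here is a small Python sketch of a split-on-space parser with quote trimming. It only approximates the behavior described above and is not Graylog’s key_value implementation; the name naive_key_value and the prefix parameter (which illustrates the wished-for key prefix) are hypothetical.

```python
def naive_key_value(line, prefix=""):
    """Split on spaces, then on the first '=', trimming quotes from values."""
    fields = {}
    for token in line.split(" "):
        if "=" not in token:
            continue  # e.g. the tail of a quoted value such as: completed"
        key, value = token.split("=", 1)
        # Later duplicates overwrite earlier ones (similar in spirit to take_last).
        fields[prefix + key] = value.strip('"')
    return fields


line = 'time="2017-12-22T13:11:21Z" level=info msg="response completed" version=v2.6.2'
print(naive_key_value(line))
print(naive_key_value(line, prefix="registry_"))
```

With this input the msg value is cut off at the first space and comes back as just response, which is the problem with any value that contains spaces.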

That’s simple enough, but two keys routinely have spaces in their values: the crucial msg field and the http.request.useragent field. So we regex both of those out and set them explicitly. Then we clean up the level by uppercasing it and renaming it to loglevel, and rename time to tx_timestamp.
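The fix-up step can be sketched in Python as well, assuming a dict of already-split fields as input. This mirrors the regex and rename steps of the rule rather than reproducing Graylog itself; the function name fix_up is hypothetical, and the non-greedy patterns stop at the first double quote, so a value containing an escaped quote would be truncated.

```python
import re

# Same idea as the two regex() calls in the rule: capture everything between
# the quotes that follow the key.
MSG_RE = re.compile(r'msg="(.*?)"')
USERAGENT_RE = re.compile(r'http\.request\.useragent="(.*?)"')


def fix_up(line, fields):
    """Re-extract the two space-containing values, then normalize field names."""
    fixed = dict(fields)  # leave the caller's dict untouched
    msg = MSG_RE.search(line)
    if msg:
        fixed["message"] = msg.group(1)
    useragent = USERAGENT_RE.search(line)
    if useragent:
        fixed["http_request_useragent"] = useragent.group(1)
    if "level" in fixed:
        fixed["loglevel"] = fixed.pop("level").upper()
    if "time" in fixed:
        fixed["tx_timestamp"] = fixed.pop("time")
    return fixed


line = ('time="2017-12-22T13:11:21Z" level=info msg="response completed" '
        'http.request.useragent="docker/17.09.1-ce go/go1.8.3"')
# Fields as a naive split would have produced them (msg is truncated).
split_fields = {"time": "2017-12-22T13:11:21Z", "level": "info", "msg": "response"}
result = fix_up(line, split_fields)
```

After this step, result carries the full text in message and http_request_useragent, while the truncated msg key from the split is still present alongside them.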

Any suggestions for making this better?
