Parsing Docker Registry Log Entries

ctwise · December 22, 2017, 1:26pm

I’m new to Graylog and I’ve been trying to process Docker Registry log lines. I’ve got something that works but I’d like input from the experienced users as to whether there’s a better way. The Registry outputs log lines that are both Apache-like access lines and Go-style application log lines. I’m using fluentd to capture the output from the container and add information about the container process. I’m also setting ‘document_type’ to ‘docker-registry’ to distinguish it from other input sources.

Here’s an example that shows the two line types:

time="2017-12-22T13:11:21Z" level=info msg="response completed" go.version=go1.7.6 http.request.contenttype="application/vnd.docker.distribution.manifest.v2+json" http.request.host=hip-dockerrh01.truehealthdiag.com http.request.id=755eaa21-887d-49a9-8011-a245d19acc7f http.request.method=PUT http.request.remoteaddr="10.213.213.189:50244" http.request.uri="/v2/ms-harvest-merge/manifests/v40" http.request.useragent="docker/17.09.1-ce go/go1.8.3 git-commit/19e2cf6 kernel/4.9.49-moby os/linux arch/amd64 UpstreamClient(Docker-Client/17.09.1-ce \\(darwin\\))" http.response.duration=181.200608ms http.response.status=201 http.response.written=0 instance.id=b2d432f3-0f86-4c7d-ad52-bea42df52fe1 version=v2.6.2
10.213.213.189 - - [22/Dec/2017:13:11:21 +0000] "PUT /v2/ms-harvest-merge/manifests/v40 HTTP/1.1" 201 0 "" "docker/17.09.1-ce go/go1.8.3 git-commit/19e2cf6 kernel/4.9.49-moby os/linux arch/amd64 UpstreamClient(Docker-Client/17.09.1-ce \\(darwin\\))"

I’m using two separate rules to parse these out:

rule "Parse Docker Registry Access Log Lines"
when
    has_field("document_type") && to_string($message.document_type) == "docker-registry" && regex("^(?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])", to_string($message.message)).matches == true
then
    let message_field = to_string($message.message);
    let parsed_fields = grok(pattern: "%{IPORHOST:client_ip} %{USER:ident} %{USER:auth} \\[%{HTTPDATE:apache_timestamp}\\] \"(?:%{WORD:method} /%{NOTSPACE:request_page}(?: HTTP/%{NUMBER:http_version})?|%{DATA:rawrequest})\" %{NUMBER:server_response} (?:%{NUMBER:bytes}|-)", value: message_field);
    set_fields(parsed_fields);
    set_field("type", "access_log");
end

This rule is pretty straightforward. It filters on document_type = 'docker-registry' and checks that the line begins with an IP address. If it does, it groks the line and marks it as type ‘access_log’.
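For anyone who wants to try the logic outside Graylog, here is a minimal Python sketch of what this rule does. The regex is my own simplified approximation of the grok pattern, not the real expansion of %{IPORHOST}, %{HTTPDATE} and friends; it only handles IPv4 clients and skips the rawrequest fallback. `parse_access_line` is a hypothetical helper name.

```python
import re

# Simplified stand-in for the grok pattern above (an assumption, not the
# real expansion of IPORHOST/HTTPDATE): IPv4 client, ident, auth,
# bracketed timestamp, quoted request, status code, and byte count.
ACCESS_RE = re.compile(
    r'^(?P<client_ip>\d{1,3}(?:\.\d{1,3}){3}) '
    r'(?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<apache_timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) /(?P<request_page>\S*)(?: HTTP/(?P<http_version>[\d.]+))?" '
    r'(?P<server_response>\d{3}) (?P<bytes>\d+|-)'
)


def parse_access_line(line):
    """Return a dict of fields for an access-log line, or None if it is not one."""
    match = ACCESS_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["type"] = "access_log"  # same marker the rule sets
    return fields


sample = (
    '10.213.213.189 - - [22/Dec/2017:13:11:21 +0000] '
    '"PUT /v2/ms-harvest-merge/manifests/v40 HTTP/1.1" 201 0 "" "docker/17.09.1-ce"'
)
fields = parse_access_line(sample)
print(fields["method"], fields["request_page"], fields["server_response"])
```

As in the rule, a line that does not start with an IP address is simply left alone (the helper returns None), so the key/value lines fall through to the second rule.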

It’s the next rule that gets nasty:

rule "Parse Docker Registry Key Value Lines"
when
    has_field("document_type") && to_string($message.document_type) == "docker-registry" && regex("^time=", to_string($message.message)).matches == true
then
    let message_field = to_string($message.message);
    let parsed_fields = key_value(message_field, " ", "=", true, false, "take_last", "\"", "\"");
    set_fields(parsed_fields);
    set_field("original_message", message_field);
    let msg = regex(".*?msg=\"(.*?)\"", message_field, ["message"]);
    set_fields(msg);
    let useragent = regex(".*?http\\.request\\.useragent=\"(.*?)\"", message_field, ["http_request_useragent"]);
    set_fields(useragent);
    rename_field("level", "loglevel");
    set_field("loglevel", uppercase(to_string($message.loglevel)));
    rename_field("time", "tx_timestamp");
    remove_field("msg");
end

The ‘when’ clause is simple: it filters by document_type = 'docker-registry' and makes sure the line begins with time=.

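Then we bust the line apart using the key_value function. The function splits on spaces and doesn’t understand quoted values, so a quoted value that contains spaces gets cut off at the first space. We trim the quote characters from the keys and values and set the message fields to the results. Side note: I wish I could add a prefix to the keys found.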

That’s simple enough, but two keys routinely have spaces in their values: the crucial msg field and the http.request.useragent field. So we regex both of those out and set them explicitly. Then we clean up the level by renaming it to loglevel and uppercasing it, rename time to tx_timestamp, and drop the truncated msg field.
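To show why the two extra regexes are needed, and what a quote-aware split would look like, here is a minimal Python sketch. It is not a Graylog feature: `parse_kv_line` and its `prefix` parameter are hypothetical, and the regex assumes keys are made of word characters and dots and that values are either bare tokens or double-quoted strings with backslash escapes (which are left as-is, not unescaped).

```python
import re

# A key, "=", then either a double-quoted value (backslash escapes allowed)
# or a bare run of non-space characters.
KV_RE = re.compile(r'([\w.]+)=(?:"((?:[^"\\]|\\.)*)"|(\S*))')


def parse_kv_line(line, prefix=""):
    """Split a key=value log line into a dict, keeping quoted values whole."""
    fields = {}
    for key, quoted, bare in KV_RE.findall(line):
        # findall returns "" for the group that did not take part in the match
        fields[prefix + key] = quoted or bare
    # Same cleanup as the rule: level -> upper-cased loglevel, time -> tx_timestamp
    if prefix + "level" in fields:
        fields[prefix + "loglevel"] = fields.pop(prefix + "level").upper()
    if prefix + "time" in fields:
        fields[prefix + "tx_timestamp"] = fields.pop(prefix + "time")
    return fields


sample = (
    'time="2017-12-22T13:11:21Z" level=info msg="response completed" '
    'http.request.method=PUT http.response.status=201'
)

# Naive splitting on spaces cuts the quoted msg value in two.
print(sample.split(" ")[2:4])

fields = parse_kv_line(sample)
print(fields["msg"], "|", fields["loglevel"], "|", fields["tx_timestamp"])
```

Calling `parse_kv_line(sample, prefix="registry_")` would give every key a prefix, which is the behaviour I was wishing for in the side note above.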

Any suggestions for making this better?

system · January 5, 2018, 1:26pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
