Send apache log to graylog

Hi pbhenny,

I’m still learning myself, but I struggled to understand the concepts and methods used to get log files into graylog when I first started learning, so hopefully my description will help you figure out the bits and pieces you are missing.

The basic concepts you need to understand are these:

  • Apache writes its logs to files on the filesystem by default, and its access logs don’t natively support syslog or any other log shipping protocol.
  • Something needs to read those files from the filesystem, and send them to graylog.
  • filebeat is a common tool for reading files from a filesystem and sending the logs to graylog or elk.
  • You can manually configure filebeat on each host, or you can use a nifty graylog feature called “collector sidecar”, which jochen linked you to.
  • Using the collector sidecar, you can configure all your filebeat options (like which files to read) in the graylog server web UI. You then just point the collector-sidecar process on your clients at the graylog server, and they automatically download that config whenever it changes.
  • You’ll need to create a Beats input on the graylog server. An input is basically a port that graylog listens on for clients to send logs to.
  • You’ll also need to create a “collector” configuration via the web UI. The collector-sidecar process on your clients downloads its filebeat configuration options (like which files to read from the filesystem) from it.
  • Optionally, if you want to extract certain portions of the apache logs into their own fields, you’ll need to set up an extractor on your beats input, with a grok pattern that pulls out the fields you’re looking for. That will allow you to search for things like http_response:200 (using whatever field name your grok pattern assigns) to find any “200” http response codes.

So in practice for me that looked like…

On my CentOS/RHEL client side:

  • Install graylog-collector-sidecar. It came bundled with a filebeat binary, but you can point it at a separately installed filebeat if you want a different version.
  • In /etc/graylog/collector-sidecar/collector_sidecar.yml, change “server_url” to point to your graylog server on port 9000 (e.g. server_url: http://yourserver.domain.com:9000).
  • Note that the tags listed in collector_sidecar.yml are “linux” and “apache” by default, so it’s already set up for apache. These tags dictate which configs are downloaded to that client.
  • Install the graylog startup script: graylog-collector-sidecar -service install
  • Start the collector-sidecar service.
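
Before starting the sidecar, it can help to confirm from the client that the graylog server is actually reachable on port 9000 (and on the Beats input port, 5044 by default). Here is a minimal Python sketch of that check. The function name `port_open` is just something I made up for illustration, and the hostname in the comment is the same placeholder as above.

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection resolves the name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False

# Example usage (placeholder hostname, replace with your own server):
# for port in (9000, 5044):
#     print(port, "reachable" if port_open("yourserver.domain.com", port) else "blocked")
```

A False result only tells you that the TCP connection failed. It could be the firewall, the listen address, or the service not running, so you still have to check those on the server side.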

On the server side:

  • Make sure your server is listening on port 9000 on an interface your clients can access. By default it may only be listening on localhost.
  • Make sure your firewall allows port 9000, and port 5044 (the default beats port) from your clients.
  • Follow the Step by step guide that Jochen posted for collector sidecar, which will help you create a graylog beats input (on port 5044 by default).
  • Continue following the step by step guide to create an apache “collector” configuration for your apache log files. The collector configuration is what is read by the collector-sidecar process on your clients.
  • While following the step by step guide, make sure you tag your apache collector with the “apache” tag.
  • The example in the step by step guide only grabs access_log by default, but you can easily grab the error log or other similar files using one collector. For my apache install, I use: ['/var/log/httpd/access*_log','/var/log/httpd/ssl*_log','/var/log/httpd/*access_log','/var/log/httpd/*error_log']
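
If you are unsure which files a set of wildcard paths will pick up, you can try the patterns out before putting them in the collector configuration. This is a rough Python sketch using `fnmatchcase` from the standard library. It is only an approximation of filebeat's own glob matching (which I am assuming behaves the same way for simple `*` patterns inside one directory), and the file names tested are hypothetical.

```python
from fnmatch import fnmatchcase

# The path patterns from my collector configuration.
PATHS = [
    "/var/log/httpd/access*_log",
    "/var/log/httpd/ssl*_log",
    "/var/log/httpd/*access_log",
    "/var/log/httpd/*error_log",
]

def matching_patterns(filename):
    """Return the patterns from PATHS that match the given file name."""
    return [p for p in PATHS if fnmatchcase(filename, p)]

# Hypothetical file names, chosen only for illustration.
for name in [
    "/var/log/httpd/access_log",
    "/var/log/httpd/ssl_error_log",
    "/var/log/httpd/example.com-access_log",
    "/var/log/httpd/access_log-20240101",
]:
    print(name, "->", matching_patterns(name) or "not collected")
```

Note that a rotated file such as access_log-20240101 matches none of the patterns, because every pattern requires the name to end in _log.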

For bonus points, you can set up an extractor for your apache logs. I use the “combined” format for my apache logs, and here is what I did to set it up:

  • Read the manual page on extractors.
  • Go to System > Inputs > find your Beats input > select “Manage Extractors” to the right of your Beats input.
  • Choose “Get started” > Choose your Beats input > Load Message > Hit Load Message until an apache access log shows up > Scroll down to the “message” field > Select Extractor Type > Grok pattern
  • Select “Named captures only”
  • Where it says “Grok pattern”, use this: %{COMBINEDAPACHELOG}
  • Alternatively, I didn’t like the built-in grok, and so I modified it slightly: %{IPORHOST:http_clientip} \S+ \S+ \[%{HTTPDATE:timestamp}\] "(?:%{WORD:http_method} %{NOTSPACE:http_request}(?: HTTP/%{NUMBER})?|%{DATA:rawrequest})" %{NUMBER:http_response} (?:%{NUMBER:http_response_bytes}|-) %{QS:http_referrer} %{QS:http_agent}
  • Give it a name, and click “Update extractor”
  • 5 second grok primer: Grok is a regular expression language that lets you name extractable fields in your regular expression and reference previously defined grok patterns inside other grok patterns. So you can establish what an IP address looks like, call it IP, and then reference it in other patterns as %{IP} without having to repeat the regex for matching IPs in each pattern you create.
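
To make the primer a bit more concrete, here is a toy Python sketch of the idea: named patterns get expanded into ordinary regex, and a %{NAME:field} reference becomes a named capture group. This is not how graylog implements grok. The pattern definitions below are simplified stand-ins I wrote for illustration, REQUEST is a made-up composite pattern, and the log line is a shortened hypothetical example rather than a full combined-format line.

```python
import re

# Simplified, hypothetical stand-ins for a few grok patterns.
# The real built-in definitions are more thorough than these.
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "NOTSPACE": r"\S+",
    "NUMBER": r"\d+(?:\.\d+)?",
    # A pattern may reference other patterns, which is the point of grok.
    "REQUEST": r"%{WORD:http_method} %{NOTSPACE:http_request}",
}

GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")

def grok_to_regex(grok):
    """Expand %{NAME} and %{NAME:field} references into a plain regex."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        body = grok_to_regex(PATTERNS[name])  # resolve nested references
        # A field name becomes a named group; otherwise just group the body.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return GROK_REF.sub(expand, grok)

regex = grok_to_regex(r'%{IP:http_clientip} "%{REQUEST}" %{NUMBER:http_response}')
fields = re.match(regex, '192.0.2.10 "GET /index.html" 200').groupdict()
print(fields)
```

Only the references that carry a field name end up in the result, which is roughly what the “Named captures only” option gives you in the extractor.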