Send apache log to graylog


(Pbhenny) #1

hi,
I am new to this forum and I want to know how to send Apache logs (error and access) to Graylog.

thx for support


(Jochen) #2

You can read the Apache httpd log files using any log shipper which supports a format compatible with Graylog, such as Filebeat or NXLOG.

Also make sure to check the documentation about the Graylog Collector Sidecar at http://docs.graylog.org/en/2.3/pages/collector_sidecar.html.


#3

@Jochen, I’ve never found a use case for the Graylog Collector. It must be the generic wiki article that doesn’t do a good job at explaining why you would even care to use it.

It makes sense to use a system like Kafka to handle message digestion at the perimeter in a huge enterprise, but GC instead sits inside the server and acts as the middle-man between inputs, providing what benefits exactly? It also seems very counter-intuitive to provide a GUI configurable interface from within Graylog itself, since most settings are stored in the server.conf file. It would make more sense to build out an additional collectors.conf file that has all of your node information and doesn’t rely on two separate coding languages just to reanimate a deprecated application. Why not redirect this time and energy towards building out and expanding the dashboards? I have a list of over a dozen items in the dashboard alone that are still in need of some very deserved TLC to help keep Graylog relevant.

But, TL;DR, why would you need that if you are only ingesting Apache logs? All you need is a GELF_TCP input and a program like nxlog to get them there. Extractors are your friend, but the GC won’t help you there. It honestly seems like way too much work just for cosmetics.
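For anyone wondering what a GELF TCP input expects on the wire, here is a minimal Python sketch. It is only an illustration of the format, not a replacement for a shipper like nxlog. The function names, the host names and port 12201 are placeholders I made up for the example; use whatever your own input is configured with.

```python
import json
import socket
import time


def build_gelf_frame(host, short_message, **extra_fields):
    """Return one GELF 1.1 message as bytes, framed for a GELF TCP input."""
    payload = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time(),
        "level": 6,  # syslog severity "informational"
    }
    # GELF requires additional fields to be prefixed with an underscore.
    for key, value in extra_fields.items():
        payload["_" + key] = value
    # GELF over TCP is uncompressed JSON terminated by a null byte.
    return json.dumps(payload).encode("utf-8") + b"\x00"


def send_gelf_tcp(server, port, frame, timeout=5.0):
    """Open a TCP connection to the GELF input and send one frame."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(frame)


# Example call (placeholders, adjust to your own input):
# frame = build_gelf_frame("web01.example.com", "one raw access log line",
#                          source_file="/var/log/httpd/access_log")
# send_gelf_tcp("graylog.example.com", 12201, frame)
```

The raw Apache line goes into short_message as-is, so you would still need an extractor on the input to split it into fields.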


#4

Hi pbhenny,

I’m still learning myself, but I struggled to understand the concepts and methods used to get log files into graylog when I first started learning, so hopefully my description will help you figure out the bits and pieces you are missing.

The basic concepts you need to understand are these:

  • Apache writes its files to the filesystem, and doesn’t natively support syslog, or some other type of log shipping protocol.
  • Something needs to read those files from the filesystem, and send them to graylog.
  • filebeat is a common tool for reading files from a filesystem and sending the logs to graylog or elk.
  • You can manually configure filebeat on each host, or graylog has a nifty feature called “collector sidecar” which jochen linked you to.
  • Using the sidecar collector, you can configure all your filebeat options (like which files to read) via the graylog server web UI, and just point the collector-sidecar process on your clients to the graylog server so they can download that config any time it changes automatically.
  • You’ll need to create a Beats input on the graylog server. An input is basically a port that graylog listens on for clients to send logs to.
  • You’ll also need to create a “collector” configuration via the web UI, which will be used by the sidecar-collector process on your clients to download the filebeat configuration options (like which files to read from the filesystem).
  • Optionally, if you want to extract certain portions of the apache logs into their own fields, you’ll need to set up an extractor on your beats input, with a grok pattern that pulls out the fields you’re looking for. That will allow you to search for things like http_response_code:20[?] to find any “200” http response codes.

So in practice for me that looked like…

On my CentOS/RHEL client side:

  • Install graylog-collector-sidecar. The package came bundled with a filebeat binary, but you can point it at a separately installed filebeat if you want a different version.
  • In /etc/graylog/collector-sidecar/collector_sidecar.yml, change “server_url” to point to your graylog server on port 9000 (e.g. server_url: http://yourserver.domain.com:9000).
  • Note that the tags listed in collector_sidecar.yml are “linux” and “apache” by default, so it’s already set up for apache. These tags dictate which configs are downloaded to that client.
  • Install the graylog startup script: graylog-collector-sidecar -service install
  • start the collector-sidecar service
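
Before starting the sidecar, it can be worth confirming from the client that the graylog server accepts connections on the port from server_url (9000 in this thread) and on the Beats input port (5044 by default). A rough Python sketch of such a check follows. The function names are my own, and it only tests that a TCP connection opens, not that graylog is actually answering behind it.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_graylog_ports(host, ports=(9000, 5044)):
    """Map each port to whether it accepted a TCP connection."""
    return {port: can_connect(host, port) for port in ports}


# Example call (yourserver.domain.com is a placeholder):
# print(check_graylog_ports("yourserver.domain.com"))
```

If one of the ports comes back False, check the server's listen address and the firewall before digging into the sidecar itself.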

On the server side:

  • Make sure your server is listening on port 9000 on an interface your clients can access. By default it may only be listening on localhost.
  • Make sure your firewall allows port 9000, and port 5044 (the default beats port) from your clients.
  • Follow the Step by step guide that Jochen posted for collector sidecar, which will help you create a graylog beats input (on port 5044 by default).
  • Continue following the step by step guide to create an apache “collector” configuration for your apache log files. The collector configuration is what is read by the collector-sidecar process on your clients.
  • While following the step by step guide, make sure you tag your apache collector with the “apache” tag.
  • The example in the step by step guide only grabs access_log by default, but you can easily grab error log or other similar files using one collector. For my apache install, I use: ['/var/log/httpd/access*_log','/var/log/httpd/ssl*_log','/var/log/httpd/*access_log','/var/log/httpd/*error_log']
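
If you are unsure which files a set of globs like the one above will pick up, you can try them out before putting them in a collector. This is a small Python sketch using fnmatch. Filebeat does its own glob matching, so treat this as an approximation that should agree for simple patterns like these. The candidate file names are hypothetical examples of what a CentOS/RHEL httpd with mod_ssl might write.

```python
from fnmatch import fnmatchcase

PATTERNS = [
    "/var/log/httpd/access*_log",
    "/var/log/httpd/ssl*_log",
    "/var/log/httpd/*access_log",
    "/var/log/httpd/*error_log",
]

# Hypothetical file names, not taken from a real server.
CANDIDATES = [
    "/var/log/httpd/access_log",
    "/var/log/httpd/error_log",
    "/var/log/httpd/ssl_access_log",
    "/var/log/httpd/ssl_error_log",
    "/var/log/httpd/ssl_request_log",
    "/var/log/httpd/access_log-20171008",
]


def matching_patterns(path, patterns=PATTERNS):
    """Return every pattern that matches the given path."""
    return [p for p in patterns if fnmatchcase(path, p)]


def picked_up(paths=CANDIDATES, patterns=PATTERNS):
    """Return the paths matched by at least one pattern."""
    return [path for path in paths if matching_patterns(path, patterns)]
```

With these inputs, every file except the rotated access_log-20171008 is matched, and some files (ssl_access_log, for example) are matched by more than one pattern, so the list has some overlap.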

For bonus points, you can set up an extractor for your apache logs. I use the “combined” format for my apache logs, and here is what I did to set it up:

  • Read the manual page on extractors.
  • Go to System > Inputs > find your filebeats input > select “Manage Extractors” to the right of your filebeats input.
  • Choose “Get started” > Choose your filebeats input > Load Message > Hit Load Message until an apache access log shows up > Scroll down to the “message” field > Select Extractor Type > Grok pattern
  • Select “Named captures only”
  • Where it says “Grok pattern”, use this: %{COMBINEDAPACHELOG}
  • Alternatively, I didn’t like the built in grok, and so I modified it slightly: %{IPORHOST:http_clientip} \S+ \S+ \[%{HTTPDATE:timestamp}\] "(?:%{WORD:http_method} %{NOTSPACE:http_request}(?: HTTP/%{NUMBER})?|%{DATA:rawrequest})" %{NUMBER:http_response} (?:%{NUMBER:http_response_bytes}|-) %{QS:http_referrer} %{QS:http_agent}
  • Give it a name, and click “Update extractor”
  • 5 second grok primer: Grok is a regular expression language that allows you to specify extractable fields in your regular expression, and to reference previously created “grok” rules inside other grok rules. So you can establish what an IP address looks like, call it IP, and then reference it in other groks as %{IP} without having to put the regex for matching IPs in each rule you create.
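
To make the grok primer a bit more concrete, here is a toy Python sketch of the idea: named patterns that expand into a regular expression, where one pattern can reference another. This is not how Graylog implements grok, and the pattern definitions are simplified stand-ins I wrote for the example, not the ones Graylog ships. REQUEST is a hypothetical helper pattern added only to show nesting.

```python
import re

# Simplified stand-ins for the real grok patterns (assumptions).
PATTERNS = {
    "IPORHOST": r"[\w.:-]+",
    "HTTPDATE": r"\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}",
    "WORD": r"\w+",
    "NOTSPACE": r"\S+",
    "NUMBER": r"\d+(?:\.\d+)?",
    "DATA": r".*?",
    "QS": r'"[^"]*"',
    # A pattern may reference other patterns, which is the point of grok.
    "REQUEST": r"%{WORD:http_method} %{NOTSPACE:http_request}(?: HTTP/%{NUMBER})?",
}

GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_to_regex(pattern):
    """Expand %{NAME} and %{NAME:field} references into a plain regex."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        inner = grok_to_regex(PATTERNS[name])  # recurse into nested references
        # Only references with a field name become capturing groups.
        return f"(?P<{field}>{inner})" if field else f"(?:{inner})"
    return GROK_REF.sub(expand, pattern)


APACHE_COMBINED = (
    r'%{IPORHOST:http_clientip} \S+ \S+ \[%{HTTPDATE:timestamp}\] '
    r'"(?:%{REQUEST}|%{DATA:rawrequest})" %{NUMBER:http_response} '
    r'(?:%{NUMBER:http_response_bytes}|-) %{QS:http_referrer} %{QS:http_agent}'
)
APACHE_RE = re.compile(grok_to_regex(APACHE_COMBINED))


def parse_access_line(line):
    """Return the named fields of a combined-format line, or None."""
    match = APACHE_RE.match(line)
    if match is None:
        return None
    return {k: v for k, v in match.groupdict().items() if v is not None}
```

Feeding it a combined-format access log line gives back a dict with keys such as http_clientip, http_method and http_response, which is roughly what the extractor adds to each message as fields.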

#5

@unilogger - To the best of my knowledge, collector-sidecar doesn’t actually sit in the message stream (like Kafka would), so it’s not really a “middle man between inputs”. It’s basically a glorified configuration management tool. It sits in the background on clients and checks the graylog server for filebeat and nxlog configurations. When there are changes, it generates a new filebeat or nxlog configuration file, and then restarts the filebeat or nxlog process on the client with the new config.
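
Roughly, the behaviour I mean looks like the Python sketch below. This is only my mental model of it, not the sidecar's actual code. fetch_config and restart_collector are hypothetical callables standing in for the request to the graylog server and for restarting filebeat or nxlog.

```python
import hashlib


def sync_once(fetch_config, config_path, restart_collector, last_checksum=None):
    """Fetch the rendered config and apply it only if it changed.

    fetch_config and restart_collector are hypothetical callables supplied by
    the caller. Returns the checksum of the config that is now in place.
    """
    rendered = fetch_config()
    checksum = hashlib.sha256(rendered.encode("utf-8")).hexdigest()
    if checksum == last_checksum:
        return checksum  # nothing changed, leave the collector alone
    # Config changed (or first run): write it out and restart the collector.
    with open(config_path, "w", encoding="utf-8") as handle:
        handle.write(rendered)
    restart_collector()
    return checksum
```

A real agent would call something like this on a timer and keep the returned checksum for the next round. The log messages themselves never pass through this loop; they go straight from filebeat or nxlog to the graylog input.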

As for resources being better used elsewhere, I tend to agree :) It’s cool, and it’s nifty, but there are other things I’d rather have. In theory it’s great that you only have to deploy your sidecar config once, and then you’re done… but you still have to individually configure tags on each sidecar client, so you can’t just have a single generic package you deploy out. You have to have one for apache servers, another for mysql, another for your favorite custom app, etc. It also lags behind the available options of filebeat for example. I wanted to enable the “symlinks” option in filebeat, but the collector UI doesn’t support that option.


(Jochen) #6

4 posts were split to a new topic: Graylog Collector vs. Collector Sidecar


(Jochen) #7

That’s what snippets are for: http://docs.graylog.org/en/2.3/pages/collector_sidecar.html#snippets


#8

The problem with snippets is that they are applied “globally” to your entire filebeat config file, instead of at a specified section of the config. In my case, the symlinks option needs to be applied to the “apache” or “jvm” or “whatever” prospector that I’m creating.

Also, it appears I can only have one output “host” setting per filebeat.yml - If I configure one beat collector to send to server1:5044, and another to send to server2:5044, my generated filebeat config only shows server2:5044.

Anyway, I don’t need a response or solution to this - in my case it’s not critical to my design and I was able to work around these. I was just saying that while it’s a very “slick” and even nice feature, it can be laborious and error prone to try to write a nice pretty GUI front end for something that was meant to be configured via text file, and I could imagine that time and effort being spent elsewhere improving the core graylog product.


(Jan Doberstein) #9

@TJgrayD

Also, it appears I can only have one output “host” setting per filebeat.yml - If I configure one beat collector to send to server1:5044, and another to send to server2:5044, my generated filebeat config only shows server2:5044.

Just to have it said - this is a limitation of filebeat, and following this issue it will not be changed. Yes, the GUI is not well designed in that part and does not clearly tell you about the limitation.

