u/BourbonInExile on Reddit
A bit of disclosure up front: I work at Graylog. I’ve been a software engineer on the Integrations team (mostly building Enterprise features like the O365 input and the BigQuery output) since March of 2020, and in May of 2021 I became the US Engineering Team Lead. Before this past weekend, I’d done plenty of running Graylog from the IntelliJ debugger on test data, but I had never actually run Graylog at home on my own data.
So last Thursday, Jeff and Aaron dropped a video on YouTube going over running Graylog on a Raspberry Pi using Docker (LINK) and I decided it was time to get off my butt and turn one of my Raspberry Pis into a Graylog box. I figured I’d take a few minutes to generally write up the process so others can complain that this write-up is woefully out of date when it turns up in their Google search 3 years from now.
Step 0 - Prep
So right off the bat, I knew I was going to have to totally reimage one of my Raspberry Pi 4s. They were both running Raspbian, which is 32-bit only and Graylog via Docker requires a 64-bit OS. So Step 0 was getting everything that mattered migrated over to the other Pi. That meant migrating my Pi-hole setup (and updating the router to point to Pi-hole on the new box) and the various Reddit scripts I run to make r/wetshaving a nice sub to participate in.
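Before wiping anything, it's easy to confirm what a Pi is actually running. Here's a minimal Python sketch, assuming Python 3 is on the Pi; the set of architecture names is my own short list, not an exhaustive one. It checks both the kernel and the userland, since a 64-bit kernel can still be paired with a 32-bit userland.

```python
import platform
import struct

# Kernel architecture strings that indicate a 64-bit kernel. A Pi 4 running
# 32-bit Raspbian typically reports "armv7l" instead.
SIXTY_FOUR_BIT_MACHINES = {"aarch64", "arm64", "x86_64", "amd64"}


def is_64bit_machine(machine: str) -> bool:
    """Return True if a platform.machine() string names a 64-bit architecture."""
    return machine.lower() in SIXTY_FOUR_BIT_MACHINES


def userland_bits() -> int:
    """Pointer size of the running Python interpreter, in bits (32 or 64)."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    arch = platform.machine()
    print(f"kernel architecture: {arch} (64-bit: {is_64bit_machine(arch)})")
    print(f"userland (this Python build): {userland_bits()}-bit")
```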
Step 1 - New OS
With a fully disposable Raspberry Pi 4 (8GB) at my disposal, I followed this tutorial to get Ubuntu installed. Once it was up and running on Ubuntu, I also installed a few of my favorite extras like Oh My Zsh and screen. And, of course, I installed Docker.
If I had known on Saturday morning what I learned on Sunday afternoon, this is also the point where I should have reformatted my USB hard drive to a more Linux-friendly file system (I picked ext3) so that it would be easier to store all of my Graylog data somewhere other than the 32GB micro-SD card that serves as the main storage for the Raspberry Pi.
Step 2 - Getting Graylog Running
Graylog has some decent documentation (but I’m biased since I wrote some of it). I hopped over to the 4.1 Docker Installation page and copied the “Example Version 3” docker-compose YAML file. I then made a few of the modifications mentioned by Aaron in the previously linked YouTube video. When it was all said and done, I ran docker-compose up and, after waiting for things to download and go through brand-new-cluster initialization, I was able to log into my Graylog instance.
Over the course of the weekend, I continued to tweak my docker-compose YAML file by modifying the memory allocated to Elasticsearch (2GB out of my 8GB, though the general rule of thumb I’ve heard for relatively small installations is 50% of your RAM to Elastic and 25% to Graylog itself), and updating it to use bind mounts instead of standard volumes.
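Here's that rule-of-thumb arithmetic as a quick Python sketch. The 50%/25% split is just the rough guideline quoted above, and the helper names are made up for illustration. On an 8GB Pi it works out to 4GB for Elasticsearch and 2GB for Graylog (I went more conservative with 2GB for Elasticsearch). The -Xms/-Xmx string is the style of value that goes into the ES_JAVA_OPTS environment variable in the compose file; setting the minimum and maximum heap to the same size is the usual advice for Elasticsearch.

```python
def heap_budget(total_ram_gb: float,
                es_share: float = 0.5,
                graylog_share: float = 0.25) -> dict:
    """Split total RAM using the rough 50% Elasticsearch / 25% Graylog rule of thumb."""
    es_gb = total_ram_gb * es_share
    graylog_gb = total_ram_gb * graylog_share
    return {
        "elasticsearch_gb": es_gb,
        "graylog_gb": graylog_gb,
        # Whatever is left over goes to MongoDB, the OS, and everything else.
        "everything_else_gb": total_ram_gb - es_gb - graylog_gb,
    }


def java_heap_opts(heap_gb: float) -> str:
    """JVM flags with min heap == max heap, e.g. '-Xms2048m -Xmx2048m'."""
    heap_mb = int(heap_gb * 1024)
    return f"-Xms{heap_mb}m -Xmx{heap_mb}m"


if __name__ == "__main__":
    budget = heap_budget(8)
    print(budget)
    print("ES_JAVA_OPTS:", java_heap_opts(budget["elasticsearch_gb"]))
```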
Once it was all up and running, I smashed the button to get a 30-day free trial of Graylog Enterprise because there were some enterprise features I knew I wanted to use like the GreyNoise data lookup.
In retrospect, I should have just gone straight to the Small Business License (aka “Free Enterprise”), which gives me all the enterprise features (with no support) as long as my traffic is under 5GB per day. I honestly didn’t even realize this was an option until someone asked me today why I picked the 30-day trial (short duration, unlimited data) over the small business (data cap, unlimited duration). I just didn’t know any better and I’m not ashamed to admit it.
Step 3 - Feeding the Beast
I decided I wanted to collect logs from a few different sources:
- My ASUS router, which has the firewall active
- Pi-hole running on my other Raspberry Pi
- rsyslog logs from both Raspberry Pis
I started up the Syslog UDP and Syslog TCP inputs in Graylog, both listening on port 1514.
Getting the rsyslog logs from both Raspberry Pi boxes was pretty easy. It was just a matter of editing /etc/rsyslog.conf to add a line of the form *.* @@IP:port;RSYSLOG_SyslogProtocol23Format, where *.* basically says “give me everything”. The @@ tells it to use TCP rather than UDP (which would be just one @). The IP:port is obviously my Graylog instance. The ;RSYSLOG_SyslogProtocol23Format tells it to format the data nicely in a way Graylog is already expecting. And, of course, I had to bounce the rsyslog service to get it to pick up the changes.
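If you want to poke the inputs by hand before touching rsyslog, here's a minimal Python sketch that builds an RFC 5424 line (the layout RSYSLOG_SyslogProtocol23Format produces) and sends it the way @@ (TCP) or @ (UDP) would. The function names are hypothetical helpers of mine, and the port defaults to the 1514 used for the inputs above.

```python
import socket
from datetime import datetime, timezone
from typing import Optional


def rfc5424_line(msg: str, hostname: str, app_name: str,
                 facility: int = 1, severity: int = 6,
                 ts: Optional[datetime] = None) -> str:
    """Build one RFC 5424 syslog line.

    PROCID, MSGID and STRUCTURED-DATA are left as the nil value "-".
    Default facility 1 (user) and severity 6 (info) give PRI 14.
    """
    pri = facility * 8 + severity
    ts = ts or datetime.now(timezone.utc)
    stamp = ts.isoformat(timespec="microseconds")
    return f"<{pri}>1 {stamp} {hostname} {app_name} - - - {msg}"


def send_tcp(line: str, host: str, port: int = 1514) -> None:
    """Like rsyslog's @@: one newline-terminated line over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall((line + "\n").encode("utf-8"))


def send_udp(line: str, host: str, port: int = 1514) -> None:
    """Like rsyslog's single @: one datagram per message over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))
```

Calling send_tcp(rfc5424_line("hello from the Pi", "raspberrypi", "smoke-test"), "192.0.2.10") should then show up in the Syslog TCP input; 192.0.2.10 is a placeholder documentation address, not my actual Graylog box.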
For the Pi-hole logs, I took a cue from this article and just updated /etc/dnsmasq.d/01-pihole.conf to set the log-facility value to local5, which sends the logs to rsyslog instead of a Pi-hole log file. Maybe later I’ll follow through on some of the other suggestions in there for reducing disk usage and keeping the Pi-hole UI nicely populated.
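Under the hood, setting the facility to local5 just changes the facility code in each message's <PRI> header, where PRI = facility * 8 + severity and local5 is facility 21. A small Python sketch for encoding and decoding that header, which can help when checking what actually arrives in Graylog; the sample line in the test is made up, and the names for facilities 12 through 15 vary between systems.

```python
import re
from typing import Tuple

# RFC 5424 facility codes 0-23 (names for 12-15 vary by OS) and severities 0-7.
FACILITIES = ["kern", "user", "mail", "daemon", "auth", "syslog", "lpr", "news",
              "uucp", "cron", "authpriv", "ftp", "ntp", "audit", "alert", "clock"]
FACILITIES += [f"local{i}" for i in range(8)]
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

PRI_RE = re.compile(r"^<(\d{1,3})>")


def encode_pri(facility: str, severity: str) -> int:
    """PRI value for a facility/severity pair, e.g. local5 + info -> 174."""
    return FACILITIES.index(facility) * 8 + SEVERITIES.index(severity)


def decode_pri(line: str) -> Tuple[str, str]:
    """Read the <PRI> header at the start of a syslog line back into names."""
    match = PRI_RE.match(line)
    if not match:
        raise ValueError("line does not start with a <PRI> header")
    pri = int(match.group(1))
    if pri > 191:  # 23 * 8 + 7 is the largest valid PRI
        raise ValueError(f"PRI {pri} is out of range")
    facility_code, severity_code = divmod(pri, 8)
    return FACILITIES[facility_code], SEVERITIES[severity_code]
```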
The ASUS router logs were the biggest pain in my 4$$. No matter what I jammed into the ASUS web UI, I couldn’t get any data to appear in Graylog. In retrospect, I wonder if the router was sending null-terminated logs instead of newline-terminated ones, so my Graylog input was catching all the data but never seeing the “end” of the first record. I ended up updating the rsyslog.conf file on my Graylog box to accept TCP/UDP syslog data on port 514 and sent the ASUS logs there. Rsyslog was then kind enough to forward them on into Graylog.
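To make the null-vs-newline theory concrete, here's a tiny Python sketch of how a newline-delimited TCP receiver carves a byte stream into records. It illustrates the framing problem only; it is not Graylog's actual input code, and the router byte stream is made up.

```python
def split_frames(buffer: bytes, delimiter: bytes = b"\n") -> tuple:
    """Split a TCP byte stream into complete frames plus any unterminated tail.

    The tail is whatever follows the last delimiter; a receiver has to hold
    onto it until more bytes (and eventually a delimiter) arrive.
    """
    *frames, tail = buffer.split(delimiter)
    # Drop empty frames produced by back-to-back delimiters.
    return [frame for frame in frames if frame], tail


# Two records from a sender that terminates each one with a NUL byte
# (the hypothetical router behaviour being tested here).
stream = b"first record\x00second record\x00"

print(split_frames(stream, b"\n"))    # newline framing: everything stays in the tail
print(split_frames(stream, b"\x00"))  # null framing: both records recovered
```

With newline framing the whole stream sits in the tail forever, which would look exactly like "data arrives, nothing shows up".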
Step 4 - Making Sense of My Data
Now I had a whole jumble of data coming into Graylog and I wanted to start doing something with it. I set up new Index Sets for my Pi-hole and ASUS logs and then created matching streams for each. Messages go into the router stream if the source contains my router model and into the Pi-hole stream if the source contains “dnsmasq”. Since most of the data I care about is coming through in the message field, I created new pipelines for each stream to parse the data out into new fields. Where it made sense, I tried to use Graylog Schema field names. I also used custom Grok expressions to parse the message field based on keywords in the message.
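Graylog does this with stream rules and pipeline rules, but the logic is easy to sketch in Python for testing patterns before pasting them into Graylog. Assumptions: ROUTER_MODEL is a hypothetical placeholder for whatever model string the router puts in the source, the regex stands in for a Grok pattern and targets dnsmasq's usual "query[TYPE] domain from client" line, and the output field names are illustrative, not necessarily Graylog Schema names.

```python
import re

# Hypothetical placeholder: replace with the model string your router reports.
ROUTER_MODEL = "RT-EXAMPLE"


def pick_stream(source: str) -> str:
    """Mirror the stream rules: router model first, then dnsmasq, else default."""
    if ROUTER_MODEL in source:
        return "router"
    if "dnsmasq" in source:
        return "pihole"
    return "default"


# Stand-in for a Grok pattern; named groups become the new fields.
DNSMASQ_QUERY = re.compile(
    r"query\[(?P<query_type>[A-Za-z0-9]+)\]\s+"
    r"(?P<query_domain>\S+)\s+from\s+(?P<client_ip>\S+)"
)


def parse_pihole(message: str) -> dict:
    """Pull query type, domain and client out of a dnsmasq query line."""
    match = DNSMASQ_QUERY.search(message)
    return match.groupdict() if match else {}
```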
In order to get a bit more information about the traffic being dropped by the firewall on my router, I set up a GreyNoise lookup and worked it into the pipeline. With the extra info from GreyNoise, I was able to make a nice little dashboard so I could see where the packets being dropped by the router were coming from (the US is the current winner, followed by the Netherlands and Russia). For IPs that GreyNoise didn’t have data on, I added a Whois lookup, which at least gives me the country code and AS organization for the logged IPs.
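The GreyNoise-then-Whois fallback and the per-country tally behind the dashboard are simple enough to sketch in Python. The two lookup functions are hypothetical stand-ins, not the real GreyNoise or Whois clients, and I'm assuming both hand back a country code under a "country" key so the counts line up.

```python
from collections import Counter
from typing import Callable, Iterable, Optional

# A lookup takes an IP and returns a dict of attributes, or None if it has no data.
Lookup = Callable[[str], Optional[dict]]


def enrich(ip: str, greynoise: Lookup, whois: Lookup) -> dict:
    """Try GreyNoise first and fall back to Whois when GreyNoise has nothing."""
    for name, lookup in (("greynoise", greynoise), ("whois", whois)):
        info = lookup(ip)
        if info:
            return {"ip": ip, "enriched_by": name, **info}
    return {"ip": ip, "enriched_by": None}


def dropped_by_country(ips: Iterable[str], greynoise: Lookup, whois: Lookup) -> Counter:
    """Count dropped-packet source IPs per country, the numbers behind a 'top countries' widget."""
    return Counter(enrich(ip, greynoise, whois).get("country", "unknown") for ip in ips)
```

With fakes like gn = {"203.0.113.5": {"country": "US"}}.get and wh = {"198.51.100.7": {"country": "NL"}}.get, dropped_by_country(["203.0.113.5", "198.51.100.7", "203.0.113.5"], gn, wh) counts two for US and one for NL.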
Maybe this week I’ll try to do some dashboarding around the Pi-hole data (who in my house tries to access the most blacklisted sites? the answer might surprise you!).
I also want to do something to highlight root/sudo usage on all the boxes I’m monitoring.
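A first pass at the root/sudo idea could key off the line sudo normally writes to syslog on Debian/Ubuntu-style systems. Here's a Python sketch; the regex is my assumption about that usual format, and the output field names are made up.

```python
import re
from typing import Optional

# Matches sudo's usual syslog line, e.g.
#   "sudo:    alice : TTY=pts/0 ; PWD=/home/alice ; USER=root ; COMMAND=/usr/bin/apt update"
SUDO_LINE = re.compile(
    r"(?P<invoking_user>\S+) : .*?USER=(?P<target_user>\S+) ; COMMAND=(?P<command>.+)$"
)


def sudo_event(message: str) -> Optional[dict]:
    """Return who ran what as whom, or None if the message isn't a sudo command line."""
    match = SUDO_LINE.search(message)
    if not match:
        return None
    event = match.groupdict()
    event["command"] = event["command"].strip()
    event["as_root"] = event["target_user"] == "root"
    return event
```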
If I get really ambitious, maybe I’ll even set up log collection from my laptop, which I expect is going to be trouble because the contents of system.log appear to be in local time (UTC-04) without the timezone specified and I just know Graylog is going to treat them as UTC time and I’m going to have to write pipeline rules to fix the timestamp.
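The timestamp fix itself boils down to "interpret the naive time as UTC-04:00, then convert to UTC". A Python sketch of that arithmetic; the %Y-%m-%d %H:%M:%S format is a placeholder (the real system.log layout would need its own format string), and a fixed offset ignores daylight saving time, so passing a zoneinfo.ZoneInfo for the actual zone would be the sturdier choice.

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Assumption: the laptop's log times are wall-clock UTC-04:00 with no offset written.
LAPTOP_OFFSET = timezone(timedelta(hours=-4))


def local_to_utc(naive_stamp: str,
                 fmt: str = "%Y-%m-%d %H:%M:%S",
                 tz: tzinfo = LAPTOP_OFFSET) -> datetime:
    """Attach the known local offset to a naive timestamp, then convert to UTC."""
    naive = datetime.strptime(naive_stamp, fmt)
    return naive.replace(tzinfo=tz).astimezone(timezone.utc)


if __name__ == "__main__":
    # 10:15 at UTC-04:00 is 14:15 UTC, not 10:15 UTC as a naive read would assume.
    print(local_to_utc("2021-09-07 10:15:01").isoformat())
```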
If there’s interest, I’d be happy to share my docker-compose.yaml file and maybe an export of my pipelines/rules/dashboards via GitHub for anyone who wants to get a head start on a similar setup.
Edit: File(s) in GitHub