Has anyone successfully used Graylog to automatically ingest logs from Cisco’s Umbrella service (formerly OpenDNS)? Umbrella stores the logs in AWS S3 buckets. I don’t think Graylog’s built-in options for reading AWS logs work with S3 buckets; it looks like they expect a Kinesis stream.
They recommend using s3tools, a command-line utility. Can this be automated directly in Graylog, or will I need a completely separate process to download the logs? According to the Umbrella support article:
The logs are stored in a compressed (gzip) archive in CSV format. Logs are uploaded every ten minutes so there’s a minimum of delay between network traffic coming from your network, being logged by Umbrella and then being available to download from S3.
I haven’t worked with these logs specifically, but I have used logstash (or nxlog) to read in CSV data and had Graylog pull that data apart. If you go this route, a cron job would download the relevant files from S3 and decompress them, and logstash would watch the directory for new files and send the data to Graylog. This solution lets logstash keep track of which data is new and which isn’t.
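A minimal sketch of the decompression step in that setup, in Python, assuming the cron job has already mirrored the bucket into a local directory (for example with s3tools’ `s3cmd sync`). The function name and directory layout are hypothetical. An archive counts as handled once its decompressed `.csv` exists in the watched directory, so rerunning the job does not feed the same data to logstash twice.

```python
import gzip
import shutil
from pathlib import Path


def decompress_new_files(download_dir: Path, watch_dir: Path) -> list[Path]:
    """Decompress every *.csv.gz under download_dir that has no .csv
    counterpart under watch_dir yet; return the files written this run."""
    written = []
    for gz_path in sorted(download_dir.rglob("*.csv.gz")):
        # Keep the relative folder structure so identically named files in
        # different subfolders cannot overwrite each other.
        # "sub/foo.csv.gz" -> "sub/foo.csv"
        target = watch_dir / gz_path.relative_to(download_dir).with_suffix("")
        if target.exists():
            continue  # already decompressed by an earlier run
        target.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary name first so the shipper watching watch_dir
        # never picks up a half-written .csv file.
        tmp = target.with_name(target.name + ".part")
        with gzip.open(gz_path, "rb") as src, open(tmp, "wb") as dst:
            shutil.copyfileobj(src, dst)
        tmp.rename(target)
        written.append(target)
    return written
```

From cron this would run right after the download, e.g. `decompress_new_files(Path("/var/lib/umbrella/s3"), Path("/var/lib/umbrella/csv"))` (hypothetical paths), with logstash’s file input watching the second directory recursively.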
A second option would be something like this:
1. A cron job downloads new files.
2. For each new file, send the contents to a Graylog Raw/Plaintext input you have configured, e.g.:
   zcat filename.csv.gz | nc <graylog-host> 12205
3. Create extractors/pipelines to parse the CSV data.
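The “for each new file, send it” part of that option could look roughly like this in Python, as a stand-in for the `zcat | nc` pipe. It assumes the Raw/Plaintext input is a TCP input that splits messages on newlines. The function names and the state file that remembers which archives were already sent are hypothetical.

```python
import gzip
import socket
from pathlib import Path


def send_gzip_lines(path: Path, host: str, port: int) -> int:
    """Stream each line of a gzip-compressed file to a newline-delimited TCP
    input (roughly `zcat path | nc host port`). Returns the line count."""
    count = 0
    with gzip.open(path, "rb") as src, socket.create_connection((host, port)) as sock:
        for line in src:
            if not line.endswith(b"\n"):
                line += b"\n"  # terminate the last message too
            sock.sendall(line)
            count += 1
    return count


def send_new_files(download_dir: Path, state_file: Path, host: str, port: int) -> list[str]:
    """Send every *.csv.gz not yet listed in state_file, then record it there."""
    done = set(state_file.read_text().splitlines()) if state_file.exists() else set()
    sent = []
    for gz_path in sorted(download_dir.rglob("*.csv.gz")):
        name = str(gz_path.relative_to(download_dir))
        if name in done:
            continue
        send_gzip_lines(gz_path, host, port)
        # Record the file only after a successful send, so a failed
        # connection leaves it to be retried on the next cron run.
        with state_file.open("a") as f:
            f.write(name + "\n")
        sent.append(name)
    return sent
```

The trade-off of recording a file only after the whole send completes is that a run interrupted mid-file will resend that file’s earlier lines next time; duplicates are usually easier to live with than gaps.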
I’ve been trying to follow the instructions in the GitHub README. One statement confuses me:
Important: The IAM user you configured in “System → Configurations” has to have permissions to read S3 objects and delete and read notifications from SQS:
What is the IAM user? I don’t see anywhere to configure a user in System → Configurations. I didn’t think that the “AWS Plugin Configuration” was related to the S3 plugin (and it doesn’t contain a field for a user anyway).
But note that the two plugins are completely different. They serve different purposes and are not connected; they weren’t even written by the same person.
I am now confused by your reply. What plugins are you referring to? I thought we only mentioned one in this thread so far (sherzberg/graylog-plugin-s3) and that is the one I am talking about.
IAM is how user access is managed on AWS. On a new account it isn’t fully set up by default; the default method is to use the root web login, or the root access key for API access, e.g. with the S3 tools or with a tool like the Graylog plugin.
Although the root access key is the default starting point, you should NEVER use it if there is anything important on that AWS account. Take a few moments to learn how to roll out IAM, then create a restricted user that can only READ the S3 objects. Using that restricted user’s access key is the safe option, and it won’t compromise the security of your whole AWS account.
In simple terms, using the root access key instead of configuring IAM is like sharing your Linux root password with another machine that only needs to copy logs from your /var/log.
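For the restricted user, an IAM policy along these lines would cover what the README quote above asks for (read S3 objects, read and delete SQS messages). It is built in Python here only so it can be checked; the function name, bucket name and queue ARN are placeholders, and the exact action list is an assumption based on the README wording. If Graylog logs an AccessDenied error naming another action, that action would need to be added.

```python
import json


def read_only_log_policy(bucket: str, queue_arn: str) -> str:
    """Build an IAM policy document that only allows reading objects from one
    bucket and receiving/deleting messages on one SQS queue."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],  # objects only, no bucket admin
            },
            {
                "Effect": "Allow",
                "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage"],
                "Resource": [queue_arn],
            },
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Placeholder names; substitute your own bucket and queue ARN.
    print(read_only_log_policy("example-log-bucket",
                               "arn:aws:sqs:us-east-1:123456789012:example-log-queue"))
```

The printed JSON can be attached to the IAM user as an inline policy; the user’s access key is then what goes into Graylog instead of the root key.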
Excellent! That is the information I needed to finish the configuration. Because I am new to Amazon AWS, this adds a few moving parts to the whole setup.
Ok, I think I’ve got everything in place, but unfortunately, I am not getting any logs pulled from my S3 bucket.
Any idea how I can investigate this further? I’ve made a post on the GitHub support forum, but I figured I’d ask here too in case there’s something in Graylog that could help. The Graylog log file /var/log/graylog-server/server.log (I’m running Debian) doesn’t contain anything helpful.
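One thing worth checking is whether S3 is actually publishing notifications to the SQS queue, and for which keys. This sketch assumes the bucket’s event notifications are delivered directly to SQS (messages routed through SNS arrive wrapped in an extra envelope and would yield nothing here); the function name is hypothetical. It pulls the bucket/key pairs out of a message body copied from the SQS console. An empty queue, or keys under an unexpected bucket or prefix, would point at the notification setup rather than at Graylog.

```python
import json
from urllib.parse import unquote_plus


def objects_in_notification(body: str) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification message.
    S3's one-off "s3:TestEvent" message has no Records and yields []."""
    message = json.loads(body)
    pairs = []
    for record in message.get("Records", []):
        s3 = record["s3"]
        # Object keys are URL-encoded in notifications (a space becomes "+").
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs
```

Note that viewing a message in the SQS console counts as receiving it, so it stays hidden from the plugin until its visibility timeout expires.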