Logs from a text file, JSON-formatted files and a database to Graylog

Hello guys,

I’m new to Graylog and need some information about getting different kinds of logs into Graylog.
I have 3 types of logs, each generated by a different application: a text file that new log lines are appended to, JSON-formatted files, and database entries.

So far, if I have understood correctly, with the ELK Stack I can use Logstash Grok patterns to parse text file lines, the Logstash JDBC input to import data from the database, and a Filebeat prospector with the JSON option to process JSON files. But I would like to know how I can collect and parse these types of logs in Graylog. What are the most common methods?

Thank you in advance!

Collecting files would also be done through Filebeat, using either plain Filebeat or the Graylog “Sidecar Collector”, which is a piece of agent software prepared by the Graylog team.

I don’t know about pulling stuff from databases and I’ll actually have to investigate that because I’m interested too :slight_smile:

You would collect all logs with Filebeat, because it can do that. For the processing, if you do it on the Filebeat side, be aware that any fields other than “message” are (as of Graylog 2.4) quietly discarded. Your best option is:

  1. Grab all logs with Filebeat, but be sure to add a ‘fields’ entry per log file type that indicates what type of log it is (see the sketches after this list)
  2. Set up streams with matching rules where you split the plain text stuff off into one stream and the JSON stuff into another
  3. Set up a pipeline, attach it to the JSON stream, then set up the rules to parse the JSON out of the message field and munge/modify/drop whatever you don’t need
  4. ???
  5. Profit!
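
To make step 1 concrete, here’s a minimal sketch of the Filebeat side (5.x/6.x prospector syntax; the paths, the log_type values and the Graylog hostname are made up, and the Beats input on the Graylog end listens on whatever port you give it):

    filebeat.prospectors:
      - input_type: log
        paths:
          - /var/log/app1/app.log          # hypothetical plain text log
        fields:
          log_type: plaintext              # used by the stream rules in step 2
      - input_type: log
        paths:
          - /var/log/app2/*.json           # hypothetical JSON formatted logs
        fields:
          log_type: json

    output.logstash:                       # Graylog's Beats input speaks this protocol
      hosts: ["graylog.example.com:5044"]

And for step 3, a pipeline rule roughly like this parses the JSON out of the message field and turns its keys into message fields (again just a sketch, adjust the condition to whatever your JSON stream actually contains):

    rule "parse JSON logs"
    when
      has_field("message")
    then
      let parsed = parse_json(to_string($message.message));
      set_fields(to_map(parsed));
    end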

For database entries, you may indeed need to use a Logstash JDBC input with a Logstash GELF output, pointed at a Graylog GELF input.
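
If you go that route, the Logstash config would look roughly like this; a sketch only, with a made-up PostgreSQL connection, table and tracking column, and the GELF output pointed at a GELF UDP input on the Graylog side:

    input {
      jdbc {
        jdbc_driver_library => "/opt/drivers/postgresql.jar"    # hypothetical driver path
        jdbc_driver_class => "org.postgresql.Driver"
        jdbc_connection_string => "jdbc:postgresql://dbhost:5432/appdb"
        jdbc_user => "logreader"
        jdbc_password => "secret"
        schedule => "* * * * *"                                 # poll every minute
        statement => "SELECT * FROM app_log WHERE created_at > :sql_last_value"
        use_column_value => true
        tracking_column => "created_at"
        tracking_column_type => "timestamp"
      }
    }

    output {
      gelf {
        host => "graylog.example.com"                           # Graylog GELF UDP input
        port => 12201
      }
    }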

Since I would need to use Filebeat and also Logstash, would it be more reasonable to use the ELK Stack then, or would Graylog still be easier to set up and maintain even with this setup?

That fully depends on what your goals are for the setup really :slight_smile:

I have 3 different applications that generate logs that are saved in different formats on 3 different servers. My goal is to get all of them into one centralized server (into Elasticsearch) to have a good visual overview of the logs and make searching through them easier. On the centralized server I want to store logs that are at most one year old and remove older ones (but not from the original servers where the logs are first stored). I don’t care much about notifications, but I need some simple authentication (should be easy in Graylog, but ELK with nginx doesn’t seem too hard to set up either).

I first thought of using Graylog, since the setup and maintenance were said to be easier (I might want to add Kibana to Graylog for better visualization in the future). But since I would need to use Filebeat and Logstash with Graylog, and also elasticsearch-curator to remove older logs (I think), I thought that maybe the full Elastic setup might be better in the end.
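
From what I’ve read, the curator side would be a small action file along these lines (a sketch, assuming the default graylog_ index prefix and a one-year cutoff):

    actions:
      1:
        action: delete_indices
        description: "Delete Graylog indices older than 365 days"
        options:
          ignore_empty_list: True
        filters:
          - filtertype: pattern
            kind: prefix
            value: graylog_
          - filtertype: age
            source: creation_date
            direction: older
            unit: days
            unit_count: 365

Though it looks like Graylog also has its own index rotation and retention settings, so curator might not even be needed.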

But I don’t have previous experience with any of them, so I don’t really know which way to go. Any help would be appreciated! :slight_smile:

Basically you’re in the same boat I was in a few months ago.

I opted for Graylog for two reasons:

  • I have no specific needs that would tie me into Kibana. Graylog’s functionality was enough for us.
  • The enterprise licensing for Graylog is much friendlier than ELK-stack’s enterprise costs.

Well, that and their team made a great first impression. Between @taylor in sales, the company’s website and their documentation site (both in content and in writing style), they seem like a great bunch.

Thank you for your insight, I agree that the documentation seems pretty good. Also, I forgot to mention that I will be using the free version of the chosen log management system, since I’m setting it up as a project for my university and they prefer free software.

I’ll echo the totally-not-a-robot’s statements: the reason I went with Graylog was that our dev team just wants logs, they don’t care about visualisations, and Graylog comes out of the box with LDAP and role-based access control. Kibana in its “free” version does not, and we’ve had some hilarity with devs trying to manage the cluster when I wasn’t looking by using Kibana’s dev tools.

On top of that, in the long run, from a financial standpoint Graylog’s pricing is just much more attractive than getting an ELK stack with X-Pack. With our current setup we’d be looking at stupid amounts of money to get features that Graylog already has. When the time comes (soon, staff people, soon) I’ll gladly convince my higher-ups to spend money on a Graylog Enterprise license, if only for the support, because feature-wise the open source version totally hits the spot.

Now, for visualisations, yes, Graylog is lacking in that department. On the other hand, you do get a search API, so it’s “easy enough” to write some code that uses it to wrangle your visualisations for you. Besides, I find that while Kibana does visualise better, it still doesn’t do it quite right, so we have home-brew stuff running anyway, especially for our data analysts, who always want things done differently.
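
To give you an idea, something along these lines (host, credentials and query are placeholders, and the exact path can differ between versions) returns the matching messages as JSON, which you can then feed into whatever charting you like:

    # everything matching the query from the last hour, as JSON
    curl -u admin:yourpassword \
      -H "Accept: application/json" \
      "http://graylog.example.com:9000/api/search/universal/relative?query=source%3Aweb01&range=3600&limit=50"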

So the long of it is above; the short of it: I picked Graylog for feature richness, ease of management, and a much lower future cost than a full-blown enterprisey ELK stack.

Thank you very much for the thorough overview. I think I’ll try out Graylog since, like you said, I too need logs and an easy way to search through them; visualization is not the main priority.

Exactly this!

The fact that we’re lacking the Elastic enterprise license also became less of a problem once we added SearchGuard’s free version. TLS encryption for our traffic was a hard requirement, and you either need Elastic Enterprise or the free SearchGuard. Mongo’s free version supports TLS out of the box, so that was the final box to tick.

By the way, I have gone a very similar route for our password management platform :slight_smile:

CyberArk’s solutions are expensive (comparable to Splunk vs. Graylog), so I looked for a smaller player in the market that offers feature parity. We found ClickStudio’s PasswordState, which is a tremendous tool for the money involved: less than $7k as a one-time payment for an enterprise license, with roughly $5k a year for tech support.

Just, in general, never be afraid to experiment. Graylog has Docker images etc. available that can get you up and running in a minimal way in no time, which lets you fiddle and experiment with it. Feel it up good, as it were :wink:
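
For reference, a minimal docker-compose sketch based on the official images (versions, ports and passwords are placeholders; GRAYLOG_ROOT_PASSWORD_SHA2 is the SHA-256 of whatever admin password you pick, here the hash of “admin”):

    version: '2'
    services:
      mongo:
        image: mongo:3
      elasticsearch:
        image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.5.1
        environment:
          - discovery.type=single-node
      graylog:
        image: graylog/graylog:2.5
        environment:
          - GRAYLOG_PASSWORD_SECRET=somepasswordpepper
          - GRAYLOG_ROOT_PASSWORD_SHA2=8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918
          - GRAYLOG_WEB_ENDPOINT_URI=http://127.0.0.1:9000/api
        depends_on:
          - mongo
          - elasticsearch
        ports:
          - "9000:9000"        # web interface and API
          - "5044:5044"        # Beats input
          - "12201:12201/udp"  # GELF UDP input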

Outside of that, from a usage perspective: in the last week, when we went into “production” with the whole show, I scaled up from 1 to 3 Graylog nodes, which was about as complicated as pointing our very limited Ansible playbook at a few blank servers and getting some coffee. Okay, okay, and stopping a local input and turning it into a global one. And adding 2 hosts to our Collector configs. So all in all, 15 minutes’ worth.

And now with Graylog 2.5 being fresh off the press, there’s support for Elasticsearch 6.x, so I’ve upgraded our Elasticsearch cluster from 5.6.13 to 6.5.1 with a total of 30 minutes of downtime: you can just stop processing on the Graylog nodes, ingestion keeps going and writing to the journal, and after you resume processing and wait an hour it’s all caught up.

Admittedly there are still a few things I wish Graylog had, but those are either something I can solve by writing a plugin, or something I can solve by contributing back to the project (which, if you ask me, is an ideal situation to be in). In our case, our CTO has given me permission to scalp a few of the Java devs and put them to work on plugins if we need them, which is something you can’t do with closed-source stuff.

Password management we actually do with HashiCorp Vault, or rather, secret management. It’s currently being integrated into everything we do, and we use HashiCorp Nomad as our container orchestrator, so that part is hilariously easy.

The only “open end” is user passwords, but we generate strong passwords for new accounts, and if people change theirs through our management interface it gets run past an LDAP policy, then dragged through the haveibeenpwned database, and then it’s either set or rejected.
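
The haveibeenpwned part is just their k-anonymity range API; a sketch of the check (the password variable is obviously made up):

    # send only the first 5 chars of the SHA-1, then look for the rest in the response
    HASH=$(printf '%s' "$CANDIDATE_PASSWORD" | sha1sum | awk '{print toupper($1)}')
    if curl -s "https://api.pwnedpasswords.com/range/${HASH:0:5}" | grep -q "${HASH:5}"; then
      echo "rejected: found in haveibeenpwned"
    else
      echo "ok"
    fi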

Everything else basically follows from there since you can authenticate to Vault with LDAP, and applications either have Nomad deliver a token to them (securely), or they use Vault’s API to get themselves a token. We also use it for our internal PKI, best thing since sliced bread! :slight_smile:

We also use it for our internal PKI, best thing since sliced bread!

HashiCorp Vault can work as a PKI? O_o

I’ve mostly run ADCS so far, which isn’t bad at all, but this sounds interesting too!

Sure, a little bit off-topic, but here are the docs. The short of it: you mount a PKI secrets backend, then generate your CA (and, if you’re being anally retentive like myself, a bunch of intermediates), then set up a role that defines some maximums and config options, and then all you do is a simple vault write your_intermediate_pki/issue/your-role-name common_name="foo" ip_sans="1.2.3.4" ttl=8670h format="pem" and have a certificate and key delivered right then and there.
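
To make that concrete, the whole dance is roughly this; a sketch with made-up mount names, role names and TTLs, using the newer vault secrets enable CLI syntax (the docs have the authoritative walkthrough):

    # mount a root and an intermediate PKI backend
    vault secrets enable -path=pki_root pki
    vault secrets tune -max-lease-ttl=87600h pki_root
    vault secrets enable -path=pki_int pki
    vault secrets tune -max-lease-ttl=43800h pki_int

    # generate the root CA and an intermediate signed by it
    vault write pki_root/root/generate/internal common_name="Example Root CA" ttl=87600h
    vault write -field=csr pki_int/intermediate/generate/internal \
        common_name="Example Intermediate CA" > pki_int.csr
    vault write -field=certificate pki_root/root/sign-intermediate \
        csr=@pki_int.csr format=pem_bundle ttl=43800h > pki_int.pem
    vault write pki_int/intermediate/set-signed certificate=@pki_int.pem

    # a role that defines what may be issued, and then issue away
    vault write pki_int/roles/your-role-name \
        allowed_domains="example.com" allow_subdomains=true max_ttl=8760h
    vault write pki_int/issue/your-role-name common_name="foo.example.com" ttl=720h format="pem"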

Since it has an API (with ACLs and proper token support), we’ve rolled this into our Ansible deploys, where each server is issued a server certificate and, depending on the role of the server, a few additional certificates.

The sky’s the limit, really :smiley: If you follow the example in the documentation (read all the way to the end, where they describe how to set up an intermediate), you can be up and running in no time. One thing to do is add your root CA to the system trust store, and when you issue certificates, concatenate the cert with the supplied ca_chain field to get the whole thing in one go.

Anyway… back on topic! Graylog rocks. And so on. And so forth. :smiley:

Thanks for sharing dude! I’ll poke around a bit with that! Our needs are a lot stricter than “poke the API and it’ll give you a cert”, but it could definitely be cool!

It can be as strict as you want, because to poke the API you need a token, and to obtain a token you either need to be given one issued by someone with access, or you need to exchange a role ID/secret ID for a token (AppRole authentication). The fun thing about it is that I can give you a secret ID that’s only valid for 10 minutes and a single use, and that will get you a token that is valid for perhaps 2 minutes, and a single use. And that token has a policy attached that states you can only issue a certificate, and nothing else.
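
A sketch of that setup with hypothetical names; the policy grants nothing but the issue endpoint of one PKI role, and the AppRole enforces the single-use, short-lived behaviour:

    # issue-only.hcl contains just:
    #   path "pki_int/issue/your-role-name" {
    #     capabilities = ["create", "update"]
    #   }
    vault policy write issue-only issue-only.hcl

    # an AppRole whose secret IDs and tokens are single use and short lived
    vault auth enable approle
    vault write auth/approle/role/cert-issuer \
        policies="issue-only" \
        secret_id_ttl=10m secret_id_num_uses=1 \
        token_ttl=2m token_num_uses=1

    # hand out the role ID once, mint a fresh secret ID per consumer
    vault read auth/approle/role/cert-issuer/role-id
    vault write -f auth/approle/role/cert-issuer/secret-id

    # the consumer trades both for a token and can then only issue certificates
    vault write auth/approle/login role_id=<role-id> secret_id=<secret-id>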

And if you set your certificate role up properly, you can block it from just issuing blindly to any hostname: it has to match a certain domain and a certain format, you can disallow SANs/IP SANs entirely, and so on and so forth.
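
Those restrictions are just parameters on the PKI role; hypothetical values, parameter names as in the Vault PKI docs:

    vault write pki_int/roles/your-role-name \
        allowed_domains="example.com" \
        allow_subdomains=true \
        allow_bare_domains=false \
        allow_any_name=false \
        allow_ip_sans=false \
        enforce_hostnames=true \
        max_ttl=8760h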

It’s a lot to take in but I think that even from a governmental security standpoint it’d be useful :slight_smile: (I was one of those dodgy IT security types before I became a dodgy devops type by the way :D)

That sounds brill’! Let’s not further derail this topic :smiley: I’ve put this on my list of things to investigate :slight_smile:
