Hi,
do you plan to make Graylog more friendly to common IT guys with smaller networks in some release?
I mean, most of us want to install all the components on a single machine and not worry about shards getting corrupted, and we also don’t need features like clustering or support for thousands of machines.
Is your machine reliable?
Is the underlying storage reliable?
FWIW, most databases have problems when the system is shut down the hard way (i.e., by cutting the power).
I understand you need a beast like ES when your network is huge, but who has a network of thousands of PCs and doesn’t have the money to pay for a stable solution?
You do not need to use Graylog in a cluster.
I am running it in our company in a single-server setup and, after some tweaking, it is running quite well.
But I have to say, the tweaking took me a lot of time and trouble, because it is not well documented where and how to tweak things to get the best performance out of your setup.
Sometimes it’s sufficient to run a central syslogd which receives and aggregates the logs from all the other systems, and to search the log messages locally with grep.
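For example, a minimal sketch with rsyslog (assuming rsyslog is your syslogd; the hostname, port, and file layout are just placeholders):

    # /etc/rsyslog.conf on the central box: listen for UDP syslog on 514
    $ModLoad imudp
    $UDPServerRun 514
    # write each sender's messages to its own file
    $template PerHost,"/var/log/remote/%HOSTNAME%.log"
    *.* ?PerHost

    # /etc/rsyslog.conf on every client: forward everything to the central box
    *.* @central.example.com:514

After that, searching is just grep -i 'sshd.*fail' /var/log/remote/*.log and friends.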
Not really trying to dispute your experiences, but they are contrary to mine. I’ve now created 3 one-node Graylog servers (so running GL, ES, and Mongo on the same node) at two different cloud providers (Rackspace and AWS) and have had zero issues with any of the things you’re experiencing.
I would look at your hardware if you’re running it on your own. My setup isn’t huge, but the main GL server gets around 6-10GB of logs per day, and Graylog/ES handles it all very well. I’ve never had any ES shard corruption, but then, I’ve never had to hard-reset my cloud machines while ES was still running, so YMMV.
I know this might sound counter to what you said in your OP, but if you need to hard-boot a DB on the regular, maybe you actually want an ES cluster, so that if one node goes down it can be rebuilt from the other 2 in the cluster and you don’t lose anything.
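If you go that route, note that a three-node cluster also needs quorum settings so a flaky node can’t cause split-brain. A sketch of the relevant elasticsearch.yml lines for pre-7.x Elasticsearch (node names and hosts here are placeholders):

    # elasticsearch.yml, same cluster.name on all three nodes
    cluster.name: graylog
    node.name: es-node-1
    discovery.zen.ping.unicast.hosts: ["es-node-1", "es-node-2", "es-node-3"]
    # majority of the 3 master-eligible nodes; prevents split-brain
    discovery.zen.minimum_master_nodes: 2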
Alternatively, if you’re losing power a lot, get a UPS that can trigger a script on power loss to shut the server down gracefully.
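As a sketch of what that can look like with NUT (Network UPS Tools) – the UPS name, driver, and password here are placeholders for illustration, and you’d still need a matching user in upsd.users:

    # /etc/nut/ups.conf: define the UPS (driver depends on your model)
    [myups]
        driver = usbhid-ups
        port = auto

    # /etc/nut/upsmon.conf: shut the box down cleanly when the battery runs low
    MONITOR myups@localhost 1 upsmon mypassword master
    SHUTDOWNCMD "/sbin/shutdown -h +0"

apcupsd does the same job if you have an APC unit.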
Hm, sounds like making simple things difficult. I never needed clusters, a UPS, or clouds until I found Elasticsearch. I’d rather like to hear why something user-friendly like MySQL is a bad idea, at least for small fish (like most of us).
Feel free to use a log management solution which is backed by a different data store, but Graylog requires Elasticsearch and we won’t change that in the foreseeable future.
Ok, I understand that ES has better performance for searching. My point was that this performance gain can be very expensive for some of us. Every time I tried using Graylog, within a few days I ended up hunting through forums, reading about shards, nodes, and clusters and how to get it working again. In the end, deleting everything and starting over from scratch was the fastest method.
It supports MySQL and other relational databases as a back-end.
We have used it in the past. It has some cool features, but it is not open source and it is terribly slow when you have a lot of logs.
We are running a single Graylog node backed by a 2 node ES cluster (ElasticSearch as a Service from AWS).
Set-up was really easy and we never experienced the problems you are reporting.
Thanks for your input, Marteen. I ended up with Logalyze; it is open source and you don’t have to care about shards or how to convince Elasticsearch not to cripple your data (it uses a relational DB). It doesn’t look as cool and modern as Graylog, and may not be as robust as Graylog is, but it does its job pretty well.
I’ve been running Graylog for nearly a year, as a VM in ESXi, on consumer-level hardware, ultimately sending well over a million logs to it (yep, small fry), and have never seen the problems you mentioned. As a somewhat “common” IT guy myself, I find Graylog very easy to install and maintain (far easier than a pure ELK stack), and some simple research produces links with tips on how to properly configure indexes and shards – maybe took me an hour of dedicated searching.

For example, on my single-node install, all of my indexes are set to rotate to archive once they’ve grown to no more than 30% of the RAM assigned to the VM (roughly 2.4GB), and I set the number of shards to 1, with no replicas.

Now, I am experimenting with learning ELK better and my Graylog install is currently turned off, but I can say that Graylog makes things relatively simple.
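For anyone who wants to copy that shard setup: in older Graylog versions the shard and replica counts live in /etc/graylog/server/server.conf (newer versions expose the same settings per index set in the web UI under System / Indices), and on a single node the settings above come out roughly like this:

    # one shard, no replicas: on a single node a replica could never be
    # assigned anyway, since ES won't put a copy next to its primary
    elasticsearch_shards = 1
    elasticsearch_replicas = 0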
To be honest, anyone who has a home or small office lab and isn’t running a UPS is asking for trouble, no matter what software you’re running – loss of power can be catastrophic for DBs and for some hardware (I have personally witnessed unrecoverable SQL database corruption due to unexpected power events, so this is not a problem singular to Elasticsearch or Graylog). I have multiple UPSes in my home lab to cover all the important machines – NAS, ESXi, main workstation. The recommendations to verify the integrity of your hardware are perfectly valid.
All that said, if you’re experiencing corruption this frequently, something else is going on in your system. You never told us the system specs, or provided any log files. Besides, Graylog is free and runs perfectly well on consumer-level hardware, as I mentioned. I’m running ESXi on an i7-4790S, with 32GB RAM, on a desktop motherboard, with one SSD as a cache drive and several spinner drives for VM storage, and it’s been fine for three years (except for the occasional goof-up on my part when configuring stuff, usually due to not reading documentation thoroughly). My Graylog VM has 8GB RAM and 4 cores assigned to it. Graylog has, so far, been the best log management solution I’ve seen yet, though it does have its limitations.
Well,
if you google for graylog + corrupted or unassigned shards, you can see that I have not made this up.
I tried Graylog twice – first as the virtual appliance, the second time as an installation on a Linux VM. Both times it ended up with corrupted shards pretty soon (once I accidentally reset the box myself, the other time it was a power loss). Oh, and watch out for full disks, the cache, too many logs coming in, etc. – this corrupts Elasticsearch too.
You are right that databases like MySQL might get corrupted after a power loss too, but you will have them running again within a minute, provided you have a backup of the data. With Elasticsearch, you are going to study forums and try various JSON commands until you give up – promising commands like _cluster/reroute are not going to fix anything.
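For what it’s worth, the usual diagnostic loop looks something like this (these are real Elasticsearch APIs; the index and node names are placeholders, and the last command is ES 5.x+ only):

    # list shards and their state; UNASSIGNED is the bad one
    curl -s 'localhost:9200/_cat/shards?v'
    # overall cluster health (red = at least one primary shard is lost)
    curl -s 'localhost:9200/_cluster/health?pretty'
    # the usual "fix" from the forums: allocate an empty primary,
    # which explicitly throws that shard's data away
    curl -s -XPOST 'localhost:9200/_cluster/reroute' \
      -H 'Content-Type: application/json' -d '
    {"commands":[{"allocate_empty_primary":
      {"index":"graylog_0","shard":0,"node":"NODE_NAME","accept_data_loss":true}}]}'

Which is exactly the point: the “fix” is accepting data loss, i.e. the same starting over from scratch you end up doing anyway.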