Our company is running Graylog to process Windows security event logs from all ~500 of our Windows hosts. Let me describe our environment first.
We have ~70 domains with ~5 Windows hosts each. Every domain has its own VLAN and can be treated as a separate network.
We first tried to ship all of our data to 2 Graylog nodes, and that worked fine until the log traffic started to overload our firewalls.
Now we have 1 Elasticsearch cluster, 1 central Graylog node, and a Graylog Docker container in every domain that acts as a proxy to keep the number of connections down (which results in 70 Graylog nodes). Everything worked great in the beginning, but after about 2 months, problems with the Elasticsearch indices started to recur roughly every 2 months. I am currently upgrading our Graylog, Elasticsearch, and MongoDB to the newest versions (we are running Graylog 2.4 at the moment), but I have recently started to question this design with 70 Graylog nodes. I have never seen anyone use more than 2-3 Graylog nodes to process data.
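For context, each per-domain proxy node is roughly launched like this. This is only a sketch: the hostnames, secrets, and ports are placeholders, and the `GRAYLOG_*` environment variables are the ones the official Graylog Docker image maps onto `server.conf` settings (our actual deployment config differs in details):

```shell
# Per-domain Graylog "proxy" node (sketch, placeholders throughout).
# It joins the central cluster by pointing at the shared MongoDB and
# Elasticsearch; the domain's Windows hosts send to its local input
# (here port 12201, e.g. a GELF TCP input), so only this one container
# opens connections across the firewall.
docker run -d --name graylog-proxy \
  -e GRAYLOG_IS_MASTER=false \
  -e GRAYLOG_PASSWORD_SECRET="<shared-cluster-secret>" \
  -e GRAYLOG_ROOT_PASSWORD_SHA2="<sha256-of-admin-password>" \
  -e GRAYLOG_MONGODB_URI="mongodb://mongo.central.example:27017/graylog" \
  -e GRAYLOG_ELASTICSEARCH_HOSTS="http://es.central.example:9200" \
  -p 12201:12201 \
  graylog/graylog:2.4
```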
What do you think? Is this the right approach? Should it work just fine, or was Graylog never designed to be used like this?