New to Graylog - need help getting started

Hello everyone.

I know there is extensive documentation, but it seems to get right down to business and I feel I am missing some basic information. If this information is in the documentation, I apologize for missing it and would appreciate it if someone could post links to the relevant pages.

As I hinted in the title, I am completely new to Graylog. In fact, I'm completely new to the whole log collection/SIEM field.
I began reading the documentation and realized that I had too many questions to continue with the installation:

  1. Sizing and scaling options: obviously, for a first attempt I should use the appliance to test whether I like the system at all, but it seems to me that it would be a waste of time to install and configure everything and then scrap it all to start over with the right architecture. How do I know whether the appliance will be enough for our organization or whether I should use a Graylog cluster (and with how many nodes)?
  2. Elasticsearch: what is it? Is it required? If not, what value will it add to my setup? And, similar to my previous question, what size should it be?
  3. Storage: I read in the documentation that the database saves only the metadata of the logs, but I couldn't find where the logs themselves are saved, or how and where I can configure where they are saved (SMB, NFS, etc.).
  4. Windows servers: I read that Graylog cannot collect logs from Windows servers directly and requires an agent to be installed. Are there any plans to implement an agentless solution in the future? To be honest, I am not a fan of installing an agent on every server I want to monitor. I am actually wondering whether I can configure all Windows servers to forward their logs to a dedicated Windows server with the Graylog agent installed and collect them from there. Will this work? Are there other or better solutions for this?

These are the questions I have so far. Thanks in advance for the help.

Hi @gkman, I wanted to address your questions. Hope it helps! :slight_smile:

  1. How many sources are you looking to ingest, about how many records per day, and how large are these records? Are you talking about a short line of information or a Windows Event Log entry? The larger the "document" you need to index, the more work it takes to index it. Also, what type of hardware is at your disposal? Do you have a lot of CPU cores and RAM available, or are you using something much smaller? The more resources on the box, the more you can cram into it (this requires a bit of engineering). That said, if resiliency and redundancy are goals, you might consider standing up enough resources/nodes for Graylog, Elasticsearch and MongoDB.
  2. Elasticsearch is where the data gets stored. You have to have it. Its size depends on the amount of data you are trying to store and the retention period of that data. For example, storing data for 7 days takes a lot fewer resources than storing data for a year.
  3. Elasticsearch will store what you give it. Often the raw message is preserved and the metadata of the object becomes fields. The goal is to "normalize" these field names across your various datasets so that you can correlate the data based on a field (source_ip, domain_name, etc.).
  4. You can totally set up Windows Event Forwarding and have that one server send the logs. There are various solutions out there; NXLog and Winlogbeat are two options for forwarding the ForwardedEvents log to the Graylog installation. This has the added benefit of forwarding only the events you really want/need, and it can be deployed to workstations and servers via GPO if you are running Active Directory.
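
To illustrate the normalization idea from point 3, here is a minimal Python sketch. It is not Graylog's API (in Graylog itself this kind of renaming is usually done at ingest time, for example with extractors or processing pipeline rules); the source field names in `FIELD_ALIASES` and the `normalize` helper are hypothetical and only show the concept of mapping differently named fields onto one common name.

```python
from typing import Any, Dict

# Hypothetical aliases: different sources name the same thing differently.
# The left-hand names are illustrative, not taken from any specific product.
FIELD_ALIASES: Dict[str, str] = {
    "src_ip": "source_ip",
    "SourceAddress": "source_ip",
    "client_addr": "source_ip",
    "domain": "domain_name",
    "DnsDomain": "domain_name",
}


def normalize(message: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `message` with known aliases renamed to common field names.

    Fields without an alias are kept unchanged.
    """
    normalized: Dict[str, Any] = {}
    for key, value in message.items():
        normalized[FIELD_ALIASES.get(key, key)] = value
    return normalized


# A firewall record and a Windows record that use different field names.
firewall = normalize({"src_ip": "10.0.0.5", "action": "deny"})
windows = normalize({"SourceAddress": "10.0.0.5", "EventID": 4625})

# Both records now share "source_ip", so they can be correlated on that field.
print(firewall["source_ip"] == windows["source_ip"])
```

Once every source uses the same field name, a single search on `source_ip` can pull related events from all of them.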

I’ll try to answer any more questions you might have.

Hi @billmurrin, thanks for the reply.

I have a few follow-up questions.

  1. Is there a formula for calculating the size of the cluster? (For example, something like (number of clients) * (logs per node) * (0.01 for short-line logs or 0.3 for Windows events) = RAM required.) Same question for Elasticsearch.
  2. How difficult is it to expand Graylog in the future? Can I just go ahead and install the VM, or does expanding require early planning and configuration?
  3. Regarding the previous question: is Elasticsearch included in the Graylog VM appliance? The documentation led me to believe it is. Is it easy to migrate the data to an external Elasticsearch source/cluster, or should I split them up from the very beginning?

Thanks again for the help,

Hey there,

1.) There used to be an online calculator for sizing Graylog nodes, but it has since been taken down. Everything is very scalable: Elasticsearch, MongoDB and graylog-server. I don't really have a great reference for you, because it will largely depend on your hardware specs (cores, RAM, HDDs), the amount of processing/enrichment you do to transform the data, and the amount of data you are going to throw at it. Sorry.
2.) You can install the VM to test it out, but in production you probably want to install the components on a server. I use the VM for testing and would not use it in production, though I see that quite a few people do.
3.) Everything is included in the appliance. It comes out of the box ready to rock and roll. Download the OVA file and import it into your VM application. It works very well.
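
There is no official formula, but for disk space alone (data volume times retention, as described earlier) a back-of-the-envelope estimate can be sketched in Python. This is only a rough sketch under stated assumptions: `estimate_storage_gb` is a hypothetical helper, the event counts and sizes are made up, and the `index_overhead` multiplier (how much larger the indexed data is than the raw messages) is an assumption you would have to measure on your own data, for example on the test appliance. It says nothing about CPU or RAM, which depend on the processing you do.

```python
def estimate_storage_gb(
    events_per_day: int,
    avg_event_bytes: int,
    retention_days: int,
    replicas: int = 0,
    index_overhead: float = 1.0,
) -> float:
    """Rough disk estimate in GB (10**9 bytes) for log data kept in Elasticsearch.

    Assumptions (not an official Graylog or Elasticsearch formula):
    - each event is stored once per copy (1 primary + `replicas` replicas);
    - `index_overhead` scales raw size to on-disk index size and should be
      measured on your own data rather than trusted as a default.
    """
    raw_bytes_per_day = events_per_day * avg_event_bytes
    copies = 1 + replicas
    total_bytes = raw_bytes_per_day * retention_days * copies * index_overhead
    return total_bytes / 10**9


# Hypothetical load: 1 million events per day, averaging 500 bytes each.
week = estimate_storage_gb(1_000_000, 500, retention_days=7)
year = estimate_storage_gb(1_000_000, 500, retention_days=365)
year_with_replica = estimate_storage_gb(1_000_000, 500, retention_days=365, replicas=1)

print(f"7 days:            {week:.1f} GB")
print(f"1 year:            {year:.1f} GB")
print(f"1 year, 1 replica: {year_with_replica:.1f} GB")
```

With these made-up numbers, a year of retention needs roughly 52 times the disk of 7 days (365 / 7), and each Elasticsearch replica adds another full copy.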

OK, I'll give it a go. Thanks again for the help. :+1:


Hi, so I have installed the VM and configured it. I even managed to configure a server and collect logs from it.
So now I have new questions…

  1. Search: how can I make it case-insensitive? In Windows almost everything is case-insensitive, so having to check the case of each parameter before I search is a big annoyance.
  2. graylog-sidecar: do I have to install both graylog-sidecar and NXLog on the clients (the machines I collect logs from), or can I make do with just one?
  3. Log analysis: I may have gotten the wrong impression and maybe should have done more reading, but does Graylog enable cross-referencing logs between multiple sources to get a wider view of problems and issues? (For example, if I have an application error, would Graylog be able to cross-reference it with the database server and show that the application error was caused by a timeout?) How can I view such things?