We are deploying Graylog as a monitoring tool across multiple client sites. If it's relevant, we are using the Docker images at all client sites and for our main server.
Our plan is to configure multiple Graylog instances (one at each client site) and then have those instances output some or all of their messages to our central Graylog server.
I've got this set up and working, but I can't see any way to identify the location a message came from.
I would like all messages from the same source system at each client to land on the central server in a single stream, so we can easily apply rules, extractors, etc. to all the different input types. Essentially, the Windows log streams at clients A, B, C, and D would all land in a central Windows log stream on our central server.
The issue I'm having is that when they all land there, I have no idea where they've come from. How can I identify, or even better, tag the source? For instance, can I name each node with a Docker environment variable and then have that passed on with the output as a field? Or can I at least see the source IP it came to us from? That would allow me to narrow down the client based on their public IP.
I'm assuming that once I have this set up I will very easily be able to filter to different clients when I need to in searches or dashboards.
Thank you in advance for any guidance you can provide!
Correct me if I'm wrong: you have remote Graylog server(s) sending messages to a central Graylog server, and you need a way to sort out the logs/messages from each DMZ? If that's correct, we have done this in a couple of different ways. The best one I know of is to create an input for each DMZ on the main server (zone-01, zone-02, zone-03).
I have not used Graylog forwarding, so I can't say if there is a setting in there that you can change... however, perhaps you can have your outlying Graylog servers create a new field that includes the source field, as well as any other relevant data, before you forward the message...
Hi @gsmith, thanks for the reply. I am aware of this option, but I foresee an issue with scalability. Right now this will be deployed across a handful of sites, but I need something that can scale to hundreds of sites with as much automation as possible. These are small sites, but each will have its own server to potentially retain more data than will be retained at the central site.
I already have a more or less finalised method for deploying these servers in a 99% automated way.
In this context, needing to set up multiple inputs on the central server would not be ideal.
But on a more limiting note, these messages will be identical (apart from the source), and I'd like to process them with the same extractors and then siphon them off into different streams on the basis of the data rather than the source. For example, critical security or high-risk events are retained longer and actively monitored by our staff, whereas run-of-the-mill data can be cleared after a few weeks.
From what I understand the best way to achieve this use case is with an extra field.
Hi @tmacgbay, thanks for the reply, that would work!
Unfortunately I can't figure it out. The extractor documentation I've found details how to extract fields from messages, but what I actually need is to just add a static field to every message on the server. I could add an extractor to the incoming streams to do this, but how to write it is beyond me. I was hoping the answer would be "oh yeah, tick this box and Graylog will store the IP it received the message from"!
If it needs to be a customised extractor on each source server, I can live with that, but can you guide me on how I would write an extractor that adds a static field, please?
Alternatively, if it could pick up a system variable (i.e. the hostname of the server) and add it as a field, that would be even better, but that seems a bit of a pipe dream!
On the input screen, next to the Stop Input button for an input, just click More Actions → Add Static Field and add the field you want! I'm sure this can be automated via the API when I need to.
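For automating that across many sites, a sketch of what the API call might look like via curl. The host, credentials, input ID, and field name below are all placeholders, and the exact endpoint path can differ between Graylog versions, so verify it first in the API browser on your own server (System → Nodes → API browser):

```shell
# Placeholder values -- substitute your own server, credentials, and input ID.
GRAYLOG_URL="https://graylog.example.com/api"
INPUT_ID="someinputid"

# Graylog's REST API requires the X-Requested-By header on mutating requests.
curl -u admin:password \
     -H "Content-Type: application/json" \
     -H "X-Requested-By: automation-script" \
     -X POST "${GRAYLOG_URL}/system/inputs/${INPUT_ID}/staticfields" \
     -d '{"key": "client_site", "value": "client-a"}'
```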
Glad you found a solution! That will at least put your site name in. If you are using Beats or nxlog, you can have those sidecar configurations add in the hostname. Here is an example for a Beats configuration that captures messages from Windows IIS and inserts the hostname as a field before the message is sent to Graylog. The line that does this: test_hostname: ${sidecar.nodeName}
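A minimal filebeat sidecar fragment along those lines. The log path and output host are illustrative; ${sidecar.nodeName} is a Graylog Sidecar runtime variable substituted when the configuration is rendered, and the fields section is what injects the hostname:

```yaml
filebeat:
  inputs:
    - type: log
      paths:
        - C:\inetpub\logs\LogFiles\*\*.log
      # Put the injected field at the top level of the message
      fields_under_root: true
      fields:
        # Filled in by Graylog Sidecar when it renders this config
        test_hostname: ${sidecar.nodeName}
output:
  logstash:
    # A Beats input on the Graylog server (hostname/port illustrative)
    hosts: ["graylog.example.com:5044"]
```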
You could also do it further down the path the message takes at the satellite office. Attach a pipeline to the stream associated with the local input(s) and use the source field to create a new, separate field to be picked up later. In its simplest form, the rule in the pipeline would look like this:
rule "the One True Source"
when
true
then
set_field("true_source", $message.source);
end
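The same pipeline mechanism can also handle the content-based routing mentioned earlier, on the central server. A sketch, where the field name event_severity and the stream name "Critical Security" are both hypothetical (the stream would need to exist already, with its own retention settings):

```
rule "route critical events to long-retention stream"
when
    // only act on messages that actually carry the field
    has_field("event_severity") &&
    to_string($message.event_severity) == "critical"
then
    route_to_stream(name: "Critical Security");
end
```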
Also - Mark your note as the solution for future searchers!
I completely understand, especially for larger environments. Maybe two or three DMZs would be fine, but after that it would be a pain.
Adding on to what @tmacgbay suggested: if you're using nxlog, and depending on the type of Graylog input in use, the input block in the nxlog configuration can be renamed for your DMZ. I have done this before to filter out specific nodes in searches.
For example, if this was configured in nxlog as...
<Extension gelf>
    Module    xm_gelf
</Extension>

<Input zone-01>    # <---- place the zone name here
    Module    im_msvistalog
</Input>
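For completeness, the matching output and route sections would forward that input to the central server over GELF. A sketch, assuming a hypothetical central hostname and port; check the port against the GELF TCP input configured on your own server:

```
<Output central>
    Module        om_tcp
    Host          graylog.example.com
    Port          12201
    # GELF_TCP is provided by the xm_gelf extension loaded above
    OutputType    GELF_TCP
</Output>

<Route to_central>
    Path    zone-01 => central
</Route>
```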