Trouble with Relationship between Indexes, Streams and Inputs

therealjoshuad · September 29, 2022, 2:23am

Hi All!

I’ve been a Graylog user for some time now; however, admittedly, I’ve been underutilizing its many awesome features.

I’m the solo IT person in an SMB who recognizes the importance of log collection. Some time ago, I set up a simple single-server instance of Graylog and started my collection journey. I’ve been sending all our network gear and Windows Server logs into Graylog, but my efforts pretty much stopped there.

Fortunately, we’re not in a highly regulated industry, so we’ve never really had to rely on them for any significant need. However, with that said, I know that we live in a very adversarial world, so I worry that I could be missing important things in the slew of logs that I’m collecting.

I’ve done plenty of simple searches to get the root cause of some simple network issues, and even gone as far as to use a few of the Dashboards for AD data from the content marketplace.

With that, I don’t think I truly grasped the rhyme or reason for some of the various functions within Graylog, and I think I may have ended up with a small mess of inputs and streams. I’ve also noticed that some searches yield duplicate messages, so I think some of the streams I created caused that. After some reading on the forum here, I think I initially thought a stream was sort of like a saved search, and I didn’t use the “remove from all messages stream” option.

I’ve made it a point to spend the next week or two really trying to hone my Graylog instance so that I can use it in a more proactive way, but I was hoping someone here would be willing to ELI5 (explain like I’m 5) how one might properly set up inputs, streams and indexes in a small shop with a mix of Windows, Cisco and (a few) Linux servers? I tried reading and re-reading the documentation on the subjects, but I just can’t seem to get it to “stick”; it seems like it’s written for someone who’s already in the know about all of this stuff.

I can tell you what I have now:

Indexes
I have the default Index Set, and somehow ended up with a “Cisco IOS” index. I think I was trying to do some parsing/extraction, and thought I might need a separate index for Cisco IOS logs. Is that the case? I was trying to “decorate” the log messages so that they were searchable by fields, rather than just as raw messages.

Inputs
Is there any reason to have multiple inputs of the same type, or would I only need one input for each? I have three different network vendors sending in syslog. I think I created an input for each vendor type. Again, I think my reasoning was that I might need to parse the different types of messages from each one differently. Is there a better way? Here are the inputs I have now:

Beats
Cisco RAW Input (Raw/Plaintext UDP)
- I don’t think this one is being used
Firewall (Raw/Plaintext UDP)
- I have a Cisco firewall sending logs here
GELF_UDP
Syslog Input (Syslog UDP)
Syslog IOS UDP (Syslog UDP)

Streams

Finally, I created a bunch of streams of like devices. So I have one that has my firewall logs, another that has SiteA’s switch logs, and another with SiteB’s router logs.

gsmith · September 29, 2022, 3:04am

Hello && welcome @therealjoshuad

First, organization is key, no matter how large or small the environment.
Personally, I list the devices in the network that are sending logs (i.e., firewalls, switches, Windows, Linux, access points, etc.).
From there I create an index for each. Later on, this will prevent Elasticsearch errors about having too many fields in one index (the limit is 1000). It also helps when you need to modify one server’s logs, because you can narrow things down instead of doing a global search. Not all devices are created equal. Overall, this will give you better control. I wish someone would have told me this 6 years ago.
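To make the field-limit point concrete, here is a rough sketch (not Graylog code, just a toy model). It counts how many distinct field names land in each index when everything shares one index versus one index per device type. The sample messages and field names are made up for illustration; the 1000 figure is the Elasticsearch default for `index.mapping.total_fields.limit`.

```python
from collections import defaultdict

FIELD_LIMIT = 1000  # Elasticsearch default for index.mapping.total_fields.limit


def distinct_fields_per_index(messages, index_for):
    """Count the distinct field names that end up in each index."""
    fields = defaultdict(set)
    for message in messages:
        fields[index_for(message)].update(message["fields"])
    return {index: len(names) for index, names in fields.items()}


# Hypothetical sample: each device type brings its own field names.
messages = [
    {"device": "windows", "fields": ["EventID", "LogonType", "TargetUserName"]},
    {"device": "cisco", "fields": ["facility", "mnemonic", "interface"]},
    {"device": "linux", "fields": ["pid", "program", "facility"]},
]

# Everything in one index: the field names of all device types add up.
one_index = distinct_fields_per_index(messages, lambda m: "default")
# One index per device type: each index only carries its own fields.
per_device = distinct_fields_per_index(messages, lambda m: m["device"])

print(one_index)
print(per_device)
```

With real logs the numbers are much larger, but the shape is the same: a shared index accumulates the union of every device type's fields, while a per-device index only grows with its own.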

As you know, everything comes into the default index and the “All messages” stream. What needs to happen for better organization is to re-route those logs into separate streams and, as you also know, tick the box called “Remove matches from ‘All messages’ stream”.
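If it helps to see the routing idea as code, here is a simplified model of it in Python. This is only a sketch under my own assumptions, not how Graylog is implemented: the `Stream` class, the `route` function and the input name `linux-gelf` are made up for illustration. The idea it shows is that a message stays in “All messages” unless a matching stream removes it, and that it gets written once for every distinct index set among the streams it ends up in, which is one way duplicates can show up in searches.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

DEFAULT_STREAM = "All messages"
DEFAULT_INDEX_SET = "Default index set"


@dataclass
class Stream:
    title: str
    index_set: str
    rule: Callable[[dict], bool]       # stand-in for the stream rules
    remove_from_default: bool = False  # the "Remove matches" tick box


def route(message: dict, streams: list[Stream]) -> dict:
    """Return the streams and index sets a message ends up in."""
    matched = [s for s in streams if s.rule(message)]
    titles = [s.title for s in matched]
    index_sets = {s.index_set for s in matched}
    # The message stays in the default stream (and its index set)
    # unless at least one matching stream removes it from there.
    if not any(s.remove_from_default for s in matched):
        titles.insert(0, DEFAULT_STREAM)
        index_sets.add(DEFAULT_INDEX_SET)
    return {"streams": titles, "index_sets": sorted(index_sets)}


def is_linux(message: dict) -> bool:
    return message.get("gl2_source_input") == "linux-gelf"


message = {"source": "web01", "gl2_source_input": "linux-gelf"}

kept = route(message, [Stream("Linux", "Linux index set", is_linux)])
moved = route(message, [Stream("Linux", "Linux index set", is_linux, True)])

print(kept)   # box not ticked: two index sets, so the message is stored twice
print(moved)  # box ticked: only the Linux index set
```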

For example, with Linux logs:
I have an input called Linux Secure System GELF TCP that all my Linux server logs go to. By default they get shipped right to “All messages”.

Picture time

Example of re-routing Linux logs:

Create input for Linux logs

Create Index for Linux logs

Create a stream to route those logs. There are two parts to this:

- Index used
- Stream Rules

At this point my logs from the Linux servers have their own index, stream and input.
You may ask why you would do this. Well, some log types are NOT capable of GELF UDP, some logs have to use Raw/Plaintext or Syslog UDP, some might send CEF later on, or perhaps later on you may want to use Graylog Sidecar to make life easier. If you start getting into some serious dashboards and want to create a pipeline or extractors to modify logs that are coming from Linux servers, you only need to adjust the Linux input, not a global “one-stop shop” input.

When creating saved searches, you can use the separate streams or indices to filter down to what you need. I believe this makes life easier and makes for less white noise in the long term.

Here is another good example: a member had a problem with a device in the network sending bad logs and causing their journal to fill up. Since they’re using only one input for everything, it will take some time to figure out which device/node/server etc. is doing this. With separate inputs you can “pause” (i.e., stop) an input and figure out which set of devices the issue is coming from. Maybe it’s a switch. Now it would be easier to find. Just a thought. As for different vendors, I do separate them, like Cisco firewalls and Fortinet. Same with switches: Dell, Cisco, Force10. It’s just easier when I need to do some configuration or searches.
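As a rough illustration of why separate inputs make a noisy sender easier to find, here is a small Python sketch that tallies messages by the input they arrived on. It assumes you have exported some messages as dictionaries; `gl2_source_input` is the field Graylog uses to record the receiving input, but the input names and sources below are made up.

```python
from collections import Counter


def top_senders(messages, key="gl2_source_input", n=3):
    """Return the n most frequent values of `key` with their message counts."""
    counts = Counter(m.get(key, "unknown") for m in messages)
    return counts.most_common(n)


# Hypothetical export of recent messages.
messages = [
    {"gl2_source_input": "switch-input", "source": "sw-site-a-01"},
    {"gl2_source_input": "switch-input", "source": "sw-site-a-01"},
    {"gl2_source_input": "switch-input", "source": "sw-site-b-02"},
    {"gl2_source_input": "firewall-input", "source": "fw-01"},
    {"gl2_source_input": "linux-input", "source": "web01"},
    {"gl2_source_input": "linux-input", "source": "db01"},
]

print(top_senders(messages, n=1))                # busiest input
print(top_senders(messages, key="source", n=1))  # busiest device
```

With a single catch-all input, the first tally tells you nothing, and you are left sorting through every source at once.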

That’s just me in a nutshell; it does give one some room to expand and stay organized.

Hope that helps

H2Cyber · September 29, 2022, 7:41pm

Is there any reason to have multiple inputs of the same type, or would I only need one input for each?

Extractors are configured within inputs, and consume a fair amount of processing power. If you are planning to use extractors, then the ideal setup is to have at least one input per log type. That way, you can be sure that the processing power going into extractors is not wasted on irrelevant messages.

For example, suppose you have a Cisco Firewall and a Fortinet Firewall, both sending logs in different formats but on raw UDP. You don’t want your Cisco Extractors to be applied on your Fortinet logs, and you don’t want your Fortinet Extractors to be applied on your Cisco logs (that would be a waste of CPU cycles). Therefore, you should have at least one Input for Cisco Firewalls (with Cisco specific Extractors), and an Input for Fortinet Firewalls (with Fortinet specific extractors).
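The same idea as a minimal Python sketch. This is not Graylog's extractor engine, only a model of it: the input names are hypothetical and the regular expressions are simplified stand-ins for real Cisco ASA and Fortinet patterns. The point is that each input carries only its own extractors, so a message is never run through patterns written for another vendor.

```python
import re

# Hypothetical inputs, each with its own list of extractor patterns.
EXTRACTORS = {
    "cisco-fw": [
        re.compile(r"%ASA-(?P<severity>\d)-(?P<message_id>\d+)"),
    ],
    "fortinet-fw": [
        re.compile(r"srcip=(?P<src_ip>\S+)"),
        re.compile(r"dstip=(?P<dst_ip>\S+)"),
    ],
}


def apply_extractors(input_name, raw):
    """Run only the extractors attached to the input the message came in on."""
    fields = {"message": raw}
    for pattern in EXTRACTORS.get(input_name, []):
        match = pattern.search(raw)
        if match:
            fields.update(match.groupdict())
    return fields


cisco = apply_extractors("cisco-fw", "%ASA-6-302013: Built outbound TCP connection")
fortinet = apply_extractors("fortinet-fw", "srcip=10.0.0.5 dstip=8.8.8.8 action=accept")

print(cisco)
print(fortinet)
```

If both firewalls shared one input, every message would be tested against all three patterns, and most of those attempts would fail to match.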

system · October 13, 2022, 7:42pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
