Help with understanding a few key fundamental concepts of Graylog: Raw vs Syslog input

Hello

I’m new to Graylog and I realise that I have some issues with a few fundamental concepts. Not understanding them makes troubleshooting a bit hard. Bear with me since I’m new here, and thanks for your patience and understanding. If this is clearly described in the documentation somewhere, feel free to point me there. I’ve been looking but can’t find it.

So, when I have multiple inputs on the same port, like Raw UDP and Syslog UDP on port 1514, what mechanism decides what goes to which input? Is it how the message is tagged by the source/sender? In my installation it seems a bit random, and after a restart of Graylog, messages that were previously “handled” by the RAW input are now “received by” the Syslog input.

And in this particular case, when messages are received by the Syslog input, Graylog has some issues extracting a proper source from the message. And from the looks of it, I cannot see that Graylog stores the IP of the sender/host anywhere it can be accessed or parsed.

Is it generally considered bad practice to have multiple inputs on the same port, as described above?

Cheers and thanks

I believe Inputs like syslog try to do some basic extractions for you whereas raw leaves all extraction up to you. In the case where you have non-standard format syslog coming in, you may want to switch to RAW to make sure you can capture what you want. As such, I would recommend a separate port for each input so that you (and Graylog) have a clear sense of what is coming in and how to handle it.
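To make the "standard vs non-standard syslog" distinction a bit more concrete, here is a minimal Python sketch that checks whether a payload starts with a syslog priority header (`<PRI>`), which both RFC 3164 and RFC 5424 messages do. This is only an illustration and not Graylog's own parser; the function name `parse_pri` and the second sample line are made up for the example.

```python
import re

# A syslog message (RFC 3164 or RFC 5424) begins with a priority value in
# angle brackets, e.g. "<34>". PRI = facility * 8 + severity, so it is 0..191.
PRI_RE = re.compile(r"^<(\d{1,3})>")


def parse_pri(payload: str):
    """Return (facility, severity) if payload starts with a valid PRI, else None."""
    match = PRI_RE.match(payload)
    if match is None:
        return None
    pri = int(match.group(1))
    if pri > 191:
        return None
    return divmod(pri, 8)


samples = [
    "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick",
    "Last login attempt from console",  # hypothetical non-syslog payload
]
for line in samples:
    print(parse_pri(line), "<-", line[:40])
```

A device whose messages fail a check like this is probably a better fit for a RAW input, where you do the parsing yourself.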

Hello

Sadly the port cannot be configured on all our devices, so some need to share the default port 514. Some of them send in “proper” syslog format and some do not.

But what I don’t understand is this: if I have both RAW and Syslog inputs activated on the same port, am I supposed to see ALL the incoming messages in both of them?

It feels like there’s some sort of hierarchy here that I fail to understand, and that it’s also somewhat random. And that there’s some sort of hidden “I will from now on claim the messages from this device” system. So at the moment some messages that the Syslog input can NOT properly parse end up there anyway, and some that it should be able to parse are only seen in the RAW input.

If I could just tap into the source IP of the received UDP packet, I guess it would be possible to direct messages better in Graylog. But it seems like this isn’t stored in the message unless it was sent as text in the message itself?

You can use iptables to redirect incoming data from a specific IP/range to a specific port. Having more than one input type on a port will lead to unexpected outcomes.

There are plenty of posts about Graylog inputs, ports and iptables in the forum, such as this one. Hopefully that will set you on your way!
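As a rough sketch of the redirect idea, the Python snippet below only builds and prints the iptables commands; it does not run them. The source addresses and the target ports (1514 and 1516) are placeholders, so swap in your own devices and input ports. The rules use the `nat` table's `PREROUTING` chain with the `REDIRECT` target, and running the printed commands requires root on the Graylog host.

```python
import ipaddress


def redirect_rule(source, from_port, to_port, proto="udp"):
    """Build an iptables command that redirects traffic from one sender
    (IP or CIDR range) arriving on from_port to a local to_port."""
    # Raises ValueError if source is not a valid address or network.
    network = ipaddress.ip_network(source, strict=False)
    return [
        "iptables", "-t", "nat", "-A", "PREROUTING",
        "-p", proto, "-s", str(network),
        "--dport", str(from_port),
        "-j", "REDIRECT", "--to-ports", str(to_port),
    ]


# Hypothetical plan: well-behaved syslog senders go to a Syslog UDP input,
# the odd device goes to a Raw/Plaintext UDP input.
plan = [
    ("192.0.2.0/24", 514, 1516),   # assumed Syslog UDP input port
    ("198.51.100.7", 514, 1514),   # assumed Raw/Plaintext UDP input port
]
for source, from_port, to_port in plan:
    print(" ".join(redirect_rule(source, from_port, to_port)))
```

With rules like these, each input listens on its own port and the firewall decides which sender lands where, so no two inputs compete for port 514.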

aaah! That’s a very good point and strategy! Thanks!

Hello,
Just adding on to @tmacgbay. When I saw this statement:

Is it possible to adjust the port numbers instead? The layout below will give you fewer issues later on.

Examples:

Input Raw/Plaintext UDP - 1514
Input Raw/Plaintext TCP - 1515
Input Syslog UDP - 1516
Input Raw/Plaintext UDP - 514

Then your 514 port can be redirected as @tmacgbay suggested. As your environment grows, you can make adjustments to specific inputs for particular devices coming in.
Hope that helps.

Will play some more with this. Thanks guys! :pray: :+1:

So in summary, having multiple inputs on the same port is bad practice and should be avoided. Correct?

You can, but there might be problems later. You should look at what you want to do in the future and plan for it now. Perhaps this might help.

https://docs.graylog.org/docs/collect

My suggestion is a good start for inputs, but you may need to fine-tune it a bit for your environment. I personally like to group my firewalls on one input/port, switches on another input/port, Windows on a separate input/port, etc. This way, if I need to extract data from messages for a particular device type, the port number makes it easy. For me, different ports also help with my network configuration and security. Even when creating widgets, searches, or notifications, I can do it a lot quicker if my devices are separated by input and port. This will depend on the type/format of the messages being received and where they come from.

Thanks! Yes, I’ve been reading a few snippets on the net and have watched a few YouTube clips about it. I have taken the suggested approach.

I need one more clarification. No matter what single input I have on a port, it will display the incoming message even though it might be strangely formatted? A message will never be muted/ignored because Graylog can’t fully parse it?

Within reason, yes… I am sure one could concoct a message that would cause issues.

Hello,

Good question. Like @tmacgbay stated, “Within reason”.

We have had a couple of posts in the forum that pertain to this question/issue. I noticed a while back that Graylog has another default stream, as shown below. Not sure if it’s only used in the Enterprise version or not.

I have so many different streams in my lab that I had overlooked this one.

EDIT: Just found this.

https://docs.graylog.org/docs/indexer-and-processing-failures

But how would one go about troubleshooting something like what is in this screenshot, then? How do I trace that back to the host that sent it, given that Graylog does NOT seem to store the IP of the host that sent the message? Is this by design or some sort of limitation?

If the Syslog input is not giving you all the pieces you want, it’s sometimes because the sending device is not following standards - usually the way to solve that is to have the messages sent to a RAW input and do all the parsing yourself.

Yes. The problem here is that it’s hard to know which host the message is coming from. :grimacing: Hence the question about whether Graylog stores the IP of the sending host somewhere that is not visible in the message but reachable from a rule or something?

You could query Elasticsearch directly with something like what is shown here, but I don’t know of instances where Graylog would hide data like that.

Hello,

In the screenshot you posted, the source field looks kind of funky, unless you have a host called “Last”. Either you’re using the wrong input for that device or your extractors are incorrect.
Like @tmacgbay suggested

Correct. That’s why I wanted to find which host sent it.

You can use the gl2_remote_ip field to find out what device is sending the logs.
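For example, searching for `source:Last` and adding `gl2_remote_ip` to the displayed fields should show which address the odd messages arrive from. If you export the results and want a quick tally, a small Python sketch like the one below would do it. The message dictionaries, the addresses, and the helper name `senders_for_source` are made up for illustration; only the `source` and `gl2_remote_ip` field names come from Graylog.

```python
from collections import Counter


def senders_for_source(messages, bad_source):
    """Count sender IPs for messages whose parsed source equals bad_source."""
    return Counter(
        msg["gl2_remote_ip"]
        for msg in messages
        if msg.get("source") == bad_source and "gl2_remote_ip" in msg
    )


# Hypothetical exported messages, one dict per message.
exported = [
    {"source": "Last", "gl2_remote_ip": "192.0.2.44", "message": "Last message repeated"},
    {"source": "fw01", "gl2_remote_ip": "192.0.2.1", "message": "accept tcp"},
    {"source": "Last", "gl2_remote_ip": "192.0.2.44", "message": "Last message repeated"},
]

for ip, count in senders_for_source(exported, "Last").most_common():
    print(ip, count)
```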

Hope this helps.

You could use tcpdump to sort that out, though it might be troublesome depending on the amount of traffic.

Ah yes, I’ve done that. Not sure how I would catch a message like that though. :thinking: I’ve only dumped on IP and port, but maybe there’s an option to dump on actual message content? Will read up on tcpdump. :ok_hand:
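If matching on content with tcpdump turns out to be awkward, one alternative is a tiny UDP listener that prints the sender address whenever the payload contains a given string. The Python sketch below is only an illustration: the port 5514 and the search string are assumptions, and it has to listen on a spare port (for instance one you temporarily redirect a suspect device to) rather than a port a Graylog input is already bound to.

```python
import socket


def payload_matches(payload: bytes, needle: str) -> bool:
    """True if the raw UDP payload contains the given text."""
    return needle.encode() in payload


def watch(port: int, needle: str, host: str = "0.0.0.0") -> None:
    """Listen on a UDP port and print the sender of every matching datagram.
    Runs until interrupted (Ctrl+C)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            payload, (ip, src_port) = sock.recvfrom(65535)
            if payload_matches(payload, needle):
                print(f"{ip}:{src_port} -> {payload[:200]!r}")


# Example call (not executed here); 5514 is a hypothetical spare port:
# watch(5514, "Last")
```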