Input not receiving any new messages

Hi. I’m new to Graylog and trying to work my way through this.

Last week, I was having issues with Elasticsearch filling up, so I deleted the home volume and expanded the root volume. Since then, I have not been receiving any new messages. Can anyone help me through this issue? I’m using 3.8. I’m not sure which logs you would need to see, but I can provide them.

Hello @zrevans826, welcome!

Are the core services (elasticsearch, mongod, graylog-server) running?
Are the inputs you’re expecting to receive messages through running?
Are your streams started?
Do you see queueing in the in/out buffers or the disk journal?
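If it helps, on a systemd-based install (service names assumed to be the defaults) you can check all three core services from the shell with something like:

sudo systemctl status elasticsearch mongod graylog-server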

Hi. Yes, those are running, except that when I just checked the status of graylog-server, I am getting this error: “Failed to load class ‘org.slf4j.impl.StaticLoggerBinder’”.

My inputs show as running.

Streams look like so.

In/Out Buffer looks like so…

I see 352 in / 0 out in the second image. Is your disk journal filling up?

It’s not giving me any errors indicating that it is. The only notification I have is about an outdated version of Graylog.

Right below the buffers in the second image there’s “Disk journal”. You would see it there.

Basically I’m trying to determine if anything is being indexed to help isolate this to a problem with Graylog or Elasticsearch.

Also, on the inputs page, do you see traffic being received by the pertinent input?

I see. No, it does not look like the journal is filling, though that was the issue I was having before I deleted the home partition.

I only have 1 input so far (I just started about 3 weeks ago).

The number of unprocessed messages in the journal is concerning. I honestly don’t know what to make of such a large negative number of messages in the journal; it suggests that something is confused. @jan @aaronsachs @tmacgbay @shoothub, do you have any input on that?

@zrevans826, is there anything new showing up in your all messages stream? On the indices page, for the index to which these messages should be routed, what does it show for most recent message? How old is it?

Are there any errors in your Graylog log file?

Follow-up: what do the contents of /var/lib/graylog-server/journal look like? We may try flushing all of the messages (clearing/resetting the disk journal) to see if messages start processing again. This means you will lose everything that has not yet been indexed. You will have to stop the graylog-server service to perform that task.

Here it is.

If you don’t have an issue with resetting the journal, stop the graylog-server service and delete the contents of /var/lib/graylog-server/journal.

Then, start graylog-server and evaluate.
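As a rough sketch, assuming a systemd-based install and the default journal path mentioned above, that would look something like:

sudo systemctl stop graylog-server
sudo rm -rf /var/lib/graylog-server/journal/*
sudo systemctl start graylog-server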

I completed that, but it’s still not receiving any messages.

The disk journal is still showing a large number of unprocessed messages.

Here is what my graylog-server.log says for today.

2020-11-30T13:29:56.164-05:00 WARN [UdpTransport] receiveBufferSize (SO_RCVBUF) for input SyslogUDPInput{title=syslog_udp, type=org.graylog2.inputs.syslog.udp.SyslogUDPInput, nodeId=79bd119d-aca8-483a-8988-fcb31b02867e} (channel [id: 0x0d5dc0a6, L:/0:0:0:0:0:0:0:0%0:1514]) should be 262144 but is 425984.
2020-11-30T13:29:56.164-05:00 WARN [UdpTransport] receiveBufferSize (SO_RCVBUF) for input GELFUDPInput{title=nxlog_udp, type=org.graylog2.inputs.gelf.udp.GELFUDPInput, nodeId=79bd119d-aca8-483a-8988-fcb31b02867e} (channel [id: 0x52015045, L:/0:0:0:0:0:0:0:0%0:3514]) should be 262144 but is 425984.
2020-11-30T13:29:56.165-05:00 WARN [UdpTransport] receiveBufferSize (SO_RCVBUF) for input SyslogUDPInput{title=syslog_udp, type=org.graylog2.inputs.syslog.udp.SyslogUDPInput, nodeId=79bd119d-aca8-483a-8988-fcb31b02867e} (channel [id: 0x72862052, L:/0:0:0:0:0:0:0:0%0:1514]) should be 262144 but is 425984.
2020-11-30T13:29:56.164-05:00 WARN [UdpTransport] receiveBufferSize (SO_RCVBUF) for input SyslogUDPInput{title=Cisco ASA, type=org.graylog2.inputs.syslog.udp.SyslogUDPInput, nodeId=null} (channel [id: 0x2a4264e8, L:/0:0:0:0:0:0:0:0%0:5341]) should be 262144 but is 425984.
2020-11-30T13:29:56.165-05:00 WARN [UdpTransport] receiveBufferSize (SO_RCVBUF) for input GELFUDPInput{title=nxlog_udp, type=org.graylog2.inputs.gelf.udp.GELFUDPInput, nodeId=79bd119d-aca8-483a-8988-fcb31b02867e} (channel [id: 0x5c1aa3e6, L:/0:0:0:0:0:0:0:0%0:3514]) should be 262144 but is 425984.
2020-11-30T13:29:56.165-05:00 WARN [UdpTransport] receiveBufferSize (SO_RCVBUF) for input SyslogUDPInput{title=Cisco ASA, type=org.graylog2.inputs.syslog.udp.SyslogUDPInput, nodeId=null} (channel [id: 0xa34a3a66, L:/0:0:0:0:0:0:0:0%0:5341]) should be 262144 but is 425984.
2020-11-30T13:29:56.165-05:00 WARN [UdpTransport] receiveBufferSize (SO_RCVBUF) for input SyslogUDPInput{title=syslog_udp, type=org.graylog2.inputs.syslog.udp.SyslogUDPInput, nodeId=79bd119d-aca8-483a-8988-fcb31b02867e} (channel [id: 0x75959ce6, L:/0:0:0:0:0:0:0:0%0:1514]) should be 262144 but is 425984.
2020-11-30T13:29:56.165-05:00 WARN [UdpTransport] receiveBufferSize (SO_RCVBUF) for input GELFUDPInput{title=nxlog_udp, type=org.graylog2.inputs.gelf.udp.GELFUDPInput, nodeId=79bd119d-aca8-483a-8988-fcb31b02867e} (channel [id: 0x34f588d5, L:/0:0:0:0:0:0:0:0%0:3514]) should be 262144 but is 425984.
2020-11-30T13:29:56.165-05:00 WARN [AbstractTcpTransport] receiveBufferSize (SO_RCVBUF) for input Beats2Input{title=Beats, type=org.graylog.plugins.beats.Beats2Input, nodeId=null} (channel [id: 0x1bb8b779, L:/0:0:0:0:0:0:0:0%0:5044]) should be 1048576 but is 425984.
2020-11-30T13:29:56.166-05:00 WARN [UdpTransport] receiveBufferSize (SO_RCVBUF) for input GELFUDPInput{title=nxlog_udp, type=org.graylog2.inputs.gelf.udp.GELFUDPInput, nodeId=79bd119d-aca8-483a-8988-fcb31b02867e} (channel [id: 0x29c75722, L:/0:0:0:0:0:0:0:0%0:3514]) should be 262144 but is 425984.
2020-11-30T13:29:56.166-05:00 WARN [UdpTransport] receiveBufferSize (SO_RCVBUF) for input SyslogUDPInput{title=Cisco ASA, type=org.graylog2.inputs.syslog.udp.SyslogUDPInput, nodeId=null} (channel [id: 0xa321863a, L:/0:0:0:0:0:0:0:0%0:5341]) should be 262144 but is 425984.
2020-11-30T13:29:56.165-05:00 WARN [UdpTransport] receiveBufferSize (SO_RCVBUF) for input SyslogUDPInput{title=Cisco ASA, type=org.graylog2.inputs.syslog.udp.SyslogUDPInput, nodeId=null} (channel [id: 0x282c3459, L:/0:0:0:0:0:0:0:0%0:5341]) should be 262144 but is 425984.
2020-11-30T13:29:56.165-05:00 WARN [UdpTransport] receiveBufferSize (SO_RCVBUF) for input SyslogUDPInput{title=syslog_udp, type=org.graylog2.inputs.syslog.udp.SyslogUDPInput, nodeId=79bd119d-aca8-483a-8988-fcb31b02867e} (channel [id: 0xf4018082, L:/0:0:0:0:0:0:0:0%0:1514]) should be 262144 but is 425984.
2020-11-30T13:29:56.167-05:00 INFO [InputStateListener] Input [Syslog UDP/5fa0329a6604fc1a29e78847] is now RUNNING
2020-11-30T13:29:58.322-05:00 WARN [Messages] Retrying 51 messages, because their indices are blocked with status [read-only / allow delete]
2020-11-30T13:30:00.292-05:00 WARN [Messages] Retrying 500 messages, because their indices are blocked with status [read-only / allow delete]
2020-11-30T13:30:01.119-05:00 WARN [Messages] Retrying 500 messages, because their indices are blocked with status [read-only / allow delete]
2020-11-30T13:30:02.344-05:00 WARN [Messages] Retrying 500 messages, because their indices are blocked with status [read-only / allow delete]
2020-11-30T13:34:23.330-05:00 WARN [LicenseChecker] License violation - Detected irregular traffic records

Your Elasticsearch indices are read-only. Are there any errors in the Elasticsearch logs?

If not, we can try setting them back to allow writes.
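To confirm the block first (assuming Elasticsearch is listening on localhost:9200), you can check the block settings and cluster health with something like:

curl -s "localhost:9200/_all/_settings/index.blocks.*?pretty"
curl -s "localhost:9200/_cluster/health?pretty"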

This is what I see in .current:

Desired survivor size 17891328 bytes, new threshold 6 (max 6)

  - age 1: 309752 bytes, 309752 total
    : 314513K->479K(314560K), 0.0118425 secs] 726308K->429759K(1013632K), 0.0119893 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]
    2020-11-30T13:37:58.063-0500: 597857.421: Total time for which application threads were stopped: 0.0124783 seconds, Stopping threads took: 0.0000787 seconds
    2020-11-30T13:38:14.070-0500: 597873.428: Total time for which application threads were stopped: 0.0003141 seconds, Stopping threads took: 0.0000742 seconds
    2020-11-30T13:38:33.077-0500: 597892.435: Total time for which application threads were stopped: 0.0003228 seconds, Stopping threads took: 0.0000779 seconds
    2020-11-30T13:38:34.078-0500: 597893.436: Total time for which application threads were stopped: 0.0006645 seconds, Stopping threads took: 0.0000526 seconds
    2020-11-30T13:38:35.081-0500: 597894.439: Total time for which application threads were stopped: 0.0003457 seconds, Stopping threads took: 0.0000529 seconds
    2020-11-30T13:38:54.088-0500: 597913.446: Total time for which application threads were stopped: 0.0003934 seconds, Stopping threads took: 0.0000793 seconds
    2020-11-30T13:38:56.092-0500: 597915.450: Total time for which application threads were stopped: 0.0003115 seconds, Stopping threads took: 0.0000740 seconds
    2020-11-30T13:39:00.096-0500: 597919.454: Total time for which application threads were stopped: 0.0003017 seconds, Stopping threads took: 0.0000551 seconds
    2020-11-30T13:39:14.099-0500: 597933.457: Total time for which application threads were stopped: 0.0003184 seconds, Stopping threads took: 0.0000771 seconds
    2020-11-30T13:39:30.108-0500: 597949.466: Total time for which application threads were stopped: 0.0003428 seconds, Stopping threads took: 0.0000731 seconds
    2020-11-30T13:39:49.122-0500: 597968.479: Total time for which application threads were stopped: 0.0003137 seconds, Stopping threads took: 0.0000741 seconds
    2020-11-30T13:40:03.130-0500: 597982.488: Total time for which application threads were stopped: 0.0002666 seconds, Stopping threads took: 0.0000587 seconds
    2020-11-30T13:40:33.146-0500: 598012.504: Total time for which application threads were stopped: 0.0002235 seconds, Stopping threads took: 0.0000562 seconds
    2020-11-30T13:40:34.147-0500: 598013.505: Total time for which application threads were stopped: 0.0002290 seconds, Stopping threads took: 0.0000520 seconds
    2020-11-30T13:41:00.153-0500: 598039.511: Total time for which application threads were stopped: 0.0002542 seconds, Stopping threads took: 0.0000474 seconds
    2020-11-30T13:41:05.155-0500: 598044.513: Total time for which application threads were stopped: 0.0003047 seconds, Stopping threads took: 0.0000755 seconds
    2020-11-30T13:41:30.160-0500: 598069.518: Total time for which application threads were stopped: 0.0002678 seconds, Stopping threads took: 0.0000626 seconds
    2020-11-30T13:44:24.304-0500: 598243.662: Total time for which application threads were stopped: 0.0003928 seconds, Stopping threads took: 0.0001333 seconds
    2020-11-30T13:44:33.311-0500: 598252.668: Total time for which application threads were stopped: 0.0002901 seconds, Stopping threads took: 0.0000646 seconds
    2020-11-30T13:46:33.368-0500: 598372.726: Total time for which application threads were stopped: 0.0004696 seconds, Stopping threads took: 0.0001045 seconds
    2020-11-30T13:46:53.270-0500: 598392.627: [GC (Allocation Failure) 2020-11-30T13:46:53.270-0500: 598392.627: [ParNew
    Desired survivor size 17891328 bytes, new threshold 6 (max 6)
  - age 1: 277248 bytes, 358728 total
  - age 2: 81480 bytes, 358728 total
    : 280095K->436K(314560K), 0.0114550 secs] 709375K->429715K(1013632K), 0.0116189 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]

That just looks like GC activity; if the JVM isn’t stopping/restarting, then I don’t think it’s related.

Have you tried cycling the active write index? If not, give that a try. If that doesn’t work, try clearing the global read-only block.

Assuming ES is running on the Graylog node and only serves Graylog:
curl -X PUT "localhost:9200/_all/_settings" -H 'Content-Type: application/json' -d '{ "index.blocks.read_only_allow_delete": null }'
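Elasticsearch usually applies that read_only_allow_delete block when the data disk crosses the flood-stage watermark, so it’s also worth confirming there is free space again (path assumed to be the default data directory):

df -h /var/lib/elasticsearch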

I’m finally starting to see both incoming and outgoing data. Right now it is showing 154 in and 8,500 out, but it’s still showing unprocessed messages.

Utilization: 4.83%

158,362 unprocessed messages are currently in the journal, in 3 segments.
277 messages have been appended in the last second, 12,910 messages have been read in the last second.


What change did you make?

That’s going to be the backlog processing through if you just made a change to allow writes.

I cycled the active write index and then ran the “index.blocks.read_only_allow_delete” command you mentioned above. I then restarted graylog-server.

OK. Are you still seeing that negative number for the journal size? It sounds like the backlogged messages are clearing, but I’m still a bit confused about that. Did you happen to resize the filesystem while the Graylog and Elasticsearch services were running?

Here’s my journal

Elasticsearch was running (oops), but I stopped graylog-server.