Graylog Big Problem

Hello,

Elasticsearch is still ingesting logs, so you will not see new messages until it is done. If the journal does not go down, I would look at your Elasticsearch/Graylog log files.

Ensure Elasticsearch is functioning correctly. Depending on the volume of logs being ingested, you may need to increase the buffer thread counts. If you do not have enough resources (CPU, memory), I would not increase those settings until you do.
I believe the defaults are processbuffer_processors = 5 and outputbuffer_processors = 3, so you should have at least 8 CPUs on the Graylog server.
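
To compare the buffer thread settings against the available cores, something like the sketch below may help. It assumes a package install with the config at /etc/graylog/server/server.conf and that the settings are spelled inputbuffer_processors, processbuffer_processors and outputbuffer_processors, as in the stock server.conf. The buffer_threads helper and the GRAYLOG_CONF variable are names made up for this example.

```shell
#!/usr/bin/env bash
# Sum the uncommented *_processors buffer settings in a Graylog config file.
# buffer_threads is a hypothetical helper, not part of Graylog.
buffer_threads() {
  awk -F'=' '
    /^[ \t]*(inputbuffer|processbuffer|outputbuffer)_processors[ \t]*=/ {
      gsub(/[ \t]/, "", $2)   # strip spaces around the value
      total += $2
    }
    END { print total + 0 }
  ' "$1"
}

# Assumed default path for package installs; override with GRAYLOG_CONF.
conf="${GRAYLOG_CONF:-/etc/graylog/server/server.conf}"
if [ -r "$conf" ]; then
  echo "configured buffer threads: $(buffer_threads "$conf")"
  echo "CPU cores available:       $(nproc)"
fi
```

Settings that are still commented out fall back to their defaults and are not counted, so a result of 0 only means nothing has been overridden.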

Dear @gsmith,
Thank you for the additional information.
I recently stopped sending logs to the Graylog server, but the process buffer is still at 100%, as shown in the screenshot below. Could you please advise further? Thanks.

Hello,

Looks like you have three alerts at the top of the picture. What do they show?
First I would take a look at the Elasticsearch service status, or use curl commands to check the health of ES, and post the results. If you do, don’t forget to remove personal information.

Example:

systemctl status elasticsearch

curl -XGET http://es_node:9200/_cluster/health?pretty

Post an update with your Elasticsearch/Graylog configuration files. By chance, did you tail -f the Graylog log file when this was happening? If so, what did you see? That picture could have multiple causes. Normally when I see buffers fill up, it is an Elasticsearch, resource, configuration and/or connection issue.
I need to see if your ES status is good: not only running and green, but also not stuck in read-only mode. That could cause the journal and buffers to fill up quickly.
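
A rough way to check both things, cluster health and any read-only block, is sketched below. ES_URL and the health_status helper are names made up for this example; point ES_URL at your own node. The second curl asks for the index.blocks.* settings, where a read_only_allow_delete value of true would mean the index is blocked for writes.

```shell
#!/usr/bin/env bash
# Assumed address; replace with your Elasticsearch node.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Pull the "status" value (green/yellow/red) out of _cluster/health JSON on stdin.
# health_status is a hypothetical helper for this example.
health_status() {
  sed -n 's/.*"status"[ ]*:[ ]*"\([a-z]*\)".*/\1/p'
}

check_es() {
  # Overall cluster state.
  curl -sS "$ES_URL/_cluster/health?pretty" | health_status
  # Any index.blocks.* settings currently applied to the indices.
  curl -sS "$ES_URL/_all/_settings/index.blocks.*?flat_settings=true&pretty"
}

# Run manually once the node address is right:
# check_es
```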

Dear @gsmith,
Thank you for the additional information. Please find the details below:
-The three alerts here:

  • $ systemctl status elasticsearch
    ● elasticsearch.service - Elasticsearch
    Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
    Active: failed (Result: exit-code) since Wed 2022-11-16 18:46:20 +07; 1 weeks 0 days ago
    Docs: https://www.elastic.co
    Process: 31885 ExecStart=/usr/share/elasticsearch/bin/systemd-entrypoint -p ${PID_DIR}/elasticsearch.pid --quiet (code=exited, status=128)
    Main PID: 31885 (code=exited, status=128)

  • $ curl -XGET http://es_node:9200/_cluster/health?pretty
    curl: (6) Could not resolve host: es_node; Unknown error

  • $ tail -f /var/log/graylog-server/server.log
    … 11 more
    Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777) ~[?:?]
    at org.graylog.shaded.elasticsearch7.org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:174) ~[?:?]
    at org.graylog.shaded.elasticsearch7.org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:148) ~[?:?]
    at org.graylog.shaded.elasticsearch7.org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351) ~[?:?]
    at org.graylog.shaded.elasticsearch7.org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221) ~[?:?]
    at org.graylog.shaded.elasticsearch7.org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) ~[?:?]
    … 1 more
    2022-11-24T09:17:00.848+07:00 ERROR [VersionProbe] Unable to retrieve version from Elasticsearch node: Failed to connect to /127.0.0.1:9200. - Connection refused (Connection refused).
    2022-11-24T09:17:00.920+07:00 ERROR [IndexFieldTypePollerPeriodical] Couldn’t update field types for index set <Default index set/62f4724c6e7df24565785364>
    org.graylog.shaded.elasticsearch7.org.elasticsearch.ElasticsearchException: An error occurred:
    at org.graylog.storage.elasticsearch7.ElasticsearchClient.exceptionFrom(ElasticsearchClient.java:140) ~[?:?]
    at org.graylog.storage.elasticsearch7.ElasticsearchClient.execute(ElasticsearchClient.java:100) ~[?:?]
    at org.graylog.storage.elasticsearch7.ElasticsearchClient.execute(ElasticsearchClient.java:93) ~[?:?]
    at org.graylog.storage.elasticsearch7.IndicesAdapterES7.resolveAlias(IndicesAdapterES7.java:139) ~[?:?]
    at org.graylog2.indexer.indices.Indices.aliasTarget(Indices.java:145) ~[graylog.jar:?]
    at org.graylog2.indexer.MongoIndexSet.getActiveWriteIndex(MongoIndexSet.java:202) ~[graylog.jar:?]
    at org.graylog2.indexer.fieldtypes.IndexFieldTypePollerPeriodical.lambda$schedule$4(IndexFieldTypePollerPeriodical.java:249) ~[graylog.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
    Caused by: java.net.ConnectException: Connection refused
    at org.graylog.shaded.elasticsearch7.org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:849) ~[?:?]
    at org.graylog.shaded.elasticsearch7.org.elasticsearch.client.RestClient.performRequest(RestClient.java:259) ~[?:?]
    at org.graylog.shaded.elasticsearch7.org.elasticsearch.client.RestClient.performRequest(RestClient.java:246) ~[?:?]
    at org.graylog.shaded.elasticsearch7.org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613) ~[?:?]
    at org.graylog.shaded.elasticsearch7.org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583) ~[?:?]
    at org.graylog.shaded.elasticsearch7.org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1553) ~[?:?]
    at org.graylog.shaded.elasticsearch7.org.elasticsearch.client.IndicesClient.getAlias(IndicesClient.java:1315) ~[?:?]
    at org.graylog.storage.elasticsearch7.IndicesAdapterES7.lambda$resolveAlias$2(IndicesAdapterES7.java:139) ~[?:?]
    at org.graylog.storage.elasticsearch7.ElasticsearchClient.execute(ElasticsearchClient.java:98) ~[?:?]
    … 11 more


Please check and advise further.
Thanks,
Best Regards

Hey,

Man, you really need to check your alerts; those were from two months ago. This tells me you have a bigger problem than expected.

From the logs it looks like Elasticsearch crashed because of the warning from two months ago. Basically, when the disk gets more than 95% full, I believe Elasticsearch puts the indices into read-only mode as a safety precaution.
I’ll be honest, you have a lot of work ahead to bring this back to life.

If this happened on my server, I would shut the Graylog service down.
Then take ES out of read-only mode. This link should give you a better idea of what is needed.
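
For reference, the usual way to clear the block Elasticsearch sets when a disk fills up is to reset index.blocks.read_only_allow_delete on the indices. A minimal sketch, assuming Elasticsearch is running again, that free space has been recovered first, and that the node answers on 127.0.0.1:9200 (ES_URL, CLEAR_BLOCK_BODY and clear_read_only_block are names chosen for this example):

```shell
#!/usr/bin/env bash
# Assumed address; replace with your Elasticsearch node.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Setting the value to null removes the block from the index settings.
CLEAR_BLOCK_BODY='{"index.blocks.read_only_allow_delete": null}'

# Hypothetical helper: remove the read-only block from all indices.
clear_read_only_block() {
  curl -sS -X PUT "$ES_URL/_all/_settings" \
    -H 'Content-Type: application/json' \
    -d "$CLEAR_BLOCK_BODY"
}

# Run manually after freeing disk space:
# clear_read_only_block
```

If the disk is still above the flood-stage watermark, Elasticsearch will likely put the block right back, so free the space before running it.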

Then increase the volume to something larger if possible, or this may happen again.

Once that is completed, calculate the total size of the indices and make sure it does not exceed the volume they reside on.
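
One way to do that comparison is sketched below. It assumes the data lives under /var/lib/elasticsearch (the package default; adjust ES_DATA if yours differs) and that the node answers at ES_URL. sum_store_bytes is a made-up helper that adds up the second column of the _cat/indices output.

```shell
#!/usr/bin/env bash
ES_URL="${ES_URL:-http://127.0.0.1:9200}"      # assumed node address
ES_DATA="${ES_DATA:-/var/lib/elasticsearch}"   # assumed data path

# Add up the size column (bytes) from "index size" lines on stdin.
sum_store_bytes() {
  awk '{ total += $2 } END { printf "%.0f\n", total }'
}

report_sizes() {
  # bytes=b makes store.size a plain byte count instead of a value like "1.2gb".
  used=$(curl -sS "$ES_URL/_cat/indices?h=index,store.size&bytes=b" | sum_store_bytes)
  # df -Pk prints 1K blocks; column 2 is the total size of the volume.
  volume_kb=$(df -Pk "$ES_DATA" | awk 'NR == 2 { print $2 }')
  echo "indices: $used bytes, volume: $((volume_kb * 1024)) bytes"
}

# Run manually: report_sizes
```

Leave some headroom when comparing the two numbers, since the 95% limit mentioned earlier is reached before the volume is completely full.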

I would not start the Graylog service until the Elasticsearch status is good. Use cURL commands to make sure your indices are good, no errors are found, the system status is good, etc.
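
The start order and a quick index check could look roughly like this. It assumes systemd units named elasticsearch and graylog-server (the names the packages use) and a node at ES_URL; non_green is a hypothetical filter that prints any index whose health is not green.

```shell
#!/usr/bin/env bash
ES_URL="${ES_URL:-http://127.0.0.1:9200}"   # assumed node address

# Print "health status index" lines whose health is not green.
non_green() {
  awk '$1 != "green"'
}

bring_up() {
  sudo systemctl start elasticsearch
  # Retry while the port is still closed, then wait up to 60s for at least yellow.
  curl -sS --retry 10 --retry-delay 5 --retry-connrefused \
    "$ES_URL/_cluster/health?wait_for_status=yellow&timeout=60s&pretty"
  # List indices that still need attention; no output means all are green.
  curl -sS "$ES_URL/_cat/indices?h=health,status,index" | non_green
  # Only start Graylog once the output above looks clean:
  # sudo systemctl start graylog-server
}

# Run manually: bring_up
```

On a single node, yellow indices usually just mean replicas cannot be assigned; red ones are what need fixing before Graylog comes back up.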

And if you do get this going, please pay attention to the errors shown; you could have avoided this two months ago.

Notice that it states “Could not resolve host: es_node”. Unless you named your Elasticsearch node “es_node”, this will not work. It should be an IP address, FQDN, or localhost.
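
For example, since the Graylog log above shows it trying 127.0.0.1:9200, the health check from the same machine would be built like this (es_health_url is just an illustrative helper, and the default HTTP port 9200 is assumed):

```shell
#!/usr/bin/env bash
# Build the cluster health URL for a given host (IP address, FQDN, or localhost).
es_health_url() {
  printf 'http://%s:9200/_cluster/health?pretty\n' "$1"
}

# Quote the URL so the shell does not treat "?" as a glob character.
# curl -XGET "$(es_health_url 127.0.0.1)"
```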

I think you need to read this documentation.
