No logs after update to Graylog 4.2.8

I updated to 4.2.8 today and lost all “visible” messages. Both tcpdump and the msg/s counters on the inputs show that messages are reaching the server, but since the update nothing is displayed in the GUI, neither in the streams nor in ‘All messages’.

Not sure if it’s related, but I have also been seeing very high CPU usage on this VM for a while.

PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND     
graylog   20   0 9322296   1.3g  40688 S 604.7  11.4 149:56.66 java   

I cannot see anything obvious in /var/log/graylog-server/server.log, and the system time is set correctly.
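For reference, this is roughly how I checked the clock (worth ruling out, since messages stamped in the future or past won’t show up in relative search ranges):

timedatectl    # clock, timezone and NTP sync status
date           # compare against the newest timestamp Graylog actually shows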

Not sure where to start troubleshooting here, other than restoring the snapshot taken before the update. :roll_eyes:

Going into System / Nodes gives me:

There is 1 active node

d4137266 / m5-logger01.localdomain
System information is currently unavailable.

Clicking that node gives me the banana monkey and this in the log file:

2022-04-28T21:40:51.982+02:00 WARN  [ProxiedResource] Unable to call http://192.168.44.92:9000/api/system on node <d4137266-8358-41f5-9f7e-901f208850a1>: timeout
2022-04-28T21:40:56.084+02:00 WARN  [ProxiedResource] Unable to call http://192.168.44.92:9000/api/system on node <d4137266-8358-41f5-9f7e-901f208850a1>: timeout
2022-04-28T21:41:01.081+02:00 WARN  [ProxiedResource] Unable to call http://192.168.44.92:9000/api/system on node <d4137266-8358-41f5-9f7e-901f208850a1>: timeout
2022-04-28T21:41:06.081+02:00 WARN  [ProxiedResource] Unable to call http://192.168.44.92:9000/api/system on node <d4137266-8358-41f5-9f7e-901f208850a1>: timeout
2022-04-28T21:41:06.388+02:00 ERROR [AnyExceptionClassMapper] Unhandled exception in REST resource
java.net.SocketTimeoutException: timeout
	at okio.Okio$4.newTimeoutException(Okio.java:232) ~[graylog.jar:?]
	at okio.AsyncTimeout.exit(AsyncTimeout.java:286) ~[graylog.jar:?]
	at okio.AsyncTimeout$2.read(AsyncTimeout.java:241) ~[graylog.jar:?]
	at okio.RealBufferedSource.indexOf(RealBufferedSource.java:358) ~[graylog.jar:?]
	at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:230) ~[graylog.jar:?]
	at okhttp3.internal.http1.Http1ExchangeCodec.readHeaderLine(Http1ExchangeCodec.java:242) ~[graylog.jar:?]
	at okhttp3.internal.http1.Http1ExchangeCodec.readResponseHeaders(Http1ExchangeCodec.java:213) ~[graylog.jar:?]
	at okhttp3.internal.connection.Exchange.readResponseHeaders(Exchange.java:115) ~[graylog.jar:?]
	at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:94) ~[graylog.jar:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[graylog.jar:?]
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:43) ~[graylog.jar:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[graylog.jar:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[graylog.jar:?]
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94) ~[graylog.jar:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[graylog.jar:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[graylog.jar:?]
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[graylog.jar:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[graylog.jar:?]
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88) ~[graylog.jar:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[graylog.jar:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[graylog.jar:?]
	at org.graylog2.rest.RemoteInterfaceProvider.lambda$get$0(RemoteInterfaceProvider.java:61) ~[graylog.jar:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[graylog.jar:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[graylog.jar:?]
	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:229) ~[graylog.jar:?]
	at okhttp3.RealCall.execute(RealCall.java:81) ~[graylog.jar:?]
	at retrofit2.OkHttpCall.execute(OkHttpCall.java:204) ~[graylog.jar:?]
	at org.graylog2.rest.resources.cluster.ClusterSystemResource.jvm(ClusterSystemResource.java:90) ~[graylog.jar:?]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_312]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_312]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_312]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_312]
	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52) ~[graylog.jar:?]
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124) ~[graylog.jar:?]
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167) ~[graylog.jar:?]
	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:219) ~[graylog.jar:?]
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79) ~[graylog.jar:?]
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:469) ~[graylog.jar:?]
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:391) ~[graylog.jar:?]
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:80) ~[graylog.jar:?]
	at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:255) [graylog.jar:?]
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248) [graylog.jar:?]
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244) [graylog.jar:?]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:292) [graylog.jar:?]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:274) [graylog.jar:?]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:244) [graylog.jar:?]
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265) [graylog.jar:?]
	at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:234) [graylog.jar:?]
	at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:680) [graylog.jar:?]
	at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.service(GrizzlyHttpContainer.java:356) [graylog.jar:?]
	at org.glassfish.grizzly.http.server.HttpHandler$1.run(HttpHandler.java:200) [graylog.jar:?]
	at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180) [graylog.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_312]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_312]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_312]
Caused by: java.net.SocketException: Socket closed
	at java.net.SocketInputStream.read(SocketInputStream.java:204) ~[?:1.8.0_312]
	at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_312]
	at okio.Okio$2.read(Okio.java:140) ~[graylog.jar:?]
	at okio.AsyncTimeout$2.read(AsyncTimeout.java:237) ~[graylog.jar:?]
	... 52 more
2022-04-28T21:41:11.834+02:00 WARN  [ProxiedResource] Unable to call http://192.168.44.92:9000/api/system on node <d4137266-8358-41f5-9f7e-901f208850a1>: timeout
2022-04-28T21:41:16.941+02:00 WARN  [ProxiedResource] Unable to call http://192.168.44.92:9000/api/system on node <d4137266-8358-41f5-9f7e-901f208850a1>: timeout
2022-04-28T21:41:21.914+02:00 WARN  [ProxiedResource] Unable to call http://192.168.44.92:9000/api/system on node <d4137266-8358-41f5-9f7e-901f208850a1>: timeout
2022-04-28T21:41:26.938+02:00 WARN  [ProxiedResource] Unable to call http://192.168.44.92:9000/api/system on node <d4137266-8358-41f5-9f7e-901f208850a1>: timeout

Hello,

I need to ask a couple of questions. I’m not sure about your configuration, but this could be a connection issue or even a configuration issue.

Basic troubleshooting:
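Something along these lines; a rough checklist, adjust names and ports to your environment:

getenforce                                                    # is SELinux enforcing?
systemctl status graylog-server elasticsearch mongod          # are all three services running?
curl -XGET 'http://localhost:9200/_cluster/health?pretty'     # is Elasticsearch reachable and green?
tail -f /var/log/graylog-server/server.log                    # anything unusual in the Graylog log?
journalctl -xe                                                # anything unusual at the OS level?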

I found this, it may help.

And this.

Hello

Thanks! I will try with the firewall disabled and will look at the provided links. Here’s the status of the other items you asked about.

getenforce
Disabled
systemctl status graylog-server
● graylog-server.service - Graylog server
   Loaded: loaded (/usr/lib/systemd/system/graylog-server.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-04-28 20:40:19 CEST; 12h ago
     Docs: http://docs.graylog.org/
 Main PID: 43095 (graylog-server)
    Tasks: 270 (limit: 75148)
   Memory: 1.5G
   CGroup: /system.slice/graylog-server.service
           ├─43095 /bin/sh /usr/share/graylog-server/bin/graylog-server
           └─43143 /usr/bin/java -Xms1g -Xmx1g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -XX:>

Apr 28 20:40:19 m5-logger01.localdomain systemd[1]: Started Graylog server.
Apr 28 20:41:04 m5-logger01.localdomain graylog-server[43095]: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
Apr 28 20:41:04 m5-logger01.localdomain graylog-server[43095]: SLF4J: Defaulting to no-operation (NOP) logger implementation
Apr 28 20:41:04 m5-logger01.localdomain graylog-server[43095]: SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
systemctl status elasticsearch
● elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-04-28 08:34:21 CEST; 24h ago
     Docs: https://www.elastic.co
 Main PID: 1856 (java)
    Tasks: 189 (limit: 75148)
   Memory: 4.9G
   CGroup: /system.slice/elasticsearch.service
           └─1856 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=t>

Apr 28 08:33:49 m5-logger01.localdomain systemd[1]: Starting Elasticsearch...
Apr 28 08:34:21 m5-logger01.localdomain systemd[1]: Started Elasticsearch.
systemctl status mongod
● mongod.service - MongoDB Database Server
   Loaded: loaded (/usr/lib/systemd/system/mongod.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-04-28 08:33:49 CEST; 24h ago
     Docs: https://docs.mongodb.org/manual
 Main PID: 1582 (mongod)
   Memory: 186.1M
   CGroup: /system.slice/mongod.service
           └─1582 /usr/bin/mongod -f /etc/mongod.conf

Apr 28 08:33:48 m5-logger01.localdomain systemd[1]: Starting MongoDB Database Server...
Apr 28 08:33:48 m5-logger01.localdomain mongod[1562]: about to fork child process, waiting until server is ready for connections.
Apr 28 08:33:48 m5-logger01.localdomain mongod[1562]: forked process: 1582
Apr 28 08:33:49 m5-logger01.localdomain mongod[1562]: child process started successfully, parent exiting
Apr 28 08:33:49 m5-logger01.localdomain systemd[1]: Started MongoDB Database Server.

curl -X GET http://localahost:9200/_cluster/health?pretty 
curl: (6) Could not resolve host: localahost
journalctl -xe
Apr 29 08:38:19 m5-logger01.localdomain systemd[83520]: Started Virtual filesystem service.
-- Subject: Unit UNIT has finished start-up
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- Unit UNIT has finished starting up.
-- 
-- The start-up result is done.
Apr 29 08:38:48 m5-logger01.localdomain su[83652]: (to root) admin on pts/0
Apr 29 08:38:48 m5-logger01.localdomain su[83652]: pam_systemd(su-l:session): Cannot create session: Already running in a session or user slice
Apr 29 08:38:48 m5-logger01.localdomain su[83652]: pam_unix(su-l:session): session opened for user root by admin(uid=1000)
Apr 29 08:40:14 m5-logger01.localdomain systemd[1]: Starting system activity accounting tool...
-- Subject: Unit sysstat-collect.service has begun start-up
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- Unit sysstat-collect.service has begun starting up.
Apr 29 08:40:14 m5-logger01.localdomain systemd[1]: sysstat-collect.service: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- The unit sysstat-collect.service has successfully entered the 'dead' state.
Apr 29 08:40:14 m5-logger01.localdomain systemd[1]: Started system activity accounting tool.
-- Subject: Unit sysstat-collect.service has finished start-up
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- Unit sysstat-collect.service has finished starting up.
-- 
-- The start-up result is done.
Apr 29 08:40:29 m5-logger01.localdomain PackageKit[2938]: search-file transaction /217_adeaaaea from uid 0 finished with success after 105ms
Apr 29 08:41:01 m5-logger01.localdomain pcp-pmie[4092]: Severe demand for real memory 13.1pgsout/s@m5-logger01.localdomain
Apr 29 08:41:14 m5-logger01.localdomain systemd[83520]: Starting Mark boot as successful...
-- Subject: Unit UNIT has begun start-up
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- Unit UNIT has begun starting up.
Apr 29 08:41:14 m5-logger01.localdomain systemd[83520]: Started Mark boot as successful.
-- Subject: Unit UNIT has finished start-up
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- Unit UNIT has finished starting up.
-- 
-- The start-up result is done.
lines 5490-5536/5536 (END)

There was a typo in localhost above… :astonished:

curl -X GET http://localhost:9200/_cluster/health?pretty 
{
  "cluster_name" : "graylog",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 332,
  "active_shards" : 332,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

lol. And when checking Graylog it seems that messages are once again visible. I assume it fixed itself overnight? :roll_eyes: :rofl: So strange.

System / Nodes still doesn’t work though. But I’m not sure it ever did, to be honest. Guess I’ll have to investigate that one further.

Thanks for the help

And continuing my monologue. :sweat_smile: It seems more complex than that. Since my last post it has stopped receiving messages. I’m fairly certain that the situation is something like this:

When I am logged into the Graylog web UI, the CPU of the system goes through the roof. There’s no CPU power left to display the messages.
When I am logged out, the CPU goes down to normal levels and there’s power left to process the messages.

I base this theory on the CPU graph from vCenter.

Not sure how to troubleshoot this though, or what could cause it. :roll_eyes:
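One way I could test this theory (a sketch, not something I’ve run yet; grabbing the PID this way is an assumption about the process name):

GL_PID=$(pgrep -f graylog.jar | head -1)   # assumed way to find the server JVM
pidstat -t -p "$GL_PID" 5                  # per-thread CPU every 5 s (sysstat package)
top -H -p "$GL_PID"                        # or interactively, one line per JVM thread

Then log in and out of the web UI and watch whether particular threads spike.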

Hello,

What are the resources on this node (CPU, memory)?

The reason I ask is that if there aren’t enough resources for MongoDB, Elasticsearch, Graylog, and the operating system itself, that alone could be a problem. Also, what is your ingest rate per day and per second?

What I have found on some issues is that if the journal has a lot of messages, you will not see messages until the journal is back under control, meaning there are only a couple of messages queued rather than 1,000+.
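A quick way to check the journal (a sketch; the directory comes from message_journal_dir in server.conf, and I’m going from memory on the API field name, so treat it as an assumption):

du -sh /var/lib/graylog-server/journal      # on-disk size of the journal
curl -s -u admin:PASSWORD 'http://192.168.44.92:9000/api/system/journal?pretty=true'
# watch "uncommitted_journal_entries" - if it is large and growing, messages
# are arriving but not being flushed to Elasticsearch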

When you say “receive”, do you mean you just can’t see them in the web UI, or that messages are not arriving at the Graylog server at all? Have you used the tcpdump command to ensure messages from remote nodes are arriving?
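For example, something like this; port 5140 is only a stand-in for whatever port your input actually listens on:

tcpdump -ni any port 5140      # watch for inbound traffic on the input port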

The status of those services looks good; I’m not seeing anything bad.

The links I posted above are about similar issues.

Sorry about the typo above, it’s been a long day.

At the moment this is the config. I bumped it up from 8 to 12 to see if that would make any difference; it didn’t.

1-1.5 GB per day. Not sure where I can see a grand total of msg/sec? :thinking: I can see it per input, but where do I see a total? I assume that would be on the nodes page, but that doesn’t really work here. :sleepy:

And I have the counter in the upper right, but at the moment it’s not really showing much activity, while the Java CPU usage is at 600%.
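Maybe the REST API can give a total even when the nodes page is broken? Something like this, though I’m going from memory on the metric names, so treat them as assumptions:

curl -s -u admin:PASSWORD 'http://192.168.44.92:9000/api/system/metrics/org.graylog2.throughput.input.1-sec-rate?pretty=true'
curl -s -u admin:PASSWORD 'http://192.168.44.92:9000/api/system/metrics/org.graylog2.throughput.output.1-sec-rate?pretty=true'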

When you say “receive”, do you mean you just can’t see them in the web UI, or that messages are not arriving at the Graylog server at all? Have you used the tcpdump command to ensure messages from remote nodes are arriving?

I do see them with tcpdump, and normally I also see on the input config page that messages are received on the different inputs. But just now they all say ‘No metrics available for this input’. I do not see the messages in the streams, nor in ‘All messages’.

So I’m not really sure what the Java process that’s run by the graylog user is doing. =/

I remember now that I googled this previously and straced that Java process. Not sure if it’s helpful?

strace -p 43143
strace: Process 43143 attached
futex(0x7f50aa7309d0, FUTEX_WAIT, 43144, NULL
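From what I understand, strace on the main JVM PID only shows the main thread, which just parks in futex(); the real work happens in other threads. A sketch of how to find the busy one, assuming a JDK with jstack is available on the box (the TID/HEX values are placeholders):

top -H -p 43143                           # note the TID of the busiest thread
printf '%x\n' <TID-from-top>              # convert that TID to hex
jstack 43143 | grep -A 20 'nid=0x<HEX>'   # locate the matching thread stack in the dump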

Hello

Thanks for the added details. I see you have enough resources for the amount of logs you’re trying to ingest.

Next, tail the log files. What do you see?

NOTE: Adjust these commands to your environment.

[root@graylog ]# tail -f /var/log/graylog-server/server.log

[root@graylog ]# tail -f /var/log/elasticsearch/graylog.log

Try using htop if you can, it’s easier to read.

Next, since you have this shown.

System Information is currently unavailable

Can you show your configuration files for Elasticsearch and Graylog?

Last question for now: how did this go?

Hi and thanks for helping out! :pray: :+1:

I did find this in server.log yesterday and scratched the surface of it:

2022-04-30T11:38:24.638+02:00 WARN  [ProxiedResource] Unable to call http://192.168.44.92:9000/api/system on node <d4137266-8358-41f5-9f7e-901f208850a1>: timeout
curl -i http://192.168.44.92:9000/api/system
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Basic realm="Graylog Server"
X-Graylog-Node-ID: d4137266-8358-41f5-9f7e-901f208850a1
X-Runtime-Microseconds: 350
Content-Length: 0

Nothing super obvious in the Elasticsearch logs. These are the last few lines:

[2022-04-30T04:09:37,284][INFO ][o.e.c.m.MetadataMappingService] [m5-logger01.localdomain] [vmware-all-msg_124/j1RNL7Z7STS9_32l_gk49g] update_mapping [_doc]
[2022-04-30T04:09:39,291][INFO ][o.e.c.m.MetadataMappingService] [m5-logger01.localdomain] [sec-winbeats-short_101/HGkC4WqgTnukygoaWYsi_g] update_mapping [_doc]
[2022-04-30T04:09:39,401][INFO ][o.e.c.m.MetadataMappingService] [m5-logger01.localdomain] [sec-winbeats-short_101/HGkC4WqgTnukygoaWYsi_g] update_mapping [_doc]
[2022-04-30T04:09:47,283][INFO ][o.e.c.m.MetadataMappingService] [m5-logger01.localdomain] [vmware-all-msg_124/j1RNL7Z7STS9_32l_gk49g] update_mapping [_doc]
[2022-04-30T04:09:47,285][INFO ][o.e.c.m.MetadataMappingService] [m5-logger01.localdomain] [vmware-all-msg_124/j1RNL7Z7STS9_32l_gk49g] update_mapping [_doc]
[2022-04-30T04:14:53,277][INFO ][o.e.c.m.MetadataMappingService] [m5-logger01.localdomain] [vmware-all-msg_124/j1RNL7Z7STS9_32l_gk49g] update_mapping [_doc]
[2022-04-30T08:25:35,280][INFO ][o.e.c.m.MetadataMappingService] [m5-logger01.localdomain] [sec-winbeats-short_101/HGkC4WqgTnukygoaWYsi_g] update_mapping [_doc]
[2022-04-30T10:35:01,290][INFO ][o.e.c.m.MetadataMappingService] [m5-logger01.localdomain] [sec-winbeats-short_101/HGkC4WqgTnukygoaWYsi_g] update_mapping [_doc]
[2022-04-30T10:35:01,310][INFO ][o.e.c.m.MetadataMappingService] [m5-logger01.localdomain] [sec-winbeats-short_101/HGkC4WqgTnukygoaWYsi_g] update_mapping [_doc]
[2022-04-30T10:49:55,282][INFO ][o.e.c.m.MetadataMappingService] [m5-logger01.localdomain] [vmware-all-msg_124/j1RNL7Z7STS9_32l_gk49g] update_mapping [_doc]
[2022-04-30T10:50:09,280][INFO ][o.e.c.m.MetadataMappingService] [m5-logger01.localdomain] [vmware-all-msg_124/j1RNL7Z7STS9_32l_gk49g] update_mapping [_doc]
[2022-04-30T10:50:09,308][INFO ][o.e.c.m.MetadataMappingService] [m5-logger01.localdomain] [vmware-all-msg_124/j1RNL7Z7STS9_32l_gk49g] update_mapping [_doc]
[2022-04-30T10:50:50,283][INFO ][o.e.c.m.MetadataMappingService] [m5-logger01.localdomain] [sec-winbeats-short_101/HGkC4WqgTnukygoaWYsi_g] update_mapping [_doc]
[2022-04-30T10:53:54,292][INFO ][o.e.c.m.MetadataMappingService] [m5-logger01.localdomain] [sec-winbeats-short_101/HGkC4WqgTnukygoaWYsi_g] update_mapping [_doc]

Graylog conf. Passwords and such removed.

grep -v ^\# /etc/graylog/server/server.conf | grep  .
is_master = true
node_id_file = /etc/graylog/server/node-id
password_secret = xxx
root_password_sha2 = xxx
root_timezone = Europe/Stockholm
bin_dir = /usr/share/graylog-server/bin
data_dir = /var/lib/graylog-server
plugin_dir = /usr/share/graylog-server/plugin
http_bind_address = 192.168.44.92:9000
elasticsearch_hosts = http://127.0.0.1:9200
rotation_strategy = count
elasticsearch_max_docs_per_index = 20000000
elasticsearch_max_number_of_indices = 20
retention_strategy = delete
elasticsearch_shards = 4
elasticsearch_replicas = 0
elasticsearch_index_prefix = graylog
allow_leading_wildcard_searches = false
allow_highlighting = false
elasticsearch_analyzer = standard
output_batch_size = 500
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 5
outputbuffer_processors = 3
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
lb_recognition_period_seconds = 3
mongodb_uri = mongodb://127.0.0.1/graylog
mongodb_max_connections = 1000
mongodb_threads_allowed_to_block_multiplier = 5
transport_email_enabled = true
transport_email_hostname = 192.168.xx.xx
transport_email_port = 25
transport_email_use_auth = false
transport_email_subject_prefix = [graylog]
transport_email_from_email =xxx@xxx.xx
proxied_requests_thread_pool_size = 32

Elastic conf:

grep -v ^\# /etc/elasticsearch/elasticsearch.yml | grep  .
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
cluster.name: graylog
action.auto_create_index: false

No difference when disabling firewall.

Htop screenshot. Not sure it would paste properly as text.

Thanks again @gsmith :pray:


When pulling this info, Java was ‘only’ sitting at 200%. The time of that CPU ramp-down seems to coincide with the last timestamp of any message that I can get Graylog to display in the web UI. BUT, I do see that the counter in the top right is displaying incoming messages. The web UI is also super slow at the moment. And the last entry in the log file in my previous message is also at the time of the CPU ramp-down in this graph. :thinking: Nothing obvious in /var/log/messages at that time.

strace of the Graylog PID in its current state:

strace -p 132944
strace: Process 132944 attached
futex(0x7fd89ff4a9d0, FUTEX_WAIT, 132945, NULL

Hello,

If the firewall and SELinux are disabled and the permissions on the Graylog directory are correct, this could be a DNS issue. Depending on the environment, check that /etc/hosts and the hostname are set correctly.

Here are other members with the same warning.

The curl command is not correct. You set the elasticsearch.yml file with 127.0.0.1, so it should have been:

curl -i http://127.0.0.1:9200/api/system

You should have seen something like this:

HTTP/1.1 200 OK
X-Graylog-Node-ID: ac7773b1-403d-4d3d-acc7-98a779140854
X-Runtime-Microseconds: 8838
Content-Type: application/json

Here are some examples you could try to resolve the warning issue.

[root@graylog ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.44.92   graylog.domain.com
[root@graylog ~]# cat /etc/hostname
graylog.domain.com
[root@graylog ~]#
cluster.name: graylog
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 127.0.0.1 OR 0.0.0.0 OR  192.168.44.92
http.port: 9200
action.auto_create_index: false
discovery.type: single-node

Side note: if you use the host IP address, make sure you adjust the Graylog server.conf accordingly.

Thanks @gsmith :pray:

The curl command is not correct. You set the elasticsearch.yml file with 127.0.0.1, so it should have been:

curl -i http://127.0.0.1:9000/api/system

Looking back at my config, I actually didn’t have that set; it’s a rather minimal config here.

grep -v ^\# /etc/elasticsearch/elasticsearch.yml | grep  .
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
cluster.name: graylog
action.auto_create_index: false

I added ‘network.host: 127.0.0.1’ to it, modified /etc/hosts, and verified that the hostname is correct. Both before and after this I get connection refused on 127.0.0.1:9000.

[root@m5-logger01 ~]# curl -i http://192.168.44.92:9000/api/system
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Basic realm="Graylog Server"
X-Graylog-Node-ID: d4137266-8358-41f5-9f7e-901f208850a1
X-Runtime-Microseconds: 713
Content-Length: 0

[root@m5-logger01 ~]# curl -i http://127.0.0.1:9000/api/system
curl: (7) Failed to connect to 127.0.0.1 port 9000: Connection refused

[root@m5-logger01 ~]# sudo netstat -nlp | grep :9000
tcp6       0      0 192.168.44.92:9000      :::*                    LISTEN      2498/java   

That probably makes sense, since I have this in the Graylog server.conf:

http_bind_address = 192.168.44.92:9000

Changed that to 127.0.0.1:9000 and restarted the server:

[root@m5-logger01 ~]# sudo netstat -nlp | grep :9000
tcp6       0      0 127.0.0.1:9000          :::*                    LISTEN      6432/java         
  
[root@m5-logger01 ~]# curl -i http://127.0.0.1:9000/api/system
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Basic realm="Graylog Server"
X-Graylog-Node-ID: d4137266-8358-41f5-9f7e-901f208850a1
X-Runtime-Microseconds: 20267
Content-Length: 0

After changing the Graylog server.conf back to

http_bind_address = 192.168.44.92:9000

And adding this to elasticsearch.yml

network.host: 192.168.44.92

I can see the node’s info in the web UI, so making progress here.
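Side note to self: binding Graylog to all interfaces should make both the loopback and LAN curls work; an untested sketch, and http_publish_uri is optional:

http_bind_address = 0.0.0.0:9000
#http_publish_uri = http://192.168.44.92:9000/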

I still cannot get anything but a 401 from it with curl though, maybe since we use AD for auth?

curl -i http://192.168.44.92:9000/api/system
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Basic realm="Graylog Server"
X-Graylog-Node-ID: d4137266-8358-41f5-9f7e-901f208850a1
X-Runtime-Microseconds: 600
Content-Length: 0
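From what I can tell, the 401 just means the API wants credentials, which at least proves it’s answering. With credentials, or an API token for a local admin user, it should return 200; a sketch with placeholder credentials (the token-as-username trick is how Graylog documents token auth, if I recall correctly):

curl -i -u admin:PASSWORD http://192.168.44.92:9000/api/system?pretty=true
curl -i -u MYTOKEN:token http://192.168.44.92:9000/api/system?pretty=true   # API token in the username field, literal password 'token'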

Java is still rocking 500% CPU though, so no progress there. :hot_face: :hot_face:

strace -p 6879
strace: Process 6879 attached
futex(0x7f97841709d0, FUTEX_WAIT, 6880, NULL

Since I can see some metrics now, maybe there are hints there to what’s going on? But this being my first Graylog setup, I have no idea what’s normal or how it should look. :thinking:

Hello,

I’m not sure if you know this, but when you set an address in the elasticsearch.yml file, you need to use that IP address with curl. By default Elasticsearch uses 127.0.0.1, so if you execute a curl command you need to use 127.0.0.1; if you configured it with the host IP address, then you need to use that instead.

To sum it up, this is what I would expect to see using default configurations.

Graylog Config ( This connects to Elasticsearch)

elasticsearch_hosts = http://127.0.0.1:9200

Elasticsearch Config ( Elasticsearch IP address used for curl, etc…)

network.host: 127.0.0.1
http.port: 9200

Curl Command (using the above configuration)

curl -i http://127.0.0.1:9200/api/system

You should not mix and match addresses; this could cause problems.
Myself, I use static IP addresses and configure my configuration files accordingly.
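A quick way to double-check what is actually listening where (a sketch; ss replaces netstat on newer distros):

ss -tlnp | grep -E ':(9000|9200)'    # Graylog (9000) and Elasticsearch (9200) bind addresses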

I can see you have a few messages in the journal. This is normal for a node that has been rebooted or whose service was stopped.

Since you have probably been working on the Graylog service, this node needs to catch up on all the messages in the journal. Be patient and let Graylog do its thing. :+1:

Hello,

You used the wrong port :worried:; it should have been 9200. Slow down, man, and take your time.

When you do a curl command use your settings in Elasticsearch.yml file

[root@graylog ~]# cat /etc/elasticsearch/elasticsearch.yml | egrep -v "^\s*(#|$)"
cluster.name: graylog
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 127.0.0.1          <--- The address needed for curl
http.port: 9200                  <--- The port needed for curl
action.auto_create_index: false
discovery.type: single-node
bootstrap.memory_lock: true

That’s why this also has to match.

elasticsearch_hosts = http://127.0.0.1:9200 <--- IP address and port need to match