Hi All,
We are running graylog 3.0 in Azure VM and Elasticsearch 6.8.1 in another separate Azure VM. The compute of Graylog & Elasticsearch VM has 4VCPUs, 16GB RAM, 6400 MAX IOPS
We have allocated graylog node heap space 8GB and elasticsearch node 8GB.
Now we started processing IIS logs to our environment via filebeat to beats port (5044) and data is coming to stream and below is the sample IIS message,
2019-07-29 10:02:54 W3SVC1240477980 RD00155DF24AE6 192.168.151.22 POST /Common/GetUnReadEmailTransactionByUser - 443 - 203.90.4.235 HTTP/2.0 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/75.0.3770.142+Safari/537.36 _ga=GA1.2.447190142.1534840655;+intercom-id-w7xf88vf=31273993-925e-416f-928d-cc56564fb546;+intercom-id-k1ocnq4d=03fdbaa2-39d6-4378-b54f-a6f02bec51df;+_fbp=fb.1.1558210096632.942788230;+hubspotutk=a70a5af60db08860f01ca8db086e7f85;+__hstc=47073656.a70a5af60db08860f01ca8db086e7f85.1558210098490.1558210098490.1560748468343.2;+_gid=GA1.2.112440141.1564301802;+loginUrl=mlrmc;+ASP.NET_SessionId=o3ouwe2m5f3o3g4hn5tl4nbn https://graylog.com/ChangeLog/Index graylog.com 200 0 0 688 932 78
Since the message is not JSON strutured so we have written grok pattern and added in graylog,
%{TIMESTAMP_ISO8601:iis.access.time} %{NOTSPACE:iis.access.hostname} %{NOTSPACE:iis.access.server_name} %{IPORHOST:iis.access.server_ip} %{WORD:iis.access.method} %{URIPATH:iis.access.url} %{NOTSPACE:iis.access.query_string} %{NUMBER:iis.access.port} %{NOTSPACE:iis.access.user_name} %{IPORHOST:iis.access.remote_ip} HTTP/%{NUMBER:iis.access.http_version} %{NOTSPACE:iis.access.useragent} %{NOTSPACE:iis.access.cookie} %{NOTSPACE:iis.access.referrer} %{NOTSPACE:iis.access.site_name} %{NUMBER:iis.access.response_code} %{NUMBER:iis.access.sub_status} %{NUMBER:iis.access.win32_status} %{NUMBER:iis.access.body_sent.bytes} %{NUMBER:iis.access.body_received.bytes} %{NUMBER:iis.access.request_time_ms}
In graylog we have created new pipeline and rules of it. we then configured filebeat stream to the pipeline and we started processing message and we got around avg 800 messages per minute to graylog. But within 10 minutes graylog becomes unstable and heap usage reached to maximum limit and all the messages were going to process buffer and no messages were processed to elasticsearch, now all the messages are in graylog journal.
Below is the response in server.log
ubuntu@graylog3:~$ sudo tail -f /var/log/graylog-server/server.log
2019-07-29T06:53:43.696Z INFO [NetworkListener] Started listener bound to [0.0.0.0:9000]
2019-07-29T06:53:43.698Z INFO [HttpServer] [HttpServer] Started.
2019-07-29T06:53:43.698Z INFO [JerseyService] Started REST API at <0.0.0.0:9000>
2019-07-29T06:53:43.698Z INFO [ServiceManagerListener] Services are healthy
2019-07-29T06:53:43.699Z INFO [InputSetupService] Triggering launching persisted inputs, node transitioned from Uninitialized?[LB:DEAD] to Running?[LB:ALIVE]
2019-07-29T06:53:43.699Z INFO [ServerBootstrap] Services started, startup times in ms: {GracefulShutdownService [RUNNING]=25, OutputSetupService [RUNNING]=172, BufferSynchronizerService [RUNNING]=180, KafkaJournal [RUNNING]=183, EtagService [RUNNING]=212, ConfigurationEtagService [RUNNING]=253, JournalReader [RUNNING]=260, LookupTableService [RUNNING]=260, InputSetupService [RUNNING]=263, StreamCacheService [RUNNING]=343, PeriodicalsService [RUNNING]=454, JerseyService [RUNNING]=21428}
2019-07-29T06:53:43.703Z INFO [ServerBootstrap] Graylog server up and running.
2019-07-29T06:53:43.717Z INFO [InputStateListener] Input [Beats/5d3d7fe5376cc71ed63ca6df] is now STARTING
2019-07-29T06:53:43.787Z INFO [InputStateListener] Input [Beats/5d3d7fe5376cc71ed63ca6df] is now RUNNING
2019-07-29T06:53:43.796Z WARN [AbstractTcpTransport] receiveBufferSize (SO_RCVBUF) for input Beats2Input{title=beats, type=org.graylog.plugins.beats.Beats2Input, nodeId=null} (channel [id: 0x0e6006ed, L:/0:0:0:0:0:0:0:0%0:5045]) should be 1048576 but is 425984.
2019-07-29T07:57:01.676Z ERROR [AbstractTcpTransport] Error in Input [Beats/5d3d7fe5376cc71ed63ca6df] (channel [id: 0x6b291f6d, L:/172.16.2.6:5045 ! R:/52.163.89.113:11400]) (cause io.netty.channel.unix.Errors$NativeIoException: syscall:read(..) failed: Connection reset by peer)
2019-07-29T07:57:01.982Z ERROR [AbstractTcpTransport] Error in Input [Beats/5d3d7fe5376cc71ed63ca6df] (channel [id: 0xaeb83439, L:/172.16.2.6:5045 ! R:/137.116.131.234:62272]) (cause io.netty.channel.unix.Errors$NativeIoException: syscall:read(..) failed: Connection reset by peer)
2019-07-29T07:57:02.059Z ERROR [AbstractTcpTransport] Error in Input [Beats/5d3d7fe5376cc71ed63ca6df] (channel [id: 0xf0db807c, L:/172.16.2.6:5045 ! R:/137.116.131.234:63496]) (cause io.netty.channel.unix.Errors$NativeIoException: syscall:read(..) failed: Connection reset by peer)
2019-07-29T07:57:05.168Z ERROR [AbstractTcpTransport] Error in Input [Beats/5d3d7fe5376cc71ed63ca6df] (channel [id: 0x89bc7c55, L:/172.16.2.6:5045 ! R:/13.76.211.33:21096]) (cause io.netty.channel.unix.Errors$NativeIoException: syscall:read(..) failed: Connection reset by peer)
2019-07-29T08:13:40.351Z ERROR [AbstractTcpTransport] Error in Input [Beats/5d3d7fe5376cc71ed63ca6df] (channel [id: 0xebe6e5ac, L:/172.16.2.6:5045 ! R:/137.116.131.234:62273]) (cause io.netty.channel.unix.Errors$NativeIoException: syscall:read(..) failed: Connection reset by peer)
2019-07-29T08:13:44.796Z ERROR [AbstractTcpTransport] Error in Input [Beats/5d3d7fe5376cc71ed63ca6df] (channel [id: 0x83029cd9, L:/172.16.2.6:5045 ! R:/13.76.211.33:21096]) (cause io.netty.channel.unix.Errors$NativeIoException: syscall:read(..) failed: Connection reset by peer)
2019-07-29T08:13:46.403Z ERROR [AbstractTcpTransport] Error in Input [Beats/5d3d7fe5376cc71ed63ca6df] (channel [id: 0x348fd719, L:/172.16.2.6:5045 ! R:/52.163.89.113:11488]) (cause io.netty.channel.unix.Errors$NativeIoException: syscall:read(..) failed: Connection reset by peer)
2019-07-29T08:13:50.773Z ERROR [AbstractTcpTransport] Error in Input [Beats/5d3d7fe5376cc71ed63ca6df] (channel [id: 0x04c460e4, L:/172.16.2.6:5045 ! R:/52.163.89.113:11384]) (cause io.netty.channel.unix.Errors$NativeIoException: syscall:read(..) failed: Connection reset by peer)
2019-07-29T08:13:50.979Z ERROR [AbstractTcpTransport] Error in Input [Beats/5d3d7fe5376cc71ed63ca6df] (channel [id: 0xb5059313, L:/172.16.2.6:5045 ! R:/137.116.131.234:63497]) (cause io.netty.channel.unix.Errors$NativeIoException: syscall:read(..) failed: Connection reset by peer)
2019-07-29T09:33:00.230Z INFO [connection] Opened connection [connectionId{localValue:11, serverValue:134}] to 192.16.2.6:27017
2019-07-29T09:33:00.231Z INFO [connection] Opened connection [connectionId{localValue:10, serverValue:136}] to 192.16.2.6:27017
2019-07-29T09:33:00.231Z INFO [connection] Opened connection [connectionId{localValue:9, serverValue:135}] to 192.16.2.6:27017
2019-07-29T09:33:21.741Z INFO [connection] Opened connection [connectionId{localValue:12, serverValue:137}] to 192.16.2.6:27017
2019-07-29T09:35:34.130Z INFO [connection] Opened connection [connectionId{localValue:13, serverValue:138}] to 192.16.2.6:27017
2019-07-29T09:35:34.137Z INFO [connection] Opened connection [connectionId{localValue:14, serverValue:139}] to 192.16.2.6:27017
2019-07-29T09:37:45.738Z INFO [connection] Opened connection [connectionId{localValue:16, serverValue:140}] to 192.16.2.6:27017
2019-07-29T09:37:45.739Z INFO [connection] Opened connection [connectionId{localValue:15, serverValue:141}] to 192.16.2.6:27017
2019-07-29T09:39:07.683Z INFO [connection] Opened connection [connectionId{localValue:18, serverValue:142}] to 192.16.2.6:27017
2019-07-29T09:39:07.713Z INFO [connection] Opened connection [connectionId{localValue:17, serverValue:143}] to 192.16.2.6:27017
2019-07-29T09:44:18.107Z ERROR [AnyExceptionClassMapper] Unhandled exception in REST resource
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method) ~[?:1.8.0_212]
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:1.8.0_212]
at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[?:1.8.0_212]
at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_212]
at okio.Okio$2.read(Okio.java:140) ~[graylog.jar:?]
at okio.AsyncTimeout$2.read(AsyncTimeout.java:237) ~[graylog.jar:?]
at okio.RealBufferedSource.indexOf(RealBufferedSource.java:355) ~[graylog.jar:?]
at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:227) ~[graylog.jar:?]
at okhttp3.internal.http1.Http1Codec.readHeaderLine(Http1Codec.java:215) ~[graylog.jar:?]
at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189) ~[graylog.jar:?]
at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:88) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[graylog.jar:?]
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[graylog.jar:?]
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[graylog.jar:?]
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[graylog.jar:?]
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[graylog.jar:?]
at org.graylog2.rest.RemoteInterfaceProvider.lambda$get$0(RemoteInterfaceProvider.java:61) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[graylog.jar:?]
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:200) ~[graylog.jar:?]
at okhttp3.RealCall.execute(RealCall.java:77) ~[graylog.jar:?]
at retrofit2.OkHttpCall.execute(OkHttpCall.java:180) ~[graylog.jar:?]
at org.graylog2.rest.resources.cluster.ClusterNodeMetricsResource.byNamespace(ClusterNodeMetricsResource.java:122) ~[graylog.jar:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_212]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_212]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_212]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_212]
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81) ~[graylog.jar:?]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144) ~[graylog.jar:?]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161) ~[graylog.jar:?]
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205) ~[graylog.jar:?]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99) ~[graylog.jar:?]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389) ~[graylog.jar:?]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347) ~[graylog.jar:?]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102) ~[graylog.jar:?]
at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326) [graylog.jar:?]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) [graylog.jar:?]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) [graylog.jar:?]
at org.glassfish.jersey.internal.Errors.process(Errors.java:315) [graylog.jar:?]
at org.glassfish.jersey.internal.Errors.process(Errors.java:297) [graylog.jar:?]
at org.glassfish.jersey.internal.Errors.process(Errors.java:267) [graylog.jar:?]
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317) [graylog.jar:?]
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305) [graylog.jar:?]
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154) [graylog.jar:?]
at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.service(GrizzlyHttpContainer.java:384) [graylog.jar:?]
at org.glassfish.grizzly.http.server.HttpHandler$1.run(HttpHandler.java:224) [graylog.jar:?]
at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181) [graylog.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_212]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
^[[5~2019-07-29T09:49:12.655Z WARN [ProxiedResource] Unable to call http://graylog.southeastasia.cloudapp.azure.com:9000/api/system on node <7bd8596d-a8ab-406e-abca-0715d20b8f70>
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method) ~[?:1.8.0_212]
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:1.8.0_212]
at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[?:1.8.0_212]
at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_212]
at okio.Okio$2.read(Okio.java:140) ~[graylog.jar:?]
at okio.AsyncTimeout$2.read(AsyncTimeout.java:237) ~[graylog.jar:?]
at okio.RealBufferedSource.indexOf(RealBufferedSource.java:355) ~[graylog.jar:?]
at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:227) ~[graylog.jar:?]
at okhttp3.internal.http1.Http1Codec.readHeaderLine(Http1Codec.java:215) ~[graylog.jar:?]
at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189) ~[graylog.jar:?]
at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:88) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[graylog.jar:?]
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[graylog.jar:?]
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[graylog.jar:?]
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[graylog.jar:?]
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[graylog.jar:?]
at org.graylog2.rest.RemoteInterfaceProvider.lambda$get$0(RemoteInterfaceProvider.java:61) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[graylog.jar:?]
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:200) ~[graylog.jar:?]
at okhttp3.RealCall.execute(RealCall.java:77) ~[graylog.jar:?]
at retrofit2.OkHttpCall.execute(OkHttpCall.java:180) ~[graylog.jar:?]
at org.graylog2.shared.rest.resources.ProxiedResource.lambda$getForAllNodes$0(ProxiedResource.java:78) ~[graylog.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_212]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
2019-07-29T09:51:07.236Z WARN [NodePingThread] Did not find meta info of this node. Re-registering.
Though we have checked in Elastic logs but we couldn’t find any exceptions.
Below is the graylog server.conf,
############################
# GRAYLOG CONFIGURATION FILE
############################
is_master = true
node_id_file = /etc/graylog/server/node-id
password_secret = SEtby4wg3kBPBwkYgqcxRlfCrRf8NbmWSLFvrXqsJ6SorC1WGwFWehPS2cQ1kjSc6Ea0cbhO080TSW
root_username = admin
root_password_sha2 = 00f86d4e5f5e3265f8546589dbf3a8834dce95e5d54388b5b458cfae
root_timezone = UTC
bin_dir = /usr/share/graylog-server/bin
data_dir = /var/lib/graylog-server
plugin_dir = /usr/share/graylog-server/plugin
http_bind_address = 0.0.0.0:9000
http_publish_uri = http://graylog.southeastasia.cloudapp.azure.com:9000/
http_external_uri = http://graylog.southeastasia.cloudapp.azure.com:9000/
http_enable_cors = true
http_enable_gzip = true
http_max_header_size = 8192
http_thread_pool_size = 16
elasticsearch_hosts = http://190.32.3.4:9201
elasticsearch_connect_timeout = 10s
elasticsearch_socket_timeout = 60s
elasticsearch_max_total_connections = 20
elasticsearch_max_total_connections_per_route = 2
elasticsearch_max_retries = 2
rotation_strategy = count
elasticsearch_max_docs_per_index = 20000000
elasticsearch_max_number_of_indices = 20
retention_strategy = delete
elasticsearch_shards = 4
elasticsearch_replicas = 0
elasticsearch_index_prefix = graylog
allow_leading_wildcard_searches = false
allow_highlighting = false
elasticsearch_analyzer = standard
output_batch_size = 5000
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 8
outputbuffer_processors = 4
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
lb_recognition_period_seconds = 3
mongodb_uri = mongodb://190.32.3.4:27017/graylog
mongodb_max_connections = 1000
mongodb_threads_allowed_to_block_multiplier = 5
proxied_requests_thread_pool_size = 32
Note:-
We have done the same setup in another Azure VM where we have installed Graylog 2.5 and processing the same IIS logs without any issues and we didn’t change any graylog config and we used the default config.
Please share your thoughts and it would be very helpful.
Regards,
Ganeshbabu R