Graylog system slows down and not all inputs are processed


(Tharaka) #1

Due to performance and JVM issues, I upgraded from Graylog 2.0.x to 2.2.2. After the upgrade, Graylog ran smoothly for 5 days. Then the system suddenly slowed down.

Now the web GUI takes a considerable amount of time to load each page, and not all input logs are being processed. The Sources area shows that logs from only 3 input sources were processed, whereas earlier it showed all input sources. (See the log history below.)

Sometimes the Inputs section shows that inputs are not running; after 5-10 seconds they start running again automatically. (See the images below for an example with a single input.)

The system has 32 GB of RAM.

I have increased message_journal_max_size to 5gb, but that did not fix the issue.

My guess is that Graylog reaches a point it cannot handle because of a memory issue.

Has anyone faced this kind of issue? I need to resolve it ASAP. Please share the solution if you have one.


(Tharaka) #2

Refer to the server log. It shows that the input nodes timed out.

Note: I can’t do a fresh installation or restore the snapshot, since the existing Graylog server has valid data.


(Jan Doberstein) #3

Hej Tharaka,

the information given is a little short, as you did not say what you have tried so far to resolve the issue. Additionally, without any log files we can only guess what the problem might be in your situation.

What did you find in the log files at the time the processing dropped? What were the last actions you performed on the system before the drop? Did you check whether other changes (network, routing, firewall) might have an impact on the number of sources?

regards
Jan


(Tharaka) #4

Hey Jan.

This system worked perfectly from August 2016. In January 2017 the system performance started dropping. /var/log/graylog-server/server.log always showed an OpenJDK JVM issue, so I upgraded to the latest Graylog version last Friday, 10th March 2017. After the upgrade the system worked perfectly and very smoothly, and no errors were observed during that time. But since the day before yesterday the system has slowed down and the inputs are not working properly. All the inputs go down and come back up from time to time.

For further clarification, what kind of log source do you require?

In any case, I need to sort out this issue ASAP. Please help me fix it.


(Jan Doberstein) #5

Hej,

the log file from the day the slowdown was noticed would be of interest. But what you describe sounds like you are at the limits of your hardware.

You might want to look at this blog post about tuning and how to monitor Graylog.

regards
Jan


(Tharaka) #6

Hi Jan,

It can’t be the hardware, because my system has 32 vCPUs, 32 GB of RAM, and a 1 TB HDD of which only 270 GB is used. So I don’t think the issue is with the hardware. It might be with the Graylog inputs.

Please let me know how to attach the log file here.


(Jan Doberstein) #7

You can just use the upload function of the editor, or put the log elsewhere on the net and post a link.


(Tharaka) #8

Hi Jan.

I have uploaded the file to a third-party server; the link is shared here with the decryption key.
This is the server.log file (/var/log/graylog-server/server.log).

https://mega.nz/#!yZRH2bLI

Decryption key “!P2_eXjbuOYEtkqOT4kRobfWhOO92ZnP24Bqny3gklWs”


(Jochen) #9

How did you install Graylog?
How did you configure Graylog and its JVM settings?
What’s running on the system, just Graylog or also Elasticsearch etc.?


(Tharaka) #10

@jochen

This VM is used only for the Graylog server with Elasticsearch (log analyzer).

OS : CentOS 6 (2.6.32-642.el6.x86_64)
Packages:
Elasticsearch 2.3.3
Graylog 2.2.2
MongoDB 3.2.7-1.el6.x86_64
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)

Packages were installed using RPM.

The system is a VM and the hardware specifications are:
32 vCPUs
32 GB of RAM
1 TB of HDD

free -h
             total       used       free     shared    buffers     cached
Mem:           31G        15G        15G       204K       197M       7.4G
-/+ buffers/cache:       8.0G        23G
Swap:         8.0G         0B       8.0G

Apart from that, Kibana and Logstash are installed on this system.


(Jochen) #11

You can clearly see that your system is using 100% swap which is pretty much the worst case.

How did you configure all relevant components on that machine?
What are your JVM settings for Graylog and Elasticsearch?


(Tharaka) #12

@jochen

This Graylog system worked perfectly from August 2016 to January 2017 with Graylog 2.0.1. After some memory issues I upgraded to version 2.2.2 on 10th March 2017. After that the system again worked perfectly until 14th March 2017, when it started slowing down.


(Tharaka) #13

@jochen
Swap: 8.0G 0B 8.0G

As you can see, the swap space is entirely free: 0B is used and 8.0G is free.

All necessary settings were configured from the beginning, and the issue appeared only recently.
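For anyone who wants to read the same columns programmatically rather than by eye, here is a minimal Python sketch. It assumes the older procps layout shown in this thread (with the `-/+ buffers/cache` row); the `parse_free` helper is hypothetical, not part of any tool, and the sample text is copied from the output posted above.

```python
def parse_free(output: str) -> dict:
    """Parse old-style procps `free` output into {row_label: {column: value}}."""
    lines = [ln for ln in output.strip().splitlines() if ln.strip()]
    columns = lines[0].split()  # total, used, free, shared, buffers, cached
    rows = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        values = rest.split()
        if label.startswith("-/+"):
            # This row only carries "used" and "free" (adjusted for buffers/cache).
            rows[label] = dict(zip(["used", "free"], values))
        else:
            rows[label] = dict(zip(columns, values))
    return rows


# Output as posted in this thread.
SAMPLE = """\
             total       used       free     shared    buffers     cached
Mem:           31G        15G        15G       204K       197M       7.4G
-/+ buffers/cache:       8.0G        23G
Swap:         8.0G         0B       8.0G
"""

rows = parse_free(SAMPLE)
print("swap used:", rows["Swap"]["used"], "swap free:", rows["Swap"]["free"])
```

The second column of the `Swap:` row is the amount in use, and the third is the amount still free.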


(Jochen) #14

True, I misread the output of free.

That’s no answer to my question.


(Tharaka) #15

@jochen

Fine. I’ll be glad to give you the details. Let me know exactly what information you require.


(Jochen) #16

Configuration of Graylog and Elasticsearch, including their JVM settings.


(Tharaka) #17

Refer to the attachment for the following files.

/etc/sysconfig/graylog-server
/etc/sysconfig/elasticsearch
/etc/graylog/server/server.conf
/etc/elasticsearch/elasticsearch.yml

Download link: https://mega.nz/#!2ZBkVLyZ

Decryption key “!-3-grrzWvcYyUux2HX7CLi_vJW1Rk9we5DOjIg6k_0o”


(Jochen) #18

The ES_HEAP_SIZE setting is very low at 2 gigabytes. Try using at least 8 gigabytes for Elasticsearch.
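A quick way to confirm which heap value is actually set is to scan the sysconfig file for the variable. The Python sketch below is only an illustration: the `heap_bytes` helper and the sample file content are hypothetical (the real /etc/sysconfig/elasticsearch was shared as an attachment and is not reproduced here), and the 8 GB threshold is simply the figure suggested in this thread.

```python
import re

UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def heap_bytes(sysconfig_text: str, key: str = "ES_HEAP_SIZE"):
    """Return the heap size in bytes from a KEY=<n>[k|m|g] line, or None if unset.

    Commented-out lines are ignored; if the key appears more than once,
    the last assignment wins, as it would when the file is sourced.
    """
    pattern = re.compile(rf'^\s*{re.escape(key)}\s*=\s*"?(\d+)([kmgKMG])?"?\s*$')
    size = None
    for line in sysconfig_text.splitlines():
        match = pattern.match(line)
        if match:
            number, unit = match.groups()
            size = int(number) * UNITS.get((unit or "").lower(), 1)
    return size


# Hypothetical file content, for illustration only.
SAMPLE_SYSCONFIG = """\
# Heap size for the Elasticsearch JVM
ES_HEAP_SIZE=2g
"""

SUGGESTED_MINIMUM = 8 * UNITS["g"]  # figure taken from the advice in this thread

configured = heap_bytes(SAMPLE_SYSCONFIG)
if configured is None:
    print("ES_HEAP_SIZE is not set")
elif configured < SUGGESTED_MINIMUM:
    print("ES_HEAP_SIZE is below the suggested 8g")
else:
    print("ES_HEAP_SIZE meets the suggested 8g")
```

To check a real system, the contents of /etc/sysconfig/elasticsearch would be read and passed in place of the sample string. Elasticsearch has to be restarted for a changed heap size to take effect.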

Also make sure to check the logs of your Elasticsearch node(s).


(Tharaka) #19

I have increased ES_HEAP_SIZE to 8g and monitored for an hour.

Nothing changed and the issue still persists.


(Jochen) #20

Please post the logs of all relevant components (Graylog, Elasticsearch, etc.).