Problem with Linux server log forwarding

Hello,

My Linux server sends too many logs to my Graylog server (2 million logs per minute, roughly 33,000 messages per second), and because of that my Graylog server is laggy.

I forward the logs off my Linux servers with syslog-ng.


Syslog-ng conf:

source s_src {
    file("/var/log/messages");
    file("/var/log/debug");
    file("/var/log/error");
};

destination d_net {
    tcp("GraylogIP" port(3516) log_fifo_size(1000));
};

log {
    source(s_src);
    destination(d_net);
};


This syslog-ng configuration is the same on both Linux servers.
My input on Graylog is Syslog TCP: bind_address 0.0.0.0, port 3516.
My real question: is my Linux server sending too many logs to my Graylog server, or is that volume normal and my Graylog configuration simply not good enough to process all of them? This only happens with my Linux servers, not with my Windows servers. To ship logs from Windows to Graylog I use NXLog with this configuration:


Panic Soft
#NoFreeOnExit TRUE

define ROOT C:\Program Files\nxlog
define CERTDIR %ROOT%\cert
define CONFDIR %ROOT%\conf\nxlog.d
define LOGDIR %ROOT%\data

define LOGFILE %LOGDIR%\nxlog.log
LogFile %LOGFILE%

Moduledir %ROOT%\modules
CacheDir %ROOT%\data
Pidfile %ROOT%\data\nxlog.pid
SpoolDir %ROOT%\data

<Extension _syslog>
    Module      xm_syslog
</Extension>

<Extension _charconv>
    Module      xm_charconv
    AutodetectCharsets iso8859-2, utf-8, utf-16, utf-32
</Extension>

<Extension _exec>
    Module      xm_exec
</Extension>

<Extension _fileop>
    Module      xm_fileop

    # Check the size of our log file hourly, rotate if larger than 5MB
    <Schedule>
        Every   1 hour
        Exec    if (file_exists('%LOGFILE%') and \
                   (file_size('%LOGFILE%') >= 5M)) \
                    file_cycle('%LOGFILE%', 8);
    </Schedule>

    # Rotate our log file every week on Sunday at midnight
    <Schedule>
        When    @weekly
        Exec    if file_exists('%LOGFILE%') file_cycle('%LOGFILE%', 8);
    </Schedule>
</Extension>

<Extension gelf>
    Module      xm_gelf
</Extension>

<Input win>
    Module      im_msvistalog
</Input>

<Input eventlog>
    Module      im_mseventlog
</Input>

<Output out>
    Module      om_tcp
    Host        graylogIP
    Port        3514
    OutputType  GELF_TCP
</Output>

<Route 1>
    Path        win => out
</Route>


I'm on Graylog 4.
Elasticsearch:

"version" : {
    "number" : "7.10.2",
    "build_flavor" : "oss",
    "build_type" : "deb",
    "build_hash" : "747e1cc71def077253878a59143c1f785afa92b9",
    "build_date" : "2021-01-13T00:42:12.435326Z",
    "build_snapshot" : false,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
}

MongoDB 4.0.28


I have tried to limit the flow coming from my Linux server, but it didn't work (a sketch of that kind of rate limit follows below).
I can increase my server's resources a little, but not enough to meet the recommended prerequisites. Do you have some tips or tricks I can use?
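(For illustration, a syslog-ng rate limit would look roughly like this; throttle() caps the number of messages per second sent to a destination, and the value here is only a placeholder:)

destination d_net {
    tcp("GraylogIP" port(3516) log_fifo_size(1000)
        throttle(5000)   # placeholder: at most 5000 messages/second to Graylog
    );
};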

Hello && welcome @Mowgly

  • What resources do you have for your Graylog server?

  • How did you divide those resources between Graylog and Elasticsearch?

  • Can you show your configuration for Graylog/Elasticsearch?

Thank you for your quick response @gsmith.
Presently my Graylog server has 2 cores and 8 GB of RAM.
I think I can increase that a little, but it's not certain.

I don't understand your second question: my Graylog and Elasticsearch are running on the same VM.

Graylog conf:

# Path to the java executable.
JAVA=/usr/bin/java

# Default Java options for heap and garbage collection.
GRAYLOG_SERVER_JAVA_OPTS="-Xms4g -Xmx4g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow"

# Avoid endless loop with some TLSv1.3 implementations.
GRAYLOG_SERVER_JAVA_OPTS="$GRAYLOG_SERVER_JAVA_OPTS -Djdk.tls.acknowledgeCloseNotify=true"

# Fix for log4j CVE-2021-44228
GRAYLOG_SERVER_JAVA_OPTS="$GRAYLOG_SERVER_JAVA_OPTS -Dlog4j2.formatMsgNoLookups=true"

# Pass some extra args to graylog-server. (i.e. "-d" to enable debug mode)
GRAYLOG_SERVER_ARGS=""
# Program that will be used to wrap the graylog-server command. Useful to
# support programs like authbind.
GRAYLOG_COMMAND_WRAPPER=""
output_batch_size = 500
#500

# Flush interval (in seconds) for the Elasticsearch output. This is the maximum amount of time between two
# batches of messages written to Elasticsearch. It is only effective at all if your minimum number of messages
# for this time period is less than output_batch_size * outputbuffer_processors.
output_flush_interval = 1

# As stream outputs are loaded only on demand, an output which is failing to initialize will be tried over and
# over again. To prevent this, the following configuration options define after how many faults an output will
# not be tried again for an also configurable amount of seconds.
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30

# The number of parallel running processors.
# Raise this number if your buffers are filling up.
processbuffer_processors = 5
#5
outputbuffer_processors = 3
#3
processor_wait_strategy = blocking

# Size of internal ring buffers. Raise this if raising outputbuffer_processors does not help anymore.
# For optimum performance your LogMessage objects in the ring buffer should fit in your CPU L3 cache.
# Must be a power of 2. (512, 1024, 2048, ...)
ring_size = 65536
#65536

inputbuffer_ring_size = 65536
inputbuffer_processors = 2
#2
inputbuffer_wait_strategy = blocking

# Manually stopped inputs are no longer auto-restarted. To re-enable the previous behavior, set auto_restart_inputs to true.
#auto_restart_inputs = true

# Enable the message journal.
message_journal_enabled = true

# The directory which will be used to store the message journal. The directory must be exclusively used by Graylog and
# must not contain any other files than the ones created by Graylog itself.
#
# ATTENTION:
#   If you create a separate partition for the journal files and use a file system creating directories like 'lost+found'
#   in the root directory, you need to create a sub directory for your journal.
#   Otherwise Graylog will log an error message that the journal is corrupt and Graylog will not start.
message_journal_dir = /var/lib/graylog-server/journal

And I have modified my IP and the timezone.

Elasticsearch conf:
I just added these two lines:
cluster.name: gryalog
action.auto_create_index: false

Hello,

Looks like you posted twice, and only half of your configuration file is showing.

Here is a tip when posting configuration files. Execute this:

cat /etc/graylog/server/server.conf | egrep -v "^\s*(#|$)"

and copy & paste the output. It strips comments and blank lines, so only the active settings are shown.

Example, which is much easier on the eyes to read.

Graylog_config
[root@graylog graylog_user]# cat /etc/graylog/server/server.conf | egrep -v "^\s*(#|$)"
is_master = true
node_id_file = /etc/graylog/server/node-id
password_secret = epOqmLi7r7CdZxl76QOQxr8bRUPYstNdcBuajsaSNbn22EBT17elgGTUJgbD
root_password_sha2 = 272c3ac6b26a795a4244d8d2caf1d19a072fbc1c88d497ba1df7fef0a4171ea6
root_email = "greg.smith@me.com"
root_timezone = America/Chicago
bin_dir = /usr/share/graylog-server/bin
data_dir = /var/lib/graylog-server
plugin_dir = /usr/share/graylog-server/plugin
http_bind_address = graylog:9000
http_publish_uri = https://graylog:9000/
http_enable_cors = true
http_enable_tls = true
http_tls_cert_file = /etc/pki/tls/certs/graylog/graylog-certificate.pem
http_tls_key_file = /etc/pki/tls/certs/graylog/graylog-key.pem
http_tls_key_password = secret
elasticsearch_hosts = http://8.8.8.8:9200
rotation_strategy = count
elasticsearch_max_docs_per_index = 20000000
elasticsearch_max_number_of_indices = 20
retention_strategy = delete
elasticsearch_shards = 4
elasticsearch_replicas = 0
elasticsearch_index_prefix = graylog
allow_leading_wildcard_searches = true
allow_highlighting = false
elasticsearch_analyzer = standard
output_batch_size = 5000
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 8
outputbuffer_processors = 3
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
message_journal_max_size = 12gb
lb_recognition_period_seconds = 3

Here are some suggestions from what was posted.

  • So you have 2 million messages coming in over a short span of time. You do not have the resources to handle that many all at once. I would suggest increasing the CPU count by a factor of 4, meaning at least 8 CPUs.

  • Next, I would increase the Elasticsearch heap to 4 GB (/etc/sysconfig/elasticsearch) and decrease the Graylog heap to 2 GB (/etc/sysconfig/graylog-server, GRAYLOG_SERVER_JAVA_OPTS="-Xms2g -Xmx2g ..."). Both changes are sketched right after this list.
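(A minimal sketch of that heap split, assuming the RPM-style paths above; on a DEB install the Graylog setting usually lives in /etc/default/graylog-server, and on Elasticsearch 7.10 the heap can also be set in a file under /etc/elasticsearch/jvm.options.d/. The remaining JVM flags are copied from the configuration posted earlier:)

# /etc/elasticsearch/jvm.options.d/heap.options  (hypothetical file name)
# Give Elasticsearch half of the 8 GB of RAM:
-Xms4g
-Xmx4g

# /etc/sysconfig/graylog-server  (or /etc/default/graylog-server on DEB)
# Shrink the Graylog heap to leave room for Elasticsearch and the OS:
GRAYLOG_SERVER_JAVA_OPTS="-Xms2g -Xmx2g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow"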

I believe your Graylog configuration is default, which means you have the following settings:

processbuffer_processors = 5
outputbuffer_processors = 3
inputbuffer_processors = 2

If so, this would require at least 10 CPU cores (5 + 3 + 2 = 10 processor threads). So what I believe is that this Graylog server runs out of resources when trying to index a large amount of messages in a short time. Perhaps look into why your log shippers are sending so many messages at once, or control what can be sent to Graylog and try to reduce the volume, for example by filtering at the syslog-ng side as sketched below.
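(A minimal sketch of source-side filtering in syslog-ng, assuming most of the volume is debug-level noise; the filter name and the level range are placeholders to adapt:)

# Drop debug-level messages before they leave the Linux server.
filter f_important {
    level(info..emerg);
};

log {
    source(s_src);
    filter(f_important);
    destination(d_net);
};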

@gsmith

Thank you for your response, I will try what you said.

I have now increased my CPU count from 2 to 10 and reduced the number of logs sent by the Linux servers.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.