How to test the throughput

(Jolla) #1

Hi,

I have set up Graylog in Kubernetes and want to measure its throughput. Is there a tool or a recommended way to run such a test?

(Jan Doberstein) #2

Too many variables and not enough information.

Throughput is much more than running a single benchmark; you can't measure it the way you would with something like http-perf. It depends so much on the normalization and the protocol you are running that this is not easy to answer.

You need to know what you want to ingest and how you normalize it. With that information you can test the real throughput by ingesting those specific messages with all processing in place. Numbers from a vanilla system don't mean anything.
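To make that concrete, here is a minimal load-generator sketch in Python. It assumes the input under test is a GELF TCP input, where each message is a JSON document terminated by a null byte. The host name, port and extra field in the usage note are placeholders, not values from this thread; for a meaningful result the generated messages should look like your real traffic and pass through your real streams and processing.

```python
import json
import socket
import time


def gelf_frame(short_message, host="loadtest", **extra):
    """Encode one GELF 1.1 message, framed for a GELF TCP input."""
    payload = {"version": "1.1", "host": host, "short_message": short_message}
    # Additional GELF fields must be prefixed with an underscore.
    payload.update({f"_{key}": value for key, value in extra.items()})
    # GELF TCP frames are delimited by a null byte.
    return json.dumps(payload).encode("utf-8") + b"\0"


def send_load(address, frames, rate):
    """Send the frames at roughly `rate` msg/s and return the achieved msg/s."""
    start = time.monotonic()
    sent = 0
    with socket.create_connection(address) as sock:
        for frame in frames:
            sock.sendall(frame)
            sent += 1
            # Sleep just enough to stay at or below the target rate.
            delay = sent / rate - (time.monotonic() - start)
            if delay > 0:
                time.sleep(delay)
    elapsed = time.monotonic() - start
    return sent / elapsed if elapsed > 0 else float("inf")
```

A run could look like `send_load(("graylog.example.internal", 12201), frames, rate=5000)`, with `frames` built from `gelf_frame(...)` calls. The returned number is only the send rate; whether Graylog keeps up has to be read from its own buffers and metrics while the test runs.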

(Jolla) #3

Hi Jan,

Thanks for your reply. I deployed Graylog, Elasticsearch and MongoDB in Kubernetes.
The logs are sent to the Graylog server by fluentd over the GELF TCP protocol without TLS: https://github.com/roffe/kube-gelf

The resource limits are: a single Graylog pod with 4C/8G; an Elasticsearch cluster with 3 master nodes (1C, 2G) and 2 data nodes (2C, 8G); and MongoDB with 1C, 2G.

The Graylog configuration mostly uses the default settings; I put it here: https://pastebin.com/fUxzAUNq
Some settings are overridden by environment variables:

GRAYLOG_SERVER_JAVA_OPTS=-Xms2048m -Xmx2048m -XX:NewRatio=1 -XX:MaxMetaspaceSize=256m -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow

To distinguish logs from different namespaces, I created about 15 streams. The stream rule type is “match regular expression”, with patterns such as “projectname-servicename-pod*”. For message, I don’t create

When the message input rate reaches about 5000 msg/s, Graylog shows a “process buffer full” alert, and I need to stop some streams to resolve it.
So I want to find a way to measure the throughput of the current configuration and also find out how to optimise performance.

Thanks

(Jan Doberstein) #4

You should capture the metrics somewhere to see what is going on and where the problem might be.

I can see from your configuration file that you are running some 2.x version of Graylog and that output_batch_size is at its default, as are the processor settings.

But again, without the metrics it is nearly impossible to tell where the problem is. If disabling the streams helps, you might want to rework those regular expressions.
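As an illustration of what reworking could mean: in a regular expression, `pod*` means "po" followed by zero or more "d", not "pod followed by anything", and an unanchored pattern is searched for anywhere in the field value. The sketch below uses Python's `re` module, which behaves the same as Java regular expressions for patterns this simple, and assumes the stream rule does a partial (find-style) match. The tighter pattern and the sample values are hypothetical.

```python
import re
import timeit

# Pattern as quoted in the thread: unanchored, and "d*" allows zero "d"s.
loose = re.compile(r"projectname-servicename-pod*")

# Hypothetical tighter rule: anchored at the start, literal "pod-" prefix.
tight = re.compile(r"^projectname-servicename-pod-")

# Hypothetical field values, for illustration only.
samples = [
    "projectname-servicename-pod-7d9f",
    "other-projectname-servicename-pod-1",
    "projectname-servicename-po",
]

for value in samples:
    # Columns: value, matched by loose pattern, matched by tight pattern.
    print(value, bool(loose.search(value)), bool(tight.search(value)))


def time_pattern(pattern, value, number=100_000):
    """Return the seconds taken to run `pattern.search(value)` `number` times."""
    return timeit.timeit(lambda: pattern.search(value), number=number)
```

Anchoring may let the engine reject non-matching values sooner, and `time_pattern` gives a rough way to compare candidates on your own field values. If the value is known exactly, an exact-match stream rule avoids the regex altogether.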

(Jolla) #5

May I know what you mean by metrics? Is it CPU usage or memory usage?

(Jan Doberstein) #6

By metrics I mean the internal Graylog metrics, which can for example be pushed via https://github.com/graylog-labs/graylog-plugin-metrics-reporter

or scraped by Telegraf like in this example: https://grafana.com/dashboards/2549
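For a quick look before setting up a reporter, the same internal metrics can also be polled over the Graylog REST API. The sketch below is only that, a sketch: the `/system/metrics/<name>` path, the metric names and the base URL are assumptions to verify against the API browser of your own Graylog version, and the response is returned as raw JSON because its exact shape is not assumed here.

```python
import base64
import json
import urllib.request

# Assumed metric names; check them in your own Graylog instance.
METRICS = [
    "org.graylog2.throughput.input.1-sec-rate",
    "org.graylog2.throughput.output.1-sec-rate",
    "org.graylog2.buffers.process.usage",
    "org.graylog2.buffers.process.size",
]


def metric_url(base_url, name):
    """Build the (assumed) REST URL for a single named metric."""
    return f"{base_url.rstrip('/')}/system/metrics/{name}"


def fetch_metric(base_url, name, user, password):
    """Fetch one metric with HTTP basic auth and return the decoded JSON."""
    request = urllib.request.Request(metric_url(base_url, name))
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def buffer_fill_ratio(usage, size):
    """Fraction of the buffer in use; 0.0 if the size is unknown or zero."""
    return usage / size if size else 0.0
```

Polling these every few seconds during a load test shows whether the input rate outruns the output rate and how close the process buffer is to full, which is the information needed to tell where the bottleneck sits.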

(Jolla) #7

OK, got it. I’ll collect these metrics.