Graylog outgoing traffic size and index sizes differ

Hello,

We're currently running Graylog 2.4.6 on 2 nodes, with Elasticsearch 5.6.13 on 8 nodes and MongoDB 2.6.12. As the Graylog documentation says, the outgoing traffic size is everything sent from Graylog to Elasticsearch; when I look at our outgoing traffic, it's around 2 TB daily, and Graylog prices its license according to this size.

We prefer daily rotation in Graylog, and when I look at the index sizes in Graylog, the sum of all indices for a given day is around 1.0 TB. What could cause this 50% difference? As far as I know, Elasticsearch has a compression algorithm, but there is no explicit compression setting on our Elasticsearch cluster. I can post our settings if you want.
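For reference, this is roughly how I sum the index sizes on the Elasticsearch side (a minimal Python sketch; the host and the graylog_ index prefix are placeholders for our setup):

```python
import requests  # assumes the requests package is available

ES = "http://localhost:9200"  # placeholder: any Elasticsearch node

# List all indices with their sizes in raw bytes.
indices = requests.get(f"{ES}/_cat/indices", params={"format": "json", "bytes": "b"}).json()

# Sum the primary store size of our Graylog indices.
total = sum(int(i["pri.store.size"] or 0) for i in indices if i["index"].startswith("graylog_"))
print(f"primary store size: {total / 1024**3:.2f} GiB")
```

With daily rotation I only sum the indices that were created on the day in question.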

How can I determine the exact size of our logs, and what is causing the difference between Graylog's outgoing traffic size and the index sizes? I have to give my managers a report on the Graylog pricing calculation.

Thanks for your support.
Regards

Hey @alpykrbl

First, you are running a really old version of Graylog; you should update at least to a supported version (currently 3.0 or 3.1), which also means updating MongoDB, and you should update Elasticsearch to 6.8 as well.

BUT now to answer the question: Graylog counts the data that is sent to Elasticsearch. If you store the data in two indices, it is counted twice, once for each index. For example, a message routed to two streams with separate index sets is written, and therefore counted, two times.

Depending on the data you ingest, the size is calculated as described in the documentation: https://docs.graylog.org/en/3.1/pages/enterprise/setup.html#details-on-licensed-traffic

As Elasticsearch does not store data in plain text files, you always have some overhead. You should also check your log files: maybe you have ingest errors and only store half the messages. Without more details, that is hard to tell.
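You can also ask Graylog itself for indexer failures, i.e. messages that were already counted but then rejected by Elasticsearch. A quick sketch against the REST API (the URL and credentials are placeholders):

```python
import requests

GRAYLOG = "http://graylog.example.org:9000/api"  # placeholder API URL
AUTH = ("admin", "password")                     # placeholder credentials

# Fetch the most recent indexer failures recorded by Graylog.
r = requests.get(
    f"{GRAYLOG}/system/indexer/failures",
    params={"limit": 10, "offset": 0},
    auth=AUTH,
    headers={"Accept": "application/json"},
)
r.raise_for_status()
data = r.json()
print("total failures:", data.get("total"))
for failure in data.get("failures", []):
    print(failure.get("index"), failure.get("message"))
```

The same count is visible in the web interface under System / Overview.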

Hi Jan,

Thanks for your quick response. Yes, we're planning to upgrade MongoDB, Graylog, and Elasticsearch one by one. Thanks for the info.

I can say that we're not storing data in two indices. Every log matches a stream rule and goes only to that stream's index set. Could replicas be what is counted? We are using a replication factor of 1, but as far as I know, replication is handled by Elasticsearch itself.
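To see how much the replicas add on disk, I compared the primary store size with the total (primaries plus replicas) like this (rough sketch; the host is a placeholder):

```python
import requests

ES = "http://localhost:9200"  # placeholder: any Elasticsearch node

# store.size includes replica copies, pri.store.size only primaries.
for i in requests.get(f"{ES}/_cat/indices", params={"format": "json", "bytes": "b"}).json():
    pri, total = int(i["pri.store.size"] or 0), int(i["store.size"] or 0)
    print(i["index"], f"primary={pri / 1024**3:.2f} GiB", f"total={total / 1024**3:.2f} GiB")
```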

When I check the logs, there are a lot of warnings saying that messages have invalid timestamps. We know about these failures; another team is trying to fix them, but they are not critical for us. Can these messages have an impact on Graylog's output size? I always thought Graylog drops these invalid messages before sending them to Elasticsearch.

[WARN ] 2020-01-23 19:54:10.005 [processbufferprocessor-4] graylog2.inputs.codecs.GelfCodec - GELF message (received from IP:PORT) has invalid "timestamp": 2020-01-23T19:54:09.9971691+03:00 (type: STRING)
[WARN ] 2020-01-23 19:54:10.020 [processbufferprocessor-3] graylog2.inputs.codecs.GelfCodec - GELF message (received from IP:PORT) has invalid "timestamp": 1579798450.01 (type: STRING)
[WARN ] 2020-01-23 19:54:10.043 [processbufferprocessor-8] graylog2.inputs.codecs.GelfCodec - GELF message (received from IP:PORT) has invalid "timestamp": 2020-01-23T19:54:10.018374+03:00 (type: STRING)
[WARN ] 2020-01-23 19:54:10.066 [processbufferprocessor-8] graylog2.inputs.codecs.GelfCodec - GELF message (received from IP:PORT) has invalid "timestamp": 2020-01-23T19:54:10.0036454+03:00 (type: STRING)
[WARN ] 2020-01-23 19:54:10.105 [processbufferprocessor-8] graylog2.inputs.codecs.GelfCodec - GELF message (received from IP:PORT) has invalid "timestamp": 2020-01-23T19:54:10.0806027+03:00 (type: STRING)
[WARN ] 2020-01-23 19:54:10.129 [processbufferprocessor-4] graylog2.inputs.codecs.GelfCodec - GELF message (received from IP:PORT) has invalid "timestamp": 2020-01-23T19:54:10.0895905+03:00 (type: STRING)
[WARN ] 2020-01-23 19:54:10.207 [processbufferprocessor-1] graylog2.inputs.codecs.GelfCodec - GELF message (received from IP:PORT) has invalid "timestamp": 2020-01-23T19:54:10.1566736+03:00 (type: STRING)
[WARN ] 2020-01-23 19:54:10.292 [processbufferprocessor-6] graylog2.inputs.codecs.GelfCodec - GELF message (received from IP:PORT) has invalid "timestamp": 2020-01-23T19:54:10.2387795+03:00 (type: STRING)
[WARN ] 2020-01-23 19:54:10.331 [processbufferprocessor-8] graylog2.inputs.codecs.GelfCodec - GELF message (received from IP:PORT) has invalid "timestamp": 2020-01-23T19:54:10.271887+03:00 (type: STRING)
[WARN ] 2020-01-23 19:54:10.332 [processbufferprocessor-3] graylog2.inputs.codecs.GelfCodec - GELF message (received from IP:PORT) has invalid "timestamp": 2020-01-23T19:54:10.271887+03:00 (type: STRING)
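For comparison, as far as I understand the GELF spec, the timestamp must be seconds since the UNIX epoch as a number, not a string like in the warnings above. A minimal sketch of a valid message (host, input address, and port are placeholders):

```python
import json
import socket
import time

# A valid GELF "timestamp" is a numeric UNIX epoch value; even
# "1579798450.01" sent as a string is flagged as type STRING.
message = {
    "version": "1.1",
    "host": "app-server-01",           # placeholder source host
    "short_message": "example event",
    "timestamp": time.time(),          # float, e.g. 1579798450.01
}

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(json.dumps(message).encode("utf-8"), ("graylog.example.org", 12201))  # placeholder GELF UDP input
```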

And do you know how I can check whether Elasticsearch has compression enabled or not? We didn't change any compression settings in Elasticsearch. Graylog creates the indices according to our parameters, and there is no parameter about compression in Graylog.
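I tried to check the index settings myself like this, but I'm not sure I'm looking at the right setting (sketch; host and index name are placeholders):

```python
import requests

ES = "http://localhost:9200"  # placeholder: any Elasticsearch node

# index.codec controls on-disk compression: "default" means LZ4,
# "best_compression" means DEFLATE. include_defaults also shows
# settings that were never set explicitly.
r = requests.get(
    f"{ES}/graylog_0/_settings",
    params={"include_defaults": "true", "filter_path": "**.index.codec"},
)
print(r.json())
```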

Thanks

Hey @alpykrbl

Without the ability to look into your environment, it is just guessing.

Your Graylog takes all incoming messages, processes them, and counts them when storing to Elasticsearch, which means they are accounted for. But the last step in the chain, the actual store in Elasticsearch, is not successful because of the wrong data type, and so the message is not stored even though it was already counted.

Look for bulk ingest errors in the Elasticsearch logs to verify that this is what is happening.
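A quick scan like this can surface them (the log path and error strings are only common examples; adjust them for your cluster):

```python
import pathlib

LOG = pathlib.Path("/var/log/elasticsearch/graylog-cluster.log")  # placeholder path

# Typical (not exhaustive) markers of rejected bulk items, e.g. a
# field arriving with a data type that conflicts with the mapping.
NEEDLES = ("mapper_parsing_exception", "failed to parse", "rejected execution")

for line in LOG.read_text(errors="replace").splitlines():
    if any(needle in line for needle in NEEDLES):
        print(line)
```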

Jan
