How to aggregate bar chart by bytes?

QueenOfCode · June 17, 2022, 2:42pm

Hi,

I have Graylog 4.0 and would like to know how large my logs are (probably on a day-to-day basis). I was trying to create a bar chart and aggregate by number of bytes (since that’s the only “log size” metric I can see), but I don’t see an option for “bytes”. There is also no field for receivedBytes.

Basically I would just like to see the daily average size of the logs, so if a bar chart of that is not possible, is there any other way to see this?

Thanks.

gsmith · June 17, 2022, 11:01pm

Hello @QueenOfCode

Good question. I haven't trended the size of logs per se; it's normally done by volume over time. But what I did find were the metrics under System/Nodes → Metrics.

Not sure if that will help ya.

What I have done is enable Prometheus in the Graylog config file, install Grafana, and create a dashboard for these metrics.

See more here: Metrics

This was all done on my graylog node. I’m not that great with it yet, so I’m labbing it out.

QueenOfCode · June 20, 2022, 12:32pm

Ok thank you, I will try looking at this.

QueenOfCode · June 20, 2022, 1:27pm

I looked at some example messages again and saw that while there was a section in the original message like “bytesreceived=100”, it did not get extracted because of some strange formatting. I added a small regex extractor to parse out that number, so now I can visualize that one input using a widget with a daily bar chart of count(bytesreceived).
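A regex extractor like the one described keeps whatever its capture group matches. The Python sketch below tries the same kind of pattern offline before putting it into Graylog. It assumes the raw message contains the literal text `bytesreceived=<digits>`; the pattern, function name, and sample message are illustrative, not the poster's actual extractor.

```python
import re
from typing import Optional

# Illustrative pattern: one capture group grabbing the digits after "bytesreceived=".
BYTES_RE = re.compile(r"bytesreceived=(\d+)")


def extract_bytes_received(message: str) -> Optional[int]:
    """Return the bytesreceived value as an int, or None if it is absent."""
    match = BYTES_RE.search(message)
    if match is None:
        return None
    # Convert to int so the value can be summed or averaged like a numeric field.
    return int(match.group(1))


if __name__ == "__main__":
    # Made-up message with some odd formatting around the key/value pair.
    sample = 'src=10.0.0.5 action=allow bytesreceived=100 "odd,format"'
    print(extract_bytes_received(sample))  # 100
```

Note that count(bytesreceived) counts the messages that carry the field, while sum(bytesreceived) adds up the byte values themselves. For a chart of bytes per day, the latter is probably the one you want, provided the field is stored as a number.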

Now I just need to figure out how to do it for both my inputs.

UPDATE: I found that “gl2_accounted_message_size” gives the message size, and I used avg(gl2_accounted_message_size) to create a bar chart (with data from the last 30 days). I was a little confused, but found out that this actually calculates the average size of one log message for that day. I was able to use sum(gl2_accounted_message_size) to get more reasonable numbers (which turn out to be the same numbers as under System/Overview → Outgoing Traffic).
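The difference between avg and sum on the accounted message size can be shown with a toy version of a daily bar chart. This is a minimal sketch with made-up per-message sizes: it buckets messages by day, then reports both the total bytes (sum) and the per-message average (avg) for each bucket.

```python
from collections import defaultdict
from datetime import date

# Made-up (day, accounted message size in bytes) pairs, one per message.
messages = [
    (date(2022, 6, 19), 400),
    (date(2022, 6, 19), 600),
    (date(2022, 6, 20), 500),
    (date(2022, 6, 20), 300),
    (date(2022, 6, 20), 700),
]


def daily_sum_and_avg(rows):
    """Bucket messages by day and return {day: (sum_bytes, avg_bytes_per_message)}."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for day, size in rows:
        totals[day] += size
        counts[day] += 1
    return {day: (totals[day], totals[day] / counts[day]) for day in totals}


for day, (total, avg) in sorted(daily_sum_and_avg(messages).items()):
    print(day, "sum:", total, "avg:", avg)
# 2022-06-19 sum: 1000 avg: 500.0
# 2022-06-20 sum: 1500 avg: 500.0
```

Both days have the same average message size, but the second day ingested more data; only the sum shows that.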

Under System/Overview → Outgoing Traffic, it says “Last 30 days: 99.4 GiB”. Does that tell me how many bytes/GB the logs take up each day or is that something different? But curl -XGET 'localhost:9200/_cat/allocation?v&pretty' returns 34.7GB under disk.used, so I am a little confused. Additionally, when I go to System/Nodes → Metrics, the total bytes read for both my inputs add up to <1GB. Which one can actually tell me how much space is used up by all of my logs each day?
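Since the Outgoing Traffic figure matched the summed accounted message size, the 30-day total can be turned into a rough daily average with simple arithmetic. The sketch below also converts between binary (GiB) and decimal (GB) units, since the figures in this thread are labelled differently. It assumes ingest is spread roughly evenly over the 30 days; the helper names are illustrative.

```python
GIB = 1024 ** 3  # binary gibibyte, as in "99.4 GiB"
GB = 1000 ** 3   # decimal gigabyte


def per_day(total: float, days: int) -> float:
    """Average daily volume from a total measured over `days` days."""
    return total / days


def gib_to_gb(gib: float) -> float:
    """Convert GiB to GB (1 GiB = 1.073741824 GB)."""
    return gib * GIB / GB


thirty_day_gib = 99.4  # "Last 30 days: 99.4 GiB" from System/Overview
print(f"{per_day(thirty_day_gib, 30):.2f} GiB/day")    # 3.31 GiB/day
print(f"{gib_to_gb(thirty_day_gib):.1f} GB in 30 days")  # 106.7 GB in 30 days
```

This describes how much log data was ingested, not how much disk it occupies; the two can differ, as discussed further down the thread.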

ihe · June 22, 2022, 12:41pm

Hi @QueenOfCode
I think there are maybe two topics here: measuring the size of the logs and measuring sizes reported in the logs by the application.

Measuring the size of the logs:
Each message has a few fields which are always set but not visible at first. The field “gl2_accounted_message_size” is one of those. It counts the bytes of “real log data” that this message writes into Elasticsearch. There are other fields, such as gl2_source_input (the ID of the input) or gl2_source_node (the ID of the node if you run a cluster), which are not counted as “real log data”.
The number of bytes (or rather, gigabytes) is relevant for the commercial versions of Graylog. Generally, it offers a good rule of thumb for how much data you ingest.

The metric “sum” adds up those bytes for all matched messages in the given time range. If you use a bar chart over time, it will give you an idea of how much data is pushed into Elasticsearch over time.

Measuring size reported in the logs:
You will need a field of type integer to count the bytes of traffic (or whatever) in your application. As I understand your post, bytesreceived is the field in question. Here is an idea of how it looks in my Graylog:

My field is named size_of_request; just put in your “bytesreceived” and you should have a good chance of getting it working.

QueenOfCode · June 22, 2022, 12:56pm

Hello, thanks for your response.

I actually don’t think I need the size reported in the logs, just how much space they actually take up daily on my VM. That way I can estimate whether I need to increase resources for my VM. If I use the command curl -XGET 'localhost:9200/_cat/allocation?v&pretty', that tells me how much space I’m using on my VM, right? If I manually calculate day-to-day how much disk space was used from that command in the CLI, can I find out how many resources I need?

ihe · June 22, 2022, 1:17pm

Well, yes and no.

The development over time will give you the answer. If you look every day and note the usage, you can see the trend. This is a kind of manual monitoring and might be suitable for proofs of concept etc.
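The manual monitoring described here amounts to noting disk.used once a day and looking at the differences. A minimal sketch, with made-up readings in GB: it computes the change between consecutive readings and the average growth per day across the whole period. A negative change just means usage shrank between two readings.

```python
from datetime import date

# Made-up daily notes of disk.used (GB) taken from _cat/allocation.
readings = {
    date(2022, 6, 20): 30.0,
    date(2022, 6, 21): 31.2,
    date(2022, 6, 22): 32.3,
}


def daily_deltas(notes):
    """Return [(day, change since the previous reading)] in date order."""
    days = sorted(notes)
    return [(cur, round(notes[cur] - notes[prev], 2)) for prev, cur in zip(days, days[1:])]


def average_growth(notes):
    """Average change per day between the first and the last reading."""
    days = sorted(notes)
    span_days = (days[-1] - days[0]).days
    if span_days == 0:
        raise ValueError("need readings from at least two different days")
    return (notes[days[-1]] - notes[days[0]]) / span_days


for day, delta in daily_deltas(readings):
    print(day, f"{delta:+.2f} GB")
print(f"average: {average_growth(readings):.2f} GB/day")
```

Averaging over a longer span smooths out day-to-day noise such as unrelated files being cleaned up.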

The usage in the long run is mostly stable. Which stream are your logs in? Which index set is used for this stream? Is this index set configured to rotate the logs by time (e.g. P1D for once a day) or by number of messages? How often does this rotation happen?
If your retention is “full” (the configured maximum number of indices has been reached), Graylog will delete the oldest logs during rotation and your disk usage will stay stable.
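Once retention starts deleting the oldest index, disk usage levels off, so a rough capacity estimate is daily growth times the number of days kept. This sketch assumes roughly constant daily ingest and takes the retained period as an input. With monthly rotation and three indices kept, about 90 days works as an upper bound. If the daily growth was measured from disk.used, replicas are already included, so leave `replicas=0` in that case. The function name and numbers are illustrative.

```python
def steady_state_disk_gb(daily_growth_gb: float, retained_days: int, replicas: int = 0) -> float:
    """Rough disk needed once retention starts deleting old indices.

    Assumes ingest is roughly constant per day. Pass replicas > 0 only if
    daily_growth_gb covers primary data alone; each replica adds one copy.
    """
    if retained_days <= 0:
        raise ValueError("retained_days must be positive")
    return daily_growth_gb * retained_days * (1 + replicas)


# Hypothetical example: 2 GB/day of primary data, ~90 days kept, 1 replica.
print(steady_state_disk_gb(2.0, 90, replicas=1))  # 360.0
```

Leaving headroom on top of this figure is sensible, since merges and the index being written also take space.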

QueenOfCode · June 22, 2022, 1:27pm

I use the default index set for all my logs, which is set to rotate by month. The retention setting is set to 3 months.

It seems like manual monitoring is the only way, since the other pieces of information on the Graylog UI itself don’t really tell me the actual disk space used each day.

(Also, silly question: does disk space here refer to the storage/memory on my VM?)

ihe · June 22, 2022, 1:54pm

There are no silly questions.
The output of curl -XGET 'localhost:9200/_cat/allocation?v&pretty' should give you the same sizing as your OS does.
Summing gl2_accounted_message_size will give you a different number. It does not take the extra fields into account, and it also does not know whether your Elasticsearch has any replica shards, each of which adds disk space, and so on. In my experience, the two move in parallel, differing by a factor that depends on your configuration.
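If the two numbers move in parallel, the factor between them can be measured once and reused. This is a minimal sketch under that assumption. Measure the growth of disk.used and the summed accounted message size over the same period, take the ratio, then apply it to later ingest figures. The helper names are hypothetical, and the factor may change if replicas, fields, or mappings change.

```python
def disk_factor(disk_growth_bytes: float, accounted_bytes: float) -> float:
    """Ratio of on-disk growth to summed accounted message size over the same period."""
    if accounted_bytes <= 0:
        raise ValueError("accounted_bytes must be positive")
    return disk_growth_bytes / accounted_bytes


def project_disk(accounted_bytes: float, factor: float) -> float:
    """Estimate disk growth for a given accounted volume using a measured factor."""
    return accounted_bytes * factor


# Made-up numbers: disk grew 300 units while 100 units were accounted.
f = disk_factor(300, 100)
print(f)                    # 3.0
print(project_disk(50, f))  # 150.0
```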

QueenOfCode · June 22, 2022, 3:08pm

So to recap, the best way is to use curl -XGET 'localhost:9200/_cat/allocation?v&pretty' to estimate how much resources my VM uses up each day?

ihe · June 22, 2022, 3:23pm

Yes, that will do. You might play a bit with the parameters:
curl -XGET 'https://localhost:9200/_cat/nodes?v=true&h=id,name,ip,port,version,master,diskTotal,diskUsed,diskUsedPercent&pretty'
This also works with multiple nodes. Choose whichever you like more.
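For a daily reading, the output of either `_cat` command can be turned into something easier to log and compare. This is a minimal sketch: it fetches a `_cat` URL with the standard library and splits the `v` (verbose) output into one dict per row, keyed by the header line. It assumes no column value contains spaces and does no authentication. For HTTPS with a self-signed certificate, urllib's default verification would need adjusting. Adding `bytes=b` to the query string, a standard `_cat` parameter, should print raw byte counts instead of values like "12.3gb". The sample table is made up.

```python
import urllib.request
from typing import Dict, List


def fetch_cat(url: str) -> str:
    """GET a _cat endpoint and return the body as text (no auth handling)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def parse_cat_table(text: str) -> List[Dict[str, str]]:
    """Turn `_cat/...?v` output (header row + data rows) into dicts.

    Assumes whitespace-separated columns and no spaces inside values.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


# Made-up _cat/allocation?v output for illustration.
sample = """shards disk.indices disk.used disk.avail disk.total disk.percent host      ip        node
    20       10.1gb    12.3gb     87.7gb    100gb           12 127.0.0.1 127.0.0.1 node-1
"""
for row in parse_cat_table(sample):
    print(row["node"], row["disk.used"])  # node-1 12.3gb

# Live use (assumes Elasticsearch on localhost without auth):
# rows = parse_cat_table(fetch_cat("http://localhost:9200/_cat/allocation?v&bytes=b"))
```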

QueenOfCode · June 22, 2022, 3:29pm

When I run curl -XGET 'localhost:9200/_cat/allocation?v&pretty' in the CLI, I see that disk.used has decreased by 0.1GB. If this is the final destination of the logs, why would the disk space used decrease?

ihe · June 22, 2022, 3:37pm

I don’t know, to be honest. If you are counting every 0.1GB, logging at scale might be the wrong topic. It could be log indices in Elasticsearch being rotated and deleted, but that is unlikely since you rotate only once a month. It could be system logs being rotated by your Linux. It could be some update cache that was freed. Too many possibilities.

QueenOfCode · June 22, 2022, 3:39pm

Ok, thanks for your help!

system · July 6, 2022, 3:39pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
