Basically, this solution consists of three parts:
- A script collect_graylog_statistics.sh with a configuration file collect_graylog_statistics.conf
- Four systemd units (this could also be done with cron)
- Dashboards in Graylog
The collector and the systemd units can run on any machine holding a replica of the MongoDB. The script makes use of the tool bc, so that has to be installed.
The collector-script
A look into the graylog database revealed a collection “traffic”, in which the sum of traffic in bytes is saved every hour. So the general idea is to get these values and add them up. Bear in mind that I do not have much knowledge of MongoDB, so this is a combination of impatience, google-fu and bash scripting… I’ll try to explain what I thought, but I cannot guarantee it is always correct.
The config-file is quite simple and holds all needed to access the database:
GUSER=graylog
GPASSWORD=*************
CAFILE=/etc/pki/trust/anchors/graylogca.pem
KEYFILE=/etc/mongod.crt
DATABASE=graylog
It is used in the following script:
#!/bin/bash
# Gets the graylog outgoing traffic from mongodb and aggregates it per day and per month
PATH=$PATH:/usr/bin
CONFIG=collect_graylog_statistics.conf
. $CONFIG
MONGOSH="mongosh -u $GUSER -p $GPASSWORD --tls --tlsCAFile $CAFILE --tlsCertificateKeyFile $KEYFILE --host $(hostname) $DATABASE"
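One thing to watch with the sourcing step above: the config file is referenced by a bare name, and a systemd service normally starts with / as its working directory, so the file may not be found there. A minimal sketch of one way around that, assuming the config file sits in the same directory as the script; load_config is a hypothetical helper name, not part of the original script:

```shell
# Hypothetical helper: source the config file located next to the script.
# Assumption: collect_graylog_statistics.conf is in the script's own directory.
load_config() {
    local script_dir config
    # Resolve the directory of the given script path to an absolute path.
    script_dir=$(cd "$(dirname "$1")" && pwd)
    config="$script_dir/collect_graylog_statistics.conf"
    # Fail with a message instead of continuing with empty variables.
    [ -r "$config" ] || { echo "cannot read $config" >&2; return 1; }
    . "$config"
}

# In the script itself this would be called as:
#   load_config "$0" || exit 1
```

With this, the script behaves the same no matter from which directory it is started.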
Just initializations up to now. After some consideration, I decided to let the script also report the average amount per day within a month. (This could have been done in graylog as well; it just seemed easier to me.) So if the script is called with --monthly, it outputs the monthly data; if not, it reports the daily data.
if [ "$1" == "--monthly" ];
then
DATE=$(date -d last-month +'%Y-%m')
STARTDATE="$(date -d last-month +'%Y-%m')-01T00:00:00.000Z"
DAYNUMBER=$(date -d "$(date +'%Y-%m-01') -1 day" +'%d')
ENDDATE="$(date +'%Y-%m-01')T01:00:00.000Z"
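Here the first day of the previous month, the number of days in that month and the end of the range are determined. The first bucket included is the one at 1 o'clock, as each traffic entry covers the data from the hour before its timestamp; since the query uses $gt and $lt, the 0 o'clock bucket of the first day is left out, and the last bucket included is 0 o'clock on the first day of the following month. I also keep a date with only the year and the month; this is only there to make the dashboards easier.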
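To make the date arithmetic easier to follow, here is a small sketch that prints the four values for a fixed reference date. It assumes GNU date (for the -d option); monthly_window is a hypothetical helper, and unlike the script it derives the previous month from "first of this month minus one day" rather than from last-month:

```shell
# Hypothetical helper: print the monthly query window for a given "today".
# Assumption: GNU date is available (the -d option is not portable).
monthly_window() {
    local today first_of_month prev_month
    today=${1:-$(date -I)}
    # First day of the current month, e.g. 2024-03-01
    first_of_month=$(date -d "$today" +'%Y-%m-01')
    # Year and month of the previous month, e.g. 2024-02
    prev_month=$(date -d "$first_of_month -1 day" +'%Y-%m')
    echo "DATE=$prev_month"
    echo "STARTDATE=${prev_month}-01T00:00:00.000Z"
    # The last day of the previous month equals its number of days.
    echo "DAYNUMBER=$(date -d "$first_of_month -1 day" +'%d')"
    echo "ENDDATE=${first_of_month}T01:00:00.000Z"
}

monthly_window 2024-03-01
# DATE=2024-02
# STARTDATE=2024-02-01T00:00:00.000Z
# DAYNUMBER=29
# ENDDATE=2024-03-01T01:00:00.000Z
```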
AMB=$((a=0;echo "DBQuery.shellBatchSize=100000;db.traffic.find({bucket: {\$gt: ISODate(\"$STARTDATE\"), \$lt: ISODate(\"$ENDDATE\")}},{output: 1});" |$MONGOSH|grep Long|cut -d\" -f2|while read line; do a=$[$a+$line]; echo $a; done;)|tail -n1)
This consists of multiple parts:
echo "DBQuery.shellBatchSize=100000;db.traffic.find({bucket: {\$gt: ISODate(\"$STARTDATE\"), \$lt: ISODate(\"$ENDDATE\")}},{output: 1});" |$MONGOSH
This query gets the output field of all traffic entries between the start and the end date from the MongoDB. The DBQuery.shellBatchSize=100000 is needed, as the shell would otherwise only print a limited number of entries per batch (I think it is 20 by default). The output is piped into a while loop:
|grep Long|cut -d\" -f2|while read line; do a=$[$a+$line]; echo $a; done;
This greps the lines holding the values, cuts out the numbers and adds each one to the running sum of the lines already processed. The last line of output goes into AMB:
AMB=$((....)|tail -n1)
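To illustrate the filter-and-sum step without a database, here is a sketch that feeds made-up shell output through the same grep and cut, but lets awk do the addition instead of the while loop. The sample documents and the names sample_output and sum_traffic are hypothetical; the real output format may differ between mongosh versions (some print Long('…') with single quotes, which this filter would not match):

```shell
# Hypothetical sample of what the shell prints for the projected "output" field.
sample_output() {
    cat <<'EOF'
  {
    _id: ObjectId("65f0c1a2e4b0a1b2c3d4e5f6"),
    output: { 'node-a': Long("1500") }
  },
  {
    _id: ObjectId("65f0c1a2e4b0a1b2c3d4e5f7"),
    output: { 'node-a': Long("2500") }
  }
EOF
}

# Same filter as in the script; awk adds up the numbers and prints 0 for empty input.
sum_traffic() {
    grep Long | cut -d\" -f2 | awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}

sample_output | sum_traffic   # prints 4000
```

Note that, like the original pipeline, this only picks up the first Long value on each line.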
As this is a database, I am quite sure there must be a much better way to do this.
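One candidate for such a better way is to let MongoDB do the addition with an aggregation pipeline. The following is only a sketch under an assumption: that the output field is a subdocument holding one Long counter per graylog node, which is what the grep on Long suggests. build_traffic_sum_query is a hypothetical helper that only builds the query text; nothing here has been run against a real graylog database:

```shell
# Hypothetical helper: build an aggregation that sums all per-node values in "output".
# Assumption: "output" is a subdocument of the form { <node-id>: Long(...) }.
build_traffic_sum_query() {
    local start=$1 end=$2
    cat <<EOF
db.traffic.aggregate([
  { \$match: { bucket: { \$gt: ISODate("$start"), \$lt: ISODate("$end") } } },
  { \$project: { perNode: { \$objectToArray: "\$output" } } },
  { \$unwind: "\$perNode" },
  { \$group: { _id: null, total: { \$sum: "\$perNode.v" } } }
]).forEach(doc => print(doc.total.toString()))
EOF
}

# Possible use inside the script (not executed here):
#   AMB=$($MONGOSH --quiet --eval "$(build_traffic_sum_query "$STARTDATE" "$ENDDATE")")
```

This would make the batch size setting and the grep/cut/while chain unnecessary, as the shell only prints the final sum.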
AMGB=$(echo "scale=2;$AMB/1073741824"|bc|sed 's/^\./0./')
AAMB=$(echo "scale=2;$AMB/$DAYNUMBER"|bc|sed 's/^\./0./')
AAMBG=$(echo "scale=2;$AAMB/1073741824"|bc|sed 's/^\./0./')
The total sum is divided by 1073741824 (1024^3), so AMGB holds the amount in gigabytes. The total is also divided by the number of days, and finally the amount per day is converted to gigabytes as well.
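If installing bc is not wanted, the conversion can also be sketched with plain shell integer arithmetic. This is an alternative, not what the script does; to_gib is a hypothetical helper. Like bc with scale=2 it truncates to two decimals, and it always prints the leading zero, so the sed step is not needed:

```shell
# Hypothetical helper: convert bytes to gigabytes (1024^3) with two decimals,
# truncated rather than rounded, using integer arithmetic only.
to_gib() {
    local bytes=$1 gib=1073741824
    printf '%d.%02d\n' $((bytes / gib)) $(( (bytes % gib) * 100 / gib ))
}

to_gib 1610612736   # prints 1.50
to_gib 536870912    # prints 0.50
```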
logger -t graylog_monthly_traffic "statistic_month=$DATE message_amount_byte=$AMB message_amount_gbyte=$AMGB message_amount_daily_average_byte=$AAMB message_amount_daily_average_gbyte=$AAMBG"
All data is written to the journal, using the syslog identifier graylog_monthly_traffic and key=value pairs, so in graylog it can easily be extracted into fields.
Getting the daily usage is much easier: a start and an end time are calculated, the date is given as year, month and day, and the rest is very similar:
else
DATE=$(date -d yesterday -I)
STARTDATE="$(date -d yesterday -I)T00:00:00.000Z"
ENDDATE="$(date -I)T01:00:00.000Z"
AMB=$((a=0;echo "DBQuery.shellBatchSize=100000;db.traffic.find({bucket: {\$gt: ISODate(\"$STARTDATE\"), \$lt: ISODate(\"$ENDDATE\")}},{output: 1});" \
|$MONGOSH|grep Long|cut -d\" -f2|while read line; do a=$[$a+$line]; echo $a; done;)|tail -n1)
AMGB=$(echo "scale=2;$AMB/1073741824"|bc|sed 's/^\./0./')
logger -t graylog_daily_traffic "statistic_day=$DATE message_amount_byte=$AMB message_amount_gbyte=$AMGB"
fi
exit 0
The systemd units
I want to run this script every day shortly after 1 o’clock, and with the --monthly option on the first day of every month shortly after that. So the systemd timers look like this:
graylog_statistic.timer:
[Unit]
Description=regularly logging status to journal
#Requires=network.service
#After=network.service
[Timer]
OnCalendar=*-*-* 01:05:00
[Install]
WantedBy=timers.target
graylog_statistic_monthly.timer:
[Unit]
Description=regularly logging status to journal
#Requires=network.service
#After=network.service
[Timer]
OnCalendar=*-*-01 01:30:00
[Install]
WantedBy=timers.target
The corresponding services are quite simple as well:
graylog_statistic.service:
[Unit]
Description=logging status to journal
[Service]
Type=oneshot
ExecStart=/The Actual Path/collect_graylog_statistics.sh
graylog_statistic_monthly.service:
[Unit]
Description=logging status to journal
[Service]
Type=oneshot
ExecStart=/opt/capricorn/bin/collect_graylog_statistics.sh --monthly
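For completeness, a sketch of how the timers could be activated. It assumes the four unit files were copied to /etc/systemd/system (an assumption, any systemd unit directory should work) and that the commands run as root; enable_graylog_timers is a hypothetical helper name:

```shell
# Hypothetical helper; assumes the unit files are in /etc/systemd/system
# and that this is run as root.
enable_graylog_timers() {
    # Make systemd pick up the new unit files.
    systemctl daemon-reload || return 1
    # Only the timers are enabled; they start the matching services.
    systemctl enable --now graylog_statistic.timer graylog_statistic_monthly.timer || return 1
    # Show when the timers will fire next.
    systemctl list-timers 'graylog_statistic*'
}
```

Calling enable_graylog_timers once after installing the files should be enough. To check when an OnCalendar expression fires, systemd-analyze calendar '*-*-01 01:30:00' can be used.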
The Dashboards
A look at a message in graylog shows the following fields:
Those can easily be used in a dashboard; here is one for daily usage:
And here it is for the monthly usage: