Display most frequently occurring messages

Hi, I'm pretty new to Graylog, so I'm still in the early stages of learning how to get it to do what I want, and my Google-fu seems to be failing me.

I have about 50 hosts set up so far to send syslog messages to Graylog, and it's been working flawlessly.
My primary goal at this stage is to show the 10 or so most frequently occurring messages (both overall and filtered per host).

When I was logging syslog to a MySQL database, I used this SQL query to achieve exactly what I wanted (hopefully it explains my goal better):

SELECT
 Message,
 COUNT(Message)
FROM
 SystemEvents
GROUP BY
 Message
ORDER BY COUNT(Message) DESC
LIMIT 20;

That returns something along the lines of the following example:

| Message | Count |
|---|---|
| Starting Proxmox VE replication runner… | 55 |
| Finished Proxmox VE replication runner. | 55 |
| dhcpd[3291595]: DHCPDISCOVER | 33 |
| pam_unix(cron:session): session opened for user root(uid=0) by (uid=0) | 3 |
| pvesr.service: Consumed 2.602s CPU time. | 1 |
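
For the per-host variant of the goal, here is a minimal sketch of the same GROUP BY query with an optional host filter. It assumes a FromHost column alongside Message in SystemEvents, and it uses an in-memory SQLite database with made-up rows purely as a stand-in so the example runs anywhere. The function name `top_messages` is hypothetical, and a MySQL/MariaDB driver may use a different placeholder style than `?`.

```python
import sqlite3


def top_messages(conn, limit=10, host=None):
    """Return (message, count) rows, most frequent first.

    If host is given, only messages from that host are counted.
    """
    sql = "SELECT Message, COUNT(Message) AS n FROM SystemEvents"
    params = []
    if host is not None:
        # Assumed column name: FromHost holds the sending host.
        sql += " WHERE FromHost = ?"
        params.append(host)
    # Secondary sort on Message keeps ties in a stable order.
    sql += " GROUP BY Message ORDER BY n DESC, Message ASC LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()


# Stand-in data (invented for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SystemEvents (FromHost TEXT, Message TEXT)")
conn.executemany(
    "INSERT INTO SystemEvents (FromHost, Message) VALUES (?, ?)",
    [
        ("pve1", "Starting replication runner"),
        ("pve1", "Starting replication runner"),
        ("pve1", "DHCPDISCOVER"),
        ("pve2", "DHCPDISCOVER"),
        ("pve2", "DHCPDISCOVER"),
        ("pve2", "session opened"),
    ],
)

for message, count in top_messages(conn, limit=5):
    print(f"{count:>4}  {message}")

for message, count in top_messages(conn, limit=5, host="pve1"):
    print(f"{count:>4}  {message}  (pve1 only)")
```

Passing the host as a bound parameter rather than formatting it into the SQL string keeps the query safe if host names ever come from user input.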


Is there a way to get the same results in a dashboard widget within Graylog?

Currently I'm running
Graylog: 4.2 / Elasticsearch: 7.10.2

It looks like, the way Elasticsearch is set up, counting unique messages is very expensive. When you set up a widget for that, you get this error (which you have probably seen):

* Unable to perform search query: Elasticsearch exception [type=illegal_argument_exception, reason=Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [message] in order to load field data by uninverting the inverted index. Note that this can use significant memory.].

Digging further, you can set fielddata to true in Elasticsearch, but there are notable caveats (see Fielddata Mapping Parameter in the Elasticsearch documentation).

Doesn’t look like something I would want to do… maybe if it was a small, low churn index.

There may be another way of dicing the data that would get you the information you need?

Yep, that was exactly what I tried first! I read the documentation and some other forum discussions about enabling fielddata, and I toyed with the idea, but decided against it to save on any headaches it may cause down the road.

I kept coming across talk of using keyword fields instead, but I have not yet been able to work out what that means or how to implement a working solution with them.
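
To illustrate what the keyword suggestion in the error message amounts to, here is a rough sketch of the Elasticsearch terms aggregation that mirrors the SQL GROUP BY. It only builds and prints the request body. It assumes the message text is also indexed in a keyword-typed field, here called `message_keyword`, which is a hypothetical name and not something Graylog creates by default, and that the sending host is stored in a field called `source`. The function name `top_messages_query` is likewise made up for the example.

```python
import json


def top_messages_query(field="message_keyword", size=10, host=None):
    """Build a search body that counts documents per distinct field value."""
    body = {
        # size 0: return only the aggregation buckets, no individual hits.
        "size": 0,
        "aggs": {
            "top_messages": {
                # A terms aggregation needs a keyword (not text) field.
                "terms": {"field": field, "size": size}
            }
        },
    }
    if host is not None:
        # Assumed field name: `source` holds the sending host.
        body["query"] = {"match": {"source": host}}
    return body


print(json.dumps(top_messages_query(), indent=2))
print(json.dumps(top_messages_query(host="pve1"), indent=2))
```

A body like this would be sent to the `_search` endpoint of the relevant indices. Because a keyword field stores each value as a single untokenized term, Elasticsearch can bucket on it without enabling fielddata on the text field.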


I have got some kind of solution in place using this plugin: Graylog output plugin JDBC
With this, logs are sent by each host's rsyslog to Graylog, then output from Graylog to my MariaDB Galera cluster (in almost the same standard SQL format as logging directly to SQL with rsyslog).

The config, if anyone is interested:

driver: org.mariadb.jdbc.Driver
fields: facility_num,level,application_name,process_id
logInsertAttributeQuery: <empty>
logInsertQuery: insert into SystemEvents (ReceivedAt, MessageID, FromHost, Message, Facility, Priority, SysLogTag, processid) values (?, ?, ?, ?, ?, ?, ?, ?)
password: supersecurepassword
url: jdbc:mariadb://ip-of-database/rsyslog
username: rsyslog

I also had to update the existing rsyslog SQL database table:

ALTER TABLE SystemEvents ADD COLUMN MessageID VARCHAR(64) AFTER processid;
ALTER TABLE SystemEvents MODIFY processid VARCHAR(60) NULL DEFAULT '';

So for now, at least, I can enable/disable the Graylog output to SQL and then run the SQL query as needed.

It’s not a perfect solution, so I’m still hoping someone might be able to help me get Graylog to visualize this.
