Scan and mask PII in Graylog

Hello Community

This is my first post in this community, so please pardon me if I have left out any required information. I am also new to Graylog.

I am looking for ways to scan Graylog logs for PII and mask whatever PII is found.
I came across a few blogs, but none of them describe scanning for PII in detail. One blog post, "Redacting Message Fields for Privacy Purposes", covers only a very basic example: redacting a username.
However, I am looking for a solution that can scan for details such as birthdates, phone numbers, SSNs and other identity card numbers, addresses, etc.

Can you share ideas on how to achieve this in Graylog?


Hello @sunnyjaisinghani
If you ingest data into Graylog, you will need to structure it in one way or another. By structuring I mean parsing the data and putting the values into fields with meaningful names. This could be an IP address in a field called source_ip, an email address in a field called source_email, and so on.
As soon as you have this parsing done, you can name all the fields that contain PII. Those can be redacted by using pipelines, decorators, or both.
If necessary, you can clone a message without PII into another stream to make the separation even stricter.


Thanks for the response @ihe

I see that our logs are structured, and one of the fields we have is “message”.
Within this field, if I have to look for PII data such as “Date of birth”, “Email”, “Address”, “Identity Card Numbers”, etc., where do I start?

Do the Graylog docs have examples of regexes that can match PII data?

And once I get my regex working, how do I extract the PII data into a separate field?

Hi @sunnyjaisinghani
I would suggest familiarizing yourself with Grok patterns. They combine multiple regexes, which can capture multiple values into different fields.
After you build your Grok patterns, you create a pipeline and run the grok there. I explained that for another case here: Regex Matching in Pipeline or Extractor - #16 by ihe
In a stage after your parsing you can replace the message field with some content without PII, again with a pipeline.
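
To make the two steps concrete (capture PII into named fields, then replace the message with a PII-free version), here is a rough Python sketch for prototyping the regexes outside Graylog. It is not Graylog pipeline rule syntax, and the patterns, field names, and the `extract_and_redact` helper are assumptions for illustration only; real PII formats vary by country and log source, so the patterns would need to be adapted before being ported into Grok patterns or pipeline rules.

```python
import re

# Hypothetical patterns for illustration; adjust to the formats in your logs.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US-style SSN
    "birthdate": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),  # ISO date, e.g. 1990-01-31
}


def extract_and_redact(message):
    """Return (fields, redacted_message) for a raw log message.

    fields holds the first match of each pattern; every match of that
    pattern is replaced by a placeholder in the returned message.
    """
    fields = {}
    redacted = message
    for name, pattern in PII_PATTERNS.items():
        matches = pattern.findall(redacted)
        if matches:
            fields[name] = matches[0]
            redacted = pattern.sub(f"<{name}_redacted>", redacted)
    return fields, redacted


if __name__ == "__main__":
    sample = "user jane@example.com ssn 123-45-6789 born 1990-01-31"
    print(extract_and_redact(sample))
```

Testing the patterns this way against a sample of real messages first makes it easier to spot false positives, for example an order number that happens to look like an SSN, before the rules run on live traffic.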

The structure you are seeing is the default parsing that is done for all messages. What you need is to parse the data in the message field. Each piece of information within that field should be split off into its own field (birthdate, email, address, ID numbers, etc.).

Once this is done, you can then use decorators or pipelines to obscure, drop, or replace the values in those fields containing PII, using either the Grok patterns @ihe mentioned or regex-based pipeline rules.
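
As a rough illustration of the three options (obscure, drop, replace), the Python sketch below applies a per-field policy to an already parsed message. It is only a model of the logic, not Graylog configuration: the `POLICY` mapping, the field names, and the `apply_policy` helper are all hypothetical, and hashing with SHA-256 is just one assumed way to "obscure" a value.

```python
import hashlib

# Hypothetical policy: field name -> action to take on its value.
POLICY = {
    "email": "hash",         # obscure: keep a stable token, hide the value
    "ssn": "drop",           # drop: remove the field entirely
    "birthdate": "replace",  # replace: overwrite with a fixed placeholder
}


def apply_policy(fields, policy=POLICY):
    """Return a copy of fields with PII handled according to policy."""
    cleaned = {}
    for name, value in fields.items():
        action = policy.get(name)
        if action == "drop":
            continue
        if action == "hash":
            digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
            cleaned[name] = digest
        elif action == "replace":
            cleaned[name] = "[REDACTED]"
        else:
            # Fields without a policy entry pass through unchanged.
            cleaned[name] = value
    return cleaned
```

Hashing keeps the ability to correlate events for the same person without exposing the value, while dropping or replacing is the safer choice when nobody needs that correlation.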
