Scan and mask PII in Graylog

sunnyjaisinghani · July 18, 2023, 7:08am

Hello Community

This is my first post in this community, so please pardon me if I have left out any required information. I am also new to Graylog.

I am looking for ways to scan Graylog logs for PII and mask whatever PII is found.
I came across a few blog posts, but none of them describes scanning for PII in detail. One of them, "Redacting Message Fields for Privacy Purposes", covers only a very basic example: redacting a username.
However, I am looking for a solution that can scan for details such as birthdates, phone numbers, SSNs and other identity card numbers, addresses, etc.

Can you share ideas on how to achieve this in Graylog?

Thanks

ihe · July 18, 2023, 8:00am

Hello @sunnyjaisinghani
If you ingest data into Graylog, you will need to structure it in one way or another. By structuring I mean parsing the messages and putting values into fields with meaningful names. This could be an IP address in a field called source_ip, an email address in a field called source_email, and so on.
As soon as this parsing is done, you can name all the fields that contain PII. Those can be redacted using pipelines, decorators or both.
If necessary, you can clone a message without its PII into another stream to make the separation even stricter.
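
To illustrate the idea outside Graylog, here is a minimal Python sketch (not pipeline code). It assumes a message has already been parsed into a dictionary of fields. The field names in `PII_FIELDS` and both helper functions are hypothetical and only serve the example.

```python
# Hypothetical set of field names known to hold PII after parsing.
PII_FIELDS = {"source_ip", "source_email"}


def redact_fields(message, pii_fields=PII_FIELDS, placeholder="[REDACTED]"):
    """Return a copy of the message with every PII field replaced by a placeholder."""
    cleaned = dict(message)
    for name in pii_fields:
        if name in cleaned:
            cleaned[name] = placeholder
    return cleaned


def strip_fields(message, pii_fields=PII_FIELDS):
    """Return a copy without the PII fields, e.g. for a clone sent to another stream."""
    return {key: value for key, value in message.items() if key not in pii_fields}


# Example message with made-up values.
event = {"source_ip": "192.0.2.10", "source_email": "jane@example.com", "action": "login"}
redacted = redact_fields(event)
stripped = strip_fields(event)
```

Both helpers leave the original message untouched, which mirrors keeping the full message in one stream and a PII-free copy in another.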

sunnyjaisinghani · July 19, 2023, 9:19am

Thanks for the response @ihe

I see that our logs are structured, and one of the fields we have is “message”.
Within this field, if I have to look for PII such as “Date of birth”, “Email”, “Address”, “Identity Card Numbers”, etc., where do I start?

Do the Graylog docs have examples of regexes that match PII?

And once I have my regex working, how do I extract the PII into a separate field?

ihe · July 19, 2023, 9:35am

Hi @sunnyjaisinghani
I would suggest making yourself familiar with Grok patterns. They combine multiple regexes and can capture multiple values into different fields.
After you build your Grok patterns, you create a pipeline and run the grok there. I explained that for another case here: Regex Matching in Pipeline or Extractor - #16 by ihe
In a stage after your parsing, you can replace the message field with content that contains no PII, again with a pipeline.
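
A rough Python sketch of those two steps, for prototyping patterns before porting them to Grok or a pipeline rule. It is not Graylog code. The three regexes are illustrative assumptions (US-style SSN, ISO-style date, simple email) and will need tuning for real data. The `pii_` field prefix and the function name are made up for the example.

```python
import re

# Illustrative patterns only; real logs need patterns matched to their formats.
PII_PATTERNS = {
    "email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
    "birthdate": r"\b\d{4}-\d{2}-\d{2}\b",
}

# One alternation with a named group per PII type, loosely similar to
# the named captures a Grok pattern provides.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in PII_PATTERNS.items())
)


def extract_and_mask(message):
    """Step 1: capture each match into a pii_<type> field.
    Step 2: return the message text with every match replaced by a tag."""
    fields = {}

    def _mask(match):
        kind = match.lastgroup  # name of the group that matched
        fields.setdefault(f"pii_{kind}", []).append(match.group())
        return f"[{kind.upper()}]"

    masked = COMBINED.sub(_mask, message)
    return fields, masked


fields, masked = extract_and_mask(
    "User jane@example.com born 1990-04-12 has SSN 123-45-6789"
)
```

Here `masked` is the PII-free text that would replace the message field, and `fields` holds the captured values, which can be dropped or kept in a restricted stream.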

chris.black-gl · July 19, 2023, 4:16pm

The structure you are seeing is the default parsing that is done for all messages. What you need is to parse the data in the message field: each piece of information within that field should be split off into its own field (birthdate, email, address, ID numbers, etc.).

Once this is done, you can use decorators or pipelines to obscure, drop or replace the values in the fields containing PII, using either the Grok patterns @ihe mentioned or regex-based pipeline rules.
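
The difference between obscuring, dropping and replacing can be sketched in Python as follows. This is only an illustration, not Graylog code: the `POLICY` mapping, the field names and the salt are hypothetical, and truncating a salted SHA-256 digest is just one possible way to obscure a value.

```python
import hashlib


def obscure(value, salt="example-salt"):
    """Swap a value for a salted SHA-256 digest so equal inputs still correlate."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]


# Hypothetical policy: which action to apply to which parsed field.
POLICY = {"email": "obscure", "birthdate": "drop", "address": "replace"}


def apply_policy(message, policy=POLICY):
    """Return a new message with each PII field obscured, dropped or replaced."""
    result = {}
    for name, value in message.items():
        action = policy.get(name)
        if action == "drop":
            continue  # field is removed entirely
        if action == "obscure":
            result[name] = obscure(str(value))
        elif action == "replace":
            result[name] = "[REDACTED]"
        else:
            result[name] = value  # non-PII fields pass through unchanged
    return result


# Example message with made-up values.
parsed = {
    "email": "jane@example.com",
    "birthdate": "1990-04-12",
    "address": "1 Main St",
    "user_id": 42,
}
safe = apply_policy(parsed)
```

Obscuring keeps the ability to correlate events for the same person, but hashed low-entropy values such as birthdates can be guessed by brute force, so dropping or replacing is the safer choice for those.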

