Hello,
Currently, we (companies operating inside the EU) are facing GDPR concerns regarding log data and data anonymization, e.g. not storing personally identifiable information in logs for more than 14 days.
I would like to discuss the idea of adding a new extractor type that anonymizes any string field or regex result by hashing it with SHA-256, SHA-512, or a similar algorithm. Logstash has a plugin that implements this and could serve as an example: https://www.elastic.co/guide/en/logstash/6.x/plugins-filters-fingerprint.html
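To make the idea more concrete, here is a minimal Python sketch of what such an extractor could do to a single field. This is not Graylog code: the function name `fingerprint_field`, the dict standing in for a log message, and the optional `key` parameter are all assumptions made for illustration. When a key is given, the sketch computes a keyed HMAC instead of a plain digest, since low-entropy values such as IP addresses can otherwise be recovered by hashing every candidate value.

```python
import hashlib
import hmac


def fingerprint_field(message, field, algorithm="sha256", key=None):
    """Return a copy of `message` with `field` replaced by its hex digest.

    Hypothetical helper: `message` is a plain dict standing in for a log
    message, and `key` is an optional secret (bytes) for a keyed HMAC.
    """
    if field not in message:
        # Nothing to hash; return an unchanged copy.
        return dict(message)
    raw = str(message[field]).encode("utf-8")
    if key is None:
        digest = hashlib.new(algorithm, raw).hexdigest()
    else:
        digest = hmac.new(key, raw, algorithm).hexdigest()
    result = dict(message)
    result[field] = digest
    return result


msg = {"source": "web01", "client_ip": "203.0.113.7"}
hashed = fingerprint_field(msg, "client_ip")
print(hashed["client_ip"])  # 64 hex characters for SHA-256
```

The same input always yields the same digest, so hashed values can still be grouped and counted, while the original value no longer appears in the stored message.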
There are some examples in which this feature would be great to have:
- As a sysadmin, I need to be GDPR-compliant, and at the same time I would like to be able to view unique web accesses to my application over a range longer than 14 days, to help identify crawlers and/or other threats that are accessing my websites.
- My logs contain some personal data (a Tax ID, for example) that needs to be distinguishable for analysis, but this data must not be exposed to the Graylog operator. In this case too, the raw data is not important per se, but being able to tell the values apart is crucial.
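Both use cases rely on the same property: identical inputs map to identical hashes, so values stay distinguishable without being readable. The Python sketch below illustrates this for the regex case, under stated assumptions: the simplified IPv4 pattern, the helper name `pseudonymize_line`, and the sample log lines are all made up for the example and are not taken from Graylog.

```python
import hashlib
import re

# Simplified IPv4 pattern (assumption); a real extractor would use the user's regex.
IP_PATTERN = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def pseudonymize_line(line, pattern=IP_PATTERN):
    """Replace every regex match in `line` with its SHA-256 hex digest."""
    return pattern.sub(
        lambda m: hashlib.sha256(m.group(0).encode("utf-8")).hexdigest(),
        line,
    )


# Made-up sample log lines using documentation-range IP addresses.
lines = [
    "203.0.113.7 GET /index.html",
    "198.51.100.23 GET /login",
    "203.0.113.7 GET /admin",
]
hashed_lines = [pseudonymize_line(line) for line in lines]

# The raw addresses are gone, but the same client still maps to the same token.
unique_clients = {line.split()[0] for line in hashed_lines}
print(len(unique_clients))  # 2
```

An operator looking at `hashed_lines` can still count and correlate clients, but does not see the original addresses.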
I hope this topic brings some cool new ideas for Graylog.