How to be GDPR-compliant with log retention?


I’m trying to find a way to store all logs while staying GDPR-compliant.
We need to limit access to older log files to the GDPR team only.

Graylog can’t create one user profile with access to 0–6 months of logs and another profile with access to 0–1 year.

Is it possible to have 2 Graylog servers: one for admins, and a second for our GDPR team?

The first one would be used by all admins, for stats, checks, etc.:

  • collect all logs with, for example, a 3-month retention
  • forward all logs to the second server

The second one, used only by the GDPR team:

  • receives all logs from the first one, but with a one-year log retention.

Thank you

Yes, it is possible, but in that case you would have to run two different clusters and store the data twice.
I suggest using the Graylog API instead.
My idea:
create a new stream every month and forward all related messages into it
grant the “see all” group rights on that stream
remove those rights from the old streams

Or you can create a lot of streams in advance, via script (API) or manually (e.g. 1812, 1901, 1902, …), and create a pipeline that routes each message into a stream based on its arrival date.
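The scripted variant above could be sketched like this. Note the assumptions: the YYMM stream names follow the 1812/1901 example, the payload fields mirror the Graylog REST API’s stream-creation body, and the index set ID is a made-up placeholder - check both against your Graylog version before using this.

```python
from datetime import datetime

def monthly_stream_title(when: datetime) -> str:
    """Stream name in the YYMM scheme from the post, e.g. 1812 for Dec 2018."""
    return when.strftime("%y%m")

def stream_payload(title: str, index_set_id: str) -> dict:
    """Body for POST /api/streams (field names per the Graylog REST API)."""
    return {
        "title": title,
        "description": f"Monthly GDPR stream {title}",
        "index_set_id": index_set_id,   # placeholder ID - use your own index set
        "rules": [],                    # routing is done by a pipeline instead
        "matching_type": "AND",
        "remove_matches_from_default_stream": False,
    }

# Example: the stream a December 2018 message would be routed into.
title = monthly_stream_title(datetime(2018, 12, 24))
payload = stream_payload(title, "57f3d721a43c2d59d3d7d7d0")
# A script would then POST this payload to /api/streams with an admin token,
# e.g. requests.post(f"{base}/api/streams", json=payload, auth=(token, "token"))
```

A cron job running once a month would be enough to keep the streams ahead of the calendar.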

Thank you, great idea !

I will try this !

If you do it, please share with us the final solution.


Ditto, that’s certainly worthy of a nice blog post :slight_smile:

The first solution has to be done manually, which is boring :slight_smile:

Another solution: a plugin.

Graylog keeps all logs for a year.
With a specific plugin:
If a user is limited by the plugin (via user management), every search that user runs is restricted to logs at most 6 months old, unless they choose a narrower date range themselves.

We would need a plugin administration page to set the duration - not hard-coded to 6 months, but changeable by a Graylog admin.
We would also need a way to activate it on user profiles.
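No such plugin exists out of the box, and a real one would be written in Java against Graylog’s plugin API - but the clamping logic it would need is simple. A minimal sketch in Python, with hypothetical names, just to pin down the behaviour described above:

```python
from datetime import datetime, timedelta

def clamp_search_from(requested_from: datetime,
                      now: datetime,
                      limited: bool,
                      max_age_days: int = 180) -> datetime:
    """If the user is flagged as limited, never let a search reach further
    back than max_age_days (the admin-configurable setting; 6 months here).
    Unlimited users keep whatever range they asked for."""
    if not limited:
        return requested_from
    oldest_allowed = now - timedelta(days=max_age_days)
    return max(requested_from, oldest_allowed)

now = datetime(2019, 6, 1)
# A limited user asking for a full year only gets the last 180 days:
clamped = clamp_search_from(datetime(2018, 6, 1), now, limited=True)
# An unlimited admin keeps the requested range:
full = clamp_search_from(datetime(2018, 6, 1), now, limited=False)
```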

Is it possible to create such a plugin? I’m not a dev :confused: so I don’t know whether this would be easy with the web UI, API, etc.

Of course there’s one more issue: system administrators. They can always, always access your data. Even if it’s supposed to be off-limits.

Also, if someone manages to get the credentials to Elasticsearch, they can easily bypass Graylog. So make sure that you’ve hammered down Elasticsearch as well!

I would turn that around and look at it from a different angle - that might make the issue easier to solve.

How long do you really need the log data to be instantly searchable? Is it really that long? I don’t know many environments that really need data for longer than 40 days. Most keep aggregated data for that period of time and raw data only for a few days …

I do not know what amount of data we are talking about, but having Elasticsearch handle multiple TB or PB is not a side job - so if you can make some assumptions about that, it would help us to help you.

What is the job of your GDPR team? Do they need raw log data?


You are right: for sysadmins, only a few days are useful to check what happened, get notified when lots of errors occur, etc.

For GDPR, they need more/all logs. I don’t know how much data that is, or how many servers I would have to use. So if I have to build two clusters, that’s not great :confused: I want something simple.

So I tried this:
I created two index sets: the first one, the Graylog default, with a 6-month retention.
The second one, called “GDPR”, with a 1-year retention.

To feed the second one, I created a stream with the rule “Rule always matches”.

So when a log comes in, both index sets are fed.

I created a role which can only see the GDPR stream.

The GDPR user can only see their own index, and can create dashboards, alerts, etc.
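The role part of this setup can also be created via the Graylog REST API instead of the web UI. A sketch of the request body in Python - the permission string follows Graylog’s `streams:read:<stream-id>` convention, and the stream ID here is a made-up placeholder:

```python
def gdpr_role(stream_id: str) -> dict:
    """Body for POST /api/roles: a role that can only read one stream
    (the 1-year GDPR stream), nothing else."""
    return {
        "name": "gdpr-readers",
        "description": "Read-only access to the 1-year GDPR stream",
        "permissions": [
            f"streams:read:{stream_id}",   # see and search this stream only
        ],
        "read_only": False,  # the role itself stays editable by admins
    }

role = gdpr_role("5c0e1a2b3c4d5e6f7a8b9c0d")
# POST role to /api/roles, then assign it to the GDPR users.
```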

What do you think ?

Thank you

It is not easy to give a recommendation, as I can’t imagine what your “GDPR Team” is doing.

With your idea you will have more data, as you duplicate it by holding it in different indices. From my experience, such plans come up for review very quickly once you put a price tag on the requirements.

Ask which devices and services messages should be kept from, for how long, and what volume that will be. Once you have the complete daily amount of data, multiply it by 1.7 (just to be safe) and you have the storage you need per day. Now multiply that up to one year and you have the amount of data you have to manage. If you want to duplicate the messages for one month, take the daily volume, multiply it by one month, and add that on top.

In a small environment, 10 GB per day is not uncommon. Taking the above example:

10 GB/day * 365 days = 3,650 GB ≈ 3.65 terabytes kept for one year
10 GB/day * 31 days = 310 GB duplicated for one month
310 GB * 1.7 (safety factor) = 527 GB; in total 3,650 GB + 527 GB ≈ 4,177 GB ≈ 4.18 terabytes

Spread over the year, that means the indices together grow by roughly 11 GB per day. When you need to be resilient - no data lost even if a node is nuked - you need 3 Elasticsearch nodes and a replication factor of 1. The data then gets duplicated to ~22 GB per day, and the final storage needed across the three nodes together is ~8,354 GB ≈ 8.35 terabytes.
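The arithmetic above, spelled out - same assumptions as the example: 10 GB/day ingest, a one-month duplicate with the 1.7 safety factor, and a replication factor of 1 doubling everything:

```python
# Back-of-the-envelope Graylog/Elasticsearch storage sizing, in GB.
daily = 10                          # ingest per day
yearly_raw = daily * 365            # 3,650 GB kept for one year
month_dup = daily * 31              # 310 GB duplicated for one month
month_dup_safe = month_dup * 1.7    # safety factor -> 527 GB

total = yearly_raw + month_dup_safe        # ~4,177 GB (~4.18 TB)
per_day_in_indices = total / 365           # ~11.4 GB/day across the indices
with_replication = total * 2               # replication factor 1 -> ~8,354 GB
```

Swap in your own daily volume and retention numbers to get a price tag before committing to a design.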

We could spin this out into even more detail with no issues - but this is just to give you something to think about.


Absolutely! Don’t just start building stuff, even if it’s very tempting. You need a proper set of requirements from them, especially with something as important as data security and privacy legislation.

Personally, I don’t see why a privacy/legal team would need access to all server logs going back a year. I could understand them wanting specific security and access logs, to trace which users accessed which data. But as @jan already said: we can’t imagine what your “GDPR Team” is supposed to be doing all day :wink:

I mean, even our security auditors do not require full access to all server logs going back that far. It’s mostly security stuff, which is a limited subset.

So… time for talks, meetings, proposals and most importantly: lists of requirements.

To throw in a few cents here: my company requires logs to be kept searchable for a minimum of 90 days, preferably more than that (even 180), with a 5-year archive. That’s partly due to legal reasons, partly because we need to be able to look that far back. (This is backed by, give or take, 48 TB of storage.)

If the log data contains personal data, you need a legal basis to store it for a certain period of time. The right to erasure must not be forgotten: the data subject may request that his or her personal data be deleted.

Before you think about the technical implementation: Check how you can meet the requirements (purpose of processing personal data: storage).

Has a processing activity been created for this purpose?

One purpose is surely legitimate interest, but how can that be justified for more than 3 months?

Here you can read about the lawfulness of processing:
A legal basis can be: processing is necessary for compliance with a legal obligation to which the controller is subject.

Why does the GDPR team need access to the log data, and why for so long? What is the purpose? These questions should be answered. The data protection officer should reduce the storage duration to a minimum, not stretch it to a maximum.


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.