Good practice: how many streams is too many?

Hello!
I’m a Graylog beginner and I want to use it in one of my projects, and I am wondering what the best way to do things is. A very quick summary:

  • The project is a tool that exposes an API that many clients will be able to call
  • Each client should be able to visualize its own logs (a user will see the logs of when they called the API and what they did, admins will see all the logs of their users, and so on).

Knowing that I will very often need to search for logs by user_id or company_id, I wanted to set up Graylog in the following way:

  • have a stream for each type of action the user can make
  • have a stream for each company
  • have a stream for each user

And have a default pipeline attached to the “All messages” stream that would look like this:

pipeline "DefaultPipeline"
stage 0 match either
  rule "is_action_type_1?"
  rule "is_action_type_2?"
end

rule "is_action_type_1?"
when
  to_string($message.action_type) == "action_type_1"
then
  // the target streams must already exist; route_to_stream does not create them
  let user_stream = concat("user_", to_string($message.user_id));
  let company_stream = concat("company_", to_string($message.company_id));
  route_to_stream(name: user_stream, remove_from_default: false);
  route_to_stream(name: company_stream, remove_from_default: false);
  route_to_stream(name: "action_1", remove_from_default: false);
end

// Similar rule definition for "is_action_type_2?"

This would obviously require a LOT of streams (one per user), but my impression is that it would not require much processing power, since the routing is based on the stream name or ID directly.

Does this bypass the fact that Graylog will match every message against each stream’s rules? If not, is there a way to bypass that, so it won’t test each stream’s rules and will just let me handle the routing in my default pipeline?

Is this a good solution, or should I have far fewer streams (say, one per action_type and one per company) and then filter the results whenever I make a search?
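To illustrate the filter-at-search-time variant: with fewer streams, each search would just add field conditions on top of the company (or action) stream. A sketch of such a query, using the field names from the setup above (the concrete values are made up):

```
company_id:1234 AND action_type:action_type_1 AND user_id:42
```

This only works well if user_id, company_id, and action_type are stored as proper message fields rather than buried inside the message text.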

Will there be just a few users, or several hundred? This might not be very practical …

As long as you do the routing via pipelines and the streams have no stream rules, this will not be checked on each incoming message, but I do not know how well the system works with perhaps several thousand streams.

Your scenario works well with a defined number of companies and users, but it does not ‘web scale’ very well.

Thank you, this answers my question. Indeed, it might be thousands of users, so not a good idea.
Generally speaking, about how many streams is considered too many? (I’m asking for an order of magnitude, or any way to evaluate that, not a precise number.)

What you are asking about hasn’t been measured yet, or at least not shared with us, so it is not easy to answer. Mostly we hear about it when something is not working for someone; sharing a user story is not very common for most users.

If you need the ability to give someone access, split the logs by company. But more importantly, normalize your logs: extract user, company, and action into individual fields, and you can search for them fast(er). Make heavy use of extractors/pipelines to cut the valuable information out of the message and remove the clutter.
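A minimal sketch of such a normalization rule, assuming the raw message carries space-separated key=value pairs (the log format and field names are assumptions, not from the original post):

```
rule "normalize_api_log"
when
  has_field("message")
then
  // assumed raw format: "user_id=42 company_id=7 action_type=action_type_1 ..."
  // key_value() splits the text into a field map, set_fields() attaches it to the message
  set_fields(key_value(to_string($message.message)));
end
```

If the logs are not key=value formatted, a grok pattern or an extractor on the input would do the same job.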

That would be my personal advice.

Thank you, this is where I was headed; glad you seem to confirm this approach!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.