"Stored in index" vs "Routed into streams"

I’ve set up a separate index for each stream to keep the field count per index low. There’s one for journald events, one for docker containers, one for syslog inputs, and so on.

I wanted to give access to another user, but only to a subset of hosts, so I created a stream, with the filter matching those hosts. The event types vary of course, as expected.
I had to select an index for this new stream though, and here comes the problem:

  • if I selected one of the existing indexes, it led to messages ending up in the wrong index
  • if I created a new index for this stream, messages from multiple inputs ended up in that new index

In both cases the number of field types grows in the affected indexes, which is never good.

tl;dr: when multiple stream filters match a message, how does one set the preferred index?

Hi @InternetWorkAcct
welcome to the community!

Here is a little visualization of your issue:

Log 1 is part of two streams but is stored in only one index set, so the log is not duplicated and the amount of data barely grows (just one more stream ID). Log 2 is duplicated across multiple index sets and is therefore also stored twice on disk.

For routing I prefer pipelines and rules over the “Stream Rules” on the stream overview. With pipelines I can assign a log to a stream based on very specific conditions, and also remove an assignment again. The functions to use here are “route_to_stream” and “remove_from_stream”.
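
As a rough sketch of what such pipeline rules could look like: the stream name "Restricted hosts", the host names, and the "container_name" field are made up for illustration, so adjust them to your setup. The first rule adds messages from the selected hosts to the restricted stream, and the second one takes a message out of that stream again under a more specific condition.

```
rule "route selected hosts to restricted stream"
when
  // "source" is the standard Graylog source field; the host names are placeholders
  has_field("source") &&
  (to_string($message.source) == "host-a.example.org" ||
   to_string($message.source) == "host-b.example.org")
then
  // adds the message to the named stream; it stays in the streams it is already in
  route_to_stream(name: "Restricted hosts");
end

rule "keep docker logs out of restricted stream"
when
  // hypothetical field, used here only to tell container logs apart
  has_field("container_name")
then
  // removes only this stream assignment; other assignments are untouched
  remove_from_stream(name: "Restricted hosts");
end
```

The second rule should sit in a later stage than the first so that the removal runs after the routing. Which index set a message is written to still follows from the streams it ends up in, so it is worth checking the index set configured on each of those streams.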