Trying to understand how the source field is populated

Description of your problem

I am ingesting logs using fluentd running in an AWS ECS sidecar container. The fluentd container uses a source of @type unix to read from /var/run/fluent.sock.

I was not initially setting any field in the log messages that relates to hostname/host/source, so I am trying to understand how Graylog populates the “source” field.

Currently the source is picking up the container’s hostname, which is not very helpful, and I would like to add a field that Graylog will use to populate source.

Description of steps you’ve taken to attempt to solve the issue

I have tried setting the hostname, but because this is running in AWS ECS “awsvpc” networking mode I am unable to do this. I have also tried adding a field called hostname to the message.

<source>
  @type unix
  path /var/run/fluent.sock
  tag foo-api
</source>

<filter foo-api.**>
  @type parser
  key_name log
  reserve_data false
  <parse>
    @type multi_format
    <pattern>
      format json
      time_key timestamp
    </pattern>
    <pattern>
      format none
    </pattern>
  </parse>
</filter>

<filter foo-api.**>
  @type record_transformer
  <record>
    ecs_container_id "foo-api-#{Socket.gethostname}"
  </record>
</filter>

<filter foo-api.**>
  @type record_transformer
  <record>
    hostname "foo-api-#{Socket.gethostname}"
  </record>
</filter>

<filter foo-api.**>
  @type record_modifier
  <replace>
    key ecs_container_id
    expression /.*/
    replace foo-api-testing
  </replace>
</filter>

<match foo-api.**>
  @type copy
  <store>
    @type gelf
    host graylog.dns.uk
    port 12201
    protocol tcp
    flush_interval 5s
  </store>
  <store>
    @type stdout
  </store>
</match>

It’s running in AWS ECS on Fargate, so I am struggling to get to /var/log/fluentd/fluentd.log to see the message.

Environmental information

Graylog 3.2.6
fluentd:1.11.2
fluent-plugin-record-modifier 2.1.0

Operating system information

Alpine 3.13.5

Hello,

This would be your index mapping. If you did not create your own (static) index template, then by default you’re using a dynamic template, which would populate the “source” field.

If you execute the following, you will be able to see your mappings.

curl 'http://localhost:9200/_mapping?pretty'
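If the full mapping output is too long to read by eye, a small script can pull out just the entry for one field. This is only a sketch: `fetch_mapping`, `find_field` and `field_per_index` are made-up helper names, the URL is the same localhost one as in the curl call above, and the search is deliberately generic because the nesting of the mapping differs between Elasticsearch versions.

```python
import json
import urllib.request


def fetch_mapping(url="http://localhost:9200/_mapping"):
    """Download the mapping of every index as a Python dict."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def find_field(node, name):
    """Return the first definition of `name` found under any "properties" key."""
    if isinstance(node, dict):
        props = node.get("properties")
        if isinstance(props, dict) and name in props:
            return props[name]
        for value in node.values():
            found = find_field(value, name)
            if found is not None:
                return found
    return None


def field_per_index(mapping, name="source"):
    """Map each index name to its definition of `name` (None if absent)."""
    return {index: find_field(body, name) for index, body in mapping.items()}
```

Calling `field_per_index(fetch_mapping())` on the Elasticsearch host should list, per index, how `source` is mapped. It only tells you the field's type, not which component set its value.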

Thanks, I understand that a source field has been added to the mapping dynamically, but I don’t understand where the field originates from.

"source" : {
     "type" : "keyword"
}

My messages do not have fields that would provide that detail to fluentd, so I am trying to figure out what Graylog is keying off. Is a source field being presented to Graylog? (It seems that’s the case if a source field is in the mapping.) I can’t find any fluentd documentation that mentions adding a “source” field.

Sample Data:

{"@timestamp":"2021-09-15T09:29:06.695+0100","level":"INFO","thread_name":"main","service":"foo-api","@version":"1","logger_name":"liquibase.lockservice.StandardLockService","message":"Successfully released change log lock"}
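Some background that may help here: the GELF format has a mandatory `host` field, and Graylog's GELF inputs store that value as `source`. So the value most likely comes from whatever the GELF output plugin writes into `host`, not from a record field named `source` or `hostname`. The Python sketch below is not what fluentd runs. It is a hypothetical helper (`build_gelf`, `encode_frame` and `send_gelf_tcp` are made-up names) for hand-sending one message to a GELF TCP input, assuming the input is plain TCP without TLS, to check that `host` is what drives `source`.

```python
import json
import socket


def build_gelf(message, host, extra=None):
    """Wrap one log line in a GELF 1.1 envelope."""
    gelf = {
        "version": "1.1",
        "host": host,  # Graylog's GELF inputs store this value as "source"
        "short_message": message,
    }
    # GELF additional fields must be prefixed with an underscore.
    for key, value in (extra or {}).items():
        gelf["_" + key] = value
    return gelf


def encode_frame(gelf):
    """GELF over TCP: one JSON document per message, ended by a null byte."""
    return json.dumps(gelf).encode("utf-8") + b"\x00"


def send_gelf_tcp(gelf, server, port=12201):
    """Send a single GELF message to a GELF TCP input."""
    with socket.create_connection((server, port), timeout=5) as conn:
        conn.sendall(encode_frame(gelf))
```

For example, `send_gelf_tcp(build_gelf("Successfully released change log lock", "foo-api-testing", {"service": "foo-api"}), "graylog.dns.uk")` should produce a message whose source reads foo-api-testing if the assumption above holds for your input.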

If I try to do regex replaces on the source field in fluentd, nothing happens, but I can change level, thread_name, service, logger_name and message.

I am of course hampered by the fact that the container is running on AWS Fargate ECS, so I can’t get onto the container and read /var/log/fluentd/fluentd.log to see the forwarded message detail.
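One possible workaround when there is no shell access: temporarily point the gelf store at a small listener you control and print what actually arrives on the wire. The stdout store in the config above shows the record before it is turned into GELF, so it would not show the final `host` value. The sketch below assumes the plugin sends null-byte-terminated JSON over plain TCP (the usual GELF TCP framing) and that the listener is reachable from the task; `split_frames`, `GelfDebugHandler` and `serve` are made-up names.

```python
import json
import socketserver


def split_frames(buffer):
    """Split bytes into complete null-terminated GELF frames.

    Returns (decoded_messages, leftover_bytes_of_an_incomplete_frame).
    """
    *frames, rest = buffer.split(b"\x00")
    messages = [json.loads(frame) for frame in frames if frame]
    return messages, rest


class GelfDebugHandler(socketserver.BaseRequestHandler):
    """Print the host value and field names of every GELF message received."""

    def handle(self):
        pending = b""
        while True:
            chunk = self.request.recv(4096)
            if not chunk:
                break
            messages, pending = split_frames(pending + chunk)
            for message in messages:
                print("host=%r fields=%s" % (message.get("host"), sorted(message)))


def serve(port=12201):
    """Listen for GELF over TCP until interrupted."""
    with socketserver.TCPServer(("0.0.0.0", port), GelfDebugHandler) as server:
        server.serve_forever()
```

Run `serve()` on a machine the task can reach and set the gelf store's host to it for a short test. The printed `host` value is what Graylog would turn into source.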

I even tried adding a field with a combination of host, hostname and source but they just get ignored. I can see the config loading and parsing them correctly but they never appear in a message.

<filter foo-api.**>
  @type record_transformer
  <record>
    host "foo-api-#{Socket.gethostname}"
  </record>
</filter>

Hello,

I believe this would come from your log shipper. Unfortunately, I’m not too familiar with fluentd and/or containers, so I might not be much help there.

Yeah, I feel your frustration. This is one of the reasons I like to have control over my own environment. AWS is really easy to set up, but when you need to change anything you sometimes have to go through more issues/problems.

Just a suggestion: have you tried using a pipeline in Graylog?
I’m not sure exactly what you want to do with the source field, but here are a couple of ideas for a pipeline if you decide to go that route.

This would change the source data.

rule "Field Data Change"
when
    has_field("source") AND contains(to_string($message.source), "wrong_name")
then
    set_field("source","right_name");
end

This would rename the field itself.

rule "Change Source Field Rename"
when
    has_field("source")
then
    rename_field("source", "NewSource");
end

Hope that helps

Thanks, I am going to mark this as a solution even though it wouldn’t be ideal for my use case (hopefully it will help someone else). From a purely Graylog viewpoint this will work, but it would require a rule for every log-shipping sidecar container, and each one would need maintaining.

I will need to tackle (as suspected) the log shipper. It’s pretty easy to create a new field in fluentd, but I wanted to avoid a new field and just change the current source field. The problem has been that if I try to manipulate source/host/hostname then, even though I see the config being loaded by fluentd, it doesn’t actually change or create them (depending on which plugin I tried).

If you have the Fluentd container logs routed to a stream (not just All Messages), then attach the pipeline I showed you above to it and it will rename the field called ‘source’ to whatever you want. I’m not sure why you would have to do this for every log.
Steps:

  1. Create a stream called Fluentd with rules that allow only Fluentd logs into it.
  2. Attach a pipeline to the Fluentd stream to change the field name.
  3. Ensure the Message Processors Configuration is correct.
  4. Done.

Sorry I couldn’t be more help, but if you do solve this field naming in fluentd, maybe post it in the forum for others running into the same problem.