Graylog 3.1.4 - Containerized | Elasticsearch-oss 6.8.6 - on host | Filebeat 6.8.5 - on host

Friends! I have been trying for the last two weeks to get this project up and running.

I will include all the details I can, but I am running out of ideas for troubleshooting. I have double-checked all the compatibility matrices I can find and have changed around versions of ES and Filebeat, but I am still having problems.

The sticking point right now is that I am unable to ship data anywhere, despite the networking rules being in place (telnet and netcat work).

I think this is due to my false understanding that Filebeat was supposed to send directly to ES, and that Graylog was somehow aware of that. You create a Beats input in Graylog, and updating filebeat.conf on a sidecar is supposed to trigger the Filebeat service to rewrite filebeat.yml (a functionality that I was unable to get working, so I created a filebeat.yml file manually). After ruminating on that, it seems that Filebeat actually feeds another Filebeat that you are supposed to have configured on the Graylog “master” node as well?!? This misunderstanding of the roles and responsibilities in the stack is probably my issue. I have not found good documentation or other posts from people who have containerized only Graylog and connected it directly to ES clusters and data sources.

I have referenced the sidecar architecture diagrams, as well as the high-level architecture diagrams for Graylog, but I am still not exactly sure what port 5044 (Beats input default?) is supposed to do. The tcpTransportPort seems to be the manual override for data transfer to ES; what does that have to do with Graylog?

Anyway, I was hoping you all could help point out a flaw in my configs that is stopping me from ingesting data. Please find my information below, and let me know what else you need. This is a real head-scratcher for me.

POIs:
- Using non-standard ports because they were already open in my specific development environment from another project. I would use standard ports in a production setup.
- CentOS Linux release 7.7.1908 is on all servers in question.
- Graylog sees the ES cluster, and it is in a green state.
- ES cluster is bootstrapped and in green status, with node 1 being the master.
- Filebeat service is running, no errors. (/var/log/filebeat/)
- Graylog-sidecar service is running, no errors. (/var/log/graylog-sidecar/)
- Elasticsearch is running, no errors. (/var/log/elasticsearch/*)
- Filebeat does recognize there are files to pick up (I had feedback from harvesters starting once. Maybe Filebeat is smart and realized it has already tried to read these back when my config was “bad”?)

Resource Legend:
10.49.39.49 = Filebeat / Graylog-Sidecar 1.0.2 / raw data source
10.49.39.48 = Graylog Dockerized w/mongo
10.49.39.163 = Master ElasticSearch (Node 1)
10.49.40.120 = Elasticsearch Data (Node 2)
10.49.40.116 = Elasticsearch Data (Node 3)

graylog docker-compose.yml

version: '2'
services:
  # MongoDB: https://hub.docker.com/_/mongo/
  mongodb:
    image: mongo:3
  #Elasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/6.x/docker.html
  #elasticsearch:
  # image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.8.2
    #environment:
     # - http.host=10.49.39.163
     # - transport.host=10.49.39.163:9300
     # - network.host=0.0.0.0
     # - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
   # ulimits:
   #   memlock:
   #     soft: -1
   #     hard: -1
   # mem_limit: 1g
  # Graylog: https://hub.docker.com/r/graylog/graylog/
  graylog:
    image: graylog/graylog:3.1
    environment:
      # CHANGE ME (must be at least 16 characters)!
      - GRAYLOG_PASSWORD_SECRET=ThisIsAReallyCoolPassword
      # Password: ThisIsAReallyCoolPassword
      - GRAYLOG_ROOT_PASSWORD_SHA2=2a37dec9db8bde8760db6f4e55bd38dc9e86f2b374c2f4a4acafe02fc7ad43a4
      - GRAYLOG_HTTP_EXTERNAL_URI=http://10.49.39.48/
      - GRAYLOG_ELASTICSEARCH_HOSTS=http://10.49.39.163:9100,http://10.49.37.120:9100,http://10.49.39.116:9100
      #- GRAYLOG_ELASTICSEARCH_HOSTS=http://10.49.39.48:9100
      #- GRAYLOG_ELASTICSEARCH_DISCOVERY_ZEN_PING_UNICAST_HOSTS=10.49.39.163:9100, 10.49.37.120:9100, 10.49.39.116:9100
      - GRAYLOG_ELASTICSEARCH_DISCOVERY_ENABLED=true
      #- GRAYLOG_ELASTICSEARCH_NETWORK_HOST=10.49.39.163
      - GRAYLOG_ELASTICSEARCH_CLUSTER_NAME=graylog
    # - GRAYLOG_HTTP_BIND_ADDRESS=10.49.39.48:9000
    links:
      - mongodb:mongo
      #- elasticsearch
    depends_on:
      - mongodb
    #  - elasticsearch
    ports:
      # Graylog web interface and REST API
      - 80:9000
      # tcpTransportPort
      - 1761:1761
      #beatsManagementPort
      - 5044:5044
      # httpElasticSearchPort
      - 9100:9100
      # Syslog TCP
      - 1514:1514
      # Syslog UDP
      - 1514:1514/udp
      # GELF TCP
      - 12201:12201
      # GELF UDP
      - 12201:12201/udp
    volumes:
      - /opt/graylog:/var/lib/graylog/data

elasticsearch.yml

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: graylog
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: node-1
node.master: true
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /var/lib/elasticsearch
#
# Path to log files:
#
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
network.host: 0.0.0.0
# Set a custom port for HTTP:
#
http.port: 9100
transport.tcp.port: 1761
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
discovery.zen.ping.unicast.hosts: ["10.49.39.200", "10.49.37.183", "10.49.39.163"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
#cluster.initial_master_nodes: ["node-1", "node-2"]
discovery.zen.minimum_master_nodes: 1
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

filebeat.yml

###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

#=========================== Filebeat inputs =============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /opt/fdr/data/data/*/*.log
    #- c:\programdata\elasticsearch\logs\*

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

  ### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  #multiline.pattern: ^\[

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  #multiline.negate: false

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  #multiline.match: after


#============================= Filebeat modules ===============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml
#filebeat.registry.path:
#filebeat.registry_file: /var/lib/graylog-sidecar/collectors/filebeat/data/registry/filebeat/data.json
  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

#==================== Elasticsearch template setting ==========================

setup.template.settings:
  index.number_of_shards: 3
  #index.codec: best_compression
  #_source.enabled: false

#================================ General =====================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging


#============================== Dashboards =====================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here, or by using the `-setup` CLI flag or the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

#============================== Kibana =====================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

#============================= Elastic Cloud ==================================

# These settings simplify using filebeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:

#================================ Outputs =====================================

# Configure what output to use when sending the data collected by the beat.

#-------------------------- Elasticsearch output ------------------------------
#output.elasticsearch:
  # Array of hosts to connect to.
 # hosts: ["10.49.39.48:1761"]

  # Enabled ilm (beta) to use index lifecycle management instead daily indices.
  #ilm.enabled: false

  # Optional protocol and basic auth credentials.
  #protocol: "https"
  #username: "elastic"
  #password: "changeme"

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["10.49.39.48:1761"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

#================================ Processors =====================================

# Configure processors to enhance or manipulate events generated by the beat.

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

#================================ Logging =====================================

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]

#============================== Xpack Monitoring ===============================
# filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster.  This requires xpack monitoring to be enabled in Elasticsearch.  The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#xpack.monitoring.enabled: false

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well. Any setting that is not set is
# automatically inherited from the Elasticsearch output configuration, so if you
# have the Elasticsearch output configured, you can simply uncomment the
# following line.
#xpack.monitoring.elasticsearch:

sidecar.yml

# The URL to the Graylog server API.
server_url: "http://10.49.39.48/api/"

# The API token to use to authenticate against the Graylog server API.
# This field is mandatory
server_api_token: "128b9hovgbgmg66bd6uhitnjrbm461hjh83kchvb93d18pf5epef"

# The node ID of the sidecar. This can be a path to a file or an ID string.
# If set to a file and the file doesn't exist, the sidecar will generate an
# unique ID and writes it to the configured path.
#
# Example file path: "file:/etc/graylog/sidecar/node-id"
# Example ID string: "6033137e-d56b-47fc-9762-cd699c11a5a9"
#
# ATTENTION: Every sidecar instance needs a unique ID!
#
#node_id: "file:/etc/graylog/sidecar/node-id"

# The node name of the sidecar. If this is empty, the sidecar will use the
# hostname of the host it is running on.
node_name: "fdr_sidecar"

# The update interval in secods. This configures how often the sidecar will
# contact the Graylog server for keep-alive and configuration update requests.
#update_interval: 10

# This configures if the sidecar should skip the verification of TLS connections.
# Default: false
#tls_skip_verify: false

# This enables/disables the transmission of detailed sidecar information like
# collector statues, metrics and log file lists. It can be disabled to reduce
# load on the Graylog server if needed. (disables some features in the server UI)
send_status: true

# A list of directories to scan for log files. The sidecar will scan each
# directory for log files and submits them to the server on each update.
#
# Example:
#     list_log_files:
#       - "/var/log/filebeat/"
#       - "/opt/app/logs"
#
# Default: empty list
#list_log_files: []

# Directory where the sidecar stores internal data.
#cache_path: "/var/cache/graylog-sidecar"

# Directory where the sidecar stores logs for collectors and the sidecar itself.
#log_path: "/var/log/graylog-sidecar"

# The maximum size of the log file before it gets rotated.
#log_rotate_max_file_size: "10MiB"

# The maximum number of old log files to retain.
#log_rotate_keep_files: 10

# Directory where the sidecar generates configurations for collectors.
collector_configuration_directory: "/var/lib/graylog-sidecar/generated"

# A list of binaries which are allowed to be executed by the Sidecar. An empty list disables the whitelist feature.
# Wildcards can be used, for a full pattern description see https://golang.org/pkg/path/filepath/#Match
# Example:
collector_binaries_whitelist:
      - "/usr/bin/filebeat"
      - "/usr/share/filebeat/bin/filebeat"
     # - "*"
#       - "/opt/collectors/*"
#
# Example disable whitelisting:
#     collector_binaries_whitelist: []
#
# Default:
#collector_binaries_whitelist:
#  - "bin/filebeat"
#  - "/usr/bin/packetbeat"
#  - "/usr/bin/metricbeat"
#  - "/usr/bin/heartbeat"
#  - "/usr/bin/auditbeat"
#  - "/usr/bin/journalbeat"
#  - "/usr/share/filebeat/bin/filebeat"
#  - "/usr/share/packetbeat/bin/packetbeat"
#  - "/usr/share/metricbeat/bin/metricbeat"
#  - "/usr/share/heartbeat/bin/heartbeat"
#  - "/usr/share/auditbeat/bin/auditbeat"
#  - "/usr/share/journalbeat/bin/journalbeat"
#  - "/usr/bin/nxlog"
#  - "/opt/nxlog/bin/nxlog"

Under System > Sidecars you would have a "log collector"; think of it as a template. An example from my setup:

The default template would look like this:
# Needed for Graylog
fields_under_root: true
fields.collector_node_id: ${sidecar.nodeName}
fields.gl2_source_collector: ${sidecar.nodeId}

filebeat.inputs:
- input_type: log
  paths:
    - /var/log/*.log
  type: log
output.logstash:
   hosts: ["192.168.1.1:5044"]
path:
  data: /var/cache/graylog-sidecar/filebeat/data
  logs: /var/log/graylog-sidecar

That log collector would be used to build a configuration that would look like this (pulls from “Filebeat on Linux”)

That is applied to a machine that shows up under System > Sidecars > Administration by clicking the checkbox for the configuration you want to apply, pulling down the configuration menu to the right, and selecting the configuration you want. Graylog will then copy the configuration to the machine and set the service to use it. The collector ships all its information to Graylog as messages. Graylog then handles all storage and retrieval to/from the Elasticsearch servers. (Meaning the IP in the configuration should point to a Graylog server.)
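As a rough sketch of what that could look like for the hosts in this thread, here is the default template adapted to the original poster's log path and Graylog address. It assumes a Beats input has been created in Graylog on port 5044 and that the container publishes that port; the port is a choice made when creating the input, not something confirmed in this thread.

```yaml
# Sketch of a Filebeat collector configuration as entered in Graylog.
# Assumption: a Beats input exists in Graylog and listens on port 5044.

# Needed for Graylog
fields_under_root: true
fields.collector_node_id: ${sidecar.nodeName}
fields.gl2_source_collector: ${sidecar.nodeId}

filebeat.inputs:
- type: log
  paths:
    # Log path taken from the filebeat.yml posted above
    - /opt/fdr/data/data/*/*.log

output.logstash:
  # Graylog host and Beats input port, not an Elasticsearch host or port
  hosts: ["10.49.39.48:5044"]

path:
  data: /var/cache/graylog-sidecar/filebeat/data
  logs: /var/log/graylog-sidecar
```

The key difference from the posted filebeat.yml is the output target: 1761 is the Elasticsearch transport port in that setup, and Filebeat's Logstash output cannot talk to it.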

Here is a typical installed sidecar.yml for a Windows client (Linux would be nearly the same) that helps the client show up in Graylog so you can assign a configuration to it.

server_url: http://cmg-gl01:9000/api/
server_api_token: "<a whole bunch of random stuff generated for sidecar api token>" 
update_interval: 10
tls_skip_verify: true
send_status: true
list_log_files:
collector_id: file:C:\Program Files\Graylog\sidecar\collector-id
cache_path: C:\Program Files\Graylog\sidecar\cache
log_path: C:\Program Files\Graylog\sidecar\logs
log_rotation_time: 86400
log_max_age: 604800
tags: [windows]
collector_binaries_whitelist: []
backends:
    - name: nxlog
      enabled: false
      binary_path: C:\Program Files (x86)\nxlog\nxlog.exe
      configuration_path: C:\Program Files\Graylog\sidecar\generated\nxlog.conf
    - name: winlogbeat
      enabled: true
      binary_path: C:\Program Files\Graylog\sidecar\winlogbeat.exe
      configuration_path: C:\Program Files\Graylog\sidecar\generated\winlogbeat.yml
    - name: filebeat
      enabled: true
      binary_path: C:\Program Files\Graylog\sidecar\filebeat.exe
      configuration_path: C:\Program Files\Graylog\sidecar\generated\filebeat.yml
    - name: auditbeat
      enabled: true
      binary_path: C:\Program Files\Graylog\sidecar\auditbeat.exe
      configuration_path: C:\Program Files\Graylog\sidecar\generated\auditbeat.yml

So in the end, the install on the client gets a custom sidecar.yml that tells it to report to the Graylog server. When you manually apply your configuration in Graylog (under Administration) to the client, Graylog pushes the configuration out and the client uses it to figure out what to send to Graylog.

Hope that helps a little bit for understanding how to set up… hope it wasn’t too simplified!

Thanks much for the reply!

I was under the impression that the collectors portion of the graylog-sidecar was deprecated with the newer releases, requiring you to install your own collector on your own terms and then use the Configuration/Administration section of the Settings > Sidecars area to link a sidecar and a standalone “beat.”

Did I mistake that?

The Collectors section of Graylog is deprecated and now the Sidecars area should be used. The Graylog sidecar software to install on the client has been updated; the latest release is here. This includes some basic “beats” such as filebeat or winlogbeat (Windows), but you could download other binaries from Elastic (like auditbeat) to the same area and configure them. The Graylog sidecar installation with beats allows you to manage their configuration from within Graylog.
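If you do add another beat, the sidecar will only start binaries that are on its whitelist. A minimal sketch of the relevant sidecar.yml fragment, assuming the beats were installed from the Elastic packages into the default paths shown in the commented default list of the sidecar.yml posted above:

```yaml
# Fragment of sidecar.yml (sketch only).
# Paths are assumptions based on the defaults listed in the sidecar.yml above;
# adjust them to wherever the binaries actually live on your host.
collector_binaries_whitelist:
  - "/usr/share/filebeat/bin/filebeat"
  - "/usr/share/auditbeat/bin/auditbeat"
```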

Yep, and they have removed the filebeat agent from the newest release of the graylog-sidecar.

I just confirmed this by downloading it, and it is also mentioned in both the new documentation and the commit history on that GitHub page.

So that’s not the answer to my problem.

Duly noted that filebeat is not included, though it is not that hard to download and install. I would search around for community postings that list a working Linux sidecar.yml; the ones I have seen look different…

So how would Graylog’s Settings > Sidecar > Configurations apply to the sidecar installed on a data server and also require an input to be set up on the Graylog node? Should I be seeing more options in my input dropdown, and maybe I am having a networking issue? Does the Graylog sidecar initiate settings which Filebeat recognizes and uses to rewrite its filebeat.yml file, AND do you have to create an input in Graylog to reference that same sidecar’d config?

In the /var/lib/graylog-sidecar/generated folder I get a filebeat.conf file when I make a change, NOT a filebeat.yml file. That’s why I am assuming Filebeat is directed to update certain portions of its filebeat.yml.

[screenshot]

  1. Creating an input is like opening a port on a firewall (inputs also have extractors that allow you to modify incoming messages). You need one so your sidecars can send messages in. Generally, one input is enough to receive Beats data from multiple clients. Your screenshot is choosing the Graylog server you have as the node the input will be opened on. The sidecar.yml on the client should point to the Graylog server's API, and the collector configuration's output should point to the Graylog server and the input port you have set up.

  2. When the sidecar starts up on the client, it should connect to the Graylog server listed in its sidecar.yml file (which also has connection modifiers, configuration for the local beats/nxlog to be used, etc.).

  3. Once the connection is set up, you use Graylog (as I described above) to push configuration for which areas/files to pull messages from (including restrictions on what to send). These are the sidecar/generated/<>.conf files that are pushed; they should mirror what you set up in Graylog. If there are issues, sidecar/logs/ is a good place to look for what the client is/isn’t doing. sidecar/cache keeps track of where you are in the files you are sending messages from. (Note: I am referencing a Windows system, so there may be some differences.)
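Since Graylog runs in a container here, the "open a port" analogy applies twice: the input has to exist in Graylog, and the same port has to be published by Docker. A sketch of just the ports section for the graylog service, assuming the Beats input is created on port 5044 (the image, environment, and other keys from the posted docker-compose.yml are left out for brevity):

```yaml
# Fragment of docker-compose.yml (sketch only, not a complete service definition).
services:
  graylog:
    ports:
      # Graylog web interface and REST API; the sidecar's server_url uses this
      - 80:9000
      # Beats input; must match the port chosen when creating the input in Graylog
      - 5044:5044
```

Ports 1761 and 9100 in the posted setup are the Elasticsearch transport and HTTP ports. Graylog connects out to those on the Elasticsearch hosts, so publishing them on the Graylog container should not be needed for receiving Beats data.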

This post has a lot of detail about sidecar configuration even though it is an older version of Graylog/sidecar. There are some examples of the sidecar.yml configuration in there that are close to what you want.

It seemed easier to talk through the process than to hit your questions individually… :crazy_face:
