Friends! I have been trying for the last two weeks to get this project up and running.
I will include all the details I can, but I am running out of ideas for troubleshooting. I have double-checked all the compatibility matrices I can find and have tried different versions of ES and Filebeat, but I am still having problems.
The sticking point right now is that I am unable to ship data anywhere, even though the networking rules are in place (telnet and netcat both connect).
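For anyone who wants to repeat the check without telnet or netcat, this is roughly what those tests amount to. It is only a sketch: `port_open` is a helper name I made up, and the host and ports in the usage comment are the ones from my setup. A successful connect only shows that something is listening on the port, not that it speaks the protocol Filebeat expects.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example usage against my Graylog host (adjust to your environment):
#   port_open("10.49.39.48", 5044)
#   port_open("10.49.39.48", 1761)
```

Both calls in the usage comment return True from my Filebeat host, which matches what telnet and netcat show.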
I think this is due to my mistaken assumption that Filebeat was supposed to send directly to ES, and that Graylog was somehow aware of that. But you create a Beats input in Graylog, and updating filebeat.conf for a sidecar is supposed to trigger a rewrite of filebeat.yml (functionality I was unable to get working, so I created filebeat.yml manually). After ruminating on that, it seems that Filebeat actually feeds another Filebeat that you are supposed to have configured on the Graylog “master” node as well?!? This misunderstanding of the roles and responsibilities within the stack is probably my issue. I have not found good documentation or other posts from people who have containerized only Graylog and connected it directly to ES clusters and data sources.
I have referenced the sidecar architecture diagrams, as well as the high-level architecture diagrams for Graylog, but I am still not exactly sure what port 5044 (the Beats input default?) is supposed to do. The tcpTransportPort seems to be the manual override for data transfer to ES, so what does that have to do with Graylog?
Anyway, I was hoping you all could help point out the flaw in my configs that is stopping me from ingesting data. Please find my information below, and let me know what else you need. This is a real head-scratcher for me.
POI’s:
- Using non-standard ports because they are already open in my development environment from another project. I would use the standard ports in a production setup.
- CentOS Linux release 7.7.1908 - is on all servers in question.
- Graylog sees the ES cluster, and reports it in a green state.
- ES cluster is bootstrapped and in green status, with node 1 being the master.
- Filebeat service is running, no errors. (/var/log/filebeat/)
- Graylog-sidecar service is running, no errors. (/var/log/graylog-sidecar/)
- Elasticsearch is running, no errors. (/var/log/elasticsearch/*)
- Filebeat does recognize there are files to pickup (Had feedback from harvesters starting once - Maybe filebeat is smart and realized it’s already tried to read these back when my config was “bad”?)
Resource Legend:
10.49.39.49 = Filebeat / Graylog-Sidecar 1.0.2 / raw data source
10.49.39.48 = Graylog Dockerized w/mongo
10.49.39.163 = Master ElasticSearch (Node 1)
10.49.40.120 = Elasticsearch Data (Node 2)
10.49.40.116 = Elasticsearch Data (Node 3)
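One thing worth ruling out is a plain address typo between this legend and the configs. A minimal sketch of that cross-check, assuming a config line is pasted in as a string (`unknown_ips` and `LEGEND` are names I made up for illustration): it flags every IPv4 address in the text that is not in the legend above.

```python
import re

# Matches dotted-quad IPv4 addresses; octet ranges are not validated.
IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Addresses from the resource legend above.
LEGEND = {
    "10.49.39.49",   # Filebeat / Graylog-Sidecar
    "10.49.39.48",   # Graylog (Docker)
    "10.49.39.163",  # Elasticsearch node 1
    "10.49.40.120",  # Elasticsearch node 2
    "10.49.40.116",  # Elasticsearch node 3
}


def unknown_ips(config_text: str, known: set) -> set:
    """Return IPv4 addresses found in config_text that are not in known."""
    return set(IP_PATTERN.findall(config_text)) - known


# The GRAYLOG_ELASTICSEARCH_HOSTS value from my docker-compose.yml.
sample = (
    "GRAYLOG_ELASTICSEARCH_HOSTS=http://10.49.39.163:9100,"
    "http://10.49.37.120:9100,http://10.49.39.116:9100"
)
print(sorted(unknown_ips(sample, LEGEND)))
# prints ['10.49.37.120', '10.49.39.116']
```

Run against that one line, it reports two addresses that do not appear in the legend, so either the legend or the config may need correcting.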
graylog docker-compose.yml
version: '2'
services:
  # MongoDB: https://hub.docker.com/_/mongo/
  mongodb:
    image: mongo:3
  # Elasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/6.x/docker.html
  #elasticsearch:
  #  image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.8.2
  #  environment:
  #    - http.host=10.49.39.163
  #    - transport.host=10.49.39.163:9300
  #    - network.host=0.0.0.0
  #    - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
  #  ulimits:
  #    memlock:
  #      soft: -1
  #      hard: -1
  #  mem_limit: 1g
  # Graylog: https://hub.docker.com/r/graylog/graylog/
  graylog:
    image: graylog/graylog:3.1
    environment:
      # CHANGE ME (must be at least 16 characters)!
      - GRAYLOG_PASSWORD_SECRET=ThisIsAReallyCoolPassword
      # Password: ThisIsAReallyCoolPassword
      - GRAYLOG_ROOT_PASSWORD_SHA2=2a37dec9db8bde8760db6f4e55bd38dc9e86f2b374c2f4a4acafe02fc7ad43a4
      - GRAYLOG_HTTP_EXTERNAL_URI=http://10.49.39.48/
      - GRAYLOG_ELASTICSEARCH_HOSTS=http://10.49.39.163:9100,http://10.49.37.120:9100,http://10.49.39.116:9100
      #- GRAYLOG_ELASTICSEARCH_HOSTS=http://10.49.39.48:9100
      #- GRAYLOG_ELASTICSEARCH_DISCOVERY_ZEN_PING_UNICAST_HOSTS=10.49.39.163:9100, 10.49.37.120:9100, 10.49.39.116:9100
      - GRAYLOG_ELASTICSEARCH_DISCOVERY_ENABLED=true
      #- GRAYLOG_ELASTICSEARCH_NETWORK_HOST=10.49.39.163
      - GRAYLOG_ELASTICSEARCH_CLUSTER_NAME=graylog
      # - GRAYLOG_HTTP_BIND_ADDRESS=10.49.39.48:9000
    links:
      - mongodb:mongo
      #- elasticsearch
    depends_on:
      - mongodb
      # - elasticsearch
    ports:
      # Graylog web interface and REST API
      - 80:9000
      # tcpTransportPort
      - 1761:1761
      # beatsManagementPort
      - 5044:5044
      # httpElasticSearchPort
      - 9100:9100
      # Syslog TCP
      - 1514:1514
      # Syslog UDP
      - 1514:1514/udp
      # GELF TCP
      - 12201:12201
      # GELF UDP
      - 12201:12201/udp
    volumes:
      - /opt/graylog:/var/lib/graylog/data
elasticsearch.yml
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: graylog
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: node-1
node.master: true
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /var/lib/elasticsearch
#
# Path to log files:
#
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
network.host: 0.0.0.0
# Set a custom port for HTTP:
#
http.port: 9100
transport.tcp.port: 1761
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
discovery.zen.ping.unicast.hosts: ["10.49.39.200", "10.49.37.183", "10.49.39.163"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
#cluster.initial_master_nodes: ["node-1", "node-2"]
discovery.zen.minimum_master_nodes: 1
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
filebeat.yml
###################### Filebeat Configuration Example #########################
# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html
# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.
#=========================== Filebeat inputs =============================
filebeat.inputs:
# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
- type: log
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /opt/fdr/data/data/*/*.log
    #- c:\programdata\elasticsearch\logs\*
# Exclude lines. A list of regular expressions to match. It drops the lines that are
# matching any regular expression from the list.
#exclude_lines: ['^DBG']
# Include lines. A list of regular expressions to match. It exports the lines that are
# matching any regular expression from the list.
#include_lines: ['^ERR', '^WARN']
# Exclude files. A list of regular expressions to match. Filebeat drops the files that
# are matching any regular expression from the list. By default, no files are dropped.
#exclude_files: ['.gz$']
# Optional additional fields. These fields can be freely picked
# to add additional information to the crawled log files for filtering
#fields:
# level: debug
# review: 1
### Multiline options
# Multiline can be used for log messages spanning multiple lines. This is common
# for Java Stack Traces or C-Line Continuation
# The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
#multiline.pattern: ^\[
# Defines if the pattern set under pattern should be negated or not. Default is false.
#multiline.negate: false
# Match can be set to "after" or "before". It is used to define if lines should be appended to a pattern
# that was (not) matched before or after or as long as a pattern is not matched based on negate.
# Note: After is the equivalent to previous and before is the equivalent to next in Logstash
#multiline.match: after
#============================= Filebeat modules ===============================
filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml
#filebeat.registry.path:
#filebeat.registry_file: /var/lib/graylog-sidecar/collectors/filebeat/data/registry/filebeat/data.json
  # Set to true to enable config reloading
  reload.enabled: false
  # Period on which files under path should be checked for changes
  #reload.period: 10s
#==================== Elasticsearch template setting ==========================
setup.template.settings:
  index.number_of_shards: 3
  #index.codec: best_compression
  #_source.enabled: false
#================================ General =====================================
# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:
# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]
# Optional fields that you can specify to add additional information to the
# output.
#fields:
# env: staging
#============================== Dashboards =====================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here, or by using the `-setup` CLI flag or the `setup` command.
#setup.dashboards.enabled: false
# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:
#============================== Kibana =====================================
# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:
# Kibana Host
# Scheme and port can be left out and will be set to the default (http and 5601)
# In case you specify and additional path, the scheme is required: http://localhost:5601/path
# IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
#host: "localhost:5601"
# Kibana Space ID
# ID of the Kibana Space into which the dashboards should be loaded. By default,
# the Default Space will be used.
#space.id:
#============================= Elastic Cloud ==================================
# These settings simplify using filebeat with the Elastic Cloud (https://cloud.elastic.co/).
# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:
# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:
#================================ Outputs =====================================
# Configure what output to use when sending the data collected by the beat.
#-------------------------- Elasticsearch output ------------------------------
#output.elasticsearch:
# Array of hosts to connect to.
# hosts: ["10.49.39.48:1761"]
# Enabled ilm (beta) to use index lifecycle management instead daily indices.
#ilm.enabled: false
# Optional protocol and basic auth credentials.
#protocol: "https"
#username: "elastic"
#password: "changeme"
#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["10.49.39.48:1761"]
  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"
  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"
#================================ Processors =====================================
# Configure processors to enhance or manipulate events generated by the beat.
processors:
- add_host_metadata: ~
- add_cloud_metadata: ~
#================================ Logging =====================================
# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug
# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]
#============================== Xpack Monitoring ===============================
# filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster. This requires xpack monitoring to be enabled in Elasticsearch. The
# reporting is disabled by default.
# Set to true to enable the monitoring reporter.
#xpack.monitoring.enabled: false
# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well. Any setting that is not set is
# automatically inherited from the Elasticsearch output configuration, so if you
# have the Elasticsearch output configured, you can simply uncomment the
# following line.
#xpack.monitoring.elasticsearch:
sidecar.yml
# The URL to the Graylog server API.
server_url: "http://10.49.39.48/api/"
# The API token to use to authenticate against the Graylog server API.
# This field is mandatory
server_api_token: "128b9hovgbgmg66bd6uhitnjrbm461hjh83kchvb93d18pf5epef"
# The node ID of the sidecar. This can be a path to a file or an ID string.
# If set to a file and the file doesn't exist, the sidecar will generate an
# unique ID and writes it to the configured path.
#
# Example file path: "file:/etc/graylog/sidecar/node-id"
# Example ID string: "6033137e-d56b-47fc-9762-cd699c11a5a9"
#
# ATTENTION: Every sidecar instance needs a unique ID!
#
#node_id: "file:/etc/graylog/sidecar/node-id"
# The node name of the sidecar. If this is empty, the sidecar will use the
# hostname of the host it is running on.
node_name: "fdr_sidecar"
# The update interval in seconds. This configures how often the sidecar will
# contact the Graylog server for keep-alive and configuration update requests.
#update_interval: 10
# This configures if the sidecar should skip the verification of TLS connections.
# Default: false
#tls_skip_verify: false
# This enables/disables the transmission of detailed sidecar information like
# collector statuses, metrics and log file lists. It can be disabled to reduce
# load on the Graylog server if needed. (disables some features in the server UI)
send_status: true
# A list of directories to scan for log files. The sidecar will scan each
# directory for log files and submits them to the server on each update.
#
# Example:
# list_log_files:
# - "/var/log/filebeat/"
# - "/opt/app/logs"
#
# Default: empty list
#list_log_files: []
# Directory where the sidecar stores internal data.
#cache_path: "/var/cache/graylog-sidecar"
# Directory where the sidecar stores logs for collectors and the sidecar itself.
#log_path: "/var/log/graylog-sidecar"
# The maximum size of the log file before it gets rotated.
#log_rotate_max_file_size: "10MiB"
# The maximum number of old log files to retain.
#log_rotate_keep_files: 10
# Directory where the sidecar generates configurations for collectors.
collector_configuration_directory: "/var/lib/graylog-sidecar/generated"
# A list of binaries which are allowed to be executed by the Sidecar. An empty list disables the whitelist feature.
# Wildcards can be used, for a full pattern description see https://golang.org/pkg/path/filepath/#Match
# Example:
collector_binaries_whitelist:
- "/usr/bin/filebeat"
- "/usr/share/filebeat/bin/filebeat"
# - "*"
# - "/opt/collectors/*"
#
# Example disable whitelisting:
# collector_binaries_whitelist: []
#
# Default:
#collector_binaries_whitelist:
# - "bin/filebeat"
# - "/usr/bin/packetbeat"
# - "/usr/bin/metricbeat"
# - "/usr/bin/heartbeat"
# - "/usr/bin/auditbeat"
# - "/usr/bin/journalbeat"
# - "/usr/share/filebeat/bin/filebeat"
# - "/usr/share/packetbeat/bin/packetbeat"
# - "/usr/share/metricbeat/bin/metricbeat"
# - "/usr/share/heartbeat/bin/heartbeat"
# - "/usr/share/auditbeat/bin/auditbeat"
# - "/usr/share/journalbeat/bin/journalbeat"
# - "/usr/bin/nxlog"
# - "/opt/nxlog/bin/nxlog"