I upgraded us from 2.1 to 2.2 without issue, but going from v2.2 to v2.3 I have run into a problem: it seems Graylog can no longer reach Elasticsearch. When I start the graylog-server service I see entries like:
2018-04-14T17:45:40.427-04:00 ERROR [Cluster] Couldn't read cluster health for indices [graylog2_*] (Could not connect to http://127.0.0.1:9200)
Here is my server.conf. As far as I know, nothing was changed when updating except for the elasticsearch_host line I added when troubleshooting; from what I understood from the upgrade article, that was the only thing that was required.
> # ======================== Elasticsearch Configuration =========================
> #
> # NOTE: Elasticsearch comes with reasonable defaults for most settings.
> # Before you set out to tweak and tune the configuration, make sure you
> # understand what are you trying to accomplish and the consequences.
> #
> # The primary way of configuring a node is via this file. This template lists
> # the most important settings you may want to configure for a production cluster.
> #
> # Please see the documentation for further information on configuration options:
> # <http://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration.html>
> #
> # ---------------------------------- Cluster -----------------------------------
> #
> # Use a descriptive name for your cluster:
> #
> # cluster.name: my-application
> cluster.name: locker
> #
> # ------------------------------------ Node ------------------------------------
> #
> # Use a descriptive name for the node:
> #
> # node.name: graylog-1
> #
> # Add custom attributes to the node:
> #
> # node.rack: r1
> #
> # ----------------------------------- Paths ------------------------------------
> #
> # Path to directory where to store the data (separate multiple locations by comma):
> #
> # path.data: /path/to/data
> path.data: /mnt/store1/elasticsearch/data
> #
> # Path to log files:
> #
> # path.logs: /path/to/logs
> path.logs: /mnt/store1/elasticsearch/logs
> path.work: /mnt/store1/elasticsearch/work
> path.plugins: /mnt/store1/elasticsearch/plugins
> path.repo: ["/mnt/store1/backups/graylog-mongodb-elasticsearch/elasticsearch"]
> #
> # ----------------------------------- Memory -----------------------------------
> #
> # Lock the memory on startup:
> #
> bootstrap.mlockall: true
> #
> # Make sure that the `ES_HEAP_SIZE` environment variable is set to about half the memory
> # available on the system and that the owner of the process is allowed to use this limit.
> #
> # Elasticsearch performs poorly when the system is swapping the memory.
> #
> # ---------------------------------- Network -----------------------------------
> #
> # Set the bind address to a specific IP (IPv4 or IPv6):
> #
> network.host: 10.1.115.2
> #
> # Set a custom port for HTTP:
> #
> # http.port: 9200
> #
> # For more information, see the documentation at:
> # <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html>
> #
> # --------------------------------- Discovery ----------------------------------
> #
> # Pass an initial list of hosts to perform discovery when new node is started:
> # The default list of hosts is ["127.0.0.1", "[::1]"]
> #
> discovery.zen.ping.multicast.enabled: false
> discovery.zen.ping.unicast.hosts: ["10.1.115.2"]
> # Prevent the "split brain" by configuring the majority of nodes (total number of nodes / 2 + 1):
> #
> # discovery.zen.minimum_master_nodes: 1
> #
> # For more information, see the documentation at:
> # <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery.html>
> #
> # ---------------------------------- Gateway -----------------------------------
> #
> # Block initial recovery after a full cluster restart until N nodes are started:
> #
> # gateway.recover_after_nodes: 1
> #
> # For more information, see the documentation at:
> # <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-gateway.html>
> #
> # ---------------------------------- Various -----------------------------------
> #
> # Disable starting multiple nodes on a single system:
> #
> # node.max_local_storage_nodes: 1
> #
> # Require explicit names when deleting indices:
> #
> # action.destructive_requires_name: true
>
> # Disable the dynamic scripting feature and prevent possible remote code executions.
> script.inline: false
> script.indexed: false
> script.file: false
You have a different IP for Elasticsearch in your elasticsearch.yml than in the elasticsearch_host line in the graylog.conf. If you don’t bind Elasticsearch to the loopback interface, you cannot reach it there.
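One quick way to check this from the Graylog host is to try a plain TCP connection to each address. The following is only a minimal sketch: `can_connect` is a made-up helper name, and the addresses and port mentioned afterwards are simply the ones from this thread.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and opens the socket in one step.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False
```

Calling `can_connect("127.0.0.1", 9200)` and `can_connect("10.1.115.2", 9200)` on the Graylog machine should show which of the two addresses Elasticsearch is actually listening on.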
Hi Jan, my mistake: that was just a test from when I tried using the loopback. I was getting the same result with both IPs set to 10.1.115.2.
I have added all the settings from your Upgrade guide linked earlier, most using the defaults. Here is a revised version of my server.conf and elasticsearch.yml, also my server.log. Would you mind looking it over again and pointing me in the right direction?
2018-04-16T13:41:25.506-04:00 INFO [CmdLineTool] Running with JVM arguments: -Xms1g -Xmx1g -XX:NewRatio=1 -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow -Dlog4j.configurationFile=file:///etc/graylog/server/log4j2.xml -Djava.library.path=/usr/share/graylog-server/lib/sigar -Dgraylog2.installation_source=rpm
2018-04-16T13:41:25.694-04:00 INFO [Version] HV000001: Hibernate Validator null
2018-04-16T13:41:27.234-04:00 INFO [InputBufferImpl] Message journal is enabled.
2018-04-16T13:41:27.254-04:00 INFO [NodeId] Node ID: a002c9da-98ce-4e77-9564-89915b10fd30
2018-04-16T13:41:27.429-04:00 INFO [LogManager] Loading logs.
2018-04-16T13:41:27.473-04:00 INFO [LogManager] Logs loading complete.
2018-04-16T13:41:27.473-04:00 INFO [KafkaJournal] Initialized Kafka based journal at /mnt/store1/graylog-server/journal
2018-04-16T13:41:27.486-04:00 INFO [InputBufferImpl] Initialized InputBufferImpl with ring size <65536> and wait strategy <BlockingWaitStrategy>, running 2 parallel message handlers.
2018-04-16T13:41:27.503-04:00 INFO [cluster] Cluster created with settings {hosts=[127.0.0.1:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=5000}
2018-04-16T13:41:27.541-04:00 INFO [cluster] No server chosen by ReadPreferenceServerSelector{readPreference=primary} from cluster description ClusterDescription{type=UNKNOWN, connectionMode=SINGLE, serverDescriptions=[ServerDescription{address=127.0.0.1:27017, type=UNKNOWN, state=CONNECTING}]}. Waiting for 30000 ms before timing out
2018-04-16T13:41:27.556-04:00 INFO [connection] Opened connection [connectionId{localValue:1, serverValue:87}] to 127.0.0.1:27017
2018-04-16T13:41:27.557-04:00 INFO [cluster] Monitor thread successfully connected to server with description ServerDescription{address=127.0.0.1:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[2, 6, 11]}, minWireVersion=0, maxWireVersion=2, maxDocumentSize=16777216, roundTripTimeNanos=400970}
2018-04-16T13:41:27.563-04:00 INFO [connection] Opened connection [connectionId{localValue:2, serverValue:88}] to 127.0.0.1:27017
2018-04-16T13:41:27.819-04:00 INFO [AbstractJestClient] Setting server pool to a list of 1 servers: [http://127.0.0.1:9200]
2018-04-16T13:41:27.820-04:00 INFO [JestClientFactory] Using multi thread/connection supporting pooling connection manager
2018-04-16T13:41:27.879-04:00 INFO [JestClientFactory] Using custom ObjectMapper instance
2018-04-16T13:41:27.879-04:00 INFO [JestClientFactory] Node Discovery disabled...
2018-04-16T13:41:27.880-04:00 INFO [JestClientFactory] Idle connection reaping disabled...
2018-04-16T13:41:28.080-04:00 INFO [ProcessBuffer] Initialized ProcessBuffer with ring size <65536> and wait strategy <BlockingWaitStrategy>.
2018-04-16T13:41:29.616-04:00 INFO [RulesEngineProvider] Using rules: /etc/graylog/server/rules.drl
2018-04-16T13:41:29.703-04:00 INFO [OutputBuffer] Initialized OutputBuffer with ring size <65536> and wait strategy <BlockingWaitStrategy>.
2018-04-16T13:41:29.957-04:00 INFO [ServerBootstrap] Graylog server 2.3.2+3df951e starting up
2018-04-16T13:41:29.957-04:00 INFO [ServerBootstrap] JRE: Oracle Corporation 1.8.0_91 on Linux 3.10.0-229.20.1.el7.x86_64
2018-04-16T13:41:29.957-04:00 INFO [ServerBootstrap] Deployment: rpm
2018-04-16T13:41:29.957-04:00 INFO [ServerBootstrap] OS: CentOS Linux 7 (Core) (centos)
2018-04-16T13:41:29.958-04:00 INFO [ServerBootstrap] Arch: amd64
2018-04-16T13:41:29.960-04:00 WARN [DeadEventLoggingListener] Received unhandled event of type <org.graylog2.plugin.lifecycles.Lifecycle> from event bus <AsyncEventBus{graylog-eventbus}>
2018-04-16T13:41:29.978-04:00 INFO [PeriodicalsService] Starting 22 periodicals ...
2018-04-16T13:41:29.979-04:00 INFO [Periodicals] Starting [org.graylog2.periodical.ThroughputCalculator] periodical in [0s], polling every [1s].
2018-04-16T13:41:29.981-04:00 INFO [Periodicals] Starting [org.graylog2.periodical.AlertScannerThread] periodical in [10s], polling every [45s].
2018-04-16T13:41:29.982-04:00 INFO [Periodicals] Starting [org.graylog2.periodical.BatchedElasticSearchOutputFlushThread] periodical in [0s], polling every [1s].
2018-04-16T13:41:29.984-04:00 INFO [Periodicals] Starting [org.graylog2.periodical.ClusterHealthCheckThread] periodical in [120s], polling every [20s].
2018-04-16T13:41:29.984-04:00 INFO [PeriodicalsService] Not starting [org.graylog2.periodical.ContentPackLoaderPeriodical] periodical. Not configured to run on this node.
2018-04-16T13:41:29.984-04:00 INFO [Periodicals] Starting [org.graylog2.periodical.GarbageCollectionWarningThread] periodical, running forever.
2018-04-16T13:41:29.985-04:00 INFO [Periodicals] Starting [org.graylog2.periodical.IndexerClusterCheckerThread] periodical in [0s], polling every [30s].
2018-04-16T13:41:29.986-04:00 INFO [Periodicals] Starting [org.graylog2.periodical.IndexRetentionThread] periodical in [0s], polling every [300s].
2018-04-16T13:41:29.987-04:00 INFO [Periodicals] Starting [org.graylog2.periodical.IndexRotationThread] periodical in [0s], polling every [10s].
2018-04-16T13:41:29.988-04:00 INFO [Periodicals] Starting [org.graylog2.periodical.NodePingThread] periodical in [0s], polling every [1s].
2018-04-16T13:41:29.989-04:00 INFO [Periodicals] Starting [org.graylog2.periodical.VersionCheckThread] periodical in [300s], polling every [1800s].
2018-04-16T13:41:29.990-04:00 INFO [Periodicals] Starting [org.graylog2.periodical.ThrottleStateUpdaterThread] periodical in [1s], polling every [1s].
2018-04-16T13:41:29.990-04:00 INFO [Periodicals] Starting [org.graylog2.events.ClusterEventPeriodical] periodical in [0s], polling every [1s].
2018-04-16T13:41:29.991-04:00 INFO [Periodicals] Starting [org.graylog2.events.ClusterEventCleanupPeriodical] periodical in [0s], polling every [86400s].
2018-04-16T13:41:29.991-04:00 INFO [Periodicals] Starting [org.graylog2.periodical.ClusterIdGeneratorPeriodical] periodical, running forever.
2018-04-16T13:41:29.992-04:00 INFO [Periodicals] Starting [org.graylog2.periodical.IndexRangesMigrationPeriodical] periodical, running forever.
2018-04-16T13:41:29.992-04:00 INFO [Periodicals] Starting [org.graylog2.periodical.IndexRangesCleanupPeriodical] periodical in [15s], polling every [3600s].
2018-04-16T13:41:29.992-04:00 INFO [connection] Opened connection [connectionId{localValue:3, serverValue:89}] to 127.0.0.1:27017
2018-04-16T13:41:29.992-04:00 INFO [connection] Opened connection [connectionId{localValue:4, serverValue:90}] to 127.0.0.1:27017
2018-04-16T13:41:29.999-04:00 INFO [connection] Opened connection [connectionId{localValue:5, serverValue:91}] to 127.0.0.1:27017
2018-04-16T13:41:30.002-04:00 INFO [PeriodicalsService] Not starting [org.graylog2.periodical.UserPermissionMigrationPeriodical] periodical. Not configured to run on this node.
2018-04-16T13:41:30.002-04:00 INFO [Periodicals] Starting [org.graylog2.periodical.AlarmCallbacksMigrationPeriodical] periodical, running forever.
2018-04-16T13:41:30.002-04:00 INFO [Periodicals] Starting [org.graylog2.periodical.ConfigurationManagementPeriodical] periodical, running forever.
2018-04-16T13:41:30.006-04:00 INFO [PeriodicalsService] Not starting [org.graylog2.periodical.LdapGroupMappingMigration] periodical. Not configured to run on this node.
2018-04-16T13:41:30.007-04:00 INFO [Periodicals] Starting [org.graylog2.periodical.IndexFailuresPeriodical] periodical, running forever.
2018-04-16T13:41:30.036-04:00 INFO [IndexRetentionThread] Elasticsearch cluster not available, skipping index retention checks.
2018-04-16T13:41:30.037-04:00 ERROR [Cluster] Couldn't read cluster health for indices [graylog2_*] (Could not connect to http://127.0.0.1:9200)
2018-04-16T13:41:30.037-04:00 INFO [IndexerClusterCheckerThread] Indexer not fully initialized yet. Skipping periodic cluster check.
2018-04-16T13:41:30.046-04:00 INFO [JerseyService] Enabling CORS for HTTP endpoint
2018-04-16T13:41:30.051-04:00 INFO [V20161130141500_DefaultStreamRecalcIndexRanges] Cluster not connected yet, delaying migration until it is reachable.
2018-04-16T13:41:37.817-04:00 INFO [NetworkListener] Started listener bound to [0.0.0.0:12900]
2018-04-16T13:41:37.818-04:00 INFO [HttpServer] [HttpServer] Started.
2018-04-16T13:41:37.818-04:00 INFO [JerseyService] Started REST API at <http://0.0.0.0:12900/>
2018-04-16T13:41:39.996-04:00 INFO [NetworkListener] Started listener bound to [127.0.0.1:9000]
2018-04-16T13:41:39.997-04:00 INFO [HttpServer] [HttpServer-1] Started.
2018-04-16T13:41:39.997-04:00 INFO [JerseyService] Started Web Interface at <http://127.0.0.1:9000/>
2018-04-16T13:41:39.998-04:00 INFO [ServiceManagerListener] Services are healthy
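For a log this long, it can help to pull out just the two things that matter here: the Elasticsearch address Graylog says it is using, and any ERROR lines. The following is a rough sketch that relies only on the line formats visible in the log above; the function names are invented for illustration.

```python
import re

# Matches the JestClient startup line, e.g.
# "... Setting server pool to a list of 1 servers: [http://127.0.0.1:9200]"
POOL_RE = re.compile(r"Setting server pool to a list of \d+ servers: \[(.*?)\]")


def server_pool(log_text: str) -> list:
    """Return the URIs from the last 'Setting server pool' line, or [] if none."""
    matches = POOL_RE.findall(log_text)
    if not matches:
        return []
    return [uri.strip() for uri in matches[-1].split(",") if uri.strip()]


def error_lines(log_text: str) -> list:
    """Return every log line whose level is ERROR."""
    return [line for line in log_text.splitlines() if " ERROR " in line]
```

If `server_pool` reports an address other than the one written in server.conf, the configured value is not being picked up.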
I would like to answer your question with a link to the elastic discourse:
Graylog now uses the REST interface of Elasticsearch, which is why you need to connect to port 9200 (if you did not change that in the Elasticsearch configuration).
If you bind something to a specific interface, it will only answer calls on that interface, not on any other, unless you bind it to multiple interfaces. But you bound Elasticsearch to one specific interface.
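Because the connection is plain HTTP, the same request can be tried by hand from the Graylog host. The following is a minimal sketch, assuming the standard Elasticsearch `_cluster/health` endpoint; the function names are hypothetical and the base URI is whatever you configured.

```python
import json
import urllib.error
import urllib.request


def cluster_health_url(base_uri: str) -> str:
    """Build the cluster health URL for a base URI such as http://host:9200."""
    return base_uri.rstrip("/") + "/_cluster/health"


def fetch_cluster_health(base_uri: str, timeout: float = 5.0):
    """Return the parsed health document, or None if the node cannot be reached."""
    try:
        with urllib.request.urlopen(cluster_health_url(base_uri), timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error or a non-JSON body.
        return None
```

A `None` result for `http://127.0.0.1:9200` together with a real answer for the bound address would match the behaviour described above.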
In addition, remove all settings from the server.conf that are mentioned in the upgrade guide as being removed …
I don’t understand your point. In the last post I set Graylog to use the REST interface of Elasticsearch on port 9200… have I not? My Elasticsearch is bound to that same IP address. I suspect you may have been reading my post while I was still editing it.
Also, I did remove all the settings mentioned in your upgrade guide; they are commented out in the server.conf in my last post.
What I see: server.log in the above post shows Graylog trying to reach Elasticsearch at 127.0.0.1:9200, but I have 10.1.115.2:9200 defined in server.conf for elasticsearch_host, so why is it using the loopback? Also note I have 10.1.115.2 bound in elasticsearch.yml. Is there a new setting I need there?
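One thing that may be worth ruling out is whether the key in server.conf is spelled exactly as the upgrade notes give it. The sketch below rests on assumptions that should be checked against the upgrade guide: that the 2.3 key is `elasticsearch_hosts` (plural, a comma-separated list of URIs), that an unrecognized key is silently ignored, and that the fallback is the `http://127.0.0.1:9200` address seen in the log. The helper names are made up.

```python
# Assumed default, taken from the address Graylog reports in server.log.
DEFAULT_HOSTS = ["http://127.0.0.1:9200"]


def parse_conf(text: str) -> dict:
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def effective_es_hosts(text: str, key: str = "elasticsearch_hosts") -> list:
    """Return the hosts set under `key`, or the assumed default if it is absent."""
    value = parse_conf(text).get(key)
    if not value:
        return list(DEFAULT_HOSTS)
    return [host.strip() for host in value.split(",") if host.strip()]
```

Under those assumptions, a file that only contains `elasticsearch_host = ...` would still resolve to the loopback default, which is the behaviour the log shows.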