Graylog / ElasticSearch Index Error - Graylog Dead

Hi

this morning we encountered an error message in Graylog reporting that about 150,000 messages could not be indexed.

I tried to change the index from 1 shard to 2 shards and used the option to reindex (or something similar) in the Graylog web interface.

Since then, the index is no longer working and only raises exceptions in /var/log/graylog-server/server.log and /var/log/elasticsearch.

We are also unable to search or do anything within Graylog, except starting and stopping inputs.

MongoDB:

2018-03-28T09:27:51.411+0200 ***** SERVER RESTARTED *****
2018-03-28T09:27:51.414+0200 [initandlisten] MongoDB starting : pid=13245 port=27017 dbpath=/media/data/mongodb/db/ 64-bit host=SYSLOG01
2018-03-28T09:27:51.414+0200 [initandlisten]
2018-03-28T09:27:51.414+0200 [initandlisten] ** WARNING: You are running on a NUMA machine.
2018-03-28T09:27:51.414+0200 [initandlisten] **          We suggest launching mongod like this to avoid performance problems:
2018-03-28T09:27:51.414+0200 [initandlisten] **              numactl --interleave=all mongod [other options]
2018-03-28T09:27:51.414+0200 [initandlisten]
2018-03-28T09:27:51.414+0200 [initandlisten] db version v2.6.10
2018-03-28T09:27:51.414+0200 [initandlisten] git version: nogitversion
2018-03-28T09:27:51.414+0200 [initandlisten] OpenSSL version: OpenSSL 1.0.2g  1 Mar 2016
2018-03-28T09:27:51.414+0200 [initandlisten] build info: Linux lgw01-12 3.19.0-25-generic #26~14.04.1-Ubuntu SMP Fri Jul 24 21:16:20 UTC 2015 x86_64 BOOST_LIB_VERSION=1_58
2018-03-28T09:27:51.414+0200 [initandlisten] allocator: tcmalloc
2018-03-28T09:27:51.414+0200 [initandlisten] options: { config: "/etc/mongodb.conf", net: { bindIp: "127.0.0.1" }, storage: { dbPath: "/media/data/mongodb/db/", journal: { enabled: true } }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongodb.log" } }
2018-03-28T09:27:51.417+0200 [initandlisten] journal dir=/media/data/mongodb/db/journal
2018-03-28T09:27:51.417+0200 [initandlisten] recover : no journal files present, no recovery needed
2018-03-28T09:27:51.505+0200 [initandlisten] waiting for connections on port 27017
2018-03-28T09:28:51.507+0200 [clientcursormon] mem (MB) res:60 virt:795
2018-03-28T09:28:51.507+0200 [clientcursormon]  mapped (incl journal view):576
2018-03-28T09:28:51.507+0200 [clientcursormon]  connections:0
2018-03-28T09:30:43.110+0200 [initandlisten] connection accepted from 127.0.0.1:37300 #1 (1 connection now open)
2018-03-28T09:30:43.150+0200 [initandlisten] connection accepted from 127.0.0.1:37302 #2 (2 connections now open)
2018-03-28T09:30:45.773+0200 [initandlisten] connection accepted from 127.0.0.1:37304 #3 (3 connections now open)
2018-03-28T09:31:01.606+0200 [initandlisten] connection accepted from 127.0.0.1:37306 #4 (4 connections now open)
2018-03-28T09:31:01.607+0200 [initandlisten] connection accepted from 127.0.0.1:37308 #5 (5 connections now open)
2018-03-28T09:31:01.630+0200 [initandlisten] connection accepted from 127.0.0.1:37310 #6 (6 connections now open)
2018-03-28T09:31:01.630+0200 [initandlisten] connection accepted from 127.0.0.1:37312 #7 (7 connections now open)

Elasticsearch (/var/log/elasticsearch):

[2018-03-28 09:29:35,814][INFO ][node                     ] [Ghost Dancer] version[2.4.6], pid[13293], build[5376dca/2017-07-18T12:17:44Z]
[2018-03-28 09:29:35,815][INFO ][node                     ] [Ghost Dancer] initializing ...
[2018-03-28 09:29:36,170][INFO ][plugins                  ] [Ghost Dancer] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
[2018-03-28 09:29:36,209][INFO ][env                      ] [Ghost Dancer] using [1] data paths, mounts [[/media/data (/dev/sdb1)]], net usable_space [5.8tb], net total_space [8.6tb], spins? [no], types [ext4]
[2018-03-28 09:29:36,209][INFO ][env                      ] [Ghost Dancer] heap size [9.8gb], compressed ordinary object pointers [true]
[2018-03-28 09:29:37,855][INFO ][node                     ] [Ghost Dancer] initialized
[2018-03-28 09:29:37,855][INFO ][node                     ] [Ghost Dancer] starting ...
[2018-03-28 09:29:38,086][INFO ][transport                ] [Ghost Dancer] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[2018-03-28 09:29:38,091][INFO ][discovery                ] [Ghost Dancer] graylog/lCr9JHLhRZWbydHnzZxXpg
[2018-03-28 09:29:41,170][INFO ][cluster.service          ] [Ghost Dancer] new_master {Ghost Dancer}{lCr9JHLhRZWbydHnzZxXpg}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2018-03-28 09:29:41,302][INFO ][http                     ] [Ghost Dancer] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[2018-03-28 09:29:41,303][INFO ][node                     ] [Ghost Dancer] started
[2018-03-28 09:29:42,009][INFO ][gateway                  ] [Ghost Dancer] recovered [143] indices into cluster_state

Graylog (/var/log/graylog-server/server.log):

2018-03-28T09:30:40.113+02:00 INFO  [CmdLineTool] Loaded plugin: AWS plugins 2.4.3 [org.graylog.aws.plugin.AWSPlugin]
2018-03-28T09:30:40.115+02:00 INFO  [CmdLineTool] Loaded plugin: Elastic Beats Input 2.4.3 [org.graylog.plugins.beats.BeatsInputPlugin]
2018-03-28T09:30:40.115+02:00 INFO  [CmdLineTool] Loaded plugin: CEF Input 1.0.0 [org.graylog.plugins.cef.CEFInputPlugin]
2018-03-28T09:30:40.115+02:00 INFO  [CmdLineTool] Loaded plugin: Collector 2.4.3 [org.graylog.plugins.collector.CollectorPlugin]
2018-03-28T09:30:40.116+02:00 INFO  [CmdLineTool] Loaded plugin: Enterprise Integration Plugin 2.4.3 [org.graylog.plugins.enterprise_integration.EnterpriseIntegrationPlugin]
2018-03-28T09:30:40.117+02:00 INFO  [CmdLineTool] Loaded plugin: Internal Logs plugin 2.4.0 [org.graylog.plugins.internallogs.InternalLogsInputPlugin]
2018-03-28T09:30:40.117+02:00 INFO  [CmdLineTool] Loaded plugin: MapWidgetPlugin 2.4.3 [org.graylog.plugins.map.MapWidgetPlugin]
2018-03-28T09:30:40.117+02:00 INFO  [CmdLineTool] Loaded plugin: NetFlow Plugin 2.4.3 [org.graylog.plugins.netflow.NetFlowPlugin]
2018-03-28T09:30:40.122+02:00 INFO  [CmdLineTool] Loaded plugin: Pipeline Processor Plugin 2.2.0 [org.graylog.plugins.pipelineprocessor.ProcessorPlugin]
2018-03-28T09:30:40.123+02:00 INFO  [CmdLineTool] Loaded plugin: Threat Intelligence Plugin 2.4.3 [org.graylog.plugins.threatintel.ThreatIntelPlugin]
2018-03-28T09:30:40.123+02:00 INFO  [CmdLineTool] Loaded plugin: SnmpPlugin 0.3.0 [org.graylog.snmp.SnmpPlugin]
2018-03-28T09:30:40.366+02:00 INFO  [CmdLineTool] Running with JVM arguments: -Xms15g -Xmx15g -XX:NewRatio=1 -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow -Dlog4j.configurationFile=file:///etc/graylog/server/log4j2.xml -Djava.library.path=/usr/share/graylog-server/lib/sigar -Dgraylog2.installation_source=deb
2018-03-28T09:30:40.526+02:00 INFO  [Version] HV000001: Hibernate Validator 5.1.3.Final
2018-03-28T09:30:42.298+02:00 INFO  [InputBufferImpl] Message journal is enabled.
2018-03-28T09:30:42.313+02:00 INFO  [NodeId] Node ID: 689bbe87-2d30-46b1-b1b0-bab46e8f5e0d
2018-03-28T09:30:42.475+02:00 INFO  [LogManager] Loading logs.
2018-03-28T09:30:42.507+02:00 WARN  [Log] Found a corrupted index file, /var/lib/graylog-server/journal/messagejournal-0/00000000002887846478.index, deleting and rebuilding index...
2018-03-28T09:30:43.021+02:00 INFO  [LogManager] Logs loading complete.
2018-03-28T09:30:43.022+02:00 INFO  [KafkaJournal] Initialized Kafka based journal at /var/lib/graylog-server/journal
2018-03-28T09:30:43.049+02:00 INFO  [InputBufferImpl] Initialized InputBufferImpl with ring size <65536> and wait strategy <BlockingWaitStrategy>, running 20 parallel message handlers.
2018-03-28T09:30:43.068+02:00 INFO  [cluster] Cluster created with settings {hosts=[localhost:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=5000}
2018-03-28T09:30:43.117+02:00 INFO  [cluster] No server chosen by ReadPreferenceServerSelector{readPreference=primary} from cluster description ClusterDescription{type=UNKNOWN, connectionMode=SINGLE, serverDescriptions=[ServerDescription{address=localhost:27017, type=UNKNOWN, state=CONNECTING}]}. Waiting for 30000 ms before timing out
2018-03-28T09:30:43.138+02:00 INFO  [connection] Opened connection [connectionId{localValue:1, serverValue:1}] to localhost:27017
2018-03-28T09:30:43.140+02:00 INFO  [cluster] Monitor thread successfully connected to server with description ServerDescription{address=localhost:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[2, 6, 10]}, minWireVersion=0, maxWireVersion=2, maxDocumentSize=16777216, roundTripTimeNanos=487680}
2018-03-28T09:30:43.153+02:00 INFO  [connection] Opened connection [connectionId{localValue:2, serverValue:2}] to localhost:27017
2018-03-28T09:30:43.469+02:00 INFO  [AbstractJestClient] Setting server pool to a list of 1 servers: [http://127.0.0.1:9200]
2018-03-28T09:30:43.470+02:00 INFO  [JestClientFactory] Using multi thread/connection supporting pooling connection manager
2018-03-28T09:30:43.525+02:00 INFO  [JestClientFactory] Using custom ObjectMapper instance
2018-03-28T09:30:43.525+02:00 INFO  [JestClientFactory] Node Discovery disabled...
2018-03-28T09:30:43.525+02:00 INFO  [JestClientFactory] Idle connection reaping disabled...
2018-03-28T09:30:43.730+02:00 INFO  [ProcessBuffer] Initialized ProcessBuffer with ring size <131072> and wait strategy <BlockingWaitStrategy>.
2018-03-28T09:30:45.295+02:00 INFO  [RulesEngineProvider] Using rules: /etc/graylog/server/rules.drl
2018-03-28T09:30:45.552+02:00 INFO  [OutputBuffer] Initialized OutputBuffer with ring size <131072> and wait strategy <BlockingWaitStrategy>.
2018-03-28T09:30:45.774+02:00 INFO  [connection] Opened connection [connectionId{localValue:3, serverValue:3}] to localhost:27017
2018-03-28T09:31:01.535+02:00 INFO  [ServerBootstrap] Graylog server 2.4.3+2c41897 starting up
2018-03-28T09:31:01.536+02:00 INFO  [ServerBootstrap] JRE: Oracle Corporation 1.8.0_151 on Linux 4.4.0-116-generic
2018-03-28T09:31:01.536+02:00 INFO  [ServerBootstrap] Deployment: deb
2018-03-28T09:31:01.536+02:00 INFO  [ServerBootstrap] OS: Ubuntu 16.04.4 LTS (xenial)
2018-03-28T09:31:01.536+02:00 INFO  [ServerBootstrap] Arch: amd64
2018-03-28T09:31:01.539+02:00 WARN  [DeadEventLoggingListener] Received unhandled event of type <org.graylog2.plugin.lifecycles.Lifecycle> from event bus <AsyncEventBus{graylog-eventbus}>
2018-03-28T09:31:01.564+02:00 INFO  [PeriodicalsService] Starting 25 periodicals ...
2018-03-28T09:31:01.565+02:00 INFO  [Periodicals] Starting [org.graylog2.periodical.ThroughputCalculator] periodical in [0s], polling every [1s].
2018-03-28T09:31:01.571+02:00 INFO  [Periodicals] Starting [org.graylog2.periodical.AlertScannerThread] periodical in [10s], polling every [60s].
2018-03-28T09:31:01.585+02:00 INFO  [Periodicals] Starting [org.graylog2.periodical.BatchedElasticSearchOutputFlushThread] periodical in [0s], polling every [1s].
2018-03-28T09:31:01.590+02:00 INFO  [Periodicals] Starting [org.graylog2.periodical.ClusterHealthCheckThread] periodical in [120s], polling every [20s].
2018-03-28T09:31:01.591+02:00 INFO  [Periodicals] Starting [org.graylog2.periodical.ContentPackLoaderPeriodical] periodical, running forever.
2018-03-28T09:31:01.595+02:00 INFO  [Periodicals] Starting [org.graylog2.periodical.GarbageCollectionWarningThread] periodical, running forever.
2018-03-28T09:31:01.597+02:00 INFO  [Periodicals] Starting [org.graylog2.periodical.IndexerClusterCheckerThread] periodical in [0s], polling every [30s].
2018-03-28T09:31:01.599+02:00 INFO  [Periodicals] Starting [org.graylog2.periodical.IndexRetentionThread] periodical in [0s], polling every [300s].
2018-03-28T09:31:01.602+02:00 INFO  [Periodicals] Starting [org.graylog2.periodical.IndexRotationThread] periodical in [0s], polling every [10s].
2018-03-28T09:31:01.605+02:00 INFO  [Periodicals] Starting [org.graylog2.periodical.NodePingThread] periodical in [0s], polling every [1s].
2018-03-28T09:31:01.607+02:00 INFO  [Periodicals] Starting [org.graylog2.periodical.VersionCheckThread] periodical in [300s], polling every [1800s].
2018-03-28T09:31:01.607+02:00 INFO  [Periodicals] Starting [org.graylog2.periodical.ThrottleStateUpdaterThread] periodical in [1s], polling every [1s].
2018-03-28T09:31:01.609+02:00 INFO  [Periodicals] Starting [org.graylog2.events.ClusterEventPeriodical] periodical in [0s], polling every [1s].
2018-03-28T09:31:01.609+02:00 INFO  [Periodicals] Starting [org.graylog2.events.ClusterEventCleanupPeriodical] periodical in [0s], polling every [86400s].
2018-03-28T09:31:01.614+02:00 INFO  [connection] Opened connection [connectionId{localValue:4, serverValue:4}] to localhost:27017
2018-03-28T09:31:01.614+02:00 INFO  [connection] Opened connection [connectionId{localValue:5, serverValue:5}] to localhost:27017
2018-03-28T09:31:01.614+02:00 INFO  [Periodicals] Starting [org.graylog2.periodical.ClusterIdGeneratorPeriodical] periodical, running forever.
2018-03-28T09:31:01.615+02:00 INFO  [Periodicals] Starting [org.graylog2.periodical.IndexRangesMigrationPeriodical] periodical, running forever.
2018-03-28T09:31:01.617+02:00 INFO  [Periodicals] Starting [org.graylog2.periodical.IndexRangesCleanupPeriodical] periodical in [15s], polling every [3600s].
2018-03-28T09:31:01.631+02:00 INFO  [connection] Opened connection [connectionId{localValue:7, serverValue:6}] to localhost:27017
2018-03-28T09:31:01.631+02:00 INFO  [connection] Opened connection [connectionId{localValue:6, serverValue:7}] to localhost:27017
2018-03-28T09:31:01.657+02:00 INFO  [PeriodicalsService] Not starting [org.graylog2.periodical.UserPermissionMigrationPeriodical] periodical. Not configured to run on this node.
2018-03-28T09:31:01.658+02:00 INFO  [Periodicals] Starting [org.graylog2.periodical.AlarmCallbacksMigrationPeriodical] periodical, running forever.
2018-03-28T09:31:01.658+02:00 INFO  [Periodicals] Starting [org.graylog2.periodical.ConfigurationManagementPeriodical] periodical, running forever.
2018-03-28T09:31:01.678+02:00 INFO  [PeriodicalsService] Not starting [org.graylog2.periodical.LdapGroupMappingMigration] periodical. Not configured to run on this node.
2018-03-28T09:31:01.679+02:00 INFO  [Periodicals] Starting [org.graylog2.periodical.IndexFailuresPeriodical] periodical, running forever.
2018-03-28T09:31:01.679+02:00 INFO  [Periodicals] Starting [org.graylog2.periodical.TrafficCounterCalculator] periodical in [0s], polling every [1s].
2018-03-28T09:31:01.682+02:00 INFO  [Periodicals] Starting [org.graylog.plugins.pipelineprocessor.periodical.LegacyDefaultStreamMigration] periodical, running forever.
2018-03-28T09:31:01.682+02:00 INFO  [Periodicals] Starting [org.graylog.plugins.collector.periodical.PurgeExpiredCollectorsThread] periodical in [0s], polling every [3600s].
2018-03-28T09:31:01.686+02:00 INFO  [LegacyDefaultStreamMigration] Legacy default stream has no connections, no migration needed.
2018-03-28T09:31:01.717+02:00 INFO  [LookupTableService] Data Adapter otx-api-domain/5a786bdacd455e047f8b5148 [@27a16870] STARTING
2018-03-28T09:31:01.728+02:00 INFO  [LookupTableService] Data Adapter otx-api-ip/5a786bdacd455e047f8b5146 [@2229c67a] STARTING
2018-03-28T09:31:01.729+02:00 WARN  [OTXDataAdapter] OTX API key is missing. Make sure to add the key to allow higher request limits.
2018-03-28T09:31:01.729+02:00 WARN  [OTXDataAdapter] OTX API key is missing. Make sure to add the key to allow higher request limits.
2018-03-28T09:31:01.738+02:00 INFO  [LookupTableService] Data Adapter whois/5a786bdacd455e047f8b5142 [@4a1f78b6] STARTING
2018-03-28T09:31:01.741+02:00 INFO  [LookupTableService] Data Adapter tor-exit-node/5a786bdacd455e047f8b5143 [@2b732eca] STARTING
2018-03-28T09:31:01.741+02:00 INFO  [LookupTableService] Data Adapter whois/5a786bdacd455e047f8b5142 [@4a1f78b6] RUNNING
2018-03-28T09:31:01.744+02:00 INFO  [LookupTableService] Data Adapter spamhaus-drop/5a786bdacd455e047f8b5149 [@4712eade] STARTING
2018-03-28T09:31:01.746+02:00 INFO  [LookupTableService] Data Adapter abuse-ch-ransomware-ip/5a786bdacd455e047f8b5145 [@502d6653] STARTING
2018-03-28T09:31:01.750+02:00 INFO  [LookupTableService] Data Adapter abuse-ch-ransomware-domains/5a786bdacd455e047f8b5144 [@5deeda51] STARTING
2018-03-28T09:31:01.757+02:00 INFO  [LookupTableService] Data Adapter otx-api-ip/5a786bdacd455e047f8b5146 [@2229c67a] RUNNING
2018-03-28T09:31:01.757+02:00 INFO  [LookupTableService] Data Adapter otx-api-domain/5a786bdacd455e047f8b5148 [@27a16870] RUNNING
2018-03-28T09:31:01.781+02:00 INFO  [LookupTableService] Cache whois-cache/5a786bdacd455e047f8b513e [@55dd6f0c] STARTING
2018-03-28T09:31:01.788+02:00 INFO  [LookupTableService] Cache otx-api-domain-cache/5a786bdacd455e047f8b513f [@6f093caa] STARTING
2018-03-28T09:31:01.790+02:00 INFO  [LookupTableService] Cache whois-cache/5a786bdacd455e047f8b513e [@55dd6f0c] RUNNING
2018-03-28T09:31:01.793+02:00 INFO  [LookupTableService] Cache threat-intel-uncached-adapters/5a786bdacd455e047f8b5141 [@1a442e3c] STARTING
2018-03-28T09:31:01.794+02:00 INFO  [LookupTableService] Cache threat-intel-uncached-adapters/5a786bdacd455e047f8b5141 [@1a442e3c] RUNNING
2018-03-28T09:31:01.793+02:00 INFO  [LookupTableService] Cache otx-api-domain-cache/5a786bdacd455e047f8b513f [@6f093caa] RUNNING
2018-03-28T09:31:01.793+02:00 INFO  [LookupTableService] Cache otx-api-ip-cache/5a786bdacd455e047f8b513d [@83d8ea2] STARTING
2018-03-28T09:31:01.794+02:00 INFO  [LookupTableService] Cache otx-api-ip-cache/5a786bdacd455e047f8b513d [@83d8ea2] RUNNING
2018-03-28T09:31:01.793+02:00 INFO  [LookupTableService] Cache spamhaus-e-drop-cache/5a786bdacd455e047f8b5140 [@5380ae08] STARTING
2018-03-28T09:31:01.794+02:00 INFO  [LookupTableService] Cache spamhaus-e-drop-cache/5a786bdacd455e047f8b5140 [@5380ae08] RUNNING
2018-03-28T09:31:01.922+02:00 INFO  [IndexRetentionThread] Elasticsearch cluster not available, skipping index retention checks.
2018-03-28T09:31:02.100+02:00 INFO  [JerseyService] Enabling CORS for HTTP endpoint
2018-03-28T09:31:02.333+02:00 INFO  [LookupTableService] Data Adapter abuse-ch-ransomware-ip/5a786bdacd455e047f8b5145 [@502d6653] RUNNING
2018-03-28T09:31:02.333+02:00 INFO  [LookupDataAdapterRefreshService] Adding job for <abuse-ch-ransomware-ip/5a786bdacd455e047f8b5145/@502d6653> [interval=150000ms]
2018-03-28T09:31:02.347+02:00 INFO  [LookupDataAdapterRefreshService] Adding job for <spamhaus-drop/5a786bdacd455e047f8b5149/@4712eade> [interval=43200000ms]
2018-03-28T09:31:02.346+02:00 INFO  [LookupTableService] Data Adapter spamhaus-drop/5a786bdacd455e047f8b5149 [@4712eade] RUNNING
2018-03-28T09:31:02.358+02:00 INFO  [LookupTableService] Data Adapter abuse-ch-ransomware-domains/5a786bdacd455e047f8b5144 [@5deeda51] RUNNING
2018-03-28T09:31:02.358+02:00 INFO  [LookupDataAdapterRefreshService] Adding job for <abuse-ch-ransomware-domains/5a786bdacd455e047f8b5144/@5deeda51> [interval=150000ms]
2018-03-28T09:31:02.647+02:00 INFO  [LookupTableService] Data Adapter tor-exit-node/5a786bdacd455e047f8b5143 [@2b732eca] RUNNING
2018-03-28T09:31:02.647+02:00 INFO  [LookupDataAdapterRefreshService] Adding job for <tor-exit-node/5a786bdacd455e047f8b5143/@2b732eca> [interval=3600000ms]
2018-03-28T09:31:02.655+02:00 INFO  [LookupTableService] Starting lookup table otx-api-ip/5a786bdacd455e047f8b514a [@4fdbd9f6] using cache otx-api-ip-cache/5a786bdacd455e047f8b513d [@83d8ea2], data adapter otx-api-ip/5a786bdacd455e047f8b5146 [@2229c67a]
2018-03-28T09:31:02.656+02:00 INFO  [LookupTableService] Starting lookup table otx-api-domain/5a786bdacd455e047f8b514b [@5c4993] using cache otx-api-domain-cache/5a786bdacd455e047f8b513f [@6f093caa], data adapter otx-api-domain/5a786bdacd455e047f8b5148 [@27a16870]
2018-03-28T09:31:02.656+02:00 INFO  [LookupTableService] Starting lookup table abuse-ch-ransomware-domains/5a786bdacd455e047f8b514c [@29a27563] using cache threat-intel-uncached-adapters/5a786bdacd455e047f8b5141 [@1a442e3c], data adapter abuse-ch-ransomware-domains/5a786bdacd455e047f8b5144 [@5deeda51]
2018-03-28T09:31:02.656+02:00 INFO  [LookupTableService] Starting lookup table whois/5a786bdacd455e047f8b514d [@47699322] using cache whois-cache/5a786bdacd455e047f8b513e [@55dd6f0c], data adapter whois/5a786bdacd455e047f8b5142 [@4a1f78b6]
2018-03-28T09:31:02.657+02:00 INFO  [LookupTableService] Starting lookup table abuse-ch-ransomware-ip/5a786bdacd455e047f8b514e [@566cf650] using cache threat-intel-uncached-adapters/5a786bdacd455e047f8b5141 [@1a442e3c], data adapter abuse-ch-ransomware-ip/5a786bdacd455e047f8b5145 [@502d6653]
2018-03-28T09:31:02.657+02:00 INFO  [LookupTableService] Starting lookup table tor-exit-node-list/5a786bdacd455e047f8b514f [@341e2ddf] using cache threat-intel-uncached-adapters/5a786bdacd455e047f8b5141 [@1a442e3c], data adapter tor-exit-node/5a786bdacd455e047f8b5143 [@2b732eca]
2018-03-28T09:31:02.657+02:00 INFO  [LookupTableService] Starting lookup table spamhaus-drop/5a786bdacd455e047f8b5150 [@e2d5f52] using cache spamhaus-e-drop-cache/5a786bdacd455e047f8b5140 [@5380ae08], data adapter spamhaus-drop/5a786bdacd455e047f8b5149 [@4712eade]
2018-03-28T09:31:10.079+02:00 INFO  [NetworkListener] Started listener bound to [172.16.2.119:9000]
2018-03-28T09:31:10.081+02:00 INFO  [HttpServer] [HttpServer] Started.
2018-03-28T09:31:10.081+02:00 INFO  [JerseyService] Started REST API at <http://172.16.2.119:9000/api/>
2018-03-28T09:31:10.081+02:00 INFO  [JerseyService] Started Web Interface at <http://172.16.2.119:9000/>
2018-03-28T09:31:10.083+02:00 INFO  [ServiceManagerListener] Services are healthy
2018-03-28T09:31:10.083+02:00 INFO  [ServerBootstrap] Services started, startup times in ms: {InputSetupService [RUNNING]=4, JournalReader [RUNNING]=4, OutputSetupService [RUNNING]=7, ConfigurationEtagService [RUNNING]=14, BufferSynchronizerService [RUNNING]=16, KafkaJournal [RUNNING]=20, StreamCacheService [RUNNING]=90, PeriodicalsService [RUNNING]=123, LookupTableService [RUNNING]=1091, JerseyService [RUNNING]=8519}
2018-03-28T09:31:10.084+02:00 INFO  [InputSetupService] Triggering launching persisted inputs, node transitioned from Uninitialized [LB:DEAD] to Running [LB:ALIVE]
2018-03-28T09:31:10.089+02:00 INFO  [ServerBootstrap] Graylog server up and running.
2018-03-28T09:31:10.123+02:00 INFO  [InputStateListener] Input [GELF UDP/599dc721cd455e354f2727e3] is now STARTING
2018-03-28T09:31:10.124+02:00 INFO  [InputStateListener] Input [Syslog UDP/598c1eaccd455e75ef7ef4f0] is now STARTING
2018-03-28T09:31:10.126+02:00 INFO  [InputStateListener] Input [CEF TCP Input/5ab21ad8cd455e22b63da370] is now STARTING
2018-03-28T09:31:10.127+02:00 INFO  [InputStateListener] Input [Raw/Plaintext UDP/5a86a2d6cd455e22b60e8489] is now STARTING
2018-03-28T09:31:10.127+02:00 INFO  [InputStateListener] Input [GELF TCP/599d1056cd455e2a5b920def] is now STARTING
2018-03-28T09:31:10.129+02:00 INFO  [InputStateListener] Input [SNMP UDP/5a546250cd455e228c318e5e] is now STARTING
2018-03-28T09:31:10.130+02:00 INFO  [InputStateListener] Input [Syslog TCP/598c1e8ecd455e75ef7ef4ca] is now STARTING
2018-03-28T09:31:11.011+02:00 WARN  [NettyTransport] receiveBufferSize (SO_RCVBUF) for input SyslogUDPInput{title=Syslog UDP, type=org.graylog2.inputs.syslog.udp.SyslogUDPInput, nodeId=null} should be 262144 but is 212992.
2018-03-28T09:31:11.013+02:00 WARN  [NettyTransport] receiveBufferSize (SO_RCVBUF) for input GELFUDPInput{title=Windows Events GELF UDP, type=org.graylog2.inputs.gelf.udp.GELFUDPInput, nodeId=null} should be 262144 but is 212992.
2018-03-28T09:31:11.177+02:00 WARN  [NettyTransport] receiveBufferSize (SO_RCVBUF) for input CEFTCPInput{title=TCP CEF , type=org.graylog.plugins.cef.input.CEFTCPInput, nodeId=null} should be 1048576 but is 212992.
2018-03-28T09:31:11.177+02:00 WARN  [NettyTransport] receiveBufferSize (SO_RCVBUF) for input SyslogTCPInput{title=Syslog TCP , type=org.graylog2.inputs.syslog.tcp.SyslogTCPInput, nodeId=null} should be 1048576 but is 212992.
2018-03-28T09:31:11.177+02:00 WARN  [NettyTransport] receiveBufferSize (SO_RCVBUF) for input GELFTCPInput{title=Windows Events GELF TCP, type=org.graylog2.inputs.gelf.tcp.GELFTCPInput, nodeId=null} should be 1048576 but is 212992.
2018-03-28T09:31:11.182+02:00 INFO  [InputStateListener] Input [GELF UDP/599dc721cd455e354f2727e3] is now RUNNING
2018-03-28T09:31:11.184+02:00 INFO  [InputStateListener] Input [CEF TCP Input/5ab21ad8cd455e22b63da370] is now RUNNING
2018-03-28T09:31:11.526+02:00 WARN  [NettyTransport] receiveBufferSize (SO_RCVBUF) for input SnmpUDPInput{title=SNMP Trap, type=org.graylog.snmp.input.SnmpUDPInput, nodeId=null} should be 262144 but is 212992.
2018-03-28T09:31:11.527+02:00 WARN  [NettyTransport] receiveBufferSize (SO_RCVBUF) for input RawUDPInput{title=Network Event Firewall, type=org.graylog2.inputs.raw.udp.RawUDPInput, nodeId=null} should be 262144 but is 212992.
2018-03-28T09:31:11.539+02:00 INFO  [InputStateListener] Input [Syslog TCP/598c1e8ecd455e75ef7ef4ca] is now RUNNING
2018-03-28T09:31:11.546+02:00 INFO  [InputStateListener] Input [Syslog UDP/598c1eaccd455e75ef7ef4f0] is now RUNNING
2018-03-28T09:31:11.547+02:00 INFO  [InputStateListener] Input [Raw/Plaintext UDP/5a86a2d6cd455e22b60e8489] is now RUNNING
2018-03-28T09:31:11.550+02:00 INFO  [InputStateListener] Input [SNMP UDP/5a546250cd455e228c318e5e] is now RUNNING
2018-03-28T09:31:11.551+02:00 INFO  [InputStateListener] Input [GELF TCP/599d1056cd455e2a5b920def] is now RUNNING
2018-03-28T09:31:24.093+02:00 ERROR [DecodingProcessor] Unable to decode raw message RawMessage{id=71db3aa1-3252-11e8-9161-f403433d1b68, journalOffset=2887087277, codec=syslog, payloadSize=920, timestamp=2018-03-28T06:37:11.370Z, remoteAddress=/172.16.2.237:48001} on input <598c1eaccd455e75ef7ef4f0>.
2018-03-28T09:31:24.101+02:00 ERROR [DecodingProcessor] Unable to decode raw message RawMessage{id=71db3aa3-3252-11e8-9161-f403433d1b68, journalOffset=2887087279, codec=syslog, payloadSize=920, timestamp=2018-03-28T06:37:11.370Z, remoteAddress=/172.16.2.237:48001} on input <598c1eaccd455e75ef7ef4f0>.
2018-03-28T09:31:24.101+02:00 ERROR [DecodingProcessor] Error processing message RawMessage{id=71db3aa3-3252-11e8-9161-f403433d1b68, journalOffset=2887087279, codec=syslog, payloadSize=920, timestamp=2018-03-28T06:37:11.370Z, remoteAddress=/172.16.2.237:48001}
java.lang.IllegalArgumentException: Invalid format: "922-21a0-0017a4770004/" is malformed at "a0-0017a4770004/"
        at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:945) ~[graylog.jar:?]
        at org.joda.time.DateTime.parse(DateTime.java:160) ~[graylog.jar:?]
        at org.joda.time.DateTime.parse(DateTime.java:149) ~[graylog.jar:?]
        at org.graylog2.syslog4j.server.impl.event.SyslogServerEvent.parseDate(SyslogServerEvent.java:108) ~[graylog.jar:?]
        at org.graylog2.syslog4j.server.impl.event.SyslogServerEvent.parsePriority(SyslogServerEvent.java:136) ~[graylog.jar:?]
        at org.graylog2.syslog4j.server.impl.event.SyslogServerEvent.parse(SyslogServerEvent.java:152) ~[graylog.jar:?]
        at org.graylog2.syslog4j.server.impl.event.SyslogServerEvent.<init>(SyslogServerEvent.java:50) ~[graylog.jar:?]
        at org.graylog2.inputs.codecs.SyslogCodec.parse(SyslogCodec.java:132) ~[graylog.jar:?]
        at org.graylog2.inputs.codecs.SyslogCodec.decode(SyslogCodec.java:96) ~[graylog.jar:?]
        at org.graylog2.shared.buffers.processors.DecodingProcessor.processMessage(DecodingProcessor.java:150) ~[graylog.jar:?]
        at org.graylog2.shared.buffers.processors.DecodingProcessor.onEvent(DecodingProcessor.java:91) [graylog.jar:?]
        at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:74) [graylog.jar:?]
        at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:42) [graylog.jar:?]
        at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:143) [graylog.jar:?]
        at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66) [graylog.jar:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
2018-03-28T09:31:24.095+02:00 ERROR [DecodingProcessor] Error processing message RawMessage{id=71db3aa1-3252-11e8-9161-f403433d1b68, journalOffset=2887087277, codec=syslog, payloadSize=920, timestamp=2018-03-28T06:37:11.370Z, remoteAddress=/172.16.2.237:48001}
java.lang.IllegalArgumentException: Invalid format: "2c4-5920094a-0d1a-0017a4" is malformed at "c4-5920094a-0d1a-0017a4"
        at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:945) ~[graylog.jar:?]
        at org.joda.time.DateTime.parse(DateTime.java:160) ~[graylog.jar:?]
        at org.joda.time.DateTime.parse(DateTime.java:149) ~[graylog.jar:?]
        at org.graylog2.syslog4j.server.impl.event.SyslogServerEvent.parseDate(SyslogServerEvent.java:108) ~[graylog.jar:?]
        at org.graylog2.syslog4j.server.impl.event.SyslogServerEvent.parsePriority(SyslogServerEvent.java:136) ~[graylog.jar:?]
        at org.graylog2.syslog4j.server.impl.event.SyslogServerEvent.parse(SyslogServerEvent.java:152) ~[graylog.jar:?]
        at org.graylog2.syslog4j.server.impl.event.SyslogServerEvent.<init>(SyslogServerEvent.java:50) ~[graylog.jar:?]
        at org.graylog2.inputs.codecs.SyslogCodec.parse(SyslogCodec.java:132) ~[graylog.jar:?]
        at org.graylog2.inputs.codecs.SyslogCodec.decode(SyslogCodec.java:96) ~[graylog.jar:?]
        at org.graylog2.shared.buffers.processors.DecodingProcessor.processMessage(DecodingProcessor.java:150) ~[graylog.jar:?]
        at org.graylog2.shared.buffers.processors.DecodingProcessor.onEvent(DecodingProcessor.java:91) [graylog.jar:?]
        at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:74) [graylog.jar:?]
        at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:42) [graylog.jar:?]
        at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:143) [graylog.jar:?]
        at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66) [graylog.jar:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
2018-03-28T09:32:12.986+02:00 ERROR [Messages] Caught exception during bulk indexing: java.net.SocketTimeoutException: Read timed out, retrying (attempt #1).
2018-03-28T09:32:13.441+02:00 ERROR [Messages] Caught exception during bulk indexing: java.net.SocketTimeoutException: Read timed out, retrying (attempt #1).
2018-03-28T09:33:04.287+02:00 INFO  [InputStateListener] Input [Raw/Plaintext UDP/5a86a2d6cd455e22b60e8489] is now STOPPING
2018-03-28T09:33:04.320+02:00 INFO  [InputStateListener] Input [Raw/Plaintext UDP/5a86a2d6cd455e22b60e8489] is now STOPPED
2018-03-28T09:33:04.321+02:00 INFO  [InputStateListener] Input [Raw/Plaintext UDP/5a86a2d6cd455e22b60e8489] is now TERMINATED
2018-03-28T09:33:04.994+02:00 INFO  [InputStateListener] Input [SNMP UDP/5a546250cd455e228c318e5e] is now STOPPING
2018-03-28T09:33:05.017+02:00 INFO  [InputStateListener] Input [SNMP UDP/5a546250cd455e228c318e5e] is now STOPPED
2018-03-28T09:33:05.018+02:00 INFO  [InputStateListener] Input [SNMP UDP/5a546250cd455e228c318e5e] is now TERMINATED
2018-03-28T09:33:05.537+02:00 INFO  [InputStateListener] Input [Syslog TCP/598c1e8ecd455e75ef7ef4ca] is now STOPPING
2018-03-28T09:33:05.557+02:00 INFO  [InputStateListener] Input [Syslog TCP/598c1e8ecd455e75ef7ef4ca] is now STOPPED
2018-03-28T09:33:05.558+02:00 INFO  [InputStateListener] Input [Syslog TCP/598c1e8ecd455e75ef7ef4ca] is now TERMINATED
2018-03-28T09:33:06.695+02:00 INFO  [InputStateListener] Input [Syslog UDP/598c1eaccd455e75ef7ef4f0] is now STOPPING
2018-03-28T09:33:06.713+02:00 INFO  [InputStateListener] Input [Syslog UDP/598c1eaccd455e75ef7ef4f0] is now STOPPED
2018-03-28T09:33:06.713+02:00 INFO  [InputStateListener] Input [Syslog UDP/598c1eaccd455e75ef7ef4f0] is now TERMINATED
2018-03-28T09:33:07.847+02:00 INFO  [InputStateListener] Input [GELF TCP/599d1056cd455e2a5b920def] is now STOPPING

Make sure that the Elasticsearch cluster is healthy and able to take new tasks.
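Checking that from the shell could look like this (a sketch assuming a single Elasticsearch 2.4 node on localhost:9200; it is guarded so it is safe to run even where no node is listening):

```shell
#!/bin/sh
# Sketch: basic ES health checks (single node on localhost:9200 assumed).
ES="${ES:-http://localhost:9200}"
if curl -sf "$ES" > /dev/null 2>&1; then
    curl -s "$ES/_cluster/health?pretty"   # overall status: green / yellow / red
    curl -s "$ES/_cat/indices?v"           # per-index health, doc counts, sizes
else
    echo "Elasticsearch not reachable at $ES"
fi
```

A "red" cluster status here means at least one primary shard is unassigned, which matches the indexing failures below.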

That's my problem: I can't get it healthy again.

Can I change the shard setting back to 1 and then re-create the indices?

michael

Try deleting the message journal while Graylog is stopped.

Make sure that you’re only using plugins which are compatible with the version of Graylog you’re running.

Where exactly did you change these settings?

Hi Jochen!

I’ve stopped all services (mongodb, elasticsearch, graylog) and renamed the affected index (graylog_142) to graylog_142_old and restarted the services.

The folder is newly created and appears to be filling, but I still see the errors below in /var/log/elasticsearch, and the Graylog web interface does not load properly.

Via the Graylog web GUI (System / Indices) I changed the shards for my index from 1 to 2, then used the "reindex" option (or whatever it is called; I can't access it at the moment). Since then, Elasticsearch has been down.

The plugins should all be compatible, as the setup had been running without problems since the release of the latest version until yesterday.

2018-03-28T11:08:14.203+02:00 WARN [Messages] Failed to index message: index=<graylog_142> id=<994dd285-3265-11e8-b628-f403433d1b68> error=<{"type":"unavailable_shards_exception","reason":"[graylog_142][5] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [graylog_142] containing [116] requests]"}>
2018-03-28T11:08:14.203+02:00 WARN [Messages] Failed to index message: index=<graylog_142> id=<994ce809-3265-11e8-b628-f403433d1b68> error=<{"type":"unavailable_shards_exception","reason":"[graylog_142][3] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [graylog_142] containing [137] requests]"}>
2018-03-28T11:08:14.203+02:00 WARN [Messages] Failed to index message: index=<graylog_142> id=<994cc0ff-3265-11e8-b628-f403433d1b68> error=<{"type":"unavailable_shards_exception","reason":"[graylog_142][5] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [graylog_142] containing [116] requests]"}>
2018-03-28T11:08:14.203+02:00 WARN [Messages] Failed to index message: index=<graylog_142> id=<994dd28a-3265-11e8-b628-f403433d1b68> error=<{"type":"unavailable_shards_exception","reason":"[graylog_142][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [graylog_142] containing [125] requests]"}>
2018-03-28T11:08:14.203+02:00 WARN [Messages] Failed to index message: index=<graylog_142> id=<994dd288-3265-11e8-b628-f403433d1b68> error=<{"type":"unavailable_shards_exception","reason":"[graylog_142][7] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [graylog_142] containing [133] requests]"}>
2018-03-28T11:08:14.203+02:00 WARN [Messages] Failed to index message: index=<graylog_142> id=<994dd28c-3265-11e8-b628-f403433d1b68> error=<{"type":"unavailable_shards_exception","reason":"[graylog_142][6] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [graylog_142] containing [115] requests]"}>
2018-03-28T11:08:14.203+02:00 WARN [Messages] Failed to index message: index=<graylog_142> id=<994dd289-3265-11e8-b628-f403433d1b68> error=<{"type":"unavailable_shards_exception","reason":"[graylog_142][3] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [graylog_142] containing [137] requests]"}>
2018-03-28T11:08:14.203+02:00 WARN [Messages] Failed to index message: index=<graylog_142> id=<994df970-3265-11e8-b628-f403433d1b68> error=<{"type":"unavailable_shards_exception","reason":"[graylog_142][6] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [graylog_142] containing [115] requests]"}>
2018-03-28T11:08:14.203+02:00 ERROR [Messages] Failed to index [1000] messages. Please check the index error log in your web interface for the reason. Error: One or more of the items in the Bulk request failed, check BulkResult.getItems() for more information.
2018-03-28T11:08:16.784+02:00 ERROR [Messages] Caught exception during bulk indexing: java.net.SocketTimeoutException: Read timed out, retrying (attempt #1).

installed packages:

elasticsearch/unknown,now 2.4.6 all [installed]
graylog-2.4-repository/stable,stable,now 1-5 all [installed]
graylog-plugin-internal-logs/now 2.4.0 all [installed,local]
graylog-plugin-snmp/now 0.3.0 all [installed,local]
graylog-server/stable,stable,now 2.4.3-1 all [installed]
mongodb-server/xenial,now 1:2.6.10-0ubuntu1 amd64 [installed]
mongodb-compass/now 1.11.2-1 amd64 [installed,local]
mongodb-clients/xenial,now 1:2.6.10-0ubuntu1 amd64 [installed,automatic]

No, they’re not. The Graylog Pipeline Processor plugin 2.2.0 is definitely not compatible with Graylog 2.4.3.

Make sure that only compatible plugins are in the plugin_dir of your Graylog nodes (see http://docs.graylog.org/en/2.4/pages/configuration/file_location.html#deb-package).

You should never manually rename any files or directories in the data directory of Graylog or Elasticsearch as this will lead to inconsistencies (as you’ve already experienced).

You can try deleting the affected index (graylog_142) via the Elasticsearch HTTP API (https://www.elastic.co/guide/en/elasticsearch/reference/2.4/indices-delete-index.html) and deleting the Graylog deflector alias (graylog_deflector) via the Elasticsearch HTTP API (see the "Index Aliases" page in the Elasticsearch 2.4 guide), all while Graylog is stopped.
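Those two steps could be sketched as a dry run like this (the echo prefix prints each command instead of executing it; drop the echo on the live host, and only with Graylog stopped):

```shell
#!/bin/sh
# Dry-run sketch of the recovery steps above (ES 2.4 API, single node assumed).
ES="${ES:-http://localhost:9200}"
INDEX="graylog_142"   # the affected index from this thread

# 1. Delete the broken index:
echo curl -XDELETE "$ES/$INDEX"

# 2. Remove the stale deflector alias:
echo curl -XPOST "$ES/_aliases" -d '{"actions":[{"remove":{"index":"graylog_142","alias":"graylog_deflector"}}]}'
```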

Additionally, please try to format your posts for better readability, see Creating and highlighting code blocks - GitHub Docs for examples.

hi

The pipeline processor should be fine:

-rw-r--r-- 1 root root 5,4M Jan 24 23:30 graylog-plugin-pipeline-processor-2.4.3.jar

These are the currently installed plugins:

-rw-r--r-- 1 root root 14M Jan 24 23:30 graylog-plugin-aws-2.4.3.jar
-rw-r--r-- 1 root root 27K Jan 24 23:30 graylog-plugin-beats-2.4.3.jar
-rw-r--r-- 1 root root 59K Jan 24 23:30 graylog-plugin-cef-2.4.3.jar
-rw-r--r-- 1 root root 2,9M Jan 24 23:30 graylog-plugin-collector-2.4.3.jar
-rw-r--r-- 1 root root 4,1M Jan 24 23:30 graylog-plugin-enterprise-integration-2.4.3.jar
-rwxrwxr-x 1 mike mike 7,2M Sep 7 2017 graylog-plugin-input-cef-1.2.0.jar
-rw-r--r-- 1 root root 24K Jan 31 14:55 graylog-plugin-internal-logs-2.4.0.jar
-rw-r--r-- 1 root root 6,4M Jan 24 23:30 graylog-plugin-map-widget-2.4.3.jar
-rw-r--r-- 1 root root 690K Jan 24 23:30 graylog-plugin-netflow-2.4.3.jar
-rw-r--r-- 1 root root 5,4M Jan 24 23:30 graylog-plugin-pipeline-processor-2.4.3.jar
-rw-r--r-- 1 root root 3,5M Aug 15 2015 graylog-plugin-snmp-0.3.0.jar
-rw-r--r-- 1 root root 4,4M Jan 24 23:30 graylog-plugin-threatintel-2.4.3.jar

I've successfully deleted the "renamed" index and the newly created index via the API:

root@KTMATHQSYSLOG01:/usr/share/graylog-server/plugin# curl -XDELETE 'http://localhost:9200/graylog_142_old'
root@KTMATHQSYSLOG01:/usr/share/graylog-server/plugin# curl -XDELETE 'http://localhost:9200/graylog_142'

Both returned "acknowledged": true, and the folders have disappeared from the Elasticsearch data directory.

I've also tried to delete the deflector alias, but I always get an error that the index is not found:

curl -XPOST 'http://localhost:9200/_aliases' -d '
{
  "actions" : [
    { "remove" : { "index" : "graylog", "alias" : "graylog_deflector" } }
  ]
}'

Output:

{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"graylog","index":"graylog"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"graylog","index":"graylog"},"status":404}

I've also changed the "index" value from graylog to graylog_141, which is the latest folder shown in Elasticsearch.

command:

curl -XPOST 'http://localhost:9200/_aliases' -d '
{
  "actions" : [
    { "remove" : { "index" : "graylog_141", "alias" : "graylog_deflector" } }
  ]
}'

Output:

{"error":{"root_cause":[{"type":"aliases_not_found_exception","reason":"aliases [graylog_deflector] missing","resource.type":"aliases","resource.id":"graylog_deflector"}],"type":"aliases_not_found_exception","reason":"aliases [graylog_deflector] missing","resource.type":"aliases","resource.id":"graylog_deflector"},"status":404}
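When the alias removal 404s like this, it can help to first list which index (if any) actually carries the alias. A guarded sketch (ES 2.4 on localhost:9200 assumed; harmless where no node is listening):

```shell
#!/bin/sh
# Sketch: find which index holds the graylog_deflector alias (ES 2.4).
ES="${ES:-http://localhost:9200}"
if curl -sf "$ES" > /dev/null 2>&1; then
    # Aliases are grouped by index in the response, so the index name
    # appears a few lines above the alias name:
    curl -s "$ES/_aliases?pretty" | grep -B 3 graylog_deflector
else
    echo "Elasticsearch not reachable at $ES"
fi
```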


I've now started graylog-server again, and it looks like it's working again.

I've disabled the inputs and can see about 18,000 to 20,000 messages per second under "Output".

In System / Overview the Elasticsearch cluster is green again with 120 active shards, but indexer failures are still at 148,943 in the last 24 h.

If I try to open "Indices & Index Sets" in the Graylog web interface, it only shows "Loading".

The Graylog server log shows the following errors:

root@SYSLOG01:/usr/share/graylog-server/plugin# tail -f /var/log/graylog-server/server.log
2018-03-28T12:44:09.067+02:00 WARN [Messages] Failed to index message: index=<graylog_142> id= error=<{"type":"unavailable_shards_exception","reason":"[graylog_142][2] Not enough active copies to meet write consistency of [QUORUM] (have 1, needed 2). Timeout: [1m], request: [BulkShardRequest to [graylog_142] containing [135] requests]"}>
2018-03-28T12:44:09.067+02:00 WARN [Messages] Failed to index message: index=<graylog_142> id= error=<{"type":"unavailable_shards_exception","reason":"[graylog_142][2] Not enough active copies to meet write consistency of [QUORUM] (have 1, needed 2). Timeout: [1m], request: [BulkShardRequest to [graylog_142] containing [135] requests]"}>
2018-03-28T12:44:09.067+02:00 WARN [Messages] Failed to index message: index=<graylog_142> id= error=<{"type":"unavailable_shards_exception","reason":"[graylog_142][4] Not enough active copies to meet write consistency of [QUORUM] (have 1, needed 2). Timeout: [1m], request: [BulkShardRequest to [graylog_142] containing [136] requests]"}>
2018-03-28T12:44:09.067+02:00 WARN [Messages] Failed to index message: index=<graylog_142> id= error=<{"type":"unavailable_shards_exception","reason":"[graylog_142][0] Not enough active copies to meet write consistency of [QUORUM] (have 1, needed 2). Timeout: [1m], request: [BulkShardRequest to [graylog_142] containing [102] requests]"}>
2018-03-28T12:44:09.067+02:00 WARN [Messages] Failed to index message: index=<graylog_142> id= error=<{"type":"unavailable_shards_exception","reason":"[graylog_142][5] Not enough active copies to meet write consistency of [QUORUM] (have 1, needed 2). Timeout: [1m], request: [BulkShardRequest to [graylog_142] containing [128] requests]"}>
2018-03-28T12:44:09.067+02:00 WARN [Messages] Failed to index message: index=<graylog_142> id= error=<{"type":"unavailable_shards_exception","reason":"[graylog_142][1] Not enough active copies to meet write consistency of [QUORUM] (have 1, needed 2). Timeout: [1m], request: [BulkShardRequest to [graylog_142] containing [134] requests]"}>
2018-03-28T12:44:09.067+02:00 WARN [Messages] Failed to index message: index=<graylog_142> id= error=<{"type":"unavailable_shards_exception","reason":"[graylog_142][6] Not enough active copies to meet write consistency of [QUORUM] (have 1, needed 2). Timeout: [1m], request: [BulkShardRequest to [graylog_142] containing [117] requests]"}>
2018-03-28T12:44:09.067+02:00 ERROR [Messages] Failed to index [1000] messages. Please check the index error log in your web interface for the reason. Error: One or more of the items in the Bulk request failed, check BulkResult.getItems() for more information.
2018-03-28T12:45:08.777+02:00 ERROR [Messages] Caught exception during bulk indexing: java.net.SocketTimeoutException: Read timed out, retrying (attempt #1).
2018-03-28T12:45:09.019+02:00 ERROR [Messages] Caught exception during bulk indexing: java.net.SocketTimeoutException: Read timed out, retrying (attempt #1).

And now I saw this error, which brings me back to the config change from this morning, where I changed the shard setting from 1 to 2. Is it possible to change that back via the CLI? I didn't find the setting in Graylog or Elasticsearch, but I'm a novice on that topic…

2018-03-28T12:48:09.133+02:00 WARN [Messages] Failed to index message: index=<graylog_142> id= error=<{"type":"unavailable_shards_exception","reason":"[graylog_142][7] Not enough active copies to meet write consistency of [QUORUM] (have 1, needed 2). Timeout: [1m], request: [BulkShardRequest to [graylog_142] containing [38] requests]"}>
2018-03-28T12:48:09.133+02:00 ERROR [Messages] Failed to index [227] messages. Please check the index error log in your web interface for the reason. Error: One or more of the items in the Bulk request failed, check BulkResult.getItems() for more information.
2018-03-28T12:49:08.876+02:00 ERROR [Messages] Caught exception during bulk indexing: java.net.SocketTimeoutException: Read timed out, retrying (attempt #1).
2018-03-28T12:49:09.150+02:00 ERROR [Messages] Caught exception during bulk indexing: java.net.SocketTimeoutException: Read timed out, retrying (attempt #1).
2018-03-28T12:50:08.929+02:00 ERROR [Messages] Caught exception during bulk indexing: java.net.SocketTimeoutException: Read timed out, retrying (attempt #1).
2018-03-28T12:50:09.155+02:00 ERROR [Messages] Caught exception during bulk indexing: java.net.SocketTimeoutException: Read timed out, retrying (attempt #1).
2018-03-28T12:51:08.966+02:00 ERROR [Messages] Caught exception during bulk indexing: java.net.SocketTimeoutException: Read timed out, retrying (attempt #1).
2018-03-28T12:51:09.179+02:00 ERROR [Messages] Caught exception during bulk indexing: java.net.SocketTimeoutException: Read timed out, retrying (attempt #1).
2018-03-28T12:52:08.970+02:00 ERROR [Messages] Caught exception during bulk indexing: java.net.SocketTimeoutException: Read timed out, retrying (attempt #1).
2018-03-28T12:52:09.215+02:00 ERROR [Messages] Caught exception during bulk indexing: java.net.SocketTimeoutException: Read timed out, retrying (attempt #1).

good morning

I've tried the steps again, but ran the deflector removal before deleting; that was successful.

I now only get parsing errors like the one below:

2018-03-29T08:08:13.148+02:00 ERROR [DecodingProcessor] Error processing message RawMessage{id=d9464f60-32a0-11e8-b668-f403433d1b68, journalOffset=2913255446, codec=syslog, payloadSize=269, timestamp=2018-03-28T15:58:25.622Z, remoteAddress=/172.16.5.11:1156}
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1967) ~[?:1.8.0_151]
at org.graylog2.syslog4j.server.impl.event.CiscoSyslogServerEvent.parseDate(CiscoSyslogServerEvent.java:119) ~[graylog.jar:?]
at org.graylog2.syslog4j.server.impl.event.CiscoSyslogServerEvent.parsePriority(CiscoSyslogServerEvent.java:57) ~[graylog.jar:?]
at org.graylog2.syslog4j.server.impl.event.SyslogServerEvent.parse(SyslogServerEvent.java:152) ~[graylog.jar:?]
at org.graylog2.syslog4j.server.impl.event.CiscoSyslogServerEvent.<init>(CiscoSyslogServerEvent.java:37) ~[graylog.jar:?]
at org.graylog2.inputs.codecs.SyslogCodec.parse(SyslogCodec.java:128) ~[graylog.jar:?]
at org.graylog2.inputs.codecs.SyslogCodec.decode(SyslogCodec.java:96) ~[graylog.jar:?]
at org.graylog2.shared.buffers.processors.DecodingProcessor.processMessage(DecodingProcessor.java:150) ~[graylog.jar:?]
at org.graylog2.shared.buffers.processors.DecodingProcessor.onEvent(DecodingProcessor.java:91) [graylog.jar:?]
at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:74) [graylog.jar:?]
at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:42) [graylog.jar:?]
at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:143) [graylog.jar:?]
at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66) [graylog.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]

The error messages in the Graylog GUI (System / Indices) show the last error 21 h ago:

21 hours ago graylog_142 1c0b55f6-3308-11e8-b668-f403433d1b68 {"type":"unavailable_shards_exception","reason":"[graylog_142][5] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [graylog_142] containing [123] requests]"}

I've done some more research regarding the plugin version mismatch.

The web GUI shows under System / Plugins:

"Pipeline Processor Plugin 2.2.0 Graylog, Inc"

I've searched the whole filesystem with the find command and only found the 2.4.3 jar files. Maybe a bug, since the GUI shows 2.2.0?

root@SYSLOG01:/# find . -name graylog-plugin-pipeline-processor-2.*
./usr/share/graylog-server/plugin/graylog-plugin-pipeline-processor-2.4.3.jar

Where is the setting for the Elasticsearch shards and replicas definition stored?

The following config files are configured to use 4 shards and 0 replicas:

./etc/graylog/server/server.conf
./etc/graylog/server.conf

# How many Elasticsearch shards and replicas should be used per index? Note that this only applies to newly created indices.
elasticsearch_shards = 4
root@KTMATHQSYSLOG01:/# cat /etc/graylog/server.conf | grep replica
# How many Elasticsearch shards and replicas should be used per index? Note that this only applies to newly created indices.
elasticsearch_replicas = 0

But the error log indicates 8 shards and 2 replicas. Where is that setting taken from?

2018-03-29T11:47:16.222+02:00 WARN [Messages] Failed to index message: index=<graylog_143> id=<16ebf044-3333-11e8-8ad9-f403433d1b68> error=<{"type":"unavailable_shards_exception","reason":"[graylog_143][2] Not enough active copies to meet write consistency of [QUORUM] (have 1, needed 2). Timeout: [1m], request: [BulkShardRequest to [graylog_143] containing [118] requests]"}>

You can change these settings in the index set configuration:
http://docs.graylog.org/en/2.4/pages/configuration/index_model.html#index-set-configuration

Thanks, but how can I change that without the Graylog web GUI? Currently the whole web interface works except "Indices / Index Sets".

You could directly edit the configuration in the index_sets collection in the MongoDB database used by Graylog.
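For example, with the legacy 2.6 mongo shell this could look like the dry run below (commands are printed, not executed). The collection name index_sets with fields shards and replicas, and the database name graylog, are assumptions based on Graylog 2.4's index set model; verify them on your own instance before editing, and do it with Graylog stopped:

```shell
#!/bin/sh
# Dry-run sketch: inspect and correct shard/replica counts in Graylog's
# MongoDB configuration. DB/collection/field names are assumptions; the
# echo prefix prints the commands instead of running them.
DB="graylog"
echo mongo "$DB" --eval 'db.index_sets.find({}, {title: 1, shards: 1, replicas: 1})'
echo mongo "$DB" --eval 'db.index_sets.update({}, {$set: {shards: 1, replicas: 0}}, {multi: true})'
```

Note that, as the server.conf comment says, shard/replica settings only apply to newly created indices, so existing broken indices still have to be dealt with separately.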

I've changed the value via MongoDB Compass and restarted the services, but I still see the errors below.

I've re-run the curl commands for the deflector and for removal of the last index, as shown below.

root@SYSLOG01:/usr/share/graylog-server/plugin# curl -XPOST 'http://localhost:9200/_aliases' -d '
{
  "actions" : [
    { "remove" : { "index" : "graylog_144", "alias" : "graylog_deflector" } }
  ]
}'

and got an acknowledged message.

Then I ran the command:

root@SYSLOG01:/usr/share/graylog-server/plugin# curl -XDELETE 'http://localhost:9200/graylog_144'

which was also acknowledged, but I still get two different error messages when I check the graylog-server log:

2018-03-29T15:56:17.417+02:00 ERROR [Messages] Caught exception during bulk indexing: java.net.SocketTimeoutException: Read timed out, retrying (attempt #1).
2018-03-29T15:56:18.906+02:00 WARN [Messages] Failed to index message: index=<graylog_143> id= error=<{"type":"unavailable_shards_exception","reason":"[graylog_143][4] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [graylog_143] containing [114] requests]"}>
2018-03-29T15:56:18.906+02:00 WARN [Messages] Failed to index message: index=<graylog_143> id= error=<{"type":"unavailable_shards_exception","reason":"[graylog_143][7] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [graylog_143] containing [106] requests]"}>
2018-03-29T15:56:18.906+02:00 WARN [Messages] Failed to index message: index=<graylog_143> id= error=<{"type":"unavailable_shards_exception","reason":"[graylog_143][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [graylog_143] containing [136] requests]"}>

thx

I've now fixed the problem with the shards. After changing the value in the DB directly back to the original, I had to delete all indices that were created with the wrong settings via the API / curl.

Then I had to remove the graylog_deflector; the output was always an error, but something did change.

After that I restarted Elasticsearch via "service elasticsearch restart" and started graylog-server again via "service graylog-server start".
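The sequence that worked here can be sketched end to end as a dry run (commands are only printed; index names are this thread's examples and must be adjusted, and the alias removal comes before deletion, as noted earlier in the thread):

```shell
#!/bin/sh
# Dry-run recap of the recovery: stop Graylog, drop the deflector alias,
# delete the indices created with the wrong settings, restart services.
# Remove the echo prefixes to actually execute on the affected host.
ES="${ES:-http://localhost:9200}"
echo service graylog-server stop
echo curl -XPOST "$ES/_aliases" -d '{"actions":[{"remove":{"index":"graylog_144","alias":"graylog_deflector"}}]}'
for idx in graylog_143 graylog_144; do   # indices with the wrong shard count
    echo curl -XDELETE "$ES/$idx"
done
echo service elasticsearch restart
echo service graylog-server start
```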

Now I don't see any errors related to Elasticsearch, but I still get the Java parsing error messages.

example:

    at org.graylog2.syslog4j.server.impl.event.CiscoSyslogServerEvent.<init>(CiscoSyslogServerEvent.java:37) ~[graylog.jar:?]
    at org.graylog2.inputs.codecs.SyslogCodec.parse(SyslogCodec.java:128) ~[graylog.jar:?]
    at org.graylog2.inputs.codecs.SyslogCodec.decode(SyslogCodec.java:96) ~[graylog.jar:?]
    at org.graylog2.shared.buffers.processors.DecodingProcessor.processMessage(DecodingProcessor.java:150) ~[graylog.jar:?]
    at org.graylog2.shared.buffers.processors.DecodingProcessor.onEvent(DecodingProcessor.java:91) [graylog.jar:?]
    at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:74) [graylog.jar:?]
    at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:42) [graylog.jar:?]
    at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:143) [graylog.jar:?]
    at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66) [graylog.jar:?]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]

Is that maybe related to the worker settings in the Graylog server.conf?

These are my current settings:

mike@SYSLOG01:~$ cat /etc/graylog/server.conf | grep buffer
# that every outputbuffer processor manages its own batch and performs its own batch write calls.
# ("outputbuffer_processors" variable)
# for this time period is less than output_batch_size * outputbuffer_processors.
# Raise this number if your buffers are filling up.
processbuffer_processors = 10
outputbuffer_processors = 3
#outputbuffer_processor_keep_alive_time = 5000
#outputbuffer_processor_threads_core_pool_size = 3
#outputbuffer_processor_threads_max_pool_size = 30
# UDP receive buffer size for all message inputs (e. g. SyslogUDPInput).
#udp_recvbuffer_sizes = 1048576
# Wait strategy describing how buffer processors wait on a cursor sequence. (default: sleeping)
# Size of internal ring buffers. Raise this if raising outputbuffer_processors does not help anymore.
# For optimum performance your LogMessage objects in the ring buffer should fit in your CPU L3 cache.
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking

Try using a Raw/Plaintext input and extract the information you're interested in with extractors or pipeline rules.

It would also be great if you could provide some sample messages that cause these parsing errors in a bug report at Issues · Graylog2/graylog2-server · GitHub.