No Messages In or Out after rebooting Graylog Server

For what it’s worth, sometimes you need to look at the basics. These screenshots were taken after the reboot, when it’s broken.


Thanks @gsmith @tmacgbay for your continual help!

So many things to look back at… please post your current graylog server.conf and elasticsearch.yml. Check the logs for both and post any warnings. Stumped.
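On a default deb-based install those logs usually live in the locations below; this is a sketch, and the paths are assumptions that will differ if you have relocated them (Elasticsearch names its log file after cluster.name).

```shell
# Default log locations on a deb install; adjust if you've moved them.
# grep exits non-zero when there are no matches, hence the || true.
sudo tail -n 100 /var/log/graylog-server/server.log | grep -iE "warn|error" || true
sudo tail -n 100 /var/log/elasticsearch/graylog.log | grep -iE "warn|error" || true
```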


The Problem

Graylog is working and collecting messages. However, after rebooting the server, no messages come in or out.

Operating system

  • Hyper-V Server 2019
  • Ubuntu 18.04

Package versions

  • Graylog 4.2.0+5adccc3 on graylog (Private Build 1.8.0_292 on Linux 4.15.0-159-generic)
  • MongoDB v4.0.27
  • Elasticsearch 7.10.2

What we’ve done

  • Ubuntu needed a restart (the linux-base package was updated).
  • We discovered that the data files are in a non-standard location, which may be causing some confusion.
path.data: /mnt/sdb/data
path.logs: /mnt/sdb/logs
  • I updated some configuration settings.
    /etc/graylog/server/server.conf
elasticsearch_hosts = https://127.0.0.1:9200
  • I’ve made some changes to Elasticsearch config.
    /etc/elasticsearch/elasticsearch.yml

Config Files

Graylog

ldog@graylog:~$ cat /etc/graylog/server/server.conf | egrep -v "^\s*(#|$)"
is_master = true
node_id_file = /etc/graylog/server/node-id
password_secret = xxx
root_password_sha2 = xxx
root_timezone = America/New_York
bin_dir = /usr/share/graylog-server/bin
data_dir = /var/lib/graylog-server
plugin_dir = /usr/share/graylog-server/plugin
http_bind_address = 192.168.1.1:9000
elasticsearch_hosts = http://127.0.0.1:9200
rotation_strategy = count
elasticsearch_max_docs_per_index = 20000000
elasticsearch_max_number_of_indices = 20
retention_strategy = delete
elasticsearch_shards = 4
elasticsearch_replicas = 0
elasticsearch_index_prefix = graylog
allow_leading_wildcard_searches = false
allow_highlighting = false
elasticsearch_analyzer = standard
output_batch_size = 500
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 5
outputbuffer_processors = 3
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
lb_recognition_period_seconds = 3
mongodb_uri = mongodb://localhost/graylog
mongodb_max_connections = 1000
mongodb_threads_allowed_to_block_multiplier = 5
proxied_requests_thread_pool_size = 32

Elasticsearch

ldog@graylog:~$ sudo cat /etc/elasticsearch/elasticsearch.yml | egrep -v "^\s*(#|$)"
cluster.name: graylog
path.data: /mnt/sdb/data
path.logs: /mnt/sdb/logs
network.host: 127.0.0.1
http.port: 9200
action.auto_create_index: false
discovery.type: single-node
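With network.host bound to loopback, a quick sanity check that Elasticsearch is answering on that address would be the standard cluster health API (a sketch; on this single node with replicas set to 0, "green" is expected, and "red" means unassigned primary shards):

```shell
# Cluster health from the box itself; falls through to a message if
# Elasticsearch is not answering on 127.0.0.1:9200.
curl -s "http://127.0.0.1:9200/_cluster/health?pretty" \
  || echo "Elasticsearch is not answering on 127.0.0.1:9200"
```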

One more: what are the results of:

curl "localhost:9200/_nodes/settings?pretty=true"

This will tell us what Elasticsearch has picked up from its configuration (or show the defaults).

I don’t think we have asked - in the Graylog UI, when you go to system/indices, and iterate through them, are they all happy… meaning Graylog thinks they are all happy?

Graylog seems happy to me.



ldog@graylog:~$ curl "localhost:9200/_nodes/settings?pretty=true"
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "graylog",
  "nodes" : {
    "Lfg5ABAgRtKaa-BepiwdMw" : {
      "name" : "graylog",
      "transport_address" : "127.0.0.1:9300",
      "host" : "127.0.0.1",
      "ip" : "127.0.0.1",
      "version" : "7.10.2",
      "build_flavor" : "oss",
      "build_type" : "deb",
      "build_hash" : "xxx",
      "roles" : [
        "data",
        "ingest",
        "master",
        "remote_cluster_client"
      ],
      "settings" : {
        "cluster" : {
          "name" : "graylog"
        },
        "node" : {
          "name" : "graylog",
          "pidfile" : "/var/run/elasticsearch/elasticsearch.pid"
        },
        "path" : {
          "data" : [
            "/mnt/sdb/data"
          ],
          "logs" : "/mnt/sdb/logs",
          "home" : "/usr/share/elasticsearch"
        },
        "discovery" : {
          "type" : "single-node"
        },
        "action" : {
          "auto_create_index" : "false"
        },
        "client" : {
          "type" : "node"
        },
        "http" : {
          "type" : {
            "default" : "netty4"
          },
          "port" : "9200"
        },
        "transport" : {
          "type" : {
            "default" : "netty4"
          }
        },
        "network" : {
          "host" : "127.0.0.1"
        }
      }
    }
  }
}

What are the three things Graylog is whining about here:
image
and also, interestingly, graylog_13 says it has received messages only a few seconds ago
image
If you do a general search, what would those messages be?

I will try to keep an eye out and be available over the weekend - get out there and enjoy something outside! :slight_smile:


Nice, seems like we’re getting closer :slight_smile:

I have a couple of questions to add to @tmacgbay’s suggestions.
This picture below (which I marked with a red box) shows my concern that your plugin is not the right version.

image

You can either upgrade the plugin or navigate to the plugin directory and remove it.
Directory location:

/usr/share/graylog-server/plugin

Command to install the plugin:

sudo apt-get install graylog-integrations-plugins

https://docs.graylog.org/docs/setup-intergrations

Have you checked permissions on Elasticsearch data directory?

ls -al /var/lib/elasticsearch

I believe in your case you have moved the data directory.

ls -al /mnt/sdb/data

Are you actually rebooting the server or just restarting the GL service?
If you rebooted the server, I’m concerned about the mount point /mnt in your fstab file.

EDIT: I re-read this post and remembered an incident similar to this one, referring to your mount points. As you stated, you reconfigured your data/log directories while you had your server running. I have done the same thing before, so maybe I can shed some light on your fstab configuration. Here is an example of what I would have done in your situation. Maybe it can help.

  • Stop graylog service using command: sudo systemctl stop graylog.service

  • Stop elasticsearch.service using command: sudo systemctl stop elasticsearch.service

  • Make a backup of your data!!! For example, a simple copy to another destination with enough space: cp -av /var/lib/elasticsearch /media/backupdisk.

  • Check name for mounted volume.

    • sudo fdisk -l

image

  • Create elasticsearch directory in /mnt

    • sudo mkdir /mnt/elasticsearch
  • Mount /dev/sdb1 to /mnt/elasticsearch

    • mount /dev/sdb1 /mnt/elasticsearch
  • Create new sub-directories for elastic data/logs using these commands:

    • sudo mkdir -p /mnt/elasticsearch/es_data
    • sudo mkdir -p /mnt/elasticsearch/es_log
  • Now make sure the mounts come back after reboot by adding an entry to the fstab file

    • /dev/sdb1 /mnt/elasticsearch ext4 defaults 0 0

  • Setup permissions for these directories using commands:
    • sudo chown -R elasticsearch:elasticsearch /mnt/elasticsearch/es_data
    • sudo chown -R elasticsearch:elasticsearch /mnt/elasticsearch/es_log
  • Move the elasticsearch db and logs to the new directories.
    • sudo mv -v /var/lib/elasticsearch/* /mnt/elasticsearch/es_data/
    • sudo mv -v /var/log/elasticsearch/* /mnt/elasticsearch/es_log/
  • Start elasticsearch. service using command:
    • sudo systemctl start elasticsearch.service
  • Wait a few moments for Elasticsearch, then start Graylog using:
    • sudo systemctl start graylog.service
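The steps above can be sanity-checked before trusting a reboot; a minimal sketch, assuming the mount point from the list:

```shell
# Re-read /etc/fstab and confirm the new entry mounts cleanly without a reboot.
sudo mount -a                 # errors here mean a bad fstab entry
findmnt /mnt/elasticsearch    # prints the mount only if it is active
df -h /mnt/elasticsearch      # confirms the expected capacity is behind it
```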

If you noticed, my second drive partition is named /dev/sdb1 while my drive is named /dev/sdb.
This command may help:

root # lsblk
The lsblk command lists all the block devices on your system along with their logical partitions.

I’m not 100% sure what you did before, but maybe this could give you some insight into what I did.

@tmacgbay :laughing:

image


Thanks @tmacgbay, @gsmith. It was nice to spend a few days not thinking about Graylog. I hope you had a nice weekend.

  1. An input has failed to start. I recently deleted the input; not sure why it’s still complaining about it. I may not have cleared the error. It was an input for the server itself, something the setup documentation had me create.
  2. You are running an outdated Graylog version.
  3. Email Transport Configuration is missing or invalid!
    I can fix both of them.

Two things here.

  1. After I reboot the server and log back on to Graylog’s UI, I see a few messages being collected before the count drops to 0 within a few seconds.
  2. Graylog is not a critical application for us right now, so I’ve been troubleshooting on the production server. I make the changes you recommend, take a snapshot, reboot, and see if it works. If it’s a no-go I revert (keeping the changes). At first I thought this was a good way to work, since it gave me a chance to practice with snapshots (which I had not done much with before), but now, almost a week in and 26 posts on the forum, maybe I should clone this VM! So, prior to rebooting, the server was collecting messages.

I initially installed the enterprise version but then realized we would never buy it, so I removed the plugins. Or at least thought I did.

I removed the files in /usr/share/graylog-server/plugin. However, Graylog did not like that at all. I restarted the Graylog service and the web page does not come up now.

● graylog-server.service - Graylog server
   Loaded: loaded (/usr/lib/systemd/system/graylog-server.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: exit-code) since Mon 2021-11-22 09:26:43 EST; 7s ago
     Docs: http://docs.graylog.org/
  Process: 42782 ExecStart=/usr/share/graylog-server/bin/graylog-server (code=exited, status=1/FAILURE)
 Main PID: 42782 (code=exited, status=1/FAILURE)

2021-11-22T09:21:20.750-05:00 ERROR [CmdLineTool] Guice error (more detail on log level debug): No implementation for java.util.Map<org.graylog2.plugin.Version, javax.inject.Provider<org.graylog2.indexer.fieldtypes.IndexFieldTypePollerAdapter>> was bound.
2021-11-22T09:21:20.751-05:00 ERROR [CmdLineTool] Guice error (more detail on log level debug): No implementation for java.util.Map<org.graylog2.plugin.Version, javax.inject.Provider<org.graylog2.indexer.indices.IndicesAdapter>> was bound.
2021-11-22T09:21:20.751-05:00 ERROR [CmdLineTool] Guice error (more detail on log level debug): No implementation for java.util.Map<org.graylog2.plugin.Version, javax.inject.Provider<org.graylog2.indexer.messages.MessagesAdapter>> was bound.
2021-11-22T09:21:20.752-05:00 ERROR [CmdLineTool] Guice error (more detail on log level debug): No implementation for java.util.Map<org.graylog2.plugin.Version, javax.inject.Provider<org.graylog2.indexer.searches.SearchesAdapter>> was bound.
2021-11-22T09:21:20.752-05:00 ERROR [CmdLineTool] Guice error (more detail on log level debug): No implementation for java.util.Map<org.graylog2.plugin.Version, javax.inject.Provider<org.graylog2.migrations.V20170607164210_MigrateReopenedIndicesToAliases$ClusterState>> was bound.
2021-11-22T09:21:33.391-05:00 INFO  [ImmutableFeatureFlagsCollector] Following feature flags are used: {}
2021-11-22T09:21:34.926-05:00 INFO  [CmdLineTool] Running with JVM arguments: -Xms1g -Xmx1g -XX:NewRatio=1 -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -Dlog4j.configurationFile=file:///etc/graylog/server/log4j2.xml -Djava.library.path=/usr/share/graylog-server/lib/sigar -Dgraylog2.installation_source=deb
2021-11-22T09:21:35.223-05:00 INFO  [Version] HV000001: Hibernate Validator null
2021-11-22T09:21:39.413-05:00 ERROR [CmdLineTool] Guice error (more detail on log level debug): No implementation for java.util.Map<org.graylog2.plugin.Version, javax.inject.Provider<org.graylog.events.search.MoreSearchAdapter>> was bound.
2021-11-22T09:21:39.414-05:00 ERROR [CmdLineTool] Guice error (more detail on log level debug): No implementation for java.util.Map<org.graylog2.plugin.Version, javax.inject.Provider<org.graylog.plugins.views.migrations.V20200730000000_AddGl2MessageIdFieldAliasForEvents$ElasticsearchAdapter>> was bound.
2021-11-22T09:21:39.415-05:00 ERROR [CmdLineTool] Guice error (more detail on log level debug): No implementation for java.util.Map<org.graylog2.plugin.Version, javax.inject.Provider<org.graylog.plugins.views.search.engine.QueryBackend<? extends org.graylog.plugins.views.search.engine.GeneratedQueryContext>>> was bound.
2021-11-22T09:21:39.415-05:00 ERROR [CmdLineTool] Guice error (more detail on log level debug): No implementation for java.util.Map<org.graylog2.plugin.Version, javax.inject.Provider<org.graylog.plugins.views.search.export.ExportBackend>> was bound.
2021-11-22T09:21:39.416-05:00 ERROR [CmdLineTool] Guice error (more detail on log level debug): No implementation for java.util.Map<org.graylog2.plugin.Version, javax.inject.Provider<org.graylog2.indexer.IndexToolsAdapter>> was bound.

I’m going to roll that one back! What did I do wrong?

My permissions look a little different than yours.

ldog@graylog:~$ ls -al /mnt/sdb/data
total 16
drwxrwxrwx 3 ldog          root          4096 Sep 21 15:33 .
drwxr-xr-x 5 ldog          root          4096 Sep 21 14:48 ..
drwxr-xr-x 3 elasticsearch elasticsearch 4096 Sep 21 15:33 nodes
-rw-rw-r-- 1 ldog          ldog            76 Sep 21 15:30 test
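The world-writable parent means Elasticsearch can likely still write there, but the mixed ownership is worth ruling out; a hedged sketch of tightening it up, assuming the paths from elasticsearch.yml:

```shell
# Give the elasticsearch service account ownership of its data/log paths,
# then verify it can actually write there.
sudo chown -R elasticsearch:elasticsearch /mnt/sdb/data /mnt/sdb/logs
sudo -u elasticsearch test -w /mnt/sdb/data && echo "data dir writable"
```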

Restarting Graylog and its related services is fine. It only breaks after rebooting the server.
This is what my fstab looks like:

ldog@graylog:~$ cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
# / was on /dev/ubuntu-vg/ubuntu-lv during curtin installation
/dev/disk/by-id/dm-uuid-LVM-n4UnpBz6N7UfCGajqOh2QmRanFXlt5f9j6uBLeba12wd9oXZqaF29G26T0YljUsw / ext4 defaults 0 0
# /boot was on /dev/sda2 during curtin installation
/dev/disk/by-uuid/d2c58bea-c775-443e-806b-b329116dc3f4 /boot ext4 defaults 0 0
# /boot/efi was on /dev/sda1 during curtin installation
/dev/disk/by-uuid/889C-D099 /boot/efi vfat defaults 0 0
/swap.img       none    swap    sw      0       0
/dev/sdb    /mnt/sdb     ext4      defaults        0             0

Disk /dev/sdb: 700 GiB, 751619276800 bytes, 1468006400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
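Since it only breaks after a full reboot, one thing that might narrow this down is whether Elasticsearch came up before /mnt/sdb was mounted on the current boot; a sketch, assuming journald is in use:

```shell
# Logs from the current boot only: did elasticsearch start before
# /mnt/sdb was mounted?
journalctl -b -u elasticsearch.service --no-pager | head -n 40
# Overall state of both services after the reboot.
systemctl --no-pager status elasticsearch.service graylog-server.service || true
```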

@gsmith Thanks for the rest of your instructions. You’ll have to give me some time to work through those!

Hello,

Rule #1: make sure all the plugins are the same version.

I see now you have some configuration issues. As you stated about cloning: I would totally clone your production server, BUT remember you can’t use the same IP/MAC address. Sometimes you get nowhere just rolling it back.
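On the deleted plugin jars: in 4.x the Elasticsearch storage adapters ship as jars in that same plugin directory, which would explain the Guice binding errors above. A hedged recovery sketch, rather than restoring a snapshot; reinstalling the server package should put the bundled jars back (worth verifying with the ls afterwards):

```shell
# Reinstalling the graylog-server package restores the bundled jars that
# were deleted from /usr/share/graylog-server/plugin.
sudo apt-get install --reinstall graylog-server
ls -1 /usr/share/graylog-server/plugin/*.jar
sudo systemctl restart graylog-server.service
```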

When you execute this command what does it look like?

root # lsblk

Example:

[root@graylogl]#  lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
fd0               2:0    1    4K  0 disk
sda               8:0    0  100G  0 disk
├─sda1            8:1    0  200M  0 part /boot
└─sda2            8:2    0 97.9G  0 part
  ├─centos-root 253:0    0 82.2G  0 lvm  /
  └─centos-swap 253:1    0 15.6G  0 lvm  [SWAP]
sdb               8:16   0  300G  0 disk
└─sdb1            8:17   0  300G  0 part /mnt/elasticsearch

Here is the output of lsblk.

root@graylog:/home/ldog# lsblk
NAME                      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0                       7:0    0  68.3M  1 loop /snap/powershell/189
loop2                       7:2    0  55.5M  1 loop /snap/core18/2246
loop3                       7:3    0  42.2M  1 loop /snap/snapd/13831
loop4                       7:4    0  32.5M  1 loop /snap/snapd/13640
loop5                       7:5    0  55.5M  1 loop /snap/core18/2253
loop6                       7:6    0  66.5M  1 loop /snap/powershell/185
sda                         8:0    0   127G  0 disk
├─sda1                      8:1    0   512M  0 part /boot/efi
├─sda2                      8:2    0     1G  0 part /boot
└─sda3                      8:3    0 125.5G  0 part
  └─ubuntu--vg-ubuntu--lv 253:0    0 125.5G  0 lvm  /
sdb                         8:16   0   700G  0 disk /mnt/sdb

Next things to do:

  1. Clone VM.
  2. Upgrade plugins.
  3. Work through instructions that @gsmith provided.

Hey @rrmike
If you can keep us posted, I would like to know how it’s going.

@tmacgbay @gsmith. I will keep you posted. However, it will be next week. My boss just pulled me off this in order to figure out how to replicate our LDAP server and add SSL. I think I’d rather work on Graylog!


Plugins are updated.

@gsmith Two questions about your instructions.

  1. Where my data is stored seems to be one of the big problems here. Do I want to back up /var/lib/elasticsearch, or should I back up /mnt/sdb/data and /mnt/sdb/logs, where my data is actually being saved?
  2. In your instructions, I don’t see any commands that tell Graylog or Elasticsearch that we moved directories. How does it know where to look? I’m guessing I need to update /etc/elasticsearch/elasticsearch.yml with the changes. Is there any place else that needs to be updated?

Hello,

Good question. It’s hard for me to tell you, but from what you stated I would go with where the data is actually being saved.

[root@graylog graylog_user]#  cat /etc/elasticsearch/elasticsearch.yml | egrep -v "^\s*(#|$)"
cluster.name: graylog
path.data: /var/lib/elasticsearch  <----- **DATA**
path.logs: /var/log/elasticsearch <---- **LOGS**
network.host: 8.8.8.8
http.port: 9200
action.auto_create_index: false
discovery.type: single-node
path.repo: ["/mnt/sdb1/my_repo"]
[root@graylog graylog_user]#

Note:
You need to modify the path.data setting in the elasticsearch.yml file to point at the new folder where you want the data stored.

Here is what you need to do:

You may want to shut down your services, like Graylog and Elasticsearch, first.

In elasticsearch.yml modify path.data to:

path.data: /foo/bar

You’ll end up with your data being stored under /foo/bar instead of /var/lib/elasticsearch.
Make sure that the elasticsearch process can access your new folder.

Once you configure Elasticsearch, make sure you start the elasticsearch service first; wait until it’s completely running, then start Graylog.
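That start order can be scripted; a sketch assuming a local single node (note the unit name on deb installs is graylog-server.service, as shown in the systemctl output earlier in the thread):

```shell
# Start Elasticsearch, wait until its HTTP port answers, then start Graylog.
sudo systemctl start elasticsearch.service
until curl -sf "http://127.0.0.1:9200/_cluster/health" >/dev/null; do
    sleep 2
done
sudo systemctl start graylog-server.service
```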

Hope that helps

EDIT: @rrmike I have a question for ya. I was wondering why you didn’t create a partition on your drive sdb? Or did you format the whole drive?