I’m running Graylog on AWS. After running for a number of weeks, the Web UI stopped working, and I see the following errors in the Nginx log:
2017/07/10 20:03:50 [error] 31144#0: *6 connect() failed (111: Connection refused) while connecting to upstream, client: 173.203.27.52, server: , request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:9000/", host: "34.194.0.149"
Any suggestions would be appreciated.
What’s in the logs of your Graylog node(s)?
http://docs.graylog.org/en/2.2/pages/configuration/file_location.html#omnibus-package
Hi Jochen,
which specific log files would be of interest?
The NGINX error log mostly has those messages:
2017/07/10 20:05:26 [error] 31144#0: *14 connect() failed (111: Connection refused) while connecting to upstream, client: 173.203.27.52, server: , request: "GET /favicon.ico HTTP/1.1", upstream: "http://127.0.0.1:9000/favicon.ico", host: "34.194.0.149", referrer: "https://34.194.0.149/api/"
There’s nothing current in /var/log/graylog/server/current
There are some errors here, but I was under the impression they are non-fatal:
==> /var/log/graylog/elasticsearch/graylog.log <==
[2017-07-10 20:09:25,421][WARN ][cluster.routing.allocation.decider] [Amazon] high disk watermark [90%] exceeded on [_43G6SCNTVmjKyar-ysoFA][Amazon][/var/opt/graylog/data/elasticsearch/graylog/nodes/0] free: 2.5gb[9.7%], shards will be relocated away from this node
[2017-07-10 20:09:55,436][WARN ][cluster.routing.allocation.decider] [Amazon] high disk watermark [90%] exceeded on [_43G6SCNTVmjKyar-ysoFA][Amazon][/var/opt/graylog/data/elasticsearch/graylog/nodes/0] free: 2.5gb[9.7%], shards will be relocated away from this node
[2017-07-10 20:09:55,437][INFO ][cluster.routing.allocation.decider] [Amazon] rerouting shards: [high disk watermark exceeded on one or more nodes]
[2017-07-10 20:10:25,452][WARN ][cluster.routing.allocation.decider] [Amazon] high disk watermark [90%] exceeded on [_43G6SCNTVmjKyar-ysoFA][Amazon][/var/opt/graylog/data/elasticsearch/graylog/nodes/0] free: 2.5gb[9.7%], shards will be relocated away from this node
[2017-07-10 20:10:55,467][WARN ][cluster.routing.allocation.decider] [Amazon] high disk watermark [90%] exceeded on [_43G6SCNTVmjKyar-ysoFA][Amazon][/var/opt/graylog/data/elasticsearch/graylog/nodes/0] free: 2.5gb[9.7%], shards will be relocated away from this node
[2017-07-10 20:10:55,467][INFO ][cluster.routing.allocation.decider] [Amazon] rerouting shards: [high disk watermark exceeded on one or more nodes]
What does that mean? What “old” entries are in the logs?
Is Graylog still running at all?
Non-fatal as in “unable to index any more messages on at least one node”.
Graylog is running, but the web UI is not loading due to connection errors. Can I PM you a link to the (compressed) log files - and thanks for being willing to look at this.
Dietrich
What connection errors?
2017/07/10 20:09:47 [error] 4653#0: *8 connect() failed (111: Connection refused) while connecting to upstream, client: 173.203.27.52, server: , request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:9000/", host: "34.194.0.149"
These look like logs from nginx failing to connect to the Graylog web interface.
Are you sure that Graylog is running and successfully bound to the network interface you’ve configured for the web interface?
Are you sure that nginx is configured correctly?
What’s the output of sudo lsof -i :9000
and sudo netstat -tplen | grep :9000
on the machine running Graylog?
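If it helps, those two checks can be wrapped in a small script. This is only a sketch: it assumes the default web interface port 9000 and uses ss, which can list listening TCP sockets without root.

```shell
# Check whether anything is bound to the Graylog web interface port.
# Assumption: the default port 9000; change the argument if yours differs.
port_listening() {
  # ss -ltn lists listening TCP sockets; match the port in the local address.
  ss -ltn 2>/dev/null | grep -q ":$1 "
}

if port_listening 9000; then
  echo "port 9000: something is listening"
else
  echo "port 9000: nothing is listening -- the Graylog web interface is down"
fi
```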
You’re right, it’s NOT running on port 9000!
This is the netstat output:
tcp 0 0 0.0.0.0:3333 0.0.0.0:* LISTEN 1001 128353242 30703/java
tcp 0 0 0.0.0.0:27017 0.0.0.0:* LISTEN 1001 128353738 30747/mongod
tcp 0 0 0.0.0.0:46187 0.0.0.0:* LISTEN 1001 128353240 30703/java
tcp 0 0 127.0.0.1:2380 0.0.0.0:* LISTEN 1001 128353707 30726/etcd
tcp 0 0 172.31.57.249:9200 0.0.0.0:* LISTEN 1001 128354477 30703/java
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 0 128353733 30756/nginx.conf
tcp 0 0 172.31.57.249:9300 0.0.0.0:* LISTEN 1001 128353981 30703/java
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 0 8596 1057/sshd
tcp 0 0 127.0.0.1:7001 0.0.0.0:* LISTEN 1001 128353708 30726/etcd
tcp 0 0 0.0.0.0:25 0.0.0.0:* LISTEN 0 93642590 8912/master
tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN 0 128353734 30756/nginx.conf
tcp 0 0 0.0.0.0:53407 0.0.0.0:* LISTEN 1001 128353246 30703/java
tcp6 0 0 :::2379 :::* LISTEN 1001 128353709 30726/etcd
tcp6 0 0 :::22 :::* LISTEN 0 8598 1057/sshd
tcp6 0 0 :::25 :::* LISTEN 0 93642591 8912/master
tcp6 0 0 :::4001 :::* LISTEN 1001 128353710 30726/etcd
This is the nginx configuration:
worker_processes 1;
daemon off;
events {
worker_connections 1024;
}
http {
include /opt/graylog/conf/nginx/mime.types;
default_type application/octet-stream;
log_format graylog_format 'nginx: $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for" <msec=$msec|connection=$connection|connection_requests=$connection_requests|millis=$request_time>';
access_log /dev/stdout graylog_format;
server {
listen 80;
return 301 https://$host:443$request_uri;
error_page 502 /502.html;
location /502.html {
internal;
}
}
server {
listen 443;
ssl on;
ssl_certificate /opt/graylog/conf/nginx/ca/graylog.crt;
ssl_certificate_key /opt/graylog/conf/nginx/ca/graylog.key;
ssl_session_timeout 5m;
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA;
ssl_prefer_server_ciphers on;
location / {
proxy_pass http://localhost:9000/;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Graylog-Server-URL http://34.194.0.149:9000/api/;
proxy_pass_request_headers on;
proxy_connect_timeout 150;
proxy_send_timeout 100;
proxy_read_timeout 100;
proxy_buffers 4 32k;
client_max_body_size 8m;
client_body_buffer_size 128k;
}
location /api/ {
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $http_host;
proxy_pass http://localhost:9000/api/;
}
error_page 502 /502.html;
location /502.html {
internal;
}
}
}
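As a side note on the "is nginx configured correctly" question: nginx can validate a configuration file without reloading it. A sketch, assuming the Omnibus config path shown above and an nginx binary on the PATH (both are assumptions; on the appliance the bundled binary may live elsewhere):

```shell
# Validate the nginx configuration without touching the running process.
CONF=/opt/graylog/conf/nginx/nginx.conf   # assumed Omnibus location
if command -v nginx >/dev/null 2>&1 && [ -f "$CONF" ]; then
  nginx -t -c "$CONF"
else
  echo "nginx or $CONF not found here; run this on the appliance"
fi
```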
Again, is Graylog running at all? The output of netstat
doesn’t look like it.
Also, please post the current configuration and the logs of your Graylog node.
It is, but there seems to be a problem with Elasticsearch, maybe that’s the underlying cause:
# sudo graylog-ctl status
elasticsearch disabled
run: etcd: (pid 14469) 8s; run: log: (pid 29582) 1100622s
run: graylog-server: (pid 14480) 6s; run: log: (pid 29647) 1100621s
run: mongodb: (pid 14489) 6s; run: log: (pid 29589) 1100622s
run: nginx: (pid 14493) 6s; run: log: (pid 29653) 1100621s
/etc/graylog/graylog-settings.json
{
"timezone": "Etc/UTC",
"smtp_server": "",
"smtp_port": 587,
"smtp_user": "",
"smtp_password": "",
"smtp_from_email": null,
"smtp_web_url": null,
"smtp_no_tls": false,
"smtp_no_ssl": false,
"master_node": "127.0.0.1",
"local_connect": true,
"current_address": "172.31.57.249",
"last_address": "172.31.57.249",
"enforce_ssl": true,
"journal_size": 1,
"node_id": false,
"internal_logging": true,
"web_listen_uri": false,
"web_endpoint_uri": false,
"rest_listen_uri": false,
"rest_transport_uri": false,
"external_rest_uri": "http://34.194.0.149:9000/api/",
"custom_attributes": {
}
}
The logs are here:
https://drive.google.com/file/d/0B5snf7YL_Og6MkdPbWZHa0pDenc/view?usp=sharing
There are multiple issues in the logs of your Graylog node.
The virtual machine ran out of disk space and the journal possibly got corrupted by this:
2017-06-30_09:34:51.79272 2017-06-30 09:34:51,572 ERROR: org.graylog2.shared.journal.KafkaJournal - Cannot write /var/opt/graylog/data/journal/graylog2-committed-read-offset to disk.
2017-06-30_09:34:51.79294 java.io.IOException: No space left on device
There wasn’t enough memory to continue running the JVM:
2017-06-30_09:34:51.86975 Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00007ff6fcb90000, 65536, 1) failed; error='Cannot allocate memory' (errno=12)
2017-06-30_09:34:51.87101 #
2017-06-30_09:34:51.87102 # There is insufficient memory for the Java Runtime Environment to continue.
2017-06-30_09:34:51.87131 # Native memory allocation (mmap) failed to map 65536 bytes for committing reserved memory.
2017-06-30_09:34:51.87276 # Can not save log file, dump to screen..
2017-06-30_09:34:51.87846 #
2017-06-30_09:34:51.87871 # There is insufficient memory for the Java Runtime Environment to continue.
2017-06-30_09:34:51.87995 # Native memory allocation (mmap) failed to map 65536 bytes for committing reserved memory.
2017-06-30_09:34:51.88447 # Possible reasons:
2017-06-30_09:34:51.88565 # The system is out of physical RAM or swap space
2017-06-30_09:34:51.89117 # In 32 bit mode, the process size limit was hit
2017-06-30_09:34:51.89117 # Possible solutions:
2017-06-30_09:34:51.89117 # Reduce memory load on the system
2017-06-30_09:34:51.89118 # Increase physical memory or swap space
2017-06-30_09:34:51.89118 # Check if swap backing store is full
2017-06-30_09:34:51.89118 # Use 64 bit Java on a 64 bit OS
2017-06-30_09:34:51.89119 # Decrease Java heap size (-Xmx/-Xms)
2017-06-30_09:34:51.89119 # Decrease number of Java threads
2017-06-30_09:34:51.89119 # Decrease Java thread stack sizes (-Xss)
2017-06-30_09:34:51.89120 # Set larger code cache with -XX:ReservedCodeCacheSize=
2017-06-30_09:34:51.89120 # This output file may be truncated or incomplete.
2017-06-30_09:34:51.89120 #
2017-06-30_09:34:51.89120 # Out of Memory Error (os_linux.cpp:2627), pid=2020, tid=0x00007ff6f11f9700
2017-06-30_09:34:51.89120 #
2017-06-30_09:34:51.89120 # JRE version: Java(TM) SE Runtime Environment (8.0_101-b13) (build 1.8.0_101-b13)
Please note that the virtual machine should have at least 4 GB of memory to run Graylog, Elasticsearch, and MongoDB on the same machine.
Because of the insufficient disk space, Elasticsearch stopped indexing data:
[2017-07-10 20:10:55,467][WARN ][cluster.routing.allocation.decider] [Amazon] high disk watermark [90%] exceeded on [_43G6SCNTVmjKyar-ysoFA][Amazon][/var/opt/graylog/data/elasticsearch/graylog/nodes/0] free: 2.5gb[9.7%], shards will be relocated away from this node
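The check behind those WARN lines is essentially disk-usage percentage against a 90% threshold. A rough shell equivalent (the "/" below is a placeholder; on an Omnibus install the data lives under /var/opt/graylog/data/elasticsearch, per the log line above):

```shell
# Compare disk usage on the Elasticsearch data path against the default
# high watermark of 90%, mirroring the cluster.routing.allocation.decider
# warnings in the log.
WATERMARK=90

used_pct() {
  # df -P prints "Use%" in column 5; strip the trailing '%'.
  df -P "$1" | awk 'NR==2 { sub("%", "", $5); print $5 }'
}

# Replace / with your Elasticsearch data path.
pct=$(used_pct /)
if [ "$pct" -ge "$WATERMARK" ]; then
  echo "disk ${pct}% used: above the high watermark, shards will be moved away"
else
  echo "disk ${pct}% used: below the high watermark"
fi
```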
Got it - so how do I re-enable shard allocation once the disk space issue is resolved?
Elasticsearch will automatically start assigning shards to that node again.
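If you want to watch the recovery, one option is to poll the Elasticsearch cat APIs. A sketch; the node address is taken from the netstat output earlier in the thread and may differ on your machine:

```shell
# Query cluster health and per-node disk allocation to watch shards return.
ES="http://172.31.57.249:9200"   # assumption: address from the netstat output
if curl -s --max-time 2 "$ES/_cat/health?v"; then
  # Shows disk used/free per node and how many shards each holds.
  curl -s --max-time 2 "$ES/_cat/allocation?v" || true
else
  echo "Elasticsearch not reachable at $ES"
fi
```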
As for the possibly corrupted Graylog message journal, you’ll have to remove it from disk (see http://docs.graylog.org/en/2.2/pages/configuration/file_location.html#omnibus-package).
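A sketch of that procedure on the Omnibus package. The journal path comes from the KafkaJournal error quoted above; stop graylog-server first so nothing writes to the journal while you delete it:

```shell
# Remove a possibly corrupted Graylog message journal (Omnibus layout).
JOURNAL_DIR=/var/opt/graylog/data/journal   # path from the KafkaJournal error

wipe_journal() {
  graylog-ctl stop graylog-server    # stop the server before touching the journal
  rm -rf "$JOURNAL_DIR"              # WARNING: discards any unprocessed messages
  graylog-ctl start graylog-server   # the journal directory is recreated on start
}

if command -v graylog-ctl >/dev/null 2>&1; then
  wipe_journal
else
  echo "graylog-ctl not found; run this on the Omnibus appliance as root"
fi
```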
I did remove the corrupted message journal, but Elasticsearch is still disabled:
elasticsearch disabled
run: etcd: (pid 14140) 607s; run: log: (pid 29582) 1191362s
run: graylog-server: (pid 15232) 107s; run: log: (pid 29647) 1191361s
run: mongodb: (pid 14174) 606s; run: log: (pid 29589) 1191362s
run: nginx: (pid 14178) 605s; run: log: (pid 29653) 1191361s
Any hints on how to re-enable it would be appreciated, thanks so much!
D.
Try restarting the virtual machine.
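After the reboot, one way to confirm everything came back is the same graylog-ctl status command used earlier in this thread (a sketch; graylog-ctl only exists on the Omnibus appliance):

```shell
# Verify all appliance services after a restart: each should show "run:",
# and none should show "disabled".
if command -v graylog-ctl >/dev/null 2>&1; then
  graylog-ctl status
else
  echo "graylog-ctl not found; run this on the appliance"
fi
```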
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.