connect() failed (111: Connection refused) while connecting to upstream #197

I’m running Graylog on AWS. After a number of weeks, the web UI stopped working, and I see the following errors in the nginx log:
2017/07/10 20:03:50 [error] 31144#0: *6 connect() failed (111: Connection refused) while connecting to upstream, client: 173.203.27.52, server: , request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:9000/", host: "34.194.0.149"
Any suggestions would be appreciated.

What’s in the logs of your Graylog node(s)?
→ http://docs.graylog.org/en/2.2/pages/configuration/file_location.html#omnibus-package
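
For the omnibus package, the server and Elasticsearch logs can be tailed like this (a sketch; paths per the linked page):

sudo tail -f /var/log/graylog/server/current
sudo tail -f /var/log/graylog/elasticsearch/graylog.log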

Hi Jochen,
which specific log files would be of interest?
The nginx error log mostly contains these messages:

2017/07/10 20:05:26 [error] 31144#0: *14 connect() failed (111: Connection refused) while connecting to upstream, client: 173.203.27.52, server: , request: "GET /favicon.ico HTTP/1.1", upstream: "http://127.0.0.1:9000/favicon.ico", host: "34.194.0.149", referrer: "https://34.194.0.149/api/"

There’s nothing current in /var/log/graylog/server/current

There are some errors here, but I was under the impression they are non-fatal:

==> /var/log/graylog/elasticsearch/graylog.log <==
[2017-07-10 20:09:25,421][WARN ][cluster.routing.allocation.decider] [Amazon] high disk watermark [90%] exceeded on [_43G6SCNTVmjKyar-ysoFA][Amazon][/var/opt/graylog/data/elasticsearch/graylog/nodes/0] free: 2.5gb[9.7%], shards will be relocated away from this node
[2017-07-10 20:09:55,436][WARN ][cluster.routing.allocation.decider] [Amazon] high disk watermark [90%] exceeded on [_43G6SCNTVmjKyar-ysoFA][Amazon][/var/opt/graylog/data/elasticsearch/graylog/nodes/0] free: 2.5gb[9.7%], shards will be relocated away from this node
[2017-07-10 20:09:55,437][INFO ][cluster.routing.allocation.decider] [Amazon] rerouting shards: [high disk watermark exceeded on one or more nodes]
[2017-07-10 20:10:25,452][WARN ][cluster.routing.allocation.decider] [Amazon] high disk watermark [90%] exceeded on [_43G6SCNTVmjKyar-ysoFA][Amazon][/var/opt/graylog/data/elasticsearch/graylog/nodes/0] free: 2.5gb[9.7%], shards will be relocated away from this node
[2017-07-10 20:10:55,467][WARN ][cluster.routing.allocation.decider] [Amazon] high disk watermark [90%] exceeded on [_43G6SCNTVmjKyar-ysoFA][Amazon][/var/opt/graylog/data/elasticsearch/graylog/nodes/0] free: 2.5gb[9.7%], shards will be relocated away from this node
[2017-07-10 20:10:55,467][INFO ][cluster.routing.allocation.decider] [Amazon] rerouting shards: [high disk watermark exceeded on one or more nodes]

What does that mean? What “old” entries are in the logs?

Is Graylog still running at all?

Non-fatal as in “unable to index any more messages on at least one node”.

Graylog is running, but the web UI is not loading due to connection errors. Can I PM you a link to the (compressed) log files? Thanks for being willing to look at this.
Dietrich

What connection errors?

2017/07/10 20:09:47 [error] 4653#0: *8 connect() failed (111: Connection refused) while connecting to upstream, client: 173.203.27.52, server: , request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:9000/", host: "34.194.0.149"

These look like logs from nginx failing to connect to the Graylog web interface.

Are you sure that Graylog is running and successfully bound to the network interface you’ve configured for the web interface?
Are you sure that nginx is configured correctly?
What’s the output of sudo lsof -i :9000 and sudo netstat -tplen | grep :9000 on the machine running Graylog?
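
For example (a sketch; 9000 is the default web interface port, adjust if you changed it):

sudo lsof -i :9000
sudo netstat -tplen | grep :9000
# if neither shows a listener, check which ports the graylog-server JVM actually bound:
sudo netstat -tplen | grep java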

You’re right, it’s NOT running on port 9000!
This is the netstat output:

tcp        0      0 0.0.0.0:3333            0.0.0.0:*               LISTEN      1001       128353242   30703/java      
tcp        0      0 0.0.0.0:27017           0.0.0.0:*               LISTEN      1001       128353738   30747/mongod    
tcp        0      0 0.0.0.0:46187           0.0.0.0:*               LISTEN      1001       128353240   30703/java      
tcp        0      0 127.0.0.1:2380          0.0.0.0:*               LISTEN      1001       128353707   30726/etcd      
tcp        0      0 172.31.57.249:9200      0.0.0.0:*               LISTEN      1001       128354477   30703/java      
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      0          128353733   30756/nginx.conf
tcp        0      0 172.31.57.249:9300      0.0.0.0:*               LISTEN      1001       128353981   30703/java      
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      0          8596        1057/sshd       
tcp        0      0 127.0.0.1:7001          0.0.0.0:*               LISTEN      1001       128353708   30726/etcd      
tcp        0      0 0.0.0.0:25              0.0.0.0:*               LISTEN      0          93642590    8912/master     
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      0          128353734   30756/nginx.conf
tcp        0      0 0.0.0.0:53407           0.0.0.0:*               LISTEN      1001       128353246   30703/java      
tcp6       0      0 :::2379                 :::*                    LISTEN      1001       128353709   30726/etcd      
tcp6       0      0 :::22                   :::*                    LISTEN      0          8598        1057/sshd       
tcp6       0      0 :::25                   :::*                    LISTEN      0          93642591    8912/master     
tcp6       0      0 :::4001                 :::*                    LISTEN      1001       128353710   30726/etcd

This is the nginx configuration:

worker_processes  1;
daemon off;

events {
    worker_connections  1024;
}

http {
    include       /opt/graylog/conf/nginx/mime.types;
    default_type  application/octet-stream;
    log_format    graylog_format  'nginx: $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for" <msec=$msec|connection=$connection|connection_requests=$connection_requests|millis=$request_time>';
    access_log    /dev/stdout graylog_format;

    server {
      listen 80;
      return 301 https://$host:443$request_uri;
      error_page 502 /502.html;
      location  /502.html {
        internal;
      }
    }

    server {
      listen 443;

      ssl on;
      ssl_certificate /opt/graylog/conf/nginx/ca/graylog.crt;
      ssl_certificate_key /opt/graylog/conf/nginx/ca/graylog.key;
      ssl_session_timeout 5m;
      ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
      ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA;
      ssl_prefer_server_ciphers on;

      location / {
        proxy_pass http://localhost:9000/;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Graylog-Server-URL http://34.194.0.149:9000/api/;
        proxy_pass_request_headers on;
        proxy_connect_timeout 150;
        proxy_send_timeout 100;
        proxy_read_timeout 100;
        proxy_buffers 4 32k;
        client_max_body_size 8m;
        client_body_buffer_size 128k;
      }

      location /api/ {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_pass http://localhost:9000/api/;
      }

      error_page 502 /502.html;
      location  /502.html {
        internal;
      }
    }
}

Again, is Graylog running at all? The output of netstat doesn’t look like it.

Also, please post the current configuration and the logs of your Graylog node.

It is, but there seems to be a problem with Elasticsearch; maybe that’s the underlying cause:

# sudo graylog-ctl status
elasticsearch disabled
run: etcd: (pid 14469) 8s; run: log: (pid 29582) 1100622s
run: graylog-server: (pid 14480) 6s; run: log: (pid 29647) 1100621s
run: mongodb: (pid 14489) 6s; run: log: (pid 29589) 1100622s
run: nginx: (pid 14493) 6s; run: log: (pid 29653) 1100621s

/etc/graylog/graylog-settings.json

{
  "timezone": "Etc/UTC",
  "smtp_server": "",
  "smtp_port": 587,
  "smtp_user": "",
  "smtp_password": "",
  "smtp_from_email": null,
  "smtp_web_url": null,
  "smtp_no_tls": false,
  "smtp_no_ssl": false,
  "master_node": "127.0.0.1",
  "local_connect": true,
  "current_address": "172.31.57.249",
  "last_address": "172.31.57.249",
  "enforce_ssl": true,
  "journal_size": 1,
  "node_id": false,
  "internal_logging": true,
  "web_listen_uri": false,
  "web_endpoint_uri": false,
  "rest_listen_uri": false,
  "rest_transport_uri": false,
  "external_rest_uri": "http://34.194.0.149:9000/api/",
  "custom_attributes": {

  }
}

The logs are here:
https://drive.google.com/file/d/0B5snf7YL_Og6MkdPbWZHa0pDenc/view?usp=sharing

There are multiple issues in the logs of your Graylog node.

The virtual machine ran out of disk space and the journal possibly got corrupted by this:

2017-06-30_09:34:51.79272 2017-06-30 09:34:51,572 ERROR: org.graylog2.shared.journal.KafkaJournal - Cannot write /var/opt/graylog/data/journal/graylog2-committed-read-offset to disk.
2017-06-30_09:34:51.79294 java.io.IOException: No space left on device

There wasn’t enough memory to continue running the JVM:

2017-06-30_09:34:51.86975 Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00007ff6fcb90000, 65536, 1) failed; error='Cannot allocate memory' (errno=12)
2017-06-30_09:34:51.87101 #
2017-06-30_09:34:51.87102 # There is insufficient memory for the Java Runtime Environment to continue.
2017-06-30_09:34:51.87131 # Native memory allocation (mmap) failed to map 65536 bytes for committing reserved memory.
2017-06-30_09:34:51.87276 # Can not save log file, dump to screen..
2017-06-30_09:34:51.87846 #
2017-06-30_09:34:51.87871 # There is insufficient memory for the Java Runtime Environment to continue.
2017-06-30_09:34:51.87995 # Native memory allocation (mmap) failed to map 65536 bytes for committing reserved memory.
2017-06-30_09:34:51.88447 # Possible reasons:
2017-06-30_09:34:51.88565 #   The system is out of physical RAM or swap space
2017-06-30_09:34:51.89117 #   In 32 bit mode, the process size limit was hit
2017-06-30_09:34:51.89117 # Possible solutions:
2017-06-30_09:34:51.89117 #   Reduce memory load on the system
2017-06-30_09:34:51.89118 #   Increase physical memory or swap space
2017-06-30_09:34:51.89118 #   Check if swap backing store is full
2017-06-30_09:34:51.89118 #   Use 64 bit Java on a 64 bit OS
2017-06-30_09:34:51.89119 #   Decrease Java heap size (-Xmx/-Xms)
2017-06-30_09:34:51.89119 #   Decrease number of Java threads
2017-06-30_09:34:51.89119 #   Decrease Java thread stack sizes (-Xss)
2017-06-30_09:34:51.89120 #   Set larger code cache with -XX:ReservedCodeCacheSize=
2017-06-30_09:34:51.89120 # This output file may be truncated or incomplete.
2017-06-30_09:34:51.89120 #
2017-06-30_09:34:51.89120 #  Out of Memory Error (os_linux.cpp:2627), pid=2020, tid=0x00007ff6f11f9700
2017-06-30_09:34:51.89120 #
2017-06-30_09:34:51.89120 # JRE version: Java(TM) SE Runtime Environment (8.0_101-b13) (build 1.8.0_101-b13)

Please note that the virtual machine should have at least 4 GB of memory to run Graylog, Elasticsearch, and MongoDB on the same machine.
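
You can check the available memory with free; if the instance is undersized, adding a swap file is a stopgap (a sketch only, with an arbitrary path and size, not a substitute for proper sizing):

free -m
# create and enable a 2 GB swap file (hypothetical path /swapfile):
sudo dd if=/dev/zero of=/swapfile bs=1M count=2048
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile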

Because of the insufficient disk space, Elasticsearch stopped indexing data:

[2017-07-10 20:10:55,467][WARN ][cluster.routing.allocation.decider] [Amazon] high disk watermark [90%] exceeded on [_43G6SCNTVmjKyar-ysoFA][Amazon][/var/opt/graylog/data/elasticsearch/graylog/nodes/0] free: 2.5gb[9.7%], shards will be relocated away from this node
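
You can check the free space on the Elasticsearch data volume and the cluster health directly (a sketch; the data path comes from the log line above, and the address/port are the ones from your netstat output):

df -h /var/opt/graylog/data/elasticsearch
curl -s 'http://172.31.57.249:9200/_cluster/health?pretty'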

Got it - so how do I re-enable shard allocation once the disk space issue is resolved?

Elasticsearch will automatically start assigning shards to that node again.
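
Once enough disk space has been freed, you can watch the shards being assigned again (a sketch, same address as above):

curl -s 'http://172.31.57.249:9200/_cat/shards?v'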

As for the possibly corrupted Graylog message journal, you’ll have to remove it from disk (see http://docs.graylog.org/en/2.2/pages/configuration/file_location.html#omnibus-package).
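
Roughly like this (a sketch; the journal path is the one from your logs, and passing a service name to graylog-ctl is an assumption based on how other omnibus-ctl tools behave):

sudo graylog-ctl stop graylog-server
sudo rm -rf /var/opt/graylog/data/journal/*
sudo graylog-ctl start graylog-server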

I did remove the corrupted message journal, but Elasticsearch is still disabled:
elasticsearch disabled
run: etcd: (pid 14140) 607s; run: log: (pid 29582) 1191362s
run: graylog-server: (pid 15232) 107s; run: log: (pid 29647) 1191361s
run: mongodb: (pid 14174) 606s; run: log: (pid 29589) 1191362s
run: nginx: (pid 14178) 605s; run: log: (pid 29653) 1191361s
Any hints on how to re-enable it would be appreciated, thanks so much!
D.

Try restarting the virtual machine.
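
If a full reboot is not an option, reconfiguring and restarting the services may be enough (a sketch):

sudo graylog-ctl reconfigure
sudo graylog-ctl restart
sudo graylog-ctl status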
