Upgrade from 2.4.6 to 2.5.0 Collector Problem


(Michael) #1

Hi folks,

after upgrading to Graylog 2.5.0 and moving Elasticsearch from 5.6.13 to 6.5.2, which was a pretty smooth process, I saw that all my collectors (0.1.6) were shown as inactive, even though they were still delivering log data to Graylog. I then saw that a new version (0.1.7) is available, which should be used so that the new CSRF headers are sent correctly. After I installed the new agent on one machine, it throws an error while starting (see below). The machine is running CentOS 5.11 and I hope the new version is compatible with that legacy OS. This was one of the reasons why I chose Graylog. I also updated a CentOS 6 machine, and there the collector is running without issues.
I hope someone can help me understand whether the new agent design broke compatibility with CentOS 5, or whether it is just the new Beats package inside.

Michael

time="2018-12-06T17:16:26+01:00" level=info msg="Using collector-id: 74d6e409-7842-47db-8f85-abd120c81883"
time="2018-12-06T17:16:26+01:00" level=info msg="Fetching configurations tagged by: [qad_test]"
runtime: epollwait on fd 4 failed with 38
fatal error: runtime: netpoll failed

runtime stack:
runtime.throw(0x805cc8, 0x17)
/opt/go/src/runtime/panic.go:616 +0x81
runtime.netpoll(0xc420018000, 0x0)
/opt/go/src/runtime/netpoll_epoll.go:75 +0x216
runtime.findrunnable(0xc420018000, 0x0)
/opt/go/src/runtime/proc.go:2251 +0x97b
runtime.schedule()
/opt/go/src/runtime/proc.go:2541 +0x13b
runtime.park_m(0xc420066900)
/opt/go/src/runtime/proc.go:2604 +0xb6
runtime.mcall(0x100000000001000)
/opt/go/src/runtime/asm_amd64.s:351 +0x5b

goroutine 1 [syscall]:
syscall.Syscall6(0xf7, 0x1, 0xb06, 0xc420143938, 0x1000004, 0x0, 0x0, 0x0, 0xa22380, 0x0)
/opt/go/src/syscall/asm_linux_amd64.s:44 +0x5
os.(*Process).blockUntilWaitable(0xc42008a450, 0x0, 0x0, 0x2)
/opt/go/src/os/wait_waitid.go:31 +0x98
os.(*Process).wait(0xc42008a450, 0xc420118640, 0xc42009e918, 0xc42009e918)
/opt/go/src/os/exec_unix.go:22 +0x3c
os.(*Process).Wait(0xc42008a450, 0x8195b8, 0x8195c0, 0x8195b0)
/opt/go/src/os/exec.go:123 +0x2b
os/exec.(*Cmd).Wait(0xc42009e840, 0x0, 0x0)
/opt/go/src/os/exec/exec.go:461 +0x5c
os/exec.(*Cmd).Run(0xc42009e840, 0xc4200c30a0, 0xc42009e840)
/opt/go/src/os/exec/exec.go:305 +0x5c
os/exec.(*Cmd).CombinedOutput(0xc42009e840, 0x11, 0xc420143c48, 0x1, 0x1, 0xc42009e840)
/opt/go/src/os/exec/exec.go:521 +0x106
github.com/Graylog2/collector-sidecar/backends/beats/filebeat.(*FileBeatConfig).readVersion(0xc42008c0b0, 0x40cd35, 0xc420143d38, 0x5bb963b6, 0xd5e9fcb06aa555d, 0xc42008c0b0)
/go/src/github.com/Graylog2/collector-sidecar/backends/beats/filebeat/filebeat.go:100 +0xc3
github.com/Graylog2/collector-sidecar/backends/beats/filebeat.(*FileBeatConfig).ValidatePreconditions(0xc42008c0b0, 0x84d640)
/go/src/github.com/Graylog2/collector-sidecar/backends/beats/filebeat/filebeat.go:119 +0x40
main.backendSetup(0xc420090340)
/go/src/github.com/Graylog2/collector-sidecar/main.go:166 +0x228
main.main()
/go/src/github.com/Graylog2/collector-sidecar/main.go:127 +0x3d2

goroutine 19 [syscall]:
os/signal.signal_recv(0x0)
/opt/go/src/runtime/sigqueue.go:139 +0xa6
os/signal.loop()
/opt/go/src/os/signal/signal_unix.go:22 +0x22
created by os/signal.init.0
/opt/go/src/os/signal/signal_unix.go:28 +0x41

goroutine 20 [IO wait]:
internal/poll.runtime_pollWait(0x2ace2f56ff00, 0x72, 0xc4200274e8)
/opt/go/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc42008ea68, 0x72, 0xffffffffffffff01, 0x848820, 0x9e94a0)
/opt/go/src/internal/poll/fd_poll_runtime.go:85 +0x9b
internal/poll.(*pollDesc).waitRead(0xc42008ea68, 0xc42014c001, 0x200, 0x200)
/opt/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc42008ea50, 0xc42014c000, 0x200, 0x200, 0x0, 0x0, 0x0)
/opt/go/src/internal/poll/fd_unix.go:157 +0x17d
os.(*File).read(0xc42008c0c8, 0xc42014c000, 0x200, 0x200, 0xc42014c000, 0x0, 0x0)
/opt/go/src/os/file_unix.go:226 +0x4e
os.(*File).Read(0xc42008c0c8, 0xc42014c000, 0x200, 0x200, 0x0, 0x0, 0xc420027660)
/opt/go/src/os/file.go:107 +0x6a
bytes.(*Buffer).ReadFrom(0xc4200c30a0, 0x847e80, 0xc42008c0c8, 0x2ace2f70b020, 0xc4200c30a0, 0x1)
/opt/go/src/bytes/buffer.go:205 +0xa0
io.copyBuffer(0x847820, 0xc4200c30a0, 0x847e80, 0xc42008c0c8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/opt/go/src/io/io.go:386 +0x31a
io.Copy(0x847820, 0xc4200c30a0, 0x847e80, 0xc42008c0c8, 0x0, 0x0, 0x0)
/opt/go/src/io/io.go:362 +0x5a
os/exec.(*Cmd).writerDescriptor.func1(0x0, 0x0)
/opt/go/src/os/exec/exec.go:275 +0x4d
os/exec.(*Cmd).Start.func1(0xc42009e840, 0xc420118680)
/opt/go/src/os/exec/exec.go:396 +0x27
created by os/exec.(*Cmd).Start
/opt/go/src/os/exec/exec.go:395 +0x5df



(Davide Pala) #2

This version is unsupported, only 5.6


(Jan Doberstein) #3

“This version is unsupported, only 5.6”
@davide.pala that is not correct. Graylog 2.5 supports ES 6.

@MHi you are right, you need to run Collector Sidecar 0.1.7. If you run into any issues with it, please report them on GitHub: https://github.com/Graylog2/collector-sidecar/issues

thank you


(Ben van Staveren) #4

Graylog 2.5 supports 6.x :slight_smile:


(Ben van Staveren) #5

Try this as a quick test on a server you run: purge the collector entirely (apt-get purge sidecar-collector*, or whatever the package name is exactly, I forgot), make sure that /etc/graylog/collector-sidecar is gone as well (optionally rm -rf it), then install the 0.1.7 collector sidecar, configure it from scratch, and see if that works.

At least then you can exclude any old configs and other “outdated” stuff from interfering.


(Tess) #6

Yikes… I hadn’t realized it’s that sensitive. Brrr, that means quite a number of upgrades in one blow.

Is 0.1.6 backwards compatible? Could I first upgrade all the agents and then move on to upgrading the server-side things? I’ll need to check the release notes.


(Jan Doberstein) #7

“Is 0.1.6 backwards compatible? Could I first upgrade all the agents and then move on to upgrading the server-side things? I’ll need to check the release notes.”

Please see the note in http://docs.graylog.org/en/2.5/pages/upgrade/graylog-2.5.html

You can upgrade all sidecars first and the server after that. If you have a proxy or load balancer, you might be able to use it to add the required header there, at least for the duration of the upgrade.
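
To make the proxy idea concrete, here is a minimal Python sketch of the header logic such a proxy would apply to each request before forwarding it. It assumes the header in question is `X-Requested-By` (please verify against the linked upgrade notes); the function name and the value `sidecar-proxy` are hypothetical placeholders, not anything Graylog prescribes.

```python
# Assumption: the header Graylog 2.5 checks is named X-Requested-By, as
# described in the upgrade notes; verify this before relying on it.
CSRF_HEADER = "X-Requested-By"


def add_csrf_header(headers, value="sidecar-proxy"):
    """Return a copy of `headers` that is guaranteed to carry the CSRF header.

    `headers` is a plain dict of request headers. A header that is already
    present is left untouched (HTTP header names are case-insensitive).
    `value` is a hypothetical placeholder.
    """
    result = dict(headers)
    if not any(name.lower() == CSRF_HEADER.lower() for name in result):
        result[CSRF_HEADER] = value
    return result


if __name__ == "__main__":
    # A request from an old 0.1.6 sidecar that does not send the header yet.
    print(add_csrf_header({"Accept": "application/json"}))
```

In a real setup the proxy or load balancer configuration would do this instead of Python; the point is only that the header is added when missing and never overwritten.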


(Michael) #8

Ben, I tried installing collector 0.1.7 on a machine with the same OS where no collector had been installed before, and it already fails at the step that creates the init.d file, “graylog-collector-sidecar -service install”.
But even starting the agent with an init file from a working installation still gives a fatal error on startup, as in my initial post (netpoll failed).


(Michael) #9

Thanks Jan, I will post the issue on Github then.


(Marco Pfatschbacher) #10

Hi,

we built the new collector with a more recent version of Go.
see https://github.com/golang/go/wiki/MinimumRequirements
“We don’t support CentOS 5. The kernel is too old (2.6.18).”

Since CentOS 5 has been EOL for over a year now, you should consider upgrading :wink:


(Michael) #11

Hi Marco,

that is bad news, and I know that this needs to be upgraded. The production systems are in fact running Oracle Linux 5.11, and there is a very long project to migrate them to Oracle Linux 7. Lots of important production systems run on that platform and it is still supported by Oracle. That means for the time being I need to downgrade again :frowning:

I just saw a way to patch “runtime/sys_linux_amd64.s” so that it goes back to epoll_wait instead of epoll_pwait, which only runs on kernels 2.6.19 and above. I assume that is why it is not running correctly. As I am not a coder, I will probably not be able to make that patch and recompile. It would at least keep me on an up-to-date version until the Oracle systems are migrated to 7.
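
For anyone who wants to check a host before rolling out 0.1.7, here is a rough Python sketch of that kernel comparison. It takes the 2.6.19 threshold from the statement above rather than from any official source, and the function names are made up for illustration. A missing syscall would also be consistent with the “failed with 38” line in the trace, since errno 38 is ENOSYS (function not implemented) on Linux.

```python
import platform
import re

# Assumption taken from the post above: epoll_pwait needs Linux 2.6.19 or newer.
EPOLL_PWAIT_MIN_KERNEL = (2, 6, 19)


def parse_kernel_release(release):
    """Turn a release string such as '2.6.18-398.el5' into (2, 6, 18)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def kernel_has_epoll_pwait(release, minimum=EPOLL_PWAIT_MIN_KERNEL):
    """Return True if the given kernel release is at least `minimum`."""
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    release = platform.release()  # the same string that `uname -r` prints
    verdict = "new enough" if kernel_has_epoll_pwait(release) else "too old"
    print("kernel %s is %s for epoll_pwait" % (release, verdict))
```

The release strings in the docstring and in any test are only examples of the usual format; pass a different `minimum` tuple if the requirement for your build differs.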

Michael


(Ben van Staveren) #12

Mweuh… that really is weird, I’ve had no issues with the 0.1.7 collector deb package, and I’m utterly stumped as to why it would error out like that. Unless it’s some really, really weird exotic piece of hardware you run it on :smiley:

Edit: ah, okay, turns out it’s a kernel version that’s too old :smiley:


(Marco Pfatschbacher) #13

Hi,

I built 0.1.7 with an older Go version (1.7.6) for you.
Can you try https://github.com/Graylog2/collector-sidecar/releases/download/0.1.7/collector-sidecar-0.1.7.compat.tar.gz
and see if that works?
You might also have to downgrade your Beats to version 5.6, since Elastic has been using a more recent version of Go since 6.0 as well.
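
The trace in the first post shows the sidecar running filebeat to read its version (readVersion), so the installed Beats version matters here. As a rough Python sketch, the “older than 6.0” check could look like this. The banner string in the example is an assumed format, not copied from a real run, and the function names are hypothetical.

```python
import re


def parse_beats_version(output):
    """Extract the first x.y.z version found in a version banner as a tuple."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("no version found in: %r" % output)
    return tuple(int(part) for part in match.groups())


def is_pre_6_beats(output):
    """Return True if the banner reports a Beats version older than 6.0.0."""
    return parse_beats_version(output) < (6, 0, 0)


if __name__ == "__main__":
    # Assumed banner format, for illustration only.
    banner = "filebeat version 5.6.13 (amd64), libbeat 5.6.13"
    print(parse_beats_version(banner), is_pre_6_beats(banner))
```

Feed it whatever your installed filebeat prints as its version; only the first x.y.z number in the text is used.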

Marco


(Michael) #14

Hi Marco,

thanks a lot for your help, and you were right: now filebeat is complaining a little, so the collector “panics” again, but the first error is gone. I assume this means replacing filebeat with 5.6.13 and recompiling it with the old Go version, correct?

Michael

Here is the error:
goroutine 1 [running]:
panic(0x76cac0, 0xc42000c0c0)
/home/mpf/Downloads/go/src/runtime/panic.go:500 +0x1a1
github.com/Graylog2/collector-sidecar/backends/beats/filebeat.(*FileBeatConfig).ValidatePreconditions(0xc4200260e8, 0x969ba0)
/home/mpf/oldgo/src/github.com/Graylog2/collector-sidecar/backends/beats/filebeat/filebeat.go:123 +0x1c7
main.backendSetup(0xc420054600)
/home/mpf/oldgo/src/github.com/Graylog2/collector-sidecar/main.go:166 +0x278
main.main()
/home/mpf/oldgo/src/github.com/Graylog2/collector-sidecar/main.go:127 +0x475


(Marco Pfatschbacher) #15

You don’t have to compile it yourself. The official release was built with Go 1.7.6 as well.
You can install it from https://www.elastic.co/downloads/past-releases/filebeat-5-6-13
or from their repository.


(Marco Pfatschbacher) #16

It should’ve been, but we forgot to backport a fix to 2.5.
This will be addressed with 2.5.1.


(Michael) #17

Thank you again for the great support. I installed filebeat from the downloaded rpm file, because their repository does not support an old OS like CentOS 5. I also had to remove the signature from the rpm in order to install it. I just replaced your filebeat binary with the installed one, and the collector is finally running.

Thanks again. You have a great product there.

Michael


(Davide Pala) #18

My fault, sorry, I didn’t read the right version.