Graylog Enterprise 4.2 archiving speed

Hello,

We are doing archives with Graylog Enterprise, but the archiving is slow.
I’m unable to go faster than 200 Mbit/s of traffic between the Graylog master and the ES Cluster.

Even when writing locally to SSD drives, without compression, I top out at 200 Mbit/s.

I would like to know whether anyone has managed to go beyond 200 Mbit/s.

My current archiving process takes about 14 hours, and soon I will have twice the amount of data to archive.
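
For scale, here is a rough back-of-the-envelope sketch of what those numbers imply. It assumes the transfer really sustains 200 Mbit/s for the whole 14 hours and that archiving time grows linearly with data volume; both are simplifications, and the function names are only illustrative.

```python
# Back-of-the-envelope estimate. Assumes a constant transfer rate for the
# whole run, which is a simplification of the real archiving job.
RATE_MBIT_S = 200   # observed ceiling between Graylog and Elasticsearch
DURATION_H = 14     # current duration of the archiving job


def transferred_terabytes(rate_mbit_s: float, hours: float) -> float:
    """Data moved at a constant rate, in decimal terabytes."""
    megabytes_per_s = rate_mbit_s / 8
    return megabytes_per_s * hours * 3600 / 1_000_000


def hours_needed(terabytes: float, rate_mbit_s: float) -> float:
    """Time to move the given volume at a constant rate, in hours."""
    megabytes = terabytes * 1_000_000
    return megabytes / (rate_mbit_s / 8) / 3600


current_tb = transferred_terabytes(RATE_MBIT_S, DURATION_H)
print(f"current volume: ~{current_tb:.2f} TB")
print(f"twice the data at the same rate: ~{hours_needed(2 * current_tb, RATE_MBIT_S):.0f} h")
```

Under those assumptions the job moves roughly 1.26 TB today, and twice the data at the same rate would take about 28 hours, which no longer fits in a day.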

Hello,

  • How is the Ethernet port configured on the Graylog server and on the Elasticsearch cluster? Is it capable of more than 200 Mbit/s?
  • Is this a network throughput issue, a disk read/write issue, or both?
  • Do you have RAID configured, or are these single drives?
  • Have you tested read and write speed on that drive without Graylog involved?
  • Is there a firewall or anything else in the path that could slow transfers from Graylog to the Elasticsearch cluster?

More information and/or statistics would help.
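
If you want a quick number for the disk alone, something like the following could serve as a sanity check. It is only a minimal sketch of a sequential write test: the function name, the default sizes and the target directory are all placeholders, and a dedicated benchmarking tool will give more thorough results.

```python
import os
import tempfile
import time


def sequential_write_mbit_s(directory: str, total_mb: int = 256, chunk_mb: int = 4) -> float:
    """Write a temporary file into `directory` and return the rate in Mbit/s.

    The file is fsync'ed before the clock stops, so the figure includes the
    time needed to push the data to the device, not just to the page cache.
    """
    chunk = os.urandom(chunk_mb * 1024 * 1024)  # random data, not compressible
    fd, path = tempfile.mkstemp(dir=directory)
    written = 0
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(total_mb // chunk_mb):
                written += f.write(chunk)
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # always clean up the test file
    return written * 8 / 1_000_000 / elapsed
```

Calling `sequential_write_mbit_s("/path/to/archive/dir")` on the archive target (the path here is a placeholder) gives a figure in the same unit as the 200 Mbit/s you are seeing, so the two can be compared directly.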

Some more details.

All the nodes and Graylog servers are VMs on several ESXi hosts with identical hardware. I checked a lot of metrics, and none of the CPUs, disks, etc. seems to be overloaded.

10 Gbit/s Ethernet.

It’s a throughput issue.
I ran read/write tests on the disks with Bonnie++ and got much higher speeds, so the disks are not the problem.
When I take a snapshot directly from ES, it is at least 3 times faster.

There is no firewall between any of the nodes.

I don’t understand why Graylog’s archiving is 3 times slower than an ES snapshot.

That’s why I’m looking for people who have successfully gone beyond the 200 Mbit/s barrier.

Hello,

I understand now. I have some suggestions for things to look at. It’s just odd that a single process is giving you throughput issues.

I’m not sure how you’re monitoring metrics on the Graylog server, but I would look at disk I/O, eth0, and top/htop, or perhaps use Wireshark.
I would also check whether another service or application is running at the same time as the archiving process. I’m just trying to get some data to see what’s actually going on during archiving.

Correct me if I’m wrong, but your eth0 port is 10 Gbit/s? If so, the link itself shouldn’t be what holds you back.
If not, and this is a throughput issue, I would take a close look at your network and traffic. Have you checked the network on your ESXi servers? I think you have some type of virtual switch on them. I would look into that as well, to ensure you are getting the throughput you expect.
Do you have physical switches, and if so, are you monitoring their throughput while the archiving process is running?

I assume you’re checking all of these while the archiving process is running.
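
As one way to watch eth0 during a run, the byte counters in `/proc/net/dev` can be sampled twice and turned into a rate. This is a minimal sketch that assumes a Linux host; the function names are made up for illustration, and the interface name should be whatever your VM actually uses.

```python
import time


def read_counters(interface: str, path: str = "/proc/net/dev") -> tuple[int, int]:
    """Return (rx_bytes, tx_bytes) for one interface from /proc/net/dev."""
    with open(path) as f:
        for line in f:
            name, sep, data = line.partition(":")
            if sep and name.strip() == interface:
                fields = data.split()
                # Field 0 is received bytes, field 8 is transmitted bytes.
                return int(fields[0]), int(fields[8])
    raise ValueError(f"interface {interface!r} not found in {path}")


def mbit_per_s(before: tuple[int, int], after: tuple[int, int], seconds: float) -> tuple[float, float]:
    """Convert two counter samples into (rx, tx) rates in Mbit/s."""
    rx = (after[0] - before[0]) * 8 / 1_000_000 / seconds
    tx = (after[1] - before[1]) * 8 / 1_000_000 / seconds
    return rx, tx


def sample_mbit_s(interface: str, seconds: float = 10.0) -> tuple[float, float]:
    """Measure the average rx/tx rate of an interface over `seconds`."""
    before = read_counters(interface)
    time.sleep(seconds)
    after = read_counters(interface)
    return mbit_per_s(before, after, seconds)
```

Running `sample_mbit_s("eth0", 10)` on the Graylog server and on an Elasticsearch node while the archive job is active would show whether both ends agree on the 200 Mbit/s figure.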

Yes, I have already checked all of these parameters.

The eth0 port is 10 Gbit/s, and it’s not a network bottleneck, because I ran network tests between the hosts and the speed is higher.

While archiving, disk I/O on the Graylog master is very low, and CPU usage is no more than 60% on the busiest CPU core.

I checked all the OS metrics that I know of without finding anything that could be the bottleneck.

Graylog support says it’s an ES problem, but has not offered any help with it.

And I have found no report of anyone archiving faster.

Hello,

We would need more information about what is going on in your environment. All I know so far is what you stated above, and that is not enough to identify your issue.

What is the size of the archive?

What exactly have you done? Showing the results would be helpful.
