We’ve got our Graylog Enterprise instance set up with the AWS Elasticsearch service as the index backend and an S3 bucket for archive storage (indices rotated once a day). This unfortunately means that when an archive runs, the data has to be downloaded from the AWS Elasticsearch service, exported and compressed, and then uploaded back to AWS (the S3 bucket). This works relatively well, except that the bandwidth used exceeds our contracted rate for a couple of hours, which results in significant overage charges.
To remedy that, I’ve throttled the bandwidth on the log server with wondershaper/tc, which succeeds in limiting the bandwidth. However, every archive I’ve run since the change has failed to complete. It writes only one segment and then quits. I see this in the server log:
2018-02-05T13:48:14.673-06:00 INFO [RollingFileSegmentOutputStream] Creating new segment: /opt/s3/graylog-archives/graylog_39-20180205-163344-372/archive-segment-1.gz
2018-02-05T13:48:25.581-06:00 ERROR [ArchiveCreateJob] Archived only 4421000 out of 8508322 documents, not deleting/closing index graylog_39
2018-02-05T13:48:25.593-06:00 INFO [SystemJobManager] SystemJob <55148030-0a92-11e8-91ec-fee5de21aa98> [org.graylog.plugins.archive.job.ArchiveCreateSystemJob] finished in 11681221ms.
Could you please be a bit more verbose about your setup? What versions did you use? How did you configure them? How much did you throttle, and where exactly did you throttle which kind of connection?
With only the above information, we are currently not able to give any help.
I’m running Graylog 2.4.3 with the same version of the Enterprise plugins. It’s running on CentOS 7.3 and I’m using wondershaper 1.3 with the following config:
[wondershaper]
# Adapter
#
IFACE="ens160"
# Download rate in Kbps
#
DSPEED="18432"
# Upload rate in Kbps
#
USPEED="18432"
That will limit the up and down throughput for the ens160 adapter (the only network adapter in the system) to 18Mbps (our contracted rate is 20Mbps with bursts allowed to 40Mbps).
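A quick way to confirm the shaping is actually in effect looks roughly like this (this assumes the systemd-packaged wondershaper, which is what the config format above belongs to, so treat it as a sketch):

# Restart the shaper so the config above is (re)applied, then inspect the
# qdisc/classes it installs on ens160; rates are in kbit (18432 kbit ≈ 18 Mbps).
sudo systemctl restart wondershaper
tc -s qdisc show dev ens160
tc class show dev ens160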
The Graylog server uses the AWS Elasticsearch service (three r4.xlarge.elasticsearch instances) connected over a VPC. Indices are rotated daily and each index is somewhere between 8 and 15 GB in size (usually closer to 12-13 GB), comprising 6-10M messages. 4 shards per index, 0 index replicas, 1 ES segment per index. We keep 30 days’ worth of indices (though it’s set to 45 now while we’re working on this problem) and indices are deleted after they’re archived. The archives are saved to an S3 bucket via a fuse.s3fs mount. Archive max segment size is 500M, gzip compression, CRC32 checksum.
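For reference, the mount is something along these lines; the bucket name and the credential/cache options below are placeholders rather than our exact setup:

# Placeholder s3fs mount matching the /opt/s3 path seen in the log above;
# bucket name and options are examples only.
s3fs graylog-archives-bucket /opt/s3 \
  -o passwd_file=/etc/passwd-s3fs \
  -o use_cache=/var/cache/s3fs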
That does not look like a bug; because of the technical way the archiving works, this can happen by design when the storage is slow, as it is in your setup.
If you are not able to extend the resources and need additional professional services to help with your use case, please get in contact with the Graylog company.
I don’t see how this is not a bug. The failures happen after one segment is written, so it’s clearly fast enough to be able to write one of the files. And 18Mbps is not that slow. Is there not a way to increase the logging level so we can see what’s happening before the ERROR entry?
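If the logger level can be raised over the REST API, I’d try something along these lines; the endpoint path and logger name below are my guesses, not something I’ve confirmed:

# Guess at bumping the archive plugin's logger to debug via the Graylog REST
# API; the endpoint path and logger name are assumptions, not confirmed.
curl -u admin -X PUT \
  "http://graylog.example.com:9000/api/system/loggers/org.graylog.plugins.archive/level/debug"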
Does the same error occur when you write to the local/ephemeral disk of the EC2 instance?
If yes, I’d consider it a bug. If not, I’d recommend not throttling your network interface so much that it stifles the backup process by effectively reducing the write performance (to S3) to a crawl.
It’s shared between the connection to the AWS Elasticsearch service and S3, correct? So it effectively halves the available bandwidth, and that’s without counting any overhead and only under optimal conditions.
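One quick way to see how much the throttle hurts the S3 mount would be a rough write test like the following; the paths are just examples, and the 500M count matches your archive segment size:

# Compare raw write throughput to local disk vs. the s3fs mount while the
# throttle is active; dd prints the achieved rate when it finishes.
dd if=/dev/zero of=/var/tmp/segment-test bs=1M count=500 conv=fsync
dd if=/dev/zero of=/opt/s3/graylog-archives/segment-test bs=1M count=500 conv=fsync
rm -f /var/tmp/segment-test /opt/s3/graylog-archives/segment-test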
Archive to S3 with throttling set to 18Mbps down/40Mbps up - fails
Manually copy the archive directory to the S3 mount with 18/18 throttle - succeeds
The latter two say to me that there’s no issue with copying files to the S3 mount with the throttle active. However, I think I may have an idea for how to work around this issue. I can set the destination for archives to a local directory and then move them to the S3 mount. Of course, Graylog will then no longer be able to find the archive. So, is there a way to edit the “segment directory” value for a particular archive?
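Something like the following is what I have in mind, assuming the archive output is pointed at a local staging directory first; the local path and the bandwidth cap are just examples:

# Move finished archives from a hypothetical local staging directory to the
# S3 mount, capping rsync's own bandwidth (value in KB/s) so the transfer
# doesn't compete with Elasticsearch traffic.
rsync -a --bwlimit=2000 --remove-source-files \
  /var/lib/graylog-archive-staging/ /opt/s3/graylog-archives/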