Exporting a dataset to CSV with the Graylog API is very slow in version 2.4.6 (Docker cluster)

keaoner · September 18, 2018, 12:18pm

Hi everyone

When I export a dataset in CSV format in production with the Graylog 2.4.6 API (Docker cluster), I get a transfer rate of 5 KB/s. The dataset is about 5 GB in total. On another machine running Graylog 2.0.1 (standalone install), the same query over the same amount of data downloads at 4000 KB/s. Is there a configuration setting I can change to fix this in 2.4.6?

Thank you all!
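To put the two rates in perspective, here is a quick back-of-the-envelope sketch in Python. It assumes that "KO" means kilobytes, that 1 GB = 1024 × 1024 KB, and that the rate stays constant for the whole download (real transfers fluctuate):

```python
def estimated_duration_seconds(size_gb: float, rate_kb_per_s: float) -> float:
    """Time to transfer size_gb at a constant rate_kb_per_s (1 GB = 1024 * 1024 KB)."""
    size_kb = size_gb * 1024 * 1024
    return size_kb / rate_kb_per_s


def format_duration(seconds: float) -> str:
    """Render a number of seconds as days/hours/minutes/seconds."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"


if __name__ == "__main__":
    # Rates reported for the two setups, applied to the ~5 GB dataset.
    for label, rate in [("Graylog 2.4.6 (production)", 5), ("Graylog 2.0.x (other machine)", 4000)]:
        seconds = estimated_duration_seconds(5, rate)
        print(f"{label}: {rate} KB/s -> {format_duration(seconds)}")
```

At 5 KB/s the 5 GB export would take roughly 12 days; at 4000 KB/s it takes about 22 minutes.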

derPhlipsi · September 18, 2018, 12:41pm

Hey @keaoner,

Might this be your problem:

Greetings,
Philipp

keaoner · September 19, 2018, 10:00am

Thank you for the answer.

I saw this post on the forum, but my Graylog installation runs under Docker, so /etc/default/graylog-server does not exist. The ps aux command shows the following launch parameters:

/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms4g -Xmx8000m -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow -jar -Dlog4j.configurationFile=/usr/share/graylog/data/config/log4j2.xml -Djava.library.path=/usr/share/graylog/lib/sigar/ -Dgraylog2.installation_source=docker /usr/share/graylog/graylog.jar server -f /usr/share/graylog/data/config/graylog.conf

Is the -Djavax.net.debug=all parameter loaded by default in Graylog 2.4.6? How can I disable it in Docker?

derPhlipsi · September 19, 2018, 10:54am

AFAIK this is not loaded by default, and since it does not appear in your ps aux output, it is not enabled in your environment either. Could you run another CSV export, then grab the Graylog logs shortly afterwards and post them here? Look through them yourself first for anything relevant; you may need to raise Graylog's logging level in the System configuration menu beforehand so that it is more verbose.

Greetings,
Philipp
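The check described here (the flag would have to appear on the java command line) can be scripted. This is a minimal Python sketch written for this thread, not a Graylog tool; it uses the command line from the ps aux output above, re-joined onto one line, and it is a simple heuristic that does not distinguish JVM options from application arguments:

```python
import shlex
from typing import Dict


def system_properties(command_line: str) -> Dict[str, str]:
    """Collect -Dkey=value system properties from a java command line (heuristic)."""
    props: Dict[str, str] = {}
    for token in shlex.split(command_line):
        if token.startswith("-D"):
            key, _, value = token[2:].partition("=")
            props[key] = value
    return props


# Command line as reported by ps aux in this thread.
CMD = (
    "/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xms4g -Xmx8000m -XX:NewRatio=1 -server "
    "-XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled "
    "-XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow -jar "
    "-Dlog4j.configurationFile=/usr/share/graylog/data/config/log4j2.xml "
    "-Djava.library.path=/usr/share/graylog/lib/sigar/ "
    "-Dgraylog2.installation_source=docker /usr/share/graylog/graylog.jar "
    "server -f /usr/share/graylog/data/config/graylog.conf"
)

props = system_properties(CMD)
print("javax.net.debug" in props)  # False
```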

keaoner · September 20, 2018, 2:36pm

Hi Philipp,

Here is the log in debug mode. Thank you for your help.

https://drive.google.com/file/d/1_3OIPDOGOtlW9-1kLayJXNaxmhauFkCR/view?usp=sharing

derPhlipsi · September 20, 2018, 2:52pm

Heyo

Unrelated issues that you should fix anyway:

2018-09-20 06:36:00,724 WARN : org.graylog2.inputs.codecs.GelfCodec - GELF message <705fd23f-bc9f-11e8-b49a-d6e55e6cc41f> is missing mandatory "host" field.

Over 106 thousand occurrences…
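For the first warning, a minimal Python sketch that checks an outgoing message for the fields the GELF 1.1 specification marks as mandatory (version, host, short_message) before it is sent. The payload values and the function name are hypothetical placeholders:

```python
import json

# Fields the GELF 1.1 specification marks as mandatory.
MANDATORY_GELF_FIELDS = ("version", "host", "short_message")


def missing_gelf_fields(message: dict) -> list:
    """Return the mandatory GELF fields that are absent from a message dict."""
    return [field for field in MANDATORY_GELF_FIELDS if field not in message]


# Hypothetical example payload; the values are placeholders.
message = {
    "version": "1.1",
    "host": "app-server-01",
    "short_message": "User logged in",
    "_component": "auth",  # additional (custom) fields are prefixed with "_"
}

print(missing_gelf_fields(message))  # []
print(missing_gelf_fields({"version": "1.1", "short_message": "x"}))  # ['host']
print(json.dumps(message))
```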

2018-09-20 09:37:28,840 ERROR: org.graylog2.inputs.converters.CsvConverter - Different number of columns in CSV data (22) and configured field names (20). Discarding input.

59 matches. You're losing some of your logs because of this.
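For the second error, a minimal Python sketch of the same kind of column-count check, which could help find the offending lines in the source data before they reach Graylog. The field names are hypothetical, and it uses Python's csv module rather than Graylog's own converter, so quoting and escaping rules may not match exactly:

```python
import csv
import io
from typing import List, Optional


def map_row(line: str, field_names: List[str], separator: str = ",") -> Optional[dict]:
    """Map one CSV line onto field names, or return None if the column count differs."""
    rows = list(csv.reader(io.StringIO(line), delimiter=separator))
    if not rows or len(rows[0]) != len(field_names):
        return None  # the line would be discarded, as in the log above
    return dict(zip(field_names, rows[0]))


# Hypothetical field configuration with 3 names.
FIELDS = ["timestamp", "user", "action"]

print(map_row("2018-09-20,alice,login", FIELDS))
print(map_row("2018-09-20,alice,login,extra", FIELDS))  # None: 4 columns vs 3 names
print(map_row('2018-09-20,"smith, alice",login', FIELDS))  # quoted separator stays in one column
```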

Do you know what time you issued the CSV export? Otherwise this is like searching for a needle in a haystack…

Greetings,
Philipp

derPhlipsi · September 20, 2018, 6:38pm

Heyo @keaoner,

I just stumbled across this:

How’s the performance when querying Graylog itself?

Greetings,
Philipp

keaoner · September 26, 2018, 12:48pm

Hi Philipp,

I will redo a CSV download test and send you the logs from the start.

The CSV download itself works; my problem is the download speed.

Thank you very much. Query performance in the web interface is good, but downloading the result as CSV is very slow (5 KB/s).

Best regards!
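To reproduce the measurement outside the browser with identical settings on both platforms, here is a minimal Python sketch that streams a download and reports the average rate. Assumptions: the export URL and HTTP basic-auth credentials are passed on the command line (the script does not build Graylog's export URL; use whatever CSV export request you already send), and 1 KB = 1024 bytes:

```python
import base64
import sys
import time
import urllib.request
from typing import BinaryIO, Tuple


def measure_stream(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Tuple[int, float]:
    """Read a binary stream to EOF in fixed-size chunks; return (bytes_read, seconds)."""
    total = 0
    start = time.monotonic()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    return total, time.monotonic() - start


def rate_kb_per_s(total_bytes: int, seconds: float) -> float:
    """Average transfer rate in KB/s, with 1 KB = 1024 bytes."""
    if seconds <= 0:
        return 0.0
    return total_bytes / 1024 / seconds


def main() -> None:
    if len(sys.argv) != 4:
        print("usage: measure_export.py <export-url> <user> <password>")
        return
    # The export URL is whatever CSV export request you already send to Graylog.
    url, user, password = sys.argv[1:4]
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        total, seconds = measure_stream(response)
    print(f"{total} bytes in {seconds:.1f} s = {rate_kb_per_s(total, seconds):.1f} KB/s")


if __name__ == "__main__":
    main()
```

Reading in fixed-size chunks keeps memory flat even for a multi-gigabyte export, and running the same request against both clusters gives directly comparable numbers.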

keaoner · September 27, 2018, 3:51pm

Hi Philipp,

Here is my new debug log. I started the CSV file download on 2018-09-26 at 15:06.

Thanks again for your help.

https://drive.google.com/file/d/10bVKzr1tf64bGxfi2QfuP0xqgL4MNkJi/view?usp=sharing

derPhlipsi · September 28, 2018, 8:30pm

Heyo

What’s your timezone? The logs end at 14:03 (they’re in UTC, so…)

Greetings,
Philipp

keaoner · September 28, 2018, 9:06pm

Hi Philipp

My timezone is UTC+2

Thank you
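For reference, a short Python sketch converting the reported start time to UTC. It assumes the log's 14:03 end refers to the same day as the export (2018-09-26); the thread does not say which day:

```python
from datetime import datetime, timedelta, timezone

# Local start of the export as reported in the thread, in UTC+2.
UTC_PLUS_2 = timezone(timedelta(hours=2))
export_start_local = datetime(2018, 9, 26, 15, 6, tzinfo=UTC_PLUS_2)

# Convert to UTC, the timezone of the Graylog log timestamps.
export_start_utc = export_start_local.astimezone(timezone.utc)

# Last log timestamp mentioned in the thread (same day assumed).
log_end_utc = datetime(2018, 9, 26, 14, 3, tzinfo=timezone.utc)

print(export_start_utc.isoformat())     # 2018-09-26T13:06:00+00:00
print(export_start_utc <= log_end_utc)  # True
print(log_end_utc - export_start_utc)   # 0:57:00
```

Under that assumption, the export started at 13:06 UTC, about 57 minutes before the log ends.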

keaoner · October 1, 2018, 12:21pm

Hi Philipp,

Some additional information:

The production platform consists of Graylog 2.4.6 + Amazon Elasticsearch 5.6 (a cluster of 2 Elasticsearch instances): the CSV download does not exceed 10 KB/s.

The POC platform consists of Graylog 2.0.2 + Elasticsearch 2.3 (official Graylog 2.0.2 AMI, one Elasticsearch instance): the CSV download reaches up to 4000 KB/s.

The query we send contains many wildcards (example: AND NOT (* login * * ident * …)

The index contains 20,000,000 documents, and its size varies from 12 GB to 23 GB.

We have to keep the default values when creating the index set.

On the production platform, Elasticsearch uses a lot of CPU (70% to 100%) while the CSV is downloading.

Thank you very much.

jan · October 2, 2018, 2:18pm

@keaoner Sad to say, but yes, that happens. Let me explain why.

In 2.0.2, Graylog was part of the Elasticsearch cluster as a no-data, no-master node, speaking the binary protocol with Elasticsearch like all the other nodes.
Because of decisions made by Elastic, Graylog was forced to move to the HTTP REST interface. Graylog, like any other solution that uses Elasticsearch, now has to speak HTTP REST to Elasticsearch, which adds a lot of overhead and means the server has to do more processing before it sends out the answer.

Graylog will try to get more speed out of it, but that is not something we can squeeze out in minutes. In addition, we are in Elastic's hands on this topic, because they do not provide a solid, stable client for Elasticsearch (which they promised to the world…).

This is no excuse, just an explanation of the problem.

keaoner · October 3, 2018, 8:46am

Hi Jan and Philipp

Thank you very much for your help.
@jan: The explanation is clear. It's a real shame; we'll try to find a solution internally.
Are there any plans to fix this CSV download problem?

Just for your information: with Graylog we have developed a user-based search term recommendation system, a statistical spell checker based on our users, and a document recommendation system based on our users' usage. The POC works well, and we are moving it into production.

jan · October 3, 2018, 10:40am

Please see this Graylog bug report: https://github.com/Graylog2/graylog2-server/issues/5172

system · October 17, 2018, 10:40am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
