When I run a CSV export of a dataset in production with the Graylog 2.4.6 API (Docker cluster), I get a throughput of 5 kB/s; the total volume of this dataset is about 5 GB. On another machine with Graylog 2.0.1 (standalone install), with the same query and the same data size, I get a throughput of 4000 kB/s. Is there a configuration setting I can modify to solve the problem on version 2.4.6?
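For reference, here is roughly how we drive the export through the REST API; a minimal sketch where the host, credentials, query, and field list are placeholders, not our real values (Graylog 2.x streams search results as CSV when the client sends an `Accept: text/csv` header on the universal search endpoint):

```python
import requests

# Placeholder values: host, credentials, query, and fields are examples.
GRAYLOG = "http://graylog.example.com:9000"
AUTH = ("admin", "password")

# Graylog 2.x streams search results as CSV when the client
# sends "Accept: text/csv" on the universal search endpoint.
resp = requests.get(
    GRAYLOG + "/api/search/universal/relative",
    auth=AUTH,
    headers={"Accept": "text/csv"},
    params={
        "query": "*",                          # the real query has many wildcards
        "range": 86400,                        # relative time range in seconds
        "fields": "timestamp,source,message",  # columns for the CSV
    },
    stream=True,
)
resp.raise_for_status()

# Stream the body to disk instead of buffering ~5 GB in memory.
with open("export.csv", "wb") as f:
    for chunk in resp.iter_content(chunk_size=1 << 20):
        f.write(chunk)
```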
I saw this post on the forum, but I'm on a Graylog installation under Docker and /etc/default/graylog-server does not exist. The ps aux command shows me the following launch parameters:
AFAIK this is not loaded by default, and since it is not included in your ps aux output, it is also not enabled in your environment. Could you make another CSV export, then fetch the Graylog logs shortly afterwards and post them here? Have a look through them first to see if you find something relevant; note that you might have to raise Graylog's logging level in the System configuration menu first so that it is more verbose.
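If you prefer not to click through the UI, the log level can also be raised through the REST API; a quick sketch assuming the `/system/loggers` resource exposed by Graylog 2.x (host and credentials are placeholders):

```python
import requests

GRAYLOG = "http://graylog.example.com:9000"  # placeholder
AUTH = ("admin", "password")                 # placeholder

# Raise the org.graylog2 logger to DEBUG so the export leaves more traces.
# Graylog 2.x exposes PUT /system/loggers/{logger}/level/{level}.
resp = requests.put(
    GRAYLOG + "/api/system/loggers/org.graylog2/level/debug",
    auth=AUTH,
)
resp.raise_for_status()
```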
2018-09-20 09:37:28,840 ERROR: org.graylog2.inputs.converters.CsvConverter - Different number of columns in CSV data (22) and configured field names (20). Discarding input.
59 matches. You're losing some of your logs with this.
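To illustrate why those messages disappear: the converter compares the number of parsed columns against the number of configured field names and drops the whole message on any mismatch. A minimal sketch of that behavior (not Graylog's actual code):

```python
import csv
import io

# 20 configured field names vs. a line with 22 columns, as in the ERROR above.
configured_fields = ["field_%d" % i for i in range(20)]

def convert(line):
    columns = next(csv.reader(io.StringIO(line)))
    if len(columns) != len(configured_fields):
        # Mirrors the log message: the whole input line is discarded.
        print("Different number of columns in CSV data (%d) and configured "
              "field names (%d). Discarding input."
              % (len(columns), len(configured_fields)))
        return None
    return dict(zip(configured_fields, columns))

convert(",".join(str(i) for i in range(22)))  # -> discarded
```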
Do you know at what time you issued the CSV export? Otherwise this will be searching for a needle in a haystack…
The production platform is composed of Graylog 2.4.6 + Amazon Elasticsearch 5.6 (a cluster of two Elasticsearch instances); the CSV download does not exceed 10 kB/s.
The POC platform is composed of Graylog 2.0.2 + Elasticsearch 2.3 (official Graylog 2.0.2 AMI, with a single Elasticsearch instance); the CSV download reaches up to 4000 kB/s.
The query we send contains many wildcards (example: AND NOT (*login* *ident* …)).
The index contains 20,000,000 documents and its size varies from 12 GB to 23 GB.
We kept the default values when creating the index set.
On the production platform, Elasticsearch consumes a lot of CPU (between 70% and 100%) while downloading the CSV.
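For what it's worth, the load can be reproduced against Elasticsearch directly with a comparable wildcard query; a sketch where the host and index name are placeholders. Leading wildcards such as *login* cannot use the term index and force Lucene to scan the whole term dictionary, which may explain the CPU spike we observe:

```python
import requests

ES = "http://elasticsearch.example.com:9200"  # placeholder host
INDEX = "graylog_0"                           # placeholder index name

# query_string with leading wildcards, similar to our Graylog query.
# Patterns like *login* are very expensive for Lucene to evaluate.
body = {
    "query": {
        "query_string": {
            "query": "NOT (*login* OR *ident*)",
            "allow_leading_wildcard": True,  # the default, shown for clarity
        }
    },
    "size": 100,
}
resp = requests.post("%s/%s/_search" % (ES, INDEX), json=body)
resp.raise_for_status()
print(resp.json()["hits"]["total"])
```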
@keaoner sad to say, but yes, that happens. Let me tell you why.
In 2.0.2, Graylog was part of the Elasticsearch cluster as a no-data, no-master node, speaking the binary protocol with Elasticsearch like all the other nodes.
Because of decisions made by Elastic, Graylog was forced to move to the HTTP REST interface. Graylog, and any other solution that uses Elasticsearch, now needs to speak HTTP REST to Elasticsearch, which adds a lot of overhead and means the server has to do more processing before it sends out the answer.
Graylog will try to get more speed out of it, but that is nothing we can squeeze out in minutes. In addition, we are in Elastic's hands on this topic, because they do not provide a solid, stable client for Elasticsearch (which they promised to the world…).
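To make the overhead concrete: over HTTP, an export has to page through the result set with the scroll API, so every page costs a full HTTP round trip plus JSON serialization on the Elasticsearch side, where the old binary transport avoided much of that. A rough sketch of that access pattern (host and index are placeholders; this is an illustration, not Graylog's actual code):

```python
import requests

ES = "http://elasticsearch.example.com:9200"  # placeholder
INDEX = "graylog_0"                           # placeholder

# Open a scroll context; "size" hits are returned per page.
resp = requests.post(
    "%s/%s/_search" % (ES, INDEX),
    params={"scroll": "1m"},
    json={"size": 1000, "query": {"match_all": {}}},
)
resp.raise_for_status()
data = resp.json()

# Each further page is another HTTP round trip, and Elasticsearch has to
# serialize every hit to JSON before it can send the response.
while data["hits"]["hits"]:
    # ... turn the hits into CSV rows here ...
    data = requests.post(
        ES + "/_search/scroll",
        json={"scroll": "1m", "scroll_id": data["_scroll_id"]},
    ).json()
```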
Thank you very much for your help @jan: the explanation is clear. It's a real shame; we'll try to find a solution internally.
Are there any plans to improve this CSV download problem?
Just for your information: with Graylog we have developed a user-based search-term recommendation system, a statistical spell checker based on our users, and a document recommendation system based on our users' usage. The POC works well and we are moving into production.