Elasticsearch indexing not working properly through Graylog using the CSV output plugin


(Ganeshbabu Ramamoorthy) #1

Hi All,

We are using Graylog 2.3.0 and Elasticsearch 5.5.2 in our environment. I was trying to index CSV data into Elasticsearch by configuring Filebeat, but the data was not indexed completely; some records were missing.

Below is the sample data:

Title,Shortdescription,knowledgecategory,Description
DNS Issue Resolution,"Open Start
Type command prompt into Start
Click Command Prompt. It's at the top of the Start window. This will open Command Prompt
Type in ipconfig /flushdns and press Enter
Restart your web browser.",Network,

Below is my generated Filebeat configuration:

filebeat:
  prospectors:
  - document_type: log
    encoding: plain
    exclude_lines:
    - ^Title
    fields:
      data: knowledgedata
      gl2_source_collector: f4749ffd-1f9b-4ef1-b065-a8fc32388fa1
    ignore_older: 0
    input_type: log
    paths:
    - /var/log/knowledgedata/*.csv
    scan_frequency: 10s
    tail_files: false
output:
  logstash:
    hosts:
    - graylogdemo.cloudapp.azure.com:5044
    loadbalance: false
path:
  data: /var/cache/graylog/collector-sidecar/filebeat/data
  logs: /var/log/graylog/collector-sidecar
tags:
- linux
- apache
- knowledgedata

Since I am indexing CSV files to Elasticsearch, I used the CSV output plugin in Graylog and configured a pipeline rule for my knowledge data. Below is my pipeline rule for CSV:

rule "knowledgedata"
when
	has_field("message")
then
  let csv_fields = "Title,Shortdescription,knowledgecategory,description";
  let csv_parsed = csv(csv_fields: csv_fields, csv_text: to_string($message.message), csv_separator: ",");
  set_fields(csv_parsed);
end
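To illustrate the symptom, here is a small Python sketch (not part of the setup above; it only simulates what happens when the sample record is shipped line by line versus parsed as a whole file). The record layout is taken from the sample data earlier in this post:

```python
import csv
import io

# Sample record from above: the Shortdescription field contains embedded newlines.
raw = ('Title,Shortdescription,knowledgecategory,Description\n'
       'DNS Issue Resolution,"Open Start\n'
       'Type command prompt into Start\n'
       'Click Command Prompt. It\'s at the top of the Start window. This will open Command Prompt\n'
       'Type in ipconfig /flushdns and press Enter\n'
       'Restart your web browser.",Network,\n')

# Filebeat (without multiline settings) ships one event per physical line,
# so a per-message CSV parse only ever sees one line of the record at a time.
first_data_line = raw.splitlines()[1]   # 'DNS Issue Resolution,"Open Start'
per_line = next(csv.reader([first_data_line]))
print(per_line)                         # ['DNS Issue Resolution', 'Open Start']

# A CSV parser that sees the whole file handles the quoted newlines correctly.
whole_file = list(csv.reader(io.StringIO(raw)))
print(whole_file[1][0])                 # 'DNS Issue Resolution'
print(whole_file[1][2])                 # 'Network'
print(whole_file[1][1].count('\n'))     # 4 embedded newlines in Shortdescription
```

The per-line parse yields exactly the two fragments that ended up in the index ("DNS Issue Resolution" and the partial "Open Start"), while the rest of the record arrives as separate events with no Title column to match.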

After the file was harvested by Filebeat, I can see that the knowledge data index in Elasticsearch contains only the data below:

"Title": "DNS Issue Resolution"
"Shortdescription": "\"Open Start"

I couldn't find any other data indexed in Elasticsearch.

Please kindly share any thoughts and let me know what changes I need to make in order to index the data to Elasticsearch properly.

Thanks,
Ganeshbabu R


(Jochen) #2

Do you really have newlines (\n) in your CSV file?
CSV is an inherently line-based format, so that would be invalid.


(Ganeshbabu Ramamoorthy) #3

No… @jochen

For testing purposes, to see whether the issue occurs, I tried with sample data; below is the data in my CSV file:

Below is the data in my Excel sheet:

Is there anything wrong with my CSV file?

Please kindly correct me.

Thanks,
Ganeshbabu R


(Jochen) #4

According to your Excel screenshot, the “Shortdescription” field indeed does contain newline characters.
I’m not sure whether Notepad would display them correctly.


(Ganeshbabu Ramamoorthy) #5

@jochen

So should I use the Filebeat multiline option to resolve this?

Or is there any other approach to resolve this (something like using the csv filter in Logstash)?
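If multiline is the way to go, the generated Filebeat prospector might gain settings along these lines (a sketch only, in the same style as the configuration above; the `pattern` is an assumption that every new record starts with an unquoted title followed by a comma and an opening quote, and it would need adapting to the real data):

```yaml
filebeat:
  prospectors:
  - input_type: log
    paths:
    - /var/log/knowledgedata/*.csv
    multiline:
      # Assumed pattern: a line starting a new CSV record, e.g. 'DNS Issue Resolution,"...'
      pattern: '^[^",]+,"'
      # Lines that do NOT match (the quoted continuation lines) are
      # appended to the previous event, so one record = one event.
      negate: true
      match: after
```

This way Filebeat would ship each multi-line record as a single event, which the pipeline rule's CSV parsing could then handle in one pass.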

Please let me know whether my understanding is right.

Thanks,
Ganeshbabu R


(system) #6

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.