CSV importing...can it be done?

Any ideas on how to take a CSV file (a large one) and parse it into Graylog?

The goal is to take a nessus vulnerability scan, in CSV format, and bring it into graylog for charting, and future correlation

Is this even possible?

TP

If this is a one time solution, you might try something like this.

  • Move the file a system with nc (netcat)
  • Create a Raw/Plaintext TCP Input, give it a high port above 1024, start the input
  • On the system with nc, run the following command
    In a Linux box
nc REMOTE_IP PORT < YOURFILE.csv

Example

nc 127.0.0.1 8008 < sytems.csv

Once the data is ingested, you can run an extractor like a grok pattern on it to split it up into the fields. I did a cursory search in Google and didn’t see any Nessus CSV grok pattern examples come up.

This of course assumes we are talking about one record per line, this probably wouldn’t work for a multi-line file.

Another method would be to use something like logstash and use the CSV filter, you can use the GELF output in Logstash to send it to Graylog (would require you to configure a GELF input in GrayLog.

Not sure about your setup. Hope something I put helps. Good luck.

Once the data is ingested, you can run an extractor like a grok pattern on it to split it up into the fields.

I just want to add: you should first ingest just one line and build the extractor out of this line and after that is ready ingest all messages. Because extractors will not work on already ingested messages.

I know people that use some script like tha following to ingest data with cron from systems that only provide a csv status.

#/bin/bash


DOWNURL=http://This.Is.the.url/status_csv.ph

GRAYLOGSERVER=192.168.X.XX
GRAYLOGINPUTPORT=5555
# This can be nc or netcat depending of the system
NETCATCOMMAND=nc


wget -O- -q ${DOWNURL} | while read LINE
do
    if [[ ${#LINE} -gt 1  ]];then
        echo $LINE | ${NETCATCOMMAND} ${GRAYLOGSERVER} ${GRAYLOGINPUTPORT}
    fi
done

Reading CSV and sending GELF to Graylog is so easy I don’t see why you wouldn’t just use Python and graypy.

Here is a script I use for sending my Apache2 logs to a GELF UDP input. Reading the CSV should be a simple web search away.

Note that this script is written to take key=value\tkey=value\t… on stdin, which doesn’t make sense if you read CSV directly. It needs tweaking.

Let me know if you need a hand.

#!/usr/bin/env python3

import logging
from socket import gethostname
from sys import stdin

from graypy import GELFHandler

logger_name = __file__.split('/')[-1]
logger = logging.getLogger(logger_name)
logger.setLevel(logging.INFO)
logger.addHandler(GELFHandler(host='...', port=..., localname=gethostname().split('.', 1)[0],
                              debugging_fields=False, extra_fields=True, facility=logger_name))

while True:
    line = stdin.readline()
    if not line:
        break

    parts = line.rstrip().split('\t')
    msg = '-'
    data = {}

    for p in parts:
        key, value = p.split('=', 1)
        if key == 'short_message':
            msg = value

        elif value and value != '-':
            try:
                value = int(value)
            except ValueError:
                try:
                    value = float(value)
                except ValueError:
                    pass
            data[key] = value

    log = logging.LoggerAdapter(logger=logging.getLogger(logger_name), extra=data)
    log.info(msg)

These are all good solutions. One I saw recently, elsewhere, was a one-liner not requiring additional tools:

tail -F -q $yourlogfile |   while read -r line ; do   echo "<7> $hostnamesendingthelog $line" | nc -w 1 -u $graylogserver 514;   done;

…where you replace $yourlogfile, $hostnamesendingthelog, and $graylogserver. $line is populated by the script.

All these solutions let Graylog set the Log Time attribute to the moment these old logs are ingested: Is there a way to specify (maybe map from logged timestamp) to address that?

The snippet you’ve posted is sending each line from the CSV file as a syslog message to Graylog.

You can either include an ISO-8601 timestamp in the syslog message (e. g. <7>1 2017-11-16T09:45:23.000Z <hostname> <line>) or use a structured logging format such as GELF to send the CSV (see http://docs.graylog.org/en/2.3/pages/gelf.html#example-payload).

Of course you could also use Logstash with its file input, CSV filter, and GELF output to read, parse, and send the CSV contents to Graylog.

2 Likes