Explanation

Wednesday, January 31, 2018

Logstash and NetFlow.

The fastest way to load flow data into Elasticsearch.

Hello everyone! I will describe my own way to load LOTS of NetFlow data into Elasticsearch.

At first I tried logstash-codec-netflow. It works fine but is relatively slow: with real data we couldn't reach more than 7000 EPS on a dedicated 16-core server, which is a shame. The codec is really CPU-hungry, and 7000 EPS is less than a third of our NetFlow traffic.
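For reference, the pipeline I benchmarked with the codec looked roughly like this. It's a minimal sketch: the UDP port matches the nfcapd example below, while the Elasticsearch address and index name are placeholders.

input {
  udp {
    port  => 2055      # NetFlow exporters send here
    codec => netflow   # logstash-codec-netflow decodes the datagrams
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]       # placeholder address
    index => "netflow-%{+YYYY.MM.dd}" # daily indices
  }
}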

So the simplest way I found is “nfcapd + nfdump + filebeat + redis + logstash”. I use logstash mostly for filtering. I will add more implementation details in the future, but in a few words:

nfcapd - a lightweight daemon for NetFlow capture. It handles more than 30000 EPS without noticeable CPU usage;
nfdump - a handy tool for converting the binary capture files into CSV. It's really fast;
filebeat - the fastest way to read and ship text data. The default logstash output for filebeat is slow (it's even slower than the usual logstash file input), so you must use the redis output for higher speed (see the filebeat sketch after this list);
redis - an in-memory message broker. It's the fastest way I know to load something into logstash;
logstash - a great and simple tool for filtering and parsing the data (see the pipeline sketch at the end of the post).
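Here is a minimal filebeat sketch for the redis output. I'm assuming filebeat 6.x syntax; the log path matches the script shown below, and the redis address and list key are placeholders you will need to adjust.

filebeat.prospectors:            # "filebeat.inputs" on newer versions
- type: log
  paths:
    - /other/*.log               # the CSV files produced by the nfdump script

output.redis:
  hosts: ["127.0.0.1:6379"]      # placeholder redis address
  key: "netflow"                 # list key that logstash will read from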

nfcapd saves flows to disk in binary format every five minutes. Use its -x option to run nfdump after each rotation and convert the file to CSV. After that, filebeat can read the CSV files and send them to redis.

It will look like this:
nfcapd -w -D -S 1 -l /var/log/basedir/ -p 2055 -x 'shellscript.sh %d %f %t'
Here %d expands to the directory of the rotated file, %f to its file name and %t to its time slot.

And the script will look like this:
#!/bin/bash
# $1 = directory (%d), $2 = file name (%f), $3 = time slot (%t)
# Convert the rotated binary capture to CSV for filebeat to pick up
nfdump -q -o csv -r "$1/$2" > "/other/$3.log"

I used a separate bash script because I couldn't redirect the output from nfdump inside the -x statement itself.

So now filebeat can read the resulting CSV files without any problems!
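Finally, a minimal Logstash pipeline sketch for the other side of redis. The host, key and especially the CSV column list are assumptions: nfdump's exact column set depends on its version and options, so check your own 'nfdump -o csv' output before naming the columns.

input {
  redis {
    host      => "127.0.0.1"   # the same redis instance filebeat writes to
    data_type => "list"
    key       => "netflow"     # must match the filebeat output key
    codec     => "json"        # filebeat ships events as JSON
  }
}

filter {
  csv {
    # Illustrative subset of nfdump CSV columns; adjust to your version.
    # The csv filter parses the "message" field by default, which is where
    # filebeat puts the original log line.
    columns => ["ts","te","td","sa","da","sp","dp","pr"]
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]       # placeholder address
    index => "netflow-%{+YYYY.MM.dd}"
  }
}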