
Configuring syslog-ng for Splunk


Some of the most important data in the enterprise can only come into Splunk as syslog. Unfortunately, Splunk isn't, for a number of reasons, a great tool for receiving syslog data over the wire. Syslog-ng, however, is great at receiving syslog data over the wire, and it can write that data to files in a way that works perfectly for Splunk. Obviously, syslog wasn't your first choice... you desperately wanted to put a Splunk forwarder at the source; you tried and you tried to figure out how to install a forwarder; you begged and you pleaded, but it just wasn't meant to be. Maybe it was an appliance; maybe installing a forwarder would have voided a warranty; maybe it was a relic Unix distro; maybe... but no. You've been forced to collect syslog.

In solving the basic problem, let's also assume that we're using syslog-ng in a rather large environment, so lots of folks can simply aim their syslog generators at your server (and not tell you). Let's also assume you've got a pair of collection servers sitting behind some sort of load balancer (e.g., an F5). Finally, let's assume that - as a Log Management Professional - you treat logs either as 1) holy, in which case you refuse to lose any data unnecessarily, or 2) a highly addictive substance you're hooked on, in which case, again, you're damn sure you're not going to lose any unnecessarily.

Solution

The trick to solving this problem is to think like Splunk and NOT a human. What do we mean?

How is a human typically going to organize syslog? A human will generally want the fewest keystrokes required to log in to the syslog server and "tail -f | grep" some goodness from a specific host's files, so you'll end up with some base directory (say "/mnt/local/logs") and then all the logs will get dropped into subdirectories with host names as directory names... /mnt/local/logs/host1, /mnt/local/logs/host2, etc. What happens then? The human sets up logrotate to roll off old data, and BOOM, they've just either 1) condemned themselves to violating the holy oath of not losing data unnecessarily, or 2) lost some good shi-, er, stuff, and they're not going to get their fix. The SIGHUP that logrotate sends to make syslog-ng reopen its files causes data to drop on the data center floor for a brief moment.

There's a better way, but how?

Part One - The Basics

First, when Splunk is tagging logs with metadata, it needs to get four things right:

  • host

  • _time

  • sourcetype, and

  • index

Let’s talk about how we can configure syslog-ng to help Splunk with all four of these. In what follows, we'll use a base directory of /mnt/local/log (much like the one above), but you may want to change that.

host

This is the easy one. Host is part of the syslog header, and you can actually configure syslog-ng to write every log into a subdirectory that's named after the host:

destination d_splunk{file("/mnt/local/log/$HOST/syslog.log" dir-owner("splunk") dir-group("splunk") owner("splunk") group("splunk")); };

So, if the hostname for a given event is fireeye01, then syslog-ng will write it to a file /mnt/local/log/fireeye01/syslog.log, and it will even create the directory with the correct permissions if necessary.
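For context, here's a minimal sketch of the source and log path that feed d_splunk. The source name, listening ports, and connection limit are assumptions we're making for illustration; adjust them for your environment.

source s_net {
    # plain syslog over UDP and TCP; 514 is the conventional port (an assumption - use whatever your senders use)
    udp(ip("0.0.0.0") port(514));
    tcp(ip("0.0.0.0") port(514) max-connections(300));
};

# send everything we receive to the Splunk-friendly destination
log { source(s_net); destination(d_splunk); };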

To get this right, we also need to set some options (syslog-ng version 3.5 or later):

options { flush_lines(100); time_reopen(10); log_fifo_size(1000); chain_hostnames(off); use_dns(no); use_fqdn(no); create_dirs(yes); keep_hostname(yes); threaded(yes); normalize_hostnames(yes); };

_time

This one mostly takes care of itself: the timestamp sits right at the front of the syslog message, and Splunk's automatic timestamp extraction will pick it up.
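For example, given a (made-up) line in one of the files above:

Jan 12 06:45:01 fireeye01 sshd[2114]: Failed password for invalid user admin from 10.1.2.3 port 51442 ssh2

Splunk reads "Jan 12 06:45:01" as _time with no extra configuration. Just note that the classic syslog timestamp carries neither a year nor a time zone, which is part of why time zones come back up in Part Three.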

sourcetype

You can almost always figure this out from the combination of host and syslog facility. You'll likely have built some sort of mapping or lookup table that clues you in on the sourcetype, but where will Splunk get the facility? Easy; you ask syslog-ng to record it in the file name, like this:

destination d_splunk{file("/mnt/local/log/$HOST/$FACILITY.log" dir-owner("splunk") dir-group("splunk") owner("splunk") group("splunk")); };

index

You'll want to assign an index based on a number of criteria that syslog-ng doesn't know about: the data owner, the sort of system that generated the data, the type of data, and so on. We'll have an entire blog article devoted to this.

Part Two - Additional info

But wait! There's more! We haven't solved the entire problem yet.

What about not losing data with Logrotate? What if one of our syslog servers is acting up, and we want to immediately diagnose which one it is? What about identifying who sent us the data?

Rotating logs

Everyone knows that the hard drive will get too full if you don't do something with the old data. The default solution to this problem is to use Logrotate, but as we've already stated, the SIGHUP can lose seconds of data. Unacceptable.

The actual solution to this problem is to not rotate the logs! Or at least rotate them in a very different way. Let's start by adding another piece of information to our syslog-ng output path:

destination d_splunk{file("/mnt/local/log/$R_YEAR-$R_MONTH-$R_DAY/$HOST/$FACILITY.log" dir-owner("splunk") dir-group("splunk") owner("splunk") group("splunk")); };

Then, we'll add some cron jobs to clean up the old stuff:

# cron job 1: at 5am, find yesterday's logs and move them to old_logs
0 5 * * * /usr/bin/find /mnt/*/log/????-??-?? -maxdepth 0 -type d ! -mmin -300 -exec bash -c 'dir={}; old=${dir/\/log\//\/old_logs\/}; mv ${dir} ${old}' \;

# cron job 2: at 4am, delete any directories that haven't been touched in 5 days, 23 hours
0 4 * * * /usr/bin/find /mnt/*/old_logs/????-??-?? -maxdepth 0 -type d ! -mmin -8580 -exec rm -rf {} \;

The result? Effectively a daily rotate, but no events lost! But wait... why move the files to an old_logs directory? Why not just use a single cron job that deletes files older than a few days? The answer: in a large organization, the number of files that Splunk needs to monitor can grow into the thousands and eat up a good chunk of memory. Moving files more than a day old to a separate directory frees Splunk from having to keep checking them.

You'll likely want to change the 4-5 day retention depending on the spare storage you have available on the syslog servers. Five days is sufficient to deal with all but the most catastrophic of failure modes.

Identify the syslog collector

If you're running a pair (or more) of syslog-ng servers behind a load balancer and one of them starts to misbehave, it can be tricky to determine which one. Let's add another piece of info to the syslog-ng output path: the $LOGHOST macro, which expands to the hostname of the collector itself:

destination d_splunk{file("/mnt/$LOGHOST/log/$R_YEAR-$R_MONTH-$R_DAY/$HOST/$FACILITY.log" dir-owner("splunk") dir-group("splunk") owner("splunk") group("splunk")); };

Determine who sent the data

In a large organization, your syslog-ng server may not receive events directly from the original host. There may be several intermediate syslog servers that catch the events and then forward them on to your server. This info can be very useful if you need to diagnose a problem upstream or if you need additional information to determine who sent you the data (remember, we ship data to indexes designed for a specific data owner). Let's add one more piece of information to our syslog-ng output path - the $HOST_FROM macro, the host syslog-ng actually received the message from - to record the immediately prior sender:

destination d_splunk{file("/mnt/$LOGHOST/log/$R_YEAR-$R_MONTH-$R_DAY/$HOST_FROM/$HOST/$FACILITY.log" dir-owner("splunk") dir-group("splunk") owner("splunk") group("splunk")); };
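Putting it all together: an event from fireeye01 that reached us via an intermediate relay would now land in a path like the one below (the collector name syslog01, the relay name relay01, and the local4 facility are made up for illustration):

/mnt/syslog01/log/2017-01-12/relay01/fireeye01/local4.log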

Part Three - Configuring Splunk

Use a heavy forwarder

A universal forwarder may work okay, but when you’re building a big syslog server, you’re bound to get events from random time zones. Universal forwarders don’t parse, so you can't do time-zone adjustments; heavy forwarders do. By using a heavy forwarder, you can keep all your syslog time-zone adjustments in the same place as your syslog inputs for easy maintenance. There's another benefit of using a heavy forwarder, too. If you need to adjust _TCP_ROUTING to get data from a potentially communal syslog-ng server to a dedicated set of indexers apart from the other data, you'll be glad you've got the ability to redirect those events from the syslog server itself.
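To make that concrete, here's a rough sketch of what the heavy forwarder's inputs.conf and props.conf might contain for the final directory layout from Part Two. The whitelist, the commented-out tcpout group name, and the host pattern and time zone are placeholders, not recommendations.

# inputs.conf -- monitor the whole tree; with the layout
# /mnt/<loghost>/log/<date>/<host_from>/<host>/<facility>.log,
# the originating host is the 6th path segment
[monitor:///mnt/*/log]
host_segment = 6
whitelist = \.log$
# optional: steer these events to a dedicated tcpout group defined in outputs.conf
# _TCP_ROUTING = dedicated_syslog_indexers

# props.conf -- time-zone fixes for senders that log in local time
[host::paris-*]
TZ = Europe/Paris

The nice part is that all of this lives on the syslog collectors themselves, so time-zone and routing fixes never touch your indexers.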

Adjust maxQueueSize in outputs.conf

The maxQueueSize defaults to 512KB, which is fine for a universal forwarder sitting on a workstation, but such a low setting will cause massive indexing delays on syslog servers collecting 500GB of log data per day. In each output stanza in outputs.conf, add this line:

maxQueueSize = 64MB
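In context, a stanza might look something like this (the group name and indexer addresses are placeholders):

[tcpout:primary_indexers]
server = indexer1.example.com:9997, indexer2.example.com:9997
maxQueueSize = 64MB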

Why 64MB? Because it’s often good enough, but feel free to go bigger! True, theory suggests that the optimal queue size in a constrained environment is one that sits, on average, about half full, but these days RAM on your server is pretty cheap and readily available... you're not really dealing with a serious memory constraint, are you?

Here’s a search that will let you know if you’ve gone big enough with your maxQueueSize:

index=_internal host=<your syslog servers> source=*metrics.log group=queue name=<your output queue>

| eval output_queue_pct=current_size/max_size*100

| timechart perc95(output_queue_pct) by host

| eval Bad=80

Run that over 24 hours as a graph. If the resulting graph ever pops above the “Bad” line, double your maxQueueSize. As long as you have enough free memory, you can afford to keep going up: the number is simply the maximum amount of memory Splunk will allow each pipeline (by default there is only one) to devote to the output queue, and 64MB should not be a problem on any modern server.

Part Four - Get the final answer

A full example of the syslog-ng.conf file can be downloaded from our repo at https://gitlab.com/rationalcyber. Enjoy!
