aaLog is all growed up and Splunktified

When I released the aaLog library a few months ago I thought it was a pretty good piece of work.  With the library, and the help of the example projects, you could very easily gain access to the Archestra log files on the local machine and get back a strongly typed list that you could then parse and process any way you like.  Honestly, however, this is really only the start of a solution.  Collection and aggregation of logs are really just a means to an end.  That end should be analysis, or more importantly, actionable information.  You need a way to sift through 40,000 log records over the weekend and bubble up the fact that someone tried to log in with bad credentials 27 times on 4 different machines and that we also had a failover of an engine.  The only way to do this is to have some kind of advanced analysis engine on top.  You could do this with SQL but thankfully today we have something much better.  Splunk.

For the uninitiated, Splunk is a pretty simple product to explain.  Step 1.  Send all of your log-type data to Splunk.  Step 2.  Splunk stores the data and indexes (pre-searches) it.  Step 3.  Use the Splunk Query Language, basically like SQL but for Splunk, to tease out the important information from your data set.  Again, you could do all of this with a standard RDBMS (a relational database like SQL Server), but the mechanics of doing this in Splunk are orders of magnitude easier.  Step 4.  Using the queries you wrote in step 3, create dashboards and reports that summarize the data into actionable chunks of information.  Take the example of finding failed logins.  If you look at the logs you’ll note that a failed login doesn’t raise a warning or error.  It’s just another log entry, so there is a good chance it won’t stand out.

So, to start you can go to Splunk.com and read for days on the product.

Next, you should download it and install it on your local machine.  While the product is designed to scale out to massive size and performance, it is also ultra simple to install and get running within a few minutes and a few clicks.  Within 10 minutes you can have all of your Windows event logs coming into Splunk, ready for analysis.  So the excuse of “It’s too complicated” just went out the window.

This is where we move to second grade.  Pushing your own data into Splunk.

There are countless ways to send data to Splunk from your application.  The simplest way is to open a TCP stream and send KVP (key-value pair) data to the Splunk server over that connection.  You can take a look at this GitHub project for aaSplunko to get an idea of how you might do this.  Honestly, a large amount of the code in that solution is there to make the system a little more stable and flexible.  For your own testing you can probably put together a much simpler POC.  One catch is that you have to make sure you use WriteLine and Flush after each write.  At least that’s the combination I found that makes writing a single event a reliable activity.
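If you just want to see the moving parts, here is a minimal sketch of that approach.  The host, port, and field names are assumptions; point it at whatever TCP data input you have configured in Splunk.

```csharp
using System;
using System.IO;
using System.Net.Sockets;

class SplunkTcpSender
{
    static void Main()
    {
        // Host and port are assumptions - use whatever TCP data input
        // you configured in Splunk (Settings -> Data Inputs -> TCP).
        using (var client = new TcpClient("localhost", 5514))
        using (var writer = new StreamWriter(client.GetStream()))
        {
            // One event per line, formatted as simple key-value pairs.
            string evt = string.Format(
                "timestamp=\"{0:yyyy-MM-dd HH:mm:ss.fff}\" host=\"MYNODE\" log_flag=\"Warning\" message=\"Test event from TCP sender\"",
                DateTime.Now);

            // WriteLine followed by Flush is the combination that made
            // writing a single event reliable in my testing.
            writer.WriteLine(evt);
            writer.Flush();
        }
    }
}
```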

Another way to get data into Splunk is to have it tail a parsable log file.

Finally, the method I have been working on uses the concept of Modular Inputs.  The tl;dr is that a modular input provides a way for a developer to package up code that reads and parses log file data so that it appears as a “built-in” method for reading data into Splunk.  The modular input I have built is called Archestra Log Reader.  I know, a fancy name.

Where it gets even better is that when you click on the item you get a standard page for viewing all instances, along with configuration, enable/disable, and other details.


Finally, you get a standard method for letting the user configure the attributes for your input.  In my case I have a path to the log files and the maximum number of messages to read in any read cycle.  You can set defaults and add your own validation rules, such as verifying that a log file directory exists.

I won’t bore you with a line-by-line analysis of the code, but I can help with the major steps if you want to write your own MI.

Step 1 – Download the Splunk C# SDK 2.0.  This may not be 100% necessary, but it is a lot of nice reference material.

Step 2 – Download the Visual Studio Extensions for Splunk.  This is definitely the way to go to help you get started writing a modular input.

Step 3 – Start a new C# project and select Installed -> Templates -> Visual C# -> Splunk -> Splunk Modular Input


Step 4 – Fill out the basic information.  I believe the purpose of putting in your Splunk user ID is to go ahead and seed the information should you decide to publish your MI.  You can modify this later in one of the created text files.


Step 5 – Now, study the generated code.  There are three main sections of the code: Scheme, Validate, and StreamEventsAsync.  What is so nice about the MI format is that all of the scaffolding is in place for Splunk to inspect your input, extract the scheme, validate parameters, and schedule execution.  You don’t have to worry about any of those mechanics, which is a serious win.
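For orientation, here is roughly the shape of the class the template gives you.  This is a sketch from memory of the SDK 2.0 template, not a copy of the generated file, so names and overloads may differ slightly in your version; the three members are filled in under Steps 6 through 8.

```csharp
using System.Threading.Tasks;
using Splunk.ModularInputs;

public class Program : ModularInput
{
    public static int Main(string[] args)
    {
        // The SDK parses the arguments Splunk passes in and dispatches to
        // Scheme, Validate, or StreamEventsAsync as appropriate.
        return Run<Program>(args);
    }

    // Describes the configurable parameters for the input (Step 6).
    public override Scheme Scheme
    {
        get { return new Scheme { Title = "Archestra Log Reader" }; }
    }

    // Checks a proposed configuration before Splunk saves it (Step 7).
    public override bool Validate(Validation validation, out string errorMessage)
    {
        errorMessage = "";
        return true;
    }

    // Reads the log data and streams events to Splunk (Step 8).
    public override async Task StreamEventsAsync(InputDefinition inputDefinition, EventWriter eventWriter)
    {
        await Task.Delay(0);
    }
}
```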

Step 6 – Scheme

The scheme is basically the list of parameters that you will use to configure your input.  In the example MI you have values of max and min.  Now you must be careful if you want to add more values.  If you modify the scheme you have to update it in multiple places.  First, follow the format and modify the code in the Scheme function.  Second, navigate to Readme\inputs.conf.spec and update it there as well.  Don’t ask me why they make you update both places instead of updating one and regenerating the other file.  That’s just the way it is.  If you want to provide default values you must create a file called inputs.conf under the default directory.  Again, don’t ask me why they chose that method.  If you want an example, check out my GitHub for my modular input and you will see how I specify default values.  Finally, you will have to stop and restart Splunk to force it to recognize the new scheme.  This bit me multiple times.
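As an example, here is roughly what the Scheme property might look like with two parameters similar to mine.  This goes inside the Program class sketched under Step 5.  The parameter names (log_directory, max_messages) are illustrative, not the exact names from my modular input, and the property names follow the SDK docs example, so compare against what the template generates.

```csharp
// Requires: using System.Collections.Generic;
public override Scheme Scheme
{
    get
    {
        return new Scheme
        {
            Title = "Archestra Log Reader",
            Description = "Reads Archestra log files and streams records to Splunk",
            Arguments = new List<Argument>
            {
                new Argument
                {
                    // Illustrative name - it must also appear in Readme\inputs.conf.spec
                    Name = "log_directory",
                    Description = "Path to the Archestra log file directory",
                    DataType = DataType.String,
                    RequiredOnCreate = true
                },
                new Argument
                {
                    Name = "max_messages",
                    Description = "Maximum number of messages to read in any read cycle",
                    DataType = DataType.Number,
                    RequiredOnCreate = false
                }
            }
        };
    }
}
```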

Step 7 – Validate

Validation is a critical step that lets Splunk make sure the values you have specified for a configuration are valid.  There are some shortcut validations, like checking for positive integers, but if you require validation with any kind of complex logic it should be done in this function.  If your values don’t pass validation, you set an error message and return false.  This will bubble up nicely to the Splunk UI and tell the user why the values are not valid.
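Continuing the sketch from Step 6, a directory-exists check might look something like this.  The parameter access follows the pattern in the SDK docs example, so treat it as an assumption and double-check it against the generated code.

```csharp
public override bool Validate(Validation validation, out string errorMessage)
{
    // Pull the proposed value out of the validation request.
    // "log_directory" is the illustrative parameter name from Step 6.
    string logDirectory = ((SingleValueParameter)validation.Parameters["log_directory"]).ToString();

    if (!System.IO.Directory.Exists(logDirectory))
    {
        // This message bubbles up to the Splunk UI when validation fails.
        errorMessage = "Log file directory does not exist: " + logDirectory;
        return false;
    }

    errorMessage = "";
    return true;
}
```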

Step 8 – StreamEventsAsync

This is where all the work really happens.  Just take a look at the example and you’ll get an idea of the basics you need.  I am definitely no expert on all the ins and outs of the new async/await programming methods in C#, so I won’t try to go down that rabbit hole.  I will, however, point you to my modular input code to show you how I’ve developed a slightly more complex example.  The only trick is to make sure your data is in KVP format.  Example:  speed=10 direction=7 angle=6.6.  There are also some special rules around timestamp formatting and labeling.  If you want to see this a little better, check out my aaLogReader library and see how I generate the KVP for a record.
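To make that concrete, here is a stripped-down version of the streaming loop, again following the pattern from the SDK docs example rather than my actual reader.  ReadNextRecordAsKvp is a hypothetical stand-in for whatever does your reading and parsing, and the Event usage should be checked against your SDK version.

```csharp
public override async Task StreamEventsAsync(InputDefinition inputDefinition, EventWriter eventWriter)
{
    // Configuration values come from the same parameters defined in the Scheme.
    string logDirectory = ((SingleValueParameter)inputDefinition.Parameters["log_directory"]).ToString();

    while (true)
    {
        // ReadNextRecordAsKvp() is a hypothetical helper that reads and parses
        // the next log record and returns a single-line KVP string like:
        //   timestamp="2014-11-01 10:15:00.000" log_flag="Warning" message="..."
        string kvp = ReadNextRecordAsKvp(logDirectory);

        if (kvp != null)
        {
            await eventWriter.QueueEventForWriting(new Event
            {
                Stanza = inputDefinition.Name,
                Data = kvp
            });
        }
        else
        {
            // Nothing new to read; wait before polling again.
            await Task.Delay(5000);
        }
    }
}
```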

Step 9 – Compile and Run It

This is where I lost most of my time.  It can be a tricky dance to get this over into Splunk and ready to run.  Here are the basic steps.

1) Edit the post-build script and remove REM from before the Goto Deploy statement.  Study the rest of the script to understand where the files need to go.  Be careful to include all necessary files.  For instance, if your code has special config or other support files, make sure you select Copy Always under Copy to Output Directory.  For some reason this isn’t set on some of the Splunk config files, so set all of these appropriately.

2) RESTART SPLUNK!  You must do this any time you add new modular inputs or if you change the Scheme.  You will find what you need under Settings -> Server Controls.

3) Find your new modular input in the list of Data Inputs.  Click on it and create a new instance.  Don’t forget to enable it.

4) Find a bug; you know you will.  When you edit and recompile you will get an error about not being able to copy.  The problem is that while the input is enabled, Splunk is actively running the compiled EXE.  To fix this, disable the MI, build, watch the deploy, and then re-enable it.  You can manage MIs through a REST API, but I haven’t gone to the effort of working out the exact details yet.

Step 10 – Once you are done start searching for your data.  Because I suck at Splunk searching right now I won’t even try to give you guidance on that topic.

Step 11 – Profit for you and your organization and good will from everyone!

I hope this short introduction was enough to pique your interest and, more importantly, help you see that creating a modular input for Splunk is quite simple.  While it may appear more complex than just streaming TCP data, if you want a solution that is supportable and easily configured and deployed, then modular inputs are definitely the way to go.

Currently I am working on a Splunk App to visualize and summarize important log data.  Think pie charts and summaries to give you a quick at-a-glance view of how things are going.  Or a table showing the last 20 security events.  Or maybe just a single number showing the count of security events in the last 24 hours.  There is an entire ecosystem and body of knowledge dedicated to helping you summarize and visualize data better.  But it all starts with getting the data into a searchable, usable format.  The ultimate goal is to turn data into information you can use.  Maybe my work will spur something bigger and better from you?

Comments, questions, and complaints always welcome.

-andy
