More goodness added to aaLog; New Options format

Spurred on by a request from a user with a unique situation, I have been hard at work adding some new features to the aaLog library.  Over the next few posts I will discuss many of the new features.  To start things off, I'll cover the new options format.

New Method for Specifying Options

When you start out with a program you will typically pass in options via command line arguments.  Then, over time, that list grows into an uncontrollable monster.  Over the course of my last few projects I have come up with a really nice method for managing this in a much simpler fashion: an options file in JSON format.

Using a combination of an options type and the Newtonsoft JSON library we can very easily create a single-line method that reads in an options file and deserializes it for use in our application. By specifying defaults in the options class you only need to include non-default options in your options file.
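Something like this (the class and property names here are just stand-ins for illustration, not the actual aaLog types):

```csharp
using System.IO;
using Newtonsoft.Json;

// Hypothetical options class -- names are illustrative, not the actual aaLog types.
public class LogReaderOptions
{
    // Defaults live here so the JSON file only needs to override what changes.
    public string LogDirectory { get; set; }
    public int MaxMessages { get; set; }

    public LogReaderOptions()
    {
        LogDirectory = @"C:\ProgramData\ArchestrA\LogFiles";
        MaxMessages = 1000;
    }
}

public static class OptionsLoader
{
    // The single-line read-and-deserialize method.
    public static LogReaderOptions FromFile(string path)
    {
        return JsonConvert.DeserializeObject<LogReaderOptions>(File.ReadAllText(path));
    }
}
```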

Here is an example of what a simplified options file might look like:
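Something along these lines, using the stand-in property names from the sketch above; anything you leave out simply keeps its default:

```json
{
  "MaxMessages": 5000
}
```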

Using this composable options method we can now specify complex items in the options file using JSON.

What’s even better is that you can read in the options file but modify those options before you apply them to the instantiated object.

For a filter you can approach it in one of two ways. You could specify the filter in the options file, or you could just add one in-line after reading in the options file from disk. The example above shows how you might add a new filter in-line. I have compacted the essence of the code into a single line, but here is a simpler version expanded to multiple lines to make it easier to follow along.
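Roughly, and assuming the options type carries a Filters collection with a simple filter object (these are stand-in names, not the library's real API), it looks like this:

```csharp
// Illustrative stand-ins only -- check the aaLog source for the real option,
// filter, and reader type names and members.
var options = OptionsLoader.FromFile(@"C:\aaLog\options.json");

// Add a filter in-line, after the options file has been read from disk.
options.Filters.Add(new LogRecordFilter { Field = "LogFlag", Value = "Error" });

// Finally, construct the log reader, passing the options along to the constructor.
var logReader = new aaLogReader(options);
```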

Also take note of a couple of other nuances.  First, if you are OK with all of the default options you can just instantiate a new options structure and add your filters as you go along.  Then create a new log reader object, passing the options to the constructor.

Now, you might also want to write these filters into the options file, but what happens if you aren’t an expert in JSON?  Simple: build the options object in code, then use Newtonsoft to serialize it out to a JSON file and view it.
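A quick sketch of that round trip, again with the stand-in option type from above; SerializeObject and Formatting.Indented are standard Newtonsoft.Json calls:

```csharp
using System.IO;
using Newtonsoft.Json;

// Build the options (and any filters) in code, then dump them to disk as a
// nicely indented JSON template you can edit by hand later.
var options = new LogReaderOptions();

string json = JsonConvert.SerializeObject(options, Formatting.Indented);
File.WriteAllText(@"C:\aaLog\options-template.json", json);
```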



Take that text and drop it into a text file.  There are plenty of JSON beautifiers out there to help you “see” the format better.

In my next post I will discuss more details around Log Filtering.

aaLog is all growed up and Splunktified

When I released the aaLog library a few months ago I thought it was a pretty good body of work.  With the library, and the help of the example projects, you could very easily gain access to the Archestra log files on the local machine and get back a strongly typed list that you could then parse and process in any form you like.  Honestly, however, this is really only the start of a solution.  Collection and aggregation of logs are really just a means to an end.  That end should be analysis, or more importantly, actionable information.  You need a way to sift through 40,000 log records from over the weekend and bubble up the fact that someone tried to log in with bad credentials 27 times on 4 different machines, and that we also had a failover of an engine.  The only way to do this is to have some kind of advanced analysis engine on top.  You could do this with SQL, but thankfully today we have something much better: Splunk.

For the uninitiated, Splunk is a pretty simple product to explain.  Step 1: send all of your log-type data to Splunk.  Step 2: Splunk stores the data and indexes (pre-searches) it.  Step 3: use the Splunk search language, basically SQL but for Splunk, to tease out the important information from your data set.  Again, you could do all of this with a standard RDBMS (a relational database like SQL Server), but the mechanics of doing it in Splunk are orders of magnitude easier.  Step 4: using the queries you wrote in step 3, create dashboards and reports that summarize the data into actionable chunks of information.  Take the example of finding failed logins: if you look at the logs you’ll note that a failed login doesn’t raise a warning or error.  It’s just another log entry, so there is a good chance it won’t stand out.

So, to start, you can go to the Splunk website and read for days on the product.

Next, you should download it and install it on your local machine.  While the product is designed to scale out to massive size and performance, it is also ultra simple to install and get running within a few minutes and a few clicks.  Within 10 minutes you can have all of your Windows event logs coming into Splunk ready for analysis, so the excuse of “it’s too complicated” just went out the window.

This is where we move to second grade.  Pushing your own data into Splunk.

There are countless ways to send data to Splunk from your application.  The simplest is to open a TCP stream and send KVP (key-value pair) data to the Splunk server over that connection.  You can take a look at the aaSplunko project on GitHub to get an idea of how you might do this.  Honestly, a large amount of the code in that solution is there to make the system a little more stable and flexible; for your own testing you can probably put together a much simpler POC.  One catch is that you have to make sure you call WriteLine and Flush after each write.  At least that’s the combination I found that makes writing a single event a reliable activity.
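A bare-bones sketch of that approach (the host, port, and event text are placeholders for whatever TCP data input you configure on the Splunk side):

```csharp
using System.IO;
using System.Net.Sockets;

public static class SplunkTcpSender
{
    // Minimal KVP-over-TCP sender. Point the host and port at the TCP data
    // input you set up in Splunk.
    public static void SendEvent(string host, int port, string kvpLine)
    {
        using (var client = new TcpClient(host, port))
        using (var stream = client.GetStream())
        using (var writer = new StreamWriter(stream))
        {
            writer.WriteLine(kvpLine);  // one event per line, in key=value pair format
            writer.Flush();             // flush after each write or the single event may never leave the buffer
        }
    }
}

// Usage (placeholder host and fields):
// SplunkTcpSender.SendEvent("splunk.example.local", 5514,
//     "severity=Warning message=\"Failed login\" host=GR01");
```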

Another way to get data into Splunk is to have it tail a parsable log file.

Finally, the method I have been working on uses the concept of Modular Inputs.  The tl;dr is that a modular input lets a developer package up code that reads and parses log file data so that it appears as a “built-in” way to get data into Splunk.  The modular input I have built is called Archestra Log Reader.  I know, a fancy name.

Where it gets even better is that when you click on the item you get a standard page for viewing all instances, along with configuration, enable/disable, and other details.


Finally, you get a standard method for letting the user configure the attributes of your input.  In my case I have a path to the log files and a maximum number of messages to read in any read cycle.  You can set defaults and add your own validation rules, such as verifying that a log file directory exists.  I won’t bore you with a line-by-line analysis of the code, but I can help with the major steps if you want to write your own MI.

Step 1 – Download the Splunk C# SDK 2.0.  This may not be 100% necessary, but it is a lot of nice reference material.

Step 2 – Download the Visual Studio Extensions for Splunk.  This is definitely the way to go to help you get started writing a modular input.

Step 3 – Start a new C# project and select Installed -> Templates -> Visual C# -> Splunk -> Splunk Modular Input



Step 4 – Fill out the basic information.  I believe the purpose of putting in your Splunk user ID is to go ahead and seed the information should you decide to publish your MI.  You can modify this later in one of the created text files.



Step 5 – Now, study the generated code.  There are three main sections: Scheme, Validate, and StreamEventsAsync.  What is so nice about the MI format is that all of the scaffolding is in place for Splunk to inspect the input, extract the scheme, validate parameters, and schedule execution.  You don’t have to worry about any of those mechanics, which is a serious win.

Step 6 – Scheme

The scheme is basically the list of parameters that you will use to configure your input.  In the example MI you have values of max and min.  Now, you must be careful if you want to add more values.  If you modify the scheme you have to update it in multiple places.  First, follow the format and modify the code in the Scheme function.  Second, navigate to Readme\inputs.conf.spec and update it there as well.  Don’t ask me why they make you update both places instead of updating one and regenerating the other file; that’s just the way it is.  If you want to provide default values you must create a file called inputs.conf under the default directory.  Again, don’t ask me why they chose that method.  If you want an example, check out my modular input on GitHub and you will see how I specify default values.  Finally, you will have to stop and restart Splunk to force it to recognize the new scheme.  This bit me multiple times.
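For flavor, here is roughly what the Scheme override inside the generated class looks like.  Treat the exact property names as assumptions to verify against the code the template generates for you, and note the arguments shown here are my Archestra-flavored stand-ins rather than the template's max and min:

```csharp
// Excerpt from the class generated by the template (it inherits from
// Splunk.ModularInputs.ModularInput); property names are from memory of the
// SDK 2.0 template, so verify against your own generated project.
public override Scheme Scheme
{
    get
    {
        return new Scheme
        {
            Title = "Archestra Log Reader",
            Description = "Reads Archestra log files and streams the records into Splunk.",
            Arguments = new List<Argument>
            {
                new Argument
                {
                    Name = "logDirectory",   // must also appear in Readme\inputs.conf.spec
                    Description = "Path to the Archestra log file directory",
                    DataType = DataType.String,
                    RequiredOnCreate = true
                },
                new Argument
                {
                    Name = "maxMessages",    // a default can be supplied via default\inputs.conf
                    Description = "Maximum number of messages to read per cycle",
                    DataType = DataType.Number,
                    RequiredOnCreate = false
                }
            }
        };
    }
}
```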

Step 7 – Validate

Validation is a critical step that lets Splunk make sure the values you have specified for a configuration are valid.  There are some shortcut validations, like checking for positive integers, but if you require validation with any kind of complex logic it should be done in this function.  If your values don’t pass validation, you set an error message and return false.  This will bubble up nicely to the Splunk UI and tell the user why the values are not valid.
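Sketched out, with the signature and parameter access written from memory of the SDK template (so verify against your generated code), a directory-exists check along the lines of what I described might look like this:

```csharp
// Sketch of the Validate override -- signature and parameter access are
// assumptions based on the SDK 2.0 template; adjust to match your project.
public override bool Validate(Validation validation, out string errorMessage)
{
    // Hypothetical rule: the configured log directory must actually exist.
    string logDirectory = ((SingleValueParameter)validation.Parameters["logDirectory"]).ToString();

    if (!System.IO.Directory.Exists(logDirectory))
    {
        errorMessage = "The log file directory '" + logDirectory + "' does not exist.";
        return false;   // bubbles up to the Splunk UI as the reason the values are invalid
    }

    errorMessage = string.Empty;
    return true;
}
```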

Step 8 – StreamEventsAsync

This is where all the work really happens.  Just take a look at the example and you get an idea of all the basics you need.  I am definitely no expert on all the ins and outs of the async/await programming model in C#, so I won’t try to go down that rabbit hole.  I will, however, point you to my modular input code to show you how I’ve developed a slightly more complex example.  The only trick is to make sure your data is in KVP format.  Example:  speed=10 direction=7 angle=6.6.  There are also some special rules around timestamp formatting and labeling.  If you want to see this a little better, check out my aaLogReader library and see how I generate the KVP for a record.
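A stripped-down sketch of the override, where the EventWriter call and Event properties are written from memory of the SDK 2.0 template and the log-reading part is just a placeholder, so treat the whole thing as an assumption to check against your generated code:

```csharp
// Excerpt from the generated class -- verify the signature and the
// Event/EventWriter members against what the Splunk template produces.
public override async Task StreamEventsAsync(InputDefinition inputDefinition, EventWriter eventWriter)
{
    while (true)
    {
        // Placeholder for the real work: read any new Archestra log records and
        // flatten each one into a KVP string, e.g. "speed=10 direction=7 angle=6.6".
        string kvp = "messageId=12345 logFlag=Warning message=\"Failed login attempt\"";

        await eventWriter.QueueEventForWriting(new Event
        {
            Stanza = inputDefinition.Name,
            Data = kvp
        });

        // Wait a bit before the next read cycle.
        await Task.Delay(TimeSpan.FromSeconds(5));
    }
}
```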

Step 9 – Compile and Run It

This is where I lost most of my time.  It can be a tricky dance to get this over into Splunk and ready to run.  Here are the basic steps.

1) Edit the post-build script and remove the REM from before the Goto Deploy statement.  Study the rest of the script to understand where the files need to go.  Be careful to include all necessary files.  For instance, if your code has special config or other support files, make sure you select Copy Always under Copy to Output Directory.  For some reason this isn’t set on some of the Splunk config files, so make sure you set all of these appropriately.

2) RESTART SPLUNK!  You must do this any time you add new modular inputs or if you change the Scheme.  You will find what you need under Settings -> Server Controls.

3) Find your new modular input in the list of Data Inputs.  Click on it and create a new instance.  Don’t forget to enable it.

4) Find a bug; you know you will.  When you edit and recompile you will get an error about not being able to copy the output.  The problem is that while the input is enabled, Splunk is actively running the compiled EXE.  To fix this, disable the MI, build, watch the deploy, and then re-enable it.  You can manage MIs through a REST API, but I haven’t gone to the effort of working out the exact details yet.

Step 10 – Once you are done start searching for your data.  Because I suck at Splunk searching right now I won’t even try to give you guidance on that topic.

Step 11 – Profit for you and your organization and good will from everyone!

I hope this short introduction was enough to pique your interest and, more importantly, help you see that creating a modular input for Splunk is quite simple.  While it may appear more complex than just streaming TCP data, if you want a solution that is supportable, easily configured, and easily deployed, then modular inputs are definitely the way to go.

Currently I am working on a Splunk App to visualize and summarize important log data.  Think pie charts and summaries that give you a quick, at-a-glance view of how things are going.  Or a table showing the last 20 security events.  Or maybe just a single number showing the count of security events in the last 24 hours.  There is an entire ecosystem and body of knowledge dedicated to helping you summarize and visualize data better, but it all starts with getting the data into a searchable, usable format.  The ultimate goal is to turn data into information you can use.  Maybe my work will spur something bigger and better from you?

Comments, questions, and complaints always welcome.


deployCamp is launching!

Notice things have been a little quiet here? Welp, Andy and I have been working on something special.

If you’ve been following me for a bit, you may have seen my frustrations at the lack of open dialogue in the manufacturing industry. I often feel like I’m working on a desert island with tools that Google isn’t even aware exist. Welp, in the spirit of changing things for the better, Andy and I have been putting our heads together and figuring out what to do about it. Our solution? Have a giant meetup!

This event is a natural outflow of aaOpenSource, and I hope will bring more folks into the collaborative community. We’re capping the event at 100 participants, and already lining up amazing speakers. Gary Mintchell of The Manufacturing Connection will be our keynote speaker and is 100% onboard with what we’re doing.

You can read more about our motivation and what we’re thinking on the site. I’m super excited about the whole deal and hope that YOU can join us for deployCamp 2015!

(By the way… really need speakers! Submit a topic to cover at our Call for Speakers page.)

A few thoughts on aaLog

A couple of weeks ago I posted the first commit for aaLog.

I posted a few thoughts on some LinkedIn groups, but I thought it would be more appropriate to write in a longer form here on our home blog.

The Beginnings

For many years I’ve been frustrated by the limited nature of the built-in log management software in the Wonderware products.  The SMC is a pretty nice interface and gives you some powerful functions to sift and sort your logs.  Unfortunately, you can only look at the logs from a single host at a time.  Yes, you can jump from host to host easily, but this simply isn’t very practical when you are trying to troubleshoot an issue that spans multiple platforms.  Take the simplest example, a deploy.  It starts on the GR, hops to the platform, and then goes back to the GR.  Why can’t I see the entire process in a unified view?  In the past I read bits and pieces about syslog and it seemed like a nice solution to this issue.  My only issue was that the Wonderware logging subsystem has no feature for forwarding logs to syslog, nor was there a pre-built collector for Wonderware logs, unlike, say, 90% of the other software in the world.  Once again we are living in our own little world where we can’t play with the toys everyone else is playing with.

The Motivation

At the Dallas users conference a few years ago I was having a rollicking good time late one night with a few folks, one of them being an engineer who had recently done some work around this exact pain point.  He had the same misgivings as I did, and he decided to do something about it.  He built a tool that would read the log files and forward them to a number of different destinations: SQL Server, CSV, and syslog.  I saw the work and was duly impressed.  He had a nice GUI for configuring the service and everything.  The core code was quite nicely thought out and I thought it was a winner.  Unfortunately he could never get support from the PTB (powers that be) to actually productize and release the tool, either as a free tool or as something that customers might pay for.

The Opportunity

A few times I made half-hearted attempts at recreating his work so that it could be shared with the world, but I floundered each time.  Why wouldn’t the file watcher start working?  Why couldn’t I read the log records?  So what does any good engineer do when they can’t get someone else’s code to work like they expect?  They study the functionality of the code and then recreate it in a form they better understand.  Sometimes the constructs of the original code are brilliant and you just copy them straight up.  Sometimes they may be brilliant but you simply don’t understand them, so you instead figure out the inputs and the results, then build your own algorithms to mimic them.

The Boundaries

As I alluded to in the previous section, one of the really mind-numbing issues I ran into was the fact that I couldn’t get the stupid file watcher to fire every time the log file was updated.  So instead of wasting my time on that I decided to work on the most important piece: the actual log reader functionality.  When you review the repo you will see a few example projects that actually use the library, but these are really just basic examples to get you up and running.  They are not intended to be finished works.

The Understanding

The first step in writing this library involved understanding the log files themselves.  The first thing you notice when attempting to read the log files is that if you open them in a text editor like Notepad++ they simply look like garbly gook.

Log File garbly gook

However, as you look a little closer you can definitely see something in the messages.  But how do you figure out the delimiters, and what the heck is that first line?

What we have here are binary log files.  What are binary log files you ask?  A binary log file is one that is written to in byte form instead of readable text.  Because I’m simply not that smart I don’t know a better way to explain it in words.  But I can explain the pseudocode to read it.

  1. Open the file for reading.
  2. Don’t read text from the file.  Instead, read the actual bytes.
  3. Study the array of bytes returned and try to figure out which values are numbers and which are text.  The text fields typically have a NULL byte between them to signal that we are changing from one field to the next; in a typical text log this might be a tab or a comma.  One good way to work this out is to stop your code when you have the raw array of bytes and dump the contents to Excel.  From there you can more easily see the patterns (see the sketch after this list).
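To make the pseudocode a little more concrete, here is a rough illustration of the byte-splitting idea.  This is not the actual aaLog parsing code, which handles the header and record structure in far more detail:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class BinaryLogSketch
{
    // Illustrative only: read the raw bytes and split the text portion on NULL
    // (0x00) delimiters so the individual fields start to emerge.
    public static List<string> SplitOnNulls(string path)
    {
        byte[] raw = File.ReadAllBytes(path);

        var fields = new List<string>();
        var current = new StringBuilder();

        foreach (byte b in raw)
        {
            if (b == 0x00)
            {
                // NULL marks the end of a text field.
                if (current.Length > 0) fields.Add(current.ToString());
                current.Clear();
            }
            else if (b >= 0x20 && b < 0x7F)
            {
                // Keep printable ASCII; numeric fields need real binary decoding.
                current.Append((char)b);
            }
        }

        if (current.Length > 0) fields.Add(current.ToString());
        return fields;
    }
}
```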

Here are links to a couple documents on GitHub showing details about each of the formats.

If you want a better understanding of what each of the ASCII codes means check out this link on Wikipedia.

One question you might have is why in the world would someone write log files in a cryptic, unreadable (at least with a text reader) format?

Well, after an exhaustive search (ok, about 2 minutes on Google) I found someone who agreed with my understanding so I shall cite them as an authoritative source.


Why should I use a binary log?

Binary logs are far more compact than text based logs, especially when using a text based encoding like JSON or XML. Binary logs also offer a more CPU efficient solution to encoding and decoding large logs of instruction sets.

The Work

Over the course of a long weekend and a few extra days I trudged through some of the original log reader DLL code to understand exactly how it worked. After understanding the format of the log files, the rest is just work: adding features and in general making the library easier for others to use.

So What Now?

I will continue adding features as suggestions come in but what I would really like is for others to start building tools and services around this log reader.

aaSplunko Added to GitHub

In a nod to the big data gods, we have posted what I hope is the first of a series of code releases to help you push System Platform data out to Splunk.

Link to the repository is here

Also, I have moved this collection of thoughts below from the ReadMe to this blog post as a way to get more interaction and also keep the ReadMe a bit more on the technical side.

I feel like there are some huge opportunities for sending some of this data to Splunk in addition to a traditional Historian.  For starters, and apologies in advance to Schneider/Invensys/Wonderware, the cost of Splunk Enterprise is a fraction of the cost of the Historian.  Yes, the Historian may be more efficient at storing data, but frankly these days disk space is dirt cheap, so that’s not a great differentiator any more.  Yes, you’ve got the Historian Client, but the out-of-the-box visualization tools provided with Splunk are really stunning and so much more powerful than anything Wonderware is currently providing.  My general feeling on the topic is that the “rest of the world” has solved some really basic problems, and it’s time the automation world started to recognize that others are doing some things much better, faster, and cheaper than the proprietary tools we have to slave with.  Yes, the Historian Client is a very nice tool and is very powerful, but I suspect that with a little HTML5/CSS/jQuery magic someone could produce a web-based interface to this Splunk data that does about 95% of what the current Historian Client tool does.  Honestly, how many of your users use more than about 10% of the capabilities of Historian Client?  Do they know how to use the rubber band scaling?  Have they ever fiddled with the retrieval methods?

Finally, it’s not so much the cost as it is the cost model.  Should I really pay the same price for a boolean tag that changes 10 times a day as for a floating point that changes every 250 ms?  I think WW is on to something with the user-based pricing model for Historian Online, but it still seems odd.  To be fair, this does give much more predictable pricing for customers and I can empathize.  I think something all of the industrial automation companies have got to sit up and recognize is that the pace of innovation in these new technologies is orders of magnitude faster than what we’re seeing from our suppliers.  It is my sincere hope that others will join me in democratizing access to the data that we’ve already paid to collect and then paid again to store.  Why should I have to pay a third time to get it back out?

If you want to know more about what Splunk is doing with Industrial data, look up Brian Gilmore on Twitter.  He’s been a great resource for me and very supportive in making sure I was successful with this tiny proof of concept.  I think this could definitely be a symbiotic relationship for both sides.  We want better places to store and analyze our data.  They want more customers storing data in Splunk.  Win Win if you ask me.

Finally now that I have a little win under my belt I might be attacking Log Files next.  Watch out for that one if I am successful.  Goodbye crappy single node log viewer.  Hello global view across multiple platforms and galaxies. Frankly I’m amazed we’ve put up with that crap for so long.  oooommpphh.  That’s me hopping off the soapbox.



Getting Started with GitHub

Updated on October 20, 2014 with a little more clarification between New and Existing Projects, and links to the new meta project that I setup to capture requests for project inclusion.

You may have noticed that we keep all of our code on GitHub. While this is a relatively unfamiliar site to much of the manufacturing industry, it’s extremely popular and common among web developers, mobile developers, and all types of open source projects. Git itself is a version control system like CVS, Subversion, or Source Safe. GitHub just happens to be a very popular site that provides a Git server and additional features like issue tracking that open source projects often need.

Andy has already set up an “organization” on GitHub that hosts our various repositories. Each distinct project that we have started is its own repository within the organization. If you’d like to work on one particular project and not ALL of the code that we’ve started, you can just look at that one repository and not worry about the rest. You can browse all of the existing projects that we have started or have been contributed to the community on our Projects page. A simple guide to Contributing can be found in a meta repository that I set up.

New Projects

If you have code that is an entirely new project and is unrelated to any of our existing projects, you have two choices:

  1. Create your own repository under your own account and we’ll add it to our Projects page.
  2. We can create a new repository under aaOpenSource for your code to live.

Either way, create a new issue here and we’ll take care of it!

Existing Projects

  • Check out an existing project, try it out, and submit any issues you find.
  • If you have a solution, feel free to create a pull request and contribute back into the code.

How to do a pull request

Creating a pull request (sometimes called a merge request) is how you contribute back to existing code. Let’s walk through it and see how it works.

Let’s say that you notice an improvement that could be made with my project, aaTemplateExtract, and would like to submit a fix to it.

10,000 Foot View
  1. Make a copy of the code into your own online GitHub account
  2. Copy your online copy to your computer for editing
  3. Push the changes back to your own account
  4. Submit a request that the changes be merged back into the main code
Break It Down
  1. Download and install the latest GitHub client:
  2. When you start the app, you can sign in using an existing account or you can create a new one.
  3. On the aaTemplateExtract page, click on the Fork button at the top right. This will copy the entire project code into your own account online.
    • Fork Button
  4. After it’s been Forked to your account, you can Clone this to your PC to start making the changes. From YOUR forked page, click the Clone in Desktop button on the right. It’s critical that we clone YOUR copy. So make sure that it no longer says aaOpenSource at the top and now says your user name and that it is a fork.
    • Forked Copy
  5. GitHub for Windows will now ask you for a directory to place it. Select a good place you’d like to keep your working code.
  6. After it’s done Cloning, you’ll see the full revision history in GitHub for Windows and all of the source code is now on your own PC in the directory you specified. You can right click on the repository in the left-most panel and Open in Explorer to see the files.
  7. Open up the project in your favorite development environment, just like you normally would using Visual Studio or SharpDevelop. Make any changes you’d like to make, save, build, and test. Then, come back to GitHub for Windows to continue.
  8. Along the top you’ll see “master” with a drop down arrow. Click on it and you’ll see a list of Branches for this repository. There may not be any except for master.
    • Master is the main stream of code that all approved changes get put into. If someone wants to make a change, they have to have their own Branch with their changes. When the Branch is uploaded, the person who made the change needs to send a “Pull Request” to the owner. This is just a notification to the owner to say “hey, I made some changes to the code base and I’d like for you to check it out and possibly include in the main code.” If the owner of the repository approves the Pull Request, then the Branch is merged in and becomes the latest master code.
  9. If you have a Branch of your own that you’d like to include this change in, then select it. Otherwise, create a new one by typing the name in the Filter text box and then click Create.
    • Branches can be named anything, but the convention is to give them a somewhat descriptive title for what the purpose is for, often using prefix verbs such as “add” or “update” or “cleanup”. For instance “add/featurex” would be a good branch name.
    • Adding a Branch
  10. You’ll notice that after you made changes you’ll have an Uncommitted changes panel above the History. Type in a good subject and description that explains your change, click Commit, then click Publish.
    • Committing to Branch
  11. Now, let’s go back to GitHub. On YOUR fork page, you’ll see that a recently pushed branch is shown. Click on Compare & pull request.
    • Compare pull
  12. If you’re ready to submit this code for review for the main branch, click on Create pull request. Be as descriptive as you can for the feature so that others can know what this is all about.
    • Submit Pull Request
  13. Congratulations! You’ve submitted your first Pull Request to aaOpenSource! After we take a look at it, we’ll evaluate it for inclusion and approve or deny the request.

There are lots more tasks that you can do with GitHub; I just wanted to get you started. Let me know if you have questions. I’m happy to set up a GoToMeeting to help out if you’re trying to contribute to the project. Definitely read and search the GitHub help first, though, especially the GitHub for Windows help. There’s also GitHub Guides and a GitHub YouTube channel. Edu-ma-cate yourself, friends!

Seeding the Repo

So I was a little busy this morning seeding the GitHub repo with some projects that have absolutely no code in them.  My main purpose was to start putting down all of my ideas for tools we could create with help from the community.  I have the core concepts in my mind and now we just need someone with coding skills and some time to put it together and post it for everyone to laugh at.. no, that’s not the open source spirit!  In all honesty I think there are a lot of great engineers in our community with really good ideas but maybe not the time or expertise.  Share your ideas or your time and together we can do something great!