Splunk adds easy and open machine learning
Splunk started its life as a log analysis system and has since grown into a general solution for analysing and acting on machine-generated data.
With Splunk Enterprise 6.5, the company’s enterprise-level offerings now feature machine learning, an ingredient that is all but obligatory for any big data product. But Splunk’s approach is less opaque than most, and it encourages enterprise devs to build with it instead of merely deploying it.
Splunk has two offerings for machine learning: a pre-packaged set of features for common use cases, and a developer toolkit for building custom machine learning models that can be applied to data collected with Splunk.
Enterprises getting their feet wet with Splunk, machine learning, or both can start with the bundled solution sets: Splunk IT Service Intelligence, Splunk User Behavior Analytics, and Splunk Enterprise Security.
All of these focus on common business problems that require enterprises to paw through mountains of data and analyse it. For instance, if you want to use machine intelligence to guard against outside attacks or insider threats, you would most likely use some kind of anomaly detection algorithm. But you’d need to ensure that the algorithm can adapt intelligently and not get swamped by natural changes in behaviour.
Splunk says that where the problem is already well known and well defined, solutions should be provided in a form enterprises can use immediately, rather than forcing them to reinvent the wheel.
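To make the idea of an anomaly detector that adapts to natural changes in behaviour concrete, here is a minimal sketch in plain Python. It is not Splunk's implementation or API; the class name, parameters, and default values are all hypothetical. It assumes a simple technique, an exponentially weighted running mean and variance, so that the baseline follows gradual drift while a sudden jump still stands out.

```python
import math


class AdaptiveAnomalyDetector:
    """Hypothetical example: flags values far from an exponentially weighted baseline."""

    def __init__(self, alpha=0.1, threshold=3.0, warmup=10):
        self.alpha = alpha          # how quickly the baseline follows new data
        self.threshold = threshold  # allowed deviation, in standard deviations
        self.warmup = warmup        # observations to see before flagging anything
        self.mean = None
        self.var = 0.0
        self.count = 0

    def update(self, value):
        """Return True if value looks anomalous, then fold it into the baseline."""
        if self.mean is None:
            self.mean = float(value)
            self.count = 1
            return False

        deviation = value - self.mean
        std = math.sqrt(self.var)
        is_anomaly = (
            self.count >= self.warmup
            and std > 0
            and abs(deviation) > self.threshold * std
        )

        # Incremental exponentially weighted mean and variance update.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        self.count += 1
        return is_anomaly


detector = AdaptiveAnomalyDetector()
# Made-up data: a metric that drifts upwards slowly, followed by a sudden spike.
drifting = [100.0 + i for i in range(60)]
flags = [detector.update(v) for v in drifting]
spike_flagged = detector.update(500.0)
```

In this sketch every value, including a flagged one, is folded into the baseline. That lets the detector absorb a genuine change in behaviour, at the cost of a large spike temporarily widening the tolerance band.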
The newly unveiled Machine Learning Toolkit is likely to be the most useful offering. It leverages Python, which has a wide user base in machine learning and scientific computing, to allow users to develop their own machine learning models that can be run on Splunk data.
As such, the Machine Learning Toolkit helps users build solutions that are not covered by the out-of-the-box machine learning features, or customise a given solution to fix some knotty internal problem.
Splunk provides examples for the Toolkit that cover common scenarios, such as detecting outliers in server response time or forecasting the number of employee logins for a given time period. The Toolkit also includes visualisations that can be used to display results from the algorithms on Splunk’s dashboard UI.
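As a rough illustration of the first scenario, the sketch below finds outliers in a list of server response times. It uses only the Python standard library, not the Toolkit itself, and the function name, threshold, and sample data are assumptions made for the example. The technique is a modified z-score based on the median absolute deviation (MAD), chosen here because a single extreme value distorts it less than it would a mean and standard deviation.

```python
from statistics import median


def find_outliers(response_times_ms, threshold=3.5):
    """Hypothetical helper: return indices whose modified z-score exceeds threshold."""
    med = median(response_times_ms)
    mad = median([abs(x - med) for x in response_times_ms])
    if mad == 0:
        # At least half the values equal the median; this score is undefined.
        return []
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores
    # when the data is roughly normally distributed.
    return [
        i
        for i, x in enumerate(response_times_ms)
        if 0.6745 * abs(x - med) / mad > threshold
    ]


# Made-up response times in milliseconds, with one slow request at index 5.
sample = [120, 125, 118, 122, 121, 950, 119, 124, 123, 117]
outliers = find_outliers(sample)
```

The threshold of 3.5 is a commonly used rule of thumb for modified z-scores, not a value taken from Splunk; in practice it would be tuned to the data.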
It’s the data, stupid
All of this allows Splunk’s products to be enriched with machine learning without becoming a black box. It’s too easy to claim something is “powered by machine learning” without providing any details about what’s going on under the hood. The most promising advances in machine learning come from open source toolkits that can be used by everyone but investigated by those who need to know what’s going on underneath.
Machine learning is more about data than algorithms, and Splunk’s always been a data company. That makes it far better suited to incorporating machine learning than many other outfits might be, and Splunk has taken the extra step of adding machine learning that’s open-ended, not closed.
IDG News Service