Machine Learning – A Tool for Augmenting Security


The rapid growth in malware samples has made machine learning algorithms a necessity in the security industry. Obfuscation, polymorphism and encryption have driven a surge in new malware, and the number of samples circulating on the internet now runs into the hundreds of millions.

Because one of machine learning's main strengths is its ability to cluster large volumes of data, applying it to security has let endpoint security vendors not only keep pace with new threats but also detect malware more proactively. The technology does have limitations, though, and some types of attacks can bypass it.

Helping the Security Industry

Machine learning algorithms have given bad guys plenty of headaches, forcing them to look for new ways to bypass the technology, which can detect and identify malware more accurately. It is a key line of defense against threats and complements almost all of the security layers deployed in security vendors' next-gen endpoint protection platforms.

These algorithms should be seen as a detection tool that can be used on every layer of protection, such as spam detection, phishing detection and network anomaly detection, to augment those layers' capabilities and increase their effectiveness against new threats. Used as the only layer of protection, however, machine learning can run into serious performance and detection limitations and may cause more problems than it solves.

“Anomaly detection and machine learning are helping us to find bad guys that have otherwise bypassed our rules-based prevention systems,” said Eric Ahlm, Research Director within the Security team at Gartner. “That’s why analytics are so relevant to security operations today, they are good at finding bad guys in the data that other systems missed.”

Currently intertwined with most layers of deployed adaptive security architectures, machine learning algorithms are not just part of data-driven security analytics but are also responsible for augmenting security layers. According to Gartner, spending on big data and machine learning technologies for security use cases reached approximately $800 million in 2016, about a fifth of which went to machine learning.

“In terms of market size, Gartner estimates that in 2016 the world spent approximately $800 million on the application of big data and machine learning technologies to security use cases,” said Will Cappelli, Vice President of Research at Gartner. “About 80 percent of that was big data; about 20 percent was machine learning.”

Incorporating machine learning into both static (file-based) and dynamic (behavior-based) malware analysis has enabled security vendors to augment these security layers with one of the most proactive technologies to date. Rather than acting as a magic bullet that single-handedly detects and prevents all threats, machine learning algorithms can be tailored to augment any security layer, acting more like a silver shotgun shell that can help stop even advanced persistent threats aimed at specific companies.

Perceptrons, neural networks, centroid-based methods, binary decision trees and deep learning algorithms, to name a few, let security researchers correlate results from multiple algorithms across multiple security layers to tag indicators of compromise and flag potential advanced persistent threats.

The reason for using more than one machine learning algorithm is that, working together, they can solve most problems, provided each one is designed to perform well at a specific task. Working as an ensemble, or a sum of systems, they can deliver a high detection rate for new and never-before-seen threats while keeping false positives to a minimum.
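
To make the ensemble idea concrete, here is a minimal Python sketch that averages the scores of several independent detectors and flags a sample only when the average crosses a threshold. The three detectors, the feature names (entropy, suspicious_imports, risky_actions) and their cut-offs are hypothetical stand-ins chosen for illustration, not any vendor's actual models; real ensembles typically combine trained models and may weight them differently.

```python
from typing import Callable, Dict, List

# Hypothetical feature record for one sample, e.g. {"entropy": 7.6, ...}.
Features = Dict[str, float]
Detector = Callable[[Features], float]  # returns a score in [0, 1]


def entropy_detector(f: Features) -> float:
    # Static signal: very high entropy often indicates packed or encrypted code.
    return 1.0 if f.get("entropy", 0.0) > 7.0 else 0.0


def import_detector(f: Features) -> float:
    # Static signal: scales with the number of imports on a hypothetical watch list.
    return min(f.get("suspicious_imports", 0.0) / 5.0, 1.0)


def behavior_detector(f: Features) -> float:
    # Dynamic signal: fires once three or more risky runtime actions are observed.
    return 1.0 if f.get("risky_actions", 0.0) >= 3 else 0.0


DETECTORS: List[Detector] = [entropy_detector, import_detector, behavior_detector]


def ensemble_verdict(f: Features, detectors: List[Detector], threshold: float = 0.5) -> bool:
    """Flag the sample when the mean detector score exceeds the threshold."""
    scores = [d(f) for d in detectors]
    return sum(scores) / len(scores) > threshold


packed_sample = {"entropy": 7.6, "suspicious_imports": 4, "risky_actions": 1}
clean_sample = {"entropy": 5.0, "suspicious_imports": 1, "risky_actions": 0}
print(ensemble_verdict(packed_sample, DETECTORS))  # True  (mean score about 0.6)
print(ensemble_verdict(clean_sample, DETECTORS))   # False (mean score about 0.07)
```

Because the scores are averaged, no single detector can flag a sample on its own in this setup: one detector firing at full strength yields a mean of about 0.33, so a verdict needs strong support from at least two signals.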

Biggest Limitations of Machine Learning

One of the biggest limitations of the technology itself, regardless of the field it’s used in, is balancing detection rate, number of false positives and performance impact. Making models too generic causes false positives, while making them too restrictive causes false negatives, and either way they become more of a hassle than they are worth. For this reason alone, they need to be backed by other security systems or technologies that can fine-tune their “behavior” and filter out potential misfires.
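
The trade-off between detection rate and false positives can be seen by sweeping a model's decision threshold. The sketch below assumes a model that outputs a maliciousness score between 0 and 1; the scores and labels are invented toy values, not measurements from any product.

```python
from typing import List, Tuple


def rates_at_threshold(scores: List[float], labels: List[int], threshold: float) -> Tuple[float, float]:
    """Return (detection rate, false-positive rate) when samples with score >= threshold are flagged.

    labels: 1 = malicious, 0 = clean.
    """
    flagged = [s >= threshold for s in scores]
    true_pos = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    false_pos = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    positives = sum(labels)
    negatives = len(labels) - positives
    return true_pos / positives, false_pos / negatives


# Toy model scores for four malicious and four clean samples.
scores = [0.95, 0.80, 0.62, 0.40, 0.55, 0.30, 0.10, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for t in (0.3, 0.5, 0.9):
    print(t, rates_at_threshold(scores, labels, t))
# 0.3 (1.0, 0.5)    too generic: all malware caught, but half the clean files flagged
# 0.5 (0.75, 0.25)
# 0.9 (0.25, 0.0)   too restrictive: no false positives, but most malware missed
```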

For example, coupling machine learning with whitelisting or adding other methods of detection – or even both – can significantly reduce the number of false positives caused by various machine learning models.
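
One simple way to couple a model with whitelisting is to let a hash-based allowlist override the model's verdict. The sketch below assumes files are identified by their SHA-256 digest and that ml_flagged comes from some upstream model; final_verdict and the allowlist contents are hypothetical.

```python
import hashlib
from typing import Set


def sha256_hex(data: bytes) -> str:
    # Identify a file by the hex SHA-256 digest of its contents.
    return hashlib.sha256(data).hexdigest()


def final_verdict(file_bytes: bytes, ml_flagged: bool, allowlist: Set[str]) -> bool:
    """Return True if the file should be blocked.

    A file whose digest is on the allowlist is never blocked, which filters
    out model false positives on known-good software.
    """
    if sha256_hex(file_bytes) in allowlist:
        return False
    return ml_flagged


trusted_installer = b"known-good installer contents"  # placeholder bytes
allowlist = {sha256_hex(trusted_installer)}

print(final_verdict(trusted_installer, ml_flagged=True, allowlist=allowlist))  # False
print(final_verdict(b"unknown binary", ml_flagged=True, allowlist=allowlist))  # True
```

The allowlist in this design only suppresses verdicts and never adds detections, so it can lower false positives without changing how the model treats unknown files.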

Another limitation is that a given machine learning model can detect only specific types of attacks. An algorithm trained specifically to detect file-based malware, for instance, will be completely ineffective against denial-of-service attacks or attacks that run through interpreters.

This is both a good thing, because a machine learning algorithm becomes more effective as its task becomes more specialized, and a problem, because an algorithm must be developed, trained and tweaked for each attack type. Of course, a mesh of security-driven machine learning algorithms augmenting every security layer of an endpoint security platform is far more effective than relying on a single, generic algorithm.

Detecting Malware with Machine Learning

Neural networks, such as multilayer perceptrons, and certain implementations of linear classifiers are among the most popular machine learning algorithms for increasing malware detection rates, relying on repeated training on common malware categories. For example, letting these algorithms extract features from existing malware samples or families enables them to learn to recognize future malware that shares similar features. This proactive capability lets them tag previously unknown malware, based on the latest information available in the training dataset.
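
As a minimal illustration of learning from extracted features, the sketch below trains a single-layer perceptron, the simplest linear classifier in the perceptron family, rather than a multilayer network. The three features (normalized section entropy, share of suspicious imports and a packer flag) and the training samples are hypothetical toy values; production systems extract far richer features and train on large labeled corpora.

```python
from typing import List, Tuple

Vector = List[float]


def activation(w: Vector, b: float, x: Vector) -> float:
    # Weighted sum of the features plus the bias term.
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w: Vector, b: float, x: Vector) -> int:
    # 1 = malware, 0 = clean.
    return 1 if activation(w, b, x) > 0 else 0


def train_perceptron(samples: List[Vector], labels: List[int],
                     epochs: int = 20, lr: float = 0.1) -> Tuple[Vector, float]:
    """Classic perceptron learning rule: adjust the weights only on misclassified samples."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(w, b, x)  # -1, 0 or +1
            if error:
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
                b += lr * error
    return w, b


# Hypothetical features: [normalized entropy, share of suspicious imports, packed flag]
train_x = [
    [0.90, 0.80, 1.0], [0.85, 0.60, 1.0], [0.95, 0.90, 0.0],  # malware
    [0.40, 0.10, 0.0], [0.50, 0.00, 0.0], [0.30, 0.20, 1.0],  # clean
]
train_y = [1, 1, 1, 0, 0, 0]

w, b = train_perceptron(train_x, train_y)

# Previously unseen samples that resemble the training families.
print(predict(w, b, [0.88, 0.70, 1.0]))  # 1 (resembles the malware samples)
print(predict(w, b, [0.35, 0.05, 0.0]))  # 0 (resembles the clean samples)
```

Note that the packer flag alone does not decide the verdict: one clean training sample is packed, so the model has to weigh it against the other features.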

Some of these algorithms can be trained to identify potentially unsafe applications or even malicious URLs based on text analysis, using various clustering methods and NLP (Natural Language Processing) to parse texts and EULAs.
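
As a simpler stand-in for text-based URL analysis, the sketch below trains a multinomial naive Bayes classifier on tokens split out of URLs. Naive Bayes is swapped in here for brevity and is not the clustering-plus-NLP pipeline described above, nor necessarily what any vendor uses; the six training URLs are invented toy examples on reserved example domains.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(url: str) -> List[str]:
    # Lowercase the URL and split it on any run of non-alphanumeric characters.
    return [tok for tok in re.split(r"[^a-z0-9]+", url.lower()) if tok]


class NaiveBayesURL:
    """Multinomial naive Bayes with Laplace (add-one) smoothing over URL tokens."""

    def fit(self, urls: List[str], labels: List[int]) -> "NaiveBayesURL":
        self.token_counts = {0: Counter(), 1: Counter()}  # per-class token frequencies
        self.class_counts = Counter(labels)               # per-class URL counts
        for url, label in zip(urls, labels):
            self.token_counts[label].update(tokenize(url))
        self.vocab = set(self.token_counts[0]) | set(self.token_counts[1])
        return self

    def log_score(self, url: str, label: int) -> float:
        # log P(label) plus the sum of log P(token | label), smoothed so unseen tokens are not zero.
        total = sum(self.token_counts[label].values())
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for tok in tokenize(url):
            score += math.log((self.token_counts[label][tok] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, url: str) -> int:
        # 1 = malicious, 0 = benign.
        return 1 if self.log_score(url, 1) > self.log_score(url, 0) else 0


train_urls = [
    "http://webpay-login-verify.example.net/account/update",
    "http://secure-bank-verify.example.org/login.php",
    "http://free-prize-claim.example.biz/verify/account",
    "https://docs.example.org/3/library/re.html",
    "https://www.example.com/about/team",
    "https://news.example.com/2016/security-report",
]
train_labels = [1, 1, 1, 0, 0, 0]

model = NaiveBayesURL().fit(train_urls, train_labels)
print(model.predict("http://account-verify-login.example.info/update"))  # 1
print(model.predict("https://docs.example.com/library/about"))          # 0
```

With only six training URLs the model mostly learns surface tokens such as verify, login and account, so this shows the mechanics rather than realistic accuracy.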

The same algorithms can be tuned and trained to identify emerging threats by analyzing public data streams, making security more proactive. When coupled with clustering techniques for multidimensional feature analysis, they can augment not just malware detection but also anti-spam technologies and even the detection of vulnerabilities exploited through JavaScript or PDF files.
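
Clustering for multidimensional feature analysis can be illustrated with plain k-means (Lloyd's algorithm), which groups samples whose feature vectors lie close together so that a tight new group can be reviewed as a possible emerging campaign. The three features per message (share of links, share of uppercase text and a suspicious-keyword score) are hypothetical, and k-means is only one common clustering method, not necessarily the one any given product uses.

```python
import math
from typing import List, Tuple

Point = List[float]


def kmeans(points: List[Point], k: int, iterations: int = 10) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means. Centroids start at the first k points, a deterministic
    choice made only to keep this example reproducible."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment


# Hypothetical per-message features: [share of links, share of uppercase, keyword score]
messages = [
    [0.90, 0.80, 0.70], [0.10, 0.20, 0.10], [0.85, 0.90, 0.75],
    [0.15, 0.10, 0.20], [0.95, 0.85, 0.80], [0.20, 0.15, 0.10],
]
centroids, assignment = kmeans(messages, k=2)
print(assignment)  # [0, 1, 0, 1, 0, 1]
```

Real deployments usually pick initial centroids more carefully (for example k-means++) and choose k from the data, since a poor starting point can leave k-means in a weak local optimum.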

The Next-Gen Endpoint Security Platform

While machine learning technologies already play a vital role in today’s security industry, the next-gen endpoint security platform will have machine learning algorithms augmenting every security layer and technology designed to stop threats at pre-execution, on-execution and post-execution.

Non-signature-based threat detection powered by machine learning can help prevent sophisticated threats, targeted attacks, hacking tools and even exploits, while providing visibility into an organization’s security posture. Because these algorithms can be fine-tuned to consolidate existing security layers, a next-gen endpoint security platform that relies on machine learning to augment its security layers will be able to prevent and detect advanced, sophisticated threats before it’s too late.