This article highlights the trends in cybersecurity and the trajectory of our field.
How security practitioners can incorporate expert knowledge into machine learning algorithms that reveal security insights, safeguard data, and keep attackers out.
With the omnipresence of the term artificial intelligence (AI) and the increased popularity of deep learning, a lot of security practitioners are being lured into believing that these approaches are the silver bullet we have been waiting for to solve all of our security challenges. But deep learning, like any other machine learning (ML) approach, is just a tool. And it’s not a tool we should use on its own. We need to incorporate expert knowledge for the algorithms to reveal actual security insights.
For the rest of this post, I will stop using the term artificial intelligence and revert to the term machine learning. We don’t have AI or, to be precise, artificial general intelligence (AGI) yet, so let’s not distract ourselves with these false concepts.
Where do we stand today with AI (excuse me, machine learning) in cybersecurity? We first need to look at our goals. Broadly speaking, we are trying to use ML to identify malicious behavior or malicious entities; call them hackers, attackers, malware, unwanted behavior, etc. In other words, it comes down to finding anomalies. Beware: one of the biggest challenges in finding anomalies is defining what is “normal.” For example, can you define what is normal behavior for your laptop day in, day out? Don’t forget to think of that new application you downloaded recently. How do you differentiate that from a download triggered by an attacker? In abstract terms, only a subset of statistical anomalies contains interesting security events.
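To make the laptop example concrete, here is a minimal sketch of a purely statistical anomaly detector. Everything in it is an assumption made for illustration: the download volumes are invented, the z-score with a cutoff of 3 is just one common way to define "unusual," and the ground-truth labels are something we would not have in practice.

```python
from statistics import mean, stdev

# Hypothetical daily download volume (MB) for one laptop; the values are made up.
baseline_mb = [120, 95, 130, 110, 105, 125, 115, 100, 90, 135]

# New observations as (description, MB downloaded, is_malicious).
# The is_malicious flag is illustrative ground truth the detector never sees.
new_days = [
    ("routine browsing", 118, False),
    ("new application installed by the user", 900, False),
    ("payload pulled by an attacker", 850, True),
]

THRESHOLD = 3.0  # assumed cutoff: flag anything more than 3 standard deviations out


def z_score(value, history):
    """Return how many sample standard deviations `value` lies from the mean of `history`."""
    return (value - mean(history)) / stdev(history)


# Flag every day whose volume deviates strongly from the baseline.
flagged = [
    (description, is_malicious)
    for description, mb, is_malicious in new_days
    if abs(z_score(mb, baseline_mb)) > THRESHOLD
]

for description, is_malicious in flagged:
    print(f"anomaly: {description} (actually malicious: {is_malicious})")
```

Both the legitimate install and the attacker's download are flagged, because statistically they look alike. Telling them apart takes context that the numbers alone do not carry, which is where expert knowledge has to come in.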
Applying Machine Learning to Security
Within machine learning, we can look at two categories of approaches: supervised and unsupervised. Supervised ML is great at classifying data, for example, learning whether something is “good” or “bad.” To do so, these approaches need large collections of training data to learn what these classes of data look like. Supervised algorithms learn the properties of the training data and then apply the acquired knowledge to classify new, previously unseen data. Unsupervised ML is well suited for making large data sets easier to analyze and understand. Unfortunately, it is not that well suited to finding anomalies.
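The supervised workflow described above (learn the properties of labeled training data, then classify new data) can be sketched with a toy nearest-centroid classifier. This is only an illustration under stated assumptions: the two features (failed logins per hour, MB uploaded per hour), the six training samples, and the function names `fit_centroids` and `classify` are all hypothetical, and a real system would need far more data and properly scaled features.

```python
from math import dist

# Hypothetical labeled training data:
# ((failed_logins_per_hour, mb_uploaded_per_hour), label)
training = [
    ((1, 5), "good"), ((0, 8), "good"), ((2, 6), "good"),
    ((40, 300), "bad"), ((55, 250), "bad"), ((35, 400), "bad"),
]


def fit_centroids(samples):
    """Learn one centroid (feature-wise mean) per class from labeled samples."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(features, centroids):
    """Assign the label of the closest class centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(features, centroids[label]))


# "Training": summarize what each class looks like.
centroids = fit_centroids(training)

# "Inference": apply what was learned to observations not seen during training.
unseen = [(3, 7), (48, 320)]
predictions = [classify(x, centroids) for x in unseen]
print(predictions)
```

The classifier can only reproduce the distinction encoded in its labels, so the quality and coverage of the training data determine what it is able to recognize.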