Using AI/ML to Create Better Security Detections
Published 08/19/2022
Originally published by LogicHub here.
Written by Anthony Morris, Solution Architect, LogicHub.
The blue-team challenge
Ask anyone who has worked in or with a security operations center (SOC) and they will tell you that noisy detections (false positives) are one of the biggest challenges. Many companies have tried to solve this problem, but virtually all attempts have come up short. This article proposes a better solution using artificial intelligence (AI) and machine learning (ML), while keeping the explanation accessible.
First, to understand the challenge facing blue teams - those defenders charged with identifying and responding to attacks - consider that almost every indicator fits into one of two buckets: all detections/indicators can be categorized as either signature-based or anomaly-based.
Signature-based detections
Signature-based detections take forms like:
- Look for a running process named “mimikatz.exe”
- Look for 50 failed logins in less than 60 minutes
Signature-based detections are trivial for attackers to circumvent in most cases. Using the two examples above, an attacker might rename their malicious mimikatz.exe executable to notepad.exe to avoid detection. Similarly, if they execute only 30 failed logins per hour, they remain under the radar because the threshold of concern was 50.
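As a concrete illustration, here is a minimal sketch of those two rules in Python; the event field name process_name and the input shapes are assumptions made for this example, not taken from any specific product:

from datetime import timedelta

def detect_mimikatz(event):
    # Signature rule 1: flag a process literally named mimikatz.exe.
    return event.get("process_name", "").lower() == "mimikatz.exe"

def detect_brute_force(failed_login_times, threshold=50, window=timedelta(minutes=60)):
    # Signature rule 2: flag 50+ failed logins inside any 60-minute window.
    times = sorted(failed_login_times)
    for i, start in enumerate(times):
        if sum(1 for t in times[i:] if t - start <= window) >= threshold:
            return True
    return False

Both rules match only the literal values they were written for, which is exactly the weakness described above.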
The effectiveness of signature-based detections depends heavily on the breadth of the detections and on keeping secret exactly what is being monitored. A non-technical analogy is laying a field with tripwires and landmines: if the attacker knows the locations of your defenses, they can successfully navigate through them.
Anomaly-based detections
The second bucket is anomaly-based detections. Anomaly-based detections don’t rely on signatures but instead look for things that aren’t normal. Using the two examples above, anomaly detections might be something like:
- Look for uncommon running process names
- Look for statistically interesting volumes of failed logins
These anomaly detections are more difficult for attackers to circumvent but have challenges of their own. Specifically, just because something is anomalous doesn’t make it malicious.
For example, quarterly backups can appear statistically similar to data exfiltration. If a defender makes these anomaly detections too sensitive, they are bombarded with noise; if they set the thresholds too high, they risk missing attacks.
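To make the second anomaly example concrete, here is one simple way such a statistical check might be sketched in Python, flagging hours whose failed-login count sits far above a historical baseline; the three-standard-deviation threshold is an illustrative choice, not a recommendation:

import statistics

def is_anomalous(hourly_failed_logins, history, z_threshold=3.0):
    # Flag the current hour if it sits more than z_threshold standard
    # deviations above the historical mean.
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return hourly_failed_logins > mean
    return (hourly_failed_logins - mean) / stdev > z_threshold

# Example: a typical hour sees ~10 failed logins; 55 stands out.
baseline = [8, 12, 10, 9, 11, 10, 13, 7]
print(is_anomalous(55, baseline))  # True
print(is_anomalous(12, baseline))  # False

Note that the quarterly-backup problem remains: a legitimate spike would trip this check just as readily as exfiltration would.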
Over the years, companies have tried to solve this problem by aggregating these indicators. Examples include:
- A vendor that aggregates first-time events such as “the first time a user logged on from a foreign country,” “the first time a user set up a scheduled task,” and “the first time a user sent 1GB of data.”
- Assigning points to indicators and looking at the entities that accumulate the most points (a brief sketch follows this list).
- Mapping indicators to an industry standard (e.g., MITRE) and identifying actors that are exploiting multiple tactics/techniques.
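As an illustration of the points-based approach in the second bullet, here is a minimal sketch; the indicator names and weights are invented for this example:

# Hypothetical weights per indicator type; a real deployment would tune these.
INDICATOR_POINTS = {
    "first_foreign_login": 10,
    "new_scheduled_task": 15,
    "large_outbound_transfer": 25,
}

def score_entity(observed_indicators):
    # Sum the points for every indicator observed for a user or host.
    return sum(INDICATOR_POINTS.get(name, 0) for name in observed_indicators)

# Entities that accumulate the most points rise to the top of the analyst queue.
print(score_entity(["first_foreign_login", "large_outbound_transfer"]))  # 35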
But advances in computer technology have allowed us to develop a better way. Artificial intelligence and machine learning solutions are well within reach and less complicated than you might believe. To demonstrate this, we’ll pivot to an example that isn’t a cyber security issue.
A “Dummies” intro to machine learning
Consider the question, “Will my spouse get home from work before 6:00 PM?” Assume my spouse gets off work at 5:00 PM and the drive home takes 30 minutes. To answer this question, several sub-questions must be considered, such as “Did they leave work on time?” and “Was there traffic on the way home?” These questions are known as FEATURES.
The result of comparing features to outcomes is rather intuitive: if my spouse left on time and there was no traffic, the outcome is “Yes”; a late departure or a traffic delay pushes the outcome toward “No.”
Programmatically, this can be expressed as:
SELECT COLLECT_SET(actual_outcome) FROM TRIPS GROUP BY F1, F2
As long as the collection of outcomes for a given set of feature values contains a single outcome, we can (in theory) accurately predict that my spouse will arrive home on time (Outcome = Yes).
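The same grouping can be sketched in Python; the trip records below are invented training data for the running example:

from collections import defaultdict

# Invented training data: features (F1, F2) paired with the actual outcome.
trips = [
    ({"f1": "yes", "f2": "no"}, "yes"),  # left on time, no traffic -> home by 6:00
    ({"f1": "no", "f2": "yes"}, "no"),   # left late, hit traffic -> missed 6:00
]

outcomes_by_features = defaultdict(set)
for features, outcome in trips:
    outcomes_by_features[(features["f1"], features["f2"])].add(outcome)

def predict(f1, f2):
    outcomes = outcomes_by_features.get((f1, f2), set())
    # Predict only when history is unambiguous (a single observed outcome).
    return next(iter(outcomes)) if len(outcomes) == 1 else None

print(predict("yes", "no"))  # "yes"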
However, the problem starts to grow in complexity when the outcome doesn’t match. Consider this scenario: my spouse did NOT leave work at 5:00 PM, but traffic was good, and my spouse still made it home by 6:00 PM. In this scenario, we have the same values in Feature 1 (F1) and Feature 2 (F2), but the actual outcome is different.
Said another way, the predicted outcome and the actual outcome differ. One explanation is that the question allows one hour to make the trip, which without delays takes only 30 minutes; technically, we have 30 minutes of “cushion.”
In this case, the model would be more accurate if we express features as numeric values like “How many minutes after 5:00 PM did my spouse leave?” (F1) or “How many minutes was my spouse detained in traffic?” (F2)
In our scenario, because our spouse left only 15 minutes after 5:00 PM, there is enough cushion to predict he or she will still arrive before 6:00 PM. Consequently, our model can be improved if we replace yes/no values with numeric values. Now we get a model that works, as the sketch below illustrates.
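A minimal sketch of that numeric model, assuming the 30-minute drive and 6:00 PM deadline from the running example:

DRIVE_MINUTES = 30
BUDGET_MINUTES = 60  # from the 5:00 PM departure target to the 6:00 PM deadline

def on_time(minutes_late_leaving, minutes_in_traffic):
    # Home by 6:00 whenever the delays fit inside the 30-minute cushion.
    return minutes_late_leaving + minutes_in_traffic + DRIVE_MINUTES <= BUDGET_MINUTES

print(on_time(15, 0))   # True  - 15 minutes late still leaves cushion
print(on_time(15, 20))  # False - 35 minutes of total delay exceeds the cushion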
LESSON #1 - How you define features impacts the accuracy of the outcome.
More powerful yet, additional features can be created by combining F1 and F2. I will add a new feature (F99), called “Total Delay,” that is the sum of F1 and F2, and determine the outcome from this combined feature. This new feature (F99) allows the system to “guess” the answer for previously unseen scenarios.
Suppose that my spouse was 15 minutes late leaving (F1) and then delayed 20 minutes in traffic (F2). Even though this scenario was not previously observed, the system accurately predicts the outcome based on the similarity of F99 values, as sketched below.
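A sketch of that combined feature, with invented training rows; note that the pair (15, 20) never appears in the training data, yet its F99 value of 35 does:

# Invented training rows: (minutes_late, minutes_in_traffic, made_it_home).
training = [
    (0, 0, "yes"),
    (15, 0, "yes"),
    (0, 35, "no"),
]

# Derive F99 (total delay) and map it to the observed outcome.
outcome_by_f99 = {f1 + f2: outcome for f1, f2, outcome in training}

def predict(f1, f2):
    return outcome_by_f99.get(f1 + f2)

# (15, 20) was never observed, but F99 = 35 was (as 0 + 35).
print(predict(15, 20))  # "no"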
LESSON #2 - Features may be combined to create additional features that improve the accuracy of outcomes for unknown scenarios.
There is one more consideration when building an AI/ML model. Suppose my spouse stopped at the grocery store for 35 minutes on the way home. Even though they left on time and hit no traffic, the training data now contains a conflict: for a matching F99 value, the actual outcome and the predicted outcome differ.
This is because there is additional information we must consider that was not reflected in our original model. We need to add a third feature, “How many minutes did they stop before coming home?” (F3), and modify our F99 formula to be F1 + F2 + F3.
With the new feature added, each F99 value once again maps to a single outcome, and the model works; a brief sketch follows.
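Extending the earlier sketch with the third feature, using the 35-minute grocery stop as the once-conflicting scenario:

# Same idea with a third feature: minutes spent on stops along the way.
training = [
    (0, 0, 0, "yes"),   # on time, no traffic, no stops
    (15, 0, 0, "yes"),
    (0, 0, 35, "no"),   # the grocery-store scenario: F99 = 35
]

outcome_by_f99 = {f1 + f2 + f3: outcome for f1, f2, f3, outcome in training}

def predict(f1, f2, f3):
    return outcome_by_f99.get(f1 + f2 + f3)

print(predict(0, 0, 35))  # "no" - the conflict is resolved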
LESSON #3 - When outcomes are not accurate, the most common explanation is that a necessary additional feature was not considered in the model.
Finally, even when numbers don’t exactly match, we can still make predictions based on the closest match, a principle called “nearest neighbor.” Suppose we add two more scenarios, with total delays (F99) of 37 and 14 minutes.
Notice the nearest neighbor to 37 is 35, so we predict an outcome of “No.” In contrast, the nearest neighbor to 14 is 15, so we predict an outcome of “Yes.” In both scenarios, we were correct. When nearest-neighbor estimates are incorrect, we can enlarge the training data to get more accurate predictions.
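A minimal one-dimensional nearest-neighbor sketch over F99 values, using the 37 and 14 scenarios from the text and the same invented training rows as before:

# Training pairs of (F99 total delay, outcome) from the running example.
training = [(0, "yes"), (15, "yes"), (35, "no")]

def predict_nearest(f99):
    # Pick the outcome of the training row whose F99 value is closest.
    nearest = min(training, key=lambda row: abs(row[0] - f99))
    return nearest[1]

print(predict_nearest(37))  # "no"  - nearest neighbor is 35
print(predict_nearest(14))  # "yes" - nearest neighbor is 15

Production systems generalize this idea to many features at once (k-nearest neighbors), but the principle is the same.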
LESSON #4 - Increasing the size of the training data is another way to increase the accuracy of predictions.
Application and next steps
It is the position of this author and LogicHub that the industry could significantly advance detection quality by taking additional steps beyond initial signature- and anomaly-based detection.
Rather than simply aggregating indicators or attempting to respond directly to individual indicators, we would benefit from building a knowledge base of the features associated with those indicators. By using these features in machine learning and artificial intelligence systems, we can better predict what is actionable for the SOC.
About the Author
Anthony Morris is a Solution Architect and one of the primary developers of the award-winning LogicHub security platform. Anthony is responsible for developing and maintaining the content library for the LogicHub platform, covering tasks like automated threat hunting, incident triage, and security orchestration, automation, and response (SOAR).
Before his current role, Anthony gained broad information security experience, ranging from front-line security analyst to senior manager with as many as 14 direct reports. In his previous roles, Anthony has been accountable for incident triage, investigation, and containment of multiple global and international incidents; detection of emerging threats and of security incidents from internal and external sources; automating security operations through code; writing security policies; performing risk assessments; and ensuring regulatory compliance.
Anthony has experience working across multiple industries, including information services, finance, education, government, manufacturing, healthcare, and pharmaceuticals. He holds multiple active security certifications along with a master’s degree in security.