If you’re in security, you probably have some tool somewhere showing a lot of events. Verizon’s Data Breach Investigations Reports consistently show that the data is there, with proof and trails of threats sitting in our event stores, and yet threats are still missed. Why? How do you fix this problem?
One knee-jerk reaction is to ask for more detection rules.
Round one! (Bell dings.)
Most of us have tried to fix this problem by turning on a thousand or more rules, looking for anything in our tools that might be an indicator of compromise, and by threat hunting.
Let’s say, for example, the IP 172.26.1.60 is involved in a threat in my dataset. Here’s what it looks like to start.
Yep. Search five billion events for an IP match. There are a few million known, evil IP addresses that change every day. A recipe for success (not).
Next, you’re flooded by events that might be part of a threat. Every login might be a bad guy. Every new program might be malware.
Let me qualify what I mean by “flooded.” These days, 1,000+ events/second is a small deployment for log collection. That’s 86,400,000 events per day. Ready to “threat hunt”?
Just start typing the latest malware file hash into the search engine and find them all. Try it for yourself: free threat intel is available from Netenrich at KnowNow. For example, RedLine Stealer has recently been the top malware, and the 128-bit checksum 46206315b239aaa7ed7fc548e1580baf has identified an evil file.
As you try “threat hunting” to find all threats, you quickly start asking: “How many threats are there?” Millions. And who can type fast enough to run them all every day? I know I can’t.
Unfortunately, many people throw up their hands at this point and, instead of looking for real threats, end up just making Top 10 dashboards of the most common events.
Round two! You move to automated rules that look for every evil IP, hash, URL, or other indicator in the logs in a security information and event management (SIEM) system.
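To make the idea concrete, here is a minimal sketch of what an automated indicator rule does, assuming events arrive as simple dictionaries. The field names (`src_ip`, `file_hash`, `dest`) and the tiny indicator sets are hypothetical; a real SIEM has its own schema and a threat feed that refreshes daily.

```python
from typing import Iterable

# Hypothetical indicator sets; a real feed would hold millions of entries.
BAD_IPS = {"172.26.1.60"}
BAD_HASHES = {"46206315b239aaa7ed7fc548e1580baf"}


def match_indicators(events: Iterable[dict]) -> list[dict]:
    """Return events whose source IP or file hash is a known indicator."""
    hits = []
    for event in events:
        ip_hit = event.get("src_ip") in BAD_IPS
        # Normalize case so upper-case hex digests still match.
        hash_hit = (event.get("file_hash") or "").lower() in BAD_HASHES
        if ip_hit or hash_hit:
            hits.append(event)
    return hits


sample_events = [
    {"src_ip": "172.26.1.60", "dest": "host-a"},
    {"src_ip": "10.0.0.5", "dest": "host-b"},
    {"src_ip": "10.0.0.7", "file_hash": "46206315B239AAA7ED7FC548E1580BAF"},
]
matches = match_indicators(sample_events)
print(len(matches))
```

Set lookups keep each check cheap, which is why this scales where typing hashes by hand does not. It still only finds indicators someone has already published.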
And what do you actually look for with rules? Let’s go back to IP 172.26.1.60. Over the last year, this IP has connected to 99+ hosts in my network.
It’s not a difficult rule to write — and you can fudge a little bit so it’s not just looking for that specific IP. For example, you can have it alert on any external IP that connects to more than 100 hosts in the network. Some of these will be normal: sources that are part of Google, Microsoft, or other common networks our users access. Others may be malicious.
But because these rules fire thousands of times a day, you tune. You blind yourself by setting a pain threshold. In the above instance, if you alert only when an IP address talks to more than 100 unique destinations in your network, the volume is manageable. But what happens if you reconsider? What if the threshold is only 50? Or only 10? Stress, worry, and migraines, that’s what happens. It’s a balancing act between the noise and pain of false positives and the risk of missing threats (false negatives), and it’s often hard to avoid being overwhelmed by noise.
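The threshold trade-off is easy to see in a small sketch. The connection data below is synthetic and the helper names are made up for illustration; the point is only that the same rule logic flags more sources as the threshold drops.

```python
from collections import defaultdict


def fan_out(connections):
    """Map each source IP to the set of distinct destinations it contacted."""
    dests = defaultdict(set)
    for src, dst in connections:
        dests[src].add(dst)
    return dests


def alerts_for_threshold(connections, threshold):
    """Source IPs that contacted more than `threshold` distinct destinations."""
    per_source = fan_out(connections)
    return sorted(src for src, d in per_source.items() if len(d) > threshold)


# Synthetic data: one source touches 120 hosts, one 60, one 12.
connections = (
    [("172.26.1.60", f"host-{i}") for i in range(120)]
    + [("198.51.100.7", f"host-{i}") for i in range(60)]
    + [("203.0.113.9", f"host-{i}") for i in range(12)]
)

for threshold in (100, 50, 10):
    print(threshold, alerts_for_threshold(connections, threshold))
```

At 100 only one source alerts; at 10 all three do. On a real network, the sources that appear as you lower the threshold are a mix of legitimate services and genuine threats, which is exactly the tuning pain described above.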
Another question: Do you care if the IP was blocked? I don’t. My defenses did their job. The issue is not looking for a single IP address to block, but rather, finding every IP to block. So, you tune the rule some more … but then, the rule goes off when some string in it has a false positive on the vulnerability scanner, the EDR tool event, or something in a file somewhere. So, you tune some more.
Remember, the personnel tuning the rules are not cheap. And every minute tuning is a minute they’re not responding. Every false positive costs time and money — and distracts from the real threat.
Done well, SIEMs can reduce event noise by a factor of five nines. That’s 99.999% reduction. One million inbound events become 10 alerts that require investigation (See Optimizing Threat Detection).
Unfortunately, if you’ve ever heard about the 2013 Target breach, you know that we have tools telling us about threats, but we often don’t realize when a threat is real because we’re buried in too much noise. Attackers try their best to leave a small footprint, whispering in your ear, so to speak. Threats hide in the noise, and it’s hard to hear them over the blaring firewall and web content filter saying over and over again that users are trying to go to “prohibited” places all over the world.
And yet, zero-day threats go undetected. This can be due to “stupid user tricks” like using the same password on an already compromised home machine that’s also used for work; or downloading a favorite game with free malware to a system used to access a company network. In fact, in more than 20 years in the cybersecurity business, I can say that more than half of what I need to find to protect the network won’t trigger any of those 1,000+ rules we just enabled in the SIEM.
Round three! Congratulations. Not everyone makes it to round three. Many give up.
In round three, you use patterns (not rules!): repeat attacks, success after fail, rare events, spikes, and anomalies (See blog Why 5 patterns are better than 500 rules). You stop missing the zero-days and “stupid user tricks” and can actually see that patterns detect more. Turn the noise up to 11!
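As one illustration, the "success after fail" pattern can be sketched in a few lines. This is not how any particular product implements it; the `(user, outcome)` event shape and the default of three consecutive failures are assumptions chosen for the example.

```python
from collections import defaultdict


def success_after_fail(logins, min_failures=3):
    """Return users whose successful login follows a run of failures.

    `logins` is a time-ordered iterable of (user, outcome) pairs, where
    outcome is "fail" or "success". The event shape and the default of
    three consecutive failures are illustrative assumptions.
    """
    streak = defaultdict(int)  # consecutive failures seen per user
    flagged = []
    for user, outcome in logins:
        if outcome == "fail":
            streak[user] += 1
        else:
            if streak[user] >= min_failures and user not in flagged:
                flagged.append(user)
            streak[user] = 0  # any success resets the failure run
    return flagged


logins = [
    ("alice", "fail"), ("bob", "fail"), ("alice", "fail"),
    ("alice", "fail"), ("bob", "success"), ("alice", "success"),
    ("carol", "success"),
]
print(success_after_fail(logins))
```

Notice that nothing here names a specific password, tool, or malware family. The pattern catches the behavior, so it keeps working when the attacker's tooling changes.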
Round four! Collect more and build threat chains (See MITRE ATT&CK framework). You look for the same user or machine to show multiple indicators. Baseball time: One strike? Who cares? Three strikes, you’re out!
Breaking into a network isn’t a single event. There is no sniper rifle for hackers. Most hackers use scripts that first discover open paths and devices on the network (reconnaissance and lateral movement). Then, they have to exfiltrate the data or install some tool to maintain persistent access. It’s a messy, multi-step process. Half the time, you won’t even see the actual event that kills the target, but you will see the misses and can triangulate the direction. Just like playing battleship, you will see a few hits before they sink your ships. So, wait for confirmation.
Have I mixed enough metaphors yet?
At Netenrich, we call a set of alerts, which may or may not be malicious, a situation. When you use our Resolution Intelligence Cloud platform and see multiple indicators (e.g., rule matches or anomalies), you can say with confidence that you need to take action. You can focus on the repeated whispers.
Here, the whispers include five detection rules spanning four MITRE ATT&CK techniques from three different log sources. This is triangulation. A correlation of correlations.
Based on my experience over the last decade, I can say with some confidence that rules produce 90+% false positives. They are real events, but nothing that requires action. Moreover, rules often fail to fire at all because they’re looking for last year’s malware.
But by the time you get a second strike in a different ATT&CK category from a different log source, the false positive rate drops to less than 30%. By the time you get a third strike, you’re down to around 3%.
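The "three strikes" idea can be sketched as a grouping step over alerts. This is a simplified stand-in, not the Resolution Intelligence Cloud logic: the `Alert` fields, the thresholds, and the sample technique IDs are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    entity: str      # user or machine the alert is about
    technique: str   # ATT&CK technique ID reported by the rule
    log_source: str  # where the underlying event came from


def situations(alerts, min_strikes=3, min_sources=2):
    """Return entities with enough independent strikes to act on.

    A strike counts only if it adds a new technique, and the evidence
    must come from at least `min_sources` log sources. Both thresholds
    are illustrative assumptions.
    """
    techniques = defaultdict(set)
    sources = defaultdict(set)
    for alert in alerts:
        techniques[alert.entity].add(alert.technique)
        sources[alert.entity].add(alert.log_source)
    return sorted(
        entity for entity in techniques
        if len(techniques[entity]) >= min_strikes
        and len(sources[entity]) >= min_sources
    )


alerts = [
    Alert("host-17", "T1046", "firewall"),
    Alert("host-17", "T1110", "auth"),
    Alert("host-17", "T1041", "proxy"),
    Alert("host-42", "T1110", "auth"),
    Alert("host-42", "T1110", "auth"),  # same rule firing twice is one strike
]
print(situations(alerts))
```

Here `host-42` triggers the same rule twice from one source, so it stays quiet. `host-17` shows three different techniques across three sources and becomes a situation.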
Noise reduction improves by another order of magnitude with threat chaining versus SIEM alone: one million inbound raw events become one situation. Going back to our earlier math, 86,400,000 events per day (1,000 events/second) turn into about 86 situations a day. It’s still a lot, but it’s a manageable number.
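For anyone who wants to check the funnel, the arithmetic works out as follows. The ratios are the ones quoted in this post (10 alerts per million events for a well-run SIEM, one situation per million with threat chaining); the variable names are just for this sketch.

```python
EVENTS_PER_SECOND = 1_000
SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY

# 99.999% reduction: 10 alerts survive per million events.
siem_alerts_per_day = events_per_day * 10 // 1_000_000

# Threat chaining: one situation per million events.
situations_per_day = events_per_day // 1_000_000

print(events_per_day, siem_alerts_per_day, situations_per_day)
```

So the same day of logs yields 864 SIEM alerts to investigate, or 86 situations once the alerts are chained.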
Round five! You’re down to 86 situations that need a response. Now what? Thankfully, there is a small subset of responses that handles most of these situations.
SOAR to the rescue:
| SOAR Playbook | Steps |
| --- | --- |
| Identify Friend or Foe (IFF) | Open ticket, add threat intel, determine internal vs. external. |
| Remove malware | Isolate host, scan for malware, remove malware, update the EDR and OS, add to watchlist. |
| Disable user account | Disable primary user account. Notify IAM/provisioning. Disable additional linked accounts, as required. |
| Remove phishing email | Search for additional instances in other mailboxes. Remove all. Block sending domain. |
| Remove malware attachment from email | Search for additional instances in other mailboxes. Remove all. |
| Block inbound threat source | Add IP to block list on firewall. |
| Distribute block list (EDR) | Add filename and checksum to blocked process list. |
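One way to think about these playbooks is as plain data: a name mapped to an ordered list of steps. The sketch below only builds a plan and does not call any real SOAR product; the playbook keys, the `plan` helper, and the target names are hypothetical.

```python
# Three of the playbooks above, expressed as ordered step lists.
PLAYBOOKS = {
    "identify_friend_or_foe": [
        "open ticket",
        "add threat intel",
        "determine internal vs. external",
    ],
    "remove_malware": [
        "isolate host",
        "scan for malware",
        "remove malware",
        "update EDR and OS",
        "add to watchlist",
    ],
    "block_inbound_threat_source": [
        "add IP to firewall block list",
    ],
}


def plan(playbook, target):
    """Return the ordered actions a playbook would take against a target."""
    if playbook not in PLAYBOOKS:
        raise KeyError(f"unknown playbook: {playbook}")
    return [f"{step}: {target}" for step in PLAYBOOKS[playbook]]


for action in plan("remove_malware", "host-17"):
    print(action)
```

Keeping the steps declarative makes it easy to review what automation will do before it touches a firewall or an endpoint.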
Yes, some subset of those 86 will still need investigation. However, with a 3% false-positive rate, blocking the discovered IP address, malicious file, or infected machine will take care of more than 80% of those, leaving about 17 situations/day.
Generally, it’s possible to triage an event in 15 minutes — so, 17 is a manageable number. While one or two may need a deep dive that takes several hours, those are the interesting cases rather than the noise systems should learn to ignore. What’s more, this refinement now allows SOC teams to focus on what matters most to the business (See UEBA as a Use Case).
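A quick back-of-the-envelope check of the analyst workload, using the figures from this post (86 situations, more than 80% handled by playbooks, 15 minutes per triage). The 80% share is treated as exact here, which is an assumption for the sake of the arithmetic.

```python
situations_per_day = 86
automated_share = 0.80   # handled by playbooks, per the estimate above
triage_minutes = 15

remaining = round(situations_per_day * (1 - automated_share))
analyst_hours = remaining * triage_minutes / 60

print(remaining, analyst_hours)
```

That is 17 situations and a little over four hours of triage a day, before any deep dives.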
Today, it’s the best way to find threats in logs in an automated manner without a huge team and without missing nasty threats.