May 7, 2024

Cyber Security Hubbl3

Survivorship Bias and How Red Teams Can Handle It

If you have spent any time on our Discord, you have almost certainly seen some discussions about the prevalence of PowerShell and how it’s still used in most modern attacks. After all, we are still big fans of it in threat emulation and still publish research on it. Inevitably though, someone will break out this image.

If you aren’t familiar with it, it shows the damage distribution on bombers that returned home during World War II. When discussing where armor should be added to the aircraft, the initial inclination is to add it to the most frequently damaged areas. However, this is the distribution of damage on planes that still made it home. The aircraft that were hit in the areas showing almost no damage did not return. That tendency to focus only on the group that made it through some selection process, while overlooking those that didn’t, is known as survivorship bias.

How does this relate to the fact that we constantly see PowerShell reported in attacks? Well, reporting, by nature, covers only the threat actors that have been caught. What about all the ones that didn’t get caught? There is no way to examine them, so we simply don’t know what they did. We do know, however, that they were doing something different from those that did get caught. Therein lies the problem for threat emulation.
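To make the bias concrete, here is a minimal Python sketch. Every number in it is invented purely for illustration: the technique names, the `true_share` values, and the `detection_rate` values are assumptions, not measurements from any report. It only shows how a technique that is easy to detect ends up over-represented among reported intrusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Technique:
    name: str
    true_share: float      # assumed fraction of ALL intrusions using it
    detection_rate: float  # assumed chance such an intrusion is caught and reported


def reported_share(techniques):
    """Share of each technique among reported (caught) intrusions only."""
    caught = {t.name: t.true_share * t.detection_rate for t in techniques}
    total = sum(caught.values())
    return {name: value / total for name, value in caught.items()}


# Hypothetical population of intrusions; values are made up.
TECHNIQUES = [
    Technique("PowerShell loader", true_share=0.40, detection_rate=0.80),
    Technique("C# in-memory tooling", true_share=0.35, detection_rate=0.40),
    Technique("Unknown / novel TTP", true_share=0.25, detection_rate=0.05),
]

if __name__ == "__main__":
    shares = reported_share(TECHNIQUES)
    for t in TECHNIQUES:
        print(f"{t.name:<22} true {t.true_share:.0%}  reported {shares[t.name]:.0%}")
```

With these made-up inputs, PowerShell accounts for 40% of intrusions but roughly 68% of what gets reported, while the novel technique falls from 25% of intrusions to under 3% of reports. The reports are accurate about what was caught; they just say very little about what was not.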

We are going to crack open Pandora’s box for a second here. This fact is one of the main points that advocates of open-source tooling focus on. Without people bringing unknown TTPs to light, adversaries would be free to continue using them without detection. Exposing them allows defenders to identify indicators of compromise (IOCs) that can be used to better monitor for them. Back to the discussion, though: if we know that reporting is biased toward only those adversaries we currently know how to detect, how do we ensure that we represent valid threats to our customers? We should consider three things when answering that question.

First, a very large class of threat actors doesn’t care about being caught and doesn’t change TTPs simply because their tooling was burned once. Ransomware, for example, changes more on development timelines than in response to detection. We see continual use of ransomware strains even when more mature organizations routinely catch them. These actors are targeting low-hanging fruit, and getting caught doesn’t hurt their ability to go after unprepared organizations. So, for many customers, especially those in less mature environments, simply replicating those reported threats is enough, which brings us to the second point.

Every organization has a different configuration and level of maturity. Just because Unit 42 reported on detecting a threat doesn’t mean that security products or organizations industry-wide are detecting it. This is what I, very unoriginally, refer to as the intelligence gap. When new threats are reported, they almost always come from a single source, and other reporting lags behind this initial detection. In a few months or a year, many products will likely have integrated and rolled out updated signatures and detection methods, but this leaves a significant time gap in which we, as red teams, can be more agile than the defense industry. We can start integrating newly identified TTPs to give organizations a better picture of how much risk they are accepting while waiting for their defense products to be updated. It may also turn out that some products already use IOCs that detect the newly identified TTPs without needing any changes.

This is where I think it’s important to stay up to date on open-source reporting, because we can not only integrate those new TTPs but also see what trends threat actors are moving toward, such as moving from C# to Rust or redeploying some older PowerShell loaders. Those trends allow us to give customers the most representative threat to train and test against, which brings me to the third and final point.

What does emulation actually mean? This might seem like a silly question, especially if you have heard my soapbox speech about the lack of rigor applied to emulation vs. simulation. For those who don’t know, emulation reproduces the exact behavior of something, while simulation only reproduces the outcome of an event. That is, under the strictest definitions, an emulation would recreate the exact steps of a known threat, including running the same commands that they did, while a simulation would use whatever means necessary to produce the same outcome: information exfiltration, domain takeover, etc. MITRE follows this stricter definition of emulation. Cat Self and Kate Espir gave an excellent talk at last year’s Black Hat USA on how they identify threat reports with sufficient technical data to build an emulation and then replicate them, and you can find their slides here.
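The strict distinction can be sketched in a few lines of Python. This is only an illustration of the definitions above, not how MITRE or any framework models it: the `Operation` type, the `classify` function, and the example steps are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    steps: tuple[str, ...]  # ordered procedures or commands that were run
    outcome: str            # end result, e.g. "data exfiltrated"


def classify(reference: Operation, exercise: Operation) -> str:
    """Label an exercise relative to a reported reference operation."""
    if exercise.steps == reference.steps:
        return "emulation"   # same steps, in the same order
    if exercise.outcome == reference.outcome:
        return "simulation"  # different path, same end result
    return "neither"


# Hypothetical reported intrusion and two hypothetical exercises.
reported = Operation(
    steps=("phish user", "run PowerShell loader", "dump credentials",
           "exfiltrate over HTTPS"),
    outcome="data exfiltrated",
)
replay = Operation(steps=reported.steps, outcome="data exfiltrated")
alternate = Operation(
    steps=("exploit VPN appliance", "run C# implant", "exfiltrate over DNS"),
    outcome="data exfiltrated",
)

if __name__ == "__main__":
    print(classify(reported, replay))
    print(classify(reported, alternate))
```

Under this strict reading, the replay counts as an emulation because it repeats the reported steps exactly, while the alternate path is a simulation because only the outcome matches.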

There is a lot of value in constructing emulations this way, especially for creating objective, data-based evaluations of software products. It allows us to compare the performance of various tools against identical operations and determine which ones perform best in which scenarios. However, it is missing one important factor for red team emulation: modern-day compromises involve hands-on-keyboard humans who are themselves dynamic.

Whenever these conversations arise, I am reminded of a quote I heard while working a Red Flag at Nellis. I don’t remember the exact quote or who said it, but it was something like, “Adversaries are dynamic and those that train against static threats are destined to fail.” Always using the exact same playbook trains people to counter only very specific actions and scenarios. You sometimes see this between internal red and blue teams. Because the blue team constantly faces the same group with the same tendencies and tools, its detection methods become tailored to stopping the internal red team instead of general threats. So, when red teams conduct threat emulation, they should be dynamic, mixing in TTPs that still align with the threat level they represent without being constrained to only those TTPs that have been explicitly reported for a threat. In other words, they should not be constrained by the survivorship bias of reporting. Reporting is useful for identifying trends and the types of TTPs and languages being used in the wild, but red teams should not be restricted to only those tools that have already been burned.

Now, this doesn’t mean that if a red team is emulating a threat known for its .NET tooling, it’s okay for them to run entirely in C tooling. That goes to the other end of the spectrum, where teams run the shiniest, newest tools that few customers have the ability to detect and become disconnected from the trends being reported in the wild. But that’s a discussion for another blog.
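One way to picture that boundary is as a filter over candidate TTPs. The Python sketch below is an assumption-heavy illustration: the `TTP` and `ThreatProfile` types, the numeric `tier` scale, and every example entry are hypothetical, and a real team would draw on far richer criteria. The point is only that the filter checks whether a TTP fits the threat, not whether it has already been reported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTP:
    name: str
    language: str   # tooling language, e.g. ".NET", "PowerShell", "C"
    tier: int       # rough sophistication, 1 (commodity) to 3 (advanced)
    reported: bool  # True if public reporting ties it to this threat


@dataclass(frozen=True)
class ThreatProfile:
    languages: frozenset  # languages the threat is known to favor
    max_tier: int         # highest sophistication it plausibly fields


def select_ttps(profile, candidates):
    """Keep TTPs the threat could plausibly use, whether reported or not."""
    return [
        t for t in candidates
        if t.language in profile.languages and t.tier <= profile.max_tier
    ]


# Hypothetical threat known for .NET and PowerShell tooling.
PROFILE = ThreatProfile(languages=frozenset({".NET", "PowerShell"}), max_tier=2)

CANDIDATES = [
    TTP("reported .NET loader", ".NET", tier=2, reported=True),
    TTP("unreported PowerShell recon script", "PowerShell", tier=1, reported=False),
    TTP("custom C implant", "C", tier=3, reported=False),
]

if __name__ == "__main__":
    for ttp in select_ttps(PROFILE, CANDIDATES):
        print(ttp.name, "(reported)" if ttp.reported else "(not reported)")
```

Here the unreported PowerShell script stays in the plan because it fits the profile, while the custom C implant is dropped even though few customers could detect it.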

Let me close with this: I believe red teaming is much like an iceberg. Just as an iceberg shows only a fraction of its true size above the surface, relying solely on reported threats exposes only a small portion of the potential risks. Embracing the dynamic nature of adversaries, red teams must delve beneath the surface to uncover the unseen threats that lurk below. However, the iceberg is still one connected piece. While we don’t see everything that is going on, we need to bound what we are doing to those TTPs that fall within the capability and probable use of the threat we are emulating. As part of that testing, known TTPs are just as important as the unknown.


Empire Ops II will be offered as a live virtual class on June 5. If you bundle it with Empire Ops I, you will get Ops I at a 20% discount! Sign up now

Written by: Hubbl3
