MITRE ATT&CK® Evaluations 2022 – Why Actionable Detections Matter

On March 31st, the results of the latest round of the MITRE ATT&CK® Evaluations for security solutions were released. This year, 30 security solutions from leading cybersecurity companies, including Bitdefender, were tested on their ability to detect the tactics and techniques of Wizard Spider and Sandworm Team.

This fourth round of the MITRE evaluations focused on the Data Encrypted for Impact technique (T1486). Adversaries may encrypt data on target systems or on large numbers of systems in a network to interrupt the availability of system and network resources. Wizard Spider, known for the Ryuk (S0446) and Conti (S0575) malware, was selected to represent the ransomware industry. The Sandworm Team, known for the NotPetya malware (S0368), represents a more sinister category: wiper malware, designed to cause irreversible destruction. Both are very timely selections. Conti ransomware is under detailed scrutiny by security researchers after a recent leak, while wipers like NotPetya are being deployed in Ukraine amidst the ongoing war.

What makes MITRE ATT&CK® Evaluations unique and valuable?

In a market filled with over-hyped claims, validating capabilities through independent third-party testing is critical. AV-Comparatives and AV-TEST are among the best-known organizations that evaluate security solutions, and the MITRE Engenuity ATT&CK® Evaluations have been gaining in popularity among security vendors and practitioners.

The ATT&CK® Evaluations are unique in many ways. Instead of testing a solution’s ability to block cyber threats, MITRE emulates the full behavior of sophisticated threat actors as if they had already gotten past the prevention layers. To this end, the blocking behavior, or preventative capabilities, of the tested security solution is disabled so that the evaluation can focus on detection, telemetry, and analytics capabilities. The ATT&CK® knowledge base framework is used to provide a common vocabulary and alignment for all evaluated vendors.

Making sense of the ATT&CK® Evaluation results can be challenging because MITRE Engenuity does not publish a comparative analysis and instead leaves the assessment to individuals. There are no scores, rankings, or ratings. Instead, the evaluations show how each vendor approaches threat detection in the context of the ATT&CK® knowledge base. The published results are extensive, and without competitive rankings they are open to multiple interpretations, which makes them difficult to navigate. Forrester analysts nicely summarized the challenges of ATT&CK evaluations marketing in the blog post “Winning” MITRE ATT&CK, Losing Sight Of Customers.

The ultimate competitors of all ATT&CK evaluation participants are threat actors. The evaluations help us and other security vendors learn from these exercises and improve our security tools. At Bitdefender, we are very proud that over one-third of the participating vendors license one or more technologies from us, validating the value of our technology and expertise.

How to evaluate detection quality?

The lack of competitive rankings is an unusual characteristic of the ATT&CK® evaluations, but it shouldn’t discourage the cybersecurity community from spending the time to understand the results. The results provide an excellent source for understanding the behavior of tested solutions and are complementary to other independent third-party reports.

The ATT&CK® evaluation scenarios this year contained 109 sub-steps, covering a wide range of ATT&CK® tactics and techniques. One of the easiest ways to visualize the tactics and techniques included in the current round of ATT&CK® Evaluations is to use ATT&CK® Navigator – a web-based tool from MITRE for visualizing the ATT&CK® matrix.

ATT&CK® Navigator with an applied layer for the current round of ATT&CK® evaluations. Columns represent tactics, the cells within each column represent techniques, and different colors identify techniques used by one (or both) malicious groups. Source: MITRE

For each of the sub-steps, the highest achieved detection category is listed. The detection category tells you if a security solution has the capability to see the behavior (Telemetry), has analytics capability to tell you what the attacker was trying to achieve (Tactic), or provides details for how the action was performed (Technique).

Figure illustrating the detection categories. Source: MITRE

MITRE provides some resources to help with the interpretation of results, or you can join our upcoming webinar, 2022 MITRE Engenuity ATT&CK® Evaluations: Decoding the results, which will focus on interpreting and understanding the 2022 ATT&CK® Evaluations results.

Bitdefender results

Vendors receive one of the following “grades” for each of MITRE’s sub-steps.

None - No detection of the sub-step or below the minimum required details
Telemetry - Information related to the sub-step was collected, but not interpreted (only raw data)
General - Sub-step is suspicious, but no more details are available. Anti-malware detections (without additional context) fall under this category
Tactic - Sub-step was identified as malicious. The vendor knows WHAT happened, but not HOW it happened. For example, a lateral movement from A to B is detected, but it’s not clear how threat actors moved between different segments.
Technique - The vendor understands WHAT and HOW it happened. For example, a lateral movement from A to B is detected, using PsExec with credentials that were stolen from another machine.

Only General, Tactic, and Technique grades count towards Analytics coverage. High analytics coverage identifies vendors that not only collect the right data (visibility) but also apply advanced analytics to accurately correlate events collected from different sensors. In our own evaluations, we define "high" as covering more than 90% of the sub-steps. Only seven vendors met this criterion for analytics coverage this year.
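
As a rough illustration of how analytics coverage follows from per-sub-step grades, the Python sketch below counts the share of sub-steps graded General, Tactic, or Technique and compares it with the 90% bar described above. The function name, the constant names, and the sample grades are hypothetical and are not taken from MITRE's published results.

```python
from collections import Counter

# Grades that count towards analytics coverage, per the grade list above.
ANALYTIC_GRADES = {"General", "Tactic", "Technique"}
HIGH_COVERAGE_THRESHOLD = 0.90  # the "more than 90%" bar described in the text


def analytics_coverage(grades):
    """Return the fraction of sub-steps graded General, Tactic, or Technique."""
    if not grades:
        raise ValueError("at least one sub-step grade is required")
    counts = Counter(grades)
    analytic = sum(counts[grade] for grade in ANALYTIC_GRADES)
    return analytic / len(grades)


# Hypothetical grades for a 10-sub-step scenario (illustrative only).
sample_grades = [
    "Technique", "Technique", "Tactic", "General", "Telemetry",
    "Technique", "None", "Technique", "Tactic", "Technique",
]

coverage = analytics_coverage(sample_grades)
print(f"Analytics coverage: {coverage:.0%}")
if coverage > HIGH_COVERAGE_THRESHOLD:
    print("High analytics coverage")
else:
    print("Below the 90% bar")
```

In this made-up sample, the Telemetry and None sub-steps do not count, so the coverage is determined only by the eight sub-steps that carry analytic context.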

The fourth round of ATT&CK® Evaluations has confirmed Bitdefender as a leader in providing highly actionable detections, enabling efficient security operations, and reducing alert fatigue.

Bitdefender detected 97% of all major attack steps on Windows machines and 100% of all adversary techniques used against Linux systems.

The Bitdefender GravityZone platform provided analytics insights for 106 of 109 sub-steps (97%) and technique-level descriptions (the highest possible level of analytics coverage) for 103 of 109 sub-steps (95%).

To provide a comparison for these results, the average analytics coverage for other vendors was 71% and the average coverage for technique-level descriptions was only 65%.

UPDATE 4/4/2022:

Some of our customers and partners have inquired about a "MITRE 2022 Results: Overall Detection & Protection" graph that is circulating on the internet.

Every year, some vendors use misleading marketing practices when interpreting the results of MITRE and other evaluations. The main value of the ATT&CK evaluations is in providing insights into the detection and analytics capabilities of different security solutions. The protection evaluation is a recent addition to the ATT&CK evaluations, and participation is optional for vendors.

Unfortunately, many vendors decided to focus their marketing on these secondary test results this year and misleadingly included other vendors who chose not to participate in this test, averaging those vendors’ “scores” in as zero. In fact, a widely distributed article on Help Net Security has been removed because the publisher could not validate the data as accurate.
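
To see why counting a non-participant as a zero distorts a comparison, consider this short Python sketch. The vendor names and scores are invented for illustration, and `None` stands in for "Not Applicable".

```python
from statistics import mean

# Hypothetical protection scores; None means the vendor did not take part (N/A).
protection_scores = {
    "Vendor A": 0.90,
    "Vendor B": 0.80,
    "Vendor C": None,  # opted out of the optional protection test
}


def average_excluding_na(scores):
    """Average only over vendors that actually took part in the test."""
    participating = [s for s in scores.values() if s is not None]
    return mean(participating)


def average_with_na_as_zero(scores):
    """Misleading variant: a missing result is counted as a score of zero."""
    return mean([0.0 if s is None else s for s in scores.values()])


print(f"Excluding N/A:   {average_excluding_na(protection_scores):.2f}")
print(f"N/A as zero:     {average_with_na_as_zero(protection_scores):.2f}")
```

The second figure is lower only because a vendor with no result at all was treated as if it had failed every test.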

1. Participants pay to be included in ATT&CK evaluations, and the "Protection" evaluation requires additional payment from vendors. Many vendors, including Bitdefender, chose to opt out of this protection testing, as other independent tests are dedicated to prevention. We concluded that the additional cost and effort would not provide significant research value for our Bitdefender Labs team, as we have engaged in many such tests in recent months and years with AV-Comparatives and AV-TEST. The Bitdefender score for protection testing in the MITRE ATT&CK test is not 0; it is “Not Applicable (N/A)”. To compare our solution with other vendors, we suggest looking at independent comparisons at sites such as AV-Comparatives or AV-TEST.

2. Be aware of vendors that rephrase the original MITRE terminology. What is being referred to as "Detection Rate" or "Overall Detection" is a metric that MITRE calls "Visibility". Visibility combines low-context detections (telemetry, that is, raw data) with high-context detections (understanding what happened and how it happened). High visibility means that a vendor collects the right data, but it says nothing about analytics capabilities or the ability to reduce alert fatigue, and low-context detections are prone to a higher number of false positives. Consider it a warning sign when a vendor replaces the original metric names with euphemisms (for example, renaming "telemetry" to "presenting evidence").
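
The gap between visibility and analytics coverage can be sketched in a few lines of Python. The sketch assumes, following the description above, that visibility counts every sub-step with any detection (including raw telemetry), while analytics coverage counts only General, Tactic, and Technique. The two vendors and their per-grade counts are hypothetical.

```python
ANALYTIC = {"General", "Tactic", "Technique"}
VISIBLE = ANALYTIC | {"Telemetry"}  # visibility also counts raw telemetry


def coverage(grade_counts, categories):
    """Share of sub-steps whose grade falls into the given categories."""
    total = sum(grade_counts.values())
    hits = sum(n for grade, n in grade_counts.items() if grade in categories)
    return hits / total


# Hypothetical per-grade sub-step counts for two vendors (100 sub-steps each).
vendor_x = {"Technique": 90, "Tactic": 5, "Telemetry": 3, "None": 2}
vendor_y = {"Technique": 30, "Tactic": 10, "Telemetry": 58, "None": 2}

for name, counts in (("Vendor X", vendor_x), ("Vendor Y", vendor_y)):
    print(
        f"{name}: visibility {coverage(counts, VISIBLE):.0%}, "
        f"analytics {coverage(counts, ANALYTIC):.0%}"
    )
```

Both invented vendors reach the same visibility figure, yet one of them leaves most sub-steps as uninterpreted raw data. A chart labeled "Overall Detection" would not show that difference.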

How can CISOs and security teams interpret the results?

When evaluating this data, we recommend starting by understanding your needs and identifying the techniques that are most relevant to your organization in the current threat landscape. This exercise can help identify gaps in your currently deployed security controls and key metrics to monitor for your specific needs. To effectively implement the ATT&CK® framework, we would also recommend adapting it for your specific business priorities. Gartner shares some of the common inquiries they receive and ATT&CK framework leading practices in this blog post, MITRE ATT&CK - does it do; what you need it to?, which you may find helpful.

The fundamental value of endpoint detection and response (EDR) and extended detection and response (XDR) solutions is minimizing the dwell time of threat actors. ATT&CK® Evaluations are most valuable in showing whether vendors collect the right data (telemetry) and have the analytics capabilities required to add context to those detections by identifying tactics and techniques.

The ATT&CK® evaluations are a valuable tool when considering security solutions, but they are not a substitute for a proof-of-concept exercise. The ATT&CK test environment is minimally sized, and the evaluation does not consider factors like cost, how much noise is generated (alert fatigue), or challenges related to the operationalization of security solutions. We would also recommend that the MITRE results be interpreted alongside other independent third-party tests focused on threat prevention, such as AV-Comparatives and AV-TEST.

When evaluating the results, we recommend drawing your own conclusions and not solely relying on interpretations by security vendors (even ours!). One of the changes introduced this year is that each sub-step has only a single detection category that represents the highest level of context provided to the analysts across all detections for that sub-step. This is an important change, as scoring high in some categories is not necessarily a positive sign. For example, if a vendor has a high rate of telemetry coverage, that can mean their analytics were not able to enrich the basic telemetry data, and analytics coverage is limited.
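
The "single highest category per sub-step" rule can be illustrated with a small Python sketch. The ordering follows the grade list given earlier in this article (None lowest, Technique highest); the sub-step identifiers and the detections attached to them are hypothetical.

```python
# Grades ordered from least to most context, as listed earlier in the article.
GRADE_ORDER = ["None", "Telemetry", "General", "Tactic", "Technique"]
RANK = {grade: position for position, grade in enumerate(GRADE_ORDER)}


def highest_category(detections):
    """Return the grade with the most context among a sub-step's detections."""
    # An empty list means nothing was detected, which maps to the "None" grade.
    return max(detections, key=RANK.__getitem__, default="None")


# Hypothetical detections recorded for three sub-steps.
detections_by_substep = {
    "1.A.1": ["Telemetry", "Tactic", "Technique"],
    "1.A.2": ["Telemetry"],
    "1.A.3": [],
}

reported = {
    step: highest_category(found)
    for step, found in detections_by_substep.items()
}
print(reported)
```

Under this rule, a sub-step reported as Telemetry means that no detection for that step carried more context than raw data, which is why a high telemetry count is not necessarily good news.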

To learn more about the key metrics included in the 2022 MITRE Engenuity ATT&CK® Evaluations report, join our Live Webinar on April 6th, 2022. Dragos Gavrilut, one of the main participants in the ATT&CK® Evaluations, will share his insights on the methodology, key metrics, and how to use the results to improve your cyber resilience.


ATT&CK® evaluations don’t cover operational aspects of security solutions. Automated correlation of events and a consolidated user experience are critical to effectively reduce alert fatigue.