As security practitioners, we find it harder than ever to keep up with threat activity (if you’d like to commiserate with me, check out our previous blog post detailing Security Operations Center (SOC) challenges). In an ideal world, your team would have time to investigate every alert, check the applicability of every vulnerability, and exercise and test your incident response playbooks on a regular basis. In reality, it’s becoming nearly impossible for teams to do even one of these things extremely well.
More often than not, organizations tune their alerts to fit the size of their security team and accept the risk of disabling or ignoring some subset of detection signals. Some level of tuning unquestionably needs to happen, but how do you make those decisions while minimizing “over-tuning”?
In this blog post, we will discuss some of the alert tuning best practices SOC teams should implement. These recommendations are inspired in part by the 2011 film Moneyball, which tells the true story of how the Oakland Athletics general manager, Billy Beane (played by Brad Pitt), was forced to innovate with roster selection amid resource constraints on bringing in new players. The parallels are clear for global SOC teams today, so we’re covering a simple, repeatable strategy you can employ to make systematic improvements to your detection logic without incurring unnecessary risk.
At a high level:
The first step toward better alert tuning decisions is to understand what’s consuming your team’s time and mental capacity, and to weigh that against detection accuracy. The goal is to minimize the amount of time spent on detections that aren’t providing direct value to the team.
Collect metadata on the alerts that have been dispositioned over the last 90 days. This may be stored within a SIEM, a case management solution, or individual security tools.
Specifically:
Understand that these six fields are just starting dimensions that you might expand upon, and it might not be easy to collect all of these metrics. As an example, gathering “Total Time Investigated” might be challenging for small teams that don’t retain a record of when triage started on an alert. In that case, you might use the time elapsed between “Alert Created” and “Alert Closed”, which most teams should be able to derive, to achieve a similar effect, with the caveat that your data will be skewed by Dwell Time.
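To make the fallback concrete, here’s a minimal sketch of how the per-rule numbers could be derived when all you have is created and closed timestamps. The record layout (`rule`, `created`, `closed`, `true_positive`) is a hypothetical export format, not something any particular SIEM produces, and I’m assuming “Efficacy” means the share of a rule’s alerts that were dispositioned as true positives.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical export: one record per dispositioned alert.
alerts = [
    {"rule": "Impossible travel", "created": "2024-05-01T10:00:00",
     "closed": "2024-05-01T10:45:00", "true_positive": False},
    {"rule": "Impossible travel", "created": "2024-05-02T09:00:00",
     "closed": "2024-05-02T09:30:00", "true_positive": False},
    {"rule": "Impossible travel", "created": "2024-05-03T14:00:00",
     "closed": "2024-05-03T15:00:00", "true_positive": True},
    {"rule": "Malware detected", "created": "2024-05-04T08:00:00",
     "closed": "2024-05-04T08:10:00", "true_positive": True},
]


def minutes_open(alert):
    """Created-to-closed time in minutes (includes dwell time, so it overstates effort)."""
    created = datetime.fromisoformat(alert["created"])
    closed = datetime.fromisoformat(alert["closed"])
    return (closed - created).total_seconds() / 60


def summarize(alerts):
    """Group alerts by rule and derive count, efficacy, and time metrics."""
    by_rule = defaultdict(list)
    for alert in alerts:
        by_rule[alert["rule"]].append(alert)

    summary = {}
    for rule, rows in by_rule.items():
        minutes = [minutes_open(r) for r in rows]
        summary[rule] = {
            "alert_count": len(rows),
            # Assumed definition: true positives divided by total alerts.
            "efficacy": sum(r["true_positive"] for r in rows) / len(rows),
            "total_minutes": sum(minutes),
            "median_minutes": median(minutes),
        }
    return summary


summary = summarize(alerts)
```

If your team does record when triage started, swapping that timestamp in for `created` removes the dwell-time skew.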
Patterns become more apparent in this dataset when plotting the records on a chart. For this framework, I recommend using “Efficacy” as the Y-axis, “Total Time Investigated” as the X-axis, and scaling the data point size based on “Alert Count”.
If you don’t have your own data visualization tool, such as a SIEM, we’ve made a simple Python script that produces a chart like the one below.
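If you’d rather roll your own, here’s a minimal sketch of a bubble chart along those lines. To be clear, this is not the script mentioned above. It assumes matplotlib is installed, and the `summary` dictionary and its field names are hypothetical sample data.

```python
# Hypothetical per-rule metrics; replace with your own numbers.
summary = {
    "Impossible travel": {"alert_count": 120, "efficacy": 0.05, "total_minutes": 3600},
    "Malware detected": {"alert_count": 15, "efficacy": 0.80, "total_minutes": 450},
}


def bubble_points(summary, scale=5):
    """Turn per-rule metrics into parallel lists for a scatter plot."""
    rules = sorted(summary)
    xs = [summary[r]["total_minutes"] for r in rules]       # X: Total Time Investigated
    ys = [summary[r]["efficacy"] for r in rules]            # Y: Efficacy
    sizes = [summary[r]["alert_count"] * scale for r in rules]  # bubble area: Alert Count
    return rules, xs, ys, sizes


def plot(summary, path="alert_tuning.png"):
    """Save the bubble chart to a PNG file."""
    # Imported here so the data preparation above works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    rules, xs, ys, sizes = bubble_points(summary)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys, s=sizes, alpha=0.5)
    for rule, x, y in zip(rules, xs, ys):
        ax.annotate(rule, (x, y))
    ax.set_xlabel("Total Time Investigated (minutes)")
    ax.set_ylabel("Efficacy")
    fig.savefig(path)
```

Calling `plot(summary)` writes `alert_tuning.png` to the current directory. The `scale` factor is arbitrary; adjust it until the bubbles are readable for your alert volumes.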
After reviewing the visualization, you’ll notice a few patterns start to emerge:
For tuning purposes, prioritize the top few alerts in the bottom right quadrant – especially those with high median investigation times – and proceed to the next step.
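Here’s a rough sketch of that prioritization step. The quadrant cutoffs (`efficacy_cutoff`, `time_cutoff`) are assumptions you’d set from your own chart, and the rule names and metrics are made-up sample data.

```python
# Hypothetical per-rule metrics.
summary = {
    "Rule A": {"efficacy": 0.10, "total_minutes": 1200, "median_minutes": 40},
    "Rule B": {"efficacy": 0.20, "total_minutes": 900, "median_minutes": 55},
    "Rule C": {"efficacy": 0.90, "total_minutes": 2000, "median_minutes": 30},
    "Rule D": {"efficacy": 0.05, "total_minutes": 100, "median_minutes": 2},
}


def tuning_candidates(summary, efficacy_cutoff=0.5, time_cutoff=600, top_n=3):
    """Return bottom-right-quadrant rules, highest median investigation time first."""
    bottom_right = [
        (rule, metrics)
        for rule, metrics in summary.items()
        # Bottom right: low efficacy (Y) and high total time investigated (X).
        if metrics["efficacy"] < efficacy_cutoff
        and metrics["total_minutes"] > time_cutoff
    ]
    bottom_right.sort(key=lambda item: item[1]["median_minutes"], reverse=True)
    return [rule for rule, _ in bottom_right[:top_n]]


candidates = tuning_candidates(summary)
```

With this sample data, Rule C is excluded for high efficacy and Rule D for low total time, leaving Rule B ahead of Rule A because of its higher median.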
Visualizing your alert management highlights the sore spots that impact daily analysis, but it won’t clearly articulate what action should be taken next.
Next, investigate the individual alerts that make up a data point to deepen your understanding of the issue. Each alert may have its nuances, but ask these questions consistently:
If the answer is “no” for all of these, disable the alert. If the answer was “yes” for either #2 or #3, I would opt to tune the alert logic and check whether the change was effective during the next tuning review. Ideally for #2, the goal would be to tune the alert so that it shifts to the lower left quadrant for automation.
Record your decisions in a standing team document to codify them. When possible, it’s best practice to shift disabled alerts to an “Informational”-type severity for audit purposes.
When auditing tuning, you’ll want to look at the total alert count for a rule over a six-month period. A significant drop in alert volume (down to single digits) might indicate vendor logic changes and warrant revisiting the rule’s production use.
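If it helps to apply the rule the same way every time, the decision can be written down as a small function. This is only a sketch: `q1`, `q2`, and `q3` are placeholders for your yes/no answers to the three questions, and the `"review"` outcome for a lone “yes” on #1 is my assumption, since that case isn’t covered by the rule above.

```python
def tuning_decision(q1, q2, q3):
    """Map yes/no answers (True/False) for questions #1 to #3 to an action."""
    if not (q1 or q2 or q3):
        return "disable"  # "no" across the board
    if q2 or q3:
        return "tune"     # "yes" to #2 or #3: adjust logic, re-check next review
    return "review"       # assumption: only #1 was "yes"; decide case by case


decision = tuning_decision(q1=False, q2=True, q3=False)
```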
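A minimal sketch of that audit check might look like the following. The `(rule, date)` log format, the 182-day window, and the threshold of 10 are all assumptions chosen to approximate “six months” and “single digits”.

```python
from collections import Counter
from datetime import date, timedelta


def low_volume_rules(alert_log, rules, today, window_days=182, threshold=10):
    """Return rules whose alert count within the window is below the threshold."""
    start = today - timedelta(days=window_days)
    counts = Counter(rule for rule, day in alert_log if start <= day <= today)
    # Counter returns 0 for rules that never fired, so silent rules are flagged too.
    return {rule: counts[rule] for rule in rules if counts[rule] < threshold}


# Hypothetical log of (rule name, alert date) pairs.
alert_log = (
    [("Rule A", date(2024, 6, 1))] * 25
    + [("Rule B", date(2024, 5, 1))] * 3
    + [("Rule B", date(2023, 1, 1))] * 40  # outside the window, ignored
)
flagged = low_volume_rules(alert_log, ["Rule A", "Rule B", "Rule C"], today=date(2024, 7, 1))
```

Passing the full list of enabled rules, rather than only the rules present in the log, is deliberate: a rule that has gone completely quiet is exactly the kind you’d want to revisit.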
With an understanding of your current alert management position, you can set requirements for what detections must look like in the SOC in order to be reviewed by the team. This transforms “gut feel” tuning into a system that’s documentable, scalable, and easy to implement across the team.
By answering these questions and regularly performing the tuning process listed above (or your own version of it), you can evaluate historical investigative trends, prioritize the most impactful alerts, and make decisions that ensure holistic threat coverage.
Managing upstream detections is one of the core ways security teams can combat alert fatigue and threat actors. Following best practices for alert tuning strikes a natural balance between over-tuning and under-tuning alert signals so that SOCs can operate effectively, audit changes, and maintain their sanity.
While alert tuning is a necessity today to manage risk (and sanity) for security operations teams, we believe in a future for analysts that eliminates the need for tedious and repetitive tasks like alert tuning. That’s why we’re building Prophet AI for Security Operations to triage and investigate every alert on your behalf, so you can avoid managing the upstream alert problem altogether.
Request a demo of Prophet AI to learn how you can triage and investigate security alerts 10 times faster.