See for yourself how Prophet AI can supercharge your security operations, accelerating alert investigation and response
Key benefits:
Lowers MTTR with AI-driven automated alert triage & investigation
Lowers risk by prioritizing critical alerts for analyst review
Eliminates manual effort, freeing analysts to focus on high-impact security tasks
The growth in cloud usage continues to change the cybersecurity landscape, as more and more valuable assets that bad actors can target now reside in the cloud. Many cloud providers and 3rd party vendors offer security tools to help organizations keep up with increasingly sophisticated threats. However, the scale of the cloud combined with a general skills shortage means these security tools generate large numbers of security alerts that are difficult to investigate by teams lacking the right skill sets.
In this blog post, we’ll measure the effectiveness of Large Language Models (LLMs) like GPT-4 at triaging and investigating alerts within the cloud control plane, highlighting their advantages and shortcomings in enhancing alert triage and investigation. We’ll examine AWS GuardDuty as the security tool for detecting anomalies and potential security issues within AWS environments because it’s the common investigative sort for AWS customers.
Leveraging AI technologies like Large Language Models (LLMs) from GPT-4 can aid in managing and interpreting these alerts, but this approach has its own set of challenges. This blog answers the question – Can LLMs be used for triaging AWS GuardDuty alerts (also known as GuardDuty findings)?
One of the first tasks GPT-4 can take on is summarizing our existing alert into something that’s a little more understandable. Below is the prompt and response we got for the alert. We’ve included a portion of the GuardDuty finding.
“You're an AWS security expert explaining a GuardDuty finding to a junior analyst that is easily understandable. How would you summarize this alert in 4 sentences or less?”
{
"accountId": "111111111111",
"arn": "arn:aws:guardduty:us-west-2:640347035237:detector/4244c5410b/finding/45fab8a380",
"createdAt": "2024-05-31T12:03:33.734Z",
"description": "APIs commonly used in Discovery tactics were invoked by user AssumedRole : AWSReservedSSO_AWSAdministratorAccess_0005f71c31fb9a1d under unusual circumstances. Such activity is not typically seen from this user.",
"id": "3294993f9c80s195038758499af7",
"partition": "aws",
…
}
“This GuardDuty finding indicates that the user "AWSReservedSSO_AWSAdministratorAccess_0005f71c31fb9a1d" is making API calls that are unusual for this user and are typically associated with Discovery tactics. The activity was performed from a remote IP address in London (8.8.8.8) and involved APIs such as "ListMembers" in "securityhub.amazonaws.com". This behavior is considered anomalous and may signify a potential security threat. Immediate investigation is recommended to ensure the user's actions are legitimate.”
7/10 – It’s a decent summary of the alert, but the response doesn’t provide enough information for a junior analyst to then determine next steps. Additionally, it would be much more valuable if the API call “ListMembers” was better explained instead of simply referenced.
LLMs can analyze the content and context of security alerts to make an initial determination about their validity. But how accurate are they? As an analyst, you might want to know how confident the LLM is in its determination so that you can respond accordingly. However, simply asking the LLM for a confidence interval would be in vain. LLMs are not very good at accurately determining how confident they should be in their own assessment. Prompting an LLM for a confidence is only about 58% accurate, according to Refuel’s most recent research.
“You're a cloud security expert. What's the likelihood that this alert is malicious on a scale from 1 - 100? This can be your only response followed by up to a 2 sentence explanation as to why.”
“85"
The alert indicates unusual API activity for an administrative role from a remote IP, particularly invoking APIs associated with discovery tactics. This combination suggests a high likelihood of malicious intent.”
4/10. This GuardDuty finding is actually a false positive for a user visiting AWS Security Hub for the first time. Anomalous, but not malicious – a very common conflict in the cloud space.
To be fair, the LLM doesn’t have access to all your data – just the alert, so it is operating with limited context.
Next we look at how well an LLM plans for an investigation. We ask for next steps related to this investigation to see if it can guide a more junior analyst through performing those tasks.
“What plan of action should I take as a more junior investigator to investigate this alert completely?”
6/10 – Not a bad overview, but it lacks optimization and clarity. As an example, you would probably want to look into whether the user logged in with MFA prior to reviewing the historical baseline of their past activity. The response as written lacks enough clarity for an analyst to perform specific actions.
In order for the LLM to provide more concrete actions, you need to reevaluate your prompt. For example, specific follow-up questions like “How would you identify the principal that assumed the role?” may get you more concrete answers around actions you can take in CloudTrail.
In our experience, asking very targeted questions around specific elements of an investigation is far more effective than asking for a foolproof and actionable plan off the jump.
Integrating an AI technology like LLMs into the security alert management process, particularly in cloud environments, can enhance an organization's ability to respond swiftly and effectively to potential threats. However, it is crucial to be aware of their limitations and use them as a complement to, rather than a replacement for, human expertise and traditional security measures. By striking the right balance, organizations can leverage the strengths of LLMs around summarization and planning for narrowly tailored investigative actions rather than asking an LLM to completely resolve the alert from a single prompt.
At Prophet Security, we think poking and prodding an LLM all day sounds like another security task our customers shouldn’t have to do. Prophet AI for Security Operations doesn’t require prompts (or prompt engineering) to triage and investigate alerts. We’ve honed Prophet AI to get the most from an investigation on your behalf so that you get the accuracy of a human analyst.
Request a demo of Prophet AI to learn how you can triage and investigate security alerts 10 times faster.