The growth in cloud usage continues to change the cybersecurity landscape, as more and more valuable assets that bad actors can target now reside in the cloud. Many cloud providers and third-party vendors offer security tools to help organizations keep up with increasingly sophisticated threats. However, the scale of the cloud, combined with a general skills shortage, means these security tools generate large volumes of alerts that are difficult for teams lacking the right skill sets to investigate.
In this blog post, we’ll measure the effectiveness of Large Language Models (LLMs) like GPT-4 at triaging and investigating alerts within the cloud control plane, highlighting their advantages and shortcomings. We’ll use AWS GuardDuty as the security tool for detecting anomalies and potential security issues within AWS environments because it’s a common starting point for investigations by AWS customers.
Leveraging LLMs like GPT-4 can aid in managing and interpreting these alerts, but this approach has its own set of challenges. This blog answers the question: can LLMs be used for triaging AWS GuardDuty alerts (also known as GuardDuty findings)?
One of the first tasks GPT-4 can take on is summarizing an existing alert into something that’s a little more understandable. Below are the prompt we used and the response we got; we’ve included a portion of the GuardDuty finding for reference.
“You're an AWS security expert explaining a GuardDuty finding to a junior analyst that is easily understandable. How would you summarize this alert in 4 sentences or less?”
{
  "accountId": "111111111111",
  "arn": "arn:aws:guardduty:us-west-2:640347035237:detector/4244c5410b/finding/45fab8a380",
  "createdAt": "2024-05-31T12:03:33.734Z",
  "description": "APIs commonly used in Discovery tactics were invoked by user AssumedRole : AWSReservedSSO_AWSAdministratorAccess_0005f71c31fb9a1d under unusual circumstances. Such activity is not typically seen from this user.",
  "id": "3294993f9c80s195038758499af7",
  "partition": "aws",
  …
}
“This GuardDuty finding indicates that the user "AWSReservedSSO_AWSAdministratorAccess_0005f71c31fb9a1d" is making API calls that are unusual for this user and are typically associated with Discovery tactics. The activity was performed from a remote IP address in London (8.8.8.8) and involved APIs such as "ListMembers" in "securityhub.amazonaws.com". This behavior is considered anomalous and may signify a potential security threat. Immediate investigation is recommended to ensure the user's actions are legitimate.”
7/10 – It’s a decent summary of the alert, but the response doesn’t provide enough information for a junior analyst to then determine next steps. Additionally, it would be much more valuable if the API call “ListMembers” was better explained instead of simply referenced.
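If you want to reproduce this step outside the console, a minimal sketch might look like the following. It assumes boto3 and the openai Python package are installed, AWS credentials and an OPENAI_API_KEY are configured, and it reuses the detector ID from the finding’s ARN as a placeholder; the model name and result limit are illustrative choices, not a recommendation.

import json

import boto3
from openai import OpenAI

DETECTOR_ID = "4244c5410b"  # placeholder; use your own GuardDuty detector ID

guardduty = boto3.client("guardduty", region_name="us-west-2")
llm = OpenAI()

# Same prompt used in the example above.
SYSTEM_PROMPT = (
    "You're an AWS security expert explaining a GuardDuty finding to a junior "
    "analyst that is easily understandable. How would you summarize this alert "
    "in 4 sentences or less?"
)

def summarize_finding(finding: dict) -> str:
    """Send one GuardDuty finding to the model and return its summary."""
    response = llm.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": json.dumps(finding, default=str)},
        ],
    )
    return response.choices[0].message.content

# Pull a handful of recent findings and summarize each one.
finding_ids = guardduty.list_findings(DetectorId=DETECTOR_ID, MaxResults=5)["FindingIds"]
findings = guardduty.get_findings(DetectorId=DETECTOR_ID, FindingIds=finding_ids)["Findings"]
for finding in findings:
    print(summarize_finding(finding))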
LLMs can analyze the content and context of security alerts to make an initial determination about their validity. But how accurate are they? As an analyst, you might want to know how confident the LLM is in its determination so that you can respond accordingly. However, simply asking the LLM for a confidence score is largely in vain: LLMs are not very good at judging how confident they should be in their own assessments. Prompting an LLM for a confidence score is only about 58% accurate, according to Refuel’s most recent research.
“You're a cloud security expert. What's the likelihood that this alert is malicious on a scale from 1 - 100? This can be your only response followed by up to a 2 sentence explanation as to why.”
“85
The alert indicates unusual API activity for an administrative role from a remote IP, particularly invoking APIs associated with discovery tactics. This combination suggests a high likelihood of malicious intent.”
4/10 – This GuardDuty finding is actually a false positive triggered by a user visiting AWS Security Hub for the first time. Anomalous, but not malicious – a distinction that comes up constantly in the cloud space.
To be fair, the LLM doesn’t have access to all your data – just the alert, so it is operating with limited context.
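If you do wire a verdict prompt like this into a pipeline, the safer pattern is to parse the number out of the reply and treat it only as a routing hint, never as an auto-close decision, given how poorly calibrated self-reported confidence is. Here’s a minimal sketch; the thresholds and queue names are made up for illustration.

import re
from typing import Optional

def parse_likelihood(reply: str) -> Optional[int]:
    """Return the 1-100 score at the start of the LLM reply, or None if absent."""
    match = re.search(r"\b(100|[1-9][0-9]?)\b", reply)
    return int(match.group(1)) if match else None

def triage_bucket(score: Optional[int]) -> str:
    """Map the raw score to a review queue; thresholds are illustrative only."""
    if score is None:
        return "needs_human_review"   # model didn't follow the output format
    if score >= 80:
        return "escalate_to_analyst"  # a routing hint, never an auto-close
    if score >= 40:
        return "investigate_further"
    return "low_priority_review"

reply = "85\nThe alert indicates unusual API activity for an administrative role..."
score = parse_likelihood(reply)
print(score, triage_bucket(score))  # -> 85 escalate_to_analyst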
Next, we look at how well an LLM plans an investigation. We ask it for next steps related to this alert to see whether it can guide a more junior analyst through performing those tasks.
“What plan of action should I take as a more junior investigator to investigate this alert completely?”
6/10 – Not a bad overview, but the steps aren’t well ordered and lack clarity. As an example, you would probably want to check whether the user logged in with MFA before reviewing the historical baseline of their past activity. The response as written isn’t specific enough for an analyst to perform concrete actions.
In order for the LLM to provide more concrete actions, you need to reevaluate your prompt. For example, specific follow-up questions like “How would you identify the principal that assumed the role?” may get you more concrete answers around actions you can take in CloudTrail.
In our experience, asking very targeted questions around specific elements of an investigation is far more effective than asking for a foolproof and actionable plan off the jump.
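To make that concrete, here’s a minimal sketch of answering one of those targeted questions – which principal assumed the role – directly against CloudTrail. It assumes boto3 credentials with cloudtrail:LookupEvents; the role name is taken from the finding above, and exactly which AssumeRole* events appear (and under which resource name) depends on how the role is federated, so treat this as a starting point rather than a complete answer.

import json
from datetime import datetime, timedelta, timezone

import boto3

ROLE_NAME = "AWSReservedSSO_AWSAdministratorAccess_0005f71c31fb9a1d"

cloudtrail = boto3.client("cloudtrail", region_name="us-west-2")
end = datetime.now(timezone.utc)
start = end - timedelta(days=1)

# LookupEvents filters on a single attribute, so look up events that touched
# the role and filter for AssumeRole-style events client-side.
paginator = cloudtrail.get_paginator("lookup_events")
for page in paginator.paginate(
    LookupAttributes=[{"AttributeKey": "ResourceName", "AttributeValue": ROLE_NAME}],
    StartTime=start,
    EndTime=end,
):
    for event in page["Events"]:
        detail = json.loads(event["CloudTrailEvent"])
        if detail.get("eventName", "").startswith("AssumeRole"):
            identity = detail.get("userIdentity", {})
            print(detail["eventTime"], detail["eventName"],
                  identity.get("type"), identity.get("arn"))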
Integrating an AI technology like LLMs into the security alert management process, particularly in cloud environments, can enhance an organization's ability to respond swiftly and effectively to potential threats. However, it is crucial to be aware of their limitations and use them as a complement to, rather than a replacement for, human expertise and traditional security measures. By striking the right balance, organizations can leverage LLMs' strengths in summarization and in planning narrowly tailored investigative actions, rather than asking an LLM to completely resolve an alert from a single prompt.
At Prophet Security, we think poking and prodding an LLM all day sounds like another security task our customers shouldn’t have to do. Prophet AI for Security Operations doesn’t require prompts (or prompt engineering) to triage and investigate alerts. We’ve honed Prophet AI to get the most from an investigation on your behalf so that you get the accuracy of a human analyst.
Request a demo of Prophet AI to learn how you can triage and investigate security alerts 10 times faster.