Threat Hunting Methodologies

Intro
Intel Driven Hunting
Hypothesis Driven Hunting
Compliance Based Hunting
Final thoughts

Intro

In this document I will give an overview of the threat hunting methods I personally use to find malicious activity. Because of the number of variables involved in threat hunting (which tools you have access to, the depth and quality of your telemetry, etc.), this is a fairly high level overview that aims to introduce the core ideas with simple examples. How far you take these methods is up to you, and for those of you in an MDR setting with millions of records to hunt through, there's no excuse!

Intel Driven Hunting

Intel driven threat hunting is all about building queries around current or newly released information. This could include the TTPs of an ongoing or new malware campaign, PoC exploits, LOLBIN/OS abuse techniques or a tip from a threat intel team. Our goal is to understand what we're looking for and come up with some viable methods of identifying the activity with the telemetry we have available. There will be many cases where this is just not possible: your telemetry collector/EDR agent will have blind spots, and part of the challenge is working with what you have.

Let's run through a little case study together. Have a read of this Qakbot report from The DFIR Report (a great website, by the way; I highly recommend reading through as much of their analysis as you can)... Stop, go and actually read it...

Now that you've read the article, there are a lot of things we could hunt for, but let's take a look at hunting for the initial execution of the Qakbot payload. The sample The DFIR Report analyzed executed the following:

rundll32.exe ocrafh.html,#1

It seems the stage 0 payload uses rundll32.exe to execute ocrafh.html, which is clearly just a renamed DLL. Let's ask ourselves the following questions:

Does this campaign use randomly generated payload names?
Is the "#1" (which tells rundll32 to execute the first exported function) seen in every instance?

If no one else has done the analysis to answer these questions, we can do it ourselves. In this instance the easiest thing to do would be to collect as many samples as we can find from this campaign (using a service like Malware Bazaar), run them all through an automated sandbox (using a service like Triage) and observe the initial execution phase.

Let's say we went through with this approach and found that the payload name is randomly generated but always six characters long, and that the ",#1" is seen in every instance. We could then hunt for this with something like the following:

Query (Using Regex to be platform agnostic):

process_name        = rundll32
process_commandline = rundll32\.exe [a-zA-Z0-9]{6}\.html,#1
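
If your platform lets you export process events, the same pattern can be tested offline before you commit to it. Below is a minimal sketch in Python, assuming the command line is available as a plain string; the function name is my own invention, and I've allowed any amount of whitespace between the binary and its argument, which is a small loosening of the query above.

```python
import re

# Same idea as the query above: rundll32.exe followed by a six character
# alphanumeric name with an .html extension and the ",#1" ordinal.
# The dot in "rundll32.exe" is escaped so it only matches a literal dot.
QAKBOT_RUNDLL32 = re.compile(
    r"rundll32\.exe\s+[a-zA-Z0-9]{6}\.html,#1",
    re.IGNORECASE,
)


def is_suspect_commandline(commandline):
    """Return True if the command line matches the hypothetical Qakbot pattern."""
    return QAKBOT_RUNDLL32.search(commandline) is not None


if __name__ == "__main__":
    print(is_suspect_commandline("rundll32.exe ocrafh.html,#1"))
    print(is_suspect_commandline("rundll32.exe shell32.dll,Control_RunDLL"))
```

Because the check uses search rather than a full match, it will also hit when the command line starts with a full path to rundll32.exe.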

Off the back of this we can ask more questions, like "how many times do we see rundll32 execute a non DLL?". Strictly speaking that falls under the "Hypothesis Driven" methodology, but these are the right sort of questions to be asking off the back of an "Intel Driven" hunt. They will further your understanding of what normal behavior looks like and give you some great ideas for hunt playbooks or detection rules.
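
The "rundll32 executing a non DLL" question can be roughed out in a few lines of Python. This is only a sketch under some assumptions: the helper names are made up, the command line is a plain string, and the target file is taken to be everything between rundll32 and the first comma. Expect legitimate hits as well (Control Panel .cpl files are loaded this way, for example), so treat the output as a starting point for review rather than a verdict.

```python
import re

# Captures the first argument after rundll32(.exe), up to the comma that
# precedes the export name or ordinal. Quotes around either part are
# tolerated but not strictly validated.
FIRST_ARG = re.compile(r'rundll32(?:\.exe)?"?\s+"?([^",]+)', re.IGNORECASE)


def rundll32_target_extension(commandline):
    """Return the lowercase extension of the file rundll32 was asked to load."""
    match = FIRST_ARG.search(commandline)
    if match is None:
        return None
    target = match.group(1).strip()
    _, dot, ext = target.rpartition(".")
    return ext.lower() if dot else None


def loads_non_dll(commandline):
    """Return True when rundll32 is pointed at something not named *.dll."""
    ext = rundll32_target_extension(commandline)
    return ext is not None and ext != "dll"
```

Counting the extensions returned by rundll32_target_extension across your whole estate is a quick way to see which ones are common and which are rare enough to be worth a look.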

Hypothesis Driven Hunting

Hypothesis driven hunting is my personal favorite method of hunting. It is as simple as coming up with an idea of how we could find anomalous activity, converting this idea into a query, running it, refining it if required and then reviewing the results. A nice example could look like this:

Hypothesis:

Many malware campaigns leverage .js files as a stager for further malicious payloads and it is common to see email based phishing campaigns leverage .zip files (and other archive types) to help bypass email filters.
I believe we could find malicious activity by looking for .js files being executed directly from an archive without extracting beforehand.

Query (Using Regex to be platform agnostic):

process_name   = wscript|cscript
file_full_path = .*\\AppData\\Local\\Temp\\.*\.(zip|7z|rar)\\.*\.js
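
As with the Qakbot example, the query translates fairly directly into Python if you want to replay it over exported events. A minimal sketch, assuming each event gives you a process name and the full path of the file being executed; the function name is hypothetical, and I've anchored the path on the .js extension and made the ".exe" suffix on the process name optional.

```python
import re

# Script hosts from the query above, with an optional ".exe" suffix.
SCRIPT_HOSTS = re.compile(r"^(wscript|cscript)(\.exe)?$", re.IGNORECASE)

# A .js file whose parent folder under the user's Temp directory is named
# like an archive, which suggests it was opened straight from the archive.
ARCHIVE_JS = re.compile(
    r".*\\AppData\\Local\\Temp\\.*\.(zip|7z|rar)\\.*\.js$",
    re.IGNORECASE,
)


def is_js_run_from_archive(process_name, file_full_path):
    """Return True when a script host runs a .js file from an archive temp path."""
    return bool(SCRIPT_HOSTS.match(process_name)) and bool(
        ARCHIVE_JS.match(file_full_path)
    )
```

Feeding it a handful of known good and known bad paths before running the real query is a cheap way to catch regex mistakes early.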

Now if you're running the query over a large number of machines, you may be surprised at how noisy it is. A query that you expected to be high fidelity but that turns out to be noisier than you thought is not a failed threat hunt. It's a learning experience, and it will help grow your understanding of what "normal activity" looks like in your environments. I've had ideas and thought "there's no way anything doing this could be legitimate", only to find out that there are legitimate administrative tools or management software doing exactly that.

Compliance Based Hunting

Compliance based hunting is certainly the least glamorous of the group; however, it can yield some of the bigger results, even if those results aren't directly "realized".

To me, compliance hunting is about stopping potential threats way in advance by ensuring users are following company policies. Here is an example:

Made Up Company Policy 6.1.2: 

[Insert Generic Corporate Name Here] prohibits the unauthorized duplication, distribution, or use of software or other copyrighted materials. 
This includes, but is not limited to, pirating software, video and other media.
Any employee found to be in violation of this policy may be subject to disciplinary action, up to and including termination of employment.

I'm sure most of us are aware that plenty of malware campaigns use trojanized software/installers to infect users attempting to download free copies of paid media, so hunting for users potentially downloading pirated content is a great way of spotting risky behavior early, before anything bad can happen. Of course our goal is not to get anyone in trouble but to ensure company policies are being upheld. In this case we're less worried about the legal aspect of the policy and more focused on the risk of users getting a malware infection from a dodgy torrent. Someone downloading files over a torrent protocol is not necessarily downloading something illegal (in fact, many Linux ISOs are distributed over torrent); however, for most users, looking for torrent clients is an easy way to find potentially risky activity. We can use a query like this to do so:

Query (Using Regex to be platform agnostic):

process_name = (utorrent|bittorrent|vuze|deluge|qbittorrent|frostwire|bitcomet|tixati|bitlord)\.exe
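
To turn matches into something reviewable, it helps to group them by machine. Here is a minimal sketch in Python, assuming events arrive as dictionaries with "hostname" and "process_name" keys; those field names and the function name are assumptions on my part, so map them to whatever your telemetry actually calls them.

```python
import re
from collections import defaultdict

# Same client list as the query above, anchored to the whole process name.
TORRENT_CLIENTS = re.compile(
    r"^(utorrent|bittorrent|vuze|deluge|qbittorrent|frostwire"
    r"|bitcomet|tixati|bitlord)\.exe$",
    re.IGNORECASE,
)


def hosts_running_torrent_clients(events):
    """Map each hostname to the set of torrent client process names seen on it."""
    hits = defaultdict(set)
    for event in events:
        name = event["process_name"]
        if TORRENT_CLIENTS.match(name):
            hits[event["hostname"]].add(name.lower())
    return dict(hits)


if __name__ == "__main__":
    sample = [
        {"hostname": "WS-01", "process_name": "qBittorrent.exe"},
        {"hostname": "WS-02", "process_name": "notepad.exe"},
    ]
    print(hosts_running_torrent_clients(sample))
```

A per-host summary like this makes it easier to decide which machines deserve a closer look.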

If you have permission (and the capability), I would recommend pairing this hunt with a listing of the contents of the user's Downloads folder or the %appdata%\[insert torrent name here] directory (or wherever the default download location is for the client you've found), just to check the type of content downloaded. You may find that a systems administrator is downloading legitimate software, in which case the activity can be ignored. This is a type of hunt that, if successful, is likely to yield an "unrealized" result. By this I mean that reporting a user inappropriately using torrent software may not directly lead you to malware, but it may well have stopped a future infection, which is a valuable result for sure.

Final thoughts

After reading this you may be left thinking that this was more of a "how to regex for command line arguments" guide, which I know may have been a bit underwhelming. However, threat hunting is very dependent on the data you have available, and the goal of this blog was to give examples that everyone could understand and follow along with, no matter the depth of telemetry available. Most of your time hunting will likely be spent finding weird activity that ends up being legitimate, but don't let this bring you down. A successful threat hunt is one where you learned something new or confirmed a theory; finding badness is just an extra!
