Chapter 18: Binary Obfuscation & Entropy Reduction

The Science of Appearing Normal

Security products face an enormous challenge: distinguishing legitimate software from malicious code based solely on static analysis of binary files. One of the most powerful heuristics they employ is entropy analysis—the measurement of randomness within data. This chapter explores how entropy works, why it matters for detection, and the various techniques used to make binaries appear benign.

Understanding these concepts serves multiple purposes. For defenders, recognizing obfuscation techniques helps identify potentially malicious samples. For red teamers, understanding how detection works enables crafting payloads that survive initial analysis. For researchers, these techniques illustrate the ongoing cat-and-mouse game between attackers and security products.

THE ENTROPY DETECTION PROBLEM
=============================

Normal Compiled Code:                 Encrypted Payload:
┌─────────────────────────────┐      ┌─────────────────────────────┐
│ 48 89 5C 24 08              │      │ A3 F7 2B 9E 4D E1 8B C2 57  │
│ 48 89 6C 24 10              │      │ AA 39 D4 6F B8 12 7C 5A 91  │
│ 48 89 74 24 18              │      │ E8 2F 44 C3 90 B5 13 6E A0  │
│ 57 48 83 EC 20              │      │ 82 F1 D7 4B 8E 23 59 CA 77  │
│                             │      │                             │
│ Pattern: Repeated byte      │      │ Pattern: Seemingly random   │
│ sequences, common opcodes   │      │ with no structure           │
│                             │      │                             │
│ Entropy: ~4.5 bits/byte     │      │ Entropy: ~7.9 bits/byte     │
└─────────────────────────────┘      └─────────────────────────────┘
                                               │
                                               ▼
                                     ┌─────────────────────┐
                                     │ Security Product:   │
                                     │ "HIGH ENTROPY!      │
                                     │ Likely encrypted    │
                                     │ or packed malware"  │
                                     │                     │
                                     │ [BLOCK/QUARANTINE]  │
                                     └─────────────────────┘

The challenge: Make encrypted payloads look like normal code.

Part 1: Understanding Shannon Entropy

Entropy, in information theory, measures the unpredictability or randomness of data. Shannon entropy, named after Claude Shannon, who formalized the concept in 1948, quantifies how much "information" is contained in a message. For our purposes, it tells us how random a byte stream appears.

The Mathematics of Entropy

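Entropy is calculated as the negative sum, over all symbols, of each symbol's probability multiplied by the base-2 logarithm of that probability. For a file or data block, we count how often each byte value (0-255) appears, divide by the total number of bytes to get a probability, then calculate: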

H(X) = -Σ p(x) × log₂(p(x))

Where:
  H(X) = Entropy in bits per byte
  p(x) = Probability of byte value x occurring
  Σ    = Sum over all possible byte values (0-255)

The result ranges from 0 (all bytes identical) to 8 (all byte values equally likely, perfectly random).

#!/usr/bin/env python3
"""
Shannon entropy calculator for binary analysis
"""

import math
import sys
from collections import Counter

def calculate_entropy(data):
    """
    Calculate Shannon entropy of byte data.
    Returns value between 0.0 (all same) and 8.0 (perfectly random).
    """
    if not data:
        return 0.0

    # Count occurrences of each byte value
    byte_counts = Counter(data)
    total_bytes = len(data)

    # Calculate probability and entropy contribution for each byte value
    entropy = 0.0
    for count in byte_counts.values():
        probability = count / total_bytes
        if probability > 0:
            # Shannon's formula: -p * log2(p)
            entropy -= probability * math.log2(probability)

    return entropy

def analyze_pe_sections(filepath):
    """
    Analyze entropy of a file as a whole and in fixed 4 KB chunks.
    Despite the name, this does not parse the PE section table; chunks
    with high entropy often indicate encrypted or packed content.
    """
    with open(filepath, 'rb') as f:
        data = f.read()

    # Overall file entropy
    overall = calculate_entropy(data)
    print(f"Overall entropy: {overall:.4f} bits/byte")

    if overall > 7.0:
        print("  WARNING: High entropy suggests encryption/packing")
    elif overall > 6.0:
        print("  NOTICE: Elevated entropy, may contain compressed data")
    else:
        print("  Normal entropy range for compiled code")

    # Analyze in 4KB chunks to find hot spots
    print("\nHigh-entropy regions (>7.0):")
    chunk_size = 4096
    found_high = False

    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset+chunk_size]
        chunk_entropy = calculate_entropy(chunk)

        if chunk_entropy > 7.0:
            found_high = True
            print(f"  Offset 0x{offset:08X}: {chunk_entropy:.4f} bits/byte")

    if not found_high:
        print("  None found")

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} <file>")
        sys.exit(1)
    analyze_pe_sections(sys.argv[1])
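
As a quick sanity check on the stated bounds, the sketch below evaluates the same formula on synthetic inputs. The helper name `shannon_entropy` is illustrative and simply mirrors `calculate_entropy`; the inputs are made up for the demonstration.

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte."""
    entropy = 0.0
    total = len(data)
    for count in Counter(data).values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy

constant = b"\x00" * 4096             # a single byte value
two_symbols = b"\x00\xff" * 2048      # two values, equally likely
all_values = bytes(range(256)) * 16   # every value equally often
small_random = os.urandom(256)        # random, but a small sample

print(shannon_entropy(constant))      # 0.0
print(shannon_entropy(two_symbols))   # 1.0
print(shannon_entropy(all_values))    # 8.0
print(shannon_entropy(small_random))  # below 8.0
```

The last case is worth noting: a buffer of N bytes cannot measure above log2(N) bits per byte, and random data only approaches 8.0 as the sample grows. Thresholds tuned on whole files therefore behave differently on small chunks.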

Entropy Thresholds in Practice

Different types of content produce characteristic entropy signatures. The ranges below are approximate, overlap in practice, and assume samples of at least several kilobytes:

ENTROPY SIGNATURES BY CONTENT TYPE
==================================

┌────────────────┬───────────────┬─────────────────────────────────────────┐
│ Entropy Range  │ Content Type  │ Description                             │
├────────────────┼───────────────┼─────────────────────────────────────────┤
│ 0.0 - 1.0      │ Null/Sparse   │ Large blocks of zeros or repeated bytes │
│ 3.5 - 5.0      │ Plain text    │ English text, markup, source code       │
│ 4.5 - 6.5      │ Code/Data     │ Typical compiled code and mixed data    │
│ 6.5 - 7.5      │ Packed/Mixed  │ Packed or partly compressed binaries    │
│ 7.5 - 8.0      │ Compressed    │ ZIP, gzip, PNG image data               │
│ 7.8 - 8.0      │ Encrypted     │ AES (any mode), RC4, CSPRNG output      │
└────────────────┴───────────────┴─────────────────────────────────────────┘

Detection heuristic used by many security products:
  IF section_entropy > 7.0 THEN flag as potentially packed/encrypted
  IF overall_entropy > 7.5 THEN high confidence of encryption
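
The `section_entropy` rule above needs real section boundaries rather than fixed-size chunks. The following is a minimal sketch of that step, assuming a well-formed PE image already read into memory; the names `section_entropies` and `flag_sections` are illustrative, and the 7.0 threshold is the one quoted above rather than any particular product's setting.

```python
import math
import struct
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte."""
    entropy = 0.0
    total = len(data)
    for count in Counter(data).values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy

def section_entropies(pe: bytes):
    """Yield (section_name, entropy) for each section's raw data."""
    if pe[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (pe_offset,) = struct.unpack_from("<I", pe, 0x3C)       # e_lfanew
    if pe[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # COFF header fields, relative to the start of the PE signature
    (num_sections,) = struct.unpack_from("<H", pe, pe_offset + 6)
    (opt_size,) = struct.unpack_from("<H", pe, pe_offset + 20)
    table = pe_offset + 24 + opt_size                        # section table
    for i in range(num_sections):
        entry = table + 40 * i                               # 40-byte headers
        name = pe[entry:entry + 8].rstrip(b"\0").decode("ascii", "replace")
        raw_size, raw_ptr = struct.unpack_from("<II", pe, entry + 16)
        yield name, shannon_entropy(pe[raw_ptr:raw_ptr + raw_size])

def flag_sections(pe: bytes, threshold: float = 7.0):
    """Return the (name, entropy) pairs that exceed the threshold."""
    return [(n, h) for n, h in section_entropies(pe) if h > threshold]
```

A caller would read the file with `open(path, "rb").read()` and pass the bytes to `flag_sections`. The sketch does no bounds checking on malformed headers, which a production parser would need.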

Why compiled code has ~4.5 entropy:
• x86/x64 instructions cluster around certain opcodes
• Common instruction sequences: push rbp; mov rbp, rsp; lea; call
• Repeated function prologues and epilogues
• NULL padding between functions
• String data in .rdata with repeated characters

Why High Entropy Triggers Detection

Security products learned through empirical observation that ordinary compiled software rarely has overall entropy above 7.0, while packed or encrypted malware usually does. The heuristic is cheap and catches a large share of packed samples, but it is not free of false positives: installers, self-extracting archives, and commercially protected applications are also compressed or encrypted.

The reasoning is straightforward: most legitimate executables ship their code unencrypted, so encrypted content in an executable raises the likelihood that something is being hidden. For that reason, high entropy is best treated as one signal to weigh alongside others rather than as proof of malice.

Part 2: Entropy Reduction Techniques

If high entropy triggers detection, the solution is to lower the entropy while maintaining the ability to execute encrypted payloads. Several techniques accomplish this by mixing high-entropy data with low-entropy padding.

English Text Padding

Natural language has relatively low entropy (roughly 4.0-4.5 bits per byte when measured from single-byte frequencies, as the calculator above does) because it uses a small alphabet with a skewed letter distribution. By surrounding encrypted payloads with large amounts of English text, we can bring overall entropy into the normal range.

#include <windows.h>

// High-entropy encrypted shellcode (entropy ~7.9)
BYTE g_EncryptedPayload[] = { 0xA3, 0xF7, 0x2B, 0x9E, /* ... */ };

// Low-entropy English text padding (entropy ~4.0)
// These strings dramatically reduce overall entropy
const char* g_TextPadding[] = {
    "The quick brown fox jumps over the lazy dog. This sentence contains "
    "every letter of the English alphabet and has been used for typing "
    "practice since the late nineteenth century.",

    "Windows is a family of graphical operating systems developed by "
    "Microsoft Corporation. First released in 1985 as a graphical shell "
    "for MS-DOS, Windows evolved into a complete operating system.",

    "The Internet is a global system of interconnected computer networks "
    "that use the standard Internet protocol suite to serve billions of "
    "users worldwide. It is a network of networks.",

    "Software development is the process of conceiving, specifying, "
    "designing, programming, documenting, testing, and bug fixing "
    "involved in creating and maintaining applications.",

    // More text significantly reduces overall entropy
    // Ratio guideline: ~10KB text per 1KB encrypted data
    // to bring entropy from 7.9 to ~5.5
};

// Structure interleaving high and low entropy data
typedef struct _PADDED_BINARY {
    char szPadding1[4096];      // English text
    BYTE bPayload[1024];        // Encrypted payload
    char szPadding2[4096];      // More English text
    char szPadding3[8192];      // Additional padding
} PADDED_BINARY;

// Result: Binary that appears to have normal entropy
// Security product sees overall entropy ~5.5, doesn't flag

ENTROPY MIXING CALCULATION
==========================

Before Padding:
┌─────────────────────────────────────┐
│ 1KB encrypted payload               │
│ Entropy: 7.9 bits/byte              │
│ Total: 1024 bytes                   │
└─────────────────────────────────────┘
Overall entropy: 7.9 (FLAGGED!)

After Padding:
┌─────────────────────────────────────┐
│ 4KB English text (entropy ~4.0)     │
├─────────────────────────────────────┤
│ 1KB encrypted payload (7.9)         │
├─────────────────────────────────────┤
│ 8KB English text (entropy ~4.0)     │
└─────────────────────────────────────┘
Weighted average: (4096×4.0 + 1024×7.9 + 8192×4.0) / 13312
                = ~4.3 bits/byte (NOT FLAGGED!)
(Approximation: entropy measured over the combined bytes is slightly higher.)
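
The weighted average in the calculation above is a convenient estimate, not what an entropy calculator actually reports. Entropy is computed from the histogram of the combined bytes, and because entropy is concave, that value is never below the weighted average of the parts. The sketch below checks this on synthetic data and also shows the defender's view: a whole-file figure hides the payload, while a windowed scan still isolates it. All names and the sample data are illustrative; the random block merely stands in for ciphertext.

```python
import math
import random
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte."""
    entropy = 0.0
    total = len(data)
    for count in Counter(data).values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy

def weighted_average(parts):
    """Length-weighted mean of the per-part entropies."""
    total = sum(len(p) for p in parts)
    return sum(len(p) * shannon_entropy(p) for p in parts) / total

def high_entropy_windows(data, window=1024, threshold=7.0):
    """Offsets of fixed-size windows whose entropy exceeds the threshold."""
    return [off for off in range(0, len(data), window)
            if shannon_entropy(data[off:off + window]) > threshold]

rng = random.Random(18)  # fixed seed so runs are repeatable
sentence = b"the quick brown fox jumps over the lazy dog. "
text_a = (sentence * 100)[:4096]
text_b = (sentence * 200)[:8192]
blob = bytes(rng.randrange(256) for _ in range(1024))  # stand-in for ciphertext

parts = [text_a, blob, text_b]
combined = b"".join(parts)

print(f"weighted average : {weighted_average(parts):.2f}")
print(f"combined entropy : {shannon_entropy(combined):.2f}")
print(f"flagged windows  : {high_entropy_windows(combined)}")  # [4096]
```

The windowed result is the "sharp entropy boundary" that section-by-section analysis looks for: the whole-file number stays low, yet the 1 KB window at offset 4096 stands out clearly from its neighbours.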

Structured Padding

Rather than random text, we can create padding that looks like legitimate application data. Configuration structures, fake settings, and realistic strings make the binary appear purposeful:

#include <windows.h>

// Fake application configuration (looks legitimate, low entropy)
typedef struct _APP_CONFIG {
    DWORD dwVersion;
    DWORD dwFlags;
    DWORD dwLogLevel;

    char szServerName[256];
    char szBackupServer[256];
    char szUsername[128];
    char szDomain[128];

    DWORD dwPort;
    DWORD dwTimeout;
    DWORD dwRetryCount;
    DWORD dwMaxConnections;

    char szLogPath[512];
    char szCachePath[512];
    char szTempPath[512];
    char szConfigPath[512];

    DWORD dwChecksum;
    DWORD dwReserved[64];
} APP_CONFIG;

// Initialize with realistic-looking values
APP_CONFIG g_Config = {
    .dwVersion = 0x00020001,        // v2.1
    .dwFlags = 0x00000003,
    .dwLogLevel = 2,                // Warning level

    .szServerName = "api.contoso.com",
    .szBackupServer = "backup.contoso.com",
    .szUsername = "serviceaccount",
    .szDomain = "CONTOSO",

    .dwPort = 443,
    .dwTimeout = 30000,
    .dwRetryCount = 3,
    .dwMaxConnections = 100,

    .szLogPath = "C:\\ProgramData\\ContosoApp\\Logs\\",
    .szCachePath = "C:\\ProgramData\\ContosoApp\\Cache\\",
    .szTempPath = "C:\\Windows\\Temp\\ContosoApp\\",
    .szConfigPath = "C:\\ProgramData\\ContosoApp\\Config\\",

    .dwChecksum = 0x12345678,
};

// This ~3KB structure has entropy around 3.5
// Adds legitimacy AND reduces entropy

Encryption Algorithm Selection

Different encryption methods produce different entropy levels. The choice of algorithm affects both security and detectability:

#include <windows.h>

/*
 * ENCRYPTION ENTROPY COMPARISON
 * =============================
 * (byte-frequency entropy of the output, for inputs of several KB or more)
 *
 * Single-byte XOR:    same entropy as the plaintext
 *   - XOR with one byte only relabels byte values, so the
 *     frequency histogram keeps its shape
 *   - Pro: Adds no entropy
 *   - Con: Weak encryption, easily broken (only 255 useful keys)
 *
 * Multi-byte XOR:     usually between the plaintext's entropy and ~8.0
 *   - Longer, more random keys flatten the histogram further
 *   - Still not cryptographically secure
 *
 * RC4:                ~8.0 entropy
 *   - Stream cipher; output looks uniformly random
 *
 * AES-CBC:            ~8.0 entropy
 *   - Block cipher with chaining
 *
 * AES-CTR:            ~8.0 entropy
 *   - Counter mode; statistically indistinguishable from CBC output
 *
 * Small buffers measure lower: 1 KB of random bytes comes out near 7.8.
 */

// Single-byte XOR - preserves the plaintext's byte-frequency profile
void XorSingleByte(PBYTE pData, SIZE_T sSize, BYTE bKey) {
    for (SIZE_T i = 0; i < sSize; i++) {
        pData[i] ^= bKey;
    }
    // Resulting entropy: unchanged from the plaintext
    // The byte histogram is permuted, not flattened
}

// Rolling XOR with multi-byte key - better distribution
void XorRollingKey(PBYTE pData, SIZE_T sSize, PBYTE pKey, SIZE_T sKeySize) {
    for (SIZE_T i = 0; i < sSize; i++) {
        pData[i] ^= pKey[i % sKeySize];
    }
    // Resulting entropy: usually higher than the plaintext's
    // Longer, more random keys push it toward 8.0
}

// For evasion: Choose simpler encryption + heavy padding
// Trade cryptographic strength for lower detection

Part 3: Compiler and Build Optimization

Beyond payload obfuscation, the compiler and linker settings dramatically affect how the final binary appears. Properly configured builds produce smaller, cleaner binaries that draw less attention.

Visual Studio Optimization Settings

RECOMMENDED BUILD SETTINGS FOR MINIMAL DETECTION
================================================

C/C++ → Optimization:
┌────────────────────────────────────────────────────────────────────┐
│ Optimization:               /O1 (Minimize Size)                   │
│                             or /Os (Favor Small Code)             │
│                                                                    │
│ Whole Program Optimization: Yes (/GL)                             │
│   - Allows cross-module inlining                                   │
│   - Removes unused code more aggressively                          │
│                                                                    │
│ Why: Smaller binaries have less code to analyze, fewer false      │
│      positive triggers from unused library code                    │
└────────────────────────────────────────────────────────────────────┘

C/C++ → Code Generation:
┌────────────────────────────────────────────────────────────────────┐
│ Security Check:        Disable (/GS-)                             │
│   - Removes stack canary code                                      │
│   - Reduces binary size and removes characteristic patterns        │
│                                                                    │
│ Control Flow Guard:    No                                          │
│   - Removes CFG metadata and checks                               │
│   - CFG presence might indicate "trying to be secure"              │
│                                                                    │
│ Runtime Library:       Multi-threaded (/MT) or None               │
│   - /MT: Static link, larger but no DLL dependency                │
│   - None: Custom entry, smallest possible                          │
└────────────────────────────────────────────────────────────────────┘

Linker → General:
┌────────────────────────────────────────────────────────────────────┐
│ Link Time Code Generation: Yes (/LTCG)                            │
│   - Works with /GL for whole-program optimization                  │
│                                                                    │
│ Enable Incremental Linking: No                                    │
│   - Removes padding and incremental link tables                    │
└────────────────────────────────────────────────────────────────────┘

Linker → Debugging:
┌────────────────────────────────────────────────────────────────────┐
│ Generate Debug Info: No                                            │
│   - Removes PDB path strings                                       │
│   - Removes debug symbols and line info                            │
└────────────────────────────────────────────────────────────────────┘

Linker → Optimization:
┌────────────────────────────────────────────────────────────────────┐
│ References: Yes (/OPT:REF)                                        │
│   - Removes unreferenced functions and data                        │
│                                                                    │
│ COMDAT Folding: Yes (/OPT:ICF)                                    │
│   - Merges identical code blocks                                   │
└────────────────────────────────────────────────────────────────────┘

Removing C Runtime Dependency

The C Runtime Library (CRT) adds significant code to binaries. By eliminating CRT dependencies, we dramatically reduce binary size and remove characteristic CRT initialization patterns that security products recognize.

// Tell linker to use custom entry point instead of CRT
#pragma comment(linker, "/ENTRY:CustomEntry")

// Disable default libraries
#pragma comment(linker, "/NODEFAULTLIB")

// Implement minimal versions of needed functions
void* __cdecl my_memcpy(void* dst, const void* src, size_t n) {
    unsigned char* d = (unsigned char*)dst;
    const unsigned char* s = (const unsigned char*)src;
    while (n--) {
        *d++ = *s++;
    }
    return dst;
}

void* __cdecl my_memset(void* dst, int val, size_t n) {
    unsigned char* d = (unsigned char*)dst;
    while (n--) {
        *d++ = (unsigned char)val;
    }
    return dst;
}

size_t __cdecl my_strlen(const char* s) {
    const char* p = s;
    while (*p) p++;
    return (size_t)(p - s);
}

// Replace malloc/free with Win32 heap functions
#define my_malloc(size) HeapAlloc(GetProcessHeap(), 0, size)
#define my_free(ptr) HeapFree(GetProcessHeap(), 0, ptr)

// Custom entry point - no CRT initialization
void __stdcall CustomEntry(void) {
    // Your code here
    // Must call ExitProcess when done (no automatic cleanup)

    ExitProcess(0);
}

Linker Directives for Minimal Binaries

// Merge sections to simplify PE structure
#pragma comment(linker, "/MERGE:.rdata=.text")
#pragma comment(linker, "/MERGE:.data=.text")

// Minimize section alignment (default is 4096)
#pragma comment(linker, "/ALIGN:16")
#pragma comment(linker, "/FILEALIGN:16")

// Remove unreferenced code and data
#pragma comment(linker, "/OPT:REF")
#pragma comment(linker, "/OPT:ICF")

// Disable incremental linking artifacts
#pragma comment(linker, "/INCREMENTAL:NO")

// Result: Binary with single .text section
// Simpler structure = less to analyze = fewer heuristic triggers

Part 4: Import Table Camouflage

The Import Address Table (IAT) reveals which APIs a program uses. Security products examine imports as strong indicators of functionality. Binaries importing VirtualAlloc, WriteProcessMemory, and CreateRemoteThread immediately attract scrutiny, while those importing MessageBox, CreateFile, and RegOpenKey appear benign.

Adding Innocent Imports

#include <windows.h>
#include <shlwapi.h>
#include <gdiplus.h>
#include <winmm.h>
#include <commdlg.h>

#pragma comment(lib, "shlwapi.lib")
#pragma comment(lib, "gdiplus.lib")
#pragma comment(lib, "winmm.lib")
#pragma comment(lib, "comdlg32.lib")

// Global volatile to prevent optimization
volatile BOOL g_bInitialized = FALSE;

// This function creates IAT entries but is never actually called
// The volatile global prevents the optimizer from removing it
__declspec(noinline) void CreateInnocentImports(void) {
    if (g_bInitialized) return;  // Never true at runtime

    // Graphics application imports (GDI+)
    Gdiplus::GdiplusStartupInput input;
    ULONG_PTR token;
    Gdiplus::GdiplusStartup(&token, &input, NULL);

    // File management imports (Shell)
    PathFileExistsA("config.ini");
    PathFindExtensionA("document.docx");
    PathCombineA(NULL, NULL, NULL);

    // Sound playback imports (WinMM)
    PlaySoundA(NULL, NULL, 0);
    waveOutGetNumDevs();

    // User interaction imports (Common Dialogs)
    OPENFILENAMEA ofn = { sizeof(ofn) };
    GetOpenFileNameA(&ofn);

    // Registry imports (normal app behavior)
    HKEY hKey;
    RegOpenKeyExA(HKEY_CURRENT_USER, "Software\\MyApp", 0, KEY_READ, &hKey);
    RegCloseKey(hKey);

    // Network imports (if application needs them)
    HINTERNET hInet = InternetOpenA("MyApp/1.0", INTERNET_OPEN_TYPE_DIRECT, NULL, NULL, 0);
    InternetCloseHandle(hInet);

    g_bInitialized = TRUE;
}

// Ensure function and its references survive optimization
#pragma optimize("", off)
volatile void* g_FunctionTable[] = {
    (void*)CreateInnocentImports,
};
#pragma optimize("", on)

IAT COMPOSITION STRATEGY
========================

Suspicious IAT (flagged):
┌────────────────────────────────────────────────────────────────────┐
│ kernel32.dll:                                                      │
│   VirtualAlloc         ◄── Memory allocation for code             │
│   VirtualProtect       ◄── Making memory executable               │
│   WriteProcessMemory   ◄── Process manipulation                   │
│   CreateRemoteThread   ◄── Code injection                         │
│                                                                    │
│ ntdll.dll:                                                        │
│   NtAllocateVirtualMemory  ◄── Direct syscall variant            │
│                                                                    │
│ Total: 5 imports, ALL suspicious → DETECTION                       │
└────────────────────────────────────────────────────────────────────┘

Camouflaged IAT (blends in):
┌────────────────────────────────────────────────────────────────────┐
│ kernel32.dll:                                                      │
│   CreateFileA          ◄── Normal file operations                 │
│   ReadFile, WriteFile  ◄── Normal file operations                 │
│   GetModuleHandle      ◄── Every app uses this                    │
│   VirtualAlloc         ◄── Still present, but...                  │
│                                                                    │
│ user32.dll:                                                       │
│   MessageBoxA          ◄── User notification                      │
│   GetWindowText        ◄── UI interaction                         │
│   SetWindowText        ◄── UI interaction                         │
│                                                                    │
│ gdi32.dll:                                                        │
│   CreateFont           ◄── Graphics rendering                     │
│   TextOut              ◄── Text display                           │
│                                                                    │
│ shell32.dll:                                                      │
│   ShellExecute         ◄── Normal app launcher                    │
│   SHGetFolderPath      ◄── Standard paths                         │
│                                                                    │
│ shlwapi.dll:                                                      │
│   PathFileExists       ◄── Path utilities                         │
│   PathCombine          ◄── Path utilities                         │
│                                                                    │
│ Total: 15+ imports, suspicious ones buried → HARDER TO DETECT      │
└────────────────────────────────────────────────────────────────────┘
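
From the detection side, the two IAT listings above differ less than the totals suggest. A check based on the proportion of suspicious imports is diluted by padding, but a check based on which names are present is not. The sketch below contrasts the two; the watchlist is taken from the "Suspicious IAT" box, and the function names are illustrative rather than any product's actual logic.

```python
# Names copied from the "Suspicious IAT" listing; a real watchlist would be
# larger and would be weighed together with other signals.
WATCHLIST = frozenset({
    "VirtualAlloc",
    "VirtualProtect",
    "WriteProcessMemory",
    "CreateRemoteThread",
    "NtAllocateVirtualMemory",
})

def watchlist_hits(imports):
    """Watchlisted API names that appear in the import list."""
    return sorted(set(imports) & WATCHLIST)

def watchlist_ratio(imports):
    """Fraction of distinct imports that are on the watchlist."""
    names = set(imports)
    return len(names & WATCHLIST) / len(names) if names else 0.0

bare = ["VirtualAlloc", "VirtualProtect",
        "WriteProcessMemory", "CreateRemoteThread"]
padded = bare + ["CreateFileA", "ReadFile", "WriteFile",
                 "MessageBoxA", "TextOutA", "PathFileExistsA"]

print(watchlist_ratio(bare))     # 1.0
print(watchlist_ratio(padded))   # 0.4
print(watchlist_hits(bare) == watchlist_hits(padded))  # True
```

The ratio falls from 1.0 to 0.4 once benign imports are added, while the presence-based result is identical for both lists.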

Part 5: Code Signing Considerations

Signed binaries receive preferential treatment from security products. The presence of a valid digital signature historically reduced scrutiny because obtaining a code signing certificate required identity verification. While this trust model has been abused, understanding code signing remains relevant.

Certificate Fundamentals

Code signing uses X.509 certificates to cryptographically bind a publisher identity to a binary. When Windows verifies a signature, it checks:

• The signature cryptographically matches the file's hash
• The certificate chain leads to a trusted root CA
• The certificate hasn't been revoked
• The timestamp (if present) validates the signing time

CODE SIGNING VERIFICATION CHAIN
===============================

┌─────────────────────────────────────────────────────────────────┐
│                     SIGNED EXECUTABLE                           │
│                                                                 │
│  PE File + Authenticode Signature                              │
│  ├── File hash (SHA-256)                                       │
│  ├── Signer's certificate                                      │
│  ├── Certificate chain                                         │
│  └── Timestamp (optional)                                      │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    VERIFICATION STEPS                           │
│                                                                 │
│ 1. Hash verification                                           │
│    • Calculate file hash                                       │
│    • Compare with signed hash                                  │
│    • Detect any modification                                   │
│                                                                 │
│ 2. Chain validation                                            │
│    Signer Cert → Intermediate CA → Root CA                     │
│    • Each link cryptographically verified                      │
│    • Root must be in Windows trusted store                     │
│                                                                 │
│ 3. Revocation check                                            │
│    • Query OCSP or CRL                                         │
│    • Ensure certificate not revoked                            │
│                                                                 │
│ 4. Timestamp validation (if present)                           │
│    • Proves signing occurred before timestamp                   │
│    • Signature valid even after cert expires                   │
└─────────────────────────────────────────────────────────────────┘
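
Before any of the verification steps above can run, an analysis tool has to find the signature. In a PE file the Authenticode blob is located through the Certificate Table entry (index 4) of the optional header's data directories. The sketch below only locates that entry; it performs none of the hash, chain, revocation, or timestamp checks, so a hit means "a signature blob is present", not "the file is validly signed". The function name `certificate_table` is illustrative.

```python
import struct

PE32_MAGIC = 0x10B       # 32-bit optional header
PE32_PLUS_MAGIC = 0x20B  # 64-bit optional header
CERT_TABLE_INDEX = 4     # position of the Certificate Table entry

def certificate_table(pe: bytes):
    """Return (file_offset, size) of the Certificate Table, or None."""
    if pe[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (pe_offset,) = struct.unpack_from("<I", pe, 0x3C)   # e_lfanew
    if pe[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    opt = pe_offset + 24                                # optional header
    (magic,) = struct.unpack_from("<H", pe, opt)
    if magic == PE32_MAGIC:
        dirs = opt + 96
    elif magic == PE32_PLUS_MAGIC:
        dirs = opt + 112
    else:
        raise ValueError("unknown optional header magic")
    # NumberOfRvaAndSizes sits immediately before the directory array
    (count,) = struct.unpack_from("<I", pe, dirs - 4)
    if count <= CERT_TABLE_INDEX:
        return None
    # Unlike the other directories, this entry holds a file offset, not an RVA
    offset, size = struct.unpack_from("<II", pe, dirs + 8 * CERT_TABLE_INDEX)
    if offset == 0 or size == 0:
        return None
    return offset, size
```

Full validation is normally left to the platform's own trust APIs or to tools such as signtool, which implement all four steps.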

Self-Signed Certificates

For testing or internal use, self-signed certificates can be created. Note that these provide minimal trust benefit since they're not backed by a trusted CA, and many security products specifically flag self-signed code.

# PowerShell: Create self-signed code signing certificate
$cert = New-SelfSignedCertificate `
    -Type CodeSigningCert `
    -Subject "CN=Contoso Software Inc, O=Contoso, C=US" `
    -KeyAlgorithm RSA `
    -KeyLength 2048 `
    -HashAlgorithm SHA256 `
    -CertStoreLocation Cert:\CurrentUser\My `
    -NotAfter (Get-Date).AddYears(5) `
    -TextExtension @("2.5.29.37={text}1.3.6.1.5.5.7.3.3")

# Export to PFX for use with signtool
$password = ConvertTo-SecureString -String "ExportPassword123!" -Force -AsPlainText
Export-PfxCertificate -Cert $cert -FilePath "codesign.pfx" -Password $password

# Sign executable
# signtool sign /f codesign.pfx /p ExportPassword123! /fd SHA256 myapp.exe

# OpenSSL: Create self-signed certificate
# Generate private key
openssl genrsa -out private.key 2048

# Create certificate signing request
openssl req -new -key private.key -out cert.csr \
    -subj "/CN=Contoso Software Inc/O=Contoso/C=US"

# Self-sign for 5 years
openssl x509 -req -days 1825 -in cert.csr -signkey private.key \
    -out certificate.crt -extfile <(echo "extendedKeyUsage=codeSigning")

# Convert to PFX
openssl pkcs12 -export -out codesign.pfx \
    -inkey private.key -in certificate.crt \
    -password pass:ExportPassword123!

Part 6: PE Structure Manipulation

The Portable Executable format offers opportunities for obfuscation through section manipulation. The way sections are organized, named, and populated affects both analysis tools and heuristic detection.

Section Merging

Merging multiple sections into one reduces the PE's complexity and removes some analysis opportunities:

// Merge all data into .text section
#pragma comment(linker, "/MERGE:.rdata=.text")
#pragma comment(linker, "/MERGE:.data=.text")

// Benefits:
// • Simpler PE structure (fewer sections to analyze)
// • Read-only data mixed with code (harder to identify strings)
// • Removes typical section entropy patterns

// Result: Binary with potentially just .text and .rsrc sections

Strategic Section Naming

Custom section names can be used to store payloads while appearing legitimate:

// Create custom section with innocent name
#pragma section(".config", read, write)
#pragma section(".cache", read)
#pragma section(".debug", read)  // Looks like debug info
#pragma section(".reloc2", read) // Looks like relocation data

// Place encrypted payload in innocuously-named section
__declspec(allocate(".config")) BYTE g_EncryptedPayload[4096] = { /* ... */ };

// Add low-entropy data to other sections to dilute overall entropy
__declspec(allocate(".cache")) char g_CacheData[8192] = "Configuration cache...";
__declspec(allocate(".debug")) char g_DebugStrings[4096] = "Debug information...";

Adding Legitimate Resources

The resource section provides excellent cover for payloads and dramatically increases perceived legitimacy:

// resources.rc - Resource script

// Version information (makes binary look professional)
1 VERSIONINFO
FILEVERSION 1,0,0,0
PRODUCTVERSION 1,0,0,0
FILEFLAGSMASK 0x3fL
FILEFLAGS 0x0L
FILEOS 0x40004L
FILETYPE 0x1L
FILESUBTYPE 0x0L
BEGIN
    BLOCK "StringFileInfo"
    BEGIN
        BLOCK "040904b0"
        BEGIN
            VALUE "CompanyName", "Contoso Corporation"
            VALUE "FileDescription", "System Configuration Utility"
            VALUE "FileVersion", "1.0.0.0"
            VALUE "InternalName", "sysconfig.exe"
            VALUE "LegalCopyright", "Copyright (C) 2024 Contoso Corp."
            VALUE "OriginalFilename", "sysconfig.exe"
            VALUE "ProductName", "System Configuration Tool"
            VALUE "ProductVersion", "1.0.0.0"
        END
    END
    BLOCK "VarFileInfo"
    BEGIN
        VALUE "Translation", 0x409, 1200
    END
END

// Application icon
1 ICON "app.ico"

// Application manifest (for proper Windows integration)
1 24 "app.manifest"

// Dialog templates (if applicable)
// Menu resources
// String tables
// All add size and legitimacy

Part 7: String Obfuscation

Strings are goldmines for analysts. Function names, URLs, file paths, and error messages immediately reveal a program's purpose. Effective obfuscation hides these strings from static analysis while preserving runtime functionality.

Compile-Time String Encryption

#include <windows.h>

// Encrypted string storage (XOR with varying key)
// "VirtualAlloc" encrypted with key 0x41
BYTE g_sVirtualAlloc[] = {
    0x17, 0x28, 0x33, 0x35, 0x36, 0x20, 0x2D,
    0x01, 0x2D, 0x2D, 0x2E, 0x22, 0x00
};

// "kernel32.dll" encrypted with key 0x41
BYTE g_sKernel32[] = {
    0x2A, 0x24, 0x33, 0x2D, 0x24, 0x2D, 0x72,
    0x73, 0x2F, 0x25, 0x2D, 0x2D, 0x00
};

// Runtime decryption function
char* DecryptString(PBYTE pEncrypted, SIZE_T sLength, BYTE bKey) {
    char* szDecrypted = (char*)HeapAlloc(GetProcessHeap(), 0, sLength);
    if (!szDecrypted) return NULL;

    for (SIZE_T i = 0; i < sLength; i++) {
        szDecrypted[i] = pEncrypted[i] ^ bKey;
    }

    return szDecrypted;
}

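// Usage
void Example(void) {
    char* szFunc = DecryptString(g_sVirtualAlloc, sizeof(g_sVirtualAlloc), 0x41);
    char* szMod = DecryptString(g_sKernel32, sizeof(g_sKernel32), 0x41);

    // Resolve the API only if both strings were decrypted successfully
    if (szFunc && szMod) {
        HMODULE hMod = GetModuleHandleA(szMod);
        FARPROC pFunc = hMod ? GetProcAddress(hMod, szFunc) : NULL;
        (void)pFunc;  // would be cast to the real prototype before use
    }

    // Clean up (either pointer may be NULL if its allocation failed)
    if (szFunc) HeapFree(GetProcessHeap(), 0, szFunc);
    if (szMod) HeapFree(GetProcessHeap(), 0, szMod);
}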
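
Seen from the analysis side, a single-byte XOR key offers little resistance: there are only 256 candidate keys, and trying each one against a suspicious buffer is cheap. The following is a minimal sketch in portable C, not taken from any particular tool. The function name xor_bruteforce_find and the choice to search for a known API name are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: try every single-byte XOR key against buf and
 * report the first key under which `needle` appears in the decoded data.
 * Returns the key (0-255), or -1 if no key reveals the needle. */
int xor_bruteforce_find(const unsigned char *buf, size_t len,
                        const char *needle)
{
    size_t nlen = strlen(needle);
    if (nlen == 0 || nlen > len)
        return -1;

    for (int key = 0; key < 256; key++) {
        for (size_t off = 0; off + nlen <= len; off++) {
            size_t i = 0;
            /* Compare decoded bytes against the needle one at a time */
            while (i < nlen &&
                   (unsigned char)(buf[off + i] ^ key) ==
                       (unsigned char)needle[i])
                i++;
            if (i == nlen)
                return key;
        }
    }
    return -1;
}
```

Running this over a data section with a short list of sensitive API names is usually enough to defeat a fixed single-byte key, which is why the Detection rating for string encryption depends heavily on the cipher and key handling actually used.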

Stack-Based String Construction

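Building a string on the stack one character at a time keeps it out of the binary's data sections. The characters are still present in the file as immediate operands of the store instructions in .text, and an optimizing compiler may merge the stores or place the bytes in .rdata, so the outcome depends on the compiler and its settings: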

#include <windows.h>

void GetModuleWithoutString(void) {
    // Build "kernel32.dll" on stack - never in .rdata
    char szModule[13];
    szModule[0]  = 'k';
    szModule[1]  = 'e';
    szModule[2]  = 'r';
    szModule[3]  = 'n';
    szModule[4]  = 'e';
    szModule[5]  = 'l';
    szModule[6]  = '3';
    szModule[7]  = '2';
    szModule[8]  = '.';
    szModule[9]  = 'd';
    szModule[10] = 'l';
    szModule[11] = 'l';
    szModule[12] = '\0';

    HMODULE hKernel32 = GetModuleHandleA(szModule);

    // String only exists on stack during function execution
    // Static analysis of .rdata won't find it
}

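// Macro for cleaner stack string definition
// Caveat: a compiler may implement this initializer as a copy from a
// constant in .rdata, so the generated code needs to be inspected
#define STACK_STR(name, ...) \
    char name[] = { __VA_ARGS__, '\0' }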

void Example2(void) {
    STACK_STR(ntdll, 'n','t','d','l','l','.','d','l','l');
    HMODULE hNtdll = GetModuleHandleA(ntdll);
}
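
Because each character ends up as the immediate operand of a store instruction, stack strings can often be recovered by scanning code bytes for runs of such stores. The sketch below is a simplified illustration in portable C: it recognizes only two x86-64 encodings (byte stores relative to rbp or rsp with an 8-bit displacement) and ignores the wider stores that optimized builds tend to produce. The function names are hypothetical, and real tools use a full disassembler or emulator instead of raw pattern matching.

```c
#include <stddef.h>

/* Length of a byte-store instruction at p, or 0 if p does not start one.
 * On success *imm receives the immediate byte being stored.
 *   C6 45 dd ii      mov byte ptr [rbp+disp8], imm8
 *   C6 44 24 dd ii   mov byte ptr [rsp+disp8], imm8 */
static size_t match_byte_store(const unsigned char *p, size_t avail,
                               unsigned char *imm)
{
    if (avail >= 4 && p[0] == 0xC6 && p[1] == 0x45) {
        *imm = p[3];
        return 4;
    }
    if (avail >= 5 && p[0] == 0xC6 && p[1] == 0x44 && p[2] == 0x24) {
        *imm = p[4];
        return 5;
    }
    return 0;
}

/* Hypothetical scanner: find the first run of at least min_len
 * back-to-back byte stores in code[0..len) and copy the stored bytes
 * into out (NUL-terminated, at most out_cap - 1 characters).
 * Returns the number of characters copied, or 0 if no run was found. */
size_t recover_stack_string(const unsigned char *code, size_t len,
                            size_t min_len, char *out, size_t out_cap)
{
    if (out_cap == 0)
        return 0;

    for (size_t start = 0; start < len; start++) {
        size_t pos = start, count = 0, ilen;
        unsigned char imm;

        while (pos < len &&
               (ilen = match_byte_store(code + pos, len - pos, &imm)) != 0) {
            if (imm == 0)              /* terminator store ends the string */
                break;
            if (count < out_cap - 1)
                out[count] = (char)imm;
            count++;
            pos += ilen;
        }
        if (count >= min_len) {
            size_t n = count < out_cap - 1 ? count : out_cap - 1;
            out[n] = '\0';
            return n;
        }
    }
    out[0] = '\0';
    return 0;
}
```

A minimum run length of four or five characters keeps incidental byte stores from being reported as strings.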

Part 8: Defense and Detection

Understanding obfuscation helps defenders recognize when it's being employed. Several characteristics indicate binary obfuscation:

OBFUSCATION DETECTION INDICATORS
================================

┌─────────────────────────────┬────────────────────────────────────────┐
│ Indicator                   │ Detection Method                       │
├─────────────────────────────┼────────────────────────────────────────┤
│ Entropy padding             │ Section-by-section entropy analysis    │
│                             │ Look for sharp entropy boundaries      │
│                             │ Text blocks adjacent to random data    │
├─────────────────────────────┼────────────────────────────────────────┤
│ IAT camouflage              │ Behavioral analysis: track which       │
│                             │ imported APIs are actually called      │
│                             │ vs. just present in IAT                │
├─────────────────────────────┼────────────────────────────────────────┤
│ Missing CRT                 │ Lack of standard CRT initialization    │
│                             │ patterns, unusual entry point          │
├─────────────────────────────┼────────────────────────────────────────┤
│ String obfuscation          │ Entropy analysis of .rdata             │
│                             │ Runtime API monitoring for strings     │
├─────────────────────────────┼────────────────────────────────────────┤
│ Self-signed certificate     │ Issuer == Subject in signature         │
│                             │ Unknown CA in chain                    │
├─────────────────────────────┼────────────────────────────────────────┤
│ Unusual section structure   │ Merged sections, non-standard names    │
│                             │ Executable sections that aren't .text  │
└─────────────────────────────┴────────────────────────────────────────┘
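
The "sharp entropy boundaries" indicator can be checked by measuring entropy over fixed-size windows and flagging large jumps between neighbours. The following is a minimal sketch in portable C under stated assumptions: the window size and jump threshold are parameters the analyst must tune, the function names are hypothetical, and the small log2 helper exists only so the example builds without linking the math library (log2 from math.h is the natural replacement).

```c
#include <stddef.h>

/* Base-2 logarithm for x > 0, computed without libm: normalize x into
 * [1, 2), then extract fractional bits by repeated squaring. */
static double log2_bits(double x)
{
    double result = 0.0, frac = 0.5;

    while (x >= 2.0) { x /= 2.0; result += 1.0; }
    while (x < 1.0)  { x *= 2.0; result -= 1.0; }
    for (int i = 0; i < 40; i++) {
        x *= x;
        if (x >= 2.0) { x /= 2.0; result += frac; }
        frac /= 2.0;
    }
    return result;
}

/* Shannon entropy of data[0..len) in bits per byte (0.0 to 8.0). */
double shannon_entropy(const unsigned char *data, size_t len)
{
    size_t counts[256] = {0};
    double h = 0.0;

    if (len == 0)
        return 0.0;
    for (size_t i = 0; i < len; i++)
        counts[data[i]]++;
    for (int b = 0; b < 256; b++) {
        if (counts[b] != 0) {
            double p = (double)counts[b] / (double)len;
            h -= p * log2_bits(p);
        }
    }
    return h;
}

/* Hypothetical boundary finder: walk the buffer in non-overlapping
 * windows and return the offset of the first window whose entropy
 * differs from the previous window's by more than `jump` bits/byte.
 * Returns (size_t)-1 if no such boundary exists. */
size_t find_entropy_boundary(const unsigned char *data, size_t len,
                             size_t window, double jump)
{
    if (window == 0 || len < 2 * window)
        return (size_t)-1;

    double prev = shannon_entropy(data, window);
    for (size_t off = window; off + window <= len; off += window) {
        double cur = shannon_entropy(data + off, window);
        double diff = cur > prev ? cur - prev : prev - cur;
        if (diff > jump)
            return off;
        prev = cur;
    }
    return (size_t)-1;
}
```

Applied per section, a boundary where low-entropy text sits directly next to near-random data is a reasonable trigger for closer inspection, though compressed resources in legitimate software can produce the same pattern.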

YARA Detection Rules

// The rules below use the pe and math modules
import "pe"
import "math"

rule High_Entropy_PE_Section {
    meta:
        description = "Detects PE files with high-entropy sections"

    condition:
        uint16(0) == 0x5A4D and  // MZ header
        for any section in pe.sections : (
            math.entropy(section.raw_data_offset, section.raw_data_size) > 7.0
        )
}

rule Entropy_Padding_Suspected {
    meta:
        description = "Detects suspected entropy padding patterns"

    strings:
        // Large blocks of English text mixed with binary
        $text1 = "The quick brown fox" ascii
        $text2 = "Lorem ipsum dolor sit" ascii
        $text3 = "Microsoft Windows" ascii

    condition:
        uint16(0) == 0x5A4D and
        2 of ($text*) and
        math.entropy(0, filesize) > 5.5 and
        math.entropy(0, filesize) < 6.5
}

rule Self_Signed_Code {
    meta:
        description = "Detects self-signed Authenticode signature"

    condition:
        uint16(0) == 0x5A4D and
        pe.number_of_signatures > 0 and
        pe.signatures[0].issuer == pe.signatures[0].subject
}

rule IAT_Camouflage_Pattern {
    meta:
        description = "Detects IAT with suspicious + innocent imports"

    strings:
        // Suspicious APIs
        $sus1 = "VirtualAlloc" ascii
        $sus2 = "VirtualProtect" ascii
        $sus3 = "WriteProcessMemory" ascii

        // Innocent padding APIs
        $inn1 = "MessageBox" ascii
        $inn2 = "GetOpenFileName" ascii
        $inn3 = "PathFileExists" ascii

    condition:
        uint16(0) == 0x5A4D and
        all of ($sus*) and
        2 of ($inn*)
}

Summary: The Obfuscation Toolkit

Binary obfuscation is a layered discipline. No single technique provides complete protection, but combining multiple approaches creates significant analysis friction.

TECHNIQUE EFFECTIVENESS MATRIX
==============================

┌──────────────────────────┬──────────────┬────────────┬───────────────┐
│ Technique                │ Effectiveness│ Complexity │ Detection     │
├──────────────────────────┼──────────────┼────────────┼───────────────┤
│ English text padding     │ High         │ Low        │ Medium        │
│ Same-byte padding        │ Medium       │ Low        │ Easy          │
│ Structured fake data     │ High         │ Medium     │ Hard          │
│ CRT removal              │ High         │ High       │ Medium        │
│ IAT camouflage           │ Medium       │ Low        │ Medium        │
│ Section manipulation     │ Medium       │ Medium     │ Easy          │
│ String encryption        │ High         │ Medium     │ Medium        │
│ Stack strings            │ High         │ Low        │ Hard          │
│ Code signing             │ Varies       │ Medium     │ Easy*         │
└──────────────────────────┴──────────────┴────────────┴───────────────┘

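* The Detection column rates how hard each technique is for defenders to spot. Self-signed certificates are easy to detect; certificates issued by a trusted CA are not.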

Best Practices:

Layer multiple techniques for defense in depth
Target overall entropy between 4.5 and 6.5 bits/byte
Make IAT composition match the purported application type
Test against common AV/EDR products in isolated environments
Regularly update techniques as detection evolves

References

Claude Shannon: "A Mathematical Theory of Communication" (1948)
Microsoft PE/COFF Specification
Microsoft Authenticode Documentation
"Practical Malware Analysis" by Sikorski & Honig
MITRE ATT&CK: T1027 (Obfuscated Files or Information)
