
// black metal kernel — episode 01 of 08

the foundation:
no_std and raw control

// kernel programming in rust — zero-cost abstractions — no gc — no mercy

no_std no_main AtomicBool spinlock TTAS acquire/release

// 00 — the philosophy

the kernel does not ask permission the kernel does not borrow check its feelings the kernel owns memory the way a black hole owns light — absolutely and without negotiation rust enters this domain not as a guest but as a replacement for the uncertainty that c left behind and what rust offers is not safety rails but rather a different kind of violence — the violence of types that guarantee invariants at compile time so that the runtime never has to guess

zero-cost abstractions are not marketing they are a contract between you and the compiler — if you express a higher-level concept the machine code produced must be identical to what you would have written by hand if it is not then the abstraction costs something and in kernel space cost is death measured in microseconds and cache misses and interrupt latency

// expanded — why rust belongs in the kernel

the kernel is the first code that executes when a machine boots — before any userspace process, before libc, before the scheduler itself exists. it speaks directly to hardware: it manages physical memory page by page, handles CPU exceptions, schedules every thread that will ever run on the system. a bug here does not throw an exception and unwind a stack. it corrupts state silently, or triggers a hard fault that locks the machine. there is no recovery path. there is no second chance. C dominated this space for fifty years and it earned its position — it compiles to tight, predictable machine code, it has no hidden runtime, and it maps directly onto the hardware memory model. but C carries a structural flaw: it gives you a pointer and trusts you completely. use-after-free (accessing memory after the owner freed it), double-free (freeing the same allocation twice), data races between CPU cores (two cores writing the same location without synchronization), buffer overflows into adjacent structs — all of these are valid C. all of them compile without warnings. all of them are responsible for the overwhelming majority of kernel CVEs across every major operating system.

Rust's answer is not a runtime check and not a garbage collector. it is a type system that makes these mistakes structurally impossible to express. the compiler refuses to produce a binary if ownership rules are violated. the resulting machine code is identical to what you would have written in C. the safety cost at runtime is exactly zero — because the runtime never sees it. it is entirely a compile-time property. zero-cost abstractions means exactly this: when you write a generic type or a higher-order function, the compiler monomorphizes it — specializes it for each concrete type it is used with — and inlines aggressively. the CPU never executes generics. it executes flat, specialized machine code indistinguishable from what a skilled C programmer would have written by hand. if the abstraction costs anything at runtime, it is not zero-cost and Rust's contract with you is broken. in kernel space, where you measure latency in nanoseconds and cache pressure in bytes, this contract is the only kind worth signing.

// 01 — the foundation: no_std and raw control

before you write a single line of kernel code you declare war on the standard library — you strip it away and you are left with what the machine actually is which is registers and memory and nothing else this is where rust becomes extraordinary because its core type system survives the stripping intact — ownership borrowing lifetimes all of it remains and you get to use them with absolute control over raw hardware

#![no_std]
#![no_main]

use core::panic::PanicInfo;
use core::sync::atomic::{AtomicBool, Ordering};

// a no_std binary must supply its own panic handler or it will not link
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

#[repr(C)]
pub struct BlackSpinlock {
    black_flag: AtomicBool,
}

impl BlackSpinlock {
    pub const fn black_new() -> Self {
        BlackSpinlock {
            black_flag: AtomicBool::new(false),
        }
    }

    #[inline(always)]
    pub fn black_acquire(&self) {
        while self.black_flag
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            while self.black_flag.load(Ordering::Relaxed) {
                core::hint::spin_loop();
            }
        }
    }

    #[inline(always)]
    pub fn black_release(&self) {
        self.black_flag.store(false, Ordering::Release);
    }
}

// note: AtomicBool is already Send + Sync, so the compiler derives both
// for BlackSpinlock automatically — explicit unsafe impls are unnecessary

// the two-phase spin is not decoration — outer loop pays the compare_exchange cost once on contention then the inner relaxed load burns no bus bandwidth until the lock frees — this is the difference between a spinlock that works and one that serializes your entire memory bus

// expanded — no_std, atomics, and the two-phase spinlock dissected

#![no_std] is a declaration of intent: you are telling the Rust compiler to strip the standard library entirely. gone are String, Vec, HashMap, println!, and every abstraction that requires a heap allocator or an OS beneath it. what remains is core — the pure language primitives: ownership, lifetimes, traits, generics, atomic types. the machine does not vanish. its representation in Rust's type system does not vanish either. only the comfortable userspace wrappers are gone. #![no_main] removes the assumption that your entry point is the standard fn main(). in a kernel, the entry point is determined by the linker script and called by the bootloader. you define it yourself — often a naked extern function at a fixed address — and you are responsible for everything that happens before it, including setting up the stack pointer and zeroing the BSS segment.
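the entry-point shape described above can be sketched like this — a freestanding skeleton, not a buildable kernel on its own (it still needs a linker script and a bare-metal target spec, and the symbol name `_start` is an assumption that depends on what your bootloader jumps to):

```rust
#![no_std]
#![no_main]

use core::panic::PanicInfo;

// the bootloader jumps here; `_start` is whatever symbol your linker
// script exports as the entry point — an assumption, not a fixed rule
#[no_mangle]
pub extern "C" fn _start() -> ! {
    // at this point you own everything: stack pointer, BSS, page tables —
    // nothing has been set up for you unless you or the bootloader did it
    loop {}
}

// no_std strips the default panic machinery, so you must supply it
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
```

the two diverging (`-> !`) signatures are load-bearing: there is nowhere to return to before an OS exists.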

a spinlock is the simplest possible mutual exclusion primitive: one boolean. locked or unlocked. a thread that wants the lock spins — burns CPU cycles in a tight loop — until it can claim the bit. no scheduler involvement. no context switch. this is the right choice in kernel contexts where blocking (yielding to a scheduler) is either impossible (the scheduler is not yet initialized) or too expensive (you are inside an interrupt handler).

the two-phase design matters on SMP systems. the outer loop uses compare_exchange_weak — an atomic read-modify-write that attempts to flip false to true. the weak form is allowed to fail spuriously, which costs nothing inside a retry loop and compiles to cheaper code on LL/SC architectures like ARM. on x86-64 it compiles to LOCK CMPXCHG, which asserts exclusive ownership of the cache line, broadcasts an invalidation to all other cores, and waits for acknowledgment. this generates coherency traffic on the memory bus. if every spinning core repeatedly fires this instruction, every failed attempt generates a bus storm — the interconnect becomes the bottleneck, not the work. the inner loop fixes this: switch to a plain relaxed load once the CAS fails. reads do not require cache-line exclusivity — the line stays shared across all spinning cores, no bus traffic. the expensive outer CAS is only re-entered when the load detects the lock actually changed state. this is the TTAS (test-and-test-and-set) pattern — the correct baseline spinlock for multi-core systems, with fancier designs (ticket locks, MCS locks) layering fairness on top of the same idea.

Ordering::Acquire on the successful CAS creates a memory barrier: no load or store that appears after the CAS in source order can be reordered before it by the CPU or the compiler. the lock acquisition is a hard ordering fence. Ordering::Release on the store in release() creates the matching barrier in the other direction: every write made inside the critical section is guaranteed visible to the next thread whose acquiring CAS observes the lock as free. these two orderings together form the standard acquire-release protocol that underpins every correct concurrent data structure.

// 01 / 08 — black_ptr owns its truth — BLACK0X80