← black metal kernel — series index

// black metal kernel — episode 07 of 08

the page table:
direct address space construction

// kernel programming in rust — zero-cost abstractions — no gc — no mercy

CR3 paging huge pages x86-64 MMU

// 07 — mapping virtual execution to physical reality

x86-64 long mode enforces a rigid four-level paging hierarchy. every memory access the cpu makes is translated on the fly by the hardware memory management unit (mmu). without explicit control of this mapping, the kernel runs blind, at the mercy of whatever the bootloader loaded into `CR3`. to own the address space, you must rewrite reality from the ground up, manipulating page map level 4 (pml4) entries directly.

mapping memory with 2mb huge pages instead of the standard 4kb pages significantly reduces translation lookaside buffer (tlb) pressure: each entry covers 512 times as much memory, so far fewer translations compete for the cache. establishing a direct map of all physical memory in a high-half virtual window lets the kernel dereference any physical frame through a single fixed offset.
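the direct map reduces phys-to-virt conversion to one addition. a minimal sketch — the `BLACK_DIRECT_MAP_BASE` offset below is a hypothetical choice for illustration, not something the hardware or this kernel mandates:

```rust
// hypothetical high-half base for the direct map of physical memory.
// any canonical high-half address with room for all of ram would do.
const BLACK_DIRECT_MAP_BASE: u64 = 0xffff_8880_0000_0000;

// once the direct map is established, any physical frame is
// dereferenceable at a fixed offset — no table walk in software.
fn black_phys_to_virt(phys: u64) -> u64 {
    BLACK_DIRECT_MAP_BASE + phys
}
```

the inverse (virt minus base) only holds for addresses inside the direct-map window; kernel text and stacks live elsewhere.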

pub const BLACK_PAGE_PRESENT:  u64 = 1 << 0;
pub const BLACK_PAGE_WRITABLE: u64 = 1 << 1;
pub const BLACK_PAGE_HUGE:     u64 = 1 << 7;

#[repr(C, align(4096))]
pub struct BlackPageTable {
    black_entries: [u64; 512],
}

impl BlackPageTable {
    pub const fn black_zeroed() -> Self {
        Self {
            black_entries: [0; 512],
        }
    }

    /// install a 2mb huge-page mapping. `black_phys` must be
    /// 2mb-aligned — its low 21 bits overlap the flag bits otherwise.
    pub fn black_map_huge(&mut self, black_index: usize, black_phys: u64, black_flags: u64) {
        self.black_entries[black_index] = black_phys | black_flags | BLACK_PAGE_HUGE | BLACK_PAGE_PRESENT;
    }

    /// load this table into CR3.
    ///
    /// safety: CR3 takes the table's *physical* address. casting
    /// `self` yields its virtual address, which is only correct
    /// while the table is identity mapped (virtual == physical),
    /// as it typically is early in boot.
    pub unsafe fn black_load(&self) {
        let black_addr = self as *const _ as u64;
        core::arch::asm!(
            "mov cr3, {}",
            in(reg) black_addr,
            options(nostack, preserves_flags)
        );
    }
}
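a usage sketch for the structure above: identity-map the first 1gb with 512 two-megabyte entries in a single page directory. constants are restated here so the sketch stands alone; the bit values mirror the `BLACK_PAGE_*` flags above:

```rust
// flag bits, identical to BLACK_PAGE_PRESENT / WRITABLE / HUGE above
const PRESENT: u64 = 1 << 0;
const WRITABLE: u64 = 1 << 1;
const HUGE: u64 = 1 << 7;
const HUGE_PAGE_SIZE: u64 = 2 * 1024 * 1024;

// fill one page directory so that virtual == physical for the
// first 512 * 2mb = 1gb of memory.
fn black_identity_pd() -> [u64; 512] {
    let mut pd = [0u64; 512];
    for (i, entry) in pd.iter_mut().enumerate() {
        // each entry maps one 2mb-aligned physical region
        *entry = (i as u64 * HUGE_PAGE_SIZE) | HUGE | WRITABLE | PRESENT;
    }
    pd
}
```

entry 0 comes out as `0x83` (present | writable | huge, base 0), entry 1 as `0x200083`, and so on up the physical address line.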
// expanded — paging structure, CR3, and TLB caches

x86-64 long mode requires paging. a virtual address is sliced into four 9-bit indices and a 12-bit offset. each 9-bit index selects an entry in a 512-slot table: PML4, PDPT, PD, and PT. the BlackPageTable struct mirrors this layout exactly, and `align(4096)` is load-bearing: page tables must sit on exact 4KB boundaries, because the hardware treats the low 12 bits of a table pointer as flag bits. an unaligned table would have its address silently truncated, and every walk through it would land in the wrong physical page.
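the slicing above can be written down directly. a sketch of the decomposition the mmu performs in hardware — pure bit arithmetic, no hardware state touched:

```rust
// slice a 64-bit virtual address into (pml4, pdpt, pd, pt, offset),
// exactly as the four-level walk consumes it. the sign-extension
// bits 48..63 are discarded by the 9-bit masks.
fn black_decompose(virt: u64) -> (usize, usize, usize, usize, usize) {
    let offset = (virt & 0xfff) as usize;         // bits 0..11
    let pt     = ((virt >> 12) & 0x1ff) as usize; // bits 12..20
    let pd     = ((virt >> 21) & 0x1ff) as usize; // bits 21..29
    let pdpt   = ((virt >> 30) & 0x1ff) as usize; // bits 30..38
    let pml4   = ((virt >> 39) & 0x1ff) as usize; // bits 39..47
    (pml4, pdpt, pd, pt, offset)
}
```

for a 2mb huge page the walk stops at the pd level, and the pt index simply folds into a 21-bit offset.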

in modern systems, cache misses are lethal. a full four-level walk costs up to four dependent memory reads before the actual data is touched. the translation lookaside buffer (tlb) caches resolved translations in hardware, but it is small — on the order of a few thousand entries. mapping physical regions as 2MB huge pages skips the final level of the walk entirely, and each tlb entry covers 512 times as much memory.
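the pressure argument is pure arithmetic. a back-of-envelope sketch — the 1536-entry tlb size below is a hypothetical figure for illustration; real sizes vary by microarchitecture:

```rust
// hypothetical unified tlb capacity; actual hardware differs.
const BLACK_TLB_ENTRIES: u64 = 1536;

// total memory the tlb can cover without a miss, for a given page size
fn black_tlb_reach(page_size: u64) -> u64 {
    BLACK_TLB_ENTRIES * page_size
}
```

with 4kb pages that reach is 6mb; with 2mb huge pages it is 3gb. any working set beyond the reach pays walk latency on every cold translation.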

when we write the PML4 base address into the `CR3` control register, translation switches instantly: the very next instruction fetch already goes through the new hierarchy. a `CR3` write also flushes all non-global tlb entries — but only on the current core. other cores keep their stale translations until the kernel evicts them explicitly (a tlb shootdown). the execution is immediate, ruthless, and unforgiving of logical voids.
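one detail worth making concrete: `CR3` is not a bare pointer. its low 12 bits carry flags (and a PCID when that feature is enabled), never address bits — which is exactly why page tables must be 4kb-aligned. a sketch of recovering the PML4 base from a raw `CR3` value:

```rust
// cr3 layout: bits 12..51 hold the pml4 physical base; bits 0..11
// are flag/pcid bits. masking them off recovers the table address.
fn black_cr3_base(cr3: u64) -> u64 {
    cr3 & !0xfff
}
```

this is also why `black_load` can or flags into the value it writes without corrupting the base — as long as the table itself is aligned.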

// 07 / 08 — black_ptr owns its truth — BLACK0X80