AMI-Aptio-BIOS-Reversed / FpgaErrorHandler / FpgaErrorHandler_analysis.md
@Ajax Dong Ajax Dong 2 days ago 16 KB Init

FpgaErrorHandler Module

Overview

This SMM driver monitors and handles FPGA (Field Programmable Gate Array) errors on the Intel Purley platform. It runs in System Management Mode (SMM), reads the FPGA error registers via MMIO, and acknowledges errors or resets the system when a critical FPGA error is detected. It also uses the MpSyncData library for multi-processor synchronization.

Address Range

0x280 - 0x1E80 (0x1C00 bytes .text, 44 functions)

Key Functions

| Address | Name | Purpose |
|---|---|---|
| 0x514 | _ModuleEntryPoint | SMM module entry point, initializes error handler |
| 0x5C0 | sub_5C0 | Auto-generated UEFI init: saves ImageHandle, SystemTable, BootServices, RuntimeServices, gSmst |
| 0xEAC | sub_EAC | Main FPGA error handler registration -- locates protocols, registers callbacks |
| 0xDFC | sub_DFC | Error status collection callback -- reads FPGA error status registers per socket |
| 0xD48 | sub_D48 | Error polling function -- checks FPGA error bits and triggers correction |
| 0xCB4 | sub_CB4 | Fatal error handler -- logs error, writes GPIO, triggers warm reset via 0xCF9 |
| 0xC90 | sub_C90 | Error status query -- checks if a specific error bit is set |
| 0xB48 | sub_B48 | Error clear function -- clears FPGA error status registers |
| 0xBF0 | sub_BF0 | Error buffer clear function -- resets FPGA error buffer to zero |
| 0xB38 | sub_B38 | FPGA error presence check -- returns whether any FPGA error is active |
| 0xA30 | sub_A30 | Error logging function -- writes error info via MpSyncData protocol |
| 0x1580 | sub_1580 | MpSyncData library initialization -- sets up CPU topology tracking |

Entry Points (Public API)

  • 0x514 _ModuleEntryPoint: Standard UEFI SMM driver entry point. Saves boot services, locates protocols, registers FPGA error callbacks, and initializes error monitoring.

Internal Helpers

UEFI Infrastructure (Auto-generated or library code)

  • 0x5C0 sub_5C0: UEFI boot services initialization. Saves ImageHandle, SystemTable, gBS, gRT, gSmst pointers from the UEFI system table. Generated by AutoGen.c
  • 0x280 sub_280: SetJump implementation -- saves CPU register context (GP registers, XMM, MXCSR) into a jump buffer structure for non-local goto. Used to protect the entry point during error handling.
  • 0x320 sub_320: LongJump implementation -- restores CPU context from a jump buffer and longjmps back. Used to recover from errors during initialization.
  • 0x11E0 sub_11E0: SetJump validation wrapper -- validates jump buffer alignment (8-byte aligned) and non-null.
  • 0x3A0 sub_3A0: ZeroMem -- zero-fills memory buffers (aligned + tail).
  • 0x12FC sub_12FC: ZeroMem wrapper with validation -- validates buffer non-null and length bounds.
  • 0x430 sub_430: RDTSC wrapper -- reads timestamp counter.
  • 0x420 sub_420: _mm_pause wrapper -- CPU spin-loop hint.
  • 0x4A0 sub_4A0: _enable() -- enable interrupts.
  • 0x4B0 sub_4B0: _disable() -- disable interrupts.
  • 0x4C0 sub_4C0: __getcallerseflags() -- read EFLAGS.
  • 0x440 sub_440: __cpuid wrapper (leaf-based) -- CPUID with EAX input, returns EAX/EBX/ECX/EDX.
  • 0x470 sub_470: __cpuid wrapper (function-based) -- CPUID with query type input.

MMIO/I/O Access Wrappers

  • 0x128C sub_128C: 64-bit MMIO read -- reads a QWORD from an MMIO address with alignment check.
  • 0x12BC sub_12BC: 64-bit MMIO write -- writes a QWORD to an MMIO address with alignment check.
  • 0x1228 sub_1228: 16-bit MMIO read -- reads a WORD from an MMIO address with alignment check.
  • 0x1258 sub_1258: 16-bit MMIO write (constant 0x500) -- writes 0x500 to a WORD MMIO address.
  • 0x1D54 sub_1D54: __indword -- reads a DWORD from an I/O port.
  • 0x1D24 sub_1D24: Unaligned read -- reads a QWORD from potentially unaligned address.
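
The aligned 64-bit accessors (sub_128C and sub_12BC) can be pictured with the minimal sketch below. The function names and the choice to return a flag are assumptions made for illustration; the firmware build may well assert on a misaligned address instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for sub_128C: read a QWORD from an MMIO address.
 * Returns false, leaving *value untouched, if the address is not 8-byte aligned. */
static bool mmio_read64(uintptr_t address, uint64_t *value)
{
    if ((address & 7u) != 0) {
        return false;
    }
    /* volatile keeps the compiler from caching or eliding the access */
    *value = *(volatile const uint64_t *)address;
    return true;
}

/* Hypothetical stand-in for sub_12BC: write a QWORD to an MMIO address. */
static bool mmio_write64(uintptr_t address, uint64_t value)
{
    if ((address & 7u) != 0) {
        return false;
    }
    *(volatile uint64_t *)address = value;
    return true;
}
```

An ordinary aligned variable can stand in for a device register when exercising the helpers off-target.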

PCI Express Helpers

  • 0x1544 sub_1544: PCI Express MMIO address translation -- validates PCIe address (upper bits must be zero) and adds the MMIO base from qword_2E68.
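  • 0x143C sub_143C: MMIO read via PciRootBridge protocol -- reads a DWORD from PCI config space at offset 0xF8000 (1015808) using the protocol at qword_2E58. In the ECAM-style encoding (bus << 20 | device << 15 | function << 12 | register) this offset is Bus 0, Device 31, Function 0, Register 0, which matches the PCH LPC device that sub_1C00 queries.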
  • 0x1C00 sub_1C00: PCH info query -- reads LPC device ID to determine PCH SKU, validates it (checks for supported PCH), returns PCH-specific data from unk_2B00.
  • 0x1BA0 sub_1BA0: GPIO value read -- reads a GPIO value via MMIO (address 0xFD000148 + shifted offset) using PCH info.
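
The address handling in sub_1544 and the offset used by sub_143C can be sketched as follows. The helper names and the 28-bit offset mask are assumptions based on the standard PCIe ECAM layout (256 buses x 32 devices x 8 functions x 4 KiB); the exact mask the binary tests is not given in this analysis.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCIE_OFFSET_MASK 0x0FFFFFFFu  /* assumed: bits above 27 must be zero */

typedef struct {
    uint8_t  bus;
    uint8_t  device;
    uint8_t  function;
    uint16_t reg;
} pcie_bdf_t;

/* Split an ECAM-style offset into bus/device/function/register. */
static pcie_bdf_t pcie_decode(uint32_t offset)
{
    pcie_bdf_t r;
    r.bus      = (uint8_t)((offset >> 20) & 0xFFu);
    r.device   = (uint8_t)((offset >> 15) & 0x1Fu);
    r.function = (uint8_t)((offset >> 12) & 0x07u);
    r.reg      = (uint16_t)(offset & 0xFFFu);
    return r;
}

/* Sketch of sub_1544: reject offsets with upper bits set, then add the
 * MMIO base (the value the module keeps in qword_2E68). */
static bool pcie_translate(uint64_t mmio_base, uint64_t offset, uint64_t *address)
{
    if ((offset & ~(uint64_t)PCIE_OFFSET_MASK) != 0) {
        return false;
    }
    *address = mmio_base + offset;
    return true;
}
```

Decoding 0xF8000 with `pcie_decode` gives bus 0, device 31, function 0, register 0.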

Memory Management

  • 0x13D4 sub_13D4: Free pool -- frees the allocated buffer at qword_23FF0. Uses either InternalSmmFreePool or gBS->FreePool depending on context.
  • 0x1360 sub_1360: MM RAM range check -- checks if an address is within any registered SMM memory region.
  • 0x13A4 sub_13A4: SMM allocate pool -- allocates memory via the SMM memory allocator.
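
A range check like the one in sub_1360 is presumably what lets the free routine choose between the SMM allocator and gBS->FreePool. The sketch below assumes a simple {start, size} descriptor; the real descriptor layout is not shown in this analysis.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one registered SMM memory region. */
typedef struct {
    uint64_t start;
    uint64_t size;
} smram_range_t;

/* Return true if the address falls inside any registered region. */
static bool in_smram(const smram_range_t *ranges, size_t count, uint64_t address)
{
    for (size_t i = 0; i < count; i++) {
        /* Subtracting first avoids overflow in start + size */
        if (address >= ranges[i].start &&
            address - ranges[i].start < ranges[i].size) {
            return true;
        }
    }
    return false;
}
```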

Protocol Lookups (Singleton pattern)

  • 0x10C8 sub_10C8: Lazy-loads debug logging protocol at GUID unk_2A60 into qword_2E50.
  • 0x1C98 sub_1C98: Lazy-loads PCD protocol at GUID unk_2A50 into qword_2EA0.
  • 0x146C sub_146C: Lazy-loads HOB list from SystemTable's HOB list pointer.
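
The lazy-load pattern shared by these helpers can be sketched as below. The locator signature and the `fake_locate` stub are hypothetical; in the module the lookup goes through LocateProtocol and the cached pointer lives in a global such as qword_2E50.

```c
#include <stddef.h>

/* Hypothetical locator: returns 0 on success and stores the interface. */
typedef int (*locate_fn)(const void *guid, void **interface);

static void *g_cached_protocol;  /* plays the role of qword_2E50 */

/* Look the protocol up on first use, then reuse the cached pointer. */
static void *get_protocol(locate_fn locate, const void *guid)
{
    if (g_cached_protocol == NULL) {
        void *interface = NULL;
        if (locate(guid, &interface) == 0) {
            g_cached_protocol = interface;
        }
    }
    return g_cached_protocol;
}

/* Demonstration stub that counts how often the lookup really happens. */
static int g_locate_calls;
static int g_dummy_interface;

static int fake_locate(const void *guid, void **interface)
{
    (void)guid;
    g_locate_calls++;
    *interface = &g_dummy_interface;
    return 0;
}
```

A failed lookup leaves the cache empty, so the next call simply tries again.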

HOB/Data Comparison

  • 0x1D84 sub_1D84: GUID comparison -- compares two GUID values (16 bytes each, split as two QWORDs). Used to find matching HOB entries.
  • 0x1990 sub_1990: CPU topology discovery -- uses CPUID to determine thread bits and core bits for the CPU topology.
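
The two-QWORD GUID comparison described for sub_1D84 might look like the sketch below. The `guid_t` layout follows the usual UEFI GUID structure; the use of `memcpy` for the loads is an illustration choice, not something taken from the binary.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* 16-byte GUID in the usual UEFI field layout. */
typedef struct {
    uint32_t data1;
    uint16_t data2;
    uint16_t data3;
    uint8_t  data4[8];
} guid_t;

/* Compare two GUIDs as a pair of 64-bit halves. */
static bool guid_equal(const guid_t *a, const guid_t *b)
{
    uint64_t a_lo, a_hi, b_lo, b_hi;

    /* memcpy keeps the loads well-defined regardless of alignment */
    memcpy(&a_lo, (const uint8_t *)a, 8);
    memcpy(&a_hi, (const uint8_t *)a + 8, 8);
    memcpy(&b_lo, (const uint8_t *)b, 8);
    memcpy(&b_hi, (const uint8_t *)b + 8, 8);

    return a_lo == b_lo && a_hi == b_hi;
}
```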

Spinlock

  • 0x1DF4 sub_1DF4: Spinlock initialization -- sets a spinlock QWORD to 1 (unlocked).

Assertion/Logging

  • 0x11A0 sub_11A0: Debug logging -- calls debug print protocol if available (lazy-loaded).
  • 0x1118 sub_1118: Conditional debug print with platform check -- checks CMOS (I/O 0x70/0x71) for platform type and error severity level before printing.

State Management

Global Variables (.data section, 0x2A00 - 0x24800)

| Address | Size | Name | Purpose |
|---|---|---|---|
| 0x2E28 | 8 | SystemTable | Saved UEFI SystemTable pointer |
| 0x2E30 | 8 | BootServices | Saved gBS pointer |
| 0x2E38 | 8 | ImageHandle | Saved driver image handle |
| 0x2E40 | 8 | RuntimeServices | Saved gRT pointer |
| 0x2E48 | 8 | qword_2E48 | SMM System Table pointer (gSmst) |
| 0x2E50 | 8 | qword_2E50 | Debug print protocol (lazy-loaded) |
| 0x2E58 | 8 | qword_2E58 | PciRootBridge protocol |
| 0x2E60 | 8 | qword_2E60 | HOB list pointer (lazy-loaded) |
| 0x2E68 | 8 | qword_2E68 | PCI Express MMIO base address |
| 0x2E78 | 8 | qword_2E78 | MpSyncData protocol pointer |
| 0x2E80 | 8 | qword_2E80 | MpSyncData service protocol |
| 0x2E88 | 8 | qword_2E88 | MpSyncData CPU info protocol |
| 0x2E90 | 8 | unk_2E90 | MpSyncData second protocol |
| 0x2E98 | 8 | qword_2E98 | FPGA MMIO protocol (PciRootBridge IO access) |
| 0x2EA0 | 8 | qword_2EA0 | PCD protocol pointer |
| 0x2FA8 | 8 | qword_2FA8 | Module return status (initialized to 0x8000000000000001, the value of EFI_LOAD_ERROR) |
| 0x24050 | 8 | qword_24050 | FPGA state structure pointer (from MmPciBase protocol) |
| 0x24058 | 8 | qword_24058 | FPGA protocol interface for callback registration |
| 0x24000 | 8 | qword_24000 | MmPciBase protocol instance |
| 0x23FF0 | 8 | qword_23FF0 | SMM memory allocation buffer |
| 0x23FE8 | 8 | qword_23FE8 | SMM descriptor count |
| 0x23FE0 | 4 | dword_23FE0 | Topology: socket mask (1 << socket) |
| 0x7FD8 | 4 | dword_7FD8 | Topology: thread mask (1 << (core + socket)) |
| 0x24020 | 8 | psub_B38 | Function pointer: FPGA error presence check |
| 0x24028 | 8 | psub_B48 | Function pointer: FPGA error clear |
| 0x24030 | 8 | psub_BF0 | Function pointer: FPGA error buffer clear |
| 0x24038 | 8 | psub_C90 | Function pointer: FPGA error status query |
| 0x24040 | 8 | psub_CB4 | Function pointer: FPGA fatal error handler |
| 0x24048 | 8 | psub_D48 | Function pointer: FPGA error poll handler |
| 0x24060 | 16 | buf_ | FPGA error status buffer (4 x DWORD, one per socket) |
| 0x2EB0 | 248 | unk_2EB0 | SetJump buffer (248 bytes for CPU context save) |

Large Data Tables (0x2FC0 - 0xE7E0)

| Address | Purpose |
|---|---|
| 0x7FC8 | unk_7FC8 - Spinlock |
| 0x7FD0 | unk_7FD0 - Spinlock |
| 0x7FE0 | byte_7FE0 - Per-CPU active state flags |
| 0x87E0 | qword_87E0 - Per-CPU APIC ID mapping table |
| 0xC7E0 | dword_C7E0 - Per-CPU initial APIC ID table |
| 0xE7E0 | byte_E7E0 - Per-CPU state byte table |
| 0x2FC0 | unk_2FC0 - Per-CPU spinlock array (0x5000 bytes, 40 bytes per entry) |
| 0x2FC8 | byte_2FC8 - Per-CPU active flag array (0x5000 bytes, 40-byte stride) |

Reference Data (.rdata section, 0x1E80 - 0x2A00)

| Address | Data | Purpose |
|---|---|---|
| 0x1FD0 | 6 x WORD offsets | FPGA register offset table (0x394, 0x39C, 0x3A4, 0x3AC, 0x3B4, 0x3BC) |
| 0x1FE0 | 6 x WORD offsets | FPGA register offset table group 2 (0x390, 0x398, 0x3A0, 0x3A8, 0x3B0, 0x3B8) |

Protocol GUIDs

Located via SMM System Table (qword_2E48 + 208 = Smst->SmmLocateProtocol)

| Address | GUID | Likely Protocol |
|---|---|---|
| 0x2A00 | 3BA7E14B-176D-4B2A-948A-C86FB001943C | MmPciBase protocol (get FPGA base address) |
| 0x2A10 | 86B091ED-1463-43B5-82A1-2C8B83CB8917 | MmPciBase FPGA cfg protocol |
| 0x2A20 | 0067835F-9A50-433A-8CBB-852078197814 | MpSyncData protocol |
| 0x2A70 | ED32D533-99E6-4209-9CC0-2D72CDD998A7 | FPGA MMIO access protocol |
| 0x2A80 | 1D202CAB-C8AB-4D5C-94F7-3CFCC0D3D335 | MpSyncData CPU info protocol |
| 0x2A90 | 6820ABD4-A292-4817-9147-D91DC83542 | PCI config protocol |
| 0x2AB0 | 47B7FA8C-F4BD-4AF6-8200-333086F0D2C8 | FPGA callback registration protocol |
| 0x2AC0 | 7739F24C-93D7-11D4-9A3A-0090273FC14D | HOB list GUID (gEfiHobListGuid) |
| 0x2AC8 | 0090273FC14D... | Second half of the HOB list GUID at 0x2AC0 |

Located via BootServices (gBS + 320 = gBS->LocateProtocol)

| Address | GUID | Likely Protocol |
|---|---|---|
| 0x2AD0 | F4CCBFB7-F6E0-47FD-9DD4-10A8F150C191 | MpSyncData protocol |
| 0x2A40 | A7CED760-C71C-4E1A-ACB1-89604D5216CB | MpSyncData protocol |
| 0x2A50 | 11B34006-D85B-4D0A-A290-D5A571310EF7 | PCD protocol |

Located via MpSyncData (qword_2E78 + 208 = MpSyncData->SmmLocateProtocol)

| Address | GUID | Purpose |
|---|---|---|
| 0x2A20 | 0067835F-9A50-433A-8CBB-852078197814 | MpSyncData protocol |
| 0x2A80 | 1D202CAB-C8AB-4D5C-94F7-3CFCC0D3D335 | MpSyncData CPU info |

Data Structures

FPGA State Structure (accessed via qword_24050)

The FPGA state object is obtained from MmPciBase protocol. Key offsets:

  • +22: Byte field with bitmask of active FPGA sockets (bits 0-3 for sockets 0-3)
  • +14958 (0x3A6E): Per-socket FPGA error status array (starts at offset 14958 from base)
  • Each socket's FPGA status spans 14944 bytes
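
The offsets above suggest a simple way to walk the populated sockets. The sketch below assumes the per-socket status blocks are laid out back to back with a 14944-byte stride, starting at +14958; the constant and function names are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

#define FPGA_SOCKET_MASK_OFFSET 22u     /* byte with bits 0-3 = sockets 0-3 */
#define FPGA_STATUS_BASE_OFFSET 14958u  /* 0x3A6E */
#define FPGA_STATUS_STRIDE      14944u  /* 0x3A60, assumed back-to-back layout */
#define FPGA_MAX_SOCKETS        4u

/* Byte offset of one socket's status block from the start of the state object. */
static size_t fpga_status_offset(unsigned socket)
{
    return FPGA_STATUS_BASE_OFFSET + (size_t)socket * FPGA_STATUS_STRIDE;
}

/* Collect the status offsets of all sockets flagged active in the mask byte.
 * Returns the number of active sockets found. */
static unsigned fpga_active_sockets(const uint8_t *state,
                                    size_t offsets[FPGA_MAX_SOCKETS])
{
    unsigned count = 0;
    uint8_t mask = state[FPGA_SOCKET_MASK_OFFSET];

    for (unsigned socket = 0; socket < FPGA_MAX_SOCKETS; socket++) {
        if (mask & (1u << socket)) {
            offsets[count++] = fpga_status_offset(socket);
        }
    }
    return count;
}
```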

FPGA Error Register Set

  • Based on PCI config space at MMIO base derived from MmPciBase
  • Register offsets at 0x1FD0 table: 0x394, 0x39C, 0x3A4, 0x3AC, 0x3B4, 0x3BC
  • Register offsets at 0x1FE0 table: 0x390, 0x398, 0x3A0, 0x3A8, 0x3B0, 0x3B8
  • These appear to be FPGA error status/clear registers

FPGA Callback Registration Structure (at qword_24058)

Protocol with a function at offset 0 that takes an array of 6 function pointers and a parameter:

typedef struct {
    void (*Register)(FPGA_CALLBACK_ARRAY *Callbacks, UINT8 Param);
} FPGA_CALLBACK_PROTOCOL;

The callback array has 6 entries:

  • [0] = sub_B38: FPGA error presence check
  • [1] = sub_B48: FPGA error clear
  • [2] = sub_BF0: FPGA error buffer clear
  • [3] = sub_C90: FPGA error status query
  • [4] = sub_CB4: FPGA fatal error handler
  • [5] = sub_D48: FPGA error poll handler
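
Put together, the registration might look like the sketch below. The callback signature, the array type, and the `fake_register` provider are assumptions for illustration, since the analysis does not recover the real prototypes; only the six-entry order and the parameter value 3 come from the module.

```c
#include <stdint.h>

/* Assumed generic callback shape; the real signatures are unknown. */
typedef uint64_t (*FPGA_CALLBACK)(void *Context);

typedef struct {
    FPGA_CALLBACK Entry[6];
} FPGA_CALLBACK_ARRAY;

typedef struct {
    void (*Register)(FPGA_CALLBACK_ARRAY *Callbacks, uint8_t Param);
} FPGA_CALLBACK_PROTOCOL;

/* Empty stand-ins for sub_B38, sub_B48, sub_BF0, sub_C90, sub_CB4, sub_D48. */
static uint64_t presence_check(void *c) { (void)c; return 0; }
static uint64_t error_clear(void *c)    { (void)c; return 0; }
static uint64_t buffer_clear(void *c)   { (void)c; return 0; }
static uint64_t status_query(void *c)   { (void)c; return 0; }
static uint64_t fatal_handler(void *c)  { (void)c; return 0; }
static uint64_t poll_handler(void *c)   { (void)c; return 0; }

/* Hypothetical provider side: records what was registered. */
static FPGA_CALLBACK_ARRAY g_registered;
static uint8_t g_registered_param;

static void fake_register(FPGA_CALLBACK_ARRAY *callbacks, uint8_t param)
{
    g_registered = *callbacks;
    g_registered_param = param;
}

/* Consumer side: fill the array in the documented order and register it. */
static void register_fpga_callbacks(FPGA_CALLBACK_PROTOCOL *protocol)
{
    FPGA_CALLBACK_ARRAY callbacks = {{
        presence_check, error_clear, buffer_clear,
        status_query, fatal_handler, poll_handler
    }};
    protocol->Register(&callbacks, 3);
}
```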

Per-CPU Data Tables (MpSyncData)

The module allocates large tables indexed by (socket, core, thread):

  • byte_7FE0: Per-CPU active flag (1 = active)
  • byte_E7E0: Per-CPU state byte (0 = invalid, 1 = present, 2 = initialized)
  • dword_C7E0: Per-CPU initial APIC ID
  • qword_87E0: Per-CPU APIC ID (64-bit)
  • byte_2FC8: Per-CPU active flag (40-byte stride)
  • unk_2FC0: Per-CPU spinlock (40-byte stride)

Offset calculation: idx = thread + (core + 448 * socket) * 64

  • 448 cores per socket max, 64 threads per core max
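
As a quick check of the index formula, here is a direct transcription in C. The macro and function names are illustrative; the constants are the ones stated above.

```c
#include <stdint.h>

#define MAX_CORES_PER_SOCKET 448u
#define MAX_THREADS_PER_CORE 64u

/* idx = thread + (core + 448 * socket) * 64 */
static uint32_t cpu_index(uint32_t socket, uint32_t core, uint32_t thread)
{
    return thread + (core + MAX_CORES_PER_SOCKET * socket) * MAX_THREADS_PER_CORE;
}
```

For example, socket 1, core 2, thread 3 maps to 3 + (2 + 448) * 64 = 28803.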

Calling Patterns

1. Module Initialization Flow

_ModuleEntryPoint (0x514)
  -> sub_5C0 (0x5C0): Save boot services, gBS, gRT, gSmst
  -> sub_280 (0x280): SetJump to protect against errors
  -> sub_EAC (0xEAC): Main FPGA error handler setup
      -> LocateProtocol (MmPciBase) -> gets FPGA base -> qword_24050
      -> LocateProtocol (FPGA callback registration) -> qword_24058
      -> LocateProtocol (MpSyncData) -> qword_2E78
      -> Register protocol callbacks:
           RegisterCallback( {sub_B38, sub_B48, sub_BF0, sub_C90, sub_CB4, sub_D48}, 3)
  -> sub_11E0: Validate SetJump buffer
  -> sub_320: LongJump back if error occurred
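
The SetJump/LongJump guard in this flow follows the same idea as standard C `setjmp`/`longjmp`. The sketch below uses the C library versions rather than the firmware's own implementations, and `setup_handlers` is a hypothetical stand-in for sub_EAC.

```c
#include <setjmp.h>

static jmp_buf g_init_guard;  /* plays the role of the buffer at unk_2EB0 */

/* Stand-in for sub_EAC: jumps back to the guard when setup fails. */
static void setup_handlers(int fail)
{
    if (fail) {
        longjmp(g_init_guard, 1);
    }
}

/* Returns 0 if setup completed, -1 if control came back via longjmp. */
static int guarded_init(int fail)
{
    if (setjmp(g_init_guard) != 0) {
        return -1;  /* second return from setjmp, triggered by longjmp */
    }
    setup_handlers(fail);
    return 0;
}
```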

2. FPGA Error Status Collection Flow (sub_DFC)

sub_DFC (0xDFC) - called from FPGA callback framework
  -> For each of 4 sockets:
     -> Check if bit N is set in FPGA state byte (+22)
     -> Read FPGA error register via MMIO protocol at qword_2E98
     -> Store status in buf_ array

3. FPGA Error Correction Flow (sub_D48)

sub_D48 (0xD48) - poll handler
  -> For each active socket:
     -> Check error register at offset +16400 (error pending)
     -> If pending: call sub_A30 to log, set output flag
     -> Check register at offset +968 (secondary error)
     -> If pending: call sub_A30 to log, set output flag

4. Fatal Error Flow (sub_CB4)

sub_CB4 (0xCB4) - fatal FPGA error handler
  -> Read GPIO value via sub_1BA0 (MMIO 0xFD000148 + shift)
  -> Write GPIO output bit
  -> __outbyte(0xCF9, 2): Select the reset type (bit 1) without triggering it yet
  -> __outbyte(0xCF9, 6): Set bit 2 as well, which triggers the reset
  -> Infinite loop
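
The two-step write to port 0xCF9 can be sketched as below. The port write goes through a function pointer so the sequence can be exercised off-target; `record_outb` is a hypothetical recorder, and on real hardware the callback would wrap `__outbyte`. The bit meanings in the comments follow the usual Intel reset control register layout and are not taken from the binary itself.

```c
#include <stdint.h>

#define RESET_CONTROL_PORT 0xCF9u

typedef void (*outb_fn)(uint16_t port, uint8_t value);

/* Issue the same sequence as sub_CB4: 0x02 first, then 0x06. */
static void request_reset(outb_fn outb)
{
    outb(RESET_CONTROL_PORT, 0x02);  /* bit 1: choose the reset type */
    outb(RESET_CONTROL_PORT, 0x06);  /* bit 2 added: start the reset */
    /* Firmware would spin here forever while the reset takes effect. */
}

/* Hypothetical recorder standing in for the real port write. */
static uint8_t  g_reset_log[4];
static unsigned g_reset_count;

static void record_outb(uint16_t port, uint8_t value)
{
    if (port == RESET_CONTROL_PORT && g_reset_count < 4) {
        g_reset_log[g_reset_count++] = value;
    }
}
```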

5. MpSyncData Initialization Flow (sub_1580)

sub_1580 (0x1580) - initialize CPU sync data
  -> LocateProtocol (MpSyncData)
  -> LocateProtocol (MpSyncData second)
  -> LocateProtocol (MpSyncData CPU info)
  -> sub_1990: Get CPU topology (thread_bits, core_bits)
  -> Initialize per-CPU data tables (spinlocks, flags, APIC IDs)
  -> Enumerate all CPUs via MpSyncData CPU info protocol
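
The thread_bits and core_bits that sub_1990 derives from CPUID leaf 0xB can be used to split an APIC ID into socket, core, and thread. The sketch below takes the shift values as inputs instead of executing CPUID, so it runs anywhere; the names are illustrative. In leaf 0xB, EAX[4:0] of sub-leaf 0 is the SMT shift and EAX[4:0] of sub-leaf 1 is the cumulative core shift.

```c
#include <stdint.h>

typedef struct {
    uint32_t socket;
    uint32_t core;
    uint32_t thread;
} cpu_location_t;

/* Extract the shift width from a CPUID leaf 0xB EAX value (bits 4:0). */
static uint32_t leaf_b_shift(uint32_t eax)
{
    return eax & 0x1Fu;
}

/* Split an x2APIC ID. smt_shift is the thread bit count; core_shift is the
 * combined thread + core bit count. Assumes smt_shift <= core_shift < 32. */
static cpu_location_t split_apic_id(uint32_t apic_id,
                                    uint32_t smt_shift,
                                    uint32_t core_shift)
{
    cpu_location_t loc;
    uint32_t core_bits = core_shift - smt_shift;

    loc.thread = apic_id & ((1u << smt_shift) - 1u);
    loc.core   = (apic_id >> smt_shift) & ((1u << core_bits) - 1u);
    loc.socket = apic_id >> core_shift;
    return loc;
}
```

With one thread bit and five core bits, APIC ID 0x47 splits into socket 1, core 3, thread 1.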

Dependencies

Consumed (this module calls out to)

  • UEFI Boot Services (gBS): LocateProtocol, FreePool
  • SMM System Table (gSmst): SmmLocateProtocol, SmmAllocatePool, SmmFreePool
  • MmPciBase Protocol: Provides FPGA MMIO config space base address
  • MmPci IO Protocol: FPGA register read/write access
  • FPGA Callback Registration Protocol: Registers error handler callbacks
  • MpSyncData Protocol: Multi-processor synchronization data management
  • PciRootBridge Protocol: PCI config space access for PCH detection
  • PCD Protocol: Platform Configuration Database
  • Debug Print Protocol: DEBUG/ASSERT message output
  • GPIO Private Library: GPIO status register access via MMIO
  • PCH Info Library: PCH SKU detection via LPC device ID

Consumed By (other modules call this)

  • SMM Core: Via the registered FPGA callback protocol -- the 6 callbacks (sub_B38, sub_B48, sub_BF0, sub_C90, sub_CB4, sub_D48) are registered with the FPGA framework to be called when FPGA errors occur.

Notes

  1. Source file: PurleyPlatPkg/Ras/Smm/ErrHandling/FpgaErrorHandler/FpgaErrorHandler.c -- SMM driver for FPGA error handling on Intel Purley platform.

  2. Multi-socket support: All error handling iterates over sockets 0-3, using a bitmask at FPGA state byte +22 to determine which sockets are populated.

  3. FPGA Register Layout: The FPGA error status registers are at offsets 0x390-0x3BC in the FPGA PCI config space. The module holds two tables of six WORD-sized register offsets (set1: 0x394/0x39C/0x3A4/0x3AC/0x3B4/0x3BC, set2: 0x390/0x398/0x3A0/0x3A8/0x3B0/0x3B8).

  4. Error severity levels: The callbacks are registered with parameter "3", and the sub_1118 debug print function checks the platform type via CMOS to control which error levels get logged.

  5. CMOS-based debug control: sub_1118 reads CMOS offset 0x4C (via I/O ports 0x70/0x71) to determine platform type and control error message output.

  6. Warm reset signaling: Fatal errors trigger a warm reset via the standard 0xCF9 reset port (write 2 then 6). The GPIO register at 0xFD000148 is used to signal the error cause to the hardware.

  7. CPU topology: sub_1990 uses CPUID leaf 0xB (Extended Topology) to discover thread bits per core and core bits per package, supporting up to 4 sockets, ~448 cores per socket, and 64 threads per core.

  8. SetJump protection: The entry point uses SetJump/LongJump (sub_280/sub_320) to protect against crashes during initialization -- if sub_EAC fails (longjmp called), the module continues gracefully.