AMI-Aptio-BIOS-Reversed / ProcessorErrorHandler / ProcessorErrorHandler_analysis.md
@Ajax Dong Ajax Dong 2 days ago 17 KB Init

ProcessorErrorHandler Module Analysis

Overview

SMM driver for processor error handling on the Intel Purley (Skylake-SP Xeon) platform. It is responsible for Machine Check Architecture (MCA) error handling, memory error correction reporting, IIO (Integrated IO) error handling, VTD/ITC/OTC/DMA error logging, and Post-Package Repair (PPR) / Thermal Alert System (TAS) management. It runs as a UEFI SMM driver and registers its SW SMI handler through the SMM SW Dispatch protocol.

Address Range

0x300 - 0x11a00 (.text segment, 232 functions)

Source Files (from debug paths)

  • PurleyPlatPkg/Ras/Smm/ErrHandling/ProcessorErrorHandler/ProcessorErrorHandler.c - Main module
  • PurleyPlatPkg/Ras/Smm/ErrHandling/ProcessorErrorHandler/McaHandler.c - MCA interrupt handler
  • PurleyPlatPkg/Ras/Smm/ErrHandling/ProcessorErrorHandler/MemoryErrorHandler.c - Memory error handler
  • LenovoServerPkg/Library/LnvPurleyLib/ProcMemErrReporting/ProcMemErrReporting.c - Lenovo processor/memory error reporting
  • PurleyPlatPkg/Ras/Library/mpsyncdatalib/mpsyncdatalib.c - MP sync data library
  • PurleySktPkg/Library/emcaplatformhookslib/emcaplatformhookslib.c - eMCA platform hooks

Entry Points (Public API)

0x5F0 - _ModuleEntryPoint (ModuleEntryPoint)

Entry point called by SMM infrastructure. Initializes the module by calling sub_69C, then calls sub_34B0 to locate protocols and register SMI handlers. Sets return status in qword_14FD8.

0x67AC - McaHandler (MCA SMI Handler) (source: McaHandler.c)

Called when an SMI is triggered by a machine check. Parameters:

  • a1 - CpuInfo pointer (output struct)
  • a2 - InterruptType pointer
  • a3 - SystemContext pointer (EFI_SMM_CPU_REGISTER_CONTEXT)

Extracts the CPU APIC ID via sub_9CD4(), reads topology via sub_9DA0(), computes a CPU index via sub_9C34(), and returns a struct with:

  • offset 0: APIC ID
  • offset 4: Package ID (socket)
  • offset 8: Core ID
  • offset 12: Thread ID
  • offset 16: CPU index (unique identifier)
  • offset 24: InterruptType value
  • offset 32: SystemContext pointer

0x145C - SmmPeriodicDispatchHandler

Registered via (qword_368C8+16)(sub_145C, 1) meaning SMM SW dispatch with SwSmiInputValue=1. Called periodically to process pending error reports. Iterates over CPU entries in the error array at [qword_36960] (stride 216 bytes each), clears processed errors, and for multi-socket configurations triggers PPR/TAS and error logging.

Protocols Located (via qword_14CF8+208, which is gSmst->SmmLocateProtocol in the X64 EFI_SMM_SYSTEM_TABLE2 layout)

GUID Address  GUID                                    Stored At    Purpose
0x14180       {3BA7E14B-176D-4B2A-948A-C86FB001943C}  qword_36950  EFI_SMM_CPU_PROTOCOL - SMM CPU services
0x14300       {ED32D533-99E6-4209-9CC0-2D72CDD998A7}  qword_368C8  SMM SW Dispatch2 Protocol
0x141A0       {86B091ED-1463-43B5-82A1-2C8B83CB8917}  qword_368E0  Configuration Protocol (provides system config data)
0x14340       {1DBD1503-0A60-4230-AAA3-8016D8C3DE2F}  qword_368F0  IOH/Iio Protocol - IIO (Integrated IO Hub) access
0x141C0       {0067835F-9A50-433A-8CBB-852078197814}  qword_36930  SmmCpuIo Protocol - SMM CPU I/O
0x14350       {5138B5C5-9369-48EC-5B97-38A2F7096675}  qword_36948  Platform Info Protocol
0x14280       {ED32D533-99E6-4209-9CC0-2D72CDD998A7}  qword_368C0  SMM BASE 2 Protocol
0x14290       {1D202CAB-C8AB-4D5C-94F7-3CFCC0D3D335}  unk_36958    SMM CPU Protocol
0x142A0       {1DBD1503-0A60-4230-AAA3-8016D8C3DE2F}  unk_36948    SmmCpuIo
0x142B0       {5138B5C5-9369-48EC-5B97-38A2F7096675}  unk_36940    SmmBase2
0x142E0       -                                       -            SmmLockBox - SmmLockBoxCommunication config table GUID
0x14220       {CD3D0A05-9E24-437C-A891-1EE053DB7638}  -            Used for BootServices->LocateProtocol in sub_27E0
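
When re-checking the GUID column against the binary, keep in mind that an EFI_GUID is stored with its first three fields little-endian, so the bytes in the image do not read left to right like the printed string. A minimal sketch of the conversion follows; the helper name and the sample bytes are illustrative, and the bytes are simply what address 0x14180 would hold if the first table row is accurate.

```python
import uuid


def guid_from_image_bytes(raw: bytes) -> str:
    """Format 16 raw bytes from the image as a braced EFI GUID string."""
    if len(raw) != 16:
        raise ValueError("an EFI_GUID is exactly 16 bytes")
    # EFI_GUID stores Data1/Data2/Data3 little-endian and Data4 as raw bytes,
    # which is the layout uuid.UUID(bytes_le=...) expects.
    return "{" + str(uuid.UUID(bytes_le=raw)).upper() + "}"


# Hypothetical dump of the 16 bytes at 0x14180 (assumed, not read from the binary).
raw_14180 = bytes.fromhex("4BE1A73B6D172A4B948AC86FB001943C")
print(guid_from_image_bytes(raw_14180))
```

Running every GUID address in the table through a helper like this is a quick way to confirm which entries really share a GUID.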

From configuration protocol (qword_368E0):

  • qword_368D8 = config pointer at offset 0 (GeneralCfg / SystemConfig struct)
  • qword_368E8 = config pointer at offset 8 (PcieCfg / PerSocketCfg struct)
  • qword_36938 = config pointer at offset 16 (CpuCfg / PerSocketInfo struct)

Key Functions

Initialization (ProcessorErrorHandler.c)

Address Name Size Purpose
0x5F0 _ModuleEntryPoint 0xA9 Driver entry, calls init and protocol locator
0x69C DriverInit 0xAC9 UEFI boot services init (gST, gBS, gRT, gImageHandle setup)
0x34B0 LocateProtocolsAndRegisterHandlers 0x407 Locates all required protocols, registers SMI handlers
0x278C IsErrorHandlingEnabled 0x51 Checks if error handling should be active (reads config fields)
0x27E0 ErrorHandlerSetup 0x422 Full error handler init: registers MCA SMI, enables MSRs, reads "Setup" UEFI var, handles PprAddress NVRAM, configures per-socket MCA banks
0x3364 InitPerSocketMca 0x14C Per-socket MCA initialization: enables MCG_CTL_P (LERR, RERR), sets up MCi_CTL2 for corrected errors
0x5394 RunPprTas 0x1D6 Post-Package Repair / Thermal Alert System: marks PPR/TAS busy, disables DIMMs on retired pages, clears throttled DIMMs
0x2324 ConfigureMcBanks 0x214 Configure MCA banks per socket (enables specific error types)

Error Dispatch

Address Name Size Purpose
0x104D0 LogErrorEvent 0x22C Central error logging dispatcher. Dispatches by ErrorSource type:
- ErrorSource=1: calls sub_10410
- ErrorSource=3: iterates sub_110D0 callbacks
- ErrorSource=4: iterates sub_111F4 callbacks
- ErrorSource=6: iterates platform hooks sub_114F0 (Corrected)
- ErrorSource=7: iterates platform hooks (Recoverable)
- ErrorSource=8: iterates platform hooks (Fatal)
- ErrorSource=9: iterates platform hooks (Uncorrected)
0x3D8C ReportGenericErrors 0x1AB Reports generic error via sub_104D0 with bitmask error type. Error types: bit0=-112(0x90), bit1=-111(0x91), bit2=-110(0x92), bit3=-109(0x93), bit4=-108(0x94), bit5=-107(0x95), bit6=-106(0x96), bit7=-105(0x97), bit8=-104(0x98), bit31=-96(0xA0)
0x4CD8 LogProcMemError 0x144 Encodes processor/memory error source (n5=0..10) into sub_104D0 format and dispatches
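
The bit-to-code mapping listed for ReportGenericErrors follows a simple pattern: bits 0 through 8 map to 0x90 + bit, and bit 31 maps to 0xA0. The negative numbers are the same bytes printed as signed 8-bit values by the decompiler. The sketch below only restates that table; the function names are made up for illustration, and treating bits 9 to 30 as unused is an assumption.

```python
def generic_error_codes(mask: int) -> list:
    """Map an error bitmask to the byte codes listed for ReportGenericErrors."""
    # Bits 0..8 map to 0x90..0x98 according to the table.
    codes = [0x90 + bit for bit in range(9) if mask & (1 << bit)]
    # Bit 31 is the only other bit listed; it maps to 0xA0.
    if mask & (1 << 31):
        codes.append(0xA0)
    return codes


def as_signed_byte(code: int) -> int:
    """Show a byte the way the decompiler prints it (signed 8-bit)."""
    return code - 0x100 if code >= 0x80 else code


codes = generic_error_codes(0x80000005)     # bits 0, 2 and 31 set
print([hex(c) for c in codes])              # ['0x90', '0x92', '0xa0']
print([as_signed_byte(c) for c in codes])   # [-112, -110, -96]
```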

MCA Handler (McaHandler.c)

Address Name Size Purpose
0x9C34 GetCpuIndex 0x8F Returns CPU unique index from package/core/thread topology using lookup table byte_15220
0x9CD4 ReadLocalApicId 0x6A Reads local APIC ID (via MSR 0x1B, IA32_APIC_BASE, or CPUID)
0x9DA0 ReadCpuTopology 0x80 Reads CPU package/core/thread topology via CPUID
0x9E20 GetCurrentSocketAndCore 0x4C0 Determines current socket and core numbers
0xA2EC GetMaxCoresPerPackage 0x4C Gets maximum cores per socket from topology

Memory Error Handler (MemoryErrorHandler.c)

Address Name Size Purpose
0x56AC LogMcBankErrors 0x542 Logs MCA bank errors: iterates MC banks, decodes MCi_STATUS/MSR, determines error severity, calls error dispatch. References gMcBankList global
0x5BF0 LogMemError 0x638 Full memory error reporting: reads MC banks, decodes DIMM info (socket/channel/rank/bank/row/col), writes to lockbox, logs via platform hooks
0x6228 DecodeDimmInfo 0x1C8 Decodes DIMM location from physical address / MCA address data
0x63F0 ReadMemErrorRegisters 0x180 Reads memory controller error registers (MC_ODBC, EMASK, etc.)
0xE510 ReadMemControllerMsr 0x54 Reads memory controller MSR registers (MC_ODBC, etc.)

IIO / Platform Error Handler (ProcMemErrReporting.c)

Address Name Size Purpose
0xD62C ProcessIioErrors 0x84E IIO error processing per socket: checks for VTD, ITC, OTC, DMA errors, reads status registers, logs errors via platform hooks
0xDE7C CheckAndReportIioErrors 0x1D9 Per-socket IIO error checker: iterates IIO stacks, checks error status, triggers ProcessIioErrors
0xE070 HandleCorrectedIioErrors 0x49F Handles corrected IIO error logging per socket
0xC5A8 ProcessDimmCorrectedErrors 0x1F0 Processes DIMM corrected ECC errors: reads error registers, determines channel/dimm, logs via platform hooks
0xE6A0 ProcessMemErrorReporting 0x1C4 Main memory error reporting entry: reads system config, determines error type, calls appropriate handler
0xD228 LogVtdErrors 0x72 Logs VT-d errors (register 0x66b = VTD UNC ERR STS)
0xD29C LogItcErrors 0x70 Logs ITC errors (register 0x66b)
0xD30C LogOtcErrors 0x73 Logs OTC errors (register 0x66b)
0xD380 LogDmaErrors 0x2AA Logs DMA errors (detailed bit decoding)

Utility / Library Functions

Address Name Size Callers Purpose
0x6C68 DebugAssert 0x4F 2 ASSERT implementation
0x6CB8 DebugPrint 0x88 45 Print debug message
0x6D40 DebugAssertLine 0x3E 82 ASSERT with file/line
0x85FC AcquireSpinLock 0x34 7 Acquire spin lock for synchronization
0x8630 SpinLockAcquireWithTimeout 0xB3 6 Spinlock acquire with timeout (10M ns)
0x8760 SpinLockRelease 0x6C 6 Release spin lock
0x86E4 SpinLockAcquireInternal 0x7A 2 Internal spin lock acquire
0x87CC InterlockedIncrement 0x4D 2 Atomic increment
0x881C HobGetHobList 0x82 2 Get HOB list pointer
0x88F0 GetSystemFirmwareResource 0x49 3 Get firmware resource descriptor from HOBs
0x88A0 HobGetNextHob 0x4D 1 Get next HOB entry
0x893C AllocatePool 0x75 2 SmmAllocatePool wrapper
0x89B4 SmmLockBoxSave 0x33 2 Save data to SMM LockBox
0x1168 SmmLockBoxDestructor 0xCA 1 SMM LockBox destructor, unregisters config table
0xA4FC WritePciConfig - - Write PCI configuration space
0xA66C ReadPciConfig - - Read PCI configuration space
0xAB60 GetPciDeviceClass - - Get PCI device class code
0xB534 LogErrorToBanks - - Log error to error bank array
0xBD34 ClearErrorStatus - - Clear error status bits
0xB62C SetErrorBankFlag - - Set error pending flag per CPU

SMM Handler Registration

From sub_34B0 (line ~1856):

  1. (qword_36950)(&psub_278C, 3) - Registers isErrorHandlingEnabled handler with SMM CPU protocol
  2. (qword_368C8+16)(sub_145C, 1) - Registers periodic SW SMI dispatch handler (SwSmiInputValue=1)
  3. If qword_368E8+17 == 1 && qword_368D8+10 == 1: registers with qword_36968+40 for MSR 0x6ABC (MCG_CTL or MCi_CTL2)
  4. If qword_368D8+4 == 1: calls sub_3364 for per-socket MCA init

Data Structures

Error Bank Entry (stride 216 bytes at qword_36960)

Offset  Size  Description
0x00    1     Active flag
0x01    1     Retry flag
0x03    1     PPR/TAS flags
0x30    1     Bank valid flag
0x34    1     Error type flags (bit0=UC, bit1=CE, bit2=deferred)
0x3C    4     Error bank/register indices
0x40    4     Error status data
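
For offline inspection of a memory dump, the known fields of an entry can be pulled out with a few fixed offsets. The sketch below assumes little-endian 32-bit values at 0x3C and 0x40 and leaves every undocumented byte alone; the class and function names are hypothetical.

```python
import struct
from dataclasses import dataclass

ENTRY_STRIDE = 216  # bytes per entry, as observed at qword_36960


@dataclass
class ErrorBankEntry:
    active: int
    retry: int
    ppr_tas_flags: int
    bank_valid: int
    error_type: int
    indices: int
    status: int


def parse_entry(dump: bytes, index: int) -> ErrorBankEntry:
    """Decode the documented fields of entry `index` from a raw array dump."""
    entry = dump[index * ENTRY_STRIDE:(index + 1) * ENTRY_STRIDE]
    if len(entry) != ENTRY_STRIDE:
        raise IndexError("entry lies outside the dump")
    return ErrorBankEntry(
        active=entry[0x00],
        retry=entry[0x01],
        ppr_tas_flags=entry[0x03],
        bank_valid=entry[0x30],
        error_type=entry[0x34],
        indices=struct.unpack_from("<I", entry, 0x3C)[0],
        status=struct.unpack_from("<I", entry, 0x40)[0],
    )


def error_type_names(flags: int) -> list:
    """Expand the error type byte (bit0=UC, bit1=CE, bit2=deferred)."""
    names = ["UC", "CE", "deferred"]
    return [name for bit, name in enumerate(names) if flags & (1 << bit)]


# Synthetic two-entry dump: only entry 1 is active, with a corrected error.
dump = bytearray(2 * ENTRY_STRIDE)
dump[ENTRY_STRIDE + 0x00] = 1
dump[ENTRY_STRIDE + 0x34] = 0b010
second = parse_entry(bytes(dump), 1)
print(second.active, error_type_names(second.error_type))
```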

CpuInfo (returned by McaHandler at 0x67AC)

Offset  Size  Description
0x00    4     APIC ID
0x04    1     Package ID (socket)
0x08    1     Core ID
0x0C    1     Thread ID
0x10    8     CPU Index (unique)
0x18    8     InterruptType
0x20    8     SystemContext
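
The CpuInfo layout can be written down as a ctypes structure so that the offsets are checked mechanically rather than by eye. The padding fields below are an assumption (the table only says that the one-byte IDs sit at 4-byte-aligned offsets), and the field names are illustrative.

```python
import ctypes
import struct


class CpuInfo(ctypes.LittleEndianStructure):
    _fields_ = [
        ("apic_id", ctypes.c_uint32),         # 0x00
        ("package_id", ctypes.c_uint8),       # 0x04
        ("_pad0", ctypes.c_uint8 * 3),        # assumed padding
        ("core_id", ctypes.c_uint8),          # 0x08
        ("_pad1", ctypes.c_uint8 * 3),        # assumed padding
        ("thread_id", ctypes.c_uint8),        # 0x0C
        ("_pad2", ctypes.c_uint8 * 3),        # assumed padding
        ("cpu_index", ctypes.c_uint64),       # 0x10
        ("interrupt_type", ctypes.c_uint64),  # 0x18
        ("system_context", ctypes.c_uint64),  # 0x20
    ]


# Synthetic record: APIC ID 0x12, socket 1, core 5, thread 0, index 42.
raw = struct.pack("<IB3xB3xB3xQQQ", 0x12, 1, 5, 0, 42, 3, 0x7F000000)
info = CpuInfo.from_buffer_copy(raw)
print(info.package_id, info.core_id, info.thread_id, info.cpu_index)
print(hex(ctypes.sizeof(CpuInfo)))
```

With this layout the structure is 0x28 bytes long, which is the minimum size implied by the last field in the table.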

Socket Configuration (per socket, stride 14944 bytes at qword_14E70)

Offset  Size  Description
0x00    2     Socket enabled bitmask
0x06    2     Core active bitmask
0x18    1     Socket present / enabled
...     ...   IIO stack config

System Config (at qword_368D8)

Offset  Size  Description
0x00    1     Error handling enable flag
0x04    1     Per-socket MCA init flag
0x0A    1     Advanced RAS enable
0x0C    1     Corrected error logging mode (0=off, 1=logged, 2=throttled)
0x0D    1     PPR/TAS enable (0=off, 1=on, 2=aggressive)
0x11    1     IIO error handling flag
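
The two multi-valued fields in this structure read more clearly as enumerations. The sketch below decodes a raw System Config buffer under the offsets in the table; the enum and key names are invented for readability, and any value outside the documented 0 to 2 range raises ValueError rather than being guessed at.

```python
from enum import IntEnum


class CeLoggingMode(IntEnum):
    OFF = 0
    LOGGED = 1
    THROTTLED = 2


class PprTasMode(IntEnum):
    OFF = 0
    ON = 1
    AGGRESSIVE = 2


def decode_system_config(cfg: bytes) -> dict:
    """Decode the documented bytes of the System Config structure."""
    return {
        "error_handling_enable": cfg[0x00],
        "per_socket_mca_init": cfg[0x04],
        "advanced_ras_enable": cfg[0x0A],
        "ce_logging_mode": CeLoggingMode(cfg[0x0C]),
        "ppr_tas_mode": PprTasMode(cfg[0x0D]),
        "iio_error_handling": cfg[0x11],
    }


# Made-up buffer: error handling on, throttled CE logging, PPR/TAS on.
example = bytearray(0x12)
example[0x00] = 1
example[0x0C] = 2
example[0x0D] = 1
decoded = decode_system_config(bytes(example))
print(decoded["ce_logging_mode"].name, decoded["ppr_tas_mode"].name)
```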

Socket Feature Config (at qword_368E8)

Offset  Size  Description
0x10    1     MCA corrected error handling feature
0x11    1     MCA uncorrected error handling feature
0x1D    1     IIO error handling feature flag

Global Variables

Address Name Purpose
0x14CE8 ImageHandle EFI image handle (gImageHandle)
0x14CD8 SystemTable EFI system table (gST)
0x14CE0 BootServices EFI boot services table (gBS)
0x14CF0 RuntimeServices EFI runtime services table (gRT)
0x14CF8 Smst SMM system table (gSmst)
0x14DD0 SmmCpuProtocol SMM CPU protocol interface
0x14E60 IioProtocol IIO protocol interface
0x14E70 SocketConfigArray Per-socket config array (stride 14944)
0x14E78 IioProtocol2 IIO protocol interface (alternate)
0x14E80 PcieConfig PCIe configuration structure
0x14E88 GeneralConfig General system configuration
0x14E10 IioStackMask IIO stack bitmask (which stacks are populated)
0x14EE0 LockStruct Synchronization lock structure
0x14FD8 ReturnStatus Module return status
0x15000 ErrorSource Current error source type
0x15198 SmmCpuProtocol2 Alternate SMM CPU protocol (from sub_27E0)
0x151A0 SmmCpuRegistered SMM CPU registration flag
0x151F8 BootScriptEntry BootScript entry pointer
0x15220 CpuTopologyTable CPU topology lookup table
0x31220 MaxCoreCount Maximum core count per package
0x368C8 SwDispatchProtocol SMM SW Dispatch2 protocol
0x368C0 SmmBase2Protocol SMM Base2 protocol
0x368D0 SavedPprTas Saved PPR/TAS data pointer
0x368D8 SystemConfig System configuration (GeneralCfg)
0x368E0 ConfigProtocol Configuration protocol instance
0x368E8 SocketFeatureConfig Per-socket feature config
0x368F0 IioProtocolRef IIO protocol reference
0x36900 psub_278C Registered isErrorHandlingEnabled callback
0x36908 psub_27E0 Registered error handler setup callback
0x36910 psub_2C04 Callback pointer
0x36918 psub_2C10 Callback pointer
0x36920 psub_3090 Callback pointer
0x36928 ConfigData Configuration data pointer
0x36930 CpuIoProtocol SmmCpuIo protocol
0x36938 PerSocketCpuConfig Per-socket CPU config array
0x36940 SpinLock Spin lock for synchronization
0x36948 PlatformInfoProtocol Platform info protocol
0x36950 SmmCpuServices SMM CPU services (SaveState, etc.)
0x36960 ErrorBankArray Error bank array base (stride 216)
0x36968 IioPciProtocol IIO PCI protocol
0x1B008 LastErrorSource Per-bank last error source tracking

Calling Patterns

Module Init Flow

_ModuleEntryPoint(0x5F0)
  +-- sub_69C()          -- init gST/gBS/gRT/ImageHandle
  +-- sub_300()          -- lock check
  +-- sub_34B0()         -- locate protocols (7 protocols)
     +-- Locate protocols via SMST->SmmLocateProtocol (offset 208)
     +-- Extract config from ConfigProtocol
     +-- sub_85FC()   -- acquire spin lock
     +-- sub_88F0()   -- get firmware resource descriptor from HOBs
     +-- sub_5394()   -- run PPR/TAS if needed
     +-- sub_278C()   -- is error handling enabled?
     +-- sub_27E0()   -- full error handler setup
     |     +-- Register SMM SW SMI handler (sub_145C)
     |     +-- Enable MSRs (MCG_CTL, MCi_CTL2)
     |     +-- Read "Setup" UEFI variable
     |     +-- Handle PprAddress NVRAM
     +-- sub_3364()   -- per-socket MCA init (if configured)
  +-- sub_6F88()         -- release lock
  +-- sub_3E0()          -- cleanup

SMI Dispatch Flow

sub_145C() (Periodic SMI handler)
  +-- Iterate error bank array (216 byte stride)
  +-- For each active bank:
     +-- Check error type (UC/CE/deferred)
     +-- sub_BD34()   -- clear error status
     +-- Clear bank entries
  +-- If error reported and multi-socket:
     +-- Check socket IIO error status
     +-- sub_B534()   -- log error to banks
     +-- sub_A76C()   -- remote CPU execution
  +-- Clear per-bank tracking

MCA Handler Flow (sub_67AC)

McaHandler(CpuInfo, InterruptType, SystemContext)
  +-- Validate pointers
  +-- sub_9CD4()         -- read local APIC ID
  +-- sub_9DA0()         -- read CPU topology (package/core/thread)
  +-- sub_9C34()         -- compute CPU index from topology table
  +-- Return populated CpuInfo struct

Error Logging Flow (sub_104D0)

LogErrorEvent(ErrorSourceHeader)
  +-- Switch on ErrorSource byte[0]:
      1 -> sub_10410()        -- CPU error
      3 -> sub_110D0() list   -- PCIe error
      4 -> sub_111F4() list   -- 
      6 -> platform hooks CE  -- Corrected
      7 -> platform hooks     -- Recoverable  
      8 -> platform hooks     -- Fatal
      9 -> platform hooks     -- Uncorrected
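
The switch above amounts to a lookup keyed on the first byte of the error source header. A minimal sketch of that routing follows, useful for annotating captured headers; the route descriptions are labels taken from this analysis, not recovered symbol names, and unlisted values (0, 2, 5, 10 and above) are simply reported as unrouted.

```python
from typing import Optional

ERROR_SOURCE_ROUTES = {
    1: "sub_10410 (CPU error)",
    3: "sub_110D0 callback list (PCIe error)",
    4: "sub_111F4 callback list",
    6: "platform hooks (Corrected)",
    7: "platform hooks (Recoverable)",
    8: "platform hooks (Fatal)",
    9: "platform hooks (Uncorrected)",
}


def route_for_header(header: bytes) -> Optional[str]:
    """Return the dispatch target for an error source header, or None."""
    if not header:
        raise ValueError("empty error source header")
    # The switch keys on byte[0] of the header.
    return ERROR_SOURCE_ROUTES.get(header[0])


print(route_for_header(bytes([8, 0, 0, 0])))   # platform hooks (Fatal)
print(route_for_header(bytes([2, 0, 0, 0])))   # None
```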

Dependencies

Consumed (this module calls)

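  • SMST->SmmLocateProtocol (offset 208) to locate the protocols used for handler registration. In the X64 EFI_SMM_SYSTEM_TABLE2 layout, offset 208 is SmmLocateProtocol; SmiHandlerRegister sits at offset 224.
  • SmmCpuProtocol for CPU save state access. The offsets seen alongside it (112, 120, 128, 136, 144) match SMST fields in the X64 EFI_SMM_SYSTEM_TABLE2 layout: 112=SmmStartupThisAp (remote CPU execution), 120=CurrentlyExecutingCpu, 128=NumberOfCpus, 136=CpuSaveStateSize, 144=CpuSaveState.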
  • SmmSwDispatch2Protocol for register/unregister SW SMI handlers
  • RuntimeServices for GetVariable/SetVariable ("Setup", "PprAddress" UEFI variables)
  • BootServices for LocateProtocol
  • SmmLockBox for saving error records across resets
  • PciIo Protocol for PCI config space access (LocateProtocol at 0x14220)
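
The SMST offsets quoted in this analysis (112 through 144, and 208) can be cross-checked against the EFI_SMM_SYSTEM_TABLE2 definition from the PI specification. The sketch below models the table with ctypes under the assumption of an X64 build, where pointers and UINTN are 8 bytes; function pointers are represented as plain 64-bit integers since only the offsets matter here.

```python
import ctypes

U64 = ctypes.c_uint64  # X64 assumption: pointers and UINTN are 8 bytes wide


class EfiTableHeader(ctypes.Structure):
    _fields_ = [
        ("Signature", U64),
        ("Revision", ctypes.c_uint32),
        ("HeaderSize", ctypes.c_uint32),
        ("CRC32", ctypes.c_uint32),
        ("Reserved", ctypes.c_uint32),
    ]


# Pointer-sized members that follow SmmIo, in declaration order.
_POINTER_SIZED = [
    "SmmAllocatePool", "SmmFreePool", "SmmAllocatePages", "SmmFreePages",
    "SmmStartupThisAp", "CurrentlyExecutingCpu", "NumberOfCpus",
    "CpuSaveStateSize", "CpuSaveState", "NumberOfTableEntries",
    "SmmConfigurationTable", "SmmInstallProtocolInterface",
    "SmmUninstallProtocolInterface", "SmmHandleProtocol",
    "SmmRegisterProtocolNotify", "SmmLocateHandle", "SmmLocateProtocol",
    "SmiManage", "SmiHandlerRegister", "SmiHandlerUnRegister",
]


class EfiSmmSystemTable2(ctypes.Structure):
    _fields_ = [
        ("Hdr", EfiTableHeader),
        ("SmmFirmwareVendor", U64),
        ("SmmFirmwareRevision", ctypes.c_uint32),
        ("_pad", ctypes.c_uint32),             # alignment padding on X64
        ("SmmInstallConfigurationTable", U64),
        ("SmmIo", U64 * 4),                    # Mem.Read/Write, Io.Read/Write
    ] + [(name, U64) for name in _POINTER_SIZED]


def smst_offset(name: str) -> int:
    """Return the byte offset of an SMST member in the X64 layout."""
    return getattr(EfiSmmSystemTable2, name).offset


for member in ("SmmStartupThisAp", "CurrentlyExecutingCpu", "NumberOfCpus",
               "CpuSaveStateSize", "CpuSaveState", "SmmLocateProtocol",
               "SmiHandlerRegister"):
    print(f"{smst_offset(member):3d}  {member}")
```

Any call through qword_14CF8 plus a constant can be named by looking the constant up in this table.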

Consumed By

  • SMM Core - calls _ModuleEntryPoint on driver load
  • SMM SW Dispatch - calls sub_145C on SMI trigger
  • SMM CPU Protocol - calls isErrorHandlingEnabled (sub_278C) callback
  • SMM Base2 - calls destructor on SMM termination

Notes

  • The module uses a spinlock-based synchronization mechanism (sub_85FC/sub_8760) for thread safety during error handling.
  • The module supports up to 4 sockets (n4 loop < 4u throughout).
  • Socket configuration data is stored in arrays of 14944 bytes per socket at qword_14E70.
  • The byte_15220 table (28672 bytes per socket) is used for CPU topology lookup (package -> core -> thread -> unique index).
  • Error bank array stride is 216 bytes, tracked at qword_36960.
  • The module uses Lenovo-specific NVRAM for PprAddress storage (sub_27E0 at ~0x2b47).
  • The "Setup" UEFI variable (GUID {4E2CC220-057B-4D47-88CF-CDC71BA911F1} at 0x14190) controls error handling features.
  • Debug print wrapper at sub_6CB8 uses format "%a entry\n" for function trace logging when a DEBUG build is enabled.
  • ASSERT implementation at sub_6D40 is a wrapper that prints "ASSERT_EFI_ERROR (Status = %r)" with file/line info.
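
The sizes quoted in these notes are mutually consistent, which is worth checking once: four sockets at 28672 bytes each place the end of the byte_15220 table exactly at 0x31220, the address listed for MaxCoreCount. The sketch below does that arithmetic and also shows the per-socket stride computation; the helper names are illustrative, and the internal layout of each 28672-byte slice is not known from this analysis.

```python
TOPOLOGY_TABLE_BASE = 0x15220      # byte_15220
TOPOLOGY_BYTES_PER_SOCKET = 28672  # 0x7000
SOCKET_CONFIG_STRIDE = 14944       # bytes per socket in the array at qword_14E70
MAX_SOCKETS = 4


def topology_slice(socket: int) -> tuple:
    """Return (start, end) image addresses of one socket's topology slice."""
    if not 0 <= socket < MAX_SOCKETS:
        raise ValueError("socket index out of range")
    start = TOPOLOGY_TABLE_BASE + socket * TOPOLOGY_BYTES_PER_SOCKET
    return start, start + TOPOLOGY_BYTES_PER_SOCKET


def socket_config_offset(socket: int) -> int:
    """Return the byte offset of a socket's record inside the config array."""
    if not 0 <= socket < MAX_SOCKETS:
        raise ValueError("socket index out of range")
    return socket * SOCKET_CONFIG_STRIDE


table_end = topology_slice(MAX_SOCKETS - 1)[1]
print(hex(table_end))              # 0x31220
print(socket_config_offset(3))     # 44832
```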