ProcessorErrorHandler Module Analysis
Overview
SMM driver for Intel Purley (Skylake-SP Xeon) platform processor error handling. Responsible for Machine Check Architecture (MCA) error handling, memory error correction reporting, IIO (Integrated IO) error handling, VTD/ITC/OTC/DMA error logging, and Post-Package Repair (PPR) / Thermal Alert System (TAS) management. Runs as a UEFI SMM driver registered via SMM SW dispatch protocol.
Address Range
0x300 - 0x11a00 (.text segment, 232 functions)
Source Files (from debug paths)
PurleyPlatPkg/Ras/Smm/ErrHandling/ProcessorErrorHandler/ProcessorErrorHandler.c - Main module
PurleyPlatPkg/Ras/Smm/ErrHandling/ProcessorErrorHandler/McaHandler.c - MCA interrupt handler
PurleyPlatPkg/Ras/Smm/ErrHandling/ProcessorErrorHandler/MemoryErrorHandler.c - Memory error handler
LenovoServerPkg/Library/LnvPurleyLib/ProcMemErrReporting/ProcMemErrReporting.c - Lenovo processor/memory error reporting
PurleyPlatPkg/Ras/Library/mpsyncdatalib/mpsyncdatalib.c - MP sync data library
PurleySktPkg/Library/emcaplatformhookslib/emcaplatformhookslib.c - eMCA platform hooks
Entry Points (Public API)
0x5F0 - _ModuleEntryPoint (ModuleEntryPoint)
Entry point called by SMM infrastructure. Initializes the module by calling sub_69C, then calls sub_34B0 to locate protocols and register SMI handlers. Sets return status in qword_14FD8.
0x67AC - McaHandler (MCA SMI Handler) (source: McaHandler.c)
Called when an SMI is triggered by a machine check. Parameters:
a1 - CpuInfo pointer (output struct)
a2 - InterruptType pointer
a3 - SystemContext pointer (EFI_SMM_CPU_REGISTER_CONTEXT)
Extracts CPU APIC ID via sub_9CD4(), reads topology via sub_9DA0(), computes a CPU index via sub_9C34(), and returns a struct with:
- offset 0: APIC ID
- offset 4: Package ID (socket)
- offset 8: Core ID
- offset 12: Thread ID
- offset 16: CPU index (unique identifier)
- offset 24: InterruptType value
- offset 32: SystemContext pointer
0x145C - SmmPeriodicDispatchHandler
Registered via (qword_368C8+16)(sub_145C, 1) meaning SMM SW dispatch with SwSmiInputValue=1. Called periodically to process pending error reports. Iterates over CPU entries in the error array at [qword_36960] (stride 216 bytes each), clears processed errors, and for multi-socket configurations triggers PPR/TAS and error logging.
Protocols Located (via qword_14CF8+208, an SmmLocateProtocol or similar)
| GUID Address | GUID | Stored At | Purpose |
|---|---|---|---|
| 0x14180 | {3BA7E14B-176D-4B2A-948A-C86FB001943C} | qword_36950 | EFI_SMM_CPU_PROTOCOL - SMM CPU services |
| 0x14300 | {ED32D533-99E6-4209-9CC0-2D72CDD998A7} | qword_368C8 | SMM SW Dispatch2 Protocol |
| 0x141A0 | {86B091ED-1463-43B5-82A1-2C8B83CB8917} | qword_368E0 | Configuration Protocol (provides system config data) |
| 0x14340 | {1DBD1503-0A60-4230-AAA3-8016D8C3DE2F} | qword_368F0 | IOH/Iio Protocol - IIO (Integrated IO Hub) access |
| 0x141C0 | {0067835F-9A50-433A-8CBB-852078197814} | qword_36930 | SmmCpuIo Protocol - SMM CPU I/O |
| 0x14350 | {5138B5C5-9369-48EC-5B97-38A2F7096675} | qword_36948 | Platform Info Protocol |
| 0x14280 | {ED32D533-99E6-4209-9CC0-2D72CDD998A7} | qword_368C0 | SMM BASE 2 Protocol |
| 0x14290 | {1D202CAB-C8AB-4D5C-94F7-3CFCC0D3D335} | unk_36958 | SMM CPU Protocol |
| 0x142A0 | {1DBD1503-0A60-4230-AAA3-8016D8C3DE2F} | unk_36948 | SmmCpuIo |
| 0x142B0 | {5138B5C5-9369-48EC-5B97-38A2F7096675} | unk_36940 | SmmBase2 |
| 0x142E0 | for SmmLockBox | - | SmmLockBoxCommunication config table GUID |
| 0x14220 | {CD3D0A05-9E24-437C-A891-1EE053DB7638} | - | Used for BootServices->LocateProtocol in sub_27E0 |
From configuration protocol (qword_368E0):
- qword_368D8 = config pointer at offset 0 (GeneralCfg / SystemConfig struct)
- qword_368E8 = config pointer at offset 8 (PcieCfg / PerSocketCfg struct)
- qword_36938 = config pointer at offset 16 (CpuCfg / PerSocketInfo struct)
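The three cached pointers above can be pictured as fields of the configuration protocol interface. The following is a minimal sketch, assuming the interface is a flat struct of 64-bit pointers; the type and function names (RAS_CONFIG_PROTOCOL, CONFIG_CACHE, CacheConfigPointers) are hypothetical, and only the offsets 0, 8, and 16 come from the analysis. Pointers are stored as uint64_t so the layout is the same on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the interface behind qword_368E0. */
typedef struct {
    uint64_t GeneralCfg;  /* offset 0  -> cached in qword_368D8 */
    uint64_t PcieCfg;     /* offset 8  -> cached in qword_368E8 */
    uint64_t CpuCfg;      /* offset 16 -> cached in qword_36938 */
} RAS_CONFIG_PROTOCOL;

static_assert(offsetof(RAS_CONFIG_PROTOCOL, GeneralCfg) == 0, "offset 0");
static_assert(offsetof(RAS_CONFIG_PROTOCOL, PcieCfg) == 8, "offset 8");
static_assert(offsetof(RAS_CONFIG_PROTOCOL, CpuCfg) == 16, "offset 16");

/* Hypothetical stand-in for the three module globals. */
typedef struct {
    uint64_t SystemConfig;        /* qword_368D8 */
    uint64_t SocketFeatureConfig; /* qword_368E8 */
    uint64_t PerSocketCpuConfig;  /* qword_36938 */
} CONFIG_CACHE;

/* Copies the three config pointers out of the protocol interface. */
void CacheConfigPointers(const RAS_CONFIG_PROTOCOL *proto, CONFIG_CACHE *cache)
{
    cache->SystemConfig = proto->GeneralCfg;
    cache->SocketFeatureConfig = proto->PcieCfg;
    cache->PerSocketCpuConfig = proto->CpuCfg;
}
```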
Key Functions
Initialization (ProcessorErrorHandler.c)
| Address | Name | Size | Purpose |
|---|---|---|---|
| 0x5F0 | _ModuleEntryPoint | 0xA9 | Driver entry, calls init and protocol locator |
| 0x69C | DriverInit | 0xAC9 | UEFI boot services init (gST, gBS, gRT, gImageHandle setup) |
| 0x34B0 | LocateProtocolsAndRegisterHandlers | 0x407 | Locates all required protocols, registers SMI handlers |
| 0x278C | IsErrorHandlingEnabled | 0x51 | Checks if error handling should be active (reads config fields) |
| 0x27E0 | ErrorHandlerSetup | 0x422 | Full error handler init: registers MCA SMI, enables MSRs, reads "Setup" UEFI var, handles PprAddress NVRAM, configures per-socket MCA banks |
| 0x3364 | InitPerSocketMca | 0x14C | Per-socket MCA initialization: enables MCG_CTL_P (LERR, RERR), sets up MCi_CTL2 for corrected errors |
| 0x5394 | RunPprTas | 0x1D6 | Post-Package Repair / Thermal Alert System: marks PPR/TAS busy, disables DIMMs on retired pages, clears throttled DIMMs |
| 0x2324 | ConfigureMcBanks | 0x214 | Configure MCA banks per socket (enables specific error types) |
Error Dispatch
| Address | Name | Size | Purpose |
|---|---|---|---|
| 0x104D0 | LogErrorEvent | 0x22C | Central error logging dispatcher. Dispatches by ErrorSource type: ErrorSource=1: calls sub_10410; ErrorSource=3: iterates sub_110D0 callbacks; ErrorSource=4: iterates sub_111F4 callbacks; ErrorSource=6: iterates platform hooks sub_114F0 (Corrected); ErrorSource=7: iterates platform hooks (Recoverable); ErrorSource=8: iterates platform hooks (Fatal); ErrorSource=9: iterates platform hooks (Uncorrected) |
| 0x3D8C | ReportGenericErrors | 0x1AB | Reports generic error via sub_104D0 with bitmask error type. Error types: bit0=-112(0x90), bit1=-111(0x91), bit2=-110(0x92), bit3=-109(0x93), bit4=-108(0x94), bit5=-107(0x95), bit6=-106(0x96), bit7=-105(0x97), bit8=-104(0x98), bit31=-96(0xA0) |
| 0x4CD8 | LogProcMemError | 0x144 | Encodes processor/memory error source (n5=0..10) into sub_104D0 format and dispatches |
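The bit-to-error-type mapping listed for ReportGenericErrors follows a simple pattern: bits 0 through 8 map to 0x90 + bit, and bit 31 maps to 0xA0. A minimal sketch of that mapping is below; the function names are hypothetical, and the behaviour for bits 9 through 30 (returning 0, meaning "no mapping listed") is an assumption, since the analysis does not cover them.

```c
#include <assert.h>
#include <stdint.h>

/* The decompiler prints these bytes as signed values: 0x90 is -112. */
static_assert(0x90 - 256 == -112, "0x90 read as a signed byte is -112");
static_assert(0xA0 - 256 == -96, "0xA0 read as a signed byte is -96");

/* Returns the error-type byte for one mask bit, or 0 when the analysis
   lists no mapping for that bit (assumption: such bits are ignored). */
uint8_t ErrorTypeForBit(unsigned bit)
{
    if (bit <= 8) {
        return (uint8_t)(0x90 + bit);
    }
    if (bit == 31) {
        return 0xA0;
    }
    return 0;
}

/* Expands a 32-bit mask into error-type bytes, lowest bit first.
   Returns the number of bytes written to out. */
unsigned ExpandErrorMask(uint32_t mask, uint8_t out[32])
{
    unsigned count = 0;
    for (unsigned bit = 0; bit < 32; bit++) {
        if ((mask & (UINT32_C(1) << bit)) == 0) {
            continue;
        }
        uint8_t type = ErrorTypeForBit(bit);
        if (type != 0) {
            out[count++] = type;
        }
    }
    return count;
}
```

In the real module each resulting type would presumably be passed on to sub_104D0; that call is left out here because its argument format is not established by the analysis.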
MCA Handler (McaHandler.c)
| Address | Name | Size | Purpose |
|---|---|---|---|
| 0x9C34 | GetCpuIndex | 0x8F | Returns CPU unique index from package/core/thread topology using lookup table byte_15220 |
| 0x9CD4 | ReadLocalApicId | 0x6A | Reads local APIC ID (via MSR 0x1B or CPUID) |
| 0x9DA0 | ReadCpuTopology | 0x80 | Reads CPU package/core/thread topology via CPUID |
| 0x9E20 | GetCurrentSocketAndCore | 0x4C0 | Determines current socket and core numbers |
| 0xA2EC | GetMaxCoresPerPackage | 0x4C | Gets maximum cores per socket from topology |
Memory Error Handler (MemoryErrorHandler.c)
| Address | Name | Size | Purpose |
|---|---|---|---|
| 0x56AC | LogMcBankErrors | 0x542 | Logs MCA bank errors: iterates MC banks, decodes MCi_STATUS/MSR, determines error severity, calls error dispatch. References gMcBankList global |
| 0x5BF0 | LogMemError | 0x638 | Full memory error reporting: reads MC banks, decodes DIMM info (socket/channel/rank/bank/row/col), writes to lockbox, logs via platform hooks |
| 0x6228 | DecodeDimmInfo | 0x1C8 | Decodes DIMM location from physical address / MCA address data |
| 0x63F0 | ReadMemErrorRegisters | 0x180 | Reads memory controller error registers (MC ODBC, EMASK, etc.) |
| 0xE510 | ReadMemControllerMsr | 0x54 | Reads memory controller MSR registers (MC_ODBC, etc.) |

| Address | Name | Size | Purpose |
|---|---|---|---|
| 0xD62C | ProcessIioErrors | 0x84E | IIO error processing per socket: checks for VTD, ITC, OTC, DMA errors, reads status registers, logs errors via platform hooks |
| 0xDE7C | CheckAndReportIioErrors | 0x1D9 | Per-socket IIO error checker: iterates IIO stacks, checks error status, triggers ProcessIioErrors |
| 0xE070 | HandleCorrectedIioErrors | 0x49F | Handles corrected IIO error logging per socket |
| 0xC5A8 | ProcessDimmCorrectedErrors | 0x1F0 | Processes DIMM corrected ECC errors: reads error registers, determines channel/dimm, logs via platform hooks |
| 0xE6A0 | ProcessMemErrorReporting | 0x1C4 | Main memory error reporting entry: reads system config, determines error type, calls appropriate handler |
| 0xD228 | LogVtdErrors | 0x72 | Logs VT-d errors (register 0x66b = VTD UNC ERR STS) |
| 0xD29C | LogItcErrors | 0x70 | Logs ITC errors (register 0x66b) |
| 0xD30C | LogOtcErrors | 0x73 | Logs OTC errors (register 0x66b) |
| 0xD380 | LogDmaErrors | 0x2AA | Logs DMA errors (detailed bit decoding) |
Utility / Library Functions
| Address | Name | Size | Callers | Purpose |
|---|---|---|---|---|
| 0x6C68 | DebugAssert | 0x4F | 2 | ASSERT implementation |
| 0x6CB8 | DebugPrint | 0x88 | 45 | Print debug message |
| 0x6D40 | DebugAssertLine | 0x3E | 82 | ASSERT with file/line |
| 0x85FC | AcquireSpinLock | 0x34 | 7 | Acquire spin lock for synchronization |
| 0x8630 | SpinLockAcquireWithTimeout | 0xB3 | 6 | Spinlock acquire with timeout (10M ns) |
| 0x8760 | SpinLockRelease | 0x6C | 6 | Release spin lock |
| 0x86E4 | SpinLockAcquireInternal | 0x7A | 2 | Internal spin lock acquire |
| 0x87CC | InterlockedIncrement | 0x4D | 2 | Atomic increment |
| 0x881C | HobGetHobList | 0x82 | 2 | Get HOB list pointer |
| 0x88F0 | GetSystemFirmwareResource | 0x49 | 3 | Get firmware resource descriptor from HOBs |
| 0x88A0 | HobGetNextHob | 0x4D | 1 | Get next HOB entry |
| 0x893C | AllocatePool | 0x75 | 2 | SmmAllocatePool wrapper |
| 0x89B4 | SmmLockBoxSave | 0x33 | 2 | Save data to SMM LockBox |
| 0x1168 | SmmLockBoxDestructor | 0xCA | 1 | SMM LockBox destructor, unregisters config table |
| 0xA4FC | WritePciConfig | - | - | Write PCI configuration space |
| 0xA66C | ReadPciConfig | - | - | Read PCI configuration space |
| 0xAB60 | GetPciDeviceClass | - | - | Get PCI device class code |
| 0xB534 | LogErrorToBanks | - | - | Log error to error bank array |
| 0xBD34 | ClearErrorStatus | - | - | Clear error status bits |
| 0xB62C | SetErrorBankFlag | - | - | Set error pending flag per CPU |
SMM Handler Registration
From sub_34B0 (line ~1856):
(qword_36950)(&psub_278C, 3) - Registers isErrorHandlingEnabled handler with SMM CPU protocol
(qword_368C8+16)(sub_145C, 1) - Registers periodic SW SMI dispatch handler (SwSmiInputValue=1)
- If qword_368E8+17 == 1 && qword_368D8+10 == 1: registers with qword_36968+40 for MSR 0x6ABC (MCG_CTL or MCi_CTL2)
- If qword_368D8+4 == 1: calls sub_3364 for per-socket MCA init
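The two conditional registrations above reduce to byte comparisons against the cached config structures. A minimal sketch follows, assuming both structures can be read as plain byte arrays; the macro and function names are hypothetical, and only the offsets and the comparison against 1 come from the analysis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offsets from the analysis; macro names are hypothetical. */
#define SYSCFG_PER_SOCKET_MCA_INIT 4   /* qword_368D8 + 4  */
#define SYSCFG_ADVANCED_RAS        10  /* qword_368D8 + 10 */
#define SOCKCFG_MCA_UNCORRECTED    17  /* qword_368E8 + 17 */

/* True when the handler for MSR 0x6ABC would be registered. */
bool ShouldRegisterMsrHandler(const uint8_t *systemCfg, const uint8_t *socketCfg)
{
    assert(systemCfg != NULL && socketCfg != NULL);
    return socketCfg[SOCKCFG_MCA_UNCORRECTED] == 1 &&
           systemCfg[SYSCFG_ADVANCED_RAS] == 1;
}

/* True when sub_3364 (per-socket MCA init) would be called. */
bool ShouldInitPerSocketMca(const uint8_t *systemCfg)
{
    assert(systemCfg != NULL);
    return systemCfg[SYSCFG_PER_SOCKET_MCA_INIT] == 1;
}
```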
Data Structures
Error Bank Entry (stride 216 bytes at qword_36960)
Offset Size Description
0x00 1 Active flag
0x01 1 Retry flag
0x03 1 PPR/TAS flags
0x30 1 Bank valid flag
0x34 1 Error type flags (bit0=UC, bit1=CE, bit2=deferred)
0x3C 4 Error bank/register indices
0x40 4 Error status data
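The entry layout above can be written as a padded C struct so that the 216-byte stride and the known offsets are checked at compile time. This is a sketch under stated assumptions: field names are hypothetical, unknown regions are left as opaque padding, and ClearActiveEntries only illustrates walking the array; which fields the periodic handler actually clears is not established by the analysis.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ERROR_BANK_STRIDE 216

/* Error type flag bits at offset 0x34. */
#define ERR_TYPE_UC       0x01u
#define ERR_TYPE_CE       0x02u
#define ERR_TYPE_DEFERRED 0x04u

typedef struct {
    uint8_t  Active;           /* 0x00 */
    uint8_t  Retry;            /* 0x01 */
    uint8_t  Unknown02;        /* 0x02 */
    uint8_t  PprTasFlags;      /* 0x03 */
    uint8_t  Unknown04[0x2C];  /* 0x04..0x2F, not analysed */
    uint8_t  BankValid;        /* 0x30 */
    uint8_t  Unknown31[3];     /* 0x31..0x33 */
    uint8_t  ErrorTypeFlags;   /* 0x34 */
    uint8_t  Unknown35[7];     /* 0x35..0x3B */
    uint32_t BankIndices;      /* 0x3C */
    uint32_t StatusData;       /* 0x40 */
    uint8_t  Unknown44[ERROR_BANK_STRIDE - 0x44];
} ERROR_BANK_ENTRY;

static_assert(sizeof(ERROR_BANK_ENTRY) == ERROR_BANK_STRIDE, "stride 216");
static_assert(offsetof(ERROR_BANK_ENTRY, BankValid) == 0x30, "0x30");
static_assert(offsetof(ERROR_BANK_ENTRY, ErrorTypeFlags) == 0x34, "0x34");
static_assert(offsetof(ERROR_BANK_ENTRY, BankIndices) == 0x3C, "0x3C");
static_assert(offsetof(ERROR_BANK_ENTRY, StatusData) == 0x40, "0x40");

/* Walks the array and clears every active entry (illustrative only).
   Returns the number of entries that were active. */
unsigned ClearActiveEntries(ERROR_BANK_ENTRY *entries, unsigned count)
{
    unsigned cleared = 0;
    for (unsigned i = 0; i < count; i++) {
        if (!entries[i].Active) {
            continue;
        }
        entries[i].Active = 0;
        entries[i].BankValid = 0;
        entries[i].ErrorTypeFlags = 0;
        cleared++;
    }
    return cleared;
}
```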
CpuInfo (returned by McaHandler at 0x67AC)
Offset Size Description
0x00 4 APIC ID
0x04 1 Package ID (socket)
0x08 1 Core ID
0x0C 1 Thread ID
0x10 8 CPU Index (unique)
0x18 8 InterruptType
0x20 8 SystemContext
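The CpuInfo layout can likewise be expressed as a struct with explicit padding. A minimal sketch, assuming the gaps between the one-byte IDs are unused padding; the type and function names are hypothetical. FillCpuInfo takes the topology values as parameters because the real readers (sub_9CD4, sub_9DA0, sub_9C34) only make sense on the target CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t ApicId;         /* 0x00 */
    uint8_t  PackageId;      /* 0x04 */
    uint8_t  Pad05[3];
    uint8_t  CoreId;         /* 0x08 */
    uint8_t  Pad09[3];
    uint8_t  ThreadId;       /* 0x0C */
    uint8_t  Pad0D[3];
    uint64_t CpuIndex;       /* 0x10 */
    uint64_t InterruptType;  /* 0x18 */
    uint64_t SystemContext;  /* 0x20, 64-bit pointer value */
} CPU_INFO;

static_assert(offsetof(CPU_INFO, PackageId) == 0x04, "0x04");
static_assert(offsetof(CPU_INFO, CoreId) == 0x08, "0x08");
static_assert(offsetof(CPU_INFO, ThreadId) == 0x0C, "0x0C");
static_assert(offsetof(CPU_INFO, CpuIndex) == 0x10, "0x10");
static_assert(offsetof(CPU_INFO, InterruptType) == 0x18, "0x18");
static_assert(offsetof(CPU_INFO, SystemContext) == 0x20, "0x20");

/* Populates the output struct; returns -1 when the pointer is NULL,
   mirroring the "validate pointers" step, and 0 otherwise. */
int FillCpuInfo(CPU_INFO *info, uint32_t apicId, uint8_t packageId,
                uint8_t coreId, uint8_t threadId, uint64_t cpuIndex,
                uint64_t interruptType, uint64_t systemContext)
{
    if (info == NULL) {
        return -1;
    }
    info->ApicId = apicId;
    info->PackageId = packageId;
    info->CoreId = coreId;
    info->ThreadId = threadId;
    info->CpuIndex = cpuIndex;
    info->InterruptType = interruptType;
    info->SystemContext = systemContext;
    return 0;
}
```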
Socket Configuration (per socket, stride 14944 bytes at qword_14E70)
Offset Size Description
0x00 2 Socket enabled bitmask
0x06 2 Core active bitmask
0x18 1 Socket present / enabled
... ... IIO stack config
System Config (at qword_368D8)
Offset Size Description
0x00 1 Error handling enable flag
0x04 1 Per-socket MCA init flag
0x0A 1 Advanced RAS enable
0x0C 1 Corrected error logging mode (0=off, 1=logged, 2=throttled)
0x0D 1 PPR/TAS enable (0=off, 1=on, 2=aggressive)
0x11 1 IIO error handling flag
Socket Feature Config (at qword_368E8)
Offset Size Description
0x10 1 MCA corrected error handling feature
0x11 1 MCA uncorrected error handling feature
0x1D 1 IIO error handling feature flag
Global Variables
| Address | Name | Purpose |
|---|---|---|
| 0x14CE8 | ImageHandle | EFI image handle (gImageHandle) |
| 0x14CD8 | SystemTable | EFI system table (gST) |
| 0x14CE0 | BootServices | EFI boot services table (gBS) |
| 0x14CF0 | RuntimeServices | EFI runtime services table (gRT) |
| 0x14CF8 | Smst | SMM system table (gSmst) |
| 0x14DD0 | SmmCpuProtocol | SMM CPU protocol interface |
| 0x14E60 | IioProtocol | IIO protocol interface |
| 0x14E70 | SocketConfigArray | Per-socket config array (stride 14944) |
| 0x14E78 | IioProtocol2 | IIO protocol interface (alternate) |
| 0x14E80 | PcieConfig | PCIe configuration structure |
| 0x14E88 | GeneralConfig | General system configuration |
| 0x14E10 | IioStackMask | IIO stack bitmask (which stacks are populated) |
| 0x14EE0 | LockStruct | Synchronization lock structure |
| 0x14FD8 | ReturnStatus | Module return status |
| 0x15000 | ErrorSource | Current error source type |
| 0x15198 | SmmCpuProtocol2 | Alternate SMM CPU protocol (from sub_27E0) |
| 0x151A0 | SmmCpuRegistered | SMM CPU registration flag |
| 0x151F8 | BootScriptEntry | BootScript entry pointer |
| 0x15220 | CpuTopologyTable | CPU topology lookup table |
| 0x31220 | MaxCoreCount | Maximum core count per package |
| 0x368C8 | SwDispatchProtocol | SMM SW Dispatch2 protocol |
| 0x368C0 | SmmBase2Protocol | SMM Base2 protocol |
| 0x368D0 | SavedPprTas | Saved PPR/TAS data pointer |
| 0x368D8 | SystemConfig | System configuration (GeneralCfg) |
| 0x368E0 | ConfigProtocol | Configuration protocol instance |
| 0x368E8 | SocketFeatureConfig | Per-socket feature config |
| 0x368F0 | IioProtocolRef | IIO protocol reference |
| 0x36900 | psub_278C | Registered isErrorHandlingEnabled callback |
| 0x36908 | psub_27E0 | Registered error handler setup callback |
| 0x36910 | psub_2C04 | Callback pointer |
| 0x36918 | psub_2C10 | Callback pointer |
| 0x36920 | psub_3090 | Callback pointer |
| 0x36928 | ConfigData | Configuration data pointer |
| 0x36930 | CpuIoProtocol | SmmCpuIo protocol |
| 0x36938 | PerSocketCpuConfig | Per-socket CPU config array |
| 0x36940 | SpinLock | Spin lock for synchronization |
| 0x36948 | PlatformInfoProtocol | Platform info protocol |
| 0x36950 | SmmCpuServices | SMM CPU services (SaveState, etc.) |
| 0x36960 | ErrorBankArray | Error bank array base (stride 216) |
| 0x36968 | IioPciProtocol | IIO PCI protocol |
| 0x1B008 | LastErrorSource | Per-bank last error source tracking |
Calling Patterns
Module Init Flow
_ModuleEntryPoint(0x5F0)
+-- sub_69C() -- init gST/gBS/gRT/ImageHandle
+-- sub_300() -- lock check
+-- sub_34B0() -- locate protocols (7 protocols)
+-- Locate protocols via SMST->SmmLocateProtocol (offset 208)
+-- Extract config from ConfigProtocol
+-- sub_85FC() -- acquire spin lock
+-- sub_88F0() -- get firmware resource descriptor from HOBs
+-- sub_5394() -- run PPR/TAS if needed
+-- sub_278C() -- is error handling enabled?
+-- sub_27E0() -- full error handler setup
| +-- Register SMM SW SMI handler (sub_145C)
| +-- Enable MSRs (MCG_CTL, MCi_CTL2)
| +-- Read "Setup" UEFI variable
| +-- Handle PprAddress NVRAM
+-- sub_3364() -- per-socket MCA init (if configured)
+-- sub_6F88() -- release lock
+-- sub_3E0() -- cleanup
SMI Dispatch Flow
sub_145C() (Periodic SMI handler)
+-- Iterate error bank array (216 byte stride)
+-- For each active bank:
+-- Check error type (UC/CE/deferred)
+-- sub_BD34() -- clear error status
+-- Clear bank entries
+-- If error reported and multi-socket:
+-- Check socket IIO error status
+-- sub_B534() -- log error to banks
+-- sub_A76C() -- remote CPU execution
+-- Clear per-bank tracking
MCA Handler Flow (sub_67AC)
McaHandler(CpuInfo, InterruptType, SystemContext)
+-- Validate pointers
+-- sub_9CD4() -- read local APIC ID
+-- sub_9DA0() -- read CPU topology (package/core/thread)
+-- sub_9C34() -- compute CPU index from topology table
+-- Return populated CpuInfo struct
Error Logging Flow (sub_104D0)
LogErrorEvent(ErrorSourceHeader)
+-- Switch on ErrorSource byte[0]:
1 -> sub_10410() -- CPU error
3 -> sub_110D0() list -- PCIe error
4 -> sub_111F4() list --
6 -> platform hooks CE -- Corrected
7 -> platform hooks -- Recoverable
8 -> platform hooks -- Fatal
9 -> platform hooks -- Uncorrected
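The switch in the flow above can be sketched as a small routing function. This is an illustration only: the enum and function names are hypothetical, the real dispatcher calls sub_10410 or iterates callback lists rather than returning a route, and treating every unlisted ErrorSource value as unhandled is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    ROUTE_UNHANDLED = 0,      /* value not listed in the analysis   */
    ROUTE_CPU,                /* 1 -> sub_10410                     */
    ROUTE_CALLBACK_LIST_3,    /* 3 -> sub_110D0 callback list       */
    ROUTE_CALLBACK_LIST_4,    /* 4 -> sub_111F4 callback list       */
    ROUTE_HOOK_CORRECTED,     /* 6 -> platform hooks (Corrected)    */
    ROUTE_HOOK_RECOVERABLE,   /* 7 -> platform hooks (Recoverable)  */
    ROUTE_HOOK_FATAL,         /* 8 -> platform hooks (Fatal)        */
    ROUTE_HOOK_UNCORRECTED    /* 9 -> platform hooks (Uncorrected)  */
} ERROR_ROUTE;

/* Chooses a route from the ErrorSource value stored in byte 0 of the
   error source header. */
ERROR_ROUTE RouteErrorEvent(const uint8_t *header)
{
    assert(header != NULL);
    switch (header[0]) {
    case 1:  return ROUTE_CPU;
    case 3:  return ROUTE_CALLBACK_LIST_3;
    case 4:  return ROUTE_CALLBACK_LIST_4;
    case 6:  return ROUTE_HOOK_CORRECTED;
    case 7:  return ROUTE_HOOK_RECOVERABLE;
    case 8:  return ROUTE_HOOK_FATAL;
    case 9:  return ROUTE_HOOK_UNCORRECTED;
    default: return ROUTE_UNHANDLED;
    }
}
```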
Dependencies
Consumed (this module calls)
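- SMST->SmmLocateProtocol (offset 208) to locate the SMM protocols this module uses. Offset 208 is SmmLocateProtocol in the 64-bit SMM system table layout; SmiHandlerRegister is at offset 224.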
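- SmmCpuProtocol for CPU save state access (offset 112=remote CPU, 120=current CPU, 128=CPU count, 136=CPU context, 144=CPU state buffer). These offsets coincide with the 64-bit SMST fields SmmStartupThisAp, CurrentlyExecutingCpu, NumberOfCpus, CpuSaveStateSize, and CpuSaveState, so some of these accesses may actually go through gSmst rather than a protocol interface.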
- SmmSwDispatch2Protocol for register/unregister SW SMI handlers
- RuntimeServices for GetVariable/SetVariable ("Setup", "PprAddress" UEFI variables)
- BootServices for LocateProtocol
- SmmLockBox for saving error records across resets
- PciIo Protocol for PCI config space access (LocateProtocol at 0x14220)
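The SMST offsets cited in this analysis can be cross-checked against the PI specification's EFI_SMM_SYSTEM_TABLE2 layout for a 64-bit build. The mirror struct below is a sketch based on my reading of that layout: pointers and UINTN fields are modelled as uint64_t, the struct name is hypothetical, and it is meant for offset checking only, not for calling into firmware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t Signature;
    uint32_t Revision;
    uint32_t HeaderSize;
    uint32_t CRC32;
    uint32_t Reserved;
} TABLE_HEADER;  /* EFI_TABLE_HEADER, 24 bytes */

typedef struct {
    TABLE_HEADER Hdr;                        /*   0 */
    uint64_t SmmFirmwareVendor;              /*  24 */
    uint32_t SmmFirmwareRevision;            /*  32 */
    uint32_t Pad;                            /*  36 */
    uint64_t SmmInstallConfigurationTable;   /*  40 */
    uint64_t SmmIo[4];                       /*  48: Mem/Io Read/Write */
    uint64_t SmmAllocatePool;                /*  80 */
    uint64_t SmmFreePool;                    /*  88 */
    uint64_t SmmAllocatePages;               /*  96 */
    uint64_t SmmFreePages;                   /* 104 */
    uint64_t SmmStartupThisAp;               /* 112 */
    uint64_t CurrentlyExecutingCpu;          /* 120 */
    uint64_t NumberOfCpus;                   /* 128 */
    uint64_t CpuSaveStateSize;               /* 136 */
    uint64_t CpuSaveState;                   /* 144 */
    uint64_t NumberOfTableEntries;           /* 152 */
    uint64_t SmmConfigurationTable;          /* 160 */
    uint64_t SmmInstallProtocolInterface;    /* 168 */
    uint64_t SmmUninstallProtocolInterface;  /* 176 */
    uint64_t SmmHandleProtocol;              /* 184 */
    uint64_t SmmRegisterProtocolNotify;      /* 192 */
    uint64_t SmmLocateHandle;                /* 200 */
    uint64_t SmmLocateProtocol;              /* 208 */
    uint64_t SmiManage;                      /* 216 */
    uint64_t SmiHandlerRegister;             /* 224 */
    uint64_t SmiHandlerUnRegister;           /* 232 */
} SMST_X64_LAYOUT;

static_assert(offsetof(SMST_X64_LAYOUT, SmmStartupThisAp) == 112, "112");
static_assert(offsetof(SMST_X64_LAYOUT, CurrentlyExecutingCpu) == 120, "120");
static_assert(offsetof(SMST_X64_LAYOUT, NumberOfCpus) == 128, "128");
static_assert(offsetof(SMST_X64_LAYOUT, CpuSaveStateSize) == 136, "136");
static_assert(offsetof(SMST_X64_LAYOUT, CpuSaveState) == 144, "144");
static_assert(offsetof(SMST_X64_LAYOUT, SmmLocateProtocol) == 208, "208");
static_assert(offsetof(SMST_X64_LAYOUT, SmiHandlerRegister) == 224, "224");
```

Under this layout, a call through qword_14CF8+208 resolves to SmmLocateProtocol, and SMI handler registration would go through offset 224.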
Consumed By
- SMM Core - calls _ModuleEntryPoint on driver load
- SMM SW Dispatch - calls sub_145C on SMI trigger
- SMM CPU Protocol - calls isErrorHandlingEnabled (sub_278C) callback
- SMM Base2 - calls destructor on SMM termination
Notes
- The module uses a spinlock-based synchronization mechanism (sub_85FC/sub_8760) for thread safety during error handling.
- The module supports up to 4 sockets (n4 loop < 4u throughout).
- Socket configuration data is stored in arrays of 14944 bytes per socket at qword_14E70.
- The byte_15220 table (28672 bytes per socket) is used for CPU topology lookup (package -> core -> thread -> unique index).
- Error bank array stride is 216 bytes, tracked at qword_36960.
- The module uses Lenovo-specific NVRAM for PprAddress storage (sub_27E0 at ~0x2b47).
- The "Setup" UEFI variable (GUID {4E2CC220-057B-4D47-88CF-CDC71BA911F1} at 0x14190) controls error handling features.
- Debug print wrapper at sub_6CB8 uses format "%a entry\n" for function trace logging when a DEBUG build is enabled.
- ASSERT implementation at sub_6D40 is a wrapper that prints "ASSERT_EFI_ERROR (Status = %r)" with file/line info.
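Several figures in these notes are mutually consistent and can be checked with simple arithmetic: four sockets at 28672 bytes each, starting at byte_15220 (0x15220), end exactly at 0x31220, which is the address recorded for MaxCoreCount. The sketch below encodes that check; the macro and function names are hypothetical, and the per-socket slice addressing is an assumption about how the table is indexed at its top level.

```c
#include <assert.h>
#include <stdint.h>

#define TOPOLOGY_TABLE_BASE       0x15220u  /* byte_15220                 */
#define TOPOLOGY_BYTES_PER_SOCKET 28672u    /* 0x7000 per socket          */
#define MAX_SOCKETS               4u        /* loop bound seen in module  */
#define MAX_CORE_COUNT_ADDR       0x31220u  /* next global after table    */

/* 4 * 28672 = 0x1C000, and 0x15220 + 0x1C000 = 0x31220. */
static_assert(TOPOLOGY_TABLE_BASE + MAX_SOCKETS * TOPOLOGY_BYTES_PER_SOCKET
                  == MAX_CORE_COUNT_ADDR,
              "topology table ends where MaxCoreCount begins");

/* Image-relative address of one socket's slice of the topology table. */
uint32_t TopologySliceAddress(unsigned socket)
{
    assert(socket < MAX_SOCKETS);
    return TOPOLOGY_TABLE_BASE + socket * TOPOLOGY_BYTES_PER_SOCKET;
}
```

That the table ends exactly at the next named global supports both the 28672-byte per-socket size and the four-socket limit.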