# ProcessorErrorHandler Module Analysis

## Overview

SMM driver for Intel Purley (Skylake-SP Xeon) platform processor error handling. Responsible for Machine Check Architecture (MCA) error handling, memory error correction reporting, IIO (Integrated IO) error handling, VTD/ITC/OTC/DMA error logging, and Post-Package Repair (PPR) / Thermal Alert System (TAS) management. Runs as a UEFI SMM driver registered via SMM SW dispatch protocol.

## Address Range

0x300 - 0x11a00 (.text segment, 232 functions)

## Source Files (from debug paths)

- `PurleyPlatPkg/Ras/Smm/ErrHandling/ProcessorErrorHandler/ProcessorErrorHandler.c` - Main module
- `PurleyPlatPkg/Ras/Smm/ErrHandling/ProcessorErrorHandler/McaHandler.c` - MCA interrupt handler
- `PurleyPlatPkg/Ras/Smm/ErrHandling/ProcessorErrorHandler/MemoryErrorHandler.c` - Memory error handler
- `LenovoServerPkg/Library/LnvPurleyLib/ProcMemErrReporting/ProcMemErrReporting.c` - Lenovo processor/memory error reporting
- `PurleyPlatPkg/Ras/Library/mpsyncdatalib/mpsyncdatalib.c` - MP sync data library
- `PurleySktPkg/Library/emcaplatformhookslib/emcaplatformhookslib.c` - eMCA platform hooks

## Entry Points (Public API)

### 0x5F0 - _ModuleEntryPoint (ModuleEntryPoint)
Entry point called by SMM infrastructure. Initializes the module by calling `sub_69C`, then calls `sub_34B0` to locate protocols and register SMI handlers. Sets return status in `qword_14FD8`.

### 0x67AC - McaHandler (MCA SMI Handler) (source: McaHandler.c)
Called when an SMI is triggered by a machine check. Parameters:
- `a1` - CpuInfo pointer (output struct)
- `a2` - InterruptType pointer
- `a3` - SystemContext pointer (EFI_SMM_CPU_REGISTER_CONTEXT)

Extracts CPU APIC ID via `sub_9CD4()`, reads topology via `sub_9DA0()`, computes a CPU index via `sub_9C34()`, and returns a struct with:
- offset 0: APIC ID
- offset 4: Package ID (socket)
- offset 8: Core ID
- offset 12: Thread ID
- offset 16: CPU index (unique identifier)
- offset 24: InterruptType value
- offset 32: SystemContext pointer

### 0x145C - SmmPeriodicDispatchHandler
Registered via `(qword_368C8+16)(sub_145C, 1)`, i.e. an SMM SW dispatch registration with SwSmiInputValue=1. Called periodically to process pending error reports. Iterates over the CPU entries in the error array at `[qword_36960]` (stride 216 bytes each), clears processed errors, and, for multi-socket configurations, triggers PPR/TAS and error logging.

## Protocols Located (via qword_14CF8+208, an SmmLocateProtocol or similar)

| GUID Address | GUID | Stored At | Purpose |
|---|---|---|---|
| 0x14180 | {3BA7E14B-176D-4B2A-948A-C86FB001943C} | qword_36950 | **EFI_SMM_CPU_PROTOCOL** - SMM CPU services |
| 0x14300 | {ED32D533-99E6-4209-9CC0-2D72CDD998A7} | qword_368C8 | **SMM SW Dispatch2 Protocol** |
| 0x141A0 | {86B091ED-1463-43B5-82A1-2C8B83CB8917} | qword_368E0 | **Configuration Protocol** (provides system config data) |
| 0x14340 | {1DBD1503-0A60-4230-AAA3-8016D8C3DE2F} | qword_368F0 | **IOH/Iio Protocol** - IIO (Integrated IO Hub) access |
| 0x141C0 | {0067835F-9A50-433A-8CBB-852078197814} | qword_36930 | **SmmCpuIo Protocol** - SMM CPU I/O |
| 0x14350 | {5138B5C5-9369-48EC-5B97-38A2F7096675} | qword_36948 | **Platform Info Protocol** |
| 0x14280 | {ED32D533-99E6-4209-9CC0-2D72CDD998A7} | qword_368C0 | **SMM BASE 2 Protocol** |
| 0x14290 | {1D202CAB-C8AB-4D5C-94F7-3CFCC0D3D335} | unk_36958 | **SMM CPU Protocol** |
| 0x142A0 | {1DBD1503-0A60-4230-AAA3-8016D8C3DE2F} | unk_36948 | **SmmCpuIo** |
| 0x142B0 | {5138B5C5-9369-48EC-5B97-38A2F7096675} | unk_36940 | **SmmBase2** |
| 0x142E0 | for SmmLockBox | - | **SmmLockBoxCommunication** config table GUID |
| 0x14220 | {CD3D0A05-9E24-437C-A891-1EE053DB7638} | - | Used for BootServices->LocateProtocol in sub_27E0 |
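
When searching the image for the GUIDs in this table, keep in mind that the first three GUID fields are stored little-endian on x86-64, so the raw bytes do not match the text form. The following is a minimal sketch of that conversion; the `Guid` type and the `guid_to_bytes` and `kGuidAt14180` names are illustrative and do not come from the binary.

```c
#include <stdint.h>
#include <string.h>

/* Same field layout as EFI_GUID; the type name is illustrative. */
typedef struct {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
} Guid;

/* Serialize a GUID the way it appears in a little-endian image:
 * Data1..Data3 are byte-swapped relative to the text form, Data4 is not. */
static void guid_to_bytes(const Guid *g, uint8_t out[16])
{
    out[0] = (uint8_t)(g->Data1);
    out[1] = (uint8_t)(g->Data1 >> 8);
    out[2] = (uint8_t)(g->Data1 >> 16);
    out[3] = (uint8_t)(g->Data1 >> 24);
    out[4] = (uint8_t)(g->Data2);
    out[5] = (uint8_t)(g->Data2 >> 8);
    out[6] = (uint8_t)(g->Data3);
    out[7] = (uint8_t)(g->Data3 >> 8);
    memcpy(&out[8], g->Data4, 8);
}

/* The GUID listed at 0x14180 in the table above. */
static const Guid kGuidAt14180 = {
    0x3BA7E14B, 0x176D, 0x4B2A,
    { 0x94, 0x8A, 0xC8, 0x6F, 0xB0, 0x01, 0x94, 0x3C }
};
```

For this GUID the byte pattern to search for starts with `4B E1 A7 3B 6D 17 2A 4B`, followed by the last eight bytes in text order.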

From configuration protocol (qword_368E0):
- **qword_368D8** = config pointer at offset 0 (GeneralCfg / SystemConfig struct)
- **qword_368E8** = config pointer at offset 8 (PcieCfg / PerSocketCfg struct)
- **qword_36938** = config pointer at offset 16 (CpuCfg / PerSocketInfo struct)

## Key Functions

### Initialization (ProcessorErrorHandler.c)

| Address | Name | Size | Purpose |
|---|---|---|---|
| 0x5F0 | _ModuleEntryPoint | 0xA9 | Driver entry, calls init and protocol locator |
| 0x69C | DriverInit | 0xAC9 | UEFI boot services init (gST, gBS, gRT, gImageHandle setup) |
| 0x34B0 | LocateProtocolsAndRegisterHandlers | 0x407 | Locates all required protocols, registers SMI handlers |
| 0x278C | IsErrorHandlingEnabled | 0x51 | Checks if error handling should be active (reads config fields) |
| 0x27E0 | ErrorHandlerSetup | 0x422 | Full error handler init: registers MCA SMI, enables MSRs, reads "Setup" UEFI var, handles PprAddress NVRAM, configures per-socket MCA banks |
| 0x3364 | InitPerSocketMca | 0x14C | Per-socket MCA initialization: enables MCG_CTL_P (LERR, RERR), sets up MCi_CTL2 for corrected errors |
| 0x5394 | RunPprTas | 0x1D6 | Post-Package Repair / Thermal Alert System: marks PPR/TAS busy, disables DIMMs on retired pages, clears throttled DIMMs |
| 0x2324 | ConfigureMcBanks | 0x214 | Configure MCA banks per socket (enables specific error types) |

### Error Dispatch

| Address | Name | Size | Purpose |
|---|---|---|---|
| 0x104D0 | LogErrorEvent | 0x22C | **Central error logging dispatcher**. Dispatches by ErrorSource type: |
| | | | - ErrorSource=1: calls sub_10410 |
| | | | - ErrorSource=3: iterates sub_110D0 callbacks |
| | | | - ErrorSource=4: iterates sub_111F4 callbacks |
| | | | - ErrorSource=6: iterates platform hooks sub_114F0 (Corrected) |
| | | | - ErrorSource=7: iterates platform hooks (Recoverable) |
| | | | - ErrorSource=8: iterates platform hooks (Fatal) |
| | | | - ErrorSource=9: iterates platform hooks (Uncorrected) |
| 0x3D8C | ReportGenericErrors | 0x1AB | Reports generic error via sub_104D0 with bitmask error type. Error types: bit0=-112(0x90), bit1=-111(0x91), bit2=-110(0x92), bit3=-109(0x93), bit4=-108(0x94), bit5=-107(0x95), bit6=-106(0x96), bit7=-105(0x97), bit8=-104(0x98), bit31=-96(0xA0) |
| 0x4CD8 | LogProcMemError | 0x144 | Encodes processor/memory error source (n5=0..10) into sub_104D0 format and dispatches |
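
The bit-to-error-type mapping listed for ReportGenericErrors is regular: bits 0 through 8 map to 0x90 through 0x98, and bit 31 maps to 0xA0. A minimal sketch of that mapping follows; the function names are hypothetical, and treating unlisted bits as "no code" is an assumption, since the analysis does not say what the original does with them.

```c
#include <stddef.h>
#include <stdint.h>

/* Map one bit position of the mask to the error-type byte listed above.
 * Returns 0 for bit positions the table does not mention (assumption). */
static uint8_t error_type_for_bit(unsigned bit)
{
    if (bit <= 8) {
        return (uint8_t)(0x90 + bit);
    }
    if (bit == 31) {
        return 0xA0;
    }
    return 0;
}

/* Expand a bitmask into error-type codes, lowest bit first.
 * Writes at most cap codes to out and returns the number written. */
static size_t expand_error_mask(uint32_t mask, uint8_t *out, size_t cap)
{
    size_t n = 0;
    unsigned bit;

    for (bit = 0; bit < 32 && n < cap; bit++) {
        uint8_t code;
        if ((mask & (UINT32_C(1) << bit)) == 0) {
            continue;
        }
        code = error_type_for_bit(bit);
        if (code != 0) {
            out[n++] = code;
        }
    }
    return n;
}
```

The negative values in the table (-112 and so on) are the same bytes read as signed 8-bit integers, which is how the decompiler printed them.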

### MCA Handler (McaHandler.c)

| Address | Name | Size | Purpose |
|---|---|---|---|
| 0x9C34 | GetCpuIndex | 0x8F | Returns CPU unique index from package/core/thread topology using lookup table `byte_15220` |
| 0x9CD4 | ReadLocalApicId | 0x6A | Reads local APIC ID (via MSR 0x1B or CPUID) |
| 0x9DA0 | ReadCpuTopology | 0x80 | Reads CPU package/core/thread topology via CPUID |
| 0x9E20 | GetCurrentSocketAndCore | 0x4C0 | Determines current socket and core numbers |
| 0xA2EC | GetMaxCoresPerPackage | 0x4C | Gets maximum cores per socket from topology |

### Memory Error Handler (MemoryErrorHandler.c)

| Address | Name | Size | Purpose |
|---|---|---|---|
| 0x56AC | LogMcBankErrors | 0x542 | Logs MCA bank errors: iterates MC banks, decodes MCi_STATUS/MSR, determines error severity, calls error dispatch. References `gMcBankList` global |
| 0x5BF0 | LogMemError | 0x638 | Full memory error reporting: reads MC banks, decodes DIMM info (socket/channel/rank/bank/row/col), writes to lockbox, logs via platform hooks |
| 0x6228 | DecodeDimmInfo | 0x1C8 | Decodes DIMM location from physical address / MCA address data |
| 0x63F0 | ReadMemErrorRegisters | 0x180 | Reads memory controller error registers (MC_ODBC, EMASK, etc.) |
| 0xE510 | ReadMemControllerMsr | 0x54 | Reads memory controller MSR registers (MC_ODBC, etc.) |

### IIO / Platform Error Handler (ProcMemErrReporting.c)

| Address | Name | Size | Purpose |
|---|---|---|---|
| 0xD62C | ProcessIioErrors | 0x84E | IIO error processing per socket: checks for VTD, ITC, OTC, DMA errors, reads status registers, logs errors via platform hooks |
| 0xDE7C | CheckAndReportIioErrors | 0x1D9 | Per-socket IIO error checker: iterates IIO stacks, checks error status, triggers ProcessIioErrors |
| 0xE070 | HandleCorrectedIioErrors | 0x49F | Handles corrected IIO error logging per socket |
| 0xC5A8 | ProcessDimmCorrectedErrors | 0x1F0 | Processes DIMM corrected ECC errors: reads error registers, determines channel/dimm, logs via platform hooks |
| 0xE6A0 | ProcessMemErrorReporting | 0x1C4 | Main memory error reporting entry: reads system config, determines error type, calls appropriate handler |
| 0xD228 | LogVtdErrors | 0x72 | Logs VT-d errors (register 0x66b = VTD UNC ERR STS) |
| 0xD29C | LogItcErrors | 0x70 | Logs ITC errors (register 0x66b) |
| 0xD30C | LogOtcErrors | 0x73 | Logs OTC errors (register 0x66b) |
| 0xD380 | LogDmaErrors | 0x2AA | Logs DMA errors (detailed bit decoding) |

### Utility / Library Functions

| Address | Name | Size | Callers | Purpose |
|---|---|---|---|---|
| 0x6C68 | DebugAssert | 0x4F | 2 | ASSERT implementation |
| 0x6CB8 | DebugPrint | 0x88 | 45 | Print debug message |
| 0x6D40 | DebugAssertLine | 0x3E | 82 | ASSERT with file/line |
| 0x85FC | AcquireSpinLock | 0x34 | 7 | Acquire spin lock for synchronization |
| 0x8630 | SpinLockAcquireWithTimeout | 0xB3 | 6 | Spinlock acquire with timeout (10M ns) |
| 0x8760 | SpinLockRelease | 0x6C | 6 | Release spin lock |
| 0x86E4 | SpinLockAcquireInternal | 0x7A | 2 | Internal spin lock acquire |
| 0x87CC | InterlockedIncrement | 0x4D | 2 | Atomic increment |
| 0x881C | HobGetHobList | 0x82 | 2 | Get HOB list pointer |
| 0x88F0 | GetSystemFirmwareResource | 0x49 | 3 | Get firmware resource descriptor from HOBs |
| 0x88A0 | HobGetNextHob | 0x4D | 1 | Get next HOB entry |
| 0x893C | AllocatePool | 0x75 | 2 | SmmAllocatePool wrapper |
| 0x89B4 | SmmLockBoxSave | 0x33 | 2 | Save data to SMM LockBox |
| 0x1168 | SmmLockBoxDestructor | 0xCA | 1 | SMM LockBox destructor, unregisters config table |
| 0xA4FC | WritePciConfig | - | - | Write PCI configuration space |
| 0xA66C | ReadPciConfig | - | - | Read PCI configuration space |
| 0xAB60 | GetPciDeviceClass | - | - | Get PCI device class code |
| 0xB534 | LogErrorToBanks | - | - | Log error to error bank array |
| 0xBD34 | ClearErrorStatus | - | - | Clear error status bits |
| 0xB62C | SetErrorBankFlag | - | - | Set error pending flag per CPU |
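
For orientation, an acquire-with-timeout spin lock of the kind attributed to 0x8630 can be sketched with C11 atomics. This is not the firmware's implementation: the `SpinLock` type, the injected clock callback, and `fake_clock_ns` (a deterministic stand-in that advances 1 ms per call) are all assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock type; the real lock layout is not known from the analysis. */
typedef struct {
    atomic_flag held;
} SpinLock;

/* Clock source returning nanoseconds; injected so the sketch stays portable. */
typedef uint64_t (*NowNsFn)(void);

/* Deterministic stand-in for a timer read: advances 1,000,000 ns per call. */
static uint64_t fake_clock_ns(void)
{
    static uint64_t now = 0;
    now += 1000000u;
    return now;
}

/* Spin until the lock is taken or timeout_ns has elapsed.
 * Returns true if the lock was acquired. */
static bool spin_lock_acquire_timeout(SpinLock *l, NowNsFn now_ns,
                                      uint64_t timeout_ns)
{
    uint64_t start = now_ns();

    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        if (now_ns() - start >= timeout_ns) {
            return false;   /* lock stayed busy for the whole window */
        }
    }
    return true;
}

static void spin_lock_release(SpinLock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A lock declared as `SpinLock lock = { ATOMIC_FLAG_INIT };` can then be taken with `spin_lock_acquire_timeout(&lock, fake_clock_ns, 10000000u)`, matching the 10M ns window noted in the table.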

## SMM Handler Registration

From sub_34B0 (line ~1856):
1. `(qword_36950)(&psub_278C, 3)` - Registers the IsErrorHandlingEnabled handler (sub_278C) with the SMM CPU protocol
2. `(qword_368C8+16)(sub_145C, 1)` - Registers periodic SW SMI dispatch handler (SwSmiInputValue=1)
3. If `qword_368E8+17 == 1 && qword_368D8+10 == 1`: registers with `qword_36968+40` for MSR 0x6ABC (MCG_CTL or MCi_CTL2)
4. If `qword_368D8+4 == 1`: calls `sub_3364` for per-socket MCA init
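
The two conditional steps above can be restated as plain byte tests against the configuration blocks. This is a minimal sketch that assumes the fields are single bytes (as the data-structure tables in this document suggest); `RegistrationPlan` and `plan_registration` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Which optional registration steps would run for a given configuration. */
typedef struct {
    bool register_msr_hook;    /* step 3: registration through qword_36968+40 */
    bool per_socket_mca_init;  /* step 4: call to sub_3364 */
} RegistrationPlan;

/* system_cfg points at the block cached in qword_368D8,
 * socket_feature_cfg at the block cached in qword_368E8. */
static RegistrationPlan plan_registration(const uint8_t *system_cfg,
                                          const uint8_t *socket_feature_cfg)
{
    RegistrationPlan plan;

    /* Step 3 requires both byte +17 (0x11) of the socket feature config
     * and byte +10 (0x0A) of the system config to equal 1. */
    plan.register_msr_hook =
        (socket_feature_cfg[0x11] == 1) && (system_cfg[0x0A] == 1);

    /* Step 4 depends only on byte +4 of the system config. */
    plan.per_socket_mca_init = (system_cfg[0x04] == 1);

    return plan;
}
```

Note that both tests compare against exactly 1, so any other nonzero value leaves the step disabled.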

## Data Structures

### Error Bank Entry (stride 216 bytes at qword_36960)
```
Offset  Size  Description
0x00    1     Active flag
0x01    1     Retry flag
0x03    1     PPR/TAS flags
0x30    1     Bank valid flag
0x34    1     Error type flags (bit0=UC, bit1=CE, bit2=deferred)
0x3C    4     Error bank/register indices
0x40    4     Error status data
```
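
The known offsets can be captured in a C struct whose padding fills the unidentified gaps, with compile-time checks that the layout and the 216-byte stride hold. This is a sketch under the offsets listed above; all field names, the `Unknown*` filler, and `count_active_banks` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { ERROR_BANK_STRIDE = 216 };

/* Error type flag bits at offset 0x34, as listed above. */
enum {
    ERR_TYPE_UC       = 1u << 0,
    ERR_TYPE_CE       = 1u << 1,
    ERR_TYPE_DEFERRED = 1u << 2
};

typedef struct {
    uint8_t  Active;            /* 0x00 */
    uint8_t  Retry;             /* 0x01 */
    uint8_t  Unknown02;         /* 0x02, not identified */
    uint8_t  PprTasFlags;       /* 0x03 */
    uint8_t  Unknown04[0x2C];   /* 0x04..0x2F, not identified */
    uint8_t  BankValid;         /* 0x30 */
    uint8_t  Unknown31[3];      /* 0x31..0x33, not identified */
    uint8_t  ErrorTypeFlags;    /* 0x34 */
    uint8_t  Unknown35[7];      /* 0x35..0x3B, not identified */
    uint32_t BankIndices;       /* 0x3C */
    uint32_t StatusData;        /* 0x40 */
    uint8_t  Unknown44[ERROR_BANK_STRIDE - 0x44];
} ErrorBankEntry;

_Static_assert(offsetof(ErrorBankEntry, BankValid) == 0x30, "BankValid offset");
_Static_assert(offsetof(ErrorBankEntry, ErrorTypeFlags) == 0x34, "flags offset");
_Static_assert(offsetof(ErrorBankEntry, BankIndices) == 0x3C, "indices offset");
_Static_assert(offsetof(ErrorBankEntry, StatusData) == 0x40, "status offset");
_Static_assert(sizeof(ErrorBankEntry) == ERROR_BANK_STRIDE, "stride");

/* Count active entries whose error type flags intersect type_mask. */
static size_t count_active_banks(const ErrorBankEntry *banks, size_t count,
                                 uint8_t type_mask)
{
    size_t i, n = 0;

    for (i = 0; i < count; i++) {
        if (banks[i].Active && (banks[i].ErrorTypeFlags & type_mask)) {
            n++;
        }
    }
    return n;
}
```

Indexing `banks[i]` steps through memory in 216-byte increments, which mirrors the stride the periodic handler uses when it walks the array at `qword_36960`.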

### CpuInfo (returned by McaHandler at 0x67AC)
```
Offset  Size  Description
0x00    4     APIC ID
0x04    1     Package ID (socket)
0x08    1     Core ID
0x0C    1     Thread ID
0x10    8     CPU Index (unique)
0x18    8     InterruptType
0x20    8     SystemContext
```
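
Expressed as a C struct, the layout above needs explicit padding after each one-byte ID so that the fields land on the listed offsets. This is a sketch only; the field names and the `fill_cpu_info` helper are hypothetical, and the padding bytes may carry data in the real structure.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t ApicId;         /* 0x00 */
    uint8_t  PackageId;      /* 0x04 */
    uint8_t  Pad05[3];
    uint8_t  CoreId;         /* 0x08 */
    uint8_t  Pad09[3];
    uint8_t  ThreadId;       /* 0x0C */
    uint8_t  Pad0D[3];
    uint64_t CpuIndex;       /* 0x10 */
    uint64_t InterruptType;  /* 0x18 */
    uint64_t SystemContext;  /* 0x20, pointer held as an integer here */
} CpuInfo;

_Static_assert(offsetof(CpuInfo, PackageId) == 0x04, "PackageId offset");
_Static_assert(offsetof(CpuInfo, CoreId) == 0x08, "CoreId offset");
_Static_assert(offsetof(CpuInfo, ThreadId) == 0x0C, "ThreadId offset");
_Static_assert(offsetof(CpuInfo, CpuIndex) == 0x10, "CpuIndex offset");
_Static_assert(offsetof(CpuInfo, InterruptType) == 0x18, "InterruptType offset");
_Static_assert(offsetof(CpuInfo, SystemContext) == 0x20, "SystemContext offset");

/* Populate the struct in the same field order the handler is described
 * as using; returns 0 on success and -1 if info is NULL. */
static int fill_cpu_info(CpuInfo *info, uint32_t apic_id, uint8_t package,
                         uint8_t core, uint8_t thread, uint64_t cpu_index,
                         uint64_t interrupt_type, uint64_t system_context)
{
    if (info == NULL) {
        return -1;
    }
    info->ApicId = apic_id;
    info->PackageId = package;
    info->CoreId = core;
    info->ThreadId = thread;
    info->CpuIndex = cpu_index;
    info->InterruptType = interrupt_type;
    info->SystemContext = system_context;
    return 0;
}
```

With this layout the structure occupies 40 (0x28) bytes.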

### Socket Configuration (per socket, stride 14944 bytes at qword_14E70)
```
Offset  Size  Description
0x00    2     Socket enabled bitmask
0x06    2     Core active bitmask
0x18    1     Socket present / enabled
...     ...   IIO stack config
```

### System Config (at qword_368D8)
```
Offset  Size  Description
0x00    1     Error handling enable flag
0x04    1     Per-socket MCA init flag
0x0A    1     Advanced RAS enable
0x0C    1     Corrected error logging mode (0=off, 1=logged, 2=throttled)
0x0D    1     PPR/TAS enable (0=off, 1=on, 2=aggressive)
0x11    1     IIO error handling flag
```

### Socket Feature Config (at qword_368E8)
```
Offset  Size  Description
0x10    1     MCA corrected error handling feature
0x11    1     MCA uncorrected error handling feature
0x1D    1     IIO error handling feature flag
```

## Global Variables

| Address | Name | Purpose |
|---|---|---|
| 0x14CE8 | ImageHandle | EFI image handle (gImageHandle) |
| 0x14CD8 | SystemTable | EFI system table (gST) |
| 0x14CE0 | BootServices | EFI boot services table (gBS) |
| 0x14CF0 | RuntimeServices | EFI runtime services table (gRT) |
| 0x14CF8 | Smst | SMM system table (gSmst) |
| 0x14DD0 | SmmCpuProtocol | SMM CPU protocol interface |
| 0x14E60 | IioProtocol | IIO protocol interface |
| 0x14E70 | SocketConfigArray | Per-socket config array (stride 14944) |
| 0x14E78 | IioProtocol2 | IIO protocol interface (alternate) |
| 0x14E80 | PcieConfig | PCIe configuration structure |
| 0x14E88 | GeneralConfig | General system configuration |
| 0x14E10 | IioStackMask | IIO stack bitmask (which stacks are populated) |
| 0x14EE0 | LockStruct | Synchronization lock structure |
| 0x14FD8 | ReturnStatus | Module return status |
| 0x15000 | ErrorSource | Current error source type |
| 0x15198 | SmmCpuProtocol2 | Alternate SMM CPU protocol (from sub_27E0) |
| 0x151A0 | SmmCpuRegistered | SMM CPU registration flag |
| 0x151F8 | BootScriptEntry | BootScript entry pointer |
| 0x15220 | CpuTopologyTable | CPU topology lookup table |
| 0x31220 | MaxCoreCount | Maximum core count per package |
| 0x368C8 | SwDispatchProtocol | SMM SW Dispatch2 protocol |
| 0x368C0 | SmmBase2Protocol | SMM Base2 protocol |
| 0x368D0 | SavedPprTas | Saved PPR/TAS data pointer |
| 0x368D8 | SystemConfig | System configuration (GeneralCfg) |
| 0x368E0 | ConfigProtocol | Configuration protocol instance |
| 0x368E8 | SocketFeatureConfig | Per-socket feature config |
| 0x368F0 | IioProtocolRef | IIO protocol reference |
| 0x36900 | psub_278C | Registered IsErrorHandlingEnabled callback |
| 0x36908 | psub_27E0 | Registered error handler setup callback |
| 0x36910 | psub_2C04 | Callback pointer |
| 0x36918 | psub_2C10 | Callback pointer |
| 0x36920 | psub_3090 | Callback pointer |
| 0x36928 | ConfigData | Configuration data pointer |
| 0x36930 | CpuIoProtocol | SmmCpuIo protocol |
| 0x36938 | PerSocketCpuConfig | Per-socket CPU config array |
| 0x36940 | SpinLock | Spin lock for synchronization |
| 0x36948 | PlatformInfoProtocol | Platform info protocol |
| 0x36950 | SmmCpuServices | SMM CPU services (SaveState, etc.) |
| 0x36960 | ErrorBankArray | Error bank array base (stride 216) |
| 0x36968 | IioPciProtocol | IIO PCI protocol |
| 0x1B008 | LastErrorSource | Per-bank last error source tracking |

## Calling Patterns

### Module Init Flow
```
_ModuleEntryPoint(0x5F0)
  +-- sub_69C()          -- init gST/gBS/gRT/ImageHandle
  +-- sub_300()          -- lock check
  +-- sub_34B0()         -- locate protocols (7 protocols)
  |     +-- Locate protocols via SMST->SmmLocateProtocol (offset 208)
  |     +-- Extract config from ConfigProtocol
  |     +-- sub_85FC()   -- acquire spin lock
  |     +-- sub_88F0()   -- get firmware resource descriptor from HOBs
  |     +-- sub_5394()   -- run PPR/TAS if needed
  |     +-- sub_278C()   -- is error handling enabled?
  |     +-- sub_27E0()   -- full error handler setup
  |     |     +-- Register SMM SW SMI handler (sub_145C)
  |     |     +-- Enable MSRs (MCG_CTL, MCi_CTL2)
  |     |     +-- Read "Setup" UEFI variable
  |     |     +-- Handle PprAddress NVRAM
  |     +-- sub_3364()   -- per-socket MCA init (if configured)
  +-- sub_6F88()         -- release lock
  +-- sub_3E0()          -- cleanup
```

### SMI Dispatch Flow
```
sub_145C() (Periodic SMI handler)
  +-- Iterate error bank array (216 byte stride)
  +-- For each active bank:
  |     +-- Check error type (UC/CE/deferred)
  |     +-- sub_BD34()   -- clear error status
  |     +-- Clear bank entries
  +-- If error reported and multi-socket:
  |     +-- Check socket IIO error status
  |     +-- sub_B534()   -- log error to banks
  |     +-- sub_A76C()   -- remote CPU execution
  +-- Clear per-bank tracking
```

### MCA Handler Flow (sub_67AC)
```
McaHandler(CpuInfo, InterruptType, SystemContext)
  +-- Validate pointers
  +-- sub_9CD4()         -- read local APIC ID
  +-- sub_9DA0()         -- read CPU topology (package/core/thread)
  +-- sub_9C34()         -- compute CPU index from topology table
  +-- Return populated CpuInfo struct
```

### Error Logging Flow (sub_104D0)
```
LogErrorEvent(ErrorSourceHeader)
  +-- Switch on ErrorSource byte[0]:
      1 -> sub_10410()        -- CPU error
      3 -> sub_110D0() list   -- PCIe error
      4 -> sub_111F4() list
      6 -> platform hooks CE  -- Corrected
      7 -> platform hooks     -- Recoverable  
      8 -> platform hooks     -- Fatal
      9 -> platform hooks     -- Uncorrected
```
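
The switch above amounts to a small routing table keyed on the first byte of the error source header. A minimal sketch follows; the `ErrorRoute` enum and `route_for_error_source` are hypothetical names, and the behavior for values outside the listed set (returning `ROUTE_NONE`) is an assumption.

```c
#include <stdint.h>

/* Destinations named in the flow above; enum names are illustrative. */
typedef enum {
    ROUTE_NONE = 0,          /* value not handled by the switch (assumption) */
    ROUTE_CPU,               /* 1: sub_10410 */
    ROUTE_CALLBACK_LIST_3,   /* 3: sub_110D0 callbacks */
    ROUTE_CALLBACK_LIST_4,   /* 4: sub_111F4 callbacks */
    ROUTE_HOOK_CORRECTED,    /* 6: platform hooks, corrected */
    ROUTE_HOOK_RECOVERABLE,  /* 7: platform hooks, recoverable */
    ROUTE_HOOK_FATAL,        /* 8: platform hooks, fatal */
    ROUTE_HOOK_UNCORRECTED   /* 9: platform hooks, uncorrected */
} ErrorRoute;

/* Select a route from byte 0 of the error source header. */
static ErrorRoute route_for_error_source(const uint8_t *header)
{
    switch (header[0]) {
    case 1:  return ROUTE_CPU;
    case 3:  return ROUTE_CALLBACK_LIST_3;
    case 4:  return ROUTE_CALLBACK_LIST_4;
    case 6:  return ROUTE_HOOK_CORRECTED;
    case 7:  return ROUTE_HOOK_RECOVERABLE;
    case 8:  return ROUTE_HOOK_FATAL;
    case 9:  return ROUTE_HOOK_UNCORRECTED;
    default: return ROUTE_NONE;
    }
}
```

Values 2 and 5 do not appear in the observed switch, so they fall through to the default in this sketch.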

## Dependencies

### Consumed (this module calls)
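- **SMST** (`qword_14CF8`) via `SmmLocateProtocol` (offset 208) to locate protocols. SMI handler registration through `SmiHandlerRegister` would use offset 224 in the standard `EFI_SMM_SYSTEM_TABLE2` layout, not 208.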
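- **SmmCpuProtocol** for CPU save state access (offset 112=remote CPU, 120=current CPU, 128=CPU count, 136=CPU context, 144=CPU state buffer). These offsets coincide with the standard SMST fields `SmmStartupThisAp`, `CurrentlyExecutingCpu`, `NumberOfCpus`, `CpuSaveStateSize`, and `CpuSaveState`, so these accesses may go through the SMST rather than the protocol interface.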
- **SmmSwDispatch2Protocol** for register/unregister SW SMI handlers
- **RuntimeServices** for GetVariable/SetVariable ("Setup", "PprAddress" UEFI variables)
- **BootServices** for LocateProtocol
- **SmmLockBox** for saving error records across resets
- **PciIo Protocol** for PCI config space access (LocateProtocol at 0x14220)
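
The numeric offsets quoted in this document (112 to 144, and 208) can be compared against the 64-bit layout of `EFI_SMM_SYSTEM_TABLE2` from the PI specification. The sketch below models every pointer and `UINTN` as a `uint64_t` so it compiles on any host; the `SmstView` and `TableHeaderView` type names are illustrative, and the comparison assumes the firmware uses the standard table layout.

```c
#include <stddef.h>
#include <stdint.h>

/* EFI_TABLE_HEADER: 24 bytes. */
typedef struct {
    uint64_t Signature;
    uint32_t Revision;
    uint32_t HeaderSize;
    uint32_t CRC32;
    uint32_t Reserved;
} TableHeaderView;

/* 64-bit view of EFI_SMM_SYSTEM_TABLE2 with pointers held as integers. */
typedef struct {
    TableHeaderView Hdr;                      /* 0x00 */
    uint64_t SmmFirmwareVendor;               /* 0x18 */
    uint32_t SmmFirmwareRevision;             /* 0x20 */
    uint32_t Pad24;                           /* alignment padding */
    uint64_t SmmInstallConfigurationTable;    /* 0x28 */
    uint64_t SmmIo[4];                        /* 0x30: Mem/Io Read/Write */
    uint64_t SmmAllocatePool;                 /* 0x50 */
    uint64_t SmmFreePool;                     /* 0x58 */
    uint64_t SmmAllocatePages;                /* 0x60 */
    uint64_t SmmFreePages;                    /* 0x68 */
    uint64_t SmmStartupThisAp;                /* 0x70 = 112 */
    uint64_t CurrentlyExecutingCpu;           /* 0x78 = 120 */
    uint64_t NumberOfCpus;                    /* 0x80 = 128 */
    uint64_t CpuSaveStateSize;                /* 0x88 = 136 */
    uint64_t CpuSaveState;                    /* 0x90 = 144 */
    uint64_t NumberOfTableEntries;            /* 0x98 */
    uint64_t SmmConfigurationTable;           /* 0xA0 */
    uint64_t SmmInstallProtocolInterface;     /* 0xA8 */
    uint64_t SmmUninstallProtocolInterface;   /* 0xB0 */
    uint64_t SmmHandleProtocol;               /* 0xB8 */
    uint64_t SmmRegisterProtocolNotify;       /* 0xC0 */
    uint64_t SmmLocateHandle;                 /* 0xC8 */
    uint64_t SmmLocateProtocol;               /* 0xD0 = 208 */
    uint64_t SmiManage;                       /* 0xD8 = 216 */
    uint64_t SmiHandlerRegister;              /* 0xE0 = 224 */
    uint64_t SmiHandlerUnRegister;            /* 0xE8 = 232 */
} SmstView;

_Static_assert(offsetof(SmstView, SmmStartupThisAp) == 112, "StartupThisAp");
_Static_assert(offsetof(SmstView, CpuSaveState) == 144, "CpuSaveState");
_Static_assert(offsetof(SmstView, SmmLocateProtocol) == 208, "LocateProtocol");
_Static_assert(offsetof(SmstView, SmiHandlerRegister) == 224, "SmiHandlerRegister");
```

Applying a struct like this to `qword_14CF8` in the disassembler is a quick way to turn raw `+offset` calls into named services.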

### Consumed By
- **SMM Core** - calls _ModuleEntryPoint on driver load
- **SMM SW Dispatch** - calls sub_145C on SMI trigger
- **SMM CPU Protocol** - calls the IsErrorHandlingEnabled (sub_278C) callback
- **SMM Base2** - calls destructor on SMM termination

## Notes

- The module uses a spinlock-based synchronization mechanism (sub_85FC/sub_8760) for thread safety during error handling.
- The module supports up to 4 sockets (socket loops are bounded by `n4 < 4u` throughout the decompilation).
- Socket configuration data is stored in arrays of 14944 bytes per socket at qword_14E70.
- The `byte_15220` table (28672 bytes per socket) is used for CPU topology lookup (package -> core -> thread -> unique index).
- Error bank array stride is 216 bytes, tracked at qword_36960.
- The module uses Lenovo-specific NVRAM for PprAddress storage (sub_27E0 at ~0x2b47).
- The "Setup" UEFI variable (GUID {4E2CC220-057B-4D47-88CF-CDC71BA911F1} at 0x14190) controls error handling features.
- Debug print wrapper at sub_6CB8 uses format "%a entry\n" for function trace logging when a DEBUG build is enabled.
- ASSERT implementation at sub_6D40 is a wrapper that prints "ASSERT_EFI_ERROR (Status = %r)" with file/line info.
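
The topology table figures in these notes are mutually consistent: four sockets at 28672 bytes each, starting at 0x15220, end exactly at 0x31220, where the MaxCoreCount global is listed. The sketch below shows that arithmetic; the macro and function names are hypothetical, and the layout inside each per-socket slice is not known from the analysis.

```c
#include <stdint.h>

#define CPU_TOPOLOGY_TABLE_ADDR   0x15220u  /* byte_15220 */
#define MAX_CORE_COUNT_ADDR       0x31220u  /* MaxCoreCount global */
#define TOPOLOGY_BYTES_PER_SOCKET 28672u    /* 0x7000 */
#define MAX_SOCKETS               4u

/* Image address of the first topology byte belonging to a socket.
 * Passing MAX_SOCKETS yields the address one past the end of the table. */
static uint32_t topology_socket_base(uint32_t socket)
{
    return CPU_TOPOLOGY_TABLE_ADDR + socket * TOPOLOGY_BYTES_PER_SOCKET;
}
```

`topology_socket_base(MAX_SOCKETS)` evaluates to 0x31220, which supports reading `byte_15220` as a 4-socket table that directly precedes MaxCoreCount.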