
AmiErrorHandlerMain Module

Overview

SMM error handler for the Intel Purley (Xeon Scalable) platform. It receives error notifications through the SMM communication protocol, classifies them by error source type (1-9), and dispatches them to platform-specific handlers that log the errors and report them to the BMC via IPMI/SMM Communicate. It also tracks CPU topology (socket/core/thread) so that errors can be correlated with their source.

Address Range

0x300 - 0x4880 (97 functions)

Module Type

UEFI SMM Driver (SMM_DISPATCH)

Key Functions

| Address | Name | Purpose |
|---------|------|---------|
| 0x5f8 | ModuleEntryPoint | Driver entry: init libraries, register SMI handler |
| 0x3e24 | sub_3E24 | AutoGen init: calls 18 sub-init functions in sequence |
| 0x42ec | sub_42EC | Main init: SMM protocol registration |
| 0x4680 | sub_4680 | SMM SwDispatch registration + SMI handler install |
| 0x4364 | sub_4364 | SMI dispatch callback: demux by GUID |
| 0x27e4 | sub_27E4 | Core error dispatch: switch(error_source) {1..9} |
| 0x1500 | sub_1500 | MP sync data init: CPU topology tracking |
| 0x2f2c | sub_2F2C | Report CSR info to BMC via IPMI |
| 0x3628 | sub_3628 | Error source 1 sub-handler (MCA/MC bank) |
| 0x33dc | sub_33DC | Error source 3 sub-handler |
| 0x3500 | sub_3500 | Error source 4 sub-handler |
| 0x37fc | sub_37FC | Error source 6-9 inner handler (generic CSR reporting) |
| 0x31f0 | sub_31F0 | Error source 1 additional handler (CSR dump) |
| 0x339c | sub_339C | SMM Communicate wrapper (sends data to SMM core) |
| 0x32b8 | sub_32B8 | Error record builder (formats error into SMM comm buffer) |
| 0x1ca0 | sub_1CA0 | EMCA platform hooks init (gMcBankList, processor info) |
| 0x1be4 | sub_1BE4 | MC bank list population based on CPU model |
| 0x4808 | sub_4808 | SMI handler registration (SmmSwDispatch2) |
| 0x4854 | sub_4854 | SMI handler callback: clear SMI status bit |
| 0xc68 | sub_C68 | SMM memory allocation init (SmmAccessProtocol) |
| 0x2724 | sub_2724 | Error source 1 handler: severity classification |
| 0x6e8 | sub_6E8 | gBS/gST/ImageHandle init |
| 0x784 | sub_784 | gRT init |
| 0x7c0 | sub_7C0 | gSmst init via SmmBase2Protocol |

Entry Points (Public API)

  • 0x5f8 ModuleEntryPoint: Standard UEFI driver entry. Calls sub_3E24 (AutoGen), then sub_42EC (SMM init).

  • 0x4364 sub_4364: SMI dispatch callback registered via SmmSwDispatch2. Entry receives a context buffer. Reads error type from buffer offset+12, copies payload data, and dispatches to sub_27E4. Handles 3 protocol GUIDs (unk_5C80, unk_5C90, unk_5CA0).

  • 0x4808 sub_4808: Called during sub_4680 init if a hardware flag is set. Registers sub_4854 as an SMI handler for all CPUs.

  • 0x4854 sub_4854: SMI handler that clears bit 0 of SMI status register at I/O port 0x790 (1936 dec).

Internal Helpers

Init Sequence (called from sub_3E24 in order; 13 of the 18 sub-init calls are identified here):

  1. 0x6e8 - Init gImageHandle, gST, gBS globals
  2. 0x784 - Init gRT (Runtime Services Table)
  3. 0x7c0 - Init gSmst via SmmBase2Protocol (GUID F4CCBFB7-F6E0-47FD-9DD4-10A8F150C191)
  4. 0xc68 - Init SMM memory ranges via SmmAccessProtocol
  5. 0xee4 - (thunk to protocol init)
  6. 0xfc0 - Additional protocol init
  7. 0x1158 - Protocol init
  8. 0x1168 - Protocol init
  9. 0x1204 - Protocol init
  10. 0x1500 - MP sync data init (CPU topology)
  11. 0x1ca0 - EMCA platform hooks init (GUID: 6820ABD4-A292-4817-9147-D91DC8C53542)
  12. 0x2520 - Protocol init
  13. 0x263c - Protocol init
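
The ordered chain above can be pictured as a NULL-terminated table of function pointers walked front to back. The following is a minimal sketch, not the decompiled code: `status_t`, `init_ok`, `init_fail`, and `run_init_chain` are hypothetical names, and the choice to keep running after a failure while remembering the first error is an assumption (how sub_3E24 reacts to a failing step was not confirmed).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t status_t;              /* stand-in for EFI_STATUS */
typedef status_t (*init_fn)(void);

#define STATUS_OK    ((status_t)0)
#define STATUS_FAIL  ((status_t)1)      /* arbitrary non-zero value for the demo */

int g_calls;                            /* counts how many steps ran */

/* Hypothetical stand-ins for sub-init routines such as sub_6E8 or sub_784. */
status_t init_ok(void)   { g_calls++; return STATUS_OK; }
status_t init_fail(void) { g_calls++; return STATUS_FAIL; }

/* Walk a NULL-terminated table in order and return the first non-zero status.
 * Assumption: later steps still run after an earlier one fails. */
status_t run_init_chain(const init_fn *chain)
{
    status_t first_error = STATUS_OK;
    assert(chain != NULL);
    for (size_t i = 0; chain[i] != NULL; i++) {
        status_t s = chain[i]();
        if (s != STATUS_OK && first_error == STATUS_OK)
            first_error = s;
    }
    return first_error;
}

/* Demo table: the second step fails, the third still runs. */
const init_fn demo_chain[] = { init_ok, init_fail, init_ok, NULL };
```

Calling `run_init_chain(demo_chain)` runs all three steps and returns the status of the failing one, which mirrors a module that stores a single return status for its entry point.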

Core Error Flow (sub_27E4):

SMI dispatch (sub_4364) -> sub_27E4 (error dispatch)

  +-- Error Source 1: sub_2724 (severity classify) -> funcs_27C2[] chain
     chain starts with sub_3628 (MCA error detail) -> sub_31F0 (CSR dump to BMC)

  +-- Error Source 3: sub_33DC (check lock status) -> funcs_29FA[] chain

  +-- Error Source 4: sub_3500 (PCIe error) -> funcs_29AA[] chain

  +-- Error Source 6: sub_37FC loop (generic handler)
  +-- Error Source 7: sub_37FC loop
  +-- Error Source 8: sub_37FC loop
  +-- Error Source 9: sub_37FC loop
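
The routing in the diagram above amounts to a switch on the error source byte. The sketch below only models which first-level handler is chosen; the `route_t` enum and `route_error_source` are hypothetical names, and treating sources 2 and 5 (which have no handler listed here) as unrouted is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Labels for the first-level handlers named in the flow diagram. */
typedef enum {
    ROUTE_NONE = 0,      /* assumption: sources without a listed handler */
    ROUTE_SUB_2724,      /* source 1: severity classification */
    ROUTE_SUB_33DC,      /* source 3: lock status check */
    ROUTE_SUB_3500,      /* source 4: PCIe error */
    ROUTE_SUB_37FC       /* sources 6..9: generic handler loop */
} route_t;

route_t route_error_source(uint8_t error_source)
{
    switch (error_source) {
    case 1:  return ROUTE_SUB_2724;
    case 3:  return ROUTE_SUB_33DC;
    case 4:  return ROUTE_SUB_3500;
    case 6:
    case 7:
    case 8:
    case 9:  return ROUTE_SUB_37FC;
    default: return ROUTE_NONE;
    }
}

/* Usage example: sources 6..9 share one handler, source 2 is unrouted. */
void route_self_check(void)
{
    assert(route_error_source(1) == ROUTE_SUB_2724);
    assert(route_error_source(6) == route_error_source(9));
    assert(route_error_source(2) == ROUTE_NONE);
}
```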

State Management

Global Variables (.data segment 0x5B80-0x27780)

| Address | Name | Size | Purpose |
|---------|------|------|---------|
| 0x6618 | qword_6618 | 8 | Return status for ModuleEntryPoint |
| 0x6620 | SystemTable | 8 | EFI_SYSTEM_TABLE pointer |
| 0x6628 | BootServices | 8 | EFI_BOOT_SERVICES pointer |
| 0x6630 | qword_6630 | 8 | ImageHandle |
| 0x6638 | qword_6638 | 8 | gRT (Runtime Services) pointer |
| 0x6640 | qword_6640 | 8 | gSmst (SMM Services Table) pointer |
| 0x6668 | qword_6668 | 8 | mSmst_syncdata (SMM sync data protocol) |
| 0x6670 | qword_6670 | 8 | SMM MpSync protocol interface |
| 0x6678 | qword_6678 | 8 | MP sync data access protocol |
| 0x6680 | unk_6680 | ? | Protocol buffer (guid 1D202CAB-C8AB-4D5C) |
| 0x6690 | qword_6690 | 8 | EMCA platform protocol interface |
| 0x6698 | qword_6698 | 8 | EMCA platform MP protocol |
| 0x66a0 | qword_66A0 | 8 | Processor info protocol |
| 0x66a8 | qword_66A8 | 8 | McBank info protocol |
| 0x66b0 | qword_66B0 | 8 | gMcBankList pointer (MC bank list) |
| 0x66b8 | n6 | 4 | Number of socket bits |
| 0x66c0 | qword_66C0 | 8 | McBank table for current CPU type |
| 0x66c8 | qword_66C8 | 8 | McBank register map |
| 0x66d0 | n2 | 4 | Number of core bits |
| 0x66f8 | qword_66F8 | 8 | Cached protocol for error reporting |
| 0x6708 | qword_6708 | 8 | SmmSwDispatch2 protocol |
| 0x6710 | qword_6710 | 8 | SmmCommunication protocol |
| 0x6720 | unk_6720 | ? | Error record buffer |
| 0x6730 | n3 | 1 | Current error source type |
| 0x6731 | byte_6731 | 1 | Current error severity |
| 0x6740 | unk_6740 | ? | Per-CPU state arrays |
| 0x6748 | byte_6748 | ? | Per-CPU active flags (stride 40) |
| 0xb758 | dword_B758 | 4 | Thread mask |
| 0xb760 | byte_B760 | ? | Thread active flags |
| 0xbf60 | qword_BF60 | ? | CPU index lookup by socket/core/thread |
| 0xff60 | dword_FF60 | ? | APIC ID map |
| 0x11f60 | byte_11F60 | ? | CPU state flags |
| 0x27760 | dword_27760 | 4 | Socket mask (1 << n_socket_bits) |
| 0x27768 | qword_27768 | 8 | Number of SMRAM descriptors |
| 0x27770 | qword_27770 | 8 | SMRAM ranges buffer |

MC Bank Tables (selected by CPU model):

  • 0x5CB0 - Generic MC bank name table
  • 0x5CC0 - MC bank info for CPUs with n4=5..6
  • 0x5F40 - MC bank data for CPUs with n4=3..4
  • 0x61C0 - MC bank register map
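
The table selection described above reduces to a range check on the CPU model value n4. This is a minimal sketch using the module-relative addresses from the list; `select_mc_bank_table` and the `MC_BANK_TABLE_NONE` marker for unmatched values are hypothetical, since the behavior for other n4 values is not documented here.

```c
#include <assert.h>
#include <stdint.h>

/* Module-relative table addresses as listed above. */
#define MC_BANK_TABLE_N4_3_4  0x5F40u
#define MC_BANK_TABLE_N4_5_6  0x5CC0u
#define MC_BANK_TABLE_NONE    0u      /* hypothetical "no match" marker */

/* Pick the MC bank table for a CPU model value n4. */
uint32_t select_mc_bank_table(uint8_t n4)
{
    if (n4 >= 3 && n4 <= 4)
        return MC_BANK_TABLE_N4_3_4;
    if (n4 >= 5 && n4 <= 6)
        return MC_BANK_TABLE_N4_5_6;
    return MC_BANK_TABLE_NONE;        /* assumption: no table selected */
}

/* Usage example covering both documented ranges. */
void mc_bank_self_check(void)
{
    assert(select_mc_bank_table(3) == MC_BANK_TABLE_N4_3_4);
    assert(select_mc_bank_table(6) == MC_BANK_TABLE_N4_5_6);
}
```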

Data Structures

Error Record Buffer Layout

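The error record (passed to sub_27E4 as p_n3, typed int * by the decompiler) has the following known offsets. Several entries overlap (for example, field_13 covers +13..+16, which includes field_15), so some sizes probably reflect decompiler access widths rather than true field boundaries.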

| Offset | Size | Field | Description |
|--------|------|-------|-------------|
| +0 | 1 | ErrorSource | Error source type (1-9) |
| +4 | 1 | ErrorSeverity | Error severity level |
| +5 | 1 | field_5 | Secondary severity component |
| +6 | 1 | field_6 | Tertiary field |
| +7 | 1 | field_7 | Quaternary field |
| +12 | 1 | ErrorData | Error source discriminator (compared by dispatch) |
| +13 | 4 | field_13 | Error class/type information |
| +15 | 1 | field_15 | Sub-type field |
| +36 | 1 | Thread | Thread number (field_36) |
| +37 | 1 | Core | Core number (field_37) |
| +39 | 1 | field_39 | Additional topology info |
| +40 | 1 | field_40 | Socket/Node number |
| +41 | 1 | field_41 | Additional identifier |
| +53 | 2 | field_53 | Status flags (bit 14: 0x4000 = uncorrected) |
| +86 | 4 | field_86 | Error sub-class for source 3 |
| +102 | 8 | field_102 | Extended status flags |
| +144 | 16 | GUID | The handler GUID that identifies error source |
| +157 | 4 | field_157 | Address/register value |
| +200 | N | Payload | Variable-length error data |
| +206 | 4 | field_206 | Error type indicator (used for source 4) |
| +213 | 4 | field_213 | CSR register value for report |
| +217 | 4 | field_217 | Additional CSR register value |
| +221 | 4 | field_221 | Yet another register value |
| +233 | 1 | field_233 | Classified severity for source 1 |
| +234 | 2 | field_234 | Additional status |
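
Because many of these fields sit at odd offsets, it is safer to read them through byte-wise accessors than through a packed struct. The sketch below shows one way to do that for a few of the fields; the accessor names are hypothetical, the 236-byte minimum size is only the lower bound implied by the last table row, and little-endian storage is assumed because the module targets x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets taken from the table above; names are this sketch's own. */
enum {
    REC_ERROR_SOURCE   = 0,
    REC_ERROR_SEVERITY = 4,
    REC_STATUS_FLAGS   = 53,    /* 2 bytes, not naturally aligned */
    REC_MIN_SIZE       = 236    /* +234 plus its 2 bytes: lower bound only */
};

#define STATUS_UNCORRECTED 0x4000u   /* bit 14 of field_53 */

/* Read one byte at a known offset. */
uint8_t rec_u8(const uint8_t *rec, size_t off)
{
    assert(rec != NULL && off < (size_t)REC_MIN_SIZE);
    return rec[off];
}

/* Assemble a little-endian 16-bit value byte by byte (no unaligned load). */
uint16_t rec_u16le(const uint8_t *rec, size_t off)
{
    assert(rec != NULL && off + 1 < (size_t)REC_MIN_SIZE);
    return (uint16_t)(rec[off] | (rec[off + 1] << 8));
}

/* Non-zero when bit 14 of the status flags marks the error as uncorrected. */
int rec_is_uncorrected(const uint8_t *rec)
{
    return (rec_u16le(rec, REC_STATUS_FLAGS) & STATUS_UNCORRECTED) != 0;
}
```

Byte-wise assembly keeps the accessors independent of host alignment rules, which matters when the same helpers are reused in an offline parser for dumped records.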

Error Source Dispatch Table

+--------+----------+--------------------------------------------------+
| Source | Handler  | Description                                      |
+--------+----------+--------------------------------------------------+
| 1      | sub_2724 | MCA (Machine Check Architecture) errors          |
|        | + chain  | sub_3628 -> sub_31F0 (MCA detail -> CSR dump)    |
| 3      | sub_33DC | Corrected/Uncorrected MCA errors                 |
|        | + chain  | (funcs_29FA chain, same function list)           |
| 4      | sub_3500 | PCIe errors (AER)                                |
|        | + chain  | (funcs_29AA chain)                               |
| 6      | sub_37FC | Generic CSR errors                               |
| 7      | sub_37FC | Generic bus errors                               |
| 8      | sub_37FC | Generic I/O errors                               |
| 9      | sub_37FC | Generic memory errors                            |
+--------+----------+--------------------------------------------------+

Severity Classification (Error Source 1, sub_2724)

| field_233 | Meaning |
|-----------|---------|
| 1 | Corrected (no action needed) |
| 2 | Uncorrected non-fatal |
| 3 | Fatal (when severity=2) |
| 4 | Uncorrected with deferred flag |
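
For tooling that post-processes dumped records, the four classified values can be captured in an enum with a name lookup. This is an illustrative sketch only: the enum constants and `classified_severity_name` are hypothetical, and the rules sub_2724 uses to arrive at each value are not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values stored in field_233, as listed in the table above. */
typedef enum {
    SEV_CORRECTED            = 1,
    SEV_UNCORRECTED_NONFATAL = 2,
    SEV_FATAL                = 3,
    SEV_UNCORRECTED_DEFERRED = 4
} classified_severity_t;

/* Map a field_233 value to a short label; unlisted values map to "unknown". */
const char *classified_severity_name(uint8_t field_233)
{
    switch (field_233) {
    case SEV_CORRECTED:            return "corrected";
    case SEV_UNCORRECTED_NONFATAL: return "uncorrected non-fatal";
    case SEV_FATAL:                return "fatal";
    case SEV_UNCORRECTED_DEFERRED: return "uncorrected deferred";
    default:                       return "unknown";
    }
}

/* Usage example. */
void classified_severity_self_check(void)
{
    assert(strcmp(classified_severity_name(SEV_CORRECTED), "corrected") == 0);
    assert(strcmp(classified_severity_name(0), "unknown") == 0);
}
```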

Protocol GUIDs

| Address | GUID | Protocol / Table |
|---------|------|------------------|
| 0x5B80 | 7739F24C-93D7-11D4-9A3A-0090273FC14D | EFI_HOB_LIST_GUID (HOB list configuration table) |
| 0x5B90 | 05AD34BA-6F02-4214-952E-4DA0398E2BB9 | EFI_DXE_SERVICES_TABLE_GUID (DXE Services configuration table) |
| 0x5BA0 | F4CCBFB7-F6E0-47FD-9DD4-10A8F150C191 | EFI_SMM_BASE2_PROTOCOL |
| 0x5BB0 | 6820ABD4-A292-4817-9147-D91DC8C53542 | AMI SMM MP_SYNC_DATA protocol |
| 0x5BC0 | 86B091ED-1463-43B5-82A1-2C8B83CB8917 | AMI EMCA_PLATFORM_HOOKS protocol |
| 0x5C00 | C2702B74-800C-4131-8746-8FB5B89CE4AC | EFI_SMM_ACCESS2_PROTOCOL |
| 0x5C20 | A7CED760-C71C-4E1A-ACB1-89604D5216CB | EFI_IIO_UDS_PROTOCOL (Intel IIO Universal Data Structure) |
| 0x5C30 | 0067835F-9A50-433A-8CBB-852078197814 | EFI_CPU_CSR_ACCESS_PROTOCOL (Intel CPU CSR access) |
| 0x5C40 | 1D202CAB-C8AB-4D5C-94F7-3CFCC0D3D335 | EFI_SMM_CPU_SERVICE_PROTOCOL |
| 0x5C60 | 1DBD1503-0A60-4230-AAA3-8016D8C3DE2F | AMI SMM_CSR_REPORT_PROTOCOL |
| 0x5C70 | 4E2CC220-057B-4D47-88CF-CDC71BA911F1 | EFI_SMM_VARIABLE_PROTOCOL |
| 0x5C80 | A5BC1114-6F64-4EDE-B863-3E83ED7C83B1 | Error source 1 handler GUID (value matches the UEFI CPER Platform Memory Error Section GUID) |
| 0x5C90 | D995E954-BBC1-430F-AD91-B44DCB3C6F35 | Error source 3 handler GUID (value matches the UEFI CPER PCI Express Error Section GUID) |
| 0x5CA0 | 9876CCAD-47B4-4BDB-B65E-16F193C4F3DB | Error source 4 handler GUID (value matches the UEFI CPER Processor Generic Error Section GUID) |
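
When matching these GUIDs against raw bytes in the image, note that the first three fields of an EFI_GUID are stored little-endian while the last eight bytes are stored in order. The sketch below decodes 16 raw bytes and prints the canonical text form; `guid_t`, `guid_from_bytes`, and `guid_to_string` are hypothetical helper names, with `guid_t` mirroring the EFI_GUID field layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Same field layout as EFI_GUID. */
typedef struct {
    uint32_t data1;
    uint16_t data2;
    uint16_t data3;
    uint8_t  data4[8];
} guid_t;

/* Decode 16 raw bytes as stored in the image. */
guid_t guid_from_bytes(const uint8_t b[16])
{
    guid_t g;
    assert(b != NULL);
    g.data1 = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
              ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    g.data2 = (uint16_t)(b[4] | (b[5] << 8));
    g.data3 = (uint16_t)(b[6] | (b[7] << 8));
    memcpy(g.data4, b + 8, 8);
    return g;
}

/* Write the canonical 36-character form plus the terminating NUL. */
void guid_to_string(const guid_t *g, char out[37])
{
    assert(g != NULL && out != NULL);
    snprintf(out, 37, "%08X-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X",
             (unsigned)g->data1, (unsigned)g->data2, (unsigned)g->data3,
             (unsigned)g->data4[0], (unsigned)g->data4[1],
             (unsigned)g->data4[2], (unsigned)g->data4[3],
             (unsigned)g->data4[4], (unsigned)g->data4[5],
             (unsigned)g->data4[6], (unsigned)g->data4[7]);
}
```

For example, the byte sequence `B7 BF CC F4 E0 F6 FD 47 9D D4 10 A8 F1 50 C1 91` decodes to the SmmBase2 GUID listed at 0x5BA0.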

Calling Patterns

Module Initialization Flow:

ModuleEntryPoint(0x5f8)

  +-> sub_3E24 (AutoGen init)
     +-> sub_6E8    -> gBS/gST init
     +-> sub_784    -> gRT init
     +-> sub_7C0    -> gSmst init (SmmBase2)
     +-> sub_C68    -> SMRAM ranges
     +-> sub_EE4    -> [thunk protocol]
     +-> sub_FC0    -> [protocol]
     +-> sub_1158   -> [protocol]
     +-> sub_1168   -> [protocol]
     +-> sub_1204   -> [protocol]
     +-> sub_1500   -> MP sync data init
     +-> sub_1CA0   -> EMCA platform hooks
     +-> sub_2520   -> [protocol]
     +-> sub_263C   -> [protocol]

  +-> sub_42EC (SMM registration)
        +-> sub_300    -> debug check
        +-> sub_4680   -> SMM protocol registration
              +-> SmmSwDispatch2.Register()
              +-> SmmCommunication.Register()
              +-> sub_4808 (if flag set)
                    +-> SmmSwDispatch2->Register(sub_4854)

Error Handling Flow:

SMM Software SMI (sub_4364)

  +-> Compare GUID at offset+144
     +-- A5BC1114-... -> error source 1 (MCA)
     +-- D995E954-... -> error source 3 (corrected MCA)
     +-- 9876CCAD-... -> error source 4 (PCIe)

  +-> sub_27E4 (error dispatch)

        +-> Log: "ErrorSource: %d, ErrorSeverity: %d"

        +-> switch(error_source):
             case 1: sub_2724 -> severity classify
                        -> funcs_27C2 chain (sub_3628 -> sub_31F0)
             case 3: sub_33DC -> check lock status
                        -> funcs_29FA chain
             case 4: sub_3500 -> PCIe error
                        -> funcs_29AA chain
             case 6-9: sub_37FC loop -> generic CSR report
               (extracts socket/bus/device/function from record)
               -> sub_2F2C (report to BMC via IPMI)
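
The GUID comparison at the top of this flow can be sketched as three 16-byte compares against the record's GUID field at offset +144. This is an illustration under assumptions rather than the decompiled logic: `error_source_from_guid` is a hypothetical name, the byte arrays are the in-memory images of the three GUIDs at 0x5C80, 0x5C90, and 0x5CA0, and returning 0 for an unknown GUID is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REC_GUID_OFFSET 144

/* In-memory images of the three GUIDs (first three fields little-endian). */
const uint8_t GUID_5C80[16] = {   /* A5BC1114-6F64-4EDE-B863-3E83ED7C83B1 */
    0x14, 0x11, 0xBC, 0xA5, 0x64, 0x6F, 0xDE, 0x4E,
    0xB8, 0x63, 0x3E, 0x83, 0xED, 0x7C, 0x83, 0xB1 };
const uint8_t GUID_5C90[16] = {   /* D995E954-BBC1-430F-AD91-B44DCB3C6F35 */
    0x54, 0xE9, 0x95, 0xD9, 0xC1, 0xBB, 0x0F, 0x43,
    0xAD, 0x91, 0xB4, 0x4D, 0xCB, 0x3C, 0x6F, 0x35 };
const uint8_t GUID_5CA0[16] = {   /* 9876CCAD-47B4-4BDB-B65E-16F193C4F3DB */
    0xAD, 0xCC, 0x76, 0x98, 0xB4, 0x47, 0xDB, 0x4B,
    0xB6, 0x5E, 0x16, 0xF1, 0x93, 0xC4, 0xF3, 0xDB };

/* Return 1, 3 or 4 for a recognized GUID, 0 otherwise (assumption). */
int error_source_from_guid(const uint8_t *record)
{
    const uint8_t *guid;
    assert(record != NULL);
    guid = record + REC_GUID_OFFSET;
    if (memcmp(guid, GUID_5C80, 16) == 0) return 1;
    if (memcmp(guid, GUID_5C90, 16) == 0) return 3;
    if (memcmp(guid, GUID_5CA0, 16) == 0) return 4;
    return 0;
}
```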

CSR Reporting to BMC (sub_2F2C):

sub_2F2C(socket, bus, dev, func, reg, value)

  +-> Format record (12 bytes):
     [0]: socket
     [1]: bus  
     [2]: dev
     [3]: func
     [4-5]: reg (16-bit)
     [6]: 0x40 (command=64)
     [7-11]: value (32-bit; only four of these five bytes are needed, exact placement unconfirmed)

  +-> SmmCommunication->Communicate(guid_5C60, cmd=0x2E, data_len=122)
     -> "LnvReportCsrInfoToBmc: Report the Status fail"  (on error)

  +-> DEBUG: "LnvReportCsrInfoToBmc: Socket:%x Bus:%x/%x/%x-%x: 0x%08x"
           (socket, bus, dev, func, reg, value)
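
The 12-byte record layout above can be sketched as a small packing routine. Two points are assumptions rather than findings: the 32-bit value is placed little-endian at bytes 7..10 with byte 11 left at zero (the layout lists a five-byte span for a four-byte value), and `reg` is stored little-endian. `pack_csr_record` is a hypothetical name, and the SmmCommunication call itself is not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CSR_RECORD_SIZE 12
#define CSR_RECORD_CMD  0x40     /* command byte at offset 6 */

/* Fill a 12-byte CSR report record.
 * Assumption: value occupies bytes 7..10 (little-endian), byte 11 is zero. */
void pack_csr_record(uint8_t out[CSR_RECORD_SIZE],
                     uint8_t socket, uint8_t bus, uint8_t dev, uint8_t func,
                     uint16_t reg, uint32_t value)
{
    assert(out != NULL);
    memset(out, 0, CSR_RECORD_SIZE);
    out[0] = socket;
    out[1] = bus;
    out[2] = dev;
    out[3] = func;
    out[4] = (uint8_t)(reg & 0xFF);          /* reg low byte  */
    out[5] = (uint8_t)(reg >> 8);            /* reg high byte */
    out[6] = CSR_RECORD_CMD;
    out[7]  = (uint8_t)(value & 0xFF);
    out[8]  = (uint8_t)((value >> 8) & 0xFF);
    out[9]  = (uint8_t)((value >> 16) & 0xFF);
    out[10] = (uint8_t)((value >> 24) & 0xFF);
}
```

If a captured buffer shows the value starting at byte 8 instead, only the last four assignments need to shift by one.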

Dependencies

Consumed (this module calls protocols)

  • gBS->LocateProtocol (BootServices + 320): Used to locate SMM protocols
  • SmmBase2Protocol GUID=F4CCBFB7-...: To get gSmst
  • gSmst (SmmServicesTable): SmmSwDispatch2->Register, SmmCommunication->Communicate, protocol locate via gSmst+208
  • EFI_SMM_ACCESS2_PROTOCOL GUID=C2702B74-...: To query SMRAM ranges
  • EFI_IIO_UDS_PROTOCOL GUID=A7CED760-...: Platform topology data (IIO Universal Data Structure)
  • EFI_CPU_CSR_ACCESS_PROTOCOL GUID=0067835F-...: CPU CSR access
  • SMM CSR Report Protocol GUID=1DBD1503-...: Send error reports to BMC
  • EMCA Platform Hooks GUID=86B091ED-...: Platform-specific error handling
  • SMM Variable Protocol GUID=4E2CC220-...: Variable services

Consumed By (other modules call this)

  • SMM Core: Calls sub_4364 via SmmSwDispatch2 registered handler
  • BMC/IPMI subsystem: Receives error reports via SmmCommunication from sub_2F2C

Notes

  1. The module uses SMM Software SMI (SwSmi) for dispatch rather than hardware SMIs. The dispatch callback is sub_4364 (0x4364); the separate handler sub_4854 (0x4854) clears bit 0 of the SMI status register at I/O port 0x790.

  2. The debug string prefix "LnvReportCsrInfoToBmc" suggests a Lenovo customization for reporting CSR (Configuration Status Register) values to the BMC.

  3. The MP sync data init at 0x1500 (sub_1500) builds a 3D topology mapping (socket x core x thread) indexed by APIC ID, with a maximum of 4 sockets, 448 cores per socket, and 64 threads per core.

  4. The function pointer tables (funcs_27C2, funcs_29AA, funcs_29FA, qword_5408) are sparse arrays terminated by NULL. Only the first entry is populated in each table.

  5. MC bank list selection (sub_1BE4) branches on CPU model byte at qword_66A8+1782:

    • n4=3..4: Uses MC bank table at 0x5F40, 6 socket bits, 2 core bits
    • n4=5..6: Uses MC bank table at 0x5CC0, 6 socket bits, 2 core bits
  6. The sub_300 function at 0x300 is a debug/release build check (checked via sub_C40/C4C/C58 chain).
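
The socket x core x thread mapping from note 3 can be pictured as a flattened three-dimensional array. The sketch below uses the limits quoted in that note; the row-major ordering, the `topology_index` name, and the `TOPOLOGY_INVALID` marker are assumptions, since the actual indexing in sub_1500 was not reconstructed.

```c
#include <assert.h>
#include <stdint.h>

/* Limits quoted in note 3. */
#define MAX_SOCKETS           4u
#define MAX_CORES_PER_SOCKET  448u
#define MAX_THREADS_PER_CORE  64u
#define TOPOLOGY_INVALID      UINT32_MAX   /* hypothetical out-of-range marker */

/* Flatten (socket, core, thread) into one index, row-major (assumption). */
uint32_t topology_index(uint32_t socket, uint32_t core, uint32_t thread)
{
    if (socket >= MAX_SOCKETS ||
        core   >= MAX_CORES_PER_SOCKET ||
        thread >= MAX_THREADS_PER_CORE)
        return TOPOLOGY_INVALID;
    return (socket * MAX_CORES_PER_SOCKET + core) * MAX_THREADS_PER_CORE + thread;
}

/* Usage example: neighbouring threads are adjacent, cores are 64 apart. */
void topology_self_check(void)
{
    assert(topology_index(0, 0, 1) == 1);
    assert(topology_index(0, 1, 0) == MAX_THREADS_PER_CORE);
}
```

With these limits the table holds 4 x 448 x 64 = 114688 entries, so the last valid index is 114687.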