[v2,1/3] ghes_edac: unify memory error report format with cper

This patch makes the following changes:

- add device info to ghes_edac
- rename bit_pos to bit_position, col to column, requestorID to
  requestor_id, etc. in ghes_edac
- move requestor_id, responder_id, target_id and chip_id into the memory
  error location in ghes_edac
- report "DIMM location: not present." in ghes_edac when no DIMM location
  is available
- remove the space after the colon in ghes_edac and cper
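The renaming and delimiter changes above amount to emitting each valid location field as a `name:value` pair. The following is a minimal userspace sketch of that pattern, not the code from this patch: the `LOC_VALID_*` flags, `struct mem_err_loc`, and the helpers `append()` and `format_location()` are hypothetical stand-ins for the CPER Validation Bits and the kernel's memory error section, and only a subset of the fields is shown.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical validity flags standing in for the CPER Validation Bits. */
#define LOC_VALID_NODE         (1u << 0)
#define LOC_VALID_BANK         (1u << 1)
#define LOC_VALID_ROW          (1u << 2)
#define LOC_VALID_COLUMN       (1u << 3)
#define LOC_VALID_BIT_POSITION (1u << 4)
#define LOC_VALID_REQUESTOR_ID (1u << 5)

/* Hypothetical subset of the fields in a CPER memory error section. */
struct mem_err_loc {
	uint32_t valid;
	uint16_t node;
	uint16_t bank;
	uint32_t row;
	uint16_t column;
	uint16_t bit_position;
	uint64_t requestor_id;
};

/*
 * Append printf-style text at msg + n and return the new length, clamped to
 * what actually fit in the buffer (similar to the kernel's scnprintf()).
 */
static size_t append(char *msg, size_t len, size_t n, const char *fmt, ...)
{
	va_list ap;
	int ret;

	assert(len > 0 && n < len);
	va_start(ap, fmt);
	ret = vsnprintf(msg + n, len - n, fmt, ap);
	va_end(ap);
	if (ret < 0)
		return n;
	if ((size_t)ret >= len - n)
		return len - 1;	/* output was truncated */
	return n + (size_t)ret;
}

/*
 * Print only the fields whose validity flag is set, using the new labels
 * and no space after the colon. Returns the length of the resulting string.
 */
static size_t format_location(const struct mem_err_loc *e, char *msg, size_t len)
{
	size_t n = 0;

	assert(len > 0);
	msg[0] = '\0';
	if (e->valid & LOC_VALID_NODE)
		n = append(msg, len, n, "node:%u ", (unsigned int)e->node);
	if (e->valid & LOC_VALID_BANK)
		n = append(msg, len, n, "bank:%u ", (unsigned int)e->bank);
	if (e->valid & LOC_VALID_ROW)
		n = append(msg, len, n, "row:%u ", (unsigned int)e->row);
	if (e->valid & LOC_VALID_COLUMN)
		n = append(msg, len, n, "column:%u ", (unsigned int)e->column);
	if (e->valid & LOC_VALID_BIT_POSITION)
		n = append(msg, len, n, "bit_position:%u ",
			   (unsigned int)e->bit_position);
	if (e->valid & LOC_VALID_REQUESTOR_ID)
		n = append(msg, len, n, "requestor_id:0x%016llx ",
			   (unsigned long long)e->requestor_id);
	/* Drop the separator left after the last field, if any. */
	if (n > 0 && msg[n - 1] == ' ')
		msg[--n] = '\0';
	return n;
}
```

For instance, with node, bank, row, column, bit_position and requestor_id all marked valid and set to the values reported in the second log, `format_location()` yields `node:0 bank:1026 row:6749 column:8 bit_position:0 requestor_id:0x0000000000000000`; fields whose flag is clear are simply skipped. Clamping in `append()` keeps a short buffer from overflowing, at the cost of silently truncating the message.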

The original EDAC and cper error logs are as follows (all Validation Bits
are enabled):

[31940.060454] EDAC MC0: 1 CE Single-symbol ChipKill ECC on unknown memory (node:0 card:0 module:0 rank:0 bank:257 bank_group:1 bank_address:1 row:75492 col:8 bit_pos:0 DIMM DMI handle: 0x0000 chipID: 0 page:0x93724c offset:0x20 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:0 bank:257 bank_group:1 bank_address:1 row:75492 col:8 bit_pos:0 DIMM DMI handle: 0x0000 chipID: 0 status(0x0000000000000000): reserved requestorID: 0x0000000000000000 responderID: 0x0000000000000000 targetID: 0x0000000000000000)
[31940.060459] {3}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 2
[31940.060460] {3}[Hardware Error]: It has been corrected by h/w and requires no further action
[31940.060462] {3}[Hardware Error]: event severity: corrected
[31940.060463] {3}[Hardware Error]:  Error 0, type: corrected
[31940.060464] {3}[Hardware Error]:   section_type: memory error
[31940.060465] {3}[Hardware Error]:   error_status: 0x0000000000000000
[31940.060466] {3}[Hardware Error]:   physical_address: 0x000000093724c020
[31940.060466] {3}[Hardware Error]:   physical_address_mask: 0x0000000000000000
[31940.060469] {3}[Hardware Error]:   node: 0 card: 0 module: 0 rank: 0 bank: 257 bank_group: 1 bank_address: 1 device: 0 row: 75492 column: 8 bit_position: 0 requestor_id: 0x0000000000000000 responder_id: 0x0000000000000000
[31940.060470] {3}[Hardware Error]:   error_type: 4, single-symbol chipkill ECC
[31940.060471] {3}[Hardware Error]:   DIMM location: not present. DMI handle: 0x0000

With this patch, the EDAC and cper error logs report the error in a
consistent format, as follows (all Validation Bits are enabled):

[  117.973657] EDAC MC0: 1 CE Single-symbol ChipKill ECC on 0x0000 (node:0 card:0 module:0 rank:0 bank:1026 bank_group:4 bank_address:2 device:0 row:6749 column:8 bit_position:0 requestor_id:0x0000000000000000 responder_id:0x0000000000000000 target_id:0x0000000000000000 chip_id:0 DIMM location:not present. DIMM DMI handle:0x0000 page:0x8d2ef4 offset:0x20 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:0 bank:1026 bank_group:4 bank_address:2 device:0 row:6749 column:8 bit_position:0 requestor_id:0x0000000000000000 responder_id:0x0000000000000000 target_id:0x0000000000000000 chip_id:0 DIMM location:not present. DIMM DMI handle:0x0000 status(0x0000000000000000):reserved)
[  117.973663] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 2
[  117.973664] {2}[Hardware Error]: It has been corrected by h/w and requires no further action
[  117.973665] {2}[Hardware Error]: event severity: corrected
[  117.973666] {2}[Hardware Error]:  Error 0, type: corrected
[  117.973667] {2}[Hardware Error]:   section_type: memory error
[  117.973668] {2}[Hardware Error]:   error_status: 0x0000000000000000
[  117.973669] {2}[Hardware Error]:   physical_address: 0x00000008d2ef4020
[  117.973670] {2}[Hardware Error]:   physical_address_mask: 0x0000000000000000
[  117.973672] {2}[Hardware Error]:   node:0 card:0 module:0 rank:0 bank:1026 bank_group:4 bank_address:2 device:0 row:6749 column:8 bit_position:0 requestor_id:0x0000000000000000 responder_id:0x0000000000000000 target_id:0x0000000000000000 chip_id:0
[  117.973673] {2}[Hardware Error]:   error_type: 4, single-symbol chipkill ECC
[  117.973674] {2}[Hardware Error]:   DIMM location: not present. DMI handle:0x0000
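The new EDAC line prints `DIMM location:not present.` followed by the DMI handle when no DIMM label is available. A minimal sketch of that fallback, assuming a hypothetical `format_dimm_location()` that receives the DMI handle plus optional bank and device locator strings (in the kernel these would come from SMBIOS/DMI data). The format used when a label *is* available is also an assumption, since the logs above only show the not-present case.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the DIMM part of the location string.
 * bank/device are NULL when firmware provides no DIMM location.
 */
static int format_dimm_location(uint16_t handle, const char *bank,
				const char *device, char *msg, size_t len)
{
	if (bank && device)
		/* Assumed format for the case where a label is known. */
		return snprintf(msg, len, "DIMM location:%s %s DIMM DMI handle:0x%.4x",
				bank, device, (unsigned int)handle);
	/* Fallback matching the EDAC line in the log above. */
	return snprintf(msg, len,
			"DIMM location:not present. DIMM DMI handle:0x%.4x",
			(unsigned int)handle);
}
```

With a handle of 0x0000 and no locator strings, this produces `DIMM location:not present. DIMM DMI handle:0x0000`, which is the DIMM portion of the new EDAC message.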

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 drivers/edac/ghes_edac.c    | 35 +++++++++++++++++++----------------
 drivers/firmware/efi/cper.c | 34 +++++++++++++++++-----------------
 2 files changed, 36 insertions(+), 33 deletions(-)

Message ID: 20211210134019.28536-2-xueshuai@linux.alibaba.com
State: New, archived
Headers:
  Return-Path: <linux-edac-owner@kernel.org>
  From: Shuai Xue <xueshuai@linux.alibaba.com>
  To: mchehab@kernel.org, bp@alien8.de, tony.luck@intel.com, james.morse@arm.com, rric@kernel.org, ardb@kernel.org, linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org
  Cc: xueshuai@linux.alibaba.com, zhangliguang@linux.alibaba.com, zhuo.song@linux.alibaba.com
  Subject: [PATCH v2 1/3] ghes_edac: unify memory error report format with cper
  Date: Fri, 10 Dec 2021 21:40:17 +0800
  Message-Id: <20211210134019.28536-2-xueshuai@linux.alibaba.com>
  In-Reply-To: <20211210134019.28536-1-xueshuai@linux.alibaba.com>
  References: <20211210134019.28536-1-xueshuai@linux.alibaba.com>
  MIME-Version: 1.0
  Content-Transfer-Encoding: 8bit
  Precedence: bulk
Series: ghes_edac: refactor memory error reporting to avoid code duplication
  - [v2,0/3] ghes_edac: refactor memory error reporting to avoid code duplication
  - [v2,1/3] ghes_edac: unify memory error report format with cper
  - [v2,2/3] ghes_edac: refactor memory error location processing
  - [v2,3/3] ghes_edac: refactor error status fields decoding
