
[v2,1/2] edac,ghes,cper: Add Row Extension to Memory Error Record

Message ID 20200819143544.155096-2-alex.kluver@hpe.com
State New, archived
Series UEFI v2.8 Memory Error Record Updates

Commit Message

Alex Kluver Aug. 19, 2020, 2:35 p.m. UTC
Memory errors could be printed with incorrect row values because DIMM
sizes have outgrown the 16-bit row field in the CPER memory error
structure. UEFI Specification Version 2.8 widens the row number to 18
bits by using the low two bits of a previously reserved byte in the
structure (renamed "extended" in this patch).

When the CPER_MEM_VALID_ROW_EXT validation bit is set, combine these
extension bits with the 16-bit row value before printing it.

Based on UEFI 2.8 Table 299. Memory Error Record

Reviewed-by: Kyle Meyer <kyle.meyer@hpe.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Tested-by: Russ Anderson <russ.anderson@hpe.com>
Signed-off-by: Alex Kluver <alex.kluver@hpe.com>
---

v1 -> v2:
   * Add a static inline helper, cper_get_mem_extension(), to make the
     code more readable, as suggested by Borislav Petkov.

   * Add a second patch for the bank field, bank group, and chip ID.

---
 drivers/edac/ghes_edac.c    |  8 ++++++--
 drivers/firmware/efi/cper.c |  9 +++++++--
 include/linux/cper.h        | 16 ++++++++++++++--
 3 files changed, 27 insertions(+), 6 deletions(-)

Comments

Borislav Petkov Sept. 15, 2020, 4:33 p.m. UTC | #1
On Wed, Aug 19, 2020 at 09:35:43AM -0500, Alex Kluver wrote:
> [...]

For the EDAC bits:

Acked-by: Borislav Petkov <bp@suse.de>

Also, I could take both through the EDAC tree, if people prefer.
Ard Biesheuvel Sept. 15, 2020, 5:07 p.m. UTC | #2
On Tue, 15 Sep 2020 at 19:33, Borislav Petkov <bp@alien8.de> wrote:
>
> On Wed, Aug 19, 2020 at 09:35:43AM -0500, Alex Kluver wrote:
> > [...]
>
> For the EDAC bits:
>
> Acked-by: Borislav Petkov <bp@suse.de>
>
> Also, I could take both through the EDAC tree, if people prefer.
>

I'll take this via the EFI tree - I was just preparing the branch for
a PR anyway.
Ard Biesheuvel Sept. 15, 2020, 5:12 p.m. UTC | #3
On Tue, 15 Sep 2020 at 20:07, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Tue, 15 Sep 2020 at 19:33, Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Wed, Aug 19, 2020 at 09:35:43AM -0500, Alex Kluver wrote:
> > > [...]
> >
> > For the EDAC bits:
> >
> > Acked-by: Borislav Petkov <bp@suse.de>
> >
> > Also, I could take both through the EDAC tree, if people prefer.
> >
>
> I'll take this via the EFI tree - I was just preparing the branch for
> a PR anyways.

Alex - these patches do not apply cleanly. Could you please respin
them on top of the next branch in
https://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git?

Boris - do you anticipate any conflicts? If so, please take these via
the EDAC tree - the CPER code is mostly self-contained, so I don't
expect any conflicts with the EFI tree in that case.
Borislav Petkov Sept. 15, 2020, 5:19 p.m. UTC | #4
On Tue, Sep 15, 2020 at 08:12:31PM +0300, Ard Biesheuvel wrote:
> Boris - do you anticipate any conflicts? If so, please take these via
> the EDAC tree - the CPER code is mostly self contained so I don't
> expect any conflicts with the EFI tree in that case.

None so far, and I applied them for testing on top of my EDAC queue for
5.10, so it should be all good. But if you want, I can test-merge your
branch once it's ready, just in case...
Ard Biesheuvel Sept. 16, 2020, 1:09 p.m. UTC | #5
On Tue, 15 Sep 2020 at 20:19, Borislav Petkov <bp@alien8.de> wrote:
>
> On Tue, Sep 15, 2020 at 08:12:31PM +0300, Ard Biesheuvel wrote:
> > Boris - do you anticipate any conflicts? If so, please take these via
> > the EDAC tree - the CPER code is mostly self contained so I don't
> > expect any conflicts with the EFI tree in that case.
>
> None so far, and I applied them for testing ontop of my EDAC queue for
> 5.10 so it should be all good. But if you want me, I can test-merge your
> branch once ready, just in case...
>

I managed to apply these patches by using a different base and
cherry-picking them into efi/next.

I expect to send out a couple of PRs tomorrow, once the bots have had
a go at building the branches. In the meantime, you can take a look at

git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git next
Borislav Petkov Sept. 16, 2020, 6:10 p.m. UTC | #6
On Wed, Sep 16, 2020 at 04:09:36PM +0300, Ard Biesheuvel wrote:
> git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git next

Looks good and no conflicts, builds fine too.

[boris@zn: ~/kernel/linux> git fetch efi
remote: Enumerating objects: 85, done.
remote: Counting objects: 100% (85/85), done.
remote: Compressing objects: 100% (14/14), done.
remote: Total 131 (delta 71), reused 85 (delta 71), pack-reused 46
Receiving objects: 100% (131/131), 113.14 KiB | 1.69 MiB/s, done.
Resolving deltas: 100% (89/89), completed with 33 local objects.
From git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
 + 84780c5438ef...744de4180a43 next                    -> efi/next  (forced update)
   fb1201aececc..46908326c6b8  urgent                  -> efi/urgent
 * [new tag]                   efi-next-for-v5.10      -> efi-next-for-v5.10
 * [new tag]                   efi-urgent-for-v5.9-rc5 -> efi-urgent-for-v5.9-rc5
 * [new tag]                   efi-riscv-shared-for-v5.10 -> efi-riscv-shared-for-v5.10
[boris@zn: ~/kernel/linux> git checkout -b test-merge ras/edac-for-next
Branch 'test-merge' set up to track remote branch 'edac-for-next' from 'ras'.
Switched to a new branch 'test-merge'
[boris@zn: ~/kernel/linux> git merge efi/next
Auto-merging drivers/firmware/efi/libstub/efi-stub-helper.c
Auto-merging drivers/firmware/efi/efi.c
Auto-merging drivers/edac/ghes_edac.c
Auto-merging arch/x86/platform/efi/efi.c
Merge made by the 'recursive' strategy.
 arch/arm/include/asm/efi.h                      |  23 +++--
 arch/arm64/include/asm/efi.h                    |   5 +-
 arch/x86/kernel/setup.c                         |   1 +
 arch/x86/platform/efi/efi.c                     |   3 +
 drivers/edac/ghes_edac.c                        |  17 +++-
 drivers/firmware/efi/Makefile                   |   3 +-
 drivers/firmware/efi/cper.c                     |  18 +++-
 drivers/firmware/efi/{arm-init.c => efi-init.c} |   1 +
 drivers/firmware/efi/efi.c                      |   6 ++
 drivers/firmware/efi/libstub/arm32-stub.c       | 178 +++++++---------------------------
 drivers/firmware/efi/libstub/arm64-stub.c       |   1 -
 drivers/firmware/efi/libstub/efi-stub-helper.c  | 101 +++++++++++++++++++-
 drivers/firmware/efi/libstub/efi-stub.c         |  48 +---------
 drivers/firmware/efi/libstub/efistub.h          |  61 +++++++++++-
 drivers/firmware/efi/libstub/file.c             |   5 +-
 drivers/firmware/efi/libstub/relocate.c         |   4 +-
 drivers/firmware/efi/libstub/vsprintf.c         |   2 +-
 drivers/firmware/efi/mokvar-table.c             | 360 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/cper.h                            |  24 ++++-
 include/linux/efi.h                             |  34 +++++++
 include/linux/pe.h                              |   3 +
 security/integrity/platform_certs/load_uefi.c   |  85 +++++++++++++----
 22 files changed, 746 insertions(+), 237 deletions(-)
 rename drivers/firmware/efi/{arm-init.c => efi-init.c} (99%)
 create mode 100644 drivers/firmware/efi/mokvar-table.c
Russ Anderson Sept. 16, 2020, 6:12 p.m. UTC | #7
On Wed, Sep 16, 2020 at 08:10:30PM +0200, Borislav Petkov wrote:
> On Wed, Sep 16, 2020 at 04:09:36PM +0300, Ard Biesheuvel wrote:
> > git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git next
> 
> Looks good and no conflicts, builds fine too.

Excellent.
Thanks!

> [...]
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

Patch

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index cb3dab56a875..98fcdaf72a09 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -337,8 +337,12 @@  void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 		p += sprintf(p, "rank:%d ", mem_err->rank);
 	if (mem_err->validation_bits & CPER_MEM_VALID_BANK)
 		p += sprintf(p, "bank:%d ", mem_err->bank);
-	if (mem_err->validation_bits & CPER_MEM_VALID_ROW)
-		p += sprintf(p, "row:%d ", mem_err->row);
+	if (mem_err->validation_bits & (CPER_MEM_VALID_ROW | CPER_MEM_VALID_ROW_EXT)) {
+		u32 row = mem_err->row;
+
+		row |= cper_get_mem_extension(mem_err->validation_bits, mem_err->extended);
+		p += sprintf(p, "row:%d ", row);
+	}
 	if (mem_err->validation_bits & CPER_MEM_VALID_COLUMN)
 		p += sprintf(p, "col:%d ", mem_err->column);
 	if (mem_err->validation_bits & CPER_MEM_VALID_BIT_POSITION)
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index f564e15fbc7e..a60acd17bcaa 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -234,8 +234,12 @@  static int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg)
 		n += scnprintf(msg + n, len - n, "bank: %d ", mem->bank);
 	if (mem->validation_bits & CPER_MEM_VALID_DEVICE)
 		n += scnprintf(msg + n, len - n, "device: %d ", mem->device);
-	if (mem->validation_bits & CPER_MEM_VALID_ROW)
-		n += scnprintf(msg + n, len - n, "row: %d ", mem->row);
+	if (mem->validation_bits & (CPER_MEM_VALID_ROW | CPER_MEM_VALID_ROW_EXT)) {
+		u32 row = mem->row;
+
+		row |= cper_get_mem_extension(mem->validation_bits, mem->extended);
+		n += scnprintf(msg + n, len - n, "row: %d ", row);
+	}
 	if (mem->validation_bits & CPER_MEM_VALID_COLUMN)
 		n += scnprintf(msg + n, len - n, "column: %d ", mem->column);
 	if (mem->validation_bits & CPER_MEM_VALID_BIT_POSITION)
@@ -292,6 +296,7 @@  void cper_mem_err_pack(const struct cper_sec_mem_err *mem,
 	cmem->requestor_id = mem->requestor_id;
 	cmem->responder_id = mem->responder_id;
 	cmem->target_id = mem->target_id;
+	cmem->extended = mem->extended;
 	cmem->rank = mem->rank;
 	cmem->mem_array_handle = mem->mem_array_handle;
 	cmem->mem_dev_handle = mem->mem_dev_handle;
diff --git a/include/linux/cper.h b/include/linux/cper.h
index 8537e9282a65..bd2d8a77a784 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -230,6 +230,10 @@  enum {
 #define CPER_MEM_VALID_RANK_NUMBER		0x8000
 #define CPER_MEM_VALID_CARD_HANDLE		0x10000
 #define CPER_MEM_VALID_MODULE_HANDLE		0x20000
+#define CPER_MEM_VALID_ROW_EXT			0x40000
+
+#define CPER_MEM_EXT_ROW_MASK			0x3
+#define CPER_MEM_EXT_ROW_SHIFT			16
 
 #define CPER_PCIE_VALID_PORT_TYPE		0x0001
 #define CPER_PCIE_VALID_VERSION			0x0002
@@ -443,7 +447,7 @@  struct cper_sec_mem_err_old {
 	u8	error_type;
 };
 
-/* Memory Error Section (UEFI >= v2.3), UEFI v2.7 sec N.2.5 */
+/* Memory Error Section (UEFI >= v2.3), UEFI v2.8 sec N.2.5 */
 struct cper_sec_mem_err {
 	u64	validation_bits;
 	u64	error_status;
@@ -461,7 +465,7 @@  struct cper_sec_mem_err {
 	u64	responder_id;
 	u64	target_id;
 	u8	error_type;
-	u8	reserved;
+	u8	extended;
 	u16	rank;
 	u16	mem_array_handle;	/* "card handle" in UEFI 2.4 */
 	u16	mem_dev_handle;		/* "module handle" in UEFI 2.4 */
@@ -483,8 +487,16 @@  struct cper_mem_err_compact {
 	u16	rank;
 	u16	mem_array_handle;
 	u16	mem_dev_handle;
+	u8	extended;
 };
 
+static inline u32 cper_get_mem_extension(u64 mem_valid, u8 mem_extended)
+{
+	if (!(mem_valid & CPER_MEM_VALID_ROW_EXT))
+		return 0;
+	return (mem_extended & CPER_MEM_EXT_ROW_MASK) << CPER_MEM_EXT_ROW_SHIFT;
+}
+
 /* PCI Express Error Section, UEFI v2.7 sec N.2.7 */
 struct cper_sec_pcie {
 	u64		validation_bits;