Message ID | d808b8b76c58054ccd4a8c49dcc2d23fee5ed397.1718906288.git.mchehab+huawei@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Fix CPER issues related to UEFI 2.9A Errata | expand |
On Thu, 20 Jun 2024 20:01:46 +0200 Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote: > Up to UEFI spec, the type byte of CPER struct for ARM processor was > defined simply as: > > Type at byte offset 4: > > - Cache error > - TLB Error > - Bus Error > - Micro-architectural Error > All other values are reserved > > Yet, there was no information about how this would be encoded. > > Spec 2.9A errata corrected it by defining: > > - Bit 1 - Cache Error > - Bit 2 - TLB Error > - Bit 3 - Bus Error > - Bit 4 - Micro-architectural Error > All other values are reserved > > That actually aligns with the values already defined on older > versions at N.2.4.1. Generic Processor Error Section. > > Spec 2.10 also preserve the same encoding as 2.9A > > See: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-information > > Adjust CPER and GHES handling code for both generic and ARM > processors to properly handle UEFI 2.9A and 2.10 encoding. > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> I think you can avoid complexity of your masking solution. Cost is we don't have that function print that there were reserved bits set, but that could be easily handled at the caller including notifying on bits above the defined range which might be helpful. > diff --git a/drivers/firmware/efi/cper-arm.c b/drivers/firmware/efi/cper-arm.c > index d9bbcea0adf4..4c101a09fd80 100644 > --- a/drivers/firmware/efi/cper-arm.c > +++ b/drivers/firmware/efi/cper-arm.c ... > if (error_info & CPER_ARM_ERR_VALID_PROC_CONTEXT_CORRUPT) { > @@ -241,6 +232,7 @@ void cper_print_proc_arm(const char *pfx, > struct cper_arm_err_info *err_info; > struct cper_arm_ctx_info *ctx_info; > char newpfx[64], infopfx[65]; > + char error_type[120]; > > printk("%sMIDR: 0x%016llx\n", pfx, proc->midr); > > @@ -289,9 +281,11 @@ void cper_print_proc_arm(const char *pfx, > newpfx); > } > > - printk("%serror_type: %d, %s\n", newpfx, err_info->type, > - err_info->type < ARRAY_SIZE(cper_proc_error_type_strs) ? > - cper_proc_error_type_strs[err_info->type] : "unknown"); > + cper_bits_to_str(error_type, sizeof(error_type), err_info->type, > + cper_proc_error_type_strs, > + ARRAY_SIZE(cper_proc_error_type_strs), > + CPER_ARM_ERR_TYPE_MASK); Maybe drop this mask complexity and just use FIELD_GET() to extract the relevant field with no shift from 0. > + printk("%serror_type: %s\n", newpfx, error_type); > if (err_info->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO) { > printk("%serror_info: 0x%016llx\n", newpfx, > err_info->error_info);
Em Fri, 21 Jun 2024 10:30:50 +0100 Jonathan Cameron <Jonathan.Cameron@Huawei.com> escreveu: > On Thu, 20 Jun 2024 20:01:46 +0200 > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote: > > > Up to UEFI spec, the type byte of CPER struct for ARM processor was > > defined simply as: > > > > Type at byte offset 4: > > > > - Cache error > > - TLB Error > > - Bus Error > > - Micro-architectural Error > > All other values are reserved > > > > Yet, there was no information about how this would be encoded. > > > > Spec 2.9A errata corrected it by defining: > > > > - Bit 1 - Cache Error > > - Bit 2 - TLB Error > > - Bit 3 - Bus Error > > - Bit 4 - Micro-architectural Error > > All other values are reserved > > > > That actually aligns with the values already defined on older > > versions at N.2.4.1. Generic Processor Error Section. > > > > Spec 2.10 also preserve the same encoding as 2.9A > > > > See: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-information > > > > Adjust CPER and GHES handling code for both generic and ARM > > processors to properly handle UEFI 2.9A and 2.10 encoding. > > > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> > > I think you can avoid complexity of your masking solution. > Cost is we don't have that function print that there were reserved bits > set, but that could be easily handled at the caller including notifying > on bits above the defined range which might be helpful. > > > diff --git a/drivers/firmware/efi/cper-arm.c b/drivers/firmware/efi/cper-arm.c > > index d9bbcea0adf4..4c101a09fd80 100644 > > --- a/drivers/firmware/efi/cper-arm.c > > +++ b/drivers/firmware/efi/cper-arm.c > ... > > > if (error_info & CPER_ARM_ERR_VALID_PROC_CONTEXT_CORRUPT) { > > @@ -241,6 +232,7 @@ void cper_print_proc_arm(const char *pfx, > > struct cper_arm_err_info *err_info; > > struct cper_arm_ctx_info *ctx_info; > > char newpfx[64], infopfx[65]; > > + char error_type[120]; > > > > printk("%sMIDR: 0x%016llx\n", pfx, proc->midr); > > > > @@ -289,9 +281,11 @@ void cper_print_proc_arm(const char *pfx, > > newpfx); > > } > > > > - printk("%serror_type: %d, %s\n", newpfx, err_info->type, > > - err_info->type < ARRAY_SIZE(cper_proc_error_type_strs) ? > > - cper_proc_error_type_strs[err_info->type] : "unknown"); > > + cper_bits_to_str(error_type, sizeof(error_type), err_info->type, > > + cper_proc_error_type_strs, > > + ARRAY_SIZE(cper_proc_error_type_strs), > > + CPER_ARM_ERR_TYPE_MASK); > > Maybe drop this mask complexity and just use > FIELD_GET() to extract the relevant field with no shift from 0. IMO not using the function will make the code here more complex, as the same code needs to be duplicated on two places: here and at ghes, where the error bits are printed using pr_warn_ratelimited(): cper_bits_to_str(error_type, sizeof(error_type), err_info->type, cper_proc_error_type_strs, ARRAY_SIZE(cper_proc_error_type_strs), CPER_ARM_ERR_TYPE_MASK); pr_warn_ratelimited(FW_WARN GHES_PFX "Unhandled processor error type: %s\n", Also, other parts of CPER uses cper_bits_print() for the same reason: to have the common print code handled inside a function instead of repeating the same print pattern everywhere. > > + printk("%serror_type: %s\n", newpfx, error_type); > > if (err_info->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO) { > > printk("%serror_info: 0x%016llx\n", newpfx, > > err_info->error_info); > > Regards, Mauro
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 623cc0cb4a65..093a2d0e49e7 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -533,6 +533,7 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, { struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata); int flags = sync ? MF_ACTION_REQUIRED : 0; + char error_type[120]; bool queued = false; int sec_sev, i; char *p; @@ -546,9 +547,8 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, p = (char *)(err + 1); for (i = 0; i < err->err_info_num; i++) { struct cper_arm_err_info *err_info = (struct cper_arm_err_info *)p; - bool is_cache = (err_info->type == CPER_ARM_CACHE_ERROR); + bool is_cache = err_info->type & CPER_ARM_CACHE_ERROR; bool has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR); - const char *error_type = "unknown error"; /* * The field (err_info->error_info & BIT(26)) is fixed to set to @@ -562,8 +562,10 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, continue; } - if (err_info->type < ARRAY_SIZE(cper_proc_error_type_strs)) - error_type = cper_proc_error_type_strs[err_info->type]; + cper_bits_to_str(error_type, sizeof(error_type), err_info->type, + cper_proc_error_type_strs, + ARRAY_SIZE(cper_proc_error_type_strs), + CPER_ARM_ERR_TYPE_MASK); pr_warn_ratelimited(FW_WARN GHES_PFX "Unhandled processor error type: %s\n", diff --git a/drivers/firmware/efi/cper-arm.c b/drivers/firmware/efi/cper-arm.c index d9bbcea0adf4..4c101a09fd80 100644 --- a/drivers/firmware/efi/cper-arm.c +++ b/drivers/firmware/efi/cper-arm.c @@ -93,15 +93,11 @@ static void cper_print_arm_err_info(const char *pfx, u32 type, bool proc_context_corrupt, corrected, precise_pc, restartable_pc; bool time_out, access_mode; - /* If the type is unknown, bail. */ - if (type > CPER_ARM_MAX_TYPE) - return; - /* * Vendor type errors have error information values that are vendor * specific. */ - if (type == CPER_ARM_VENDOR_ERROR) + if (type & CPER_ARM_VENDOR_ERROR) return; if (error_info & CPER_ARM_ERR_VALID_TRANSACTION_TYPE) { @@ -116,43 +112,38 @@ static void cper_print_arm_err_info(const char *pfx, u32 type, if (error_info & CPER_ARM_ERR_VALID_OPERATION_TYPE) { op_type = ((error_info >> CPER_ARM_ERR_OPERATION_SHIFT) & CPER_ARM_ERR_OPERATION_MASK); - switch (type) { - case CPER_ARM_CACHE_ERROR: + if (type & CPER_ARM_CACHE_ERROR) { if (op_type < ARRAY_SIZE(arm_cache_err_op_strs)) { - printk("%soperation type: %s\n", pfx, + printk("%scache error, operation type: %s\n", pfx, arm_cache_err_op_strs[op_type]); } - break; - case CPER_ARM_TLB_ERROR: + } + if (type & CPER_ARM_TLB_ERROR) { if (op_type < ARRAY_SIZE(arm_tlb_err_op_strs)) { - printk("%soperation type: %s\n", pfx, + printk("%sTLB error, operation type: %s\n", pfx, arm_tlb_err_op_strs[op_type]); } - break; - case CPER_ARM_BUS_ERROR: + } + if (type & CPER_ARM_BUS_ERROR) { if (op_type < ARRAY_SIZE(arm_bus_err_op_strs)) { - printk("%soperation type: %s\n", pfx, + printk("%sbus error, operation type: %s\n", pfx, arm_bus_err_op_strs[op_type]); } - break; } } if (error_info & CPER_ARM_ERR_VALID_LEVEL) { level = ((error_info >> CPER_ARM_ERR_LEVEL_SHIFT) & CPER_ARM_ERR_LEVEL_MASK); - switch (type) { - case CPER_ARM_CACHE_ERROR: + if (type & CPER_ARM_CACHE_ERROR) printk("%scache level: %d\n", pfx, level); - break; - case CPER_ARM_TLB_ERROR: + + if (type & CPER_ARM_TLB_ERROR) printk("%sTLB level: %d\n", pfx, level); - break; - case CPER_ARM_BUS_ERROR: + + if (type & CPER_ARM_BUS_ERROR) printk("%saffinity level at which the bus error occurred: %d\n", pfx, level); - break; - } } if (error_info & CPER_ARM_ERR_VALID_PROC_CONTEXT_CORRUPT) { @@ -241,6 +232,7 @@ void cper_print_proc_arm(const char *pfx, struct cper_arm_err_info *err_info; struct cper_arm_ctx_info *ctx_info; char newpfx[64], infopfx[65]; + char error_type[120]; printk("%sMIDR: 0x%016llx\n", pfx, proc->midr); @@ -289,9 +281,11 @@ void cper_print_proc_arm(const char *pfx, newpfx); } - printk("%serror_type: %d, %s\n", newpfx, err_info->type, - err_info->type < ARRAY_SIZE(cper_proc_error_type_strs) ? - cper_proc_error_type_strs[err_info->type] : "unknown"); + cper_bits_to_str(error_type, sizeof(error_type), err_info->type, + cper_proc_error_type_strs, + ARRAY_SIZE(cper_proc_error_type_strs), + CPER_ARM_ERR_TYPE_MASK); + printk("%serror_type: %s\n", newpfx, error_type); if (err_info->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO) { printk("%serror_info: 0x%016llx\n", newpfx, err_info->error_info); diff --git a/include/linux/cper.h b/include/linux/cper.h index 856e8f00a7fb..4fef462944d6 100644 --- a/include/linux/cper.h +++ b/include/linux/cper.h @@ -293,11 +293,11 @@ enum { #define CPER_ARM_INFO_FLAGS_PROPAGATED BIT(2) #define CPER_ARM_INFO_FLAGS_OVERFLOW BIT(3) -#define CPER_ARM_CACHE_ERROR 0 -#define CPER_ARM_TLB_ERROR 1 -#define CPER_ARM_BUS_ERROR 2 -#define CPER_ARM_VENDOR_ERROR 3 -#define CPER_ARM_MAX_TYPE CPER_ARM_VENDOR_ERROR +#define CPER_ARM_ERR_TYPE_MASK GENMASK(4,1) +#define CPER_ARM_CACHE_ERROR BIT(1) +#define CPER_ARM_TLB_ERROR BIT(2) +#define CPER_ARM_BUS_ERROR BIT(3) +#define CPER_ARM_VENDOR_ERROR BIT(4) #define CPER_ARM_ERR_VALID_TRANSACTION_TYPE BIT(0) #define CPER_ARM_ERR_VALID_OPERATION_TYPE BIT(1)
Up to UEFI spec, the type byte of CPER struct for ARM processor was defined simply as: Type at byte offset 4: - Cache error - TLB Error - Bus Error - Micro-architectural Error All other values are reserved Yet, there was no information about how this would be encoded. Spec 2.9A errata corrected it by defining: - Bit 1 - Cache Error - Bit 2 - TLB Error - Bit 3 - Bus Error - Bit 4 - Micro-architectural Error All other values are reserved That actually aligns with the values already defined on older versions at N.2.4.1. Generic Processor Error Section. Spec 2.10 also preserve the same encoding as 2.9A See: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-information Adjust CPER and GHES handling code for both generic and ARM processors to properly handle UEFI 2.9A and 2.10 encoding. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> --- drivers/acpi/apei/ghes.c | 10 ++++--- drivers/firmware/efi/cper-arm.c | 46 ++++++++++++++------------------- include/linux/cper.h | 10 +++---- 3 files changed, 31 insertions(+), 35 deletions(-)