Message ID | 20240412172349.544064-2-jithu.joseph@intel.com (mailing list archive) |
---|---|
State | Accepted, archived |
Series | Miscelleanous In Field Scan changes |
On 4/12/24 10:23 AM, Jithu Joseph wrote:
> Based on inputs from hardware architects, only "scan signature failures"
> should be treated as actual hardware/cpu failure.

Instead of just saying input from hardware architects, it would be better
if you mention the rationale behind it.

> Current driver, in addition, classifies "scan controller error" scenario
> too as a hardware/cpu failure. Modify the driver to classify this situation
> with a more appropriate "untested" status instead of "fail" status.
>
> Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Reviewe

Code wise it looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

> d-by: Ashok Raj <ashok.raj@intel.com>
> ---
>  drivers/platform/x86/intel/ifs/runtest.c | 27 +++++++++++++-----------
>  1 file changed, 15 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/platform/x86/intel/ifs/runtest.c b/drivers/platform/x86/intel/ifs/runtest.c
> index 95b4b71fab53..282e4bfe30da 100644
> --- a/drivers/platform/x86/intel/ifs/runtest.c
> +++ b/drivers/platform/x86/intel/ifs/runtest.c
> @@ -69,6 +69,19 @@ static const char * const scan_test_status[] = {
>
>  static void message_not_tested(struct device *dev, int cpu, union ifs_status status)
>  {
> +	struct ifs_data *ifsd = ifs_get_data(dev);
> +
> +	/*
> +	 * control_error is set when the microcode runs into a problem
> +	 * loading the image from the reserved BIOS memory, or it has
> +	 * been corrupted. Reloading the image may fix this issue.
> +	 */
> +	if (status.control_error) {
> +		dev_warn(dev, "CPU(s) %*pbl: Scan controller error. Batch: %02x version: 0x%x\n",
> +			 cpumask_pr_args(cpu_smt_mask(cpu)), ifsd->cur_batch, ifsd->loaded_version);
> +		return;
> +	}
> +
>  	if (status.error_code < ARRAY_SIZE(scan_test_status)) {
>  		dev_info(dev, "CPU(s) %*pbl: SCAN operation did not start. %s\n",
>  			 cpumask_pr_args(cpu_smt_mask(cpu)),
> @@ -90,16 +103,6 @@ static void message_fail(struct device *dev, int cpu, union ifs_status status)
>  {
>  	struct ifs_data *ifsd = ifs_get_data(dev);
>
> -	/*
> -	 * control_error is set when the microcode runs into a problem
> -	 * loading the image from the reserved BIOS memory, or it has
> -	 * been corrupted. Reloading the image may fix this issue.
> -	 */
> -	if (status.control_error) {
> -		dev_err(dev, "CPU(s) %*pbl: could not execute from loaded scan image. Batch: %02x version: 0x%x\n",
> -			cpumask_pr_args(cpu_smt_mask(cpu)), ifsd->cur_batch, ifsd->loaded_version);
> -	}
> -
>  	/*
>  	 * signature_error is set when the output from the scan chains does not
>  	 * match the expected signature. This might be a transient problem (e.g.
> @@ -285,10 +288,10 @@ static void ifs_test_core(int cpu, struct device *dev)
>  	/* Update status for this core */
>  	ifsd->scan_details = status.data;
>
> -	if (status.control_error || status.signature_error) {
> +	if (status.signature_error) {
>  		ifsd->status = SCAN_TEST_FAIL;
>  		message_fail(dev, cpu, status);
> -	} else if (status.error_code) {
> +	} else if (status.control_error || status.error_code) {
>  		ifsd->status = SCAN_NOT_TESTED;
>  		message_not_tested(dev, cpu, status);
>  	} else {
Sathya,

Thanks for reviewing this

On 4/12/2024 11:32 AM, Kuppuswamy Sathyanarayanan wrote:
>
> On 4/12/24 10:23 AM, Jithu Joseph wrote:
>> Based on inputs from hardware architects, only "scan signature failures"
>> should be treated as actual hardware/cpu failure.
>
> Instead of just saying input from hardware architects, it would be better
> if you mention the rationale behind it.

I can reword the first para as below:

"Scan controller error" means that scan hardware encountered an error
prior to doing an actual test on the target CPU. It does not mean that
there is an actual cpu/core failure. "scan signature failure" indicates
that the test result on the target core did not match the expected value
and should be treated as a cpu failure.

Current driver classifies both these scenarios as failures. Modify ...

>
>> Current driver, in addition, classifies "scan controller error" scenario
>> too as a hardware/cpu failure. Modify the driver to classify this situation
>> with a more appropriate "untested" status instead of "fail" status.
>>
>> Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
>> Reviewed-by: Tony Luck <tony.luck@intel.com>
>> Reviewe
>
> Code wise it looks good to me.
>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>

Jithu
On 4/12/24 12:31 PM, Joseph, Jithu wrote:
> Sathya,
>
> Thanks for reviewing this
>
> On 4/12/2024 11:32 AM, Kuppuswamy Sathyanarayanan wrote:
>> On 4/12/24 10:23 AM, Jithu Joseph wrote:
>>> Based on inputs from hardware architects, only "scan signature failures"
>>> should be treated as actual hardware/cpu failure.
>> Instead of just saying input from hardware architects, it would be better
>> if you mention the rationale behind it.
> I can reword the first para as below:
>
> "Scan controller error" means that scan hardware encountered an error
> prior to doing an actual test on the target CPU. It does not mean that
> there is an actual cpu/core failure. "scan signature failure" indicates
> that the test result on the target core did not match the expected value
> and should be treated as a cpu failure.
>
> Current driver classifies both these scenarios as failures. Modify ...

Looks good to me.

>>> Current driver, in addition, classifies "scan controller error" scenario
>>> too as a hardware/cpu failure. Modify the driver to classify this situation
>>> with a more appropriate "untested" status instead of "fail" status.
>>>
>>> Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
>>> Reviewed-by: Tony Luck <tony.luck@intel.com>
>>> Reviewe
>> Code wise it looks good to me.
>>
>> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>
>
> Jithu
Hi,

Thank you for this patch series.

On 4/12/24 9:31 PM, Joseph, Jithu wrote:
> Sathya,
>
> Thanks for reviewing this
>
> On 4/12/2024 11:32 AM, Kuppuswamy Sathyanarayanan wrote:
>>
>> On 4/12/24 10:23 AM, Jithu Joseph wrote:
>>> Based on inputs from hardware architects, only "scan signature failures"
>>> should be treated as actual hardware/cpu failure.
>>
>> Instead of just saying input from hardware architects, it would be better
>> if you mention the rationale behind it.
>
> I can reword the first para as below:
>
> "Scan controller error" means that scan hardware encountered an error
> prior to doing an actual test on the target CPU. It does not mean that
> there is an actual cpu/core failure. "scan signature failure" indicates
> that the test result on the target core did not match the expected value
> and should be treated as a cpu failure.
>
> Current driver classifies both these scenarios as failures. Modify ...

I've modified the commit message using the rewording suggested above
while merging this patch and I have merged the entire series:

Thank you for your patch-series, I've applied the series to my
review-hans branch:
https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/log/?h=review-hans

Once I've run some tests on this branch the patches there will be
added to the platform-drivers-x86/for-next branch and eventually
will be included in the pdx86 pull-request to Linus for the next
merge-window.

Regards,

Hans

>>> Current driver, in addition, classifies "scan controller error" scenario
>>> too as a hardware/cpu failure. Modify the driver to classify this situation
>>> with a more appropriate "untested" status instead of "fail" status.
>>>
>>> Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
>>> Reviewed-by: Tony Luck <tony.luck@intel.com>
>>> Reviewe
>>
>> Code wise it looks good to me.
>>
>> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>
>
>
> Jithu
>
diff --git a/drivers/platform/x86/intel/ifs/runtest.c b/drivers/platform/x86/intel/ifs/runtest.c
index 95b4b71fab53..282e4bfe30da 100644
--- a/drivers/platform/x86/intel/ifs/runtest.c
+++ b/drivers/platform/x86/intel/ifs/runtest.c
@@ -69,6 +69,19 @@ static const char * const scan_test_status[] = {
 
 static void message_not_tested(struct device *dev, int cpu, union ifs_status status)
 {
+	struct ifs_data *ifsd = ifs_get_data(dev);
+
+	/*
+	 * control_error is set when the microcode runs into a problem
+	 * loading the image from the reserved BIOS memory, or it has
+	 * been corrupted. Reloading the image may fix this issue.
+	 */
+	if (status.control_error) {
+		dev_warn(dev, "CPU(s) %*pbl: Scan controller error. Batch: %02x version: 0x%x\n",
+			 cpumask_pr_args(cpu_smt_mask(cpu)), ifsd->cur_batch, ifsd->loaded_version);
+		return;
+	}
+
 	if (status.error_code < ARRAY_SIZE(scan_test_status)) {
 		dev_info(dev, "CPU(s) %*pbl: SCAN operation did not start. %s\n",
 			 cpumask_pr_args(cpu_smt_mask(cpu)),
@@ -90,16 +103,6 @@ static void message_fail(struct device *dev, int cpu, union ifs_status status)
 {
 	struct ifs_data *ifsd = ifs_get_data(dev);
 
-	/*
-	 * control_error is set when the microcode runs into a problem
-	 * loading the image from the reserved BIOS memory, or it has
-	 * been corrupted. Reloading the image may fix this issue.
-	 */
-	if (status.control_error) {
-		dev_err(dev, "CPU(s) %*pbl: could not execute from loaded scan image. Batch: %02x version: 0x%x\n",
-			cpumask_pr_args(cpu_smt_mask(cpu)), ifsd->cur_batch, ifsd->loaded_version);
-	}
-
 	/*
 	 * signature_error is set when the output from the scan chains does not
 	 * match the expected signature. This might be a transient problem (e.g.
@@ -285,10 +288,10 @@ static void ifs_test_core(int cpu, struct device *dev)
 	/* Update status for this core */
 	ifsd->scan_details = status.data;
 
-	if (status.control_error || status.signature_error) {
+	if (status.signature_error) {
 		ifsd->status = SCAN_TEST_FAIL;
 		message_fail(dev, cpu, status);
-	} else if (status.error_code) {
+	} else if (status.control_error || status.error_code) {
 		ifsd->status = SCAN_NOT_TESTED;
 		message_not_tested(dev, cpu, status);
 	} else {
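For readers skimming the thread, the classification that results from the last hunk can be sketched as a small standalone C fragment. This is an illustration only, not driver code: `struct scan_status`, `classify_scan()` and `scan_result_name()` are hypothetical names, the flags are modeled as plain booleans rather than the fields of `union ifs_status`, the enum values are arbitrary, and the pass branch (which the diff cuts off at `} else {`) is assumed to report `SCAN_TEST_PASS`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's union ifs_status. */
struct scan_status {
	bool control_error;   /* scan controller hit a problem before testing */
	bool signature_error; /* scan output did not match the expected signature */
	uint8_t error_code;   /* non-zero when the scan did not start or complete */
};

/* Illustrative values; only the names FAIL and NOT_TESTED appear in the diff. */
enum scan_result { SCAN_NOT_TESTED, SCAN_TEST_PASS, SCAN_TEST_FAIL };

/*
 * Mirrors the branch order after the patch: only a signature mismatch
 * counts as a failure, while a controller error or a non-zero error
 * code means the core was not actually tested.
 */
enum scan_result classify_scan(struct scan_status st)
{
	if (st.signature_error)
		return SCAN_TEST_FAIL;
	if (st.control_error || st.error_code)
		return SCAN_NOT_TESTED;
	return SCAN_TEST_PASS; /* assumption: the elided else branch */
}

/* Maps a result to the status wording used in the commit message. */
const char *scan_result_name(enum scan_result r)
{
	static const char * const names[] = { "untested", "pass", "fail" };

	assert((size_t)r < sizeof(names) / sizeof(names[0]));
	return names[r];
}
```

Because the signature check comes first, a status with both `control_error` and `signature_error` set is still classified as a failure, which matches the ordering in the patched `ifs_test_core()`.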