Message ID | 1306303538-30524-1-git-send-email-ying.huang@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, May 25, 2011 at 02:05:38PM +0800, Huang Ying wrote: > In APEI firmware first mode, hardware error is reported by hardware to > firmware firstly, then firmware reports the error to Linux in a GHES > error record via POLL/SCI/IRQ/NMI etc. > > This may result in some issues if OS has no full APEI support. So > some firmware implementation will work in a back-compatible mode by > default. Where firmware will only notify OS in old-fashion, without > GHES record. For example, for a fatal hardware error, only NMI is > signaled, no GHES record. > > To gain full APEI power on these machines, a special APEI _OSC needs > to be evaluated to tell firmware that Linux has full APEI support. > This patch add the APEI _OSC support. Using an Intel box I have over at RedHat, I was able to use this patch to get error injection (EINJ) to provide me a GHES record. Prior to this patch I would just get unknown NMIs. Talking with Matthew Garret, I guess it seems that uuids like this are typical for ACPI. I can't speak for all the ACPI parts, but the patch looks simple and corret from my perspective. Reviewed-and-Tested-by: Don Zickus <dzickus@redhat.com> -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
? 6/13/2011 10:50 PM, Don Zickus ??: > On Wed, May 25, 2011 at 02:05:38PM +0800, Huang Ying wrote: >> In APEI firmware first mode, hardware error is reported by hardware to >> firmware firstly, then firmware reports the error to Linux in a GHES >> error record via POLL/SCI/IRQ/NMI etc. >> >> This may result in some issues if OS has no full APEI support. So >> some firmware implementation will work in a back-compatible mode by >> default. Where firmware will only notify OS in old-fashion, without >> GHES record. For example, for a fatal hardware error, only NMI is >> signaled, no GHES record. >> >> To gain full APEI power on these machines, a special APEI _OSC needs >> to be evaluated to tell firmware that Linux has full APEI support. >> This patch add the APEI _OSC support. > > Using an Intel box I have over at RedHat, I was able to use this patch to > get error injection (EINJ) to provide me a GHES record. Prior to this > patch I would just get unknown NMIs. > > Talking with Matthew Garret, I guess it seems that uuids like this are > typical for ACPI. > > I can't speak for all the ACPI parts, but the patch looks simple and > corret from my perspective. > Interesting, but I can't reproduce it on our machines by now. Would you please share your test information such as machine type, BIOS version, and test method (such as what type error you inject) ? -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jun 14, 2011 at 02:33:19PM +0800, Chen Gong wrote: > >Using an Intel box I have over at RedHat, I was able to use this patch to > >get error injection (EINJ) to provide me a GHES record. Prior to this > >patch I would just get unknown NMIs. > > > >Talking with Matthew Garret, I guess it seems that uuids like this are > >typical for ACPI. > > > >I can't speak for all the ACPI parts, but the patch looks simple and > >corret from my perspective. > > > Interesting, but I can't reproduce it on our machines by now. Would you > please share your test information such as machine type, BIOS > version, and test method (such as what type error you inject) ? Sure, #cat /proc/cpuinfo processor : 23 vendor_id : GenuineIntel cpu family : 6 model : 44 model name : Intel(R) Xeon(R) CPU X5670 @ 2.93GHz stepping : 2 cpu MHz : 1596.000 #dmidecode Handle 0x0002, DMI type 1, 27 bytes System Information Manufacturer: Intel Corporation Product Name: S5520UR Version: .................... Serial Number: ............ UUID: 57890342-6DA6-11DF-BD83-001517EE9AE8 Wake-up Type: AC Power Restored SKU Number: Not Specified Family: Not Specified Handle 0x0003, DMI type 2, 16 bytes Base Board Information Manufacturer: Intel Corporation Product Name: S5520UR Version: E81084-751 Serial Number: BZUB02211743 Asset Tag: .................... Features: Board is a hosting board Board is replaceable Location In Chassis: Not Specified Chassis Handle: 0x0004 Type: Motherboard Contained Object Handles: 0 Handle 0x0005, DMI type 0, 24 bytes BIOS Information Vendor: Intel Corp. Version: S5500.86B.01.00.0055.112620101923 Release Date: 11/26/2010 Address: 0xF0000 Runtime Size: 64 kB ROM Size: 8192 kB Characteristics: PCI is supported PNP is supported BIOS is upgradeable BIOS shadowing is allowed Boot from CD is supported Selectable boot is supported EDD is supported 3.5"/2.88 MB floppy services are supported (int 13h) Print screen service is supported (int 5h) 8042 keyboard services are supported (int 9h) : CGA/mono video services are supported (int 10h) ACPI is supported USB legacy is supported LS-120 boot is supported ATAPI Zip drive boot is supported Function key-initiated network boot is supported Targeted content distribution is supported BIOS Revision: 17.18 Firmware Revision: 0.0 #instructions mount -t debugfs debugfs /sys/kernel/debug cd /sys/kernel/debug/apei/einj echo 0x20 > error_type echo 1 > error_inject #should fail within one second Let me know if you need more info. By the way this is the only machine where this works. Other Intel machines stopped producing NMIs when their BIOS was updated (before I got the chance to validate this patch on them). Cheers, Don -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 25, 2011 at 02:05:38PM +0800, Huang Ying wrote: > To gain full APEI power on these machines, a special APEI _OSC needs > to be evaluated to tell firmware that Linux has full APEI support. > This patch add the APEI _OSC support. (snip) > + static DEFINE_MUTEX(mutex); > + static int status = APEI_OSC_SETUP_UNKNOWN; > + static u8 apei_uuid_str[] = "ed855e0c-6c90-47bf-a62a-26de0fc5ad5c"; This is the WHEA UUID, right? > + u32 capbuf[3]; > + struct acpi_osc_context context = { > + .uuid_str = apei_uuid_str, > + .rev = 1, > + .cap.length = sizeof(capbuf), > + .cap.pointer = capbuf, > + }; > + > + mutex_lock(&mutex); > + if (status == APEI_OSC_SETUP_UNKNOWN) { > + capbuf[OSC_QUERY_TYPE] = OSC_QUERY_ENABLE; > + capbuf[OSC_SUPPORT_TYPE] = 0; > + capbuf[OSC_CONTROL_TYPE] = 0; > + > + if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle)) > + || ACPI_FAILURE(acpi_run_osc(handle, &context))) { > + pr_err(APEI_PFX "APEI _OSC failed!\n"); > + status = APEI_OSC_SETUP_FAILED; > + } else { > + kfree(context.ret.pointer); > + status = APEI_OSC_SETUP_SUCCEEDED; > + } > + } > + mutex_unlock(&mutex); > + > + return status == APEI_OSC_SETUP_SUCCEEDED ? 0 : -EIO; So we fail if the platform doesn't implement WHEA... > + rc = apei_osc_setup(); > + if (rc) { > + ghes_remove(ghes_dev); > + return rc; > + } > + And then tear down GHES. This seems wrong. A platform could predicate APEI functionality on the ACPI spec APEI indication (which we currently don't pass) without implementing WHEA, but with this patch we'd refuse to enable GHES support? We should probably try both the standard method and the WHEA method and only disable GHES if both fail. (Also, are there any other sideeffects of indicating that we support WHEA?)
Hi, Matthew, On 06/14/2011 10:52 PM, Matthew Garrett wrote: > On Wed, May 25, 2011 at 02:05:38PM +0800, Huang Ying wrote: > >> To gain full APEI power on these machines, a special APEI _OSC needs >> to be evaluated to tell firmware that Linux has full APEI support. >> This patch add the APEI _OSC support. > > (snip) > >> + static DEFINE_MUTEX(mutex); >> + static int status = APEI_OSC_SETUP_UNKNOWN; >> + static u8 apei_uuid_str[] = "ed855e0c-6c90-47bf-a62a-26de0fc5ad5c"; > > This is the WHEA UUID, right? Yes. >> + u32 capbuf[3]; >> + struct acpi_osc_context context = { >> + .uuid_str = apei_uuid_str, >> + .rev = 1, >> + .cap.length = sizeof(capbuf), >> + .cap.pointer = capbuf, >> + }; >> + >> + mutex_lock(&mutex); >> + if (status == APEI_OSC_SETUP_UNKNOWN) { >> + capbuf[OSC_QUERY_TYPE] = OSC_QUERY_ENABLE; >> + capbuf[OSC_SUPPORT_TYPE] = 0; >> + capbuf[OSC_CONTROL_TYPE] = 0; >> + >> + if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle)) >> + || ACPI_FAILURE(acpi_run_osc(handle, &context))) { >> + pr_err(APEI_PFX "APEI _OSC failed!\n"); >> + status = APEI_OSC_SETUP_FAILED; >> + } else { >> + kfree(context.ret.pointer); >> + status = APEI_OSC_SETUP_SUCCEEDED; >> + } >> + } >> + mutex_unlock(&mutex); >> + >> + return status == APEI_OSC_SETUP_SUCCEEDED ? 0 : -EIO; > > So we fail if the platform doesn't implement WHEA... > >> + rc = apei_osc_setup(); >> + if (rc) { >> + ghes_remove(ghes_dev); >> + return rc; >> + } >> + > > And then tear down GHES. This seems wrong. A platform could predicate > APEI functionality on the ACPI spec APEI indication (which we currently > don't pass) without implementing WHEA, but with this patch we'd refuse > to enable GHES support? We should probably try both the standard method > and the WHEA method and only disable GHES if both fail. You means the "APEI Support" bit for standard UUID? Do you know which machine uses this bit? I can write the code, but I have no machine to test it. BTW, it is better for us to enable APEI firmware first mode (that is, what is enabled by evaluating the WHEA UUID) after GHES reporting is ready (that is, after GHES module is successfully loaded). That is later than current ACPI _OSC evaluation with standard UUID. Is it possible to evaluate _OSC with standard UUID twice? So that we can enable APEI firmware first mode later. > (Also, are there any other sideeffects of indicating that we support > WHEA?) After evaluating _OSC with this UUID, firmware will produce error record to OS, otherwise only unknown NMI. Best Regards, Huang Ying -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 15, 2011 at 11:53:32AM +0800, Huang Ying wrote: > Hi, Matthew, > On 06/14/2011 10:52 PM, Matthew Garrett wrote: > > And then tear down GHES. This seems wrong. A platform could predicate > > APEI functionality on the ACPI spec APEI indication (which we currently > > don't pass) without implementing WHEA, but with this patch we'd refuse > > to enable GHES support? We should probably try both the standard method > > and the WHEA method and only disable GHES if both fail. > > You means the "APEI Support" bit for standard UUID? Do you know which > machine uses this bit? I can write the code, but I have no machine to > test it. I have access to a Dell system that uses this. > BTW, it is better for us to enable APEI firmware first mode (that is, > what is enabled by evaluating the WHEA UUID) after GHES reporting is > ready (that is, after GHES module is successfully loaded). That is > later than current ACPI _OSC evaluation with standard UUID. Is it > possible to evaluate _OSC with standard UUID twice? So that we can > enable APEI firmware first mode later. Urgh. One machine I've looked at enables APEI if the WHEA _OSC call is made, and then clears a flag if any other _OSC call is made. In that specific case it doesn't seem to matter (the flag never actually gets checked in any of the other codepaths), but it seems that the intention is for the generic call to be made and the WHEA one to be made after that. > > (Also, are there any other sideeffects of indicating that we support > > WHEA?) > > After evaluating _OSC with this UUID, firmware will produce error record > to OS, otherwise only unknown NMI. Ok, that sounds fine.
On 06/15/2011 08:17 PM, Matthew Garrett wrote: > On Wed, Jun 15, 2011 at 11:53:32AM +0800, Huang Ying wrote: >> Hi, Matthew, >> On 06/14/2011 10:52 PM, Matthew Garrett wrote: >>> And then tear down GHES. This seems wrong. A platform could predicate >>> APEI functionality on the ACPI spec APEI indication (which we currently >>> don't pass) without implementing WHEA, but with this patch we'd refuse >>> to enable GHES support? We should probably try both the standard method >>> and the WHEA method and only disable GHES if both fail. >> >> You means the "APEI Support" bit for standard UUID? Do you know which >> machine uses this bit? I can write the code, but I have no machine to >> test it. > > I have access to a Dell system that uses this. Great! Can you help us to test the code? >> BTW, it is better for us to enable APEI firmware first mode (that is, >> what is enabled by evaluating the WHEA UUID) after GHES reporting is >> ready (that is, after GHES module is successfully loaded). That is >> later than current ACPI _OSC evaluation with standard UUID. Is it >> possible to evaluate _OSC with standard UUID twice? So that we can >> enable APEI firmware first mode later. > > Urgh. One machine I've looked at enables APEI if the WHEA _OSC call is > made, and then clears a flag if any other _OSC call is made. In that > specific case it doesn't seem to matter (the flag never actually gets > checked in any of the other codepaths), but it seems that the intention > is for the generic call to be made and the WHEA one to be made after > that. Yes. The WHEA call should be made after the generic one. Another situation is as follow: - Generic _OSC call without "APEI Support" bit is called (in acpi_bus_osc_support). - After some time, when we think it is good to turn on firmware first mode fully, usually after we checking HEST and initializing corresponding module, we make generic _OSC call with "APEI Support" bit to turn on firmware first mode fully in standard way. Is it a good idea to make generic _OSC call twice, one without "APEI Support" bit, the other with "APEI Support" bit? Best Regards, Huang Ying -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jun 16, 2011 at 08:40:11AM +0800, Huang Ying wrote: > On 06/15/2011 08:17 PM, Matthew Garrett wrote: > >> You means the "APEI Support" bit for standard UUID? Do you know which > >> machine uses this bit? I can write the code, but I have no machine to > >> test it. > > > > I have access to a Dell system that uses this. > > Great! Can you help us to test the code? Yup, no problem. > > Urgh. One machine I've looked at enables APEI if the WHEA _OSC call is > > made, and then clears a flag if any other _OSC call is made. In that > > specific case it doesn't seem to matter (the flag never actually gets > > checked in any of the other codepaths), but it seems that the intention > > is for the generic call to be made and the WHEA one to be made after > > that. > > Yes. The WHEA call should be made after the generic one. Another > situation is as follow: > > - Generic _OSC call without "APEI Support" bit is called (in > acpi_bus_osc_support). > > - After some time, when we think it is good to turn on firmware first > mode fully, usually after we checking HEST and initializing > corresponding module, we make generic _OSC call with "APEI Support" bit > to turn on firmware first mode fully in standard way. > > Is it a good idea to make generic _OSC call twice, one without "APEI > Support" bit, the other with "APEI Support" bit? I think we probably need to make the HEST decision early, and use that to decide how to make the generic call. Our experience has been that many firmware vendors only expect _OSC to be called once with any given UUID - multiple calls can result in unexpected behaviour.
On 06/16/2011 09:38 AM, Matthew Garrett wrote: > On Thu, Jun 16, 2011 at 08:40:11AM +0800, Huang Ying wrote: >> On 06/15/2011 08:17 PM, Matthew Garrett wrote: >>>> You means the "APEI Support" bit for standard UUID? Do you know which >>>> machine uses this bit? I can write the code, but I have no machine to >>>> test it. >>> >>> I have access to a Dell system that uses this. >> >> Great! Can you help us to test the code? > > Yup, no problem. > >>> Urgh. One machine I've looked at enables APEI if the WHEA _OSC call is >>> made, and then clears a flag if any other _OSC call is made. In that >>> specific case it doesn't seem to matter (the flag never actually gets >>> checked in any of the other codepaths), but it seems that the intention >>> is for the generic call to be made and the WHEA one to be made after >>> that. >> >> Yes. The WHEA call should be made after the generic one. Another >> situation is as follow: >> >> - Generic _OSC call without "APEI Support" bit is called (in >> acpi_bus_osc_support). >> >> - After some time, when we think it is good to turn on firmware first >> mode fully, usually after we checking HEST and initializing >> corresponding module, we make generic _OSC call with "APEI Support" bit >> to turn on firmware first mode fully in standard way. >> >> Is it a good idea to make generic _OSC call twice, one without "APEI >> Support" bit, the other with "APEI Support" bit? > > I think we probably need to make the HEST decision early, and use that > to decide how to make the generic call. Our experience has been that > many firmware vendors only expect _OSC to be called once with any given > UUID - multiple calls can result in unexpected behaviour. acpi_bus_osc_support is called via subsys_initcall. It is a little hard to do all checking before that. Is it possible to call acpi_bus_osc_support later? Best Regards, Huang Ying -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jun 16, 2011 at 09:55:58AM +0800, Huang Ying wrote: > On 06/16/2011 09:38 AM, Matthew Garrett wrote: > > I think we probably need to make the HEST decision early, and use that > > to decide how to make the generic call. Our experience has been that > > many firmware vendors only expect _OSC to be called once with any given > > UUID - multiple calls can result in unexpected behaviour. > > acpi_bus_osc_support is called via subsys_initcall. It is a little hard > to do all checking before that. Is it possible to call > acpi_bus_osc_support later? Yeah, this is going to be a problem. We have the HEST available at this point so we ought to be able to parse it, though. I'll take a look tomorrow.
On Thursday, June 16, 2011, Matthew Garrett wrote: > On Thu, Jun 16, 2011 at 08:40:11AM +0800, Huang Ying wrote: > > On 06/15/2011 08:17 PM, Matthew Garrett wrote: > > >> You means the "APEI Support" bit for standard UUID? Do you know which > > >> machine uses this bit? I can write the code, but I have no machine to > > >> test it. > > > > > > I have access to a Dell system that uses this. > > > > Great! Can you help us to test the code? > > Yup, no problem. > > > > Urgh. One machine I've looked at enables APEI if the WHEA _OSC call is > > > made, and then clears a flag if any other _OSC call is made. In that > > > specific case it doesn't seem to matter (the flag never actually gets > > > checked in any of the other codepaths), but it seems that the intention > > > is for the generic call to be made and the WHEA one to be made after > > > that. > > > > Yes. The WHEA call should be made after the generic one. Another > > situation is as follow: > > > > - Generic _OSC call without "APEI Support" bit is called (in > > acpi_bus_osc_support). > > > > - After some time, when we think it is good to turn on firmware first > > mode fully, usually after we checking HEST and initializing > > corresponding module, we make generic _OSC call with "APEI Support" bit > > to turn on firmware first mode fully in standard way. > > > > Is it a good idea to make generic _OSC call twice, one without "APEI > > Support" bit, the other with "APEI Support" bit? > > I think we probably need to make the HEST decision early, and use that > to decide how to make the generic call. Our experience has been that > many firmware vendors only expect _OSC to be called once with any given > UUID - multiple calls can result in unexpected behaviour. Not really, as long as the "query enable" bit is set. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 06/16/2011 09:57 AM, Matthew Garrett wrote: > On Thu, Jun 16, 2011 at 09:55:58AM +0800, Huang Ying wrote: >> On 06/16/2011 09:38 AM, Matthew Garrett wrote: >>> I think we probably need to make the HEST decision early, and use that >>> to decide how to make the generic call. Our experience has been that >>> many firmware vendors only expect _OSC to be called once with any given >>> UUID - multiple calls can result in unexpected behaviour. >> >> acpi_bus_osc_support is called via subsys_initcall. It is a little hard >> to do all checking before that. Is it possible to call >> acpi_bus_osc_support later? > > Yeah, this is going to be a problem. We have the HEST available at this > point so we ought to be able to parse it, though. I'll take a look > tomorrow. We can check the HEST table before _OSC evaluating. But it is much harder to check software part, because we have implemented GHES support (Generic Hardware Error Source, the handler of firmware first mode hardware error notification) as device driver and module. So I think we can do that in 2 steps. At first, we just enable WHEA UUID, because that is easier to do. Then we find a way to implement "APEI bit" in generic _OSC call. Do you think that is a good idea? Best Regards, Huang Ying -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jun 17, 2011 at 08:57:09AM +0800, Huang Ying wrote: > On 06/16/2011 09:57 AM, Matthew Garrett wrote: > > Yeah, this is going to be a problem. We have the HEST available at this > > point so we ought to be able to parse it, though. I'll take a look > > tomorrow. > > We can check the HEST table before _OSC evaluating. But it is much > harder to check software part, because we have implemented GHES support > (Generic Hardware Error Source, the handler of firmware first mode > hardware error notification) as device driver and module. If the kernel has been configured with support for the feature then I think we ought to be able to assume that the kernel will support it at runtime. > So I think we can do that in 2 steps. At first, we just enable WHEA > UUID, because that is easier to do. Then we find a way to implement > "APEI bit" in generic _OSC call. Do you think that is a good idea? I'm fine with that, providing that GHES isn't disabled purely because the WHEA UUID call wasn't successful.
On 06/17/2011 09:34 AM, Matthew Garrett wrote: > On Fri, Jun 17, 2011 at 08:57:09AM +0800, Huang Ying wrote: >> On 06/16/2011 09:57 AM, Matthew Garrett wrote: >>> Yeah, this is going to be a problem. We have the HEST available at this >>> point so we ought to be able to parse it, though. I'll take a look >>> tomorrow. >> >> We can check the HEST table before _OSC evaluating. But it is much >> harder to check software part, because we have implemented GHES support >> (Generic Hardware Error Source, the handler of firmware first mode >> hardware error notification) as device driver and module. > > If the kernel has been configured with support for the feature then I > think we ought to be able to assume that the kernel will support it at > runtime. There may be error during driver initialization. That is what I am concerned. >> So I think we can do that in 2 steps. At first, we just enable WHEA >> UUID, because that is easier to do. Then we find a way to implement >> "APEI bit" in generic _OSC call. Do you think that is a good idea? > > I'm fine with that, providing that GHES isn't disabled purely because > the WHEA UUID call wasn't successful. Because we have not added the code to make generic _OSC call with "APEI bit" now, so if WHEA UUID call failed, we have no firmware first mode enabled. So I think it is safe to disable GHES if WHEA UUID call failed. But in another hand, keeping GHES has no harm too. So I am OK to keep GHES if WHEA UUID call failed. Best Regards, Huang Ying -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jun 17, 2011 at 09:40:17AM +0800, Huang Ying wrote: > On 06/17/2011 09:34 AM, Matthew Garrett wrote: > > If the kernel has been configured with support for the feature then I > > think we ought to be able to assume that the kernel will support it at > > runtime. > > There may be error during driver initialization. That is what I am > concerned. That's true of any _OSC functionality. > >> So I think we can do that in 2 steps. At first, we just enable WHEA > >> UUID, because that is easier to do. Then we find a way to implement > >> "APEI bit" in generic _OSC call. Do you think that is a good idea? > > > > I'm fine with that, providing that GHES isn't disabled purely because > > the WHEA UUID call wasn't successful. > > Because we have not added the code to make generic _OSC call with "APEI > bit" now, so if WHEA UUID call failed, we have no firmware first mode > enabled. So I think it is safe to disable GHES if WHEA UUID call > failed. But in another hand, keeping GHES has no harm too. So I am OK > to keep GHES if WHEA UUID call failed. I see your point. But this does need to be fixed in the long run.
On 06/17/2011 09:42 AM, Matthew Garrett wrote: > On Fri, Jun 17, 2011 at 09:40:17AM +0800, Huang Ying wrote: >> On 06/17/2011 09:34 AM, Matthew Garrett wrote: >>> If the kernel has been configured with support for the feature then I >>> think we ought to be able to assume that the kernel will support it at >>> runtime. >> >> There may be error during driver initialization. That is what I am >> concerned. > > That's true of any _OSC functionality. I see. This should be fixed in the long run. So I will add "APEI bit" support if GHES is configured. Best Regards, Huang Ying -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
--- a/drivers/acpi/apei/apei-base.c +++ b/drivers/acpi/apei/apei-base.c @@ -603,3 +603,45 @@ struct dentry *apei_get_debugfs_dir(void return dapei; } EXPORT_SYMBOL_GPL(apei_get_debugfs_dir); + +enum { + APEI_OSC_SETUP_UNKNOWN, + APEI_OSC_SETUP_FAILED, + APEI_OSC_SETUP_SUCCEEDED, +}; + +int apei_osc_setup(void) +{ + /* Prevent _OSC to be evaluated simultaneously */ + static DEFINE_MUTEX(mutex); + static int status = APEI_OSC_SETUP_UNKNOWN; + static u8 apei_uuid_str[] = "ed855e0c-6c90-47bf-a62a-26de0fc5ad5c"; + acpi_handle handle; + u32 capbuf[3]; + struct acpi_osc_context context = { + .uuid_str = apei_uuid_str, + .rev = 1, + .cap.length = sizeof(capbuf), + .cap.pointer = capbuf, + }; + + mutex_lock(&mutex); + if (status == APEI_OSC_SETUP_UNKNOWN) { + capbuf[OSC_QUERY_TYPE] = OSC_QUERY_ENABLE; + capbuf[OSC_SUPPORT_TYPE] = 0; + capbuf[OSC_CONTROL_TYPE] = 0; + + if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle)) + || ACPI_FAILURE(acpi_run_osc(handle, &context))) { + pr_err(APEI_PFX "APEI _OSC failed!\n"); + status = APEI_OSC_SETUP_FAILED; + } else { + kfree(context.ret.pointer); + status = APEI_OSC_SETUP_SUCCEEDED; + } + } + mutex_unlock(&mutex); + + return status == APEI_OSC_SETUP_SUCCEEDED ? 0 : -EIO; +} +EXPORT_SYMBOL_GPL(apei_osc_setup); --- a/drivers/acpi/apei/apei-internal.h +++ b/drivers/acpi/apei/apei-internal.h @@ -113,4 +113,6 @@ void apei_estatus_print(const char *pfx, const struct acpi_hest_generic_status *estatus); int apei_estatus_check_header(const struct acpi_hest_generic_status *estatus); int apei_estatus_check(const struct acpi_hest_generic_status *estatus); + +int apei_osc_setup(void); #endif --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -687,6 +687,8 @@ static unsigned long ghes_esource_preall return prealloc_size; } +static int ghes_remove(struct platform_device *ghes_dev); + static int __devinit ghes_probe(struct platform_device *ghes_dev) { struct acpi_hest_generic *generic; @@ -770,6 +772,12 @@ static int __devinit ghes_probe(struct p } platform_set_drvdata(ghes_dev, ghes); + rc = apei_osc_setup(); + if (rc) { + ghes_remove(ghes_dev); + return rc; + } + return 0; err: if (ghes) {
In APEI firmware first mode, hardware error is reported by hardware to firmware firstly, then firmware reports the error to Linux in a GHES error record via POLL/SCI/IRQ/NMI etc. This may result in some issues if OS has no full APEI support. So some firmware implementation will work in a back-compatible mode by default. Where firmware will only notify OS in old-fashion, without GHES record. For example, for a fatal hardware error, only NMI is signaled, no GHES record. To gain full APEI power on these machines, a special APEI _OSC needs to be evaluated to tell firmware that Linux has full APEI support. This patch add the APEI _OSC support. Signed-off-by: Huang Ying <ying.huang@intel.com> --- drivers/acpi/apei/apei-base.c | 42 ++++++++++++++++++++++++++++++++++++++ drivers/acpi/apei/apei-internal.h | 2 + drivers/acpi/apei/ghes.c | 8 +++++++ 3 files changed, 52 insertions(+) -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html