diff mbox

ACPI, APEI, Add APEI _OSC support

Message ID 1306303538-30524-1-git-send-email-ying.huang@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Huang, Ying May 25, 2011, 6:05 a.m. UTC
In APEI firmware first mode, hardware error is reported by hardware to
firmware firstly, then firmware reports the error to Linux in a GHES
error record via POLL/SCI/IRQ/NMI etc.

This may result in some issues if OS has no full APEI support.  So
some firmware implementation will work in a back-compatible mode by
default.  Where firmware will only notify OS in old-fashion, without
GHES record.  For example, for a fatal hardware error, only NMI is
signaled, no GHES record.

To gain full APEI power on these machines, a special APEI _OSC needs
to be evaluated to tell firmware that Linux has full APEI support.
This patch add the APEI _OSC support.

Signed-off-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/apei/apei-base.c     |   42 ++++++++++++++++++++++++++++++++++++++
 drivers/acpi/apei/apei-internal.h |    2 +
 drivers/acpi/apei/ghes.c          |    8 +++++++
 3 files changed, 52 insertions(+)

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Don Zickus June 13, 2011, 2:50 p.m. UTC | #1
On Wed, May 25, 2011 at 02:05:38PM +0800, Huang Ying wrote:
> In APEI firmware first mode, hardware error is reported by hardware to
> firmware firstly, then firmware reports the error to Linux in a GHES
> error record via POLL/SCI/IRQ/NMI etc.
> 
> This may result in some issues if OS has no full APEI support.  So
> some firmware implementation will work in a back-compatible mode by
> default.  Where firmware will only notify OS in old-fashion, without
> GHES record.  For example, for a fatal hardware error, only NMI is
> signaled, no GHES record.
> 
> To gain full APEI power on these machines, a special APEI _OSC needs
> to be evaluated to tell firmware that Linux has full APEI support.
> This patch add the APEI _OSC support.

Using an Intel box I have over at RedHat, I was able to use this patch to
get error injection (EINJ) to provide me a GHES record.  Prior to this 
patch I would just get unknown NMIs.

Talking with Matthew Garret, I guess it seems that uuids like this are
typical for ACPI.

I can't speak for all the ACPI parts, but the patch looks simple and
corret from my perspective.

Reviewed-and-Tested-by: Don Zickus <dzickus@redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chen Gong June 14, 2011, 6:33 a.m. UTC | #2
? 6/13/2011 10:50 PM, Don Zickus ??:
> On Wed, May 25, 2011 at 02:05:38PM +0800, Huang Ying wrote:
>> In APEI firmware first mode, hardware error is reported by hardware to
>> firmware firstly, then firmware reports the error to Linux in a GHES
>> error record via POLL/SCI/IRQ/NMI etc.
>>
>> This may result in some issues if OS has no full APEI support.  So
>> some firmware implementation will work in a back-compatible mode by
>> default.  Where firmware will only notify OS in old-fashion, without
>> GHES record.  For example, for a fatal hardware error, only NMI is
>> signaled, no GHES record.
>>
>> To gain full APEI power on these machines, a special APEI _OSC needs
>> to be evaluated to tell firmware that Linux has full APEI support.
>> This patch add the APEI _OSC support.
>
> Using an Intel box I have over at RedHat, I was able to use this patch to
> get error injection (EINJ) to provide me a GHES record.  Prior to this
> patch I would just get unknown NMIs.
>
> Talking with Matthew Garret, I guess it seems that uuids like this are
> typical for ACPI.
>
> I can't speak for all the ACPI parts, but the patch looks simple and
> corret from my perspective.
>
Interesting, but I can't reproduce it on our machines by now. Would you
please share your test information such as machine type, BIOS
version, and test method (such as what type error you inject) ?
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Don Zickus June 14, 2011, 12:11 p.m. UTC | #3
On Tue, Jun 14, 2011 at 02:33:19PM +0800, Chen Gong wrote:
> >Using an Intel box I have over at RedHat, I was able to use this patch to
> >get error injection (EINJ) to provide me a GHES record.  Prior to this
> >patch I would just get unknown NMIs.
> >
> >Talking with Matthew Garret, I guess it seems that uuids like this are
> >typical for ACPI.
> >
> >I can't speak for all the ACPI parts, but the patch looks simple and
> >corret from my perspective.
> >
> Interesting, but I can't reproduce it on our machines by now. Would you
> please share your test information such as machine type, BIOS
> version, and test method (such as what type error you inject) ?

Sure,

#cat /proc/cpuinfo

processor       : 23
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Intel(R) Xeon(R) CPU           X5670  @ 2.93GHz
stepping        : 2
cpu MHz         : 1596.000

#dmidecode

Handle 0x0002, DMI type 1, 27 bytes
System Information
        Manufacturer: Intel Corporation
        Product Name: S5520UR
        Version: ....................
        Serial Number: ............
        UUID: 57890342-6DA6-11DF-BD83-001517EE9AE8
        Wake-up Type: AC Power Restored
        SKU Number: Not Specified
        Family: Not Specified

Handle 0x0003, DMI type 2, 16 bytes
Base Board Information
        Manufacturer: Intel Corporation
        Product Name: S5520UR
        Version: E81084-751
        Serial Number: BZUB02211743
        Asset Tag: ....................
        Features:
                Board is a hosting board
                Board is replaceable
        Location In Chassis: Not Specified
        Chassis Handle: 0x0004
        Type: Motherboard
        Contained Object Handles: 0

Handle 0x0005, DMI type 0, 24 bytes
BIOS Information
        Vendor: Intel Corp.
        Version: S5500.86B.01.00.0055.112620101923
        Release Date: 11/26/2010
        Address: 0xF0000
        Runtime Size: 64 kB
        ROM Size: 8192 kB
        Characteristics:
                PCI is supported
                PNP is supported
                BIOS is upgradeable
                BIOS shadowing is allowed
                Boot from CD is supported
                Selectable boot is supported
                EDD is supported
                3.5"/2.88 MB floppy services are supported (int 13h)
                Print screen service is supported (int 5h)
                8042 keyboard services are supported (int 9h)
:
                CGA/mono video services are supported (int 10h)
                ACPI is supported
                USB legacy is supported
                LS-120 boot is supported
                ATAPI Zip drive boot is supported
                Function key-initiated network boot is supported
                Targeted content distribution is supported
        BIOS Revision: 17.18
        Firmware Revision: 0.0

#instructions
mount -t debugfs debugfs /sys/kernel/debug
cd /sys/kernel/debug/apei/einj
echo 0x20 > error_type
echo 1 > error_inject
#should fail within one second

Let me know if you need more info.
By the way this is the only machine where this works.  Other Intel
machines stopped producing NMIs when their BIOS was updated (before I got
the chance to validate this patch on them).

Cheers,
Don

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matthew Garrett June 14, 2011, 2:52 p.m. UTC | #4
On Wed, May 25, 2011 at 02:05:38PM +0800, Huang Ying wrote:

> To gain full APEI power on these machines, a special APEI _OSC needs
> to be evaluated to tell firmware that Linux has full APEI support.
> This patch add the APEI _OSC support.

(snip)

> +	static DEFINE_MUTEX(mutex);
> +	static int status = APEI_OSC_SETUP_UNKNOWN;
> +	static u8 apei_uuid_str[] = "ed855e0c-6c90-47bf-a62a-26de0fc5ad5c";

This is the WHEA UUID, right? 

> +	u32 capbuf[3];
> +	struct acpi_osc_context context = {
> +		.uuid_str	= apei_uuid_str,
> +		.rev		= 1,
> +		.cap.length	= sizeof(capbuf),
> +		.cap.pointer	= capbuf,
> +	};
> +
> +	mutex_lock(&mutex);
> +	if (status == APEI_OSC_SETUP_UNKNOWN) {
> +		capbuf[OSC_QUERY_TYPE] = OSC_QUERY_ENABLE;
> +		capbuf[OSC_SUPPORT_TYPE] = 0;
> +		capbuf[OSC_CONTROL_TYPE] = 0;
> +
> +		if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle))
> +		    || ACPI_FAILURE(acpi_run_osc(handle, &context))) {
> +			pr_err(APEI_PFX "APEI _OSC failed!\n");
> +			status = APEI_OSC_SETUP_FAILED;
> +		} else {
> +			kfree(context.ret.pointer);
> +			status = APEI_OSC_SETUP_SUCCEEDED;
> +		}
> +	}
> +	mutex_unlock(&mutex);
> +
> +	return status == APEI_OSC_SETUP_SUCCEEDED ? 0 : -EIO;

So we fail if the platform doesn't implement WHEA...

> +	rc = apei_osc_setup();
> +	if (rc) {
> +		ghes_remove(ghes_dev);
> +		return rc;
> +	}
> +

And then tear down GHES. This seems wrong. A platform could predicate 
APEI functionality on the ACPI spec APEI indication (which we currently 
don't pass) without implementing WHEA, but with this patch we'd refuse 
to enable GHES support? We should probably try both the standard method 
and the WHEA method and only disable GHES if both fail.

(Also, are there any other sideeffects of indicating that we support 
WHEA?)
Huang, Ying June 15, 2011, 3:53 a.m. UTC | #5
Hi, Matthew,

On 06/14/2011 10:52 PM, Matthew Garrett wrote:
> On Wed, May 25, 2011 at 02:05:38PM +0800, Huang Ying wrote:
> 
>> To gain full APEI power on these machines, a special APEI _OSC needs
>> to be evaluated to tell firmware that Linux has full APEI support.
>> This patch add the APEI _OSC support.
> 
> (snip)
> 
>> +	static DEFINE_MUTEX(mutex);
>> +	static int status = APEI_OSC_SETUP_UNKNOWN;
>> +	static u8 apei_uuid_str[] = "ed855e0c-6c90-47bf-a62a-26de0fc5ad5c";
> 
> This is the WHEA UUID, right? 

Yes.

>> +	u32 capbuf[3];
>> +	struct acpi_osc_context context = {
>> +		.uuid_str	= apei_uuid_str,
>> +		.rev		= 1,
>> +		.cap.length	= sizeof(capbuf),
>> +		.cap.pointer	= capbuf,
>> +	};
>> +
>> +	mutex_lock(&mutex);
>> +	if (status == APEI_OSC_SETUP_UNKNOWN) {
>> +		capbuf[OSC_QUERY_TYPE] = OSC_QUERY_ENABLE;
>> +		capbuf[OSC_SUPPORT_TYPE] = 0;
>> +		capbuf[OSC_CONTROL_TYPE] = 0;
>> +
>> +		if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle))
>> +		    || ACPI_FAILURE(acpi_run_osc(handle, &context))) {
>> +			pr_err(APEI_PFX "APEI _OSC failed!\n");
>> +			status = APEI_OSC_SETUP_FAILED;
>> +		} else {
>> +			kfree(context.ret.pointer);
>> +			status = APEI_OSC_SETUP_SUCCEEDED;
>> +		}
>> +	}
>> +	mutex_unlock(&mutex);
>> +
>> +	return status == APEI_OSC_SETUP_SUCCEEDED ? 0 : -EIO;
> 
> So we fail if the platform doesn't implement WHEA...
> 
>> +	rc = apei_osc_setup();
>> +	if (rc) {
>> +		ghes_remove(ghes_dev);
>> +		return rc;
>> +	}
>> +
> 
> And then tear down GHES. This seems wrong. A platform could predicate 
> APEI functionality on the ACPI spec APEI indication (which we currently 
> don't pass) without implementing WHEA, but with this patch we'd refuse 
> to enable GHES support? We should probably try both the standard method 
> and the WHEA method and only disable GHES if both fail.

You means the "APEI Support" bit for standard UUID?  Do you know which
machine uses this bit?  I can write the code, but I have no machine to
test it.

BTW, it is better for us to enable APEI firmware first mode (that is,
what is enabled by evaluating the WHEA UUID) after GHES reporting is
ready (that is, after GHES module is successfully loaded).  That is
later than current ACPI _OSC evaluation with standard UUID.  Is it
possible to evaluate _OSC with standard UUID twice?  So that we can
enable APEI firmware first mode later.

> (Also, are there any other sideeffects of indicating that we support 
> WHEA?)

After evaluating _OSC with this UUID, firmware will produce error record
to OS, otherwise only unknown NMI.

Best Regards,
Huang Ying
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matthew Garrett June 15, 2011, 12:17 p.m. UTC | #6
On Wed, Jun 15, 2011 at 11:53:32AM +0800, Huang Ying wrote:
> Hi, Matthew,
> On 06/14/2011 10:52 PM, Matthew Garrett wrote:
> > And then tear down GHES. This seems wrong. A platform could predicate 
> > APEI functionality on the ACPI spec APEI indication (which we currently 
> > don't pass) without implementing WHEA, but with this patch we'd refuse 
> > to enable GHES support? We should probably try both the standard method 
> > and the WHEA method and only disable GHES if both fail.
> 
> You means the "APEI Support" bit for standard UUID?  Do you know which
> machine uses this bit?  I can write the code, but I have no machine to
> test it.

I have access to a Dell system that uses this.

> BTW, it is better for us to enable APEI firmware first mode (that is,
> what is enabled by evaluating the WHEA UUID) after GHES reporting is
> ready (that is, after GHES module is successfully loaded).  That is
> later than current ACPI _OSC evaluation with standard UUID.  Is it
> possible to evaluate _OSC with standard UUID twice?  So that we can
> enable APEI firmware first mode later.

Urgh. One machine I've looked at enables APEI if the WHEA _OSC call is 
made, and then clears a flag if any other _OSC call is made. In that 
specific case it doesn't seem to matter (the flag never actually gets 
checked in any of the other codepaths), but it seems that the intention 
is for the generic call to be made and the WHEA one to be made after 
that.

> > (Also, are there any other sideeffects of indicating that we support 
> > WHEA?)
> 
> After evaluating _OSC with this UUID, firmware will produce error record
> to OS, otherwise only unknown NMI.

Ok, that sounds fine.
Huang, Ying June 16, 2011, 12:40 a.m. UTC | #7
On 06/15/2011 08:17 PM, Matthew Garrett wrote:
> On Wed, Jun 15, 2011 at 11:53:32AM +0800, Huang Ying wrote:
>> Hi, Matthew,
>> On 06/14/2011 10:52 PM, Matthew Garrett wrote:
>>> And then tear down GHES. This seems wrong. A platform could predicate 
>>> APEI functionality on the ACPI spec APEI indication (which we currently 
>>> don't pass) without implementing WHEA, but with this patch we'd refuse 
>>> to enable GHES support? We should probably try both the standard method 
>>> and the WHEA method and only disable GHES if both fail.
>>
>> You means the "APEI Support" bit for standard UUID?  Do you know which
>> machine uses this bit?  I can write the code, but I have no machine to
>> test it.
> 
> I have access to a Dell system that uses this.

Great! Can you help us to test the code?

>> BTW, it is better for us to enable APEI firmware first mode (that is,
>> what is enabled by evaluating the WHEA UUID) after GHES reporting is
>> ready (that is, after GHES module is successfully loaded).  That is
>> later than current ACPI _OSC evaluation with standard UUID.  Is it
>> possible to evaluate _OSC with standard UUID twice?  So that we can
>> enable APEI firmware first mode later.
> 
> Urgh. One machine I've looked at enables APEI if the WHEA _OSC call is 
> made, and then clears a flag if any other _OSC call is made. In that 
> specific case it doesn't seem to matter (the flag never actually gets 
> checked in any of the other codepaths), but it seems that the intention 
> is for the generic call to be made and the WHEA one to be made after 
> that.

Yes.  The WHEA call should be made after the generic one.  Another
situation is as follow:

- Generic _OSC call without "APEI Support" bit is called (in
acpi_bus_osc_support).

- After some time, when we think it is good to turn on firmware first
mode fully, usually after we checking HEST and initializing
corresponding module, we make generic _OSC call with "APEI Support" bit
to turn on firmware first mode fully in standard way.

Is it a good idea to make generic _OSC call twice, one without "APEI
Support" bit, the other with "APEI Support" bit?

Best Regards,
Huang Ying
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matthew Garrett June 16, 2011, 1:38 a.m. UTC | #8
On Thu, Jun 16, 2011 at 08:40:11AM +0800, Huang Ying wrote:
> On 06/15/2011 08:17 PM, Matthew Garrett wrote:
> >> You means the "APEI Support" bit for standard UUID?  Do you know which
> >> machine uses this bit?  I can write the code, but I have no machine to
> >> test it.
> > 
> > I have access to a Dell system that uses this.
> 
> Great! Can you help us to test the code?

Yup, no problem.

> > Urgh. One machine I've looked at enables APEI if the WHEA _OSC call is 
> > made, and then clears a flag if any other _OSC call is made. In that 
> > specific case it doesn't seem to matter (the flag never actually gets 
> > checked in any of the other codepaths), but it seems that the intention 
> > is for the generic call to be made and the WHEA one to be made after 
> > that.
> 
> Yes.  The WHEA call should be made after the generic one.  Another
> situation is as follow:
> 
> - Generic _OSC call without "APEI Support" bit is called (in
> acpi_bus_osc_support).
> 
> - After some time, when we think it is good to turn on firmware first
> mode fully, usually after we checking HEST and initializing
> corresponding module, we make generic _OSC call with "APEI Support" bit
> to turn on firmware first mode fully in standard way.
> 
> Is it a good idea to make generic _OSC call twice, one without "APEI
> Support" bit, the other with "APEI Support" bit?

I think we probably need to make the HEST decision early, and use that 
to decide how to make the generic call. Our experience has been that 
many firmware vendors only expect _OSC to be called once with any given 
UUID - multiple calls can result in unexpected behaviour.
Huang, Ying June 16, 2011, 1:55 a.m. UTC | #9
On 06/16/2011 09:38 AM, Matthew Garrett wrote:
> On Thu, Jun 16, 2011 at 08:40:11AM +0800, Huang Ying wrote:
>> On 06/15/2011 08:17 PM, Matthew Garrett wrote:
>>>> You means the "APEI Support" bit for standard UUID?  Do you know which
>>>> machine uses this bit?  I can write the code, but I have no machine to
>>>> test it.
>>>
>>> I have access to a Dell system that uses this.
>>
>> Great! Can you help us to test the code?
> 
> Yup, no problem.
> 
>>> Urgh. One machine I've looked at enables APEI if the WHEA _OSC call is 
>>> made, and then clears a flag if any other _OSC call is made. In that 
>>> specific case it doesn't seem to matter (the flag never actually gets 
>>> checked in any of the other codepaths), but it seems that the intention 
>>> is for the generic call to be made and the WHEA one to be made after 
>>> that.
>>
>> Yes.  The WHEA call should be made after the generic one.  Another
>> situation is as follow:
>>
>> - Generic _OSC call without "APEI Support" bit is called (in
>> acpi_bus_osc_support).
>>
>> - After some time, when we think it is good to turn on firmware first
>> mode fully, usually after we checking HEST and initializing
>> corresponding module, we make generic _OSC call with "APEI Support" bit
>> to turn on firmware first mode fully in standard way.
>>
>> Is it a good idea to make generic _OSC call twice, one without "APEI
>> Support" bit, the other with "APEI Support" bit?
> 
> I think we probably need to make the HEST decision early, and use that 
> to decide how to make the generic call. Our experience has been that 
> many firmware vendors only expect _OSC to be called once with any given 
> UUID - multiple calls can result in unexpected behaviour.

acpi_bus_osc_support is called via subsys_initcall.  It is a little hard
to do all checking before that.  Is it possible to call
acpi_bus_osc_support later?

Best Regards,
Huang Ying
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matthew Garrett June 16, 2011, 1:57 a.m. UTC | #10
On Thu, Jun 16, 2011 at 09:55:58AM +0800, Huang Ying wrote:
> On 06/16/2011 09:38 AM, Matthew Garrett wrote:
> > I think we probably need to make the HEST decision early, and use that 
> > to decide how to make the generic call. Our experience has been that 
> > many firmware vendors only expect _OSC to be called once with any given 
> > UUID - multiple calls can result in unexpected behaviour.
> 
> acpi_bus_osc_support is called via subsys_initcall.  It is a little hard
> to do all checking before that.  Is it possible to call
> acpi_bus_osc_support later?

Yeah, this is going to be a problem. We have the HEST available at this 
point so we ought to be able to parse it, though. I'll take a look 
tomorrow.
Rafael Wysocki June 16, 2011, 9 a.m. UTC | #11
On Thursday, June 16, 2011, Matthew Garrett wrote:
> On Thu, Jun 16, 2011 at 08:40:11AM +0800, Huang Ying wrote:
> > On 06/15/2011 08:17 PM, Matthew Garrett wrote:
> > >> You means the "APEI Support" bit for standard UUID?  Do you know which
> > >> machine uses this bit?  I can write the code, but I have no machine to
> > >> test it.
> > > 
> > > I have access to a Dell system that uses this.
> > 
> > Great! Can you help us to test the code?
> 
> Yup, no problem.
> 
> > > Urgh. One machine I've looked at enables APEI if the WHEA _OSC call is 
> > > made, and then clears a flag if any other _OSC call is made. In that 
> > > specific case it doesn't seem to matter (the flag never actually gets 
> > > checked in any of the other codepaths), but it seems that the intention 
> > > is for the generic call to be made and the WHEA one to be made after 
> > > that.
> > 
> > Yes.  The WHEA call should be made after the generic one.  Another
> > situation is as follow:
> > 
> > - Generic _OSC call without "APEI Support" bit is called (in
> > acpi_bus_osc_support).
> > 
> > - After some time, when we think it is good to turn on firmware first
> > mode fully, usually after we checking HEST and initializing
> > corresponding module, we make generic _OSC call with "APEI Support" bit
> > to turn on firmware first mode fully in standard way.
> > 
> > Is it a good idea to make generic _OSC call twice, one without "APEI
> > Support" bit, the other with "APEI Support" bit?
> 
> I think we probably need to make the HEST decision early, and use that 
> to decide how to make the generic call. Our experience has been that 
> many firmware vendors only expect _OSC to be called once with any given 
> UUID - multiple calls can result in unexpected behaviour.

Not really, as long as the "query enable" bit is set.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Huang, Ying June 17, 2011, 12:57 a.m. UTC | #12
On 06/16/2011 09:57 AM, Matthew Garrett wrote:
> On Thu, Jun 16, 2011 at 09:55:58AM +0800, Huang Ying wrote:
>> On 06/16/2011 09:38 AM, Matthew Garrett wrote:
>>> I think we probably need to make the HEST decision early, and use that 
>>> to decide how to make the generic call. Our experience has been that 
>>> many firmware vendors only expect _OSC to be called once with any given 
>>> UUID - multiple calls can result in unexpected behaviour.
>>
>> acpi_bus_osc_support is called via subsys_initcall.  It is a little hard
>> to do all checking before that.  Is it possible to call
>> acpi_bus_osc_support later?
> 
> Yeah, this is going to be a problem. We have the HEST available at this 
> point so we ought to be able to parse it, though. I'll take a look 
> tomorrow.

We can check the HEST table before _OSC evaluating.  But it is much
harder to check software part, because we have implemented GHES support
(Generic Hardware Error Source, the handler of firmware first mode
hardware error notification) as device driver and module.

So I think we can do that in 2 steps.  At first, we just enable WHEA
UUID, because that is easier to do.  Then we find a way to implement
"APEI bit" in generic _OSC call.  Do you think that is a good idea?

Best Regards,
Huang Ying
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matthew Garrett June 17, 2011, 1:34 a.m. UTC | #13
On Fri, Jun 17, 2011 at 08:57:09AM +0800, Huang Ying wrote:
> On 06/16/2011 09:57 AM, Matthew Garrett wrote:
> > Yeah, this is going to be a problem. We have the HEST available at this 
> > point so we ought to be able to parse it, though. I'll take a look 
> > tomorrow.
> 
> We can check the HEST table before _OSC evaluating.  But it is much
> harder to check software part, because we have implemented GHES support
> (Generic Hardware Error Source, the handler of firmware first mode
> hardware error notification) as device driver and module.

If the kernel has been configured with support for the feature then I 
think we ought to be able to assume that the kernel will support it at 
runtime.

> So I think we can do that in 2 steps.  At first, we just enable WHEA
> UUID, because that is easier to do.  Then we find a way to implement
> "APEI bit" in generic _OSC call.  Do you think that is a good idea?

I'm fine with that, providing that GHES isn't disabled purely because 
the WHEA UUID call wasn't successful.
Huang, Ying June 17, 2011, 1:40 a.m. UTC | #14
On 06/17/2011 09:34 AM, Matthew Garrett wrote:
> On Fri, Jun 17, 2011 at 08:57:09AM +0800, Huang Ying wrote:
>> On 06/16/2011 09:57 AM, Matthew Garrett wrote:
>>> Yeah, this is going to be a problem. We have the HEST available at this 
>>> point so we ought to be able to parse it, though. I'll take a look 
>>> tomorrow.
>>
>> We can check the HEST table before _OSC evaluating.  But it is much
>> harder to check software part, because we have implemented GHES support
>> (Generic Hardware Error Source, the handler of firmware first mode
>> hardware error notification) as device driver and module.
> 
> If the kernel has been configured with support for the feature then I 
> think we ought to be able to assume that the kernel will support it at 
> runtime.

There may be error during driver initialization.  That is what I am
concerned.

>> So I think we can do that in 2 steps.  At first, we just enable WHEA
>> UUID, because that is easier to do.  Then we find a way to implement
>> "APEI bit" in generic _OSC call.  Do you think that is a good idea?
> 
> I'm fine with that, providing that GHES isn't disabled purely because 
> the WHEA UUID call wasn't successful.

Because we have not added the code to make generic _OSC call with "APEI
bit" now, so if WHEA UUID call failed, we have no firmware first mode
enabled.  So I think it is safe to disable GHES if WHEA UUID call
failed.  But in another hand, keeping GHES has no harm too.  So I am OK
to keep GHES if WHEA UUID call failed.

Best Regards,
Huang Ying
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matthew Garrett June 17, 2011, 1:42 a.m. UTC | #15
On Fri, Jun 17, 2011 at 09:40:17AM +0800, Huang Ying wrote:
> On 06/17/2011 09:34 AM, Matthew Garrett wrote:
> > If the kernel has been configured with support for the feature then I 
> > think we ought to be able to assume that the kernel will support it at 
> > runtime.
> 
> There may be error during driver initialization.  That is what I am
> concerned.

That's true of any _OSC functionality.

> >> So I think we can do that in 2 steps.  At first, we just enable WHEA
> >> UUID, because that is easier to do.  Then we find a way to implement
> >> "APEI bit" in generic _OSC call.  Do you think that is a good idea?
> > 
> > I'm fine with that, providing that GHES isn't disabled purely because 
> > the WHEA UUID call wasn't successful.
> 
> Because we have not added the code to make generic _OSC call with "APEI
> bit" now, so if WHEA UUID call failed, we have no firmware first mode
> enabled.  So I think it is safe to disable GHES if WHEA UUID call
> failed.  But in another hand, keeping GHES has no harm too.  So I am OK
> to keep GHES if WHEA UUID call failed.

I see your point. But this does need to be fixed in the long run.
Huang, Ying June 17, 2011, 1:53 a.m. UTC | #16
On 06/17/2011 09:42 AM, Matthew Garrett wrote:
> On Fri, Jun 17, 2011 at 09:40:17AM +0800, Huang Ying wrote:
>> On 06/17/2011 09:34 AM, Matthew Garrett wrote:
>>> If the kernel has been configured with support for the feature then I 
>>> think we ought to be able to assume that the kernel will support it at 
>>> runtime.
>>
>> There may be error during driver initialization.  That is what I am
>> concerned.
> 
> That's true of any _OSC functionality.

I see.  This should be fixed in the long run.  So I will add "APEI bit"
support if GHES is configured.

Best Regards,
Huang Ying
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- a/drivers/acpi/apei/apei-base.c
+++ b/drivers/acpi/apei/apei-base.c
@@ -603,3 +603,45 @@  struct dentry *apei_get_debugfs_dir(void
 	return dapei;
 }
 EXPORT_SYMBOL_GPL(apei_get_debugfs_dir);
+
+enum {
+	APEI_OSC_SETUP_UNKNOWN,
+	APEI_OSC_SETUP_FAILED,
+	APEI_OSC_SETUP_SUCCEEDED,
+};
+
+int apei_osc_setup(void)
+{
+	/* Prevent _OSC to be evaluated simultaneously */
+	static DEFINE_MUTEX(mutex);
+	static int status = APEI_OSC_SETUP_UNKNOWN;
+	static u8 apei_uuid_str[] = "ed855e0c-6c90-47bf-a62a-26de0fc5ad5c";
+	acpi_handle handle;
+	u32 capbuf[3];
+	struct acpi_osc_context context = {
+		.uuid_str	= apei_uuid_str,
+		.rev		= 1,
+		.cap.length	= sizeof(capbuf),
+		.cap.pointer	= capbuf,
+	};
+
+	mutex_lock(&mutex);
+	if (status == APEI_OSC_SETUP_UNKNOWN) {
+		capbuf[OSC_QUERY_TYPE] = OSC_QUERY_ENABLE;
+		capbuf[OSC_SUPPORT_TYPE] = 0;
+		capbuf[OSC_CONTROL_TYPE] = 0;
+
+		if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle))
+		    || ACPI_FAILURE(acpi_run_osc(handle, &context))) {
+			pr_err(APEI_PFX "APEI _OSC failed!\n");
+			status = APEI_OSC_SETUP_FAILED;
+		} else {
+			kfree(context.ret.pointer);
+			status = APEI_OSC_SETUP_SUCCEEDED;
+		}
+	}
+	mutex_unlock(&mutex);
+
+	return status == APEI_OSC_SETUP_SUCCEEDED ? 0 : -EIO;
+}
+EXPORT_SYMBOL_GPL(apei_osc_setup);
--- a/drivers/acpi/apei/apei-internal.h
+++ b/drivers/acpi/apei/apei-internal.h
@@ -113,4 +113,6 @@  void apei_estatus_print(const char *pfx,
 			const struct acpi_hest_generic_status *estatus);
 int apei_estatus_check_header(const struct acpi_hest_generic_status *estatus);
 int apei_estatus_check(const struct acpi_hest_generic_status *estatus);
+
+int apei_osc_setup(void);
 #endif
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -687,6 +687,8 @@  static unsigned long ghes_esource_preall
 	return prealloc_size;
 }
 
+static int ghes_remove(struct platform_device *ghes_dev);
+
 static int __devinit ghes_probe(struct platform_device *ghes_dev)
 {
 	struct acpi_hest_generic *generic;
@@ -770,6 +772,12 @@  static int __devinit ghes_probe(struct p
 	}
 	platform_set_drvdata(ghes_dev, ghes);
 
+	rc = apei_osc_setup();
+	if (rc) {
+		ghes_remove(ghes_dev);
+		return rc;
+	}
+
 	return 0;
 err:
 	if (ghes) {