diff mbox

APEI: Can not request iomem region for GARs

Message ID 4E571D66.1070302@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Huang, Ying Aug. 26, 2011, 4:13 a.m. UTC
Hi, Pavel,

Sorry,  there is a minor issue in the patch.  Do you have time to try
the updated version attached.

Best Regards,
Haung Ying

On 08/25/2011 10:45 AM, Pavel Ivanov wrote:
> Huang,
> 
> Applying of your patches didn't help. I'm attaching new dmesg in case
> if you need it.
> 
> 
> Pavel
> 
> 
> On Mon, Aug 22, 2011 at 3:12 AM, Huang Ying <ying.huang@intel.com> wrote:
>> Hi, Pavel,
>>
>> Do you have time to try the patch attached with the mail?
>> acpi_nvs.patch should go first.
>>
>> Best Regards,
>> Huang Ying
>>
>>

Comments

Bjorn Helgaas Aug. 26, 2011, 1:43 p.m. UTC | #1
On Thu, Aug 25, 2011 at 10:13 PM, Huang Ying <ying.huang@intel.com> wrote:
> Hi, Pavel,
>
> Sorry,  there is a minor issue in the patch.  Do you have time to try
> the updated version attached.

I think Yinghai's patch is a better approach (though it needs a changelog).

The ACPI NVS space should not be marked as "busy" by the e820 code in
the iomem_resource tree.

It's way too complicated to mess around with registering NVS space and
trying to deal with it specially in APEI.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Huang, Ying Aug. 29, 2011, 1:27 a.m. UTC | #2
On 08/26/2011 09:43 PM, Bjorn Helgaas wrote:
> On Thu, Aug 25, 2011 at 10:13 PM, Huang Ying <ying.huang@intel.com> wrote:
>> Hi, Pavel,
>>
>> Sorry,  there is a minor issue in the patch.  Do you have time to try
>> the updated version attached.
> 
> I think Yinghai's patch is a better approach (though it needs a changelog).
> 
> The ACPI NVS space should not be marked as "busy" by the e820 code in
> the iomem_resource tree.
> 
> It's way too complicated to mess around with registering NVS space and
> trying to deal with it specially in APEI.

ACPI NVS is different, it can be used only by firmware, and its
interpreter, such as ACPI AML interpreter and APEI interpreter.  It can
not be used by ordinary driver.  If my understanding is correct, this is
why ACPI NVS is marked as busy.

Best Regards,
Huang Ying
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Huang, Ying Aug. 29, 2011, 5:37 a.m. UTC | #3
On 08/27/2011 11:28 AM, Pavel Ivanov wrote:
> Huang,
> 
> I can confirm that new patch got rid of the old error message but
> introduced a new one:
> 
> ERST: Failed to get Error Log Address Range
> 
> Also I noticed there's another suspicious message in dmesg and it was
> there earlier too (although it's not shown on console):
> 
> GHES: Failed to enable APEI firmware first mode
> 
> Is it bad? What does it mean?

Maybe this has been fixed, please this patch,

https://lkml.org/lkml/2011/8/11/297

Best Regards,
Huang Ying
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Aug. 29, 2011, 2:48 p.m. UTC | #4
On Sun, Aug 28, 2011 at 7:27 PM, Huang Ying <ying.huang@intel.com> wrote:
> On 08/26/2011 09:43 PM, Bjorn Helgaas wrote:
>> On Thu, Aug 25, 2011 at 10:13 PM, Huang Ying <ying.huang@intel.com> wrote:
>>> Hi, Pavel,
>>>
>>> Sorry,  there is a minor issue in the patch.  Do you have time to try
>>> the updated version attached.
>>
>> I think Yinghai's patch is a better approach (though it needs a changelog).
>>
>> The ACPI NVS space should not be marked as "busy" by the e820 code in
>> the iomem_resource tree.
>>
>> It's way too complicated to mess around with registering NVS space and
>> trying to deal with it specially in APEI.
>
> ACPI NVS is different, it can be used only by firmware, and its
> interpreter, such as ACPI AML interpreter and APEI interpreter.  It can
> not be used by ordinary driver.  If my understanding is correct, this is
> why ACPI NVS is marked as busy.

I don't understand why ACPI NVS is different.  If we reserve it in
iomem_resource (without marking it busy), it's already unavailable to
anything else unless you know something connected to the NVS region.

The only way to allocate space inside NVS is to first discover the
ACPI NVS resource, so you could pass it to request_resource() or
allocate_resource() as the "root" to allocate from.  There's no
mechanism for locating that resource, so it's already protected in
that sense.

To use request_mem_region() (which marks the region busy), you all you
need is an address.  The only way for a driver to learn an address
inside NVS is from the ACPI tables.  So I think it's effectively
limited to APEI there, too.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Huang, Ying Aug. 30, 2011, 12:57 a.m. UTC | #5
On 08/29/2011 10:48 PM, Bjorn Helgaas wrote:
> On Sun, Aug 28, 2011 at 7:27 PM, Huang Ying <ying.huang@intel.com> wrote:
>> On 08/26/2011 09:43 PM, Bjorn Helgaas wrote:
>>> On Thu, Aug 25, 2011 at 10:13 PM, Huang Ying <ying.huang@intel.com> wrote:
>>>> Hi, Pavel,
>>>>
>>>> Sorry,  there is a minor issue in the patch.  Do you have time to try
>>>> the updated version attached.
>>>
>>> I think Yinghai's patch is a better approach (though it needs a changelog).
>>>
>>> The ACPI NVS space should not be marked as "busy" by the e820 code in
>>> the iomem_resource tree.
>>>
>>> It's way too complicated to mess around with registering NVS space and
>>> trying to deal with it specially in APEI.
>>
>> ACPI NVS is different, it can be used only by firmware, and its
>> interpreter, such as ACPI AML interpreter and APEI interpreter.  It can
>> not be used by ordinary driver.  If my understanding is correct, this is
>> why ACPI NVS is marked as busy.
> 
> I don't understand why ACPI NVS is different.  If we reserve it in
> iomem_resource (without marking it busy), it's already unavailable to
> anything else unless you know something connected to the NVS region.

I think ACPI NVS can be seen as reserved by ACPI or firmware
interpreter.  So that, all other drivers should not access it.

Best Regards,
Huang Ying
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pavel Ivanov Sept. 3, 2011, 2:44 a.m. UTC | #6
On Mon, Aug 29, 2011 at 1:37 AM, Huang Ying <ying.huang@intel.com> wrote:
> On 08/27/2011 11:28 AM, Pavel Ivanov wrote:
>> Also I noticed there's another suspicious message in dmesg and it was
>> there earlier too (although it's not shown on console):
>>
>> GHES: Failed to enable APEI firmware first mode
>>
>> Is it bad? What does it mean?
>
> Maybe this has been fixed, please this patch,
>
> https://lkml.org/lkml/2011/8/11/297

Yes, this patch indeed fixed this failure message. It's now replaced by

GHES: APEI firmware first mode is enabled by WHEA _OSC.

Can I ask you another question: what does the following message mean?

[Firmware Warn]: GHES: Poll interval is 0 for generic hardware error
source: 1, disabled.


Pavel


P.S. No word from Asus yet on the ERST issue.
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Huang, Ying Sept. 5, 2011, 2:05 a.m. UTC | #7
On 09/03/2011 10:44 AM, Pavel Ivanov wrote:
> On Mon, Aug 29, 2011 at 1:37 AM, Huang Ying <ying.huang@intel.com> wrote:
>> On 08/27/2011 11:28 AM, Pavel Ivanov wrote:
>>> Also I noticed there's another suspicious message in dmesg and it was
>>> there earlier too (although it's not shown on console):
>>>
>>> GHES: Failed to enable APEI firmware first mode
>>>
>>> Is it bad? What does it mean?
>>
>> Maybe this has been fixed, please this patch,
>>
>> https://lkml.org/lkml/2011/8/11/297
> 
> Yes, this patch indeed fixed this failure message. It's now replaced by
> 
> GHES: APEI firmware first mode is enabled by WHEA _OSC.
> 
> Can I ask you another question: what does the following message mean?
> 
> [Firmware Warn]: GHES: Poll interval is 0 for generic hardware error
> source: 1, disabled.

This means:

The generic hardware error source with ID 1 is notified by poll, that
is, Linux should poll the specified address to check whether there is
hardware error report from hardware.  But the poll interval specified by
firmware is "0".  I think this should be a firmware error, and disabled
it.  I think maybe you can contact firmware vendor to fix this.

Best Regards,
Huang Ying
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

Subject: [BUGFIX] ACPI, APEI, Resolve false conflict between ACPI NVS and APEI

Some firmware will access memory in ACPI NVS region via APEI.  That
is, instructions in APEI ERST/EINJ table will read/write ACPI NVS
region.  The original resource conflict checking in APEI code will
check memory/ioport accessed by APEI via general resource management
mech.  But ACPI NVS region is marked as busy already, so that the
false resource conflict will prevent APEI ERST/EINJ to work.

To fix this, this patch excludes ACPI NVS regions when APEI components
request resources.  So that they will not conflict with ACPI NVS
regions.

Reported-by: Pavel Ivanov <paivanof@gmail.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/apei/apei-base.c |   29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

--- a/drivers/acpi/apei/apei-base.c
+++ b/drivers/acpi/apei/apei-base.c
@@ -438,8 +438,19 @@  int apei_resources_sub(struct apei_resou
 }
 EXPORT_SYMBOL_GPL(apei_resources_sub);
 
+static int apei_get_nvs_callback(__u64 start, __u64 size, void *data)
+{
+	struct apei_resources *resources = data;
+	return apei_res_add(&resources->iomem, start, size);
+}
+
+static int apei_get_nvs_resources(struct apei_resources *resources)
+{
+	return acpi_nvs_for_each_region(apei_get_nvs_callback, resources);
+}
+
 /*
- * IO memory/port rersource management mechanism is used to check
+ * IO memory/port resource management mechanism is used to check
  * whether memory/port area used by GARs conflicts with normal memory
  * or IO memory/port of devices.
  */
@@ -448,12 +459,26 @@  int apei_resources_request(struct apei_r
 {
 	struct apei_res *res, *res_bak = NULL;
 	struct resource *r;
+	struct apei_resources nvs_resources;
 	int rc;
 
 	rc = apei_resources_sub(resources, &apei_resources_all);
 	if (rc)
 		return rc;
 
+	/*
+	 * Some firmware uses ACPI NVS region, that has been marked as
+	 * busy, so exclude it from APEI resources to avoid false
+	 * conflict.
+	 */
+	apei_resources_init(&nvs_resources);
+	rc = apei_get_nvs_resources(&nvs_resources);
+	if (rc)
+		goto res_fini;
+	rc = apei_resources_sub(resources, &nvs_resources);
+	if (rc)
+		goto res_fini;
+
 	rc = -EINVAL;
 	list_for_each_entry(res, &resources->iomem, list) {
 		r = request_mem_region(res->start, res->end - res->start,
@@ -500,6 +525,8 @@  err_unmap_iomem:
 			break;
 		release_mem_region(res->start, res->end - res->start);
 	}
+res_fini:
+	apei_resources_fini(&nvs_resources);
 	return rc;
 }
 EXPORT_SYMBOL_GPL(apei_resources_request);