
arm64: kconfig: allow support for memory failure handling

Message ID 1485985115-27274-1-git-send-email-tbaicar@codeaurora.org (mailing list archive)
State New, archived

Commit Message

Tyler Baicar Feb. 1, 2017, 9:38 p.m. UTC
From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>

If ACPI_APEI and MEMORY_FAILURE are configured, select
ACPI_APEI_MEMORY_FAILURE. This enables memory failure recovery
when a memory failure is reported through ACPI APEI. APEI
(ACPI Platform Error Interfaces) provides a means for the
platform to convey error information to the kernel.

Declare ARCH_SUPPORTS_MEMORY_FAILURE, as arm64 supports
attempting memory failure recovery.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
---
 arch/arm64/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

Comments

Punit Agrawal Feb. 3, 2017, 4:27 p.m. UTC | #1
Tyler Baicar <tbaicar@codeaurora.org> writes:

> From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
>
> If ACPI_APEI and MEMORY_FAILURE are configured, select
> ACPI_APEI_MEMORY_FAILURE. This enables memory failure recovery
> when a memory failure is reported through ACPI APEI. APEI
> (ACPI Platform Error Interfaces) provides a means for the
> platform to convey error information to the kernel.
>
> Declare ARCH_SUPPORTS_MEMORY_FAILURE, as arm64 supports
> attempting memory failure recovery.
>
> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
> ---
>  arch/arm64/Kconfig | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index f92778d..4cd12a0 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -15,6 +15,8 @@ config ARM64
>  	select ARCH_HAS_SG_CHAIN
>  	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
>  	select ARCH_USE_CMPXCHG_LOCKREF
> +	select ACPI_APEI_MEMORY_FAILURE if ACPI_APEI && MEMORY_FAILURE
> +	select ARCH_SUPPORTS_MEMORY_FAILURE

Although enabling support for memory failure handling makes sense in the
architecture config, it feels out of place to select
ACPI_APEI_MEMORY_FAILURE here.

Maybe key it off of CONFIG_APEI?

Thanks,
Punit

>  	select ARCH_SUPPORTS_ATOMIC_RMW
>  	select ARCH_SUPPORTS_NUMA_BALANCING
>  	select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
Tyler Baicar Feb. 6, 2017, 10:26 p.m. UTC | #2
Hello Punit,


On 2/3/2017 9:27 AM, Punit Agrawal wrote:
> Tyler Baicar <tbaicar@codeaurora.org> writes:
>
>> From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
>>
>> If ACPI_APEI and MEMORY_FAILURE are configured, select
>> ACPI_APEI_MEMORY_FAILURE. This enables memory failure recovery
>> when a memory failure is reported through ACPI APEI. APEI
>> (ACPI Platform Error Interfaces) provides a means for the
>> platform to convey error information to the kernel.
>>
>> Declare ARCH_SUPPORTS_MEMORY_FAILURE, as arm64 supports
>> attempting memory failure recovery.
>>
>> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
>> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
>> ---
>>   arch/arm64/Kconfig | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index f92778d..4cd12a0 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -15,6 +15,8 @@ config ARM64
>>   	select ARCH_HAS_SG_CHAIN
>>   	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
>>   	select ARCH_USE_CMPXCHG_LOCKREF
>> +	select ACPI_APEI_MEMORY_FAILURE if ACPI_APEI && MEMORY_FAILURE
>> +	select ARCH_SUPPORTS_MEMORY_FAILURE
> Although enabling support for memory failure handling makes sense in the
> architecture config, it feels out of place to select
> ACPI_APEI_MEMORY_FAILURE here.
>
> Maybe key it off of CONFIG_APEI?
Yes, I can move it there.

config ACPI_APEI
         bool "ACPI Platform Error Interface (APEI)"
         select MISC_FILESYSTEMS
         select PSTORE
         select UEFI_CPER
+        select ACPI_APEI_MEMORY_FAILURE if MEMORY_FAILURE
         depends on HAVE_ACPI_APEI

The ARCH_SUPPORTS_MEMORY_FAILURE should remain in arch/arm64/Kconfig 
though, correct?

Thanks,
Tyler
Punit Agrawal Feb. 7, 2017, 12:03 p.m. UTC | #3
"Baicar, Tyler" <tbaicar@codeaurora.org> writes:

> Hello Punit,
>
>
> On 2/3/2017 9:27 AM, Punit Agrawal wrote:
>> Tyler Baicar <tbaicar@codeaurora.org> writes:
>>
>>> From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
>>>
>>> If ACPI_APEI and MEMORY_FAILURE are configured, select
>>> ACPI_APEI_MEMORY_FAILURE. This enables memory failure recovery
>>> when a memory failure is reported through ACPI APEI. APEI
>>> (ACPI Platform Error Interfaces) provides a means for the
>>> platform to convey error information to the kernel.
>>>
>>> Declare ARCH_SUPPORTS_MEMORY_FAILURE, as arm64 supports
>>> attempting memory failure recovery.
>>>
>>> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
>>> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
>>> ---
>>>   arch/arm64/Kconfig | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index f92778d..4cd12a0 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -15,6 +15,8 @@ config ARM64
>>>   	select ARCH_HAS_SG_CHAIN
>>>   	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
>>>   	select ARCH_USE_CMPXCHG_LOCKREF
>>> +	select ACPI_APEI_MEMORY_FAILURE if ACPI_APEI && MEMORY_FAILURE
>>> +	select ARCH_SUPPORTS_MEMORY_FAILURE
>> Although enabling support for memory failure handling makes sense in the
>> architecture config, it feels out of place to select
>> ACPI_APEI_MEMORY_FAILURE here.
>>
>> Maybe key it off of CONFIG_APEI?
> Yes, I can move it there.
>
> config ACPI_APEI
>         bool "ACPI Platform Error Interface (APEI)"
>         select MISC_FILESYSTEMS
>         select PSTORE
>         select UEFI_CPER
> +        select ACPI_APEI_MEMORY_FAILURE if MEMORY_FAILURE
>         depends on HAVE_ACPI_APEI
>

That's what I was suggesting - we'll see what the ACPI maintainers think
of the change.

> The ARCH_SUPPORTS_MEMORY_FAILURE should remain in arch/arm64/Kconfig
> though, correct?

Yes, that's right. As that's a feature the architecture is advertising
support for, it should stay in the arm64 Kconfig.

>
> Thanks,
> Tyler
James Morse March 23, 2017, 2:33 p.m. UTC | #4
Hi Punit,

On 01/02/17 21:38, Tyler Baicar wrote:
> From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
> 
> If ACPI_APEI and MEMORY_FAILURE are configured, select
> ACPI_APEI_MEMORY_FAILURE. This enables memory failure recovery
> when a memory failure is reported through ACPI APEI. APEI
> (ACPI Platform Error Interfaces) provides a means for the
> platform to convey error information to the kernel.
>
> Declare ARCH_SUPPORTS_MEMORY_FAILURE, as arm64 supports
> attempting memory failure recovery.

Am I right in thinking we should wait for the hugepage issue you found with
hwpoison [0] to be fixed before arm64 can have ARCH_SUPPORTS_MEMORY_FAILURE?

(If so, can this patch become part of that series so they are obviously related?)

Thanks,

James

[0] https://www.spinics.net/lists/arm-kernel/msg568995.html




> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index f92778d..4cd12a0 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -15,6 +15,8 @@ config ARM64
>  	select ARCH_HAS_SG_CHAIN
>  	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
>  	select ARCH_USE_CMPXCHG_LOCKREF
> +	select ACPI_APEI_MEMORY_FAILURE if ACPI_APEI && MEMORY_FAILURE
> +	select ARCH_SUPPORTS_MEMORY_FAILURE
>  	select ARCH_SUPPORTS_ATOMIC_RMW
>  	select ARCH_SUPPORTS_NUMA_BALANCING
>  	select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
>
Punit Agrawal March 23, 2017, 4:12 p.m. UTC | #5
On 23/03/17 14:33, James Morse wrote:
> Hi Punit,
>
> On 01/02/17 21:38, Tyler Baicar wrote:
>> From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
>>
>> If ACPI_APEI and MEMORY_FAILURE are configured, select
>> ACPI_APEI_MEMORY_FAILURE. This enables memory failure recovery
>> when a memory failure is reported through ACPI APEI. APEI
>> (ACPI Platform Error Interfaces) provides a means for the
>> platform to convey error information to the kernel.
>>
>> Declare ARCH_SUPPORTS_MEMORY_FAILURE, as arm64 supports
>> attempting memory failure recovery.
>
> Am I right in thinking we should wait for the hugepage issue you found with
> hwpoison [0] to be fixed before arm64 can have ARCH_SUPPORTS_MEMORY_FAILURE?

We should at the least fix the huge_pte_offset() issue discovered in [0]
before we enable memory failure handling. Earlier today I posted an
RFC [1] fix for it based on Catalin's suggestion.

>
> (If so, can this patch become part of that series so they are obviously related?)

Good point - I can include the patches enabling memory failure handling
on ARM64 if Tyler's fine with it.

Thanks,
Punit

[1] https://lkml.org/lkml/2017/3/23/293

>
> Thanks,
>
> James
>
> [0] https://www.spinics.net/lists/arm-kernel/msg568995.html
>
>
>
>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index f92778d..4cd12a0 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -15,6 +15,8 @@ config ARM64
>>      select ARCH_HAS_SG_CHAIN
>>      select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
>>      select ARCH_USE_CMPXCHG_LOCKREF
>> +    select ACPI_APEI_MEMORY_FAILURE if ACPI_APEI && MEMORY_FAILURE
>> +    select ARCH_SUPPORTS_MEMORY_FAILURE
>>      select ARCH_SUPPORTS_ATOMIC_RMW
>>      select ARCH_SUPPORTS_NUMA_BALANCING
>>      select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
>>
>
>
>
Tyler Baicar March 24, 2017, 4:36 p.m. UTC | #6
On 3/23/2017 10:12 AM, Punit Agrawal wrote:
>
>
> On 23/03/17 14:33, James Morse wrote:
>> Hi Punit,
>>
>> On 01/02/17 21:38, Tyler Baicar wrote:
>>> From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
>>>
>>> If ACPI_APEI and MEMORY_FAILURE are configured, select
>>> ACPI_APEI_MEMORY_FAILURE. This enables memory failure recovery
>>> when a memory failure is reported through ACPI APEI. APEI
>>> (ACPI Platform Error Interfaces) provides a means for the
>>> platform to convey error information to the kernel.
>>>
>>> Declare ARCH_SUPPORTS_MEMORY_FAILURE, as arm64 supports
>>> attempting memory failure recovery.
>>
>> Am I right in thinking we should wait for the hugepage issue you found with
>> hwpoison [0] to be fixed before arm64 can have ARCH_SUPPORTS_MEMORY_FAILURE?
>
> We should at the least fix the huge_pte_offset() issue discovered in [0]
> before we enable memory failure handling. Earlier today I posted an
> RFC [1] fix for it based on Catalin's suggestion.
>
>>
>> (If so, can this patch become part of that series so they are obviously related?)
>
> Good point - I can include the patches enabling memory failure handling
> on ARM64 if Tyler's fine with it.
That's fine with me!

Thanks,
Tyler
>
> Thanks,
> Punit
>
> [1] https://lkml.org/lkml/2017/3/23/293
>
>>
>> Thanks,
>>
>> James
>>
>> [0] https://www.spinics.net/lists/arm-kernel/msg568995.html
>>
>>
>>
>>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index f92778d..4cd12a0 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -15,6 +15,8 @@ config ARM64
>>>      select ARCH_HAS_SG_CHAIN
>>>      select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
>>>      select ARCH_USE_CMPXCHG_LOCKREF
>>> +    select ACPI_APEI_MEMORY_FAILURE if ACPI_APEI && MEMORY_FAILURE
>>> +    select ARCH_SUPPORTS_MEMORY_FAILURE
>>>      select ARCH_SUPPORTS_ATOMIC_RMW
>>>      select ARCH_SUPPORTS_NUMA_BALANCING
>>>      select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
>>>
>>
>>
>>

Patch

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index f92778d..4cd12a0 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -15,6 +15,8 @@ config ARM64
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_USE_CMPXCHG_LOCKREF
+	select ACPI_APEI_MEMORY_FAILURE if ACPI_APEI && MEMORY_FAILURE
+	select ARCH_SUPPORTS_MEMORY_FAILURE
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select ARCH_WANT_COMPAT_IPC_PARSE_VERSION