Message ID | 1449558183-12259-1-git-send-email-fu.wei@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Fu Wei, On 12/08/2015 03:03 PM, fu.wei@linaro.org wrote: > From: Tomasz Nowicki <tomasz.nowicki@linaro.org> > > This commit provides APEI arch-specific bits for aarch64 > > Meanwhile, add a new subfunction "hest_ia_init" for > "acpi_disable_cmcff" which is used by IA-32 Architecture > Corrected Machine Check (CMC). > > Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org> > Tested-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> > Signed-off-by: Fu Wei <fu.wei@linaro.org> > --- > Changelog: > v4: Rebase to latest kernel version(4.4-rc4). > Move arch_apei_flush_tlb_one into header file as a inline function > Add a new subfunction "hest_ia_init" for "acpi_disable_cmcff". > > v3: https://lkml.org/lkml/2015/12/3/521 > Remove "acpi_disable_cmcff" from arm64 code, > and wrap it in hest.c by "#if defined(__i386__) || defined(__x86_64__)" > > v2: https://lkml.org/lkml/2015/12/2/432 > Rebase to latest kernel version(4.4-rc3). > Move arch_apei_flush_tlb_one() to arch/arm64/kernel/acpi.c > > v1: https://lkml.org/lkml/2015/8/14/199 > Move arch_apei_flush_tlb_one() to arch/arm64/include/asm/apci.h. > Delete arch/arm64/kernel/apei.c. > Add "#ifdef CONFIG_ACPI_APEI" for "acpi_disable_cmcff". > > arch/arm64/Kconfig | 1 + > arch/arm64/include/asm/acpi.h | 5 +++++ > drivers/acpi/apei/hest.c | 19 ++++++++++++++++--- > 3 files changed, 22 insertions(+), 3 deletions(-) > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index 871f217..58c8992 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -3,6 +3,7 @@ config ARM64 > select ACPI_CCA_REQUIRED if ACPI > select ACPI_GENERIC_GSI if ACPI > select ACPI_REDUCED_HARDWARE_ONLY if ACPI > + select HAVE_ACPI_APEI if ACPI > select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE > select ARCH_HAS_ELF_RANDOMIZE > select ARCH_HAS_GCOV_PROFILE_ALL > diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h > index caafd63..31d3d9a 100644 > --- a/arch/arm64/include/asm/acpi.h > +++ b/arch/arm64/include/asm/acpi.h > @@ -17,6 +17,7 @@ > > #include <asm/cputype.h> > #include <asm/smp_plat.h> > +#include <asm/tlbflush.h> > > /* Macros for consistency checks of the GICC subtable of MADT */ > #define ACPI_MADT_GICC_LENGTH \ > @@ -94,6 +95,10 @@ static inline const char *acpi_get_enable_method(int cpu) > > #ifdef CONFIG_ACPI_APEI > pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr); How bout adding a empty line here? Except that, Acked-by: Hanjun Guo <hanjun.guo@linaro.org> Thanks Hanjun
On Tue, Dec 08, 2015 at 03:03:03PM +0800, fu.wei@linaro.org wrote: > From: Tomasz Nowicki <tomasz.nowicki@linaro.org> [...] > +#if defined(__i386__) || defined(__x86_64__) > /* > * Check if firmware advertises firmware first mode. We need FF bit to be set > * along with a set of MC banks which work in FF mode. > */ > static int __init hest_parse_cmc(struct acpi_hest_header *hest_hdr, void *data) > { > - return arch_apei_enable_cmcff(hest_hdr, data); > + if (!acpi_disable_cmcff) Why do not you define the flag above in this file (move it out of x86 - that's what I was aiming at in my previous reply) and remove this ifdeffery altogether (First firmware handling could apply to arm64 too according to specs and ACPI on arm64 guidelines) ? arch_apei_enable_cmcff() is a weak function that does nothing on arm64 and if we need to add an implementation we can do it later. Thanks, Lorenzo > + return !arch_apei_enable_cmcff(hest_hdr, data); > + > + return 0; > } > > +static inline int __init hest_ia_init(void) > +{ > + return apei_hest_parse(hest_parse_cmc, NULL); > +} > +#else > +static inline int __init hest_ia_init(void) { return 0; } > +#endif > + > struct ghes_arr { > struct platform_device **ghes_devs; > unsigned int count; > @@ -232,8 +244,9 @@ void __init acpi_hest_init(void) > goto err; > } > > - if (!acpi_disable_cmcff) > - apei_hest_parse(hest_parse_cmc, NULL); > + rc = hest_ia_init(); > + if (rc) > + goto err; > > if (!ghes_disable) { > rc = apei_hest_parse(hest_parse_ghes_count, &ghes_count); > -- > 2.5.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ >
Hi Lorenzo, On 12/08/2015 08:34 PM, Lorenzo Pieralisi wrote: > On Tue, Dec 08, 2015 at 03:03:03PM +0800, fu.wei@linaro.org wrote: >> From: Tomasz Nowicki <tomasz.nowicki@linaro.org> > > [...] > >> +#if defined(__i386__) || defined(__x86_64__) >> /* >> * Check if firmware advertises firmware first mode. We need FF bit to be set >> * along with a set of MC banks which work in FF mode. >> */ >> static int __init hest_parse_cmc(struct acpi_hest_header *hest_hdr, void *data) >> { >> - return arch_apei_enable_cmcff(hest_hdr, data); >> + if (!acpi_disable_cmcff) > > Why do not you define the flag above in this file (move it out of x86 - > that's what I was aiming at in my previous reply) and remove this ifdeffery > altogether (First firmware handling could apply to arm64 too according to > specs and ACPI on arm64 guidelines) ? If I understand it correctly, CMC (Corrected Machine Check) is for IA32 only, see section 18.3.2.1 IA-32 Architecture Machine Check Exception in ACPI 6.0. for ARM64, we can use other type of error source for firmware first handling, such as Generic Hardware Error Source, did I miss something? Thanks Hanjun
Hi Lorenzo, On 8 December 2015 at 20:52, Hanjun Guo <hanjun.guo@linaro.org> wrote: > Hi Lorenzo, > > On 12/08/2015 08:34 PM, Lorenzo Pieralisi wrote: >> >> On Tue, Dec 08, 2015 at 03:03:03PM +0800, fu.wei@linaro.org wrote: >>> >>> From: Tomasz Nowicki <tomasz.nowicki@linaro.org> >> >> >> [...] >> >>> +#if defined(__i386__) || defined(__x86_64__) >>> /* >>> * Check if firmware advertises firmware first mode. We need FF bit to >>> be set >>> * along with a set of MC banks which work in FF mode. >>> */ >>> static int __init hest_parse_cmc(struct acpi_hest_header *hest_hdr, >>> void *data) >>> { >>> - return arch_apei_enable_cmcff(hest_hdr, data); >>> + if (!acpi_disable_cmcff) >> >> >> Why do not you define the flag above in this file (move it out of x86 - >> that's what I was aiming at in my previous reply) and remove this >> ifdeffery >> altogether (First firmware handling could apply to arm64 too according to >> specs and ACPI on arm64 guidelines) ? > > > If I understand it correctly, CMC (Corrected Machine Check) is for IA32 > only, see section 18.3.2.1 IA-32 Architecture Machine Check Exception > in ACPI 6.0. for ARM64, we can use other type of error source for > firmware first handling, such as Generic Hardware Error Source, did > I miss something? yes, that is why I try to use "#if defined(__i386__) || defined(__x86_64__)" instead of moving acpi_disable_cmcff out of x86 code to here. And I thinks we also can do "arch_apei_enable_cmcff" --> "apei_enable_ia_cmcff" because that is IA32 only. Please correct me if I misunderstand something. Thanks :-) Great thanks for your feedback :-) > > Thanks > Hanjun
On Tue, Dec 08, 2015 at 09:08:24PM +0800, Fu Wei wrote: > Hi Lorenzo, > > > > On 8 December 2015 at 20:52, Hanjun Guo <hanjun.guo@linaro.org> wrote: > > Hi Lorenzo, > > > > On 12/08/2015 08:34 PM, Lorenzo Pieralisi wrote: > >> > >> On Tue, Dec 08, 2015 at 03:03:03PM +0800, fu.wei@linaro.org wrote: > >>> > >>> From: Tomasz Nowicki <tomasz.nowicki@linaro.org> > >> > >> > >> [...] > >> > >>> +#if defined(__i386__) || defined(__x86_64__) > >>> /* > >>> * Check if firmware advertises firmware first mode. We need FF bit to > >>> be set > >>> * along with a set of MC banks which work in FF mode. > >>> */ > >>> static int __init hest_parse_cmc(struct acpi_hest_header *hest_hdr, > >>> void *data) > >>> { > >>> - return arch_apei_enable_cmcff(hest_hdr, data); > >>> + if (!acpi_disable_cmcff) > >> > >> > >> Why do not you define the flag above in this file (move it out of x86 - > >> that's what I was aiming at in my previous reply) and remove this > >> ifdeffery > >> altogether (First firmware handling could apply to arm64 too according to > >> specs and ACPI on arm64 guidelines) ? > > > > > > If I understand it correctly, CMC (Corrected Machine Check) is for IA32 > > only, see section 18.3.2.1 IA-32 Architecture Machine Check Exception > > in ACPI 6.0. for ARM64, we can use other type of error source for > > firmware first handling, such as Generic Hardware Error Source, did > > I miss something? > > yes, that is why I try to use "#if defined(__i386__) || > defined(__x86_64__)" instead of moving acpi_disable_cmcff out of x86 > code to here. > > And I thinks we also can do "arch_apei_enable_cmcff" --> > "apei_enable_ia_cmcff" because that is IA32 only. > > Please correct me if I misunderstand something. Thanks :-) No you are right, I was confused by the arch_apei_enable_cmcff __weak function declaration, I am not sure that makes much sense, as you say. Side note: I wonder if there is a way to make the TLB flushing API common across architectures therefore avoiding this arch_apei_flush_tlb* churn. Thanks, Lorenzo > Great thanks for your feedback :-) > > > > > Thanks > > Hanjun > > > > -- > Best regards, > > Fu Wei > Software Engineer > Red Hat Software (Beijing) Co.,Ltd.Shanghai Branch > Ph: +86 21 61221326(direct) > Ph: +86 186 2020 4684 (mobile) > Room 1512, Regus One Corporate Avenue,Level 15, > One Corporate Avenue,222 Hubin Road,Huangpu District, > Shanghai,China 200021 >
On 08/12/15 07:03, fu.wei@linaro.org wrote: > From: Tomasz Nowicki <tomasz.nowicki@linaro.org> > > This commit provides APEI arch-specific bits for aarch64 > > Meanwhile, add a new subfunction "hest_ia_init" for > "acpi_disable_cmcff" which is used by IA-32 Architecture > Corrected Machine Check (CMC). > > Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org> > Tested-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> > Signed-off-by: Fu Wei <fu.wei@linaro.org> > --- > Changelog: > v4: Rebase to latest kernel version(4.4-rc4). > Move arch_apei_flush_tlb_one into header file as a inline function > Add a new subfunction "hest_ia_init" for "acpi_disable_cmcff". > > v3: https://lkml.org/lkml/2015/12/3/521 > Remove "acpi_disable_cmcff" from arm64 code, > and wrap it in hest.c by "#if defined(__i386__) || defined(__x86_64__)" > > v2: https://lkml.org/lkml/2015/12/2/432 > Rebase to latest kernel version(4.4-rc3). > Move arch_apei_flush_tlb_one() to arch/arm64/kernel/acpi.c > > v1: https://lkml.org/lkml/2015/8/14/199 > Move arch_apei_flush_tlb_one() to arch/arm64/include/asm/apci.h. > Delete arch/arm64/kernel/apei.c. > Add "#ifdef CONFIG_ACPI_APEI" for "acpi_disable_cmcff". > > arch/arm64/Kconfig | 1 + > arch/arm64/include/asm/acpi.h | 5 +++++ > drivers/acpi/apei/hest.c | 19 ++++++++++++++++--- > 3 files changed, 22 insertions(+), 3 deletions(-) > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index 871f217..58c8992 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -3,6 +3,7 @@ config ARM64 > select ACPI_CCA_REQUIRED if ACPI > select ACPI_GENERIC_GSI if ACPI > select ACPI_REDUCED_HARDWARE_ONLY if ACPI > + select HAVE_ACPI_APEI if ACPI > select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE > select ARCH_HAS_ELF_RANDOMIZE > select ARCH_HAS_GCOV_PROFILE_ALL > diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h > index caafd63..31d3d9a 100644 > --- a/arch/arm64/include/asm/acpi.h > +++ b/arch/arm64/include/asm/acpi.h > @@ -17,6 +17,7 @@ > > #include <asm/cputype.h> > #include <asm/smp_plat.h> > +#include <asm/tlbflush.h> > > /* Macros for consistency checks of the GICC subtable of MADT */ > #define ACPI_MADT_GICC_LENGTH \ > @@ -94,6 +95,10 @@ static inline const char *acpi_get_enable_method(int cpu) > > #ifdef CONFIG_ACPI_APEI > pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr); > +static inline void arch_apei_flush_tlb_one(unsigned long addr) > +{ > + flush_tlb_kernel_range(addr, addr + PAGE_SIZE); > +} > #endif > > #endif /*_ASM_ACPI_H*/ > diff --git a/drivers/acpi/apei/hest.c b/drivers/acpi/apei/hest.c > index 20b3fcf..715c58b 100644 > --- a/drivers/acpi/apei/hest.c > +++ b/drivers/acpi/apei/hest.c > @@ -117,15 +117,27 @@ int apei_hest_parse(apei_hest_func_t func, void *data) > } > EXPORT_SYMBOL_GPL(apei_hest_parse); > > +#if defined(__i386__) || defined(__x86_64__) Would it be better if define a symbol like : HAVE_ARCH_APEI_CMCFF for x86 and wrap it around that ? > /* > * Check if firmware advertises firmware first mode. We need FF bit to be set > * along with a set of MC banks which work in FF mode. > */ > static int __init hest_parse_cmc(struct acpi_hest_header *hest_hdr, void *data) > { > - return arch_apei_enable_cmcff(hest_hdr, data); > + if (!acpi_disable_cmcff) > + return !arch_apei_enable_cmcff(hest_hdr, data); > + > + return 0; > } > > +static inline int __init hest_ia_init(void) > +{ > + return apei_hest_parse(hest_parse_cmc, NULL); > +} > +#else > +static inline int __init hest_ia_init(void) { return 0; } > +#endif Cheers Suzuki
Hi Suzuki, On 8 December 2015 at 23:53, Suzuki K. Poulose <Suzuki.Poulose@arm.com> wrote: > On 08/12/15 07:03, fu.wei@linaro.org wrote: >> >> From: Tomasz Nowicki <tomasz.nowicki@linaro.org> >> >> This commit provides APEI arch-specific bits for aarch64 >> >> Meanwhile, add a new subfunction "hest_ia_init" for >> "acpi_disable_cmcff" which is used by IA-32 Architecture >> Corrected Machine Check (CMC). >> >> Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org> >> Tested-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> >> Signed-off-by: Fu Wei <fu.wei@linaro.org> >> --- >> Changelog: >> v4: Rebase to latest kernel version(4.4-rc4). >> Move arch_apei_flush_tlb_one into header file as a inline function >> Add a new subfunction "hest_ia_init" for "acpi_disable_cmcff". >> >> v3: https://lkml.org/lkml/2015/12/3/521 >> Remove "acpi_disable_cmcff" from arm64 code, >> and wrap it in hest.c by "#if defined(__i386__) || >> defined(__x86_64__)" >> >> v2: https://lkml.org/lkml/2015/12/2/432 >> Rebase to latest kernel version(4.4-rc3). >> Move arch_apei_flush_tlb_one() to arch/arm64/kernel/acpi.c >> >> v1: https://lkml.org/lkml/2015/8/14/199 >> Move arch_apei_flush_tlb_one() to arch/arm64/include/asm/apci.h. >> Delete arch/arm64/kernel/apei.c. >> Add "#ifdef CONFIG_ACPI_APEI" for "acpi_disable_cmcff". >> >> arch/arm64/Kconfig | 1 + >> arch/arm64/include/asm/acpi.h | 5 +++++ >> drivers/acpi/apei/hest.c | 19 ++++++++++++++++--- >> 3 files changed, 22 insertions(+), 3 deletions(-) >> >> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig >> index 871f217..58c8992 100644 >> --- a/arch/arm64/Kconfig >> +++ b/arch/arm64/Kconfig >> @@ -3,6 +3,7 @@ config ARM64 >> select ACPI_CCA_REQUIRED if ACPI >> select ACPI_GENERIC_GSI if ACPI >> select ACPI_REDUCED_HARDWARE_ONLY if ACPI >> + select HAVE_ACPI_APEI if ACPI >> select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE >> select ARCH_HAS_ELF_RANDOMIZE >> select ARCH_HAS_GCOV_PROFILE_ALL >> diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h >> index caafd63..31d3d9a 100644 >> --- a/arch/arm64/include/asm/acpi.h >> +++ b/arch/arm64/include/asm/acpi.h >> @@ -17,6 +17,7 @@ >> >> #include <asm/cputype.h> >> #include <asm/smp_plat.h> >> +#include <asm/tlbflush.h> >> >> /* Macros for consistency checks of the GICC subtable of MADT */ >> #define ACPI_MADT_GICC_LENGTH \ >> @@ -94,6 +95,10 @@ static inline const char *acpi_get_enable_method(int >> cpu) >> >> #ifdef CONFIG_ACPI_APEI >> pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr); >> +static inline void arch_apei_flush_tlb_one(unsigned long addr) >> +{ >> + flush_tlb_kernel_range(addr, addr + PAGE_SIZE); >> +} >> #endif >> >> #endif /*_ASM_ACPI_H*/ >> diff --git a/drivers/acpi/apei/hest.c b/drivers/acpi/apei/hest.c >> index 20b3fcf..715c58b 100644 >> --- a/drivers/acpi/apei/hest.c >> +++ b/drivers/acpi/apei/hest.c >> @@ -117,15 +117,27 @@ int apei_hest_parse(apei_hest_func_t func, void >> *data) >> } >> EXPORT_SYMBOL_GPL(apei_hest_parse); >> >> +#if defined(__i386__) || defined(__x86_64__) > > > Would it be better if define a symbol like : > > HAVE_ARCH_APEI_CMCFF for x86 and wrap it around that ? I think that is a brilliant idea! By this way , we can avoid to use "defined(__i386__) || defined(__x86_64__)" which is not recommended in Linux kernel. Thanks for your suggestion. I thinks maybe we can name this as HAVE_ACPI_APEI_HEST_IA32 for or x86 and wrap all hest_ia_init relevant code. > >> /* >> * Check if firmware advertises firmware first mode. We need FF bit to >> be set >> * along with a set of MC banks which work in FF mode. >> */ >> static int __init hest_parse_cmc(struct acpi_hest_header *hest_hdr, void >> *data) >> { >> - return arch_apei_enable_cmcff(hest_hdr, data); >> + if (!acpi_disable_cmcff) >> + return !arch_apei_enable_cmcff(hest_hdr, data); >> + >> + return 0; >> } >> >> +static inline int __init hest_ia_init(void) >> +{ >> + return apei_hest_parse(hest_parse_cmc, NULL); >> +} >> +#else >> +static inline int __init hest_ia_init(void) { return 0; } >> +#endif > > > Cheers > Suzuki >
Hi Lorenzo, On 8 December 2015 at 22:07, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote: > On Tue, Dec 08, 2015 at 09:08:24PM +0800, Fu Wei wrote: >> Hi Lorenzo, >> >> >> >> On 8 December 2015 at 20:52, Hanjun Guo <hanjun.guo@linaro.org> wrote: >> > Hi Lorenzo, >> > >> > On 12/08/2015 08:34 PM, Lorenzo Pieralisi wrote: >> >> >> >> On Tue, Dec 08, 2015 at 03:03:03PM +0800, fu.wei@linaro.org wrote: >> >>> >> >>> From: Tomasz Nowicki <tomasz.nowicki@linaro.org> >> >> >> >> >> >> [...] >> >> >> >>> +#if defined(__i386__) || defined(__x86_64__) >> >>> /* >> >>> * Check if firmware advertises firmware first mode. We need FF bit to >> >>> be set >> >>> * along with a set of MC banks which work in FF mode. >> >>> */ >> >>> static int __init hest_parse_cmc(struct acpi_hest_header *hest_hdr, >> >>> void *data) >> >>> { >> >>> - return arch_apei_enable_cmcff(hest_hdr, data); >> >>> + if (!acpi_disable_cmcff) >> >> >> >> >> >> Why do not you define the flag above in this file (move it out of x86 - >> >> that's what I was aiming at in my previous reply) and remove this >> >> ifdeffery >> >> altogether (First firmware handling could apply to arm64 too according to >> >> specs and ACPI on arm64 guidelines) ? >> > >> > >> > If I understand it correctly, CMC (Corrected Machine Check) is for IA32 >> > only, see section 18.3.2.1 IA-32 Architecture Machine Check Exception >> > in ACPI 6.0. for ARM64, we can use other type of error source for >> > firmware first handling, such as Generic Hardware Error Source, did >> > I miss something? >> >> yes, that is why I try to use "#if defined(__i386__) || >> defined(__x86_64__)" instead of moving acpi_disable_cmcff out of x86 >> code to here. >> >> And I thinks we also can do "arch_apei_enable_cmcff" --> >> "apei_enable_ia_cmcff" because that is IA32 only. >> >> Please correct me if I misunderstand something. Thanks :-) > > No you are right, I was confused by the arch_apei_enable_cmcff __weak > function declaration, I am not sure that makes much sense, as you say. > Thanks :-) > Side note: I wonder if there is a way to make the TLB flushing API common > across architectures therefore avoiding this arch_apei_flush_tlb* churn. yes, make sense, I will think about this today , thanks for your suggestion. > > Thanks, > Lorenzo > >> Great thanks for your feedback :-) >> >> > >> > Thanks >> > Hanjun >> >> >> >> -- >> Best regards, >> >> Fu Wei >> Software Engineer >> Red Hat Software (Beijing) Co.,Ltd.Shanghai Branch >> Ph: +86 21 61221326(direct) >> Ph: +86 186 2020 4684 (mobile) >> Room 1512, Regus One Corporate Avenue,Level 15, >> One Corporate Avenue,222 Hubin Road,Huangpu District, >> Shanghai,China 200021 >>
Hi Lorenzo, On 9 December 2015 at 11:25, Fu Wei <fu.wei@linaro.org> wrote: > Hi Lorenzo, > > On 8 December 2015 at 22:07, Lorenzo Pieralisi > <lorenzo.pieralisi@arm.com> wrote: >> On Tue, Dec 08, 2015 at 09:08:24PM +0800, Fu Wei wrote: >>> Hi Lorenzo, >>> >>> >>> >>> On 8 December 2015 at 20:52, Hanjun Guo <hanjun.guo@linaro.org> wrote: >>> > Hi Lorenzo, >>> > >>> > On 12/08/2015 08:34 PM, Lorenzo Pieralisi wrote: >>> >> >>> >> On Tue, Dec 08, 2015 at 03:03:03PM +0800, fu.wei@linaro.org wrote: >>> >>> >>> >>> From: Tomasz Nowicki <tomasz.nowicki@linaro.org> >>> >> >>> >> >>> >> [...] >>> >> >>> >>> +#if defined(__i386__) || defined(__x86_64__) >>> >>> /* >>> >>> * Check if firmware advertises firmware first mode. We need FF bit to >>> >>> be set >>> >>> * along with a set of MC banks which work in FF mode. >>> >>> */ >>> >>> static int __init hest_parse_cmc(struct acpi_hest_header *hest_hdr, >>> >>> void *data) >>> >>> { >>> >>> - return arch_apei_enable_cmcff(hest_hdr, data); >>> >>> + if (!acpi_disable_cmcff) >>> >> >>> >> >>> >> Why do not you define the flag above in this file (move it out of x86 - >>> >> that's what I was aiming at in my previous reply) and remove this >>> >> ifdeffery >>> >> altogether (First firmware handling could apply to arm64 too according to >>> >> specs and ACPI on arm64 guidelines) ? >>> > >>> > >>> > If I understand it correctly, CMC (Corrected Machine Check) is for IA32 >>> > only, see section 18.3.2.1 IA-32 Architecture Machine Check Exception >>> > in ACPI 6.0. for ARM64, we can use other type of error source for >>> > firmware first handling, such as Generic Hardware Error Source, did >>> > I miss something? >>> >>> yes, that is why I try to use "#if defined(__i386__) || >>> defined(__x86_64__)" instead of moving acpi_disable_cmcff out of x86 >>> code to here. >>> >>> And I thinks we also can do "arch_apei_enable_cmcff" --> >>> "apei_enable_ia_cmcff" because that is IA32 only. >>> >>> Please correct me if I misunderstand something. Thanks :-) >> >> No you are right, I was confused by the arch_apei_enable_cmcff __weak >> function declaration, I am not sure that makes much sense, as you say. >> > > Thanks :-) > > >> Side note: I wonder if there is a way to make the TLB flushing API common >> across architectures therefore avoiding this arch_apei_flush_tlb* churn. > > yes, make sense, I will think about this today , thanks for your suggestion. I do some investigation on this "tlb" problem(actually not just tlb, but also "get_mem_attribute") today, I think we need to (1)have a common API in tlbflush.h (at least for flushing one page) across architectures(at least in x86 and arm64) (2)use this API in drivers/acpi/apei/gest.c instead of arch_apei_flush_tlb_one (3)delete the old function from arm64 and x86 So maybe we can have the another patchset to solve this problem, make this patch just enable APEI for arm64 first. :-) > >> >> Thanks, >> Lorenzo >> >>> Great thanks for your feedback :-) >>> >>> > >>> > Thanks >>> > Hanjun >>> >>> >>> >>> -- >>> Best regards, >>> >>> Fu Wei >>> Software Engineer >>> Red Hat Software (Beijing) Co.,Ltd.Shanghai Branch >>> Ph: +86 21 61221326(direct) >>> Ph: +86 186 2020 4684 (mobile) >>> Room 1512, Regus One Corporate Avenue,Level 15, >>> One Corporate Avenue,222 Hubin Road,Huangpu District, >>> Shanghai,China 200021 >>> > > > > -- > Best regards, > > Fu Wei > Software Engineer > Red Hat Software (Beijing) Co.,Ltd.Shanghai Branch > Ph: +86 21 61221326(direct) > Ph: +86 186 2020 4684 (mobile) > Room 1512, Regus One Corporate Avenue,Level 15, > One Corporate Avenue,222 Hubin Road,Huangpu District, > Shanghai,China 200021
[adding Boris, as he might know how this works] On Thu, Dec 10, 2015 at 10:02:39AM +0800, Fu Wei wrote: > On 9 December 2015 at 11:25, Fu Wei <fu.wei@linaro.org> wrote: > > On 8 December 2015 at 22:07, Lorenzo Pieralisi > > <lorenzo.pieralisi@arm.com> wrote: > >> Side note: I wonder if there is a way to make the TLB flushing API common > >> across architectures therefore avoiding this arch_apei_flush_tlb* churn. > > > > yes, make sense, I will think about this today , thanks for your suggestion. > > I do some investigation on this "tlb" problem(actually not just tlb, > but also "get_mem_attribute") today, > I think we need to > (1)have a common API in tlbflush.h (at least for flushing one page) > across architectures(at least in x86 and arm64) It's not about flushing one page, flush_tlb_kernel_range (which is called by unmap_kernel_range) already takes care of that. The problem is that the unmap is called from irq/nmi context, so the IPIs required for broadcasting the TLB maintenance on x86 cannot be safely executed. So, if you're going to introduce anything in the way of TLB API, then it should be a generic form of the local_flush_tlb_* functions that we already have on ARM, imo. That sounds like a lot of work for this one problem. You could call flush_tlb_page with a fake vma/mm, but it's pretty ugly. > (2)use this API in drivers/acpi/apei/gest.c instead of arch_apei_flush_tlb_one > (3)delete the old function from arm64 and x86 Ideally, I think the ghes code would just use unmap_kernel_range unless the architecture specifically says that doesn't work in irq context. In that case, we don't need to implement the arch_apei_flush_tlb_one callback on arm64. One thing I don't fully grok about the code: since the page is mapped using ioremap_page_range, doesn't that allow other CPUs to speculatively fill their TLB with entries corresponding to the page mapped by the IRQ handler on another core? If the core with the speculative entries then takes an APEI exception, what guarantees do we have that it's looking at the right page? I think, for x86, we need a local invalidation on map, too. Will
On Tue, Dec 08, 2015 at 03:03:03PM +0800, fu.wei@linaro.org wrote: > From: Tomasz Nowicki <tomasz.nowicki@linaro.org> > > This commit provides APEI arch-specific bits for aarch64 > > Meanwhile, add a new subfunction "hest_ia_init" for > "acpi_disable_cmcff" which is used by IA-32 Architecture > Corrected Machine Check (CMC). > > Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org> > Tested-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> > Signed-off-by: Fu Wei <fu.wei@linaro.org> > --- > Changelog: > v4: Rebase to latest kernel version(4.4-rc4). > Move arch_apei_flush_tlb_one into header file as a inline function > Add a new subfunction "hest_ia_init" for "acpi_disable_cmcff". > > v3: https://lkml.org/lkml/2015/12/3/521 > Remove "acpi_disable_cmcff" from arm64 code, > and wrap it in hest.c by "#if defined(__i386__) || defined(__x86_64__)" > > v2: https://lkml.org/lkml/2015/12/2/432 > Rebase to latest kernel version(4.4-rc3). > Move arch_apei_flush_tlb_one() to arch/arm64/kernel/acpi.c > > v1: https://lkml.org/lkml/2015/8/14/199 > Move arch_apei_flush_tlb_one() to arch/arm64/include/asm/apci.h. > Delete arch/arm64/kernel/apei.c. > Add "#ifdef CONFIG_ACPI_APEI" for "acpi_disable_cmcff". > > arch/arm64/Kconfig | 1 + > arch/arm64/include/asm/acpi.h | 5 +++++ > drivers/acpi/apei/hest.c | 19 ++++++++++++++++--- > 3 files changed, 22 insertions(+), 3 deletions(-) > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index 871f217..58c8992 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -3,6 +3,7 @@ config ARM64 > select ACPI_CCA_REQUIRED if ACPI > select ACPI_GENERIC_GSI if ACPI > select ACPI_REDUCED_HARDWARE_ONLY if ACPI > + select HAVE_ACPI_APEI if ACPI > select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE > select ARCH_HAS_ELF_RANDOMIZE > select ARCH_HAS_GCOV_PROFILE_ALL > diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h > index caafd63..31d3d9a 100644 > --- a/arch/arm64/include/asm/acpi.h > +++ b/arch/arm64/include/asm/acpi.h > @@ -17,6 +17,7 @@ > > #include <asm/cputype.h> > #include <asm/smp_plat.h> > +#include <asm/tlbflush.h> > > /* Macros for consistency checks of the GICC subtable of MADT */ > #define ACPI_MADT_GICC_LENGTH \ > @@ -94,6 +95,10 @@ static inline const char *acpi_get_enable_method(int cpu) > > #ifdef CONFIG_ACPI_APEI > pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr); > +static inline void arch_apei_flush_tlb_one(unsigned long addr) > +{ > + flush_tlb_kernel_range(addr, addr + PAGE_SIZE); > +} > #endif > > #endif /*_ASM_ACPI_H*/ > diff --git a/drivers/acpi/apei/hest.c b/drivers/acpi/apei/hest.c > index 20b3fcf..715c58b 100644 > --- a/drivers/acpi/apei/hest.c > +++ b/drivers/acpi/apei/hest.c > @@ -117,15 +117,27 @@ int apei_hest_parse(apei_hest_func_t func, void *data) > } > EXPORT_SYMBOL_GPL(apei_hest_parse); > > +#if defined(__i386__) || defined(__x86_64__) > /* > * Check if firmware advertises firmware first mode. We need FF bit to be set > * along with a set of MC banks which work in FF mode. > */ > static int __init hest_parse_cmc(struct acpi_hest_header *hest_hdr, void *data) > { > - return arch_apei_enable_cmcff(hest_hdr, data); > + if (!acpi_disable_cmcff) > + return !arch_apei_enable_cmcff(hest_hdr, data); > + > + return 0; So the proper way to do this is to leave the code still do this check (in arch/x86/kernel/acpi/apei.c:arch_apei_enable_cmcff()): if (hest_hdr->type != ACPI_HEST_TYPE_IA32_CORRECTED_CHECK) return 0; which should JustWork(tm) on ARM too because if ARM doesn't support IA32 CMCI, then that header type should be different and the function will return 0. For that to work, though, the check should be moved to a generic place, like drivers/acpi/apei/hest.c:hest_parse_cmc() for example. No crazy ifdeffery and no redundant Kconfig symbols please.
On Thu, Dec 10, 2015 at 11:01:35AM +0000, Will Deacon wrote: > [adding Boris, as he might know how this works] Gee, thanks Will, now you're making me look at this too :-) > It's not about flushing one page, flush_tlb_kernel_range (which is called > by unmap_kernel_range) already takes care of that. The problem is that > the unmap is called from irq/nmi context, so the IPIs required for > broadcasting the TLB maintenance on x86 cannot be safely executed. Hmm, if you're talking about ghes_iounmap_nmi() and ghes_iounmap_irq() which are the two callers of unmap_kernel_range_noflush(), that last one is calling vunmap_page_range() which is fiddling with the page table. And I don't see TLB flushing IPIs there. If you mean arch_apei_flush_tlb_one(), that's INVLPG on x86 so also no IPI. What am I missing? > Ideally, I think the ghes code would just use unmap_kernel_range unless > the architecture specifically says that doesn't work in irq context. In > that case, we don't need to implement the arch_apei_flush_tlb_one callback > on arm64. Well, what bothers me with using unmap_kernel_range()/vunmap_page_range() is that if a GHES IRQ/NMI happens while something is executing those, the NMI will interrupt whatever's happening and it will possibly corrupt the pagetable, IMHO. Michal, Vlasta, can you please take a look? More specifically, those ghes_iounmap_nmi/ghes_iounmap_irq calls to unmap_kernel_range_noflush() happening in NMI/IRQ context. > One thing I don't fully grok about the code: since the page is mapped > using ioremap_page_range, doesn't that allow other CPUs to speculatively > fill their TLB with entries corresponding to the page mapped by the IRQ > handler on another core? If the core with the speculative entries then > takes an APEI exception, what guarantees do we have that it's looking at > the right page? I think, for x86, we need a local invalidation on map, > too. You're looking at ghes_copy_tofrom_phys(), right? That's grabbing spinlocks in IRQ/NMI context and doing the iounmap a bit later, below on the same core. I mean, I don't see us landing on another core in between, we're non-preemptible... Or do you mean something else?
On Mon, Dec 14, 2015 at 12:20:04PM +0100, Borislav Petkov wrote: > On Thu, Dec 10, 2015 at 11:01:35AM +0000, Will Deacon wrote: > > [adding Boris, as he might know how this works] > > Gee, thanks Will, now you're making me look at this too :-) Hey, I was having way too much fun by myself, so figured you'd like to be involved as well! > > It's not about flushing one page, flush_tlb_kernel_range (which is called > > by unmap_kernel_range) already takes care of that. The problem is that > > the unmap is called from irq/nmi context, so the IPIs required for > > broadcasting the TLB maintenance on x86 cannot be safely executed. > > Hmm, if you're talking about ghes_iounmap_nmi() and ghes_iounmap_irq() > which are the two callers of unmap_kernel_range_noflush(), that last one > is calling vunmap_page_range() which is fiddling with the page table. > And I don't see TLB flushing IPIs there. > > If you mean arch_apei_flush_tlb_one(), that's INVLPG on x86 so also no > IPI. > > What am I missing? We're in violent agreement. I'm just saying that's *why* arch_apei_flush_tlb_one exists, as opposed to calling unmap_kernel_range in the driver (which will attempt IPIs). On arm64, unmap_kernel_range will actually work correctly, since we don't need IPIs to broadcast TLB maintenance. The (incorrect) premise earlier in the thread was that arch_apei_flush_tlb_one exists because there's no portable API for flushing a single page, but that's simply not true. > > Ideally, I think the ghes code would just use unmap_kernel_range unless > > the architecture specifically says that doesn't work in irq context. In > > that case, we don't need to implement the arch_apei_flush_tlb_one callback > > on arm64. > > Well, what bothers me with using > unmap_kernel_range()/vunmap_page_range() is that if a GHES IRQ/NMI > happens while something is executing those, the NMI will interrupt > whatever's happening and it will possibly corrupt the pagetable, IMHO. > > Michal, Vlasta, can you please take a look? > > More specifically, those ghes_iounmap_nmi/ghes_iounmap_irq calls to > unmap_kernel_range_noflush() happening in NMI/IRQ context. Yikes, I'd not even thought about that. Perhaps its all serialised somehow, but I have no idea. > > One thing I don't fully grok about the code: since the page is mapped > > using ioremap_page_range, doesn't that allow other CPUs to speculatively > > fill their TLB with entries corresponding to the page mapped by the IRQ > > handler on another core? If the core with the speculative entries then > > takes an APEI exception, what guarantees do we have that it's looking at > > the right page? I think, for x86, we need a local invalidation on map, > > too. > > You're looking at ghes_copy_tofrom_phys(), right? That's grabbing > spinlocks in IRQ/NMI context and doing the iounmap a bit later, below > on the same core. I mean, I don't see us landing on another core in > between, we're non-preemptible... > > Or do you mean something else? Right, imagine the following sequence of events: 1. CPU x takes a GHES IRQ 2. CPU x then maps the buffer a page at a time in ghes_copy_tofrom_phys. After each unmap, it performs a local TLBI. Let's say that it has the final page of the buffer mapped when... 3. ... CPU y is meanwhile happily executing some other kernel code. 4. CPU y's page table walker speculatively fills the TLB with a translation for the last buffer page that CPU x has mapped (because its just been mapped with ioremap_page_range and is in the kernel page table). 5. CPU x unmaps the last page, performs a *local* TLBI, handles the event and returns from the exception 6. CPU y takes a GHES IRQ 7. CPU y then maps the first buffer page at the same virtual address that CPU x used to map the last buffer page 8. CPU y accesses the page, hits the stale TLB entry and gets junk which I think means you need to perform local TLB invalidation on map as well as unmap. Is there some reason this can't happen on x86? It sounds plausible on arm64 if we were to use local invalidation. Will
On Mon, Dec 14, 2015 at 11:46:59AM +0000, Will Deacon wrote: > We're in violent agreement. I'm just saying that's *why* > arch_apei_flush_tlb_one exists, as opposed to calling unmap_kernel_range > in the driver (which will attempt IPIs). On arm64, unmap_kernel_range > will actually work correctly, since we don't need IPIs to broadcast TLB > maintenance. > > The (incorrect) premise earlier in the thread was that > arch_apei_flush_tlb_one exists because there's no portable API for > flushing a single page, but that's simply not true. Right. > Yikes, I'd not even thought about that. Perhaps its all serialised > somehow, but I have no idea. Yeah, didn't see any serialization there... > Right, imagine the following sequence of events: > > 1. CPU x takes a GHES IRQ > 2. CPU x then maps the buffer a page at a time in ghes_copy_tofrom_phys. > After each unmap, it performs a local TLBI. Let's say that it has > the final page of the buffer mapped when... > 3. ... CPU y is meanwhile happily executing some other kernel code. > 4. CPU y's page table walker speculatively fills the TLB with a translation > for the last buffer page that CPU x has mapped (because its just been > mapped with ioremap_page_range and is in the kernel page table). > 5. CPU x unmaps the last page, performs a *local* TLBI, handles the > event and returns from the exception > 6. CPU y takes a GHES IRQ > 7. CPU y then maps the first buffer page at the same virtual address > that CPU x used to map the last buffer page > 8. CPU y accesses the page, hits the stale TLB entry and gets junk > > which I think means you need to perform local TLB invalidation on map > as well as unmap. > > Is there some reason this can't happen on x86? It sounds plausible on > arm64 if we were to use local invalidation. Ha, thanks for the detailed example, I see it now! And I too don't see a reason why that can't happen. And the GHES IRQ is a GSI, which has "global" in the name but I don't think that means it interrupts the whole system like an NMI does. Especially if it is registered/handled like a normal irq: acpi_gsi_to_irq() .. request_irq()... Adding Tony. If anything, we probably should be doing something with irq_work at the end of ghes_copy_tofrom_phys() so that the invalidation of any possible speculative mappings happens before we return from the GHES IRQ. Hmm, currently I'm not even clear whether this'll work: we would theoretically need to send IPIs from IRQ context, at the end of the GHES IRQ... Thanks.
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 871f217..58c8992 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -3,6 +3,7 @@ config ARM64 select ACPI_CCA_REQUIRED if ACPI select ACPI_GENERIC_GSI if ACPI select ACPI_REDUCED_HARDWARE_ONLY if ACPI + select HAVE_ACPI_APEI if ACPI select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE select ARCH_HAS_ELF_RANDOMIZE select ARCH_HAS_GCOV_PROFILE_ALL diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h index caafd63..31d3d9a 100644 --- a/arch/arm64/include/asm/acpi.h +++ b/arch/arm64/include/asm/acpi.h @@ -17,6 +17,7 @@ #include <asm/cputype.h> #include <asm/smp_plat.h> +#include <asm/tlbflush.h> /* Macros for consistency checks of the GICC subtable of MADT */ #define ACPI_MADT_GICC_LENGTH \ @@ -94,6 +95,10 @@ static inline const char *acpi_get_enable_method(int cpu) #ifdef CONFIG_ACPI_APEI pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr); +static inline void arch_apei_flush_tlb_one(unsigned long addr) +{ + flush_tlb_kernel_range(addr, addr + PAGE_SIZE); +} #endif #endif /*_ASM_ACPI_H*/ diff --git a/drivers/acpi/apei/hest.c b/drivers/acpi/apei/hest.c index 20b3fcf..715c58b 100644 --- a/drivers/acpi/apei/hest.c +++ b/drivers/acpi/apei/hest.c @@ -117,15 +117,27 @@ int apei_hest_parse(apei_hest_func_t func, void *data) } EXPORT_SYMBOL_GPL(apei_hest_parse); +#if defined(__i386__) || defined(__x86_64__) /* * Check if firmware advertises firmware first mode. We need FF bit to be set * along with a set of MC banks which work in FF mode. */ static int __init hest_parse_cmc(struct acpi_hest_header *hest_hdr, void *data) { - return arch_apei_enable_cmcff(hest_hdr, data); + if (!acpi_disable_cmcff) + return !arch_apei_enable_cmcff(hest_hdr, data); + + return 0; } +static inline int __init hest_ia_init(void) +{ + return apei_hest_parse(hest_parse_cmc, NULL); +} +#else +static inline int __init hest_ia_init(void) { return 0; } +#endif + struct ghes_arr { struct platform_device **ghes_devs; unsigned int count; @@ -232,8 +244,9 @@ void __init acpi_hest_init(void) goto err; } - if (!acpi_disable_cmcff) - apei_hest_parse(hest_parse_cmc, NULL); + rc = hest_ia_init(); + if (rc) + goto err; if (!ghes_disable) { rc = apei_hest_parse(hest_parse_ghes_count, &ghes_count);