Message ID | 1503916701-13516-4-git-send-email-gengdongjiu@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Dongjiu Geng, On 28/08/17 11:38, Dongjiu Geng wrote: > In current code logic, the two functions ghes_sea_add() and > ghes_sea_remove() are only called when CONFIG_ACPI_APEI_SEA > is defined. If not, it will return errors in the ghes_probe() > and not contiue. Hence, remove the unnecessary handling when > CONFIG_ACPI_APEI_SEI is not defined. This doesn't match what the patch does. I get this feeling this is needed for some future patch you haven't included. > change since v5: > 1. remove the SEI notification type handling, because the SEI is > asynchronous exception and the address is not accurate. so > not call memory_failure() to handle it. Setting NOTIFY_SEI as the GHES entry's notification type means the OS should check the GHES->ErrorStatusAddress for CPER records when it receives an SError-Interrupt, as it may be a notification of an error from this error source. If you aren't handling the notification, why is this is in the HEST at all? (and if its not: its not firmware-first) James > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c > index d661d452b238..c15a08db2c7c 100644 > --- a/drivers/acpi/apei/ghes.c > +++ b/drivers/acpi/apei/ghes.c > @@ -813,7 +813,6 @@ static struct notifier_block ghes_notifier_hed = { > .notifier_call = ghes_notify_hed, > }; > > -#ifdef CONFIG_ACPI_APEI_SEA > static LIST_HEAD(ghes_sea); > > /* > @@ -848,19 +847,6 @@ static void ghes_sea_remove(struct ghes *ghes) > mutex_unlock(&ghes_list_mutex); > synchronize_rcu(); > } > -#else /* CONFIG_ACPI_APEI_SEA */ > -static inline void ghes_sea_add(struct ghes *ghes) > -{ > - pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not supported\n", > - ghes->generic->header.source_id); > -} > - > -static inline void ghes_sea_remove(struct ghes *ghes) > -{ > - pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is not supported\n", > - ghes->generic->header.source_id); > -} > -#endif /* CONFIG_ACPI_APEI_SEA */ > > #ifdef CONFIG_HAVE_ACPI_APEI_NMI > /* >
Hi James, On 2017/9/1 1:50, James Morse wrote: > Hi Dongjiu Geng, > > On 28/08/17 11:38, Dongjiu Geng wrote: >> In current code logic, the two functions ghes_sea_add() and >> ghes_sea_remove() are only called when CONFIG_ACPI_APEI_SEA >> is defined. If not, it will return errors in the ghes_probe() >> and not contiue. Hence, remove the unnecessary handling when >> CONFIG_ACPI_APEI_SEI is not defined. > > This doesn't match what the patch does. I get this feeling this is needed for > some future patch you haven't included. James, let check the code, when the ghes_probe, if the CONFIG_ACPI_APEI_SEA is not defined. it will return -ENOTSUPP and goto error, and the ghes_sea_add has no chance to execute. similar, if the probe is failed, it should not have chance to execute ghes_sea_remove. static int ghes_probe(struct platform_device *ghes_dev) { struct acpi_hest_generic *generic; struct ghes *ghes = NULL; int rc = -EINVAL; generic = *(struct acpi_hest_generic **)ghes_dev->dev.platform_data; if (!generic->enabled) return -ENODEV; switch (generic->notify.type) { case ACPI_HEST_NOTIFY_POLLED: case ACPI_HEST_NOTIFY_EXTERNAL: case ACPI_HEST_NOTIFY_SCI: case ACPI_HEST_NOTIFY_GSIV: case ACPI_HEST_NOTIFY_GPIO: break; case ACPI_HEST_NOTIFY_SEA: if (!IS_ENABLED(CONFIG_ACPI_APEI_SEA)) { pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEA is not supported\n", generic->header.source_id); rc = -ENOTSUPP; goto err; } break; > > >> change since v5: >> 1. remove the SEI notification type handling, because the SEI is >> asynchronous exception and the address is not accurate. so >> not call memory_failure() to handle it. > > Setting NOTIFY_SEI as the GHES entry's notification type means the OS should > check the GHES->ErrorStatusAddress for CPER records when it receives an > SError-Interrupt, as it may be a notification of an error from this error source. previously I added the NOTIFY_SEI support, but consider the error address in CPER is not accurate and calling memory_failure() may not make sense. so I remove it. > > If you aren't handling the notification, why is this is in the HEST at all? > (and if its not: its not firmware-first) For the SEI notification, may be we can parse and handle the CPER record other than the Error physical address > > > James > > >> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c >> index d661d452b238..c15a08db2c7c 100644 >> --- a/drivers/acpi/apei/ghes.c >> +++ b/drivers/acpi/apei/ghes.c >> @@ -813,7 +813,6 @@ static struct notifier_block ghes_notifier_hed = { >> .notifier_call = ghes_notify_hed, >> }; >> >> -#ifdef CONFIG_ACPI_APEI_SEA >> static LIST_HEAD(ghes_sea); >> >> /* >> @@ -848,19 +847,6 @@ static void ghes_sea_remove(struct ghes *ghes) >> mutex_unlock(&ghes_list_mutex); >> synchronize_rcu(); >> } >> -#else /* CONFIG_ACPI_APEI_SEA */ >> -static inline void ghes_sea_add(struct ghes *ghes) >> -{ >> - pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not > supported\n", >> - ghes->generic->header.source_id); >> -} >> - >> -static inline void ghes_sea_remove(struct ghes *ghes) >> -{ >> - pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is not > supported\n", >> - ghes->generic->header.source_id); >> -} >> -#endif /* CONFIG_ACPI_APEI_SEA */ >> >> #ifdef CONFIG_HAVE_ACPI_APEI_NMI >> /* >> > > > . >
Hi gengdongjiu, On 04/09/17 12:43, gengdongjiu wrote: > On 2017/9/1 1:50, James Morse wrote: >> On 28/08/17 11:38, Dongjiu Geng wrote: >>> In current code logic, the two functions ghes_sea_add() and >>> ghes_sea_remove() are only called when CONFIG_ACPI_APEI_SEA >>> is defined. If not, it will return errors in the ghes_probe() >>> and not contiue. Hence, remove the unnecessary handling when >>> CONFIG_ACPI_APEI_SEI is not defined. >> >> This doesn't match what the patch does. I get this feeling this is needed for >> some future patch you haven't included. > > James, let check the code, when the ghes_probe, if the CONFIG_ACPI_APEI_SEA is not defined. > it will return -ENOTSUPP and goto error, and the ghes_sea_add has no chance to execute. > similar, if the probe is failed, it should not have chance to execute ghes_sea_remove. It's the 'unnecessary handling when CONFIG_ACPI_APEI_SEI' in the commit message that confuses me: this patch doesn't reference that Kconfig symbol. I guess that sentence needs removing for this v6? Re-reading without that part of the commit-message: You're relying on the compiler's dead-code elimination to spot unused static functions and silently drop them. Great! (there is the small risk that gcc 3.2[0] can't do this, x86 still has to support this gcc version) As this is just clean-up patch can you break it out of this series, it isn't needed to add support for SEI. (This series adds support for what should be an APEI notification, but the only code that touches APEI removes some code from a different notification method.) >> Setting NOTIFY_SEI as the GHES entry's notification type means the OS should >> check the GHES->ErrorStatusAddress for CPER records when it receives an >> SError-Interrupt, as it may be a notification of an error from this error source. > previously I added the NOTIFY_SEI support, (Yes, I saw that in v5 and expected this series to add some APEI support code ) > but consider the error address in CPER is not accurate and calling memory_failure() may not make sense. > so I remove it. 'not accurate'... this is going to be a problem, but lets keep that discussion on the cover-letter. >> If you aren't handling the notification, why is this is in the HEST at all? >> (and if its not: its not firmware-first) > For the SEI notification, may be we can parse and handle the CPER record other than the Error physical address Sure, but I only see this cleanup patch in this series, where does APEI learn about NOTIFY_SEI? As this is nothing will ever touch those CPER records, if you're using GHESv2 firmware will be prevented from delivering subsequent notifications. Thanks, James [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/README.rst#n251
James, Thanks for the review. On 2017/9/9 2:17, James Morse wrote: > Hi gengdongjiu, > > On 04/09/17 12:43, gengdongjiu wrote: >> On 2017/9/1 1:50, James Morse wrote: >>> On 28/08/17 11:38, Dongjiu Geng wrote: >>>> In current code logic, the two functions ghes_sea_add() and >>>> ghes_sea_remove() are only called when CONFIG_ACPI_APEI_SEA >>>> is defined. If not, it will return errors in the ghes_probe() >>>> and not contiue. Hence, remove the unnecessary handling when >>>> CONFIG_ACPI_APEI_SEI is not defined. >>> >>> This doesn't match what the patch does. I get this feeling this is needed for >>> some future patch you haven't included. >> >> James, let check the code, when the ghes_probe, if the CONFIG_ACPI_APEI_SEA is not defined. >> it will return -ENOTSUPP and goto error, and the ghes_sea_add has no chance to execute. >> similar, if the probe is failed, it should not have chance to execute ghes_sea_remove. > > It's the 'unnecessary handling when CONFIG_ACPI_APEI_SEI' in the commit message > that confuses me: this patch doesn't reference that Kconfig symbol. I guess that > sentence needs removing for this v6? thanks for the pointing out, That needs to be removed for v6. > > Re-reading without that part of the commit-message: > > You're relying on the compiler's dead-code elimination to spot unused static > functions and silently drop them. Great! > (there is the small risk that gcc 3.2[0] can't do this, x86 still has to support > this gcc version) > > As this is just clean-up patch can you break it out of this series, it isn't > needed to add support for SEI. sure, I will. > > (This series adds support for what should be an APEI notification, but the only > code that touches APEI removes some code from a different notification method.) understand. > > >>> Setting NOTIFY_SEI as the GHES entry's notification type means the OS should >>> check the GHES->ErrorStatusAddress for CPER records when it receives an >>> SError-Interrupt, as it may be a notification of an error from this error source. > >> previously I added the NOTIFY_SEI support, > > (Yes, I saw that in v5 and expected this series to add some APEI support code ) > > >> but consider the error address in CPER is not accurate and calling memory_failure() may not make sense. >> so I remove it. > > 'not accurate'... this is going to be a problem, but lets keep that discussion > on the cover-letter. Ok. > > >>> If you aren't handling the notification, why is this is in the HEST at all? >>> (and if its not: its not firmware-first) > >> For the SEI notification, may be we can parse and handle the CPER record other than the Error physical address > > Sure, but I only see this cleanup patch in this series, where does APEI learn > about NOTIFY_SEI? As this is nothing will ever touch those CPER records, if > you're using GHESv2 firmware will be prevented from delivering subsequent > notifications. James, whether it is possible you can review the previous v5 patch which adds the support for NOTIFY_SEI? thanks in advancecIn that patch, I share the SEI notification handling with the SEA notification handling to avoid duplicated code. https://www.spinics.net/lists/arm-kernel/msg601767.html > > > Thanks, > > James > > [0] > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/README.rst#n251 > > . >
Hi gengdongjiu, On 11/09/17 13:04, gengdongjiu wrote: > On 2017/9/9 2:17, James Morse wrote: >> On 04/09/17 12:43, gengdongjiu wrote: >>> On 2017/9/1 1:50, James Morse wrote: >>>> On 28/08/17 11:38, Dongjiu Geng wrote: >>>> If you aren't handling the notification, why is this is in the HEST at all? >>>> (and if its not: its not firmware-first) >> >>> For the SEI notification, may be we can parse and handle the CPER record other than the Error physical address >> >> Sure, but I only see this cleanup patch in this series, where does APEI learn >> about NOTIFY_SEI? As this is nothing will ever touch those CPER records, if >> you're using GHESv2 firmware will be prevented from delivering subsequent >> notifications. > James, whether it is possible you can review the previous v5 patch which adds the support for Spreading 'current discussion' over two versions is a problem for anyone trying to follow this series. If you post a newer version its normal for people to delete the older versions. When you post a new version you should be happy that its the latest and greatest. > NOTIFY_SEI? thanks in advancecIn that patch, I share the SEI notification handling with the SEA > notification handling to avoid duplicated code. You may be able to share some of the code, but I don't think you should share the list of GHES between notification methods. This leads to races between the firmware and OS: If CPU-A has received an SEI firmware would have to avoid generating an SEA on CPU-B as the SEI-handler running on CPU-A may find and process the second set of CPER records. CPU-B then gets a spurious notification. Why is this a problem? KVM needs to know if APEI handled the error, or whether the Synchronous-External-Abort/SError-Interrupt was due to something else, in which case we invoke todays default behaviour, which isn't appropriate for a RAS event. Thanks, James
On 2017/9/14 20:35, James Morse wrote: >> James, whether it is possible you can review the previous v5 patch which adds the support for > Spreading 'current discussion' over two versions is a problem for anyone trying > to follow this series. > > If you post a newer version its normal for people to delete the older versions. > When you post a new version you should be happy that its the latest and greatest. > > James, today I paste the new version here, thanks. https://patchwork.kernel.org/patch/9952495/ https://patchwork.kernel.org/patch/9950955/
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index d661d452b238..c15a08db2c7c 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -813,7 +813,6 @@ static struct notifier_block ghes_notifier_hed = { .notifier_call = ghes_notify_hed, }; -#ifdef CONFIG_ACPI_APEI_SEA static LIST_HEAD(ghes_sea); /* @@ -848,19 +847,6 @@ static void ghes_sea_remove(struct ghes *ghes) mutex_unlock(&ghes_list_mutex); synchronize_rcu(); } -#else /* CONFIG_ACPI_APEI_SEA */ -static inline void ghes_sea_add(struct ghes *ghes) -{ - pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not supported\n", - ghes->generic->header.source_id); -} - -static inline void ghes_sea_remove(struct ghes *ghes) -{ - pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is not supported\n", - ghes->generic->header.source_id); -} -#endif /* CONFIG_ACPI_APEI_SEA */ #ifdef CONFIG_HAVE_ACPI_APEI_NMI /*
In current code logic, the two functions ghes_sea_add() and ghes_sea_remove() are only called when CONFIG_ACPI_APEI_SEA is defined. If not, it will return errors in the ghes_probe() and not contiue. Hence, remove the unnecessary handling when CONFIG_ACPI_APEI_SEI is not defined. change since v5: 1. remove the SEI notification type handling, because the SEI is asynchronous exception and the address is not accurate. so not call memory_failure() to handle it. Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> --- drivers/acpi/apei/ghes.c | 14 -------------- 1 file changed, 14 deletions(-)