Message ID | 1386476273-2418-1-git-send-email-rui.y.wang@intel.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Rui, I like this patch. thanks your revision. Ethan On Sun, Dec 8, 2013 at 12:17 PM, Rui Wang <ruiv.wang@gmail.com> wrote: > There's inconsistency between dmesg and the trace event output. > When dmesg says "severity=Corrected", the trace event says > "severity=Fatal". What happens is that HW_EVENT_ERR_CORRECTED is > defined in edac.h: > > enum hw_event_mc_err_type { > HW_EVENT_ERR_CORRECTED, > HW_EVENT_ERR_UNCORRECTED, > HW_EVENT_ERR_FATAL, > HW_EVENT_ERR_INFO, > }; > > while aer_print_error() uses aer_error_severity_string[] defined as: > > static const char *aer_error_severity_string[] = { > "Uncorrected (Non-Fatal)", > "Uncorrected (Fatal)", > "Corrected" > }; > > In this case dmesg is correct because info->severity is assigned in > aer_isr_one_error() using the definitions in include/linux/ras.h: > > Signed-off-by: Rui Wang <rui.y.wang@intel.com> > --- > include/trace/events/ras.h | 10 +++++----- > 1 files changed, 5 insertions(+), 5 deletions(-) > > diff --git a/include/trace/events/ras.h b/include/trace/events/ras.h > index 88b8783..1c875ad 100644 > --- a/include/trace/events/ras.h > +++ b/include/trace/events/ras.h > @@ -5,7 +5,7 @@ > #define _TRACE_AER_H > > #include <linux/tracepoint.h> > -#include <linux/edac.h> > +#include <linux/aer.h> > > > /* > @@ -63,10 +63,10 @@ TRACE_EVENT(aer_event, > > TP_printk("%s PCIe Bus Error: severity=%s, %s\n", > __get_str(dev_name), > - __entry->severity == HW_EVENT_ERR_CORRECTED ? "Corrected" : > - __entry->severity == HW_EVENT_ERR_FATAL ? > - "Fatal" : "Uncorrected", > - __entry->severity == HW_EVENT_ERR_CORRECTED ? > + __entry->severity == AER_CORRECTABLE ? "Corrected" : > + __entry->severity == AER_FATAL ? > + "Fatal" : "Uncorrected, non-fatal", > + __entry->severity == AER_CORRECTABLE ? > __print_flags(__entry->status, "|", aer_correctable_errors) : > __print_flags(__entry->status, "|", aer_uncorrectable_errors)) > ); > -- > 1.7.5.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-pci" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Dec 08, 2013 at 12:17:53PM +0800, Rui Wang wrote: > There's inconsistency between dmesg and the trace event output. > When dmesg says "severity=Corrected", the trace event says > "severity=Fatal". What happens is that HW_EVENT_ERR_CORRECTED is > defined in edac.h: > > enum hw_event_mc_err_type { > HW_EVENT_ERR_CORRECTED, > HW_EVENT_ERR_UNCORRECTED, > HW_EVENT_ERR_FATAL, > HW_EVENT_ERR_INFO, > }; > > while aer_print_error() uses aer_error_severity_string[] defined as: > > static const char *aer_error_severity_string[] = { > "Uncorrected (Non-Fatal)", > "Uncorrected (Fatal)", > "Corrected" > }; > > In this case dmesg is correct because info->severity is assigned in > aer_isr_one_error() using the definitions in include/linux/ras.h: > > Signed-off-by: Rui Wang <rui.y.wang@intel.com> Applied, thanks.
On Sun, Dec 08, 2013 at 03:43:12PM +0100, Borislav Petkov wrote: > On Sun, Dec 08, 2013 at 12:17:53PM +0800, Rui Wang wrote: > > There's inconsistency between dmesg and the trace event output. > > When dmesg says "severity=Corrected", the trace event says > > "severity=Fatal". What happens is that HW_EVENT_ERR_CORRECTED is > > defined in edac.h: > > > > enum hw_event_mc_err_type { > > HW_EVENT_ERR_CORRECTED, > > HW_EVENT_ERR_UNCORRECTED, > > HW_EVENT_ERR_FATAL, > > HW_EVENT_ERR_INFO, > > }; > > > > while aer_print_error() uses aer_error_severity_string[] defined as: > > > > static const char *aer_error_severity_string[] = { > > "Uncorrected (Non-Fatal)", > > "Uncorrected (Fatal)", > > "Corrected" > > }; > > > > In this case dmesg is correct because info->severity is assigned in > > aer_isr_one_error() using the definitions in include/linux/ras.h: > > > > Signed-off-by: Rui Wang <rui.y.wang@intel.com> > > Applied, thanks. I said "Applied" but this is Bjorn's area: Bjorn, wanna take this one or are you fine with it going over the RAS tree? Btw, I have a 2 trivial cleanups for aerdrv_errprint.c which are following... Thanks.
[+cc Joe] On Sun, Dec 8, 2013 at 8:09 AM, Borislav Petkov <bp@alien8.de> wrote: > On Sun, Dec 08, 2013 at 03:43:12PM +0100, Borislav Petkov wrote: >> On Sun, Dec 08, 2013 at 12:17:53PM +0800, Rui Wang wrote: >> > There's inconsistency between dmesg and the trace event output. >> > When dmesg says "severity=Corrected", the trace event says >> > "severity=Fatal". What happens is that HW_EVENT_ERR_CORRECTED is >> > defined in edac.h: >> > >> > enum hw_event_mc_err_type { >> > HW_EVENT_ERR_CORRECTED, >> > HW_EVENT_ERR_UNCORRECTED, >> > HW_EVENT_ERR_FATAL, >> > HW_EVENT_ERR_INFO, >> > }; >> > >> > while aer_print_error() uses aer_error_severity_string[] defined as: >> > >> > static const char *aer_error_severity_string[] = { >> > "Uncorrected (Non-Fatal)", >> > "Uncorrected (Fatal)", >> > "Corrected" >> > }; >> > >> > In this case dmesg is correct because info->severity is assigned in >> > aer_isr_one_error() using the definitions in include/linux/ras.h: >> > >> > Signed-off-by: Rui Wang <rui.y.wang@intel.com> >> >> Applied, thanks. > > I said "Applied" but this is Bjorn's area: Bjorn, wanna take this one or > are you fine with it going over the RAS tree? > > Btw, I have a 2 trivial cleanups for aerdrv_errprint.c which are > following... I think it's worthwhile to keep all three patches together, and I'd be happy to merge them via PCI. It looks like Joe had some good questions, so once you resolve them, I can merge them, or ack them and you can take them. I think either way will be fine. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Dec 09, 2013 at 04:41:18PM -0700, Bjorn Helgaas wrote: > I think it's worthwhile to keep all three patches together, and I'd > be happy to merge them via PCI. It looks like Joe had some good > questions, so once you resolve them, I can merge them, or ack them and > you can take them. I think either way will be fine. So actually I'm only really interested in the first patch which clearly fixes a minor issue in the PCIe AER tracepoint and that one I probably should take through the RAS tree as the original patch adding the tracepoint went through it too. Unless you really want to take it - then be my guest! :-) My intention with aerdrv_errprint.c was to cleanup some obvious stuff which sprang at me while looking at the code, *without* *any* change in functionality except the minor and obviously sensible s/aer_tlp_header:/TLP Header:/ replacement. (printk strings are not an API anyway). If you're fine with those patches as they are right now, I can apply them or you can take them or whatever. If not, then so be it. If someone wants to diddle with what could be done better and more readable, then that someone can do this at his own time - I don't really want to waste mine. Thanks.
diff --git a/include/trace/events/ras.h b/include/trace/events/ras.h index 88b8783..1c875ad 100644 --- a/include/trace/events/ras.h +++ b/include/trace/events/ras.h @@ -5,7 +5,7 @@ #define _TRACE_AER_H #include <linux/tracepoint.h> -#include <linux/edac.h> +#include <linux/aer.h> /* @@ -63,10 +63,10 @@ TRACE_EVENT(aer_event, TP_printk("%s PCIe Bus Error: severity=%s, %s\n", __get_str(dev_name), - __entry->severity == HW_EVENT_ERR_CORRECTED ? "Corrected" : - __entry->severity == HW_EVENT_ERR_FATAL ? - "Fatal" : "Uncorrected", - __entry->severity == HW_EVENT_ERR_CORRECTED ? + __entry->severity == AER_CORRECTABLE ? "Corrected" : + __entry->severity == AER_FATAL ? + "Fatal" : "Uncorrected, non-fatal", + __entry->severity == AER_CORRECTABLE ? __print_flags(__entry->status, "|", aer_correctable_errors) : __print_flags(__entry->status, "|", aer_uncorrectable_errors)) );
There's inconsistency between dmesg and the trace event output. When dmesg says "severity=Corrected", the trace event says "severity=Fatal". What happens is that HW_EVENT_ERR_CORRECTED is defined in edac.h: enum hw_event_mc_err_type { HW_EVENT_ERR_CORRECTED, HW_EVENT_ERR_UNCORRECTED, HW_EVENT_ERR_FATAL, HW_EVENT_ERR_INFO, }; while aer_print_error() uses aer_error_severity_string[] defined as: static const char *aer_error_severity_string[] = { "Uncorrected (Non-Fatal)", "Uncorrected (Fatal)", "Corrected" }; In this case dmesg is correct because info->severity is assigned in aer_isr_one_error() using the definitions in include/linux/ras.h: Signed-off-by: Rui Wang <rui.y.wang@intel.com> --- include/trace/events/ras.h | 10 +++++----- 1 files changed, 5 insertions(+), 5 deletions(-)