diff mbox

aerdrv: Fix severity usage in aer trace event

Message ID 1386476273-2418-1-git-send-email-rui.y.wang@intel.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Rui Wang Dec. 8, 2013, 4:17 a.m. UTC
There's inconsistency between dmesg and the trace event output.
When dmesg says "severity=Corrected", the trace event says
"severity=Fatal". What happens is that HW_EVENT_ERR_CORRECTED is
defined in edac.h:

enum hw_event_mc_err_type {
        HW_EVENT_ERR_CORRECTED,
        HW_EVENT_ERR_UNCORRECTED,
        HW_EVENT_ERR_FATAL,
        HW_EVENT_ERR_INFO,
};

while aer_print_error() uses aer_error_severity_string[] defined as:

static const char *aer_error_severity_string[] = {
        "Uncorrected (Non-Fatal)",
        "Uncorrected (Fatal)",
        "Corrected"
};

In this case dmesg is correct because info->severity is assigned in
aer_isr_one_error() using the definitions in include/linux/ras.h:

Signed-off-by: Rui Wang <rui.y.wang@intel.com>
---
 include/trace/events/ras.h |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

Comments

Ethan Zhao Dec. 8, 2013, 5:09 a.m. UTC | #1
Rui,
  I like this patch.  thanks your revision.

Ethan

On Sun, Dec 8, 2013 at 12:17 PM, Rui Wang <ruiv.wang@gmail.com> wrote:
> There's inconsistency between dmesg and the trace event output.
> When dmesg says "severity=Corrected", the trace event says
> "severity=Fatal". What happens is that HW_EVENT_ERR_CORRECTED is
> defined in edac.h:
>
> enum hw_event_mc_err_type {
>         HW_EVENT_ERR_CORRECTED,
>         HW_EVENT_ERR_UNCORRECTED,
>         HW_EVENT_ERR_FATAL,
>         HW_EVENT_ERR_INFO,
> };
>
> while aer_print_error() uses aer_error_severity_string[] defined as:
>
> static const char *aer_error_severity_string[] = {
>         "Uncorrected (Non-Fatal)",
>         "Uncorrected (Fatal)",
>         "Corrected"
> };
>
> In this case dmesg is correct because info->severity is assigned in
> aer_isr_one_error() using the definitions in include/linux/ras.h:
>
> Signed-off-by: Rui Wang <rui.y.wang@intel.com>
> ---
>  include/trace/events/ras.h |   10 +++++-----
>  1 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/include/trace/events/ras.h b/include/trace/events/ras.h
> index 88b8783..1c875ad 100644
> --- a/include/trace/events/ras.h
> +++ b/include/trace/events/ras.h
> @@ -5,7 +5,7 @@
>  #define _TRACE_AER_H
>
>  #include <linux/tracepoint.h>
> -#include <linux/edac.h>
> +#include <linux/aer.h>
>
>
>  /*
> @@ -63,10 +63,10 @@ TRACE_EVENT(aer_event,
>
>         TP_printk("%s PCIe Bus Error: severity=%s, %s\n",
>                 __get_str(dev_name),
> -               __entry->severity == HW_EVENT_ERR_CORRECTED ? "Corrected" :
> -                       __entry->severity == HW_EVENT_ERR_FATAL ?
> -                       "Fatal" : "Uncorrected",
> -               __entry->severity == HW_EVENT_ERR_CORRECTED ?
> +               __entry->severity == AER_CORRECTABLE ? "Corrected" :
> +                       __entry->severity == AER_FATAL ?
> +                       "Fatal" : "Uncorrected, non-fatal",
> +               __entry->severity == AER_CORRECTABLE ?
>                 __print_flags(__entry->status, "|", aer_correctable_errors) :
>                 __print_flags(__entry->status, "|", aer_uncorrectable_errors))
>  );
> --
> 1.7.5.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Borislav Petkov Dec. 8, 2013, 2:43 p.m. UTC | #2
On Sun, Dec 08, 2013 at 12:17:53PM +0800, Rui Wang wrote:
> There's inconsistency between dmesg and the trace event output.
> When dmesg says "severity=Corrected", the trace event says
> "severity=Fatal". What happens is that HW_EVENT_ERR_CORRECTED is
> defined in edac.h:
> 
> enum hw_event_mc_err_type {
>         HW_EVENT_ERR_CORRECTED,
>         HW_EVENT_ERR_UNCORRECTED,
>         HW_EVENT_ERR_FATAL,
>         HW_EVENT_ERR_INFO,
> };
> 
> while aer_print_error() uses aer_error_severity_string[] defined as:
> 
> static const char *aer_error_severity_string[] = {
>         "Uncorrected (Non-Fatal)",
>         "Uncorrected (Fatal)",
>         "Corrected"
> };
> 
> In this case dmesg is correct because info->severity is assigned in
> aer_isr_one_error() using the definitions in include/linux/ras.h:
> 
> Signed-off-by: Rui Wang <rui.y.wang@intel.com>

Applied, thanks.
Borislav Petkov Dec. 8, 2013, 3:09 p.m. UTC | #3
On Sun, Dec 08, 2013 at 03:43:12PM +0100, Borislav Petkov wrote:
> On Sun, Dec 08, 2013 at 12:17:53PM +0800, Rui Wang wrote:
> > There's inconsistency between dmesg and the trace event output.
> > When dmesg says "severity=Corrected", the trace event says
> > "severity=Fatal". What happens is that HW_EVENT_ERR_CORRECTED is
> > defined in edac.h:
> > 
> > enum hw_event_mc_err_type {
> >         HW_EVENT_ERR_CORRECTED,
> >         HW_EVENT_ERR_UNCORRECTED,
> >         HW_EVENT_ERR_FATAL,
> >         HW_EVENT_ERR_INFO,
> > };
> > 
> > while aer_print_error() uses aer_error_severity_string[] defined as:
> > 
> > static const char *aer_error_severity_string[] = {
> >         "Uncorrected (Non-Fatal)",
> >         "Uncorrected (Fatal)",
> >         "Corrected"
> > };
> > 
> > In this case dmesg is correct because info->severity is assigned in
> > aer_isr_one_error() using the definitions in include/linux/ras.h:
> > 
> > Signed-off-by: Rui Wang <rui.y.wang@intel.com>
> 
> Applied, thanks.

I said "Applied" but this is Bjorn's area: Bjorn, wanna take this one or
are you fine with it going over the RAS tree?

Btw, I have a 2 trivial cleanups for aerdrv_errprint.c which are
following...

Thanks.
Bjorn Helgaas Dec. 9, 2013, 11:41 p.m. UTC | #4
[+cc Joe]

On Sun, Dec 8, 2013 at 8:09 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Sun, Dec 08, 2013 at 03:43:12PM +0100, Borislav Petkov wrote:
>> On Sun, Dec 08, 2013 at 12:17:53PM +0800, Rui Wang wrote:
>> > There's inconsistency between dmesg and the trace event output.
>> > When dmesg says "severity=Corrected", the trace event says
>> > "severity=Fatal". What happens is that HW_EVENT_ERR_CORRECTED is
>> > defined in edac.h:
>> >
>> > enum hw_event_mc_err_type {
>> >         HW_EVENT_ERR_CORRECTED,
>> >         HW_EVENT_ERR_UNCORRECTED,
>> >         HW_EVENT_ERR_FATAL,
>> >         HW_EVENT_ERR_INFO,
>> > };
>> >
>> > while aer_print_error() uses aer_error_severity_string[] defined as:
>> >
>> > static const char *aer_error_severity_string[] = {
>> >         "Uncorrected (Non-Fatal)",
>> >         "Uncorrected (Fatal)",
>> >         "Corrected"
>> > };
>> >
>> > In this case dmesg is correct because info->severity is assigned in
>> > aer_isr_one_error() using the definitions in include/linux/ras.h:
>> >
>> > Signed-off-by: Rui Wang <rui.y.wang@intel.com>
>>
>> Applied, thanks.
>
> I said "Applied" but this is Bjorn's area: Bjorn, wanna take this one or
> are you fine with it going over the RAS tree?
>
> Btw, I have a 2 trivial cleanups for aerdrv_errprint.c which are
> following...

I think it's worthwhile to keep all three patches together, and I'd be
happy to merge them via PCI.  It looks like Joe had some good
questions, so once you resolve them, I can merge them, or ack them and
you can take them.  I think either way will be fine.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Borislav Petkov Dec. 10, 2013, 12:40 a.m. UTC | #5
On Mon, Dec 09, 2013 at 04:41:18PM -0700, Bjorn Helgaas wrote:
> I think it's worthwhile to keep all three patches together, and I'd
> be happy to merge them via PCI. It looks like Joe had some good
> questions, so once you resolve them, I can merge them, or ack them and
> you can take them. I think either way will be fine.

So actually I'm only really interested in the first patch which clearly
fixes a minor issue in the PCIe AER tracepoint and that one I probably
should take through the RAS tree as the original patch adding the
tracepoint went through it too. Unless you really want to take it - then
be my guest! :-)

My intention with aerdrv_errprint.c was to cleanup some obvious stuff
which sprang at me while looking at the code, *without* *any* change in
functionality except the minor and obviously sensible

	s/aer_tlp_header:/TLP Header:/

replacement. (printk strings are not an API anyway).

If you're fine with those patches as they are right now, I can apply
them or you can take them or whatever. If not, then so be it.

If someone wants to diddle with what could be done better and more
readable, then that someone can do this at his own time - I don't really
want to waste mine.

Thanks.
diff mbox

Patch

diff --git a/include/trace/events/ras.h b/include/trace/events/ras.h
index 88b8783..1c875ad 100644
--- a/include/trace/events/ras.h
+++ b/include/trace/events/ras.h
@@ -5,7 +5,7 @@ 
 #define _TRACE_AER_H
 
 #include <linux/tracepoint.h>
-#include <linux/edac.h>
+#include <linux/aer.h>
 
 
 /*
@@ -63,10 +63,10 @@  TRACE_EVENT(aer_event,
 
 	TP_printk("%s PCIe Bus Error: severity=%s, %s\n",
 		__get_str(dev_name),
-		__entry->severity == HW_EVENT_ERR_CORRECTED ? "Corrected" :
-			__entry->severity == HW_EVENT_ERR_FATAL ?
-			"Fatal" : "Uncorrected",
-		__entry->severity == HW_EVENT_ERR_CORRECTED ?
+		__entry->severity == AER_CORRECTABLE ? "Corrected" :
+			__entry->severity == AER_FATAL ?
+			"Fatal" : "Uncorrected, non-fatal",
+		__entry->severity == AER_CORRECTABLE ?
 		__print_flags(__entry->status, "|", aer_correctable_errors) :
 		__print_flags(__entry->status, "|", aer_uncorrectable_errors))
 );