Message ID | 22b2dae2a6ac340d9d45c28481d746ec1064cd6c.1633453452.git.naveennaidu479@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | Fix long standing AER Error Handling Issues | expand |
On Tue, Oct 05, 2021 at 10:48:08PM +0530, Naveen Naidu wrote: > Currently, we do not print the "id" field in the AER error logs. Yet the > aer_agent_string[] has the word "id" in it. The AER error log looks > like: > > pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID) > > Without the "id" field in the error log, The aer_agent_string[] > (eg: "Receiver ID") does not make sense. A user reading the > aer_agent_string[] in the log, might inadvertently look for an "id" > field and not finding it might lead to confusion. > > Remove the "ID" from the aer_agent_string[]. > > The following are sample dummy errors inject via aer-inject. I like this, and the problem it fixes was my fault because these "ID" strings should have been removed by 010caed4ccb6. If it's straightforward enough, it would be nice to have the aer-inject command line here in the commit log to make it easier for people to play with this. > Before > ======= > > In 010caed4ccb6 ("PCI/AER: Decode Error Source Requester ID"), > the "id" field was removed from the AER error logs, so currently AER > logs look like: > > pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03:0 > pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID) <--- no id field > pcieport 0000:00:03.0: device [1b36:000c] error status/mask=00000040/0000e000 > pcieport 0000:00:03.0: [ 6] BadTLP > > After > ====== > > pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03.0 > pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver) > pcieport 0000:00:03.0: device [1b36:000c] error status/mask=00000040/0000e000 > pcieport 0000:00:03.0: [ 6] BadTLP > > Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com> > --- > drivers/pci/pcie/aer.c | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index 9784fdcf3006..241ff361b43c 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -516,10 +516,10 @@ static const char *aer_uncorrectable_error_string[] = { > }; > > static const char *aer_agent_string[] = { > - "Receiver ID", > - "Requester ID", > - "Completer ID", > - "Transmitter ID" > + "Receiver", > + "Requester", > + "Completer", > + "Transmitter" > }; > > #define aer_stats_dev_attr(name, stats_array, strings_array, \ > @@ -703,7 +703,7 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) > const char *level; > > if (!info->status) { > - pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n", > + pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent)\n", > aer_error_severity_string[info->severity]); > goto out; > } > -- > 2.25.1 > > _______________________________________________ > Linux-kernel-mentees mailing list > Linux-kernel-mentees@lists.linuxfoundation.org > https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees
On 20/10, Bjorn Helgaas wrote: > On Tue, Oct 05, 2021 at 10:48:08PM +0530, Naveen Naidu wrote: > > Currently, we do not print the "id" field in the AER error logs. Yet the > > aer_agent_string[] has the word "id" in it. The AER error log looks > > like: > > > > pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID) > > > > Without the "id" field in the error log, The aer_agent_string[] > > (eg: "Receiver ID") does not make sense. A user reading the > > aer_agent_string[] in the log, might inadvertently look for an "id" > > field and not finding it might lead to confusion. > > > > Remove the "ID" from the aer_agent_string[]. > > > > The following are sample dummy errors inject via aer-inject. > > I like this, and the problem it fixes was my fault because > these "ID" strings should have been removed by 010caed4ccb6. > > If it's straightforward enough, it would be nice to have the > aer-inject command line here in the commit log to make it easier > for people to play with this. > Thank you for the review. Do you mean something like: The following sample dummy errors are injected via aer-inject via the following steps: 1. The steps to compile the aer-inject tool is mentioned in (Section 4. Software error inject) of the document [1] [1]: https://www.kernel.org/doc/Documentation/PCI/pcieaer-howto.txt Make sure to place the aer-inject executable at the home directory of the qemu system or at any other place. 2. Emulate a PCIE architecture using qemu, A sample looks like following: qemu-system-x86_64 -kernel ../linux/arch/x86_64/boot/bzImage \ -initrd buildroot-build/images/rootfs.cpio.gz \ -append "console=ttyS0" \ -enable-kvm -nographic \ -M q35 \ -device pcie-root-port,bus=pcie.0,id=rp1,slot=1 \ -device pcie-pci-bridge,id=br1,bus=rp1 \ -device e1000,bus=br1,addr=8 Note that the PCIe features are available only when using the 'q35' Machine [2] [2]: https://github.com/qemu/qemu/blob/master/docs/pcie.txt 3. Once the qemu system starts up, create a sample aer-file or use any example aer file from [3] [3]: https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git/tree/examples 4. Inject any aer-error using ./aer-inject aer-file This does look a tad bit longer for a commit log so I am unsure if you would like to have it there. If you are okay with it, I would be happy to add it to that :) > > Before > > ======= > > > > In 010caed4ccb6 ("PCI/AER: Decode Error Source Requester ID"), > > the "id" field was removed from the AER error logs, so currently AER > > logs look like: > > > > pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03:0 > > pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID) <--- no id field > > pcieport 0000:00:03.0: device [1b36:000c] error status/mask=00000040/0000e000 > > pcieport 0000:00:03.0: [ 6] BadTLP > > > > After > > ====== > > > > pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03.0 > > pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver) > > pcieport 0000:00:03.0: device [1b36:000c] error status/mask=00000040/0000e000 > > pcieport 0000:00:03.0: [ 6] BadTLP > > > > Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com> > > --- > > drivers/pci/pcie/aer.c | 10 +++++----- > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > index 9784fdcf3006..241ff361b43c 100644 > > --- a/drivers/pci/pcie/aer.c > > +++ b/drivers/pci/pcie/aer.c > > @@ -516,10 +516,10 @@ static const char *aer_uncorrectable_error_string[] = { > > }; > > > > static const char *aer_agent_string[] = { > > - "Receiver ID", > > - "Requester ID", > > - "Completer ID", > > - "Transmitter ID" > > + "Receiver", > > + "Requester", > > + "Completer", > > + "Transmitter" > > }; > > > > #define aer_stats_dev_attr(name, stats_array, strings_array, \ > > @@ -703,7 +703,7 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) > > const char *level; > > > > if (!info->status) { > > - pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n", > > + pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent)\n", > > aer_error_severity_string[info->severity]); > > goto out; > > } > > -- > > 2.25.1 > > > > _______________________________________________ > > Linux-kernel-mentees mailing list > > Linux-kernel-mentees@lists.linuxfoundation.org > > https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees
On Thu, Oct 21, 2021 at 10:00:21PM +0530, Naveen Naidu wrote: > On 20/10, Bjorn Helgaas wrote: > > On Tue, Oct 05, 2021 at 10:48:08PM +0530, Naveen Naidu wrote: > > > Currently, we do not print the "id" field in the AER error logs. Yet the > > > aer_agent_string[] has the word "id" in it. The AER error log looks > > > like: > > > > > > pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID) > > > > > > Without the "id" field in the error log, The aer_agent_string[] > > > (eg: "Receiver ID") does not make sense. A user reading the > > > aer_agent_string[] in the log, might inadvertently look for an "id" > > > field and not finding it might lead to confusion. > > > > > > Remove the "ID" from the aer_agent_string[]. > > > > > > The following are sample dummy errors inject via aer-inject. > > > > I like this, and the problem it fixes was my fault because > > these "ID" strings should have been removed by 010caed4ccb6. > > > > If it's straightforward enough, it would be nice to have the > > aer-inject command line here in the commit log to make it easier > > for people to play with this. > > > > Thank you for the review. Do you mean something like: > > The following sample dummy errors are injected via aer-inject via the > following steps: > > 1. The steps to compile the aer-inject tool is mentioned in (Section > 4. Software error inject) of the document [1] > > [1]: https://www.kernel.org/doc/Documentation/PCI/pcieaer-howto.txt > > Make sure to place the aer-inject executable at the home directory > of the qemu system or at any other place. > > 2. Emulate a PCIE architecture using qemu, A sample looks like > following: > > qemu-system-x86_64 -kernel ../linux/arch/x86_64/boot/bzImage \ > -initrd buildroot-build/images/rootfs.cpio.gz \ > -append "console=ttyS0" \ > -enable-kvm -nographic \ > -M q35 \ > -device pcie-root-port,bus=pcie.0,id=rp1,slot=1 \ > -device pcie-pci-bridge,id=br1,bus=rp1 \ > -device e1000,bus=br1,addr=8 > > Note that the PCIe features are available only when using the > 'q35' Machine [2] > [2]: https://github.com/qemu/qemu/blob/master/docs/pcie.txt > > 3. Once the qemu system starts up, create a sample aer-file or use any > example aer file from [3] > > [3]: > https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git/tree/examples > > 4. Inject any aer-error using > > ./aer-inject aer-file > > This does look a tad bit longer for a commit log so I am unsure if you > would like to have it there. If you are okay with it, I would be happy > to add it to that :) Yes, that's kind of long. Something like this https://git.kernel.org/linus/d95f20c4f070 would be enough for the commit log, especially since you've now provided all the details in the email thread, where we can find them via the Link: tag. Bjorn
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 9784fdcf3006..241ff361b43c 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -516,10 +516,10 @@ static const char *aer_uncorrectable_error_string[] = { }; static const char *aer_agent_string[] = { - "Receiver ID", - "Requester ID", - "Completer ID", - "Transmitter ID" + "Receiver", + "Requester", + "Completer", + "Transmitter" }; #define aer_stats_dev_attr(name, stats_array, strings_array, \ @@ -703,7 +703,7 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) const char *level; if (!info->status) { - pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n", + pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent)\n", aer_error_severity_string[info->severity]); goto out; }
Currently, we do not print the "id" field in the AER error logs. Yet the aer_agent_string[] has the word "id" in it. The AER error log looks like: pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID) Without the "id" field in the error log, The aer_agent_string[] (eg: "Receiver ID") does not make sense. A user reading the aer_agent_string[] in the log, might inadvertently look for an "id" field and not finding it might lead to confusion. Remove the "ID" from the aer_agent_string[]. The following are sample dummy errors inject via aer-inject. Before ======= In 010caed4ccb6 ("PCI/AER: Decode Error Source Requester ID"), the "id" field was removed from the AER error logs, so currently AER logs look like: pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03:0 pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID) <--- no id field pcieport 0000:00:03.0: device [1b36:000c] error status/mask=00000040/0000e000 pcieport 0000:00:03.0: [ 6] BadTLP After ====== pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03.0 pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver) pcieport 0000:00:03.0: device [1b36:000c] error status/mask=00000040/0000e000 pcieport 0000:00:03.0: [ 6] BadTLP Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com> --- drivers/pci/pcie/aer.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)