Message ID | 1508254922-30925-1-git-send-email-tbaicar@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
From: Tyler Baicar
> Sent: 17 October 2017 16:42
> Currently the AER driver uses cper_print_bits() to print the AER status
> string. This causes the status string to not include the proper PCI device
> name prefix that the other AER prints include. Also, it has a different
> print level than all the other AER prints.
>
> Update the AER driver to print the AER status string with the proper string
> prefix and proper print level.
>
> Previous log example:
>
> e1000e 0003:01:00.1: aer_status: 0x00000041, aer_mask: 0x00000000
> Receiver Error, Bad TLP
...
> New log:
>
> e1000e 0003:01:00.1: aer_status: 0x00000041, aer_mask: 0x00000000
> e1000e 0003:01:00.1: Receiver Error
> e1000e 0003:01:00.1: Bad TLP

Wouldn't it be better to manage to print the above all on 1 line?

...
> index 54c4b69..b718daa 100644
> --- a/drivers/pci/pcie/aer/aerdrv_errprint.c
> +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
> @@ -206,6 +206,19 @@ void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
>  }
>
>  #ifdef CONFIG_ACPI_APEI_PCIEAER
> +void dev_print_bits(struct pci_dev *dev, unsigned int bits,
> +		    const char * const strs[], unsigned int strs_size)

static and rename to aer_print_bits since this isn't a generic 'dev'
function.

	David
On 10/17/2017 12:00 PM, David Laight wrote:
> From: Tyler Baicar
>> Sent: 17 October 2017 16:42
>> Currently the AER driver uses cper_print_bits() to print the AER status
>> string. This causes the status string to not include the proper PCI device
>> name prefix that the other AER prints include. Also, it has a different
>> print level than all the other AER prints.
>>
>> Update the AER driver to print the AER status string with the proper string
>> prefix and proper print level.
>>
>> Previous log example:
>>
>> e1000e 0003:01:00.1: aer_status: 0x00000041, aer_mask: 0x00000000
>> Receiver Error, Bad TLP
> ...
>> New log:
>>
>> e1000e 0003:01:00.1: aer_status: 0x00000041, aer_mask: 0x00000000
>> e1000e 0003:01:00.1: Receiver Error
>> e1000e 0003:01:00.1: Bad TLP
> Wouldn't it be better to manage to print the above all on 1 line?

Hello David,

I broke them up into separate lines to simplify the code. If you look at
cper_print_bits(), it is not a clean solution and involves some hard coded
values to try to limit the lines to 80 characters.

http://elixir.free-electrons.com/linux/v4.14-rc5/source/drivers/firmware/efi/cper.c#L85

I think printing one error per line in this case is a better solution since
the code is much cleaner. If you would like me to add this code to print them
in a list and limit the lines to 80 characters I can add that in though.

>
> ...
>> index 54c4b69..b718daa 100644
>> --- a/drivers/pci/pcie/aer/aerdrv_errprint.c
>> +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
>> @@ -206,6 +206,19 @@ void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
>>  }
>>
>>  #ifdef CONFIG_ACPI_APEI_PCIEAER
>> +void dev_print_bits(struct pci_dev *dev, unsigned int bits,
>> +		    const char * const strs[], unsigned int strs_size)
> static and rename to aer_print_bits since this isn't a generic 'dev'
> function.

Will do.

Thanks,
Tyler

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
From: Tyler Baicar [mailto:tbaicar@codeaurora.org]
> Sent: 17 October 2017 18:14
> On 10/17/2017 12:00 PM, David Laight wrote:
> > From: Tyler Baicar
> >> Sent: 17 October 2017 16:42
> >> Currently the AER driver uses cper_print_bits() to print the AER status
> >> string. This causes the status string to not include the proper PCI device
> >> name prefix that the other AER prints include. Also, it has a different
> >> print level than all the other AER prints.
> >>
> >> Update the AER driver to print the AER status string with the proper string
> >> prefix and proper print level.
> >>
> >> Previous log example:
> >>
> >> e1000e 0003:01:00.1: aer_status: 0x00000041, aer_mask: 0x00000000
> >> Receiver Error, Bad TLP
> > ...
> >> New log:
> >>
> >> e1000e 0003:01:00.1: aer_status: 0x00000041, aer_mask: 0x00000000
> >> e1000e 0003:01:00.1: Receiver Error
> >> e1000e 0003:01:00.1: Bad TLP
> > Wouldn't it be better to manage to print the above all on 1 line?

> I broke them up into separate lines to simplify the code. If you look at
> cper_print_bits(),
> it is not a clean solution and involves some hard coded values to try to limit
> the lines to 80 characters.

I'm not sure the 80 char limit is needed.


How about:
#define MAX_STR 32
void pr_bits(unsigned int val, const char *strs[], unsigned int num_str)
{
	const char *str[MAX_STR] = {};
	unsigned int i, num;

	if (num_str > MAX_STR)
		num_str = MAX_STR;
	for (i = 0, num = 0; i < num_str; i++) {
		if (!(val & (1 << i)))
			continue;
		str[num++] = strs[i];
	}
	printf(" %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s\n" + (MAX_STR - num) * 3,
		str[0], str[1], str[2], str[3],
		str[4], str[5], str[6], str[7],
		str[8], str[9], str[10], str[11],
		str[12], str[13], str[14], str[15],
		str[16], str[17], str[18], str[19],
		str[20], str[21], str[22], str[23],
		str[24], str[25], str[26], str[27],
		str[28], str[29], str[30], str[31]);
}

For kernel use you'd probably want to pass in 'dev' and a printf list
and use %pV to put the fixed text on the front of the line.

All rather begging for a new %p? feature that is passed the value, strings
and separator.

	David
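The format-offset trick in pr_bits() is easy to misread, so here is a minimal userspace sketch of the same idea with four slots instead of 32. It assumes nothing beyond standard C; the name fmt_bits(), the buffer-based interface, and the NULL-name guard are illustrative additions, not part of the proposal. Each " %s" is exactly three characters, so skipping (MAX_STR - num) * 3 characters leaves a format with exactly num conversions; the unused trailing arguments are evaluated but never read by snprintf().

```c
#include <stdio.h>

#define MAX_STR 4	/* illustrative; the proposal above uses 32 */

/*
 * Collect the names of the set bits, then print them with one call by
 * starting the format string late enough that only num " %s" remain.
 * Returns the snprintf() result.
 */
static int fmt_bits(char *buf, size_t len, unsigned int val,
		    const char *const strs[], unsigned int num_str)
{
	static const char fmt[] = " %s %s %s %s";	/* MAX_STR slots */
	const char *str[MAX_STR] = { NULL };
	unsigned int i, num = 0;

	if (num_str > MAX_STR)
		num_str = MAX_STR;
	for (i = 0; i < num_str; i++) {
		/* skip clear bits and (added guard) bits without a name */
		if (!(val & (1U << i)) || !strs[i])
			continue;
		str[num++] = strs[i];
	}
	/* each " %s" is 3 chars: drop the (MAX_STR - num) leading slots */
	return snprintf(buf, len, &fmt[(MAX_STR - num) * 3],
			str[0], str[1], str[2], str[3]);
}
```

With names { "A", "B", "C", "D" } and val = 0x5, two slots survive and the buffer holds " A C". The cost of the trick is visible even at this size: the format string and the argument list both have to be written out once per possible bit.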
On 10/18/2017 6:14 AM, David Laight wrote:
> From: Tyler Baicar [mailto:tbaicar@codeaurora.org]
>> Sent: 17 October 2017 18:14
>> On 10/17/2017 12:00 PM, David Laight wrote:
>>> From: Tyler Baicar
>>>> Sent: 17 October 2017 16:42
>>>> Currently the AER driver uses cper_print_bits() to print the AER status
>>>> string. This causes the status string to not include the proper PCI device
>>>> name prefix that the other AER prints include. Also, it has a different
>>>> print level than all the other AER prints.
>>>>
>>>> Update the AER driver to print the AER status string with the proper string
>>>> prefix and proper print level.
>>>>
>>>> Previous log example:
>>>>
>>>> e1000e 0003:01:00.1: aer_status: 0x00000041, aer_mask: 0x00000000
>>>> Receiver Error, Bad TLP
>>> ...
>>>> New log:
>>>>
>>>> e1000e 0003:01:00.1: aer_status: 0x00000041, aer_mask: 0x00000000
>>>> e1000e 0003:01:00.1: Receiver Error
>>>> e1000e 0003:01:00.1: Bad TLP
>>> Wouldn't it be better to manage to print the above all on 1 line?
>
>> I broke them up into separate lines to simplify the code. If you look at
>> cper_print_bits(),
>> it is not a clean solution and involves some hard coded values to try to limit
>> the lines to 80 characters.
> I'm not sure the 80 char limit is needed.
>
>
> How about:
> #define MAX_STR 32
> void pr_bits(unsigned int val, const char *strs[], unsigned int num_str)
> {
> 	const char *str[MAX_STR] = {};
> 	unsigned int i, num;
>
> 	if (num_str > MAX_STR)
> 		num_str = MAX_STR;
> 	for (i = 0, num = 0; i < num_str; i++) {
> 		if (!(val & (1 << i)))
> 			continue;
> 		str[num++] = strs[i];
> 	}
> 	printf(" %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s\n" + (MAX_STR - num) * 3,
> 		str[0], str[1], str[2], str[3],
> 		str[4], str[5], str[6], str[7],
> 		str[8], str[9], str[10], str[11],
> 		str[12], str[13], str[14], str[15],
> 		str[16], str[17], str[18], str[19],
> 		str[20], str[21], str[22], str[23],
> 		str[24], str[25], str[26], str[27],
> 		str[28], str[29], str[30], str[31]);
> }
>
> For kernel use you'd probably want to pass in 'dev' and a printf list
> and use %pV to put the fixed text on the front of the line.
>
> All rather begging for a new %p? feature that is passed the value, strings
> and separator.

Hi David,

This seems like a bad approach. This can make the print in the kernel logs
and the code both look pretty awful. I would prefer to have each error that
occurred have its own print line in the logs rather than introduce this code
for the sole purpose of keeping the list on a single print line. I don't see
any real downside to having a few additional print lines in error scenarios.

Thanks,
Tyler
On Tue, Oct 17, 2017 at 09:42:02AM -0600, Tyler Baicar wrote:
> Currently the AER driver uses cper_print_bits() to print the AER status
> string. This causes the status string to not include the proper PCI device
> name prefix that the other AER prints include. Also, it has a different
> print level than all the other AER prints.
>
> Update the AER driver to print the AER status string with the proper string
> prefix and proper print level.
>
> Previous log example:
>
> e1000e 0003:01:00.1: aer_status: 0x00000041, aer_mask: 0x00000000
> Receiver Error, Bad TLP
> e1000e 0003:01:00.1: aer_layer=Physical Layer, aer_agent=Receiver ID
> pcieport 0003:00:00.0: aer_status: 0x00001000, aer_mask: 0x0000e000
> Replay Timer Timeout
> pcieport 0003:00:00.0: aer_layer=Data Link Layer, aer_agent=Transmitter ID
>
> New log:
>
> e1000e 0003:01:00.1: aer_status: 0x00000041, aer_mask: 0x00000000
> e1000e 0003:01:00.1: Receiver Error
> e1000e 0003:01:00.1: Bad TLP
> e1000e 0003:01:00.1: aer_layer=Physical Layer, aer_agent=Receiver ID
> pcieport 0003:00:00.0: aer_status: 0x00001000, aer_mask: 0x0000e000
> pcieport 0003:00:00.0: Replay Timer Timeout
> pcieport 0003:00:00.0: aer_layer=Data Link Layer, aer_agent=Transmitter ID

I definitely think it's MUCH better to use dev_err() as you do.

I don't like the cper_print_bits() strategy of inserting line breaks
to fit in 80 columns. That leads to atomicity issues, e.g., other
printk output getting inserted in the middle of a single AER log, and
suggests an ordering ("Receiver Error" occurred before "Bad TLP") that
isn't real. It'd be ideal if everything fit on one line per event,
but that might not be practical.

I'm not necessarily attached to the actual strings. These messages
are for sophisticated users and maybe could be abbreviated as in lspci
output. It might actually be kind of neat if the output here matched
up with the output of "lspci -vv" (lspci prints all the bits; here you
probably want only the set bits). Or maybe not.

But even what you have here is a huge improvement. I *hate*
unattached things in dmesg like we currently get. There's no reliable
way to connect that "Receiver Error, Bad TLP" with the device.

> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
> ---
>  drivers/pci/pcie/aer/aerdrv_errprint.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
> index 54c4b69..b718daa 100644
> --- a/drivers/pci/pcie/aer/aerdrv_errprint.c
> +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
> @@ -206,6 +206,19 @@ void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
>  }
>
>  #ifdef CONFIG_ACPI_APEI_PCIEAER
> +void dev_print_bits(struct pci_dev *dev, unsigned int bits,
> +		    const char * const strs[], unsigned int strs_size)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < strs_size; i++) {
> +		if (!(bits & (1U << i)))
> +			continue;
> +		if (strs[i])
> +			dev_err(&dev->dev, "%s\n", strs[i]);
> +	}
> +}
> +
>  int cper_severity_to_aer(int cper_severity)
>  {
>  	switch (cper_severity) {
> @@ -243,7 +256,7 @@ void cper_print_aer(struct pci_dev *dev, int aer_severity,
>  	agent = AER_GET_AGENT(aer_severity, status);
>
>  	dev_err(&dev->dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask);
> -	cper_print_bits("", status, status_strs, status_strs_size);
> +	dev_print_bits(dev, status, status_strs, status_strs_size);
>  	dev_err(&dev->dev, "aer_layer=%s, aer_agent=%s\n",
>  		aer_error_layer[layer], aer_agent_string[agent]);
>
> --
> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project.
>
On 10/20/2017 7:55 PM, Bjorn Helgaas wrote:
> On Tue, Oct 17, 2017 at 09:42:02AM -0600, Tyler Baicar wrote:
>> Currently the AER driver uses cper_print_bits() to print the AER status
>> string. This causes the status string to not include the proper PCI device
>> name prefix that the other AER prints include. Also, it has a different
>> print level than all the other AER prints.
>>
>> Update the AER driver to print the AER status string with the proper string
>> prefix and proper print level.
>>
>> Previous log example:
>>
>> e1000e 0003:01:00.1: aer_status: 0x00000041, aer_mask: 0x00000000
>> Receiver Error, Bad TLP
>> e1000e 0003:01:00.1: aer_layer=Physical Layer, aer_agent=Receiver ID
>> pcieport 0003:00:00.0: aer_status: 0x00001000, aer_mask: 0x0000e000
>> Replay Timer Timeout
>> pcieport 0003:00:00.0: aer_layer=Data Link Layer, aer_agent=Transmitter ID
>>
>> New log:
>>
>> e1000e 0003:01:00.1: aer_status: 0x00000041, aer_mask: 0x00000000
>> e1000e 0003:01:00.1: Receiver Error
>> e1000e 0003:01:00.1: Bad TLP
>> e1000e 0003:01:00.1: aer_layer=Physical Layer, aer_agent=Receiver ID
>> pcieport 0003:00:00.0: aer_status: 0x00001000, aer_mask: 0x0000e000
>> pcieport 0003:00:00.0: Replay Timer Timeout
>> pcieport 0003:00:00.0: aer_layer=Data Link Layer, aer_agent=Transmitter ID
> I definitely think it's MUCH better to use dev_err() as you do.
>
> I don't like the cper_print_bits() strategy of inserting line breaks
> to fit in 80 columns. That leads to atomicity issues, e.g., other
> printk output getting inserted in the middle of a single AER log, and
> suggests an ordering ("Receiver Error" occurred before "Bad TLP") that
> isn't real. It'd be ideal if everything fit on one line per event,
> but that might not be practical.
>
> I'm not necessarily attached to the actual strings. These messages
> are for sophisticated users and maybe could be abbreviated as in lspci
> output. It might actually be kind of neat if the output here matched
> up with the output of "lspci -vv" (lspci prints all the bits; here you
> probably want only the set bits). Or maybe not.
>
> But even what you have here is a huge improvement. I *hate*
> unattached things in dmesg like we currently get. There's no reliable
> way to connect that "Receiver Error, Bad TLP" with the device.

Hello Bjorn,

Thanks for the feedback. Do you think this can get into 4.15?

Thanks,
Tyler

>> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
>> ---
>>  drivers/pci/pcie/aer/aerdrv_errprint.c | 15 ++++++++++++++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
>> index 54c4b69..b718daa 100644
>> --- a/drivers/pci/pcie/aer/aerdrv_errprint.c
>> +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
>> @@ -206,6 +206,19 @@ void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
>>  }
>>
>>  #ifdef CONFIG_ACPI_APEI_PCIEAER
>> +void dev_print_bits(struct pci_dev *dev, unsigned int bits,
>> +		    const char * const strs[], unsigned int strs_size)
>> +{
>> +	unsigned int i;
>> +
>> +	for (i = 0; i < strs_size; i++) {
>> +		if (!(bits & (1U << i)))
>> +			continue;
>> +		if (strs[i])
>> +			dev_err(&dev->dev, "%s\n", strs[i]);
>> +	}
>> +}
>> +
>>  int cper_severity_to_aer(int cper_severity)
>>  {
>>  	switch (cper_severity) {
>> @@ -243,7 +256,7 @@ void cper_print_aer(struct pci_dev *dev, int aer_severity,
>>  	agent = AER_GET_AGENT(aer_severity, status);
>>
>>  	dev_err(&dev->dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask);
>> -	cper_print_bits("", status, status_strs, status_strs_size);
>> +	dev_print_bits(dev, status, status_strs, status_strs_size);
>>  	dev_err(&dev->dev, "aer_layer=%s, aer_agent=%s\n",
>>  		aer_error_layer[layer], aer_agent_string[agent]);
>>
>> --
>> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
>> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
>> a Linux Foundation Collaborative Project.
>>
On 10/17/2017 11:42 AM, Tyler Baicar wrote:
> Currently the AER driver uses cper_print_bits() to print the AER status
> string. This causes the status string to not include the proper PCI device
> name prefix that the other AER prints include. Also, it has a different
> print level than all the other AER prints.
>
> Update the AER driver to print the AER status string with the proper string
> prefix and proper print level.

Hello,

Will this patch be pulled into 4.15?

Thanks,
Tyler
Hi Tyler,

On Wed, Nov 15, 2017 at 09:47:41AM -0500, Tyler Baicar wrote:
> On 10/17/2017 11:42 AM, Tyler Baicar wrote:
> >Currently the AER driver uses cper_print_bits() to print the AER status
> >string. This causes the status string to not include the proper PCI device
> >name prefix that the other AER prints include. Also, it has a different
> >print level than all the other AER prints.
> >
> >Update the AER driver to print the AER status string with the proper string
> >prefix and proper print level.
> Hello,
>
> Will this patch be pulled into 4.15?

Sorry, I am preparing the 4.15 pull request right now, and it doesn't
include this change.

I do like the dev_err() change, but would prefer fewer lines of
output. I could have applied just the dev_err() change, but to
minimize pain for people who parse the logs, I'd rather make one
change in the output instead of making one change now and another
later.

Bjorn
On 11/15/2017 12:56 PM, Bjorn Helgaas wrote:
> Hi Tyler,
>
> On Wed, Nov 15, 2017 at 09:47:41AM -0500, Tyler Baicar wrote:
>> On 10/17/2017 11:42 AM, Tyler Baicar wrote:
>>> Currently the AER driver uses cper_print_bits() to print the AER status
>>> string. This causes the status string to not include the proper PCI device
>>> name prefix that the other AER prints include. Also, it has a different
>>> print level than all the other AER prints.
>>>
>>> Update the AER driver to print the AER status string with the proper string
>>> prefix and proper print level.
>> Hello,
>>
>> Will this patch be pulled into 4.15?
> Sorry, I am preparing the 4.15 pull request right now, and it doesn't
> include this change.
>
> I do like the dev_err() change, but would prefer fewer lines of
> output. I could have applied just the dev_err() change, but to
> minimize pain for people who parse the logs, I'd rather make one
> change in the output instead of making one change now and another
> later.

Hello Bjorn,

Are there existing abbreviations for these AER status strings that I
cannot find? Or do you want me to abbreviate them similar to the style
used with prints in lspci -vv?

Once they are abbreviated, you'd prefer to have all errors that have
occurred to be printed on the same line, correct?

Thanks,
Tyler
On Wed, Dec 13, 2017 at 11:50:56AM -0500, Tyler Baicar wrote:
> On 11/15/2017 12:56 PM, Bjorn Helgaas wrote:
> >Hi Tyler,
> >
> >On Wed, Nov 15, 2017 at 09:47:41AM -0500, Tyler Baicar wrote:
> >>On 10/17/2017 11:42 AM, Tyler Baicar wrote:
> >>>Currently the AER driver uses cper_print_bits() to print the AER status
> >>>string. This causes the status string to not include the proper PCI device
> >>>name prefix that the other AER prints include. Also, it has a different
> >>>print level than all the other AER prints.
> >>>
> >>>Update the AER driver to print the AER status string with the proper string
> >>>prefix and proper print level.
> >>Hello,
> >>
> >>Will this patch be pulled into 4.15?
> >Sorry, I am preparing the 4.15 pull request right now, and it doesn't
> >include this change.
> >
> >I do like the dev_err() change, but would prefer fewer lines of
> >output. I could have applied just the dev_err() change, but to
> >minimize pain for people who parse the logs, I'd rather make one
> >change in the output instead of making one change now and another
> >later.
> Hello Bjorn,
>
> Are there existing abbreviations for these AER status strings that I
> cannot find? Or do you want
> me to abbreviate them similar to the style used with prints in lspci -vv?

I think the terms used by lspci -vv would be a good start.

> Once they are abbreviated, you'd prefer to have all errors that have
> occurred to be printed on
> the same line, correct?

Yes. Multiple lines suggests an ordering that really isn't there, so
if we can print them all at once, it both improves atomicity and
removes the erroneous suggestion that "this error occurred before this
other one".

Bjorn
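A minimal userspace sketch of the single-line format discussed above: collect the names of all set bits into one buffer, then emit them with a single print so the event cannot be interleaved with other output. The abbreviations are an assumption modeled on the correctable-status flag names printed by lspci -vv, and format_status(), print_status() and the bracketed output layout are hypothetical. In the driver, the final printf() would presumably become one dev_err() call.

```c
#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Assumed abbreviations in the style of lspci -vv; bit positions follow the
 * PCIe Correctable Error Status register. Not taken from the patch.
 */
static const char *const cor_abbrev[] = {
	[0]  = "RxErr",
	[6]  = "BadTLP",
	[7]  = "BadDLLP",
	[8]  = "Rollover",
	[12] = "Timeout",
	[13] = "NonFatalErr",
};

/*
 * Join the names of all set bits into buf, separated by single spaces.
 * Returns the number of characters written (excluding the NUL). A name that
 * does not fit is dropped whole rather than truncated.
 */
static size_t format_status(char *buf, size_t len, unsigned int status,
			    const char *const strs[], size_t nstrs)
{
	size_t used = 0, i;

	if (len == 0)
		return 0;
	buf[0] = '\0';
	for (i = 0; i < nstrs && i < 32; i++) {
		int n;

		if (!(status & (1U << i)) || !strs[i])
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", strs[i]);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* undo the partial write */
			break;
		}
		used += (size_t)n;
	}
	return used;
}

/* Userspace stand-in for one dev_err() call: one line per event. */
static void print_status(const char *dev_name, unsigned int status)
{
	char line[128];

	format_status(line, sizeof(line), status, cor_abbrev,
		      ARRAY_SIZE(cor_abbrev));
	printf("%s: aer_status: 0x%08x [%s]\n", dev_name, status, line);
}
```

For the status value 0x00000041 from the example logs, bits 0 and 6 are set, so the joined string is "RxErr BadTLP". Building the line in a buffer first keeps the code short without the hard-coded 80-column wrapping that cper_print_bits() uses.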
diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
index 54c4b69..b718daa 100644
--- a/drivers/pci/pcie/aer/aerdrv_errprint.c
+++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
@@ -206,6 +206,19 @@ void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
 }
 
 #ifdef CONFIG_ACPI_APEI_PCIEAER
+void dev_print_bits(struct pci_dev *dev, unsigned int bits,
+		    const char * const strs[], unsigned int strs_size)
+{
+	unsigned int i;
+
+	for (i = 0; i < strs_size; i++) {
+		if (!(bits & (1U << i)))
+			continue;
+		if (strs[i])
+			dev_err(&dev->dev, "%s\n", strs[i]);
+	}
+}
+
 int cper_severity_to_aer(int cper_severity)
 {
 	switch (cper_severity) {
@@ -243,7 +256,7 @@ void cper_print_aer(struct pci_dev *dev, int aer_severity,
 	agent = AER_GET_AGENT(aer_severity, status);
 
 	dev_err(&dev->dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask);
-	cper_print_bits("", status, status_strs, status_strs_size);
+	dev_print_bits(dev, status, status_strs, status_strs_size);
 	dev_err(&dev->dev, "aer_layer=%s, aer_agent=%s\n",
 		aer_error_layer[layer], aer_agent_string[agent]);
 
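The loop added by the patch can be tried outside the kernel with a small harness. This is a sketch under stated assumptions: print_bits() mirrors dev_print_bits() but writes to a FILE * with a caller-supplied prefix instead of calling dev_err(), returns the number of lines printed so it can be checked, and adds an i < 32 bound so the shift stays defined for any table size. The three-entry string table is an illustrative subset matching the example logs, not the driver's full table.

```c
#include <stdio.h>

/* Illustrative subset of status names, indexed by bit position. */
static const char *const status_strs[] = {
	[0]  = "Receiver Error",
	[6]  = "Bad TLP",
	[12] = "Replay Timer Timeout",
};
#define STATUS_STRS_SIZE (sizeof(status_strs) / sizeof(status_strs[0]))

/*
 * Print one prefixed line per set bit that has a name, in the manner of
 * dev_print_bits() in the patch. Returns the number of lines printed.
 */
static unsigned int print_bits(FILE *out, const char *prefix,
			       unsigned int bits, const char *const strs[],
			       unsigned int strs_size)
{
	unsigned int i, printed = 0;

	for (i = 0; i < strs_size && i < 32; i++) {
		if (!(bits & (1U << i)))
			continue;
		if (strs[i]) {
			/* stand-in for dev_err(&dev->dev, "%s\n", strs[i]) */
			fprintf(out, "%s: %s\n", prefix, strs[i]);
			printed++;
		}
	}
	return printed;
}
```

Calling print_bits(stdout, "e1000e 0003:01:00.1", 0x41, status_strs, STATUS_STRS_SIZE) prints the "Receiver Error" and "Bad TLP" lines from the new log; a set bit with no table entry, such as bit 1, prints nothing.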
Currently the AER driver uses cper_print_bits() to print the AER status
string. This causes the status string to not include the proper PCI device
name prefix that the other AER prints include. Also, it has a different
print level than all the other AER prints.

Update the AER driver to print the AER status string with the proper string
prefix and proper print level.

Previous log example:

e1000e 0003:01:00.1: aer_status: 0x00000041, aer_mask: 0x00000000
Receiver Error, Bad TLP
e1000e 0003:01:00.1: aer_layer=Physical Layer, aer_agent=Receiver ID
pcieport 0003:00:00.0: aer_status: 0x00001000, aer_mask: 0x0000e000
Replay Timer Timeout
pcieport 0003:00:00.0: aer_layer=Data Link Layer, aer_agent=Transmitter ID

New log:

e1000e 0003:01:00.1: aer_status: 0x00000041, aer_mask: 0x00000000
e1000e 0003:01:00.1: Receiver Error
e1000e 0003:01:00.1: Bad TLP
e1000e 0003:01:00.1: aer_layer=Physical Layer, aer_agent=Receiver ID
pcieport 0003:00:00.0: aer_status: 0x00001000, aer_mask: 0x0000e000
pcieport 0003:00:00.0: Replay Timer Timeout
pcieport 0003:00:00.0: aer_layer=Data Link Layer, aer_agent=Transmitter ID

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
---
 drivers/pci/pcie/aer/aerdrv_errprint.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)