Message ID | 1407910961-7798-6-git-send-email-gong.chen@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Wed, Aug 13, 2014 at 02:22:41AM -0400, Chen, Gong wrote: > In PCI-e SPEC r3.0, BIT 0 of Uncorrectable Error Status Register > is redefined and it has an explicit requirement that when writing > this field, a value of 1b is the only choice. So change previous > initial maks from 0 to 1. > > Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> > --- > NOTE: After scratching all use cases, this is the most obvious use > case to violate the SPEC. Most of use cases just read first and > then overwrite for clear purpose. Even so, such fix is obvious to > not compatiable with previous SPEC definition. Do we need a dirty > hack? > > arch/mips/pci/pci-octeon.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/mips/pci/pci-octeon.c b/arch/mips/pci/pci-octeon.c > index 59cccd95688b..f1bfdc201297 100644 > --- a/arch/mips/pci/pci-octeon.c > +++ b/arch/mips/pci/pci-octeon.c > @@ -134,7 +134,7 @@ int pcibios_plat_dev_init(struct pci_dev *dev) > dconfig); > /* Enable reporting of all uncorrectable errors */ > /* Uncorrectable Error Mask - turned on bits disable errors */ > - pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_MASK, 0); > + pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_MASK, 1); I see the text in the spec that says we should only write 1 to bit 0 (sec 7.10.3, for anybody following along). It looks like that change was made between PCIe r1.0 and r1.1. It would really be nice to have more context about why the change was made, because if there's hardware in the field that implements r1.0 behavior, this patch will change the way it works, and I don't know how to verify that is safe. Does this actually change fix a problem? If it fixes a problem that happens on real hardware, that's a much better reason to make a change than just to comply with the spec. Sec 7.10.2 also says we should ignore the value of bit 0 in the Uncorrectable Error Status register, and I don't see any place where we follow that advice. Bjorn > * Leave severity at HW default. This only controls if > * errors are reported as uncorrectable or > -- > 2.0.0.rc2 > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Sep 05, 2014 at 05:34:21PM -0600, Bjorn Helgaas wrote: > Date: Fri, 5 Sep 2014 17:34:21 -0600 > From: Bjorn Helgaas <bhelgaas@google.com> > To: "Chen, Gong" <gong.chen@linux.intel.com> > Cc: rdunlap@infradead.org, bp@alien8.de, tony.luck@intel.com, > linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org > Subject: Re: [RESEND RFC 5/5] PCIe, AER: Update initial value of UC error > mask > User-Agent: Mutt/1.5.21 (2010-09-15) > > On Wed, Aug 13, 2014 at 02:22:41AM -0400, Chen, Gong wrote: > > In PCI-e SPEC r3.0, BIT 0 of Uncorrectable Error Status Register > > is redefined and it has an explicit requirement that when writing > > this field, a value of 1b is the only choice. So change previous > > initial maks from 0 to 1. > > > > Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> > > --- > > NOTE: After scratching all use cases, this is the most obvious use > > case to violate the SPEC. Most of use cases just read first and > > then overwrite for clear purpose. Even so, such fix is obvious to > > not compatiable with previous SPEC definition. Do we need a dirty > > hack? > > > > arch/mips/pci/pci-octeon.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/arch/mips/pci/pci-octeon.c b/arch/mips/pci/pci-octeon.c > > index 59cccd95688b..f1bfdc201297 100644 > > --- a/arch/mips/pci/pci-octeon.c > > +++ b/arch/mips/pci/pci-octeon.c > > @@ -134,7 +134,7 @@ int pcibios_plat_dev_init(struct pci_dev *dev) > > dconfig); > > /* Enable reporting of all uncorrectable errors */ > > /* Uncorrectable Error Mask - turned on bits disable errors */ > > - pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_MASK, 0); > > + pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_MASK, 1); > > I see the text in the spec that says we should only write 1 to bit 0 (sec > 7.10.3, for anybody following along). It looks like that change was made > between PCIe r1.0 and r1.1. It would really be nice to have more context > about why the change was made, because if there's hardware in the field > that implements r1.0 behavior, this patch will change the way it works, and > I don't know how to verify that is safe. > > Does this actually change fix a problem? If it fixes a problem that > happens on real hardware, that's a much better reason to make a change than > just to comply with the spec. > > Sec 7.10.2 also says we should ignore the value of bit 0 in the > Uncorrectable Error Status register, and I don't see any place where we > follow that advice. > That's why I mark this patch as RFC. As you mentioned above, these are my concerns, too. I submit such a patch not for merging but throwing a potential issue. As I noted above, I don't know if it is deserved to fix all affected placed to comply with spec change. After all, no one reports such an issue (or maybe have happened :-))
diff --git a/arch/mips/pci/pci-octeon.c b/arch/mips/pci/pci-octeon.c index 59cccd95688b..f1bfdc201297 100644 --- a/arch/mips/pci/pci-octeon.c +++ b/arch/mips/pci/pci-octeon.c @@ -134,7 +134,7 @@ int pcibios_plat_dev_init(struct pci_dev *dev) dconfig); /* Enable reporting of all uncorrectable errors */ /* Uncorrectable Error Mask - turned on bits disable errors */ - pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_MASK, 0); + pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_MASK, 1); /* * Leave severity at HW default. This only controls if * errors are reported as uncorrectable or
In PCI-e SPEC r3.0, BIT 0 of Uncorrectable Error Status Register is redefined and it has an explicit requirement that when writing this field, a value of 1b is the only choice. So change previous initial maks from 0 to 1. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> --- NOTE: After scratching all use cases, this is the most obvious use case to violate the SPEC. Most of use cases just read first and then overwrite for clear purpose. Even so, such fix is obvious to not compatiable with previous SPEC definition. Do we need a dirty hack? arch/mips/pci/pci-octeon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)