diff mbox

[RESEND,RFC,5/5] PCIe, AER: Update initial value of UC error mask

Message ID 1407910961-7798-6-git-send-email-gong.chen@linux.intel.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Chen Gong Aug. 13, 2014, 6:22 a.m. UTC
In PCI-e SPEC r3.0, BIT 0 of Uncorrectable Error Status Register
is redefined and it has an explicit requirement that when writing
this field, a value of 1b is the only choice. So change previous
initial maks from 0 to 1.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
---
NOTE: After scratching all use cases, this is the most obvious use
case to violate the SPEC. Most of use cases just read first and
then overwrite for clear purpose. Even so, such fix is obvious to
not compatiable with previous SPEC definition. Do we need a dirty
hack?

 arch/mips/pci/pci-octeon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Bjorn Helgaas Sept. 5, 2014, 11:34 p.m. UTC | #1
On Wed, Aug 13, 2014 at 02:22:41AM -0400, Chen, Gong wrote:
> In PCI-e SPEC r3.0, BIT 0 of Uncorrectable Error Status Register
> is redefined and it has an explicit requirement that when writing
> this field, a value of 1b is the only choice. So change previous
> initial maks from 0 to 1.
> 
> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
> ---
> NOTE: After scratching all use cases, this is the most obvious use
> case to violate the SPEC. Most of use cases just read first and
> then overwrite for clear purpose. Even so, such fix is obvious to
> not compatiable with previous SPEC definition. Do we need a dirty
> hack?
> 
>  arch/mips/pci/pci-octeon.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/mips/pci/pci-octeon.c b/arch/mips/pci/pci-octeon.c
> index 59cccd95688b..f1bfdc201297 100644
> --- a/arch/mips/pci/pci-octeon.c
> +++ b/arch/mips/pci/pci-octeon.c
> @@ -134,7 +134,7 @@ int pcibios_plat_dev_init(struct pci_dev *dev)
>  				       dconfig);
>  		/* Enable reporting of all uncorrectable errors */
>  		/* Uncorrectable Error Mask - turned on bits disable errors */
> -		pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_MASK, 0);
> +		pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_MASK, 1);

I see the text in the spec that says we should only write 1 to bit 0 (sec
7.10.3, for anybody following along).  It looks like that change was made
between PCIe r1.0 and r1.1.  It would really be nice to have more context
about why the change was made, because if there's hardware in the field
that implements r1.0 behavior, this patch will change the way it works, and
I don't know how to verify that is safe.

Does this actually change fix a problem?  If it fixes a problem that
happens on real hardware, that's a much better reason to make a change than
just to comply with the spec.

Sec 7.10.2 also says we should ignore the value of bit 0 in the
Uncorrectable Error Status register, and I don't see any place where we
follow that advice.

Bjorn

>  		 * Leave severity at HW default. This only controls if
>  		 * errors are reported as uncorrectable or
> -- 
> 2.0.0.rc2
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chen Gong Sept. 9, 2014, 7:12 a.m. UTC | #2
On Fri, Sep 05, 2014 at 05:34:21PM -0600, Bjorn Helgaas wrote:
> Date: Fri, 5 Sep 2014 17:34:21 -0600
> From: Bjorn Helgaas <bhelgaas@google.com>
> To: "Chen, Gong" <gong.chen@linux.intel.com>
> Cc: rdunlap@infradead.org, bp@alien8.de, tony.luck@intel.com,
>  linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
> Subject: Re: [RESEND RFC 5/5] PCIe, AER: Update initial value of UC error
>  mask
> User-Agent: Mutt/1.5.21 (2010-09-15)
> 
> On Wed, Aug 13, 2014 at 02:22:41AM -0400, Chen, Gong wrote:
> > In PCI-e SPEC r3.0, BIT 0 of Uncorrectable Error Status Register
> > is redefined and it has an explicit requirement that when writing
> > this field, a value of 1b is the only choice. So change previous
> > initial maks from 0 to 1.
> > 
> > Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
> > ---
> > NOTE: After scratching all use cases, this is the most obvious use
> > case to violate the SPEC. Most of use cases just read first and
> > then overwrite for clear purpose. Even so, such fix is obvious to
> > not compatiable with previous SPEC definition. Do we need a dirty
> > hack?
> > 
> >  arch/mips/pci/pci-octeon.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/mips/pci/pci-octeon.c b/arch/mips/pci/pci-octeon.c
> > index 59cccd95688b..f1bfdc201297 100644
> > --- a/arch/mips/pci/pci-octeon.c
> > +++ b/arch/mips/pci/pci-octeon.c
> > @@ -134,7 +134,7 @@ int pcibios_plat_dev_init(struct pci_dev *dev)
> >  				       dconfig);
> >  		/* Enable reporting of all uncorrectable errors */
> >  		/* Uncorrectable Error Mask - turned on bits disable errors */
> > -		pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_MASK, 0);
> > +		pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_MASK, 1);
> 
> I see the text in the spec that says we should only write 1 to bit 0 (sec
> 7.10.3, for anybody following along).  It looks like that change was made
> between PCIe r1.0 and r1.1.  It would really be nice to have more context
> about why the change was made, because if there's hardware in the field
> that implements r1.0 behavior, this patch will change the way it works, and
> I don't know how to verify that is safe.
> 
> Does this actually change fix a problem?  If it fixes a problem that
> happens on real hardware, that's a much better reason to make a change than
> just to comply with the spec.
> 
> Sec 7.10.2 also says we should ignore the value of bit 0 in the
> Uncorrectable Error Status register, and I don't see any place where we
> follow that advice.
> 
That's why I mark this patch as RFC. As you mentioned above, these are my
concerns, too. I submit such a patch not for merging but throwing a potential
issue. As I noted above, I don't know if it is deserved to fix all affected
placed to comply with spec change. After all, no one reports such an
issue (or maybe have happened :-))
diff mbox

Patch

diff --git a/arch/mips/pci/pci-octeon.c b/arch/mips/pci/pci-octeon.c
index 59cccd95688b..f1bfdc201297 100644
--- a/arch/mips/pci/pci-octeon.c
+++ b/arch/mips/pci/pci-octeon.c
@@ -134,7 +134,7 @@  int pcibios_plat_dev_init(struct pci_dev *dev)
 				       dconfig);
 		/* Enable reporting of all uncorrectable errors */
 		/* Uncorrectable Error Mask - turned on bits disable errors */
-		pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_MASK, 0);
+		pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_MASK, 1);
 		/*
 		 * Leave severity at HW default. This only controls if
 		 * errors are reported as uncorrectable or