Message ID | 20180608144617.2900894-1-arnd@arndb.de (mailing list archive) |
---|---|
State | Accepted |
+Will,

On 6/8/2018 10:46 AM, Arnd Bergmann wrote:
> Replacing writeq() with writeq_relaxed() doesn't work on many architectures,
> as that variant is not available in general:
>
> net/Makefile:24: CC cannot link executables. Skipping bpfilter.
> drivers/scsi/ipr.c: In function 'ipr_mask_and_clear_interrupts':
> drivers/scsi/ipr.c:767:3: error: implicit declaration of function 'writeq_relaxed'; did you mean 'writew_relaxed'? [-Werror=implicit-function-declaration]
>    writeq_relaxed(~0, ioa_cfg->regs.set_interrupt_mask_reg);
>    ^~~~~~~~~~~~~~
>    writew_relaxed
>
> The other issue here is that the patch eliminated the wrong barrier.
> As per a long discussion that followed Sinan's original patch submission,
> the conclusion was that drivers should generally assume that the barrier
> implied by writel() is sufficient for ordering DMA, so this reverts his
> change and instead removes the extraneous wmb() before it, which is no
> longer needed on any architecture now.
>
> Fixes: 0109a4f2e02d ("scsi: ipr: Eliminate duplicate barriers on weakly-ordered archs")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

This looks good on paper; however, we need input from the driver maintainer,
because some drivers, like the Intel NIC drivers, use write barriers in place
of an SMP barrier + write barrier combination as an optimization.

Removing the barrier itself can actually break the driver if an SMP barrier is
actually needed instead.

So, it is difficult to judge how this barrier has been used without an
expert opinion.

Changing

wmb() + writel()

to

wmb() + writel_relaxed()

is safer than dropping the wmb() altogether.

Will Deacon should probably look at why writeq_relaxed is missing on some ARM
arches too.

Drivers shouldn't worry about write derivatives.
On Fri, Jun 8, 2018 at 5:27 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> +Will,
>
> On 6/8/2018 10:46 AM, Arnd Bergmann wrote:
>> Replacing writeq() with writeq_relaxed() doesn't work on many architectures,
>> as that variant is not available in general:
>>
>> net/Makefile:24: CC cannot link executables. Skipping bpfilter.
>> drivers/scsi/ipr.c: In function 'ipr_mask_and_clear_interrupts':
>> drivers/scsi/ipr.c:767:3: error: implicit declaration of function 'writeq_relaxed'; did you mean 'writew_relaxed'? [-Werror=implicit-function-declaration]
>>    writeq_relaxed(~0, ioa_cfg->regs.set_interrupt_mask_reg);
>>    ^~~~~~~~~~~~~~
>>    writew_relaxed
>>
>> The other issue here is that the patch eliminated the wrong barrier.
>> As per a long discussion that followed Sinan's original patch submission,
>> the conclusion was that drivers should generally assume that the barrier
>> implied by writel() is sufficient for ordering DMA, so this reverts his
>> change and instead removes the extraneous wmb() before it, which is no
>> longer needed on any architecture now.
>>
>> Fixes: 0109a4f2e02d ("scsi: ipr: Eliminate duplicate barriers on weakly-ordered archs")
>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>
> This looks good on paper; however, we need input from the driver maintainer,
> because some drivers, like the Intel NIC drivers, use write barriers in place
> of an SMP barrier + write barrier combination as an optimization.
>
> Removing the barrier itself can actually break the driver if an SMP barrier is
> actually needed instead.
>
> So, it is difficult to judge how this barrier has been used without an
> expert opinion.
>
> Changing
>
> wmb() + writel()
>
> to
>
> wmb() + writel_relaxed()
>
> is safer than dropping the wmb() altogether.

If the wmb() was not just about the writeq() then I would argue your patch
description was misleading. We certainly shouldn't replace random writeq()
calls with writeq_relaxed() just because we can show that the driver has
a barrier in front of it.

In particular, the ipr_mask_and_clear_interrupts() function has multiple
writeq() or writel() calls, and even a readl(), and your patch only changes
one of them, which seems like a rather pointless exercise as the function
still fully synchronizes the I/O multiple times.

> Will Deacon should probably look at why writeq_relaxed is missing on some ARM
> arches too.
>
> Drivers shouldn't worry about write derivatives.

This driver defines writeq() itself for architectures that don't have it, but
it doesn't define writeq_relaxed() and doesn't include
linux/io-64-nonatomic-lo-hi.h or linux/io-64-nonatomic-hi-lo.h. It seems that
it needs a different behavior from all other drivers here, storing the upper
32 bits into the lower address and the lower 32 bits into the upper address.

Arnd
On 6/8/2018 11:47 AM, Arnd Bergmann wrote:
> On Fri, Jun 8, 2018 at 5:27 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
>> +Will,
>>

[snip]

>> So, it is difficult to judge how this barrier has been used without an
>> expert opinion.
>>
>> Changing
>>
>> wmb() + writel()
>>
>> to
>>
>> wmb() + writel_relaxed()
>>
>> is safer than dropping the wmb() altogether.
>
> If the wmb() was not just about the writeq() then I would argue your patch
> description was misleading. We certainly shouldn't replace random writeq()
> calls with writeq_relaxed() just because we can show that the driver has
> a barrier in front of it.
>
> In particular, the ipr_mask_and_clear_interrupts() function has multiple
> writeq() or writel() calls, and even a readl(), and your patch only changes
> one of them, which seems like a rather pointless exercise as the function
> still fully synchronizes the I/O multiple times.

You are right, I only searched for wmb() + writel() combinations and changed
only the places where I found issues.

We were given direction to eliminate barriers, as you already noted.

I just wanted to highlight the difficulty of wholesale dropping of wmb()
without careful inspection. [1] [2]

We certainly need a better patch that covers all use cases. Your patch is
a step in the right direction. We just need some attention from the
maintainer to confirm that we are not actually breaking something.

>
>> Will Deacon should probably look at why writeq_relaxed is missing on some ARM
>> arches too.
>>
>> Drivers shouldn't worry about write derivatives.
>
> This driver defines writeq() itself for architectures that don't have it, but
> it doesn't define writeq_relaxed() and doesn't include
> linux/io-64-nonatomic-lo-hi.h or linux/io-64-nonatomic-hi-lo.h. It seems that
> it needs a different behavior from all other drivers here, storing the upper
> 32 bits into the lower address and the lower 32 bits into the upper address.

I don't think there is a consensus about using these includes in the community.
I bumped into this issue before and came up with the include you pointed to.
I didn't get much enthusiasm from the maintainer.

Why are we pushing the responsibility into the drivers? I'd think that the
architecture should take care of this. Is there a portability issue that I'm
missing from some architecture I've never heard of? (I work on little-endian
machines most of the time.)

[1] https://patchwork.kernel.org/patch/10301861/
[2] https://www.mail-archive.com/netdev@vger.kernel.org/msg227443.html
On 06/08/2018 11:10 AM, Sinan Kaya wrote:
> On 6/8/2018 11:47 AM, Arnd Bergmann wrote:
>> On Fri, Jun 8, 2018 at 5:27 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
>>> +Will,
>>>
>
> [snip]
>
>>> So, it is difficult to judge how this barrier has been used without an
>>> expert opinion.
>>>
>>> Changing
>>>
>>> wmb() + writel()
>>>
>>> to
>>>
>>> wmb() + writel_relaxed()
>>>
>>> is safer than dropping the wmb() altogether.
>>
>> If the wmb() was not just about the writeq() then I would argue your patch
>> description was misleading. We certainly shouldn't replace random writeq()
>> calls with writeq_relaxed() just because we can show that the driver has
>> a barrier in front of it.
>>
>> In particular, the ipr_mask_and_clear_interrupts() function has multiple
>> writeq() or writel() calls, and even a readl(), and your patch only changes
>> one of them, which seems like a rather pointless exercise as the function
>> still fully synchronizes the I/O multiple times.
>
> You are right, I only searched for wmb() + writel() combinations and changed
> only the places where I found issues.
>
> We were given direction to eliminate barriers, as you already noted.
>
> I just wanted to highlight the difficulty of wholesale dropping of wmb()
> without careful inspection. [1] [2]
>
> We certainly need a better patch that covers all use cases. Your patch is
> a step in the right direction. We just need some attention from the
> maintainer to confirm that we are not actually breaking something.

To be clear here, we are talking about two code paths that are not in the
performance path, so attempting to performance-optimize them to use
lighter-weight MMIO accessors isn't exactly critical. I'm fine with the
replacement patch from Arnd. Thanks Arnd!

Acked-by: Brian King <brking@linux.vnet.ibm.com>

>
>>
>>> Will Deacon should probably look at why writeq_relaxed is missing on some ARM
>>> arches too.
>>>
>>> Drivers shouldn't worry about write derivatives.
>>
>> This driver defines writeq() itself for architectures that don't have it, but
>> it doesn't define writeq_relaxed() and doesn't include
>> linux/io-64-nonatomic-lo-hi.h or linux/io-64-nonatomic-hi-lo.h. It seems that
>> it needs a different behavior from all other drivers here, storing the upper
>> 32 bits into the lower address and the lower 32 bits into the upper address.
>
> I don't think there is a consensus about using these includes in the community.
> I bumped into this issue before and came up with the include you pointed to.
> I didn't get much enthusiasm from the maintainer.
>
> Why are we pushing the responsibility into the drivers? I'd think that the
> architecture should take care of this. Is there a portability issue that I'm
> missing from some architecture I've never heard of? (I work on little-endian
> machines most of the time.)

The attributes of the adapter hardware can have an impact here. The ipr
hardware, for example, depends on the upper 4 bytes being written first, then
the lower 4 bytes second, and it's the act of writing the lower 4 bytes that
triggers the adapter hardware to read the value and take action on it.

Thanks,

Brian
On 6/8/2018 3:20 PM, Brian King wrote:
>> I don't think there is a consensus about using these includes in the community.
>> I bumped into this issue before and came up with the include you pointed to.
>> I didn't get much enthusiasm from the maintainer.
>>
>> Why are we pushing the responsibility into the drivers? I'd think that the
>> architecture should take care of this. Is there a portability issue that I'm
>> missing from some architecture I've never heard of? (I work on little-endian
>> machines most of the time.)
> The attributes of the adapter hardware can have an impact here. The ipr
> hardware, for example, depends on the upper 4 bytes being written first, then
> the lower 4 bytes second, and it's the act of writing the lower 4 bytes that
> triggers the adapter hardware to read the value and take action on it.

Thanks, I never thought about this.
Arnd,

> the conclusion was that drivers should generally assume that the
> barrier implied by writel() is sufficient for ordering DMA, so this
> reverts his change and instead removes the extraneous wmb() before it,
> which is no longer needed on any architecture now.

Applied to 4.18/scsi-fixes and squashed with Sinan's patch.
diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index 865c07dc11ea..d2f67a41fcdd 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -760,13 +760,12 @@ static void ipr_mask_and_clear_interrupts(struct ipr_ioa_cfg *ioa_cfg,
 		ioa_cfg->hrrq[i].allow_interrupts = 0;
 		spin_unlock(&ioa_cfg->hrrq[i]._lock);
 	}
-	wmb();
 
 	/* Set interrupt mask to stop all new interrupts */
 	if (ioa_cfg->sis64)
-		writeq_relaxed(~0, ioa_cfg->regs.set_interrupt_mask_reg);
+		writeq(~0, ioa_cfg->regs.set_interrupt_mask_reg);
 	else
-		writel_relaxed(~0, ioa_cfg->regs.set_interrupt_mask_reg);
+		writel(~0, ioa_cfg->regs.set_interrupt_mask_reg);
 
 	/* Clear any pending interrupts */
 	if (ioa_cfg->sis64)
@@ -8401,10 +8400,9 @@ static int ipr_reset_enable_ioa(struct ipr_cmnd *ipr_cmd)
 		ioa_cfg->hrrq[i].allow_interrupts = 1;
 		spin_unlock(&ioa_cfg->hrrq[i]._lock);
 	}
-	wmb();
 	if (ioa_cfg->sis64) {
 		/* Set the adapter to the correct endian mode. */
-		writel_relaxed(IPR_ENDIAN_SWAP_KEY,
+		writel(IPR_ENDIAN_SWAP_KEY,
 		       ioa_cfg->regs.endian_swap_reg);
 		int_reg = readl(ioa_cfg->regs.endian_swap_reg);
 	}
Replacing writeq() with writeq_relaxed() doesn't work on many architectures,
as that variant is not available in general:

net/Makefile:24: CC cannot link executables. Skipping bpfilter.
drivers/scsi/ipr.c: In function 'ipr_mask_and_clear_interrupts':
drivers/scsi/ipr.c:767:3: error: implicit declaration of function 'writeq_relaxed'; did you mean 'writew_relaxed'? [-Werror=implicit-function-declaration]
   writeq_relaxed(~0, ioa_cfg->regs.set_interrupt_mask_reg);
   ^~~~~~~~~~~~~~
   writew_relaxed

The other issue here is that the patch eliminated the wrong barrier.
As per a long discussion that followed Sinan's original patch submission,
the conclusion was that drivers should generally assume that the barrier
implied by writel() is sufficient for ordering DMA, so this reverts his
change and instead removes the extraneous wmb() before it, which is no
longer needed on any architecture now.

Fixes: 0109a4f2e02d ("scsi: ipr: Eliminate duplicate barriers on weakly-ordered archs")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/scsi/ipr.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)