
scsi: ipr: fix build on 32-bit architectures

Message ID 20180608144617.2900894-1-arnd@arndb.de (mailing list archive)
State Accepted

Commit Message

Arnd Bergmann June 8, 2018, 2:46 p.m. UTC
Replacing writeq() with writeq_relaxed() doesn't work on many architectures,
as that variant is not available in general:

net/Makefile:24: CC cannot link executables. Skipping bpfilter.
drivers/scsi/ipr.c: In function 'ipr_mask_and_clear_interrupts':
drivers/scsi/ipr.c:767:3: error: implicit declaration of function 'writeq_relaxed'; did you mean 'writew_relaxed'? [-Werror=implicit-function-declaration]
   writeq_relaxed(~0, ioa_cfg->regs.set_interrupt_mask_reg);
   ^~~~~~~~~~~~~~
   writew_relaxed

The other issue here is that the patch eliminated the wrong barrier.
As per a long discussion that followed Sinan's original patch submission,
the conclusion was that drivers should generally assume that the barrier
implied by writel() is sufficient for ordering DMA, so this reverts his
change and instead removes the extraneous wmb() before it, which is no
longer needed on any architecture now.

Fixes: 0109a4f2e02d ("scsi: ipr: Eliminate duplicate barriers on weakly-ordered archs")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/scsi/ipr.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

Comments

Sinan Kaya June 8, 2018, 3:27 p.m. UTC | #1
+Will,

On 6/8/2018 10:46 AM, Arnd Bergmann wrote:
> [snip]

This looks good on paper; however, we need input from the driver maintainer
because some drivers, like the Intel NIC drivers, use write barriers in place of
an SMP barrier + write barrier combination as an optimization.

Removing the barrier altogether can break the driver if an SMP barrier is
actually needed instead.

So, it is difficult to judge how this barrier has been used without an
expert opinion.

Changing 

wmb() + writel()

to 

wmb() + writel_relaxed()

is safer than dropping the wmb() altogether.

Will Deacon should probably look at why writeq_relaxed is missing on some ARM
arches too.

Drivers shouldn't worry about write derivatives.
Arnd Bergmann June 8, 2018, 3:47 p.m. UTC | #2
On Fri, Jun 8, 2018 at 5:27 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> +Will,
>
> [snip]
>
> This looks good on paper; however, we need input from the driver maintainer
> because some drivers, like the Intel NIC drivers, use write barriers in place of
> an SMP barrier + write barrier combination as an optimization.
>
> Removing the barrier altogether can break the driver if an SMP barrier is
> actually needed instead.
>
> So, it is difficult to judge how this barrier has been used without an
> expert opinion.
>
> Changing
>
> wmb() + writel()
>
> to
>
> wmb() + writel_relaxed()
>
> is safer than dropping the wmb() altogether.

If the wmb() was not just about the writeq(), then I would argue your patch
description was misleading. We certainly shouldn't replace random writeq()
calls with writeq_relaxed() just because we can show that the driver has
a barrier in front of it.

In particular, the ipr_mask_and_clear_interrupts() function has multiple
writeq() or writel() calls, and even a readl(), and your patch only changes
one of them, which seems like a rather pointless exercise as the function
still fully synchronizes the I/O multiple times.

> Will Deacon should probably look at why writeq_relaxed is missing on some ARM
> arches too.
>
> Drivers shouldn't worry about write derivatives.

This driver defines writeq() itself for architectures that don't have it, but
it doesn't define writeq_relaxed() and doesn't include linux/io-64-nonatomic-lo-hi.h
or linux/io-64-nonatomic-hi-lo.h. It seems that it needs a different behavior
from all other drivers here, storing the upper 32 bits into the lower address and
the lower 32 bits into the upper address.

         Arnd
Sinan Kaya June 8, 2018, 4:10 p.m. UTC | #3
On 6/8/2018 11:47 AM, Arnd Bergmann wrote:
> On Fri, Jun 8, 2018 at 5:27 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
>> +Will,
>>

[snip]

>> So, it is difficult to judge how this barrier has been used without an
>> expert opinion.
>>
>> Changing
>>
>> wmb() + writel()
>>
>> to
>>
>> wmb() + writel_relaxed()
>>
>> is safer than dropping the wmb() altogether.
> 
> If the wmb() was not just about the writeq(), then I would argue your patch
> description was misleading. We certainly shouldn't replace random writeq()
> calls with writeq_relaxed() just because we can show that the driver has
> a barrier in front of it.
> 
> In particular, the ipr_mask_and_clear_interrupts() function has multiple
> writeq() or writel() calls, and even a readl(), and your patch only changes
> one of them, which seems like a rather pointless exercise as the function
> still fully synchronizes the I/O multiple times.

You are right; I only searched for wmb() + writel() combinations and changed
only the places where I found issues.

As you already noted, we were given direction to eliminate barriers.

I just wanted to highlight the difficulty of dropping wmb() wholesale without
careful inspection. [1] [2]

We certainly need a better patch that covers all use cases. Your patch is
a step in the right direction. We just need the maintainer to confirm that
we are not actually breaking something.

> 
>> Will Deacon should probably look at why writeq_relaxed is missing on some ARM
>> arches too.
>>
>> Drivers shouldn't worry about write derivatives.
> 
> This driver defines writeq() itself for architectures that don't have it, but
> it doesn't define writeq_relaxed() and doesn't include linux/io-64-nonatomic-lo-hi.h
> or linux/io-64-nonatomic-hi-lo.h. It seems that it needs a different behavior
> from all other drivers here, storing the upper 32 bits into the lower address and
> the lower 32 bits into the upper address.

I don't think there is a consensus in the community about using these includes.
I ran into this issue before and came up with the include you pointed to,
but I didn't get much enthusiasm from the maintainer.

Why are we pushing the responsibility into the drivers? I'd think that the
architecture should take care of this. Is there a portability issue I'm missing
on some architecture I've never heard of? (I work on little-endian machines most
of the time.)

[1] https://patchwork.kernel.org/patch/10301861/
[2] https://www.mail-archive.com/netdev@vger.kernel.org/msg227443.html

> 
>          Arnd
>
Brian King June 8, 2018, 7:20 p.m. UTC | #4
On 06/08/2018 11:10 AM, Sinan Kaya wrote:
> On 6/8/2018 11:47 AM, Arnd Bergmann wrote:
>> On Fri, Jun 8, 2018 at 5:27 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
>>> +Will,
>>>
> 
> [snip]
> 
> You are right; I only searched for wmb() + writel() combinations and changed
> only the places where I found issues.
> 
> As you already noted, we were given direction to eliminate barriers.
> 
> I just wanted to highlight the difficulty of dropping wmb() wholesale without
> careful inspection. [1] [2]
> 
> We certainly need a better patch that covers all use cases. Your patch is
> a step in the right direction. We just need the maintainer to confirm that
> we are not actually breaking something.

To be clear here, we are talking about two code paths that are not in the performance
path, so optimizing them to use lighter-weight MMIO accessors isn't exactly
critical.

I'm fine with the replacement patch from Arnd. Thanks Arnd!

Acked-by: Brian King <brking@linux.vnet.ibm.com>

> 
> [snip]
> 
> I don't think there is a consensus in the community about using these includes.
> I ran into this issue before and came up with the include you pointed to,
> but I didn't get much enthusiasm from the maintainer.
> 
> Why are we pushing the responsibility into the drivers? I'd think that the
> architecture should take care of this. Is there a portability issue I'm missing
> on some architecture I've never heard of? (I work on little-endian machines most
> of the time.)

The attributes of the adapter hardware can have an impact here. The ipr hardware, for
example, depends on the upper 4 bytes being written first, then the lower 4 bytes,
and it's the act of writing the lower 4 bytes that triggers the adapter hardware
to read the value and take action on it.

Thanks,

Brian
Sinan Kaya June 8, 2018, 7:39 p.m. UTC | #5
On 6/8/2018 3:20 PM, Brian King wrote:
>> I don't think there is a consensus in the community about using these includes.
>> I ran into this issue before and came up with the include you pointed to,
>> but I didn't get much enthusiasm from the maintainer.
>>
>> Why are we pushing the responsibility into the drivers? I'd think that the
>> architecture should take care of this. Is there a portability issue I'm missing
>> on some architecture I've never heard of? (I work on little-endian machines most
>> of the time.)
> The attributes of the adapter hardware can have an impact here. The ipr hardware, for
> example, depends on the upper 4 bytes being written first, then the lower 4 bytes,
> and it's the act of writing the lower 4 bytes that triggers the adapter hardware
> to read the value and take action on it.

Thanks, I never thought about this.
Martin K. Petersen June 13, 2018, 5:14 p.m. UTC | #6
Arnd,

> the conclusion was that drivers should generally assume that the
> barrier implied by writel() is sufficient for ordering DMA, so this
> reverts his change and instead removes the extraneous wmb() before it,
> which is no longer needed on any architecture now.

Applied to 4.18/scsi-fixes and squashed with Sinan's patch.

Patch

diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index 865c07dc11ea..d2f67a41fcdd 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -760,13 +760,12 @@  static void ipr_mask_and_clear_interrupts(struct ipr_ioa_cfg *ioa_cfg,
 		ioa_cfg->hrrq[i].allow_interrupts = 0;
 		spin_unlock(&ioa_cfg->hrrq[i]._lock);
 	}
-	wmb();
 
 	/* Set interrupt mask to stop all new interrupts */
 	if (ioa_cfg->sis64)
-		writeq_relaxed(~0, ioa_cfg->regs.set_interrupt_mask_reg);
+		writeq(~0, ioa_cfg->regs.set_interrupt_mask_reg);
 	else
-		writel_relaxed(~0, ioa_cfg->regs.set_interrupt_mask_reg);
+		writel(~0, ioa_cfg->regs.set_interrupt_mask_reg);
 
 	/* Clear any pending interrupts */
 	if (ioa_cfg->sis64)
@@ -8401,10 +8400,9 @@  static int ipr_reset_enable_ioa(struct ipr_cmnd *ipr_cmd)
 		ioa_cfg->hrrq[i].allow_interrupts = 1;
 		spin_unlock(&ioa_cfg->hrrq[i]._lock);
 	}
-	wmb();
 	if (ioa_cfg->sis64) {
 		/* Set the adapter to the correct endian mode. */
-		writel_relaxed(IPR_ENDIAN_SWAP_KEY,
+		writel(IPR_ENDIAN_SWAP_KEY,
 			       ioa_cfg->regs.endian_swap_reg);
 		int_reg = readl(ioa_cfg->regs.endian_swap_reg);
 	}