Message ID | 20180223153700.2186058-1-arnd@arndb.de (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
On Fri, Feb 23, 2018 at 5:36 PM, Arnd Bergmann <arnd@arndb.de> wrote: > 32-bit architectures generally cannot use writeq(), so we now get a build > failure for the lpfc driver: > > drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_sli4_wq_put': > drivers/scsi/lpfc/lpfc_sli.c:145:4: error: implicit declaration of function 'writeq'; did you mean 'writeb'? [-Werror=implicit-function-declaration] > > Another problem here is that writing out actual data (unlike accessing > mmio registers) means we must write the data with the same endianess > that we have read from memory, but writeq() will perform byte swaps > and add barriers inbetween accesses as we do for registers. > > Using memcpy_toio() should do the right thing here, using register > sized stores with correct endianess conversion and barriers (i.e. none), > but on some architectures might fall back to byte-size access. > > Side note: shouldn't the driver use ioremap_wc() instead of ioremap() > to get a write-combining mapping on all architectures that support this? IIRC memcpy_toio() doesn't increment the destination address. lo_hi or hi_lo helpers sound better. > Fixes: 1351e69fc6db ("scsi: lpfc: Add push-to-adapter support to sli4") > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > @@ -115,7 +115,6 @@ lpfc_sli4_wq_put(struct lpfc_queue *q, union lpfc_wqe *wqe) > struct lpfc_register doorbell; > uint32_t host_index; > uint32_t idx; > - uint32_t i = 0; > uint8_t *tmp; > > /* sanity check on queue memory */ > @@ -138,12 +137,10 @@ lpfc_sli4_wq_put(struct lpfc_queue *q, union lpfc_wqe *wqe) > if (q->phba->sli3_options & LPFC_SLI4_PHWQ_ENABLED) > bf_set(wqe_wqid, &wqe->generic.wqe_com, q->queue_id); > lpfc_sli_pcimem_bcopy(wqe, temp_wqe, q->entry_size); > - if (q->dpp_enable && q->phba->cfg_enable_dpp) { > + if (q->dpp_enable && q->phba->cfg_enable_dpp) > /* write to DPP aperture taking advatage of Combined Writes */ > - tmp = (uint8_t *)wqe; > - for (i = 0; i < q->entry_size; i += sizeof(uint64_t)) > - writeq(*((uint64_t *)(tmp + i)), q->dpp_regaddr + i); > - } > + memcpy_toio(tmp, q->dpp_regaddr, q->entry_size); > + > /* ensure WQE bcopy and DPP flushed before doorbell write */ > wmb(); > > -- > 2.9.0 >
On Fri, Feb 23, 2018 at 5:59 PM, Andy Shevchenko <andy.shevchenko@gmail.com> wrote: > On Fri, Feb 23, 2018 at 5:36 PM, Arnd Bergmann <arnd@arndb.de> wrote: > IIRC memcpy_toio() doesn't increment the destination address. > > lo_hi or hi_lo helpers sound better. Ah, sorry, I messed up with writesl() / etc. memcpy_toio() has another side-effect, but here it seems unimportant.
From: Arnd Bergmann > Sent: 23 February 2018 15:37 > > 32-bit architectures generally cannot use writeq(), so we now get a build > failure for the lpfc driver: > > drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_sli4_wq_put': > drivers/scsi/lpfc/lpfc_sli.c:145:4: error: implicit declaration of function 'writeq'; did you mean > 'writeb'? [-Werror=implicit-function-declaration] > > Another problem here is that writing out actual data (unlike accessing > mmio registers) means we must write the data with the same endianess > that we have read from memory, but writeq() will perform byte swaps > and add barriers inbetween accesses as we do for registers. > > Using memcpy_toio() should do the right thing here, using register > sized stores with correct endianess conversion and barriers (i.e. none), > but on some architectures might fall back to byte-size access. ... Have you looked at the performance impact of this on x86? Last time I looked memcpy_toio() aliased directly to memcpy(). memcpy() is run-time patched between several different algorithms. On recent Intel cpus memcpy() is implemented as 'rep movsb' relying on the hardware to DTRT. For uncached accesses (typical for io) the 'RT' has to be byte transfers. So instead of the 8 byte transfers (on 64 bit) you get single bytes. This won't be what is intended! memcpy_toio() should probably use 'rep movsd' for the bulk of the transfer. David
On Fri, Feb 23, 2018 at 5:41 PM, David Laight <David.Laight@aculab.com> wrote: > From: Arnd Bergmann >> Sent: 23 February 2018 15:37 >> >> 32-bit architectures generally cannot use writeq(), so we now get a build >> failure for the lpfc driver: >> >> drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_sli4_wq_put': >> drivers/scsi/lpfc/lpfc_sli.c:145:4: error: implicit declaration of function 'writeq'; did you mean >> 'writeb'? [-Werror=implicit-function-declaration] >> >> Another problem here is that writing out actual data (unlike accessing >> mmio registers) means we must write the data with the same endianess >> that we have read from memory, but writeq() will perform byte swaps >> and add barriers inbetween accesses as we do for registers. >> >> Using memcpy_toio() should do the right thing here, using register >> sized stores with correct endianess conversion and barriers (i.e. none), >> but on some architectures might fall back to byte-size access. > ... > > Have you looked at the performance impact of this on x86? > Last time I looked memcpy_toio() aliased directly to memcpy(). > memcpy() is run-time patched between several different algorithms. > On recent Intel cpus memcpy() is implemented as 'rep movsb' relying > on the hardware to DTRT. > For uncached accesses (typical for io) the 'RT' has to be byte transfers. > So instead of the 8 byte transfers (on 64 bit) you get single bytes. > This won't be what is intended! > memcpy_toio() should probably use 'rep movsd' for the bulk of the transfer. I'm not that familiar with x86, but I would have guessed that on a write-combining I/O mapping, the hardware will combine the 'rep movsb' output data the same was as on a cacheable mapping. Arnd
On Fri, Feb 23, 2018 at 6:41 PM, David Laight <David.Laight@aculab.com> wrote: > From: Arnd Bergmann >> Sent: 23 February 2018 15:37 >> >> 32-bit architectures generally cannot use writeq(), so we now get a build >> failure for the lpfc driver: >> >> drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_sli4_wq_put': >> drivers/scsi/lpfc/lpfc_sli.c:145:4: error: implicit declaration of function 'writeq'; did you mean >> 'writeb'? [-Werror=implicit-function-declaration] >> >> Another problem here is that writing out actual data (unlike accessing >> mmio registers) means we must write the data with the same endianess >> that we have read from memory, but writeq() will perform byte swaps >> and add barriers inbetween accesses as we do for registers. >> >> Using memcpy_toio() should do the right thing here, using register >> sized stores with correct endianess conversion and barriers (i.e. none), >> but on some architectures might fall back to byte-size access. > ... > > Have you looked at the performance impact of this on x86? > Last time I looked memcpy_toio() aliased directly to memcpy(). > memcpy() is run-time patched between several different algorithms. > On recent Intel cpus memcpy() is implemented as 'rep movsb' relying > on the hardware to DTRT. > For uncached accesses (typical for io) the 'RT' has to be byte transfers. > So instead of the 8 byte transfers (on 64 bit) you get single bytes. > This won't be what is intended! > memcpy_toio() should probably use 'rep movsd' for the bulk of the transfer. Maybe I'm wrong but it uses movsq on 64-bit and movsl on 32-bit. The side-effect I referred previously is about tails, i.e. unaligned bytes are transferred in portions like 7 on 64-bit will be 4 + 2 + 1, 5 = 4 + 1 etc Similar way on 32-bit.
On Fri, Feb 23, 2018 at 6:51 PM, Andy Shevchenko <andy.shevchenko@gmail.com> wrote: > On Fri, Feb 23, 2018 at 6:41 PM, David Laight <David.Laight@aculab.com> wrote: >> From: Arnd Bergmann >>> Sent: 23 February 2018 15:37 >>> >>> 32-bit architectures generally cannot use writeq(), so we now get a build >>> failure for the lpfc driver: >>> >>> drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_sli4_wq_put': >>> drivers/scsi/lpfc/lpfc_sli.c:145:4: error: implicit declaration of function 'writeq'; did you mean >>> 'writeb'? [-Werror=implicit-function-declaration] >>> >>> Another problem here is that writing out actual data (unlike accessing >>> mmio registers) means we must write the data with the same endianess >>> that we have read from memory, but writeq() will perform byte swaps >>> and add barriers inbetween accesses as we do for registers. >>> >>> Using memcpy_toio() should do the right thing here, using register >>> sized stores with correct endianess conversion and barriers (i.e. none), >>> but on some architectures might fall back to byte-size access. >> ... >> >> Have you looked at the performance impact of this on x86? >> Last time I looked memcpy_toio() aliased directly to memcpy(). >> memcpy() is run-time patched between several different algorithms. >> On recent Intel cpus memcpy() is implemented as 'rep movsb' relying >> on the hardware to DTRT. >> For uncached accesses (typical for io) the 'RT' has to be byte transfers. >> So instead of the 8 byte transfers (on 64 bit) you get single bytes. >> This won't be what is intended! >> memcpy_toio() should probably use 'rep movsd' for the bulk of the transfer. > > Maybe I'm wrong but it uses movsq on 64-bit and movsl on 32-bit. > > The side-effect I referred previously is about tails, i.e. unaligned > bytes are transferred in portions > like > 7 on 64-bit will be 4 + 2 + 1, > 5 = 4 + 1 > etc > > Similar way on 32-bit. Same for leading bytes as well. arch/x86/lib/memcpy_64.S So, I *hope* that in the code in question there is no unaligned access is used.
From: Andy Shevchenko > Sent: 23 February 2018 16:51 > On Fri, Feb 23, 2018 at 6:41 PM, David Laight <David.Laight@aculab.com> wrote: > > From: Arnd Bergmann > >> Sent: 23 February 2018 15:37 > >> > >> 32-bit architectures generally cannot use writeq(), so we now get a build > >> failure for the lpfc driver: > >> > >> drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_sli4_wq_put': > >> drivers/scsi/lpfc/lpfc_sli.c:145:4: error: implicit declaration of function 'writeq'; did you mean > >> 'writeb'? [-Werror=implicit-function-declaration] > >> > >> Another problem here is that writing out actual data (unlike accessing > >> mmio registers) means we must write the data with the same endianess > >> that we have read from memory, but writeq() will perform byte swaps > >> and add barriers inbetween accesses as we do for registers. > >> > >> Using memcpy_toio() should do the right thing here, using register > >> sized stores with correct endianess conversion and barriers (i.e. none), > >> but on some architectures might fall back to byte-size access. > > ... > > > > Have you looked at the performance impact of this on x86? > > Last time I looked memcpy_toio() aliased directly to memcpy(). > > memcpy() is run-time patched between several different algorithms. > > On recent Intel cpus memcpy() is implemented as 'rep movsb' relying > > on the hardware to DTRT. > > For uncached accesses (typical for io) the 'RT' has to be byte transfers. > > So instead of the 8 byte transfers (on 64 bit) you get single bytes. > > This won't be what is intended! > > memcpy_toio() should probably use 'rep movsd' for the bulk of the transfer. > > Maybe I'm wrong but it uses movsq on 64-bit and movsl on 32-bit. (Let's not argue about the instruction mnemonic). You might expect that, but last time I looked at the bus cycles on a PCIe slave that wasn't what I saw. > The side-effect I referred previously is about tails, i.e. unaligned > bytes are transferred in portions > like > 7 on 64-bit will be 4 + 2 + 1, > 5 = 4 + 1 on 64bit memcpy() is allowed to do: (long *)(tgt+len)[-1] = (long *)(src+len)[-1]; rep_movsq(tgt, src, len >> 3); provided the length is at least 8. The misaligned PCIe transfer generates a single TLP covering 12 bytes with the relevant byte enables set for the first and last 32bit words. David
On Fri, Feb 23, 2018 at 7:09 PM, David Laight <David.Laight@aculab.com> wrote: > From: Andy Shevchenko >> Sent: 23 February 2018 16:51 >> On Fri, Feb 23, 2018 at 6:41 PM, David Laight <David.Laight@aculab.com> wrote: >> The side-effect I referred previously is about tails, i.e. unaligned >> bytes are transferred in portions >> like >> 7 on 64-bit will be 4 + 2 + 1, >> 5 = 4 + 1 > > on 64bit memcpy() is allowed to do: > (long *)(tgt+len)[-1] = (long *)(src+len)[-1]; > rep_movsq(tgt, src, len >> 3); > provided the length is at least 8. > > The misaligned PCIe transfer generates a single TLP covering 12 bytes with the > relevant byte enables set for the first and last 32bit words. But is it guaranteed on any type of bus? memcpy_toio() is a generic helper, so, first of all we need to be sure what CPU on its side does, this is I think is pretty straight forward since it's all written in asm for 64-bit case. So, what about buses?
From: Andy Shevchenko > Sent: 23 February 2018 17:13 > To: David Laight > Cc: Arnd Bergmann; James Smart; Dick Kennedy; James E.J. Bottomley; Martin K. Petersen; Hannes > Reinecke; Johannes Thumshirn; linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org > Subject: Re: [PATCH] scsi: lpfc: use memcpy_toio instead of writeq > > On Fri, Feb 23, 2018 at 7:09 PM, David Laight <David.Laight@aculab.com> wrote: > > From: Andy Shevchenko > >> Sent: 23 February 2018 16:51 > >> On Fri, Feb 23, 2018 at 6:41 PM, David Laight <David.Laight@aculab.com> wrote: > > > >> The side-effect I referred previously is about tails, i.e. unaligned > >> bytes are transferred in portions > >> like > >> 7 on 64-bit will be 4 + 2 + 1, > >> 5 = 4 + 1 > > > > on 64bit memcpy() is allowed to do: > > (long *)(tgt+len)[-1] = (long *)(src+len)[-1]; > > rep_movsq(tgt, src, len >> 3); > > provided the length is at least 8. > > > > The misaligned PCIe transfer generates a single TLP covering 12 bytes with the > > relevant byte enables set for the first and last 32bit words. > > But is it guaranteed on any type of bus? > memcpy_toio() is a generic helper, so, first of all we need to be sure > what CPU on its side does, this is I think is pretty straight forward > since it's all written in asm for 64-bit case. I've just done a compile test, on x86-64 memcpy_toio() generates a call to memcpy() (checked with objdump -dr). That is on a system running a 4.14 kernel, so is probably using the system headers from that release. I'd need to do a run-time test on a newer system verify what the PCIe slave sees - but I changed our driver to do its own copy loops in order to avoid byte transfers some time ago. FWIW I was originally doing copy_to/from_user() directly to PCIe memory addresses! On x86 'memory' on devices can always be accesses by simple instructions. Hardware 'IO' addresses are not valid for memcpy_to/fromio(). David
On Fri, Feb 23, 2018 at 4:36 PM, Arnd Bergmann <arnd@arndb.de> wrote: > @@ -138,12 +137,10 @@ lpfc_sli4_wq_put(struct lpfc_queue *q, union lpfc_wqe *wqe) > if (q->phba->sli3_options & LPFC_SLI4_PHWQ_ENABLED) > bf_set(wqe_wqid, &wqe->generic.wqe_com, q->queue_id); > lpfc_sli_pcimem_bcopy(wqe, temp_wqe, q->entry_size); > - if (q->dpp_enable && q->phba->cfg_enable_dpp) { > + if (q->dpp_enable && q->phba->cfg_enable_dpp) > /* write to DPP aperture taking advatage of Combined Writes */ > - tmp = (uint8_t *)wqe; > - for (i = 0; i < q->entry_size; i += sizeof(uint64_t)) > - writeq(*((uint64_t *)(tmp + i)), q->dpp_regaddr + i); > - } > + memcpy_toio(tmp, q->dpp_regaddr, q->entry_size); > + > /* ensure WQE bcopy and DPP flushed before doorbell write */ > wmb(); > Not sure where we are with the question of whether memcpy_toio is a good replacement or not, but further build testing showed that my patch was completely broken in more than one way: I mixed up the source and destination arguments, and I used the uninitialized 'tmp' instead of 'wqe'. Don't try this patch. Arnd
About to post a patch to fix. Rather than fidgeting with the copy routine, I want to go back to what we originally proposed - writeq() on 64bit, writel() on 32-bit. -- james On 2/23/2018 1:02 PM, Arnd Bergmann wrote: > On Fri, Feb 23, 2018 at 4:36 PM, Arnd Bergmann <arnd@arndb.de> wrote: >> @@ -138,12 +137,10 @@ lpfc_sli4_wq_put(struct lpfc_queue *q, union lpfc_wqe *wqe) >> if (q->phba->sli3_options & LPFC_SLI4_PHWQ_ENABLED) >> bf_set(wqe_wqid, &wqe->generic.wqe_com, q->queue_id); >> lpfc_sli_pcimem_bcopy(wqe, temp_wqe, q->entry_size); >> - if (q->dpp_enable && q->phba->cfg_enable_dpp) { >> + if (q->dpp_enable && q->phba->cfg_enable_dpp) >> /* write to DPP aperture taking advatage of Combined Writes */ >> - tmp = (uint8_t *)wqe; >> - for (i = 0; i < q->entry_size; i += sizeof(uint64_t)) >> - writeq(*((uint64_t *)(tmp + i)), q->dpp_regaddr + i); >> - } >> + memcpy_toio(tmp, q->dpp_regaddr, q->entry_size); >> + >> /* ensure WQE bcopy and DPP flushed before doorbell write */ >> wmb(); >> > Not sure where we are with the question of whether memcpy_toio > is a good replacement or not, but further build testing showed that > my patch was completely broken in more than one way: > > I mixed up the source and destination arguments, and I used > the uninitialized 'tmp' instead of 'wqe'. Don't try this patch. > > Arnd
Arnd Bergmann <arnd@arndb.de> writes: > 32-bit architectures generally cannot use writeq(), so we now get a build > failure for the lpfc driver: > > drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_sli4_wq_put': > drivers/scsi/lpfc/lpfc_sli.c:145:4: error: implicit declaration of function 'writeq'; did you mean 'writeb'? [-Werror=implicit-function-declaration] Hi Arnd, why can't we use the writeq() from 'io-64-nonatomic-lo-hi.h'? I always thought these are compat versions for 32 Bit archs and even asked James to do so, what's why he did the change in the first place. My apologies for this James. Thanks, Johannes > > Another problem here is that writing out actual data (unlike accessing > mmio registers) means we must write the data with the same endianess > that we have read from memory, but writeq() will perform byte swaps > and add barriers inbetween accesses as we do for registers. > > Using memcpy_toio() should do the right thing here, using register > sized stores with correct endianess conversion and barriers (i.e. none), > but on some architectures might fall back to byte-size access. > > Side note: shouldn't the driver use ioremap_wc() instead of ioremap() > to get a write-combining mapping on all architectures that support this? > > Fixes: 1351e69fc6db ("scsi: lpfc: Add push-to-adapter support to sli4") > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > --- > drivers/scsi/lpfc/lpfc_sli.c | 9 +++------ > 1 file changed, 3 insertions(+), 6 deletions(-) > > diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c > index 4ce3ca6f4b79..6749d41753b4 100644 > --- a/drivers/scsi/lpfc/lpfc_sli.c > +++ b/drivers/scsi/lpfc/lpfc_sli.c > @@ -115,7 +115,6 @@ lpfc_sli4_wq_put(struct lpfc_queue *q, union lpfc_wqe *wqe) > struct lpfc_register doorbell; > uint32_t host_index; > uint32_t idx; > - uint32_t i = 0; > uint8_t *tmp; > > /* sanity check on queue memory */ > @@ -138,12 +137,10 @@ lpfc_sli4_wq_put(struct lpfc_queue *q, union lpfc_wqe *wqe) > if (q->phba->sli3_options & LPFC_SLI4_PHWQ_ENABLED) > bf_set(wqe_wqid, &wqe->generic.wqe_com, q->queue_id); > lpfc_sli_pcimem_bcopy(wqe, temp_wqe, q->entry_size); > - if (q->dpp_enable && q->phba->cfg_enable_dpp) { > + if (q->dpp_enable && q->phba->cfg_enable_dpp) > /* write to DPP aperture taking advatage of Combined Writes */ > - tmp = (uint8_t *)wqe; > - for (i = 0; i < q->entry_size; i += sizeof(uint64_t)) > - writeq(*((uint64_t *)(tmp + i)), q->dpp_regaddr + i); > - } > + memcpy_toio(tmp, q->dpp_regaddr, q->entry_size); > + > /* ensure WQE bcopy and DPP flushed before doorbell write */ > wmb();
On Sun, Feb 25, 2018 at 11:02 AM, Johannes Thumshirn <jthumshirn@suse.de> wrote: > Arnd Bergmann <arnd@arndb.de> writes: >> 32-bit architectures generally cannot use writeq(), so we now get a build >> failure for the lpfc driver: >> >> drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_sli4_wq_put': >> drivers/scsi/lpfc/lpfc_sli.c:145:4: error: implicit declaration of function 'writeq'; did you mean 'writeb'? [-Werror=implicit-function-declaration] > Hi Arnd, > why can't we use the writeq() from 'io-64-nonatomic-lo-hi.h'? I always > thought these are compat versions for 32 Bit archs and even asked James > to do so, what's why he did the change in the first place. That could work as well, if someone can figure out what the correct incantation is that works on big-endian 32-bit architectures. I think the simplest version that works everywhere would be lo_hi_writeq(__le32_to_cpup(__le32 __force *)p) | (u64)__le32_to_cpup(__le32 __force *)p +1) << 32)); but this is ugly as hell and makes my head spin. I definitely would't want that applied without being tested properly first on a variety of architectures. There are three variants that I'd prefer over that: - use memcpy_toio() and change the x86 implementation to guarantee aligned word aligned accesses on the output if at all possible (which seems to be a good idea anyway) - use a loop around __raw_writeq()/__raw_writel() depending on CONFIG_64BIT - add generic memcpy_toio_32() and memcpy_toio_64() helpers in linux/io.h and use those. Arnd
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c index 4ce3ca6f4b79..6749d41753b4 100644 --- a/drivers/scsi/lpfc/lpfc_sli.c +++ b/drivers/scsi/lpfc/lpfc_sli.c @@ -115,7 +115,6 @@ lpfc_sli4_wq_put(struct lpfc_queue *q, union lpfc_wqe *wqe) struct lpfc_register doorbell; uint32_t host_index; uint32_t idx; - uint32_t i = 0; uint8_t *tmp; /* sanity check on queue memory */ @@ -138,12 +137,10 @@ lpfc_sli4_wq_put(struct lpfc_queue *q, union lpfc_wqe *wqe) if (q->phba->sli3_options & LPFC_SLI4_PHWQ_ENABLED) bf_set(wqe_wqid, &wqe->generic.wqe_com, q->queue_id); lpfc_sli_pcimem_bcopy(wqe, temp_wqe, q->entry_size); - if (q->dpp_enable && q->phba->cfg_enable_dpp) { + if (q->dpp_enable && q->phba->cfg_enable_dpp) /* write to DPP aperture taking advatage of Combined Writes */ - tmp = (uint8_t *)wqe; - for (i = 0; i < q->entry_size; i += sizeof(uint64_t)) - writeq(*((uint64_t *)(tmp + i)), q->dpp_regaddr + i); - } + memcpy_toio(tmp, q->dpp_regaddr, q->entry_size); + /* ensure WQE bcopy and DPP flushed before doorbell write */ wmb();
32-bit architectures generally cannot use writeq(), so we now get a build failure for the lpfc driver: drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_sli4_wq_put': drivers/scsi/lpfc/lpfc_sli.c:145:4: error: implicit declaration of function 'writeq'; did you mean 'writeb'? [-Werror=implicit-function-declaration] Another problem here is that writing out actual data (unlike accessing mmio registers) means we must write the data with the same endianess that we have read from memory, but writeq() will perform byte swaps and add barriers inbetween accesses as we do for registers. Using memcpy_toio() should do the right thing here, using register sized stores with correct endianess conversion and barriers (i.e. none), but on some architectures might fall back to byte-size access. Side note: shouldn't the driver use ioremap_wc() instead of ioremap() to get a write-combining mapping on all architectures that support this? Fixes: 1351e69fc6db ("scsi: lpfc: Add push-to-adapter support to sli4") Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- drivers/scsi/lpfc/lpfc_sli.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-)