diff mbox series

[3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT

Message ID 20230929170023.1020032-4-cleech@redhat.com (mailing list archive)
State Rejected
Delegated to: Netdev Maintainers
Headers show
Series UIO_MEM_DMA_COHERENT for cnic/bnx2/bnx2x | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1640 this patch: 1640
netdev/cc_maintainers warning 8 maintainers not CCed: keescook@chromium.org pabeni@redhat.com djwong@kernel.org davem@davemloft.net edumazet@google.com Jason@zx2c4.com GR-Linux-NIC-Dev@marvell.com kuba@kernel.org
netdev/build_clang success Errors and warnings before: 1367 this patch: 1367
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1664 this patch: 1664
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 90 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Chris Leech Sept. 29, 2023, 5 p.m. UTC
Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
for dma_alloc_coherent buffers.

The cnic l2_ring and l2_buf mmaps have caused page refcount issues since
the misuse of the __GFP_COMP flag was removed from their
dma_alloc_coherent calls. Fix that by having the uio device use
dma_mmap_coherent.

The bnx2 and bnx2x status block allocations are also dma_alloc_coherent,
and should use dma_mmap_coherent. They didn't allocate multiple pages,
but also didn't seem to work correctly with an iommu enabled.

Fixes: bb73955c0b1d ("cnic: don't pass bogus GFP_ flags to dma_alloc_coherent")
Signed-off-by: Chris Leech <cleech@redhat.com>
---
 drivers/net/ethernet/broadcom/bnx2.c          |  1 +
 .../net/ethernet/broadcom/bnx2x/bnx2x_main.c  |  2 ++
 drivers/net/ethernet/broadcom/cnic.c          | 25 ++++++++++++-------
 drivers/net/ethernet/broadcom/cnic.h          |  1 +
 drivers/net/ethernet/broadcom/cnic_if.h       |  1 +
 5 files changed, 21 insertions(+), 9 deletions(-)

Comments

Jacob Keller Sept. 29, 2023, 5:19 p.m. UTC | #1
On 9/29/2023 10:00 AM, Chris Leech wrote:
> Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
> for dma_alloc_coherent buffers.
> 
> The cnic l2_ring and l2_buf mmaps have caused page refcount issues since
> the misuse of the __GFP_COMP flag was removed from their
> dma_alloc_coherent calls. Fix that by having the uio device use
> dma_mmap_coherent.
> 
> The bnx2 and bnx2x status block allocations are also dma_alloc_coherent,
> and should use dma_mmap_coherent. They didn't allocate multiple pages,
> but also didn't seem to work correctly with an iommu enabled.
> 
> Fixes: bb73955c0b1d ("cnic: don't pass bogus GFP_ flags to dma_alloc_coherent")
> Signed-off-by: Chris Leech <cleech@redhat.com>
> ---


Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Greg KH Sept. 30, 2023, 7:06 a.m. UTC | #2
On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote:
> Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
> for dma_alloc_coherent buffers.

Why are ethernet drivers messing around with UIO devices?  That's not
what UIO is for, unless you are trying to do kernel bypass for these
devices without anyone noticing?

confused,

greg k-h
Jerry Snitselaar Sept. 30, 2023, 9:10 a.m. UTC | #3
On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote:
> > Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
> > for dma_alloc_coherent buffers.
> 
> Why are ethernet drivers messing around with UIO devices?  That's not
> what UIO is for, unless you are trying to do kernel bypass for these
> devices without anyone noticing?
> 
> confused,
> 
> greg k-h

Hi Greg,

It is for iscsi offload [1], and apparently cnic has been handing off
dma alloc'd memory to uio for this since 2009, but it has just been
handing off the address from dma_alloc_coherent to uio, and giving the
dma handle the bnx2x device. That managed to avoid being an issue
until changes last year rightly cleaned up allowing __GFP_COMP to be
passed to the dma_alloc* calls, which cnic was passing to
dma_alloc_coherent. Now iscsiuio goes to mmap through the uio device
and the result is a BUG and stack trace like below.

It was my suggestion that it either needs to use dma_mmap_coherent to
mmap the memory, which possibly is a mistaken understanding on my
part but what dma_mmap_coherent says it is for, or possibly look to
do something similar to what qed is doing for uio.

Regards,
Jerry


[  129.196731] page:ffffea0004cacb40 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x132b2d
[  129.207285] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[  129.214650] page_type: 0xffffffff()
[  129.218584] raw: 0017ffffc0000000 dead000000000100 dead000000000122 0000000000000000
[  129.227264] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[  129.235937] page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u))
[  129.246632] ------------[ cut here ]------------
[  129.251817] kernel BUG at include/linux/mm.h:1441!
[  129.257209] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[  129.263440] CPU: 1 PID: 1930 Comm: iscsiuio Kdump: loaded Not tainted 6.6.0-0.rc3.26.eln130.x86_64+debug #1
[  129.274323] Hardware name: Dell Inc. PowerEdge R320/08VT7V, BIOS 2.4.2 01/29/2015
[  129.282682] RIP: 0010:uio_vma_fault+0x40e/0x520 [uio]
[  129.288345] Code: 49 83 ee 01 e9 db fe ff ff be 01 00 00 00 4c 89 f7 e8 96 7f ae e9 e9 2b ff ff ff 48 c7 c6 00 08 52 c1 4c 89 f7 e8 12 1b 8e e9 <0f> 0b e8 cb 7b a3 e9 e9 f6 fd ff ff bb 02 00 00 00 e9 2b ff ff ff
[  129.309311] RSP: 0018:ffffc900022878b0 EFLAGS: 00010246
[  129.315156] RAX: 000000000000005c RBX: ffffea0004cacb40 RCX: 0000000000000000
[  129.323130] RDX: 0000000000000000 RSI: ffffffffad380f00 RDI: 0000000000000001
[  129.331102] RBP: ffff8881a65da528 R08: 0000000000000001 R09: fffff52000450eca
[  129.339074] R10: ffffc90002287657 R11: 0000000000000001 R12: ffffea0004cacb74
[  129.347044] R13: ffffc900022879f8 R14: ffffea0004cacb40 R15: ffff8881946a0400
[  129.355015] FS:  00007fc6dcafe640(0000) GS:ffff8885cb800000(0000) knlGS:0000000000000000
[  129.364054] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  129.370474] CR2: 000055b5c9f091a0 CR3: 0000000162292001 CR4: 00000000001706e0
[  129.378446] Call Trace:
[  129.381183]  <TASK>
[  129.383532]  ? die+0x36/0x90
[  129.386765]  ? do_trap+0x199/0x240
[  129.390573]  ? uio_vma_fault+0x40e/0x520 [uio]
[  129.395560]  ? uio_vma_fault+0x40e/0x520 [uio]
[  129.400539]  ? do_error_trap+0xa3/0x170
[  129.404830]  ? uio_vma_fault+0x40e/0x520 [uio]
[  129.409815]  ? handle_invalid_op+0x2c/0x40
[  129.414395]  ? uio_vma_fault+0x40e/0x520 [uio]
[  129.419361]  ? exc_invalid_op+0x2d/0x40
[  129.423657]  ? asm_exc_invalid_op+0x1a/0x20
[  129.428332]  ? uio_vma_fault+0x40e/0x520 [uio]
[  129.433301]  __do_fault+0xf2/0x5a0
[  129.437103]  do_fault+0x68e/0xc30
[  129.440804]  ? __pfx___pte_offset_map_lock+0x10/0x10
[  129.446350]  __handle_mm_fault+0x8e0/0xeb0
[  129.450939]  ? follow_page_pte+0x29a/0xda0
[  129.455513]  ? __pfx___handle_mm_fault+0x10/0x10
[  129.460668]  ? __pfx_follow_page_pte+0x10/0x10
[  129.465626]  ? follow_pmd_mask.isra.0+0x1d4/0xab0
[  129.470877]  ? count_memcg_event_mm.part.0+0xc7/0x1f0
[  129.476515]  ? rcu_is_watching+0x15/0xb0
[  129.480896]  handle_mm_fault+0x2f2/0x8d0
[  129.485277]  ? check_vma_flags+0x1c2/0x420
[  129.489852]  __get_user_pages+0x333/0xa20
[  129.494331]  ? __pfx___get_user_pages+0x10/0x10
[  129.499389]  ? __pfx_mt_find+0x10/0x10
[  129.503568]  ? __mm_populate+0xe7/0x360
[  129.507850]  ? rcu_is_watching+0x15/0xb0
[  129.512231]  populate_vma_page_range+0x1e9/0x2d0
[  129.517388]  ? __pfx_populate_vma_page_range+0x10/0x10
[  129.523125]  ? __pfx_mmap_region+0x10/0x10
[  129.527693]  __mm_populate+0x1ff/0x360
[  129.531882]  ? __pfx___mm_populate+0x10/0x10
[  129.536649]  ? do_mmap+0x61d/0xcd0
[  129.540446]  ? __up_write+0x1a5/0x500
[  129.544536]  vm_mmap_pgoff+0x276/0x360
[  129.548722]  ? rcu_is_watching+0x15/0xb0
[  129.553093]  ? __pfx_vm_mmap_pgoff+0x10/0x10
[  129.557861]  ? rcu_is_watching+0x15/0xb0
[  129.562239]  ? lock_release+0x25c/0x300
[  129.566522]  ? __fget_files+0x1e0/0x380
[  129.570807]  ksys_mmap_pgoff+0x2e4/0x430
[  129.575189]  do_syscall_64+0x60/0x90
[  129.579180]  ? do_syscall_64+0x6c/0x90
[  129.583357]  ? lockdep_hardirqs_on+0x7d/0x100
[  129.588235]  ? do_syscall_64+0x6c/0x90
[  129.592420]  ? lockdep_hardirqs_on+0x7d/0x100
[  129.597283]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8



[1] https://github.com/open-iscsi/open-iscsi/blob/master/iscsiuio/README
Chris Leech Sept. 30, 2023, 6:19 p.m. UTC | #4
On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote:
> > Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
> > for dma_alloc_coherent buffers.
> 
> Why are ethernet drivers messing around with UIO devices?  That's not
> what UIO is for, unless you are trying to do kernel bypass for these
> devices without anyone noticing?
> 
> confused,

It's confusing. The bnx2 driver stack included a cnic (converged nic?)
module that sits between the ethernet drivers (bnx2, bnx2x) and protocol
offload drivers (iscsi, fcoe, rdma).

The iscsi module (bnx2i) uses a passthrough interface from cnic to
handle some network configuration that the device firmware doesn't do.
It uses a uio device and a userspace component called iscsiuio to do
that.

Questions beyond that will probably need to be answer by one of the many
Marvell engineers copied on this thread.

- Chris
Greg KH Sept. 30, 2023, 6:28 p.m. UTC | #5
On Sat, Sep 30, 2023 at 11:19:20AM -0700, Chris Leech wrote:
> On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote:
> > On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote:
> > > Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
> > > for dma_alloc_coherent buffers.
> > 
> > Why are ethernet drivers messing around with UIO devices?  That's not
> > what UIO is for, unless you are trying to do kernel bypass for these
> > devices without anyone noticing?
> > 
> > confused,
> 
> It's confusing. The bnx2 driver stack included a cnic (converged nic?)
> module that sits between the ethernet drivers (bnx2, bnx2x) and protocol
> offload drivers (iscsi, fcoe, rdma).
> 
> The iscsi module (bnx2i) uses a passthrough interface from cnic to
> handle some network configuration that the device firmware doesn't do.
> It uses a uio device and a userspace component called iscsiuio to do
> that.

That's horrible, and not what the UIO api is for at all.  Configure the
device like any other normal kernel device, don't poke at raw memory
values directly, that way lies madness.

Have a pointer to the userspace tool anywhere?  All I found looks like a
full IP stack in userspace under that name, and surely that's not what
this api is for...

thanks,

greg k-h
Greg KH Sept. 30, 2023, 6:29 p.m. UTC | #6
On Sat, Sep 30, 2023 at 02:10:50AM -0700, Jerry Snitselaar wrote:
> [1] https://github.com/open-iscsi/open-iscsi/blob/master/iscsiuio/README

That's IP offload, not what UIO is supposed to be for at all (yes, I
know DPDK abuses this api as well, and I hate it.)  But this is on a
network card, it shouldn't need UIO.  Why is iscsi somehow "special" for
this?

still confused,

greg k-h
Hannes Reinecke Oct. 1, 2023, 10:44 a.m. UTC | #7
On 9/30/23 20:28, Greg Kroah-Hartman wrote:
> On Sat, Sep 30, 2023 at 11:19:20AM -0700, Chris Leech wrote:
>> On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote:
>>> On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote:
>>>> Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
>>>> for dma_alloc_coherent buffers.
>>>
>>> Why are ethernet drivers messing around with UIO devices?  That's not
>>> what UIO is for, unless you are trying to do kernel bypass for these
>>> devices without anyone noticing?
>>>
>>> confused,
>>
>> It's confusing. The bnx2 driver stack included a cnic (converged nic?)
>> module that sits between the ethernet drivers (bnx2, bnx2x) and protocol
>> offload drivers (iscsi, fcoe, rdma).
>>
>> The iscsi module (bnx2i) uses a passthrough interface from cnic to
>> handle some network configuration that the device firmware doesn't do.
>> It uses a uio device and a userspace component called iscsiuio to do
>> that.
> 
> That's horrible, and not what the UIO api is for at all.  Configure the
> device like any other normal kernel device, don't poke at raw memory
> values directly, that way lies madness.
> 
> Have a pointer to the userspace tool anywhere?  All I found looks like a
> full IP stack in userspace under that name, and surely that's not what
> this api is for...
> 
But that's how the interface is used, in particular for the bnx2i 
driver. Problem is that the bnx2i iSCSI offload is just that, an iSCSI 
offload. Not a TCP offload. So if the iSCSI interface is configured to
acquire the IP address via DHCP, someone has to run the DHCP protocol.
But the iSCSI offload can't, and the bnx2i PCI device is not a network
device so that the normal network stack can't be used.
And so the architects of the bnx2i card decided to use UIO to pass
the network traffic to userspace, and used the userspace 'iscsiuio'
application to run DHCP in userspace.

But's been that way for several years now; so long, in fact, that
the card itself has been out of support from Marvell (since quite some 
years, too, IIRC). And even the successor of that card (the qedi driver)
is nearing EOL. Mind you, the qedi driver is using the same interface 
(by using UIO to run DHCP in userspace), so singling out the bnx2i for 
bad design can be construed as being unfair :-)

I agree, though, that the design is a mess.

Cheers,

Hannes
Greg KH Oct. 1, 2023, 11:57 a.m. UTC | #8
On Sun, Oct 01, 2023 at 12:44:05PM +0200, Hannes Reinecke wrote:
> On 9/30/23 20:28, Greg Kroah-Hartman wrote:
> > On Sat, Sep 30, 2023 at 11:19:20AM -0700, Chris Leech wrote:
> > > On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote:
> > > > On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote:
> > > > > Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
> > > > > for dma_alloc_coherent buffers.
> > > > 
> > > > Why are ethernet drivers messing around with UIO devices?  That's not
> > > > what UIO is for, unless you are trying to do kernel bypass for these
> > > > devices without anyone noticing?
> > > > 
> > > > confused,
> > > 
> > > It's confusing. The bnx2 driver stack included a cnic (converged nic?)
> > > module that sits between the ethernet drivers (bnx2, bnx2x) and protocol
> > > offload drivers (iscsi, fcoe, rdma).
> > > 
> > > The iscsi module (bnx2i) uses a passthrough interface from cnic to
> > > handle some network configuration that the device firmware doesn't do.
> > > It uses a uio device and a userspace component called iscsiuio to do
> > > that.
> > 
> > That's horrible, and not what the UIO api is for at all.  Configure the
> > device like any other normal kernel device, don't poke at raw memory
> > values directly, that way lies madness.
> > 
> > Have a pointer to the userspace tool anywhere?  All I found looks like a
> > full IP stack in userspace under that name, and surely that's not what
> > this api is for...
> > 
> But that's how the interface is used, in particular for the bnx2i driver.
> Problem is that the bnx2i iSCSI offload is just that, an iSCSI offload. Not
> a TCP offload. So if the iSCSI interface is configured to
> acquire the IP address via DHCP, someone has to run the DHCP protocol.
> But the iSCSI offload can't, and the bnx2i PCI device is not a network
> device so that the normal network stack can't be used.
> And so the architects of the bnx2i card decided to use UIO to pass
> the network traffic to userspace, and used the userspace 'iscsiuio'
> application to run DHCP in userspace.
> 
> But's been that way for several years now; so long, in fact, that
> the card itself has been out of support from Marvell (since quite some
> years, too, IIRC). And even the successor of that card (the qedi driver)
> is nearing EOL. Mind you, the qedi driver is using the same interface (by
> using UIO to run DHCP in userspace), so singling out the bnx2i for bad
> design can be construed as being unfair :-)

Ok, let's say they are all horrible! :)

> I agree, though, that the design is a mess.

Ok, so why are we papering over it and continuing to allow it to exist?

What "broke" to suddenly require this UIO change?  If this has been
around for a very long time, what has caused this to now require the UIO
layer to change?

thanks,

greg k-h
Jerry Snitselaar Oct. 1, 2023, 2:22 p.m. UTC | #9
On Sun, Oct 01, 2023 at 01:57:25PM +0200, Greg Kroah-Hartman wrote:
> On Sun, Oct 01, 2023 at 12:44:05PM +0200, Hannes Reinecke wrote:
> > On 9/30/23 20:28, Greg Kroah-Hartman wrote:
> > > On Sat, Sep 30, 2023 at 11:19:20AM -0700, Chris Leech wrote:
> > > > On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote:
> > > > > On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote:
> > > > > > Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
> > > > > > for dma_alloc_coherent buffers.
> > > > > 
> > > > > Why are ethernet drivers messing around with UIO devices?  That's not
> > > > > what UIO is for, unless you are trying to do kernel bypass for these
> > > > > devices without anyone noticing?
> > > > > 
> > > > > confused,
> > > > 
> > > > It's confusing. The bnx2 driver stack included a cnic (converged nic?)
> > > > module that sits between the ethernet drivers (bnx2, bnx2x) and protocol
> > > > offload drivers (iscsi, fcoe, rdma).
> > > > 
> > > > The iscsi module (bnx2i) uses a passthrough interface from cnic to
> > > > handle some network configuration that the device firmware doesn't do.
> > > > It uses a uio device and a userspace component called iscsiuio to do
> > > > that.
> > > 
> > > That's horrible, and not what the UIO api is for at all.  Configure the
> > > device like any other normal kernel device, don't poke at raw memory
> > > values directly, that way lies madness.
> > > 
> > > Have a pointer to the userspace tool anywhere?  All I found looks like a
> > > full IP stack in userspace under that name, and surely that's not what
> > > this api is for...
> > > 
> > But that's how the interface is used, in particular for the bnx2i driver.
> > Problem is that the bnx2i iSCSI offload is just that, an iSCSI offload. Not
> > a TCP offload. So if the iSCSI interface is configured to
> > acquire the IP address via DHCP, someone has to run the DHCP protocol.
> > But the iSCSI offload can't, and the bnx2i PCI device is not a network
> > device so that the normal network stack can't be used.
> > And so the architects of the bnx2i card decided to use UIO to pass
> > the network traffic to userspace, and used the userspace 'iscsiuio'
> > application to run DHCP in userspace.
> > 
> > But's been that way for several years now; so long, in fact, that
> > the card itself has been out of support from Marvell (since quite some
> > years, too, IIRC). And even the successor of that card (the qedi driver)
> > is nearing EOL. Mind you, the qedi driver is using the same interface (by
> > using UIO to run DHCP in userspace), so singling out the bnx2i for bad
> > design can be construed as being unfair :-)
> 
> Ok, let's say they are all horrible! :)
> 
> > I agree, though, that the design is a mess.
> 
> Ok, so why are we papering over it and continuing to allow it to exist?
>
> What "broke" to suddenly require this UIO change?  If this has been
> around for a very long time, what has caused this to now require the UIO
> layer to change?
> 
> thanks,
> 
> greg k-h

Changes last year to the dma-mapping api to no longer allow __GFP_COMP,
in particular these two (from the e529d3507a93 dma-mapping pull for
6.2):

ffcb75458460 dma-mapping: reject __GFP_COMP in dma_alloc_attrs | 2022-11-21 | (Christoph Hellwig)
bb73955c0b1d cnic: don't pass bogus GFP_ flags to dma_alloc_coherent | 2022-11-21 | (Christoph Hellwig)


Regards,
Jerry
Christoph Hellwig Oct. 2, 2023, 6:04 a.m. UTC | #10
On Sun, Oct 01, 2023 at 07:22:36AM -0700, Jerry Snitselaar wrote:
> Changes last year to the dma-mapping api to no longer allow __GFP_COMP,
> in particular these two (from the e529d3507a93 dma-mapping pull for
> 6.2):

That's complete BS.  The driver was broken since day 1 and always
ignored the DMA API requirement to never try to grab the page from the
dma coherent allocation because you generally speaking can't.  It just
happened to accidentally work the trivial dma coherent allocator that
is used on x86.
Jerry Snitselaar Oct. 2, 2023, 7:50 a.m. UTC | #11
On Mon, Oct 02, 2023 at 08:04:24AM +0200, Christoph Hellwig wrote:
> On Sun, Oct 01, 2023 at 07:22:36AM -0700, Jerry Snitselaar wrote:
> > Changes last year to the dma-mapping api to no longer allow __GFP_COMP,
> > in particular these two (from the e529d3507a93 dma-mapping pull for
> > 6.2):
> 
> That's complete BS.  The driver was broken since day 1 and always
> ignored the DMA API requirement to never try to grab the page from the
> dma coherent allocation because you generally speaking can't.  It just
> happened to accidentally work the trivial dma coherent allocator that
> is used on x86.
> 

re-sending since gmail decided to not send plain text:

Yes, I agree that it has been broken and misusing the API. Greg's
question was what changed though, and it was the clean up of
__GFP_COMP in dma-mapping that brought the problem in the driver to
light.

I already said the other day that cnic has been doing this for 14
years. I'm not blaming you or your __GFP_COMP cleanup commits, they
just uncovered that cnic was doing something wrong. My apologies if
you took it that way.

Regards,
Jerry
Greg KH Oct. 2, 2023, 8:46 a.m. UTC | #12
On Mon, Oct 02, 2023 at 12:50:21AM -0700, Jerry Snitselaar wrote:
> On Mon, Oct 02, 2023 at 08:04:24AM +0200, Christoph Hellwig wrote:
> > On Sun, Oct 01, 2023 at 07:22:36AM -0700, Jerry Snitselaar wrote:
> > > Changes last year to the dma-mapping api to no longer allow __GFP_COMP,
> > > in particular these two (from the e529d3507a93 dma-mapping pull for
> > > 6.2):
> > 
> > That's complete BS.  The driver was broken since day 1 and always
> > ignored the DMA API requirement to never try to grab the page from the
> > dma coherent allocation because you generally speaking can't.  It just
> > happened to accidentally work the trivial dma coherent allocator that
> > is used on x86.
> > 
> 
> re-sending since gmail decided to not send plain text:
> 
> Yes, I agree that it has been broken and misusing the API. Greg's
> question was what changed though, and it was the clean up of
> __GFP_COMP in dma-mapping that brought the problem in the driver to
> light.
> 
> I already said the other day that cnic has been doing this for 14
> years. I'm not blaming you or your __GFP_COMP cleanup commits, they
> just uncovered that cnic was doing something wrong. My apologies if
> you took it that way.

As these devices aren't being made anymore, and this api is really not a
good idea in the first place, why don't we just leave it broken and see
if anyone notices?

thanks,

greg k-h
Hannes Reinecke Oct. 2, 2023, 8:59 a.m. UTC | #13
On 10/2/23 10:46, Greg Kroah-Hartman wrote:
> On Mon, Oct 02, 2023 at 12:50:21AM -0700, Jerry Snitselaar wrote:
>> On Mon, Oct 02, 2023 at 08:04:24AM +0200, Christoph Hellwig wrote:
>>> On Sun, Oct 01, 2023 at 07:22:36AM -0700, Jerry Snitselaar wrote:
>>>> Changes last year to the dma-mapping api to no longer allow __GFP_COMP,
>>>> in particular these two (from the e529d3507a93 dma-mapping pull for
>>>> 6.2):
>>>
>>> That's complete BS.  The driver was broken since day 1 and always
>>> ignored the DMA API requirement to never try to grab the page from the
>>> dma coherent allocation because you generally speaking can't.  It just
>>> happened to accidentally work the trivial dma coherent allocator that
>>> is used on x86.
>>>
>>
>> re-sending since gmail decided to not send plain text:
>>
>> Yes, I agree that it has been broken and misusing the API. Greg's
>> question was what changed though, and it was the clean up of
>> __GFP_COMP in dma-mapping that brought the problem in the driver to
>> light.
>>
>> I already said the other day that cnic has been doing this for 14
>> years. I'm not blaming you or your __GFP_COMP cleanup commits, they
>> just uncovered that cnic was doing something wrong. My apologies if
>> you took it that way.
> 
> As these devices aren't being made anymore, and this api is really not a
> good idea in the first place, why don't we just leave it broken and see
> if anyone notices?
> 
Guess what triggered this mail thread.
Some customers did notice.

Problem is that these devices were built as the network interface in 
some bladecenter machines, so you can't just replace them with a 
different Ethernet card.

Cheers,

Hannes
Paolo Abeni Oct. 5, 2023, 10:39 a.m. UTC | #14
On Mon, 2023-10-02 at 10:59 +0200, Hannes Reinecke wrote:
> On 10/2/23 10:46, Greg Kroah-Hartman wrote:
> > On Mon, Oct 02, 2023 at 12:50:21AM -0700, Jerry Snitselaar wrote:
> > > On Mon, Oct 02, 2023 at 08:04:24AM +0200, Christoph Hellwig wrote:
> > > > On Sun, Oct 01, 2023 at 07:22:36AM -0700, Jerry Snitselaar wrote:
> > > > > Changes last year to the dma-mapping api to no longer allow __GFP_COMP,
> > > > > in particular these two (from the e529d3507a93 dma-mapping pull for
> > > > > 6.2):
> > > > 
> > > > That's complete BS.  The driver was broken since day 1 and always
> > > > ignored the DMA API requirement to never try to grab the page from the
> > > > dma coherent allocation because you generally speaking can't.  It just
> > > > happened to accidentally work the trivial dma coherent allocator that
> > > > is used on x86.
> > > > 
> > > 
> > > re-sending since gmail decided to not send plain text:
> > > 
> > > Yes, I agree that it has been broken and misusing the API. Greg's
> > > question was what changed though, and it was the clean up of
> > > __GFP_COMP in dma-mapping that brought the problem in the driver to
> > > light.
> > > 
> > > I already said the other day that cnic has been doing this for 14
> > > years. I'm not blaming you or your __GFP_COMP cleanup commits, they
> > > just uncovered that cnic was doing something wrong. My apologies if
> > > you took it that way.
> > 
> > As these devices aren't being made anymore, and this api is really not a
> > good idea in the first place, why don't we just leave it broken and see
> > if anyone notices?
> > 
> Guess what triggered this mail thread.
> Some customers did notice.
> 
> Problem is that these devices were built as the network interface in 
> some bladecenter machines, so you can't just replace them with a 
> different Ethernet card.

This route looks a no-go.

Out of sheer ignorance, would the iommu hack hinted in the cover letter
require similar controversial changes?

Cheers,

Paolo
diff mbox series

Patch

diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c
index 84a04eec654a..490f88ad3bd2 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -367,6 +367,7 @@  static void bnx2_setup_cnic_irq_info(struct bnx2 *bp)
 	cp->irq_arr[0].status_blk = (void *)
 		((unsigned long) bnapi->status_blk.msi +
 		(BNX2_SBLK_MSIX_ALIGN_SIZE * sb_id));
+	cp->irq_arr[0].status_blk_map = bp->status_blk_mapping;
 	cp->irq_arr[0].status_blk_num = sb_id;
 	cp->num_irq = 1;
 }
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 2fcde42a05c1..dbaa90b7f889 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -14912,9 +14912,11 @@  void bnx2x_setup_cnic_irq_info(struct bnx2x *bp)
 	else
 		cp->irq_arr[0].status_blk = (void *)bp->cnic_sb.e1x_sb;
 
+	cp->irq_arr[0].status_blk_map = bp->cnic_sb_mapping;
 	cp->irq_arr[0].status_blk_num =  bnx2x_cnic_fw_sb_id(bp);
 	cp->irq_arr[0].status_blk_num2 = bnx2x_cnic_igu_sb_id(bp);
 	cp->irq_arr[1].status_blk = bp->def_status_blk;
+	cp->irq_arr[1].status_blk_map = bp->def_status_blk_mapping;
 	cp->irq_arr[1].status_blk_num = DEF_SB_ID;
 	cp->irq_arr[1].status_blk_num2 = DEF_SB_IGU_ID;
 
diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c
index 67ec397bd171..05e28e00e916 100644
--- a/drivers/net/ethernet/broadcom/cnic.c
+++ b/drivers/net/ethernet/broadcom/cnic.c
@@ -1106,8 +1106,8 @@  static int cnic_init_uio(struct cnic_dev *dev)
 	if (test_bit(CNIC_F_BNX2_CLASS, &dev->flags)) {
 		uinfo->mem[0].size = MB_GET_CID_ADDR(TX_TSS_CID +
 						     TX_MAX_TSS_RINGS + 1);
-		uinfo->mem[1].addr = (unsigned long) cp->status_blk.gen &
-					CNIC_PAGE_MASK;
+		uinfo->mem[1].dma_addr = cp->status_blk_map;
+		uinfo->mem[1].virtual_addr = cp->status_blk.gen;
 		if (cp->ethdev->drv_state & CNIC_DRV_STATE_USING_MSIX)
 			uinfo->mem[1].size = PAGE_ALIGN(BNX2_SBLK_MSIX_ALIGN_SIZE * 9);
 		else
@@ -1117,22 +1117,27 @@  static int cnic_init_uio(struct cnic_dev *dev)
 	} else if (test_bit(CNIC_F_BNX2X_CLASS, &dev->flags)) {
 		uinfo->mem[0].size = pci_resource_len(dev->pcidev, 0);
 
-		uinfo->mem[1].addr = (unsigned long) cp->bnx2x_def_status_blk &
-			CNIC_PAGE_MASK;
+		uinfo->mem[1].dma_addr = cp->status_blk_map;
+		uinfo->mem[1].virtual_addr = cp->bnx2x_def_status_blk;
 		uinfo->mem[1].size = PAGE_ALIGN(sizeof(*cp->bnx2x_def_status_blk));
 
 		uinfo->name = "bnx2x_cnic";
 	}
 
-	uinfo->mem[1].memtype = UIO_MEM_LOGICAL;
+	uinfo->mem[1].dma_device = &dev->pcidev->dev;
+	uinfo->mem[1].memtype = UIO_MEM_DMA_COHERENT;
 
-	uinfo->mem[2].addr = (unsigned long) udev->l2_ring;
+	uinfo->mem[2].dma_addr = udev->l2_ring_map;
+	uinfo->mem[2].virtual_addr = udev->l2_ring;
 	uinfo->mem[2].size = udev->l2_ring_size;
-	uinfo->mem[2].memtype = UIO_MEM_LOGICAL;
+	uinfo->mem[2].dma_device = &dev->pcidev->dev;
+	uinfo->mem[2].memtype = UIO_MEM_DMA_COHERENT;
 
-	uinfo->mem[3].addr = (unsigned long) udev->l2_buf;
+	uinfo->mem[3].dma_addr = udev->l2_buf_map;
+	uinfo->mem[3].virtual_addr = udev->l2_buf;
 	uinfo->mem[3].size = udev->l2_buf_size;
-	uinfo->mem[3].memtype = UIO_MEM_LOGICAL;
+	uinfo->mem[3].dma_device = &dev->pcidev->dev;
+	uinfo->mem[3].memtype = UIO_MEM_DMA_COHERENT;
 
 	uinfo->version = CNIC_MODULE_VERSION;
 	uinfo->irq = UIO_IRQ_CUSTOM;
@@ -1314,6 +1319,7 @@  static int cnic_alloc_bnx2x_resc(struct cnic_dev *dev)
 		return 0;
 
 	cp->bnx2x_def_status_blk = cp->ethdev->irq_arr[1].status_blk;
+	cp->status_blk_map = cp->ethdev->irq_arr[1].status_blk_map;
 
 	cp->l2_rx_ring_size = 15;
 
@@ -5324,6 +5330,7 @@  static int cnic_start_hw(struct cnic_dev *dev)
 	pci_dev_get(dev->pcidev);
 	cp->func = PCI_FUNC(dev->pcidev->devfn);
 	cp->status_blk.gen = ethdev->irq_arr[0].status_blk;
+	cp->status_blk_map = ethdev->irq_arr[0].status_blk_map;
 	cp->status_blk_num = ethdev->irq_arr[0].status_blk_num;
 
 	err = cp->alloc_resc(dev);
diff --git a/drivers/net/ethernet/broadcom/cnic.h b/drivers/net/ethernet/broadcom/cnic.h
index 4baea81bae7a..fedc84ada937 100644
--- a/drivers/net/ethernet/broadcom/cnic.h
+++ b/drivers/net/ethernet/broadcom/cnic.h
@@ -260,6 +260,7 @@  struct cnic_local {
 		#define SM_RX_ID		0
 		#define SM_TX_ID		1
 	} status_blk;
+	dma_addr_t status_blk_map;
 
 	struct host_sp_status_block	*bnx2x_def_status_blk;
 
diff --git a/drivers/net/ethernet/broadcom/cnic_if.h b/drivers/net/ethernet/broadcom/cnic_if.h
index 789e5c7e9311..49a11ec80b36 100644
--- a/drivers/net/ethernet/broadcom/cnic_if.h
+++ b/drivers/net/ethernet/broadcom/cnic_if.h
@@ -190,6 +190,7 @@  struct cnic_ops {
 struct cnic_irq {
 	unsigned int	vector;
 	void		*status_blk;
+	dma_addr_t	status_blk_map;
 	u32		status_blk_num;
 	u32		status_blk_num2;
 	u32		irq_flags;