Message ID | 20230929170023.1020032-4-cleech@redhat.com (mailing list archive) |
---|---|
State | Rejected |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | UIO_MEM_DMA_COHERENT for cnic/bnx2/bnx2x | expand |
On 9/29/2023 10:00 AM, Chris Leech wrote: > Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap > for dma_alloc_coherent buffers. > > The cnic l2_ring and l2_buf mmaps have caused page refcount issues since > the misuse of the __GFP_COMP flag was removed from their > dma_alloc_coherent calls. Fix that by having the uio device use > dma_mmap_coherent. > > The bnx2 and bnx2x status block allocations are also dma_alloc_coherent, > and should use dma_mmap_coherent. They didn't allocate multiple pages, > but also didn't seem to work correctly with an iommu enabled. > > Fixes: bb73955c0b1d ("cnic: don't pass bogus GFP_ flags to dma_alloc_coherent") > Signed-off-by: Chris Leech <cleech@redhat.com> > --- Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote: > Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap > for dma_alloc_coherent buffers. Why are ethernet drivers messing around with UIO devices? That's not what UIO is for, unless you are trying to do kernel bypass for these devices without anyone noticing? confused, greg k-h
On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote: > On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote: > > Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap > > for dma_alloc_coherent buffers. > > Why are ethernet drivers messing around with UIO devices? That's not > what UIO is for, unless you are trying to do kernel bypass for these > devices without anyone noticing? > > confused, > > greg k-h Hi Greg, It is for iscsi offload [1], and apparently cnic has been handing off dma alloc'd memory to uio for this since 2009, but it has just been handing off the address from dma_alloc_coherent to uio, and giving the dma handle the bnx2x device. That managed to avoid being an issue until changes last year rightly cleaned up allowing __GFP_COMP to be passed to the dma_alloc* calls, which cnic was passing to dma_alloc_coherent. Now iscsiuio goes to mmap through the uio device and the result is a BUG and stack trace like below. It was my suggestion that it either needs to use dma_mmap_coherent to mmap the memory, which possibly is a mistaken understanding on my part but what dma_mmap_coherent says it is for, or possibly look to do something similar to what qed is doing for uio. Regards, Jerry [ 129.196731] page:ffffea0004cacb40 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x132b2d [ 129.207285] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff) [ 129.214650] page_type: 0xffffffff() [ 129.218584] raw: 0017ffffc0000000 dead000000000100 dead000000000122 0000000000000000 [ 129.227264] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 129.235937] page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u)) [ 129.246632] ------------[ cut here ]------------ [ 129.251817] kernel BUG at include/linux/mm.h:1441! [ 129.257209] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI [ 129.263440] CPU: 1 PID: 1930 Comm: iscsiuio Kdump: loaded Not tainted 6.6.0-0.rc3.26.eln130.x86_64+debug #1 [ 129.274323] Hardware name: Dell Inc. PowerEdge R320/08VT7V, BIOS 2.4.2 01/29/2015 [ 129.282682] RIP: 0010:uio_vma_fault+0x40e/0x520 [uio] [ 129.288345] Code: 49 83 ee 01 e9 db fe ff ff be 01 00 00 00 4c 89 f7 e8 96 7f ae e9 e9 2b ff ff ff 48 c7 c6 00 08 52 c1 4c 89 f7 e8 12 1b 8e e9 <0f> 0b e8 cb 7b a3 e9 e9 f6 fd ff ff bb 02 00 00 00 e9 2b ff ff ff [ 129.309311] RSP: 0018:ffffc900022878b0 EFLAGS: 00010246 [ 129.315156] RAX: 000000000000005c RBX: ffffea0004cacb40 RCX: 0000000000000000 [ 129.323130] RDX: 0000000000000000 RSI: ffffffffad380f00 RDI: 0000000000000001 [ 129.331102] RBP: ffff8881a65da528 R08: 0000000000000001 R09: fffff52000450eca [ 129.339074] R10: ffffc90002287657 R11: 0000000000000001 R12: ffffea0004cacb74 [ 129.347044] R13: ffffc900022879f8 R14: ffffea0004cacb40 R15: ffff8881946a0400 [ 129.355015] FS: 00007fc6dcafe640(0000) GS:ffff8885cb800000(0000) knlGS:0000000000000000 [ 129.364054] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 129.370474] CR2: 000055b5c9f091a0 CR3: 0000000162292001 CR4: 00000000001706e0 [ 129.378446] Call Trace: [ 129.381183] <TASK> [ 129.383532] ? die+0x36/0x90 [ 129.386765] ? do_trap+0x199/0x240 [ 129.390573] ? uio_vma_fault+0x40e/0x520 [uio] [ 129.395560] ? uio_vma_fault+0x40e/0x520 [uio] [ 129.400539] ? do_error_trap+0xa3/0x170 [ 129.404830] ? uio_vma_fault+0x40e/0x520 [uio] [ 129.409815] ? handle_invalid_op+0x2c/0x40 [ 129.414395] ? uio_vma_fault+0x40e/0x520 [uio] [ 129.419361] ? exc_invalid_op+0x2d/0x40 [ 129.423657] ? asm_exc_invalid_op+0x1a/0x20 [ 129.428332] ? uio_vma_fault+0x40e/0x520 [uio] [ 129.433301] __do_fault+0xf2/0x5a0 [ 129.437103] do_fault+0x68e/0xc30 [ 129.440804] ? __pfx___pte_offset_map_lock+0x10/0x10 [ 129.446350] __handle_mm_fault+0x8e0/0xeb0 [ 129.450939] ? follow_page_pte+0x29a/0xda0 [ 129.455513] ? __pfx___handle_mm_fault+0x10/0x10 [ 129.460668] ? __pfx_follow_page_pte+0x10/0x10 [ 129.465626] ? follow_pmd_mask.isra.0+0x1d4/0xab0 [ 129.470877] ? count_memcg_event_mm.part.0+0xc7/0x1f0 [ 129.476515] ? rcu_is_watching+0x15/0xb0 [ 129.480896] handle_mm_fault+0x2f2/0x8d0 [ 129.485277] ? check_vma_flags+0x1c2/0x420 [ 129.489852] __get_user_pages+0x333/0xa20 [ 129.494331] ? __pfx___get_user_pages+0x10/0x10 [ 129.499389] ? __pfx_mt_find+0x10/0x10 [ 129.503568] ? __mm_populate+0xe7/0x360 [ 129.507850] ? rcu_is_watching+0x15/0xb0 [ 129.512231] populate_vma_page_range+0x1e9/0x2d0 [ 129.517388] ? __pfx_populate_vma_page_range+0x10/0x10 [ 129.523125] ? __pfx_mmap_region+0x10/0x10 [ 129.527693] __mm_populate+0x1ff/0x360 [ 129.531882] ? __pfx___mm_populate+0x10/0x10 [ 129.536649] ? do_mmap+0x61d/0xcd0 [ 129.540446] ? __up_write+0x1a5/0x500 [ 129.544536] vm_mmap_pgoff+0x276/0x360 [ 129.548722] ? rcu_is_watching+0x15/0xb0 [ 129.553093] ? __pfx_vm_mmap_pgoff+0x10/0x10 [ 129.557861] ? rcu_is_watching+0x15/0xb0 [ 129.562239] ? lock_release+0x25c/0x300 [ 129.566522] ? __fget_files+0x1e0/0x380 [ 129.570807] ksys_mmap_pgoff+0x2e4/0x430 [ 129.575189] do_syscall_64+0x60/0x90 [ 129.579180] ? do_syscall_64+0x6c/0x90 [ 129.583357] ? lockdep_hardirqs_on+0x7d/0x100 [ 129.588235] ? do_syscall_64+0x6c/0x90 [ 129.592420] ? lockdep_hardirqs_on+0x7d/0x100 [ 129.597283] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [1] https://github.com/open-iscsi/open-iscsi/blob/master/iscsiuio/README
On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote: > On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote: > > Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap > > for dma_alloc_coherent buffers. > > Why are ethernet drivers messing around with UIO devices? That's not > what UIO is for, unless you are trying to do kernel bypass for these > devices without anyone noticing? > > confused, It's confusing. The bnx2 driver stack included a cnic (converged nic?) module that sits between the ethernet drivers (bnx2, bnx2x) and protocol offload drivers (iscsi, fcoe, rdma). The iscsi module (bnx2i) uses a passthrough interface from cnic to handle some network configuration that the device firmware doesn't do. It uses a uio device and a userspace component called iscsiuio to do that. Questions beyond that will probably need to be answer by one of the many Marvell engineers copied on this thread. - Chris
On Sat, Sep 30, 2023 at 11:19:20AM -0700, Chris Leech wrote: > On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote: > > On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote: > > > Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap > > > for dma_alloc_coherent buffers. > > > > Why are ethernet drivers messing around with UIO devices? That's not > > what UIO is for, unless you are trying to do kernel bypass for these > > devices without anyone noticing? > > > > confused, > > It's confusing. The bnx2 driver stack included a cnic (converged nic?) > module that sits between the ethernet drivers (bnx2, bnx2x) and protocol > offload drivers (iscsi, fcoe, rdma). > > The iscsi module (bnx2i) uses a passthrough interface from cnic to > handle some network configuration that the device firmware doesn't do. > It uses a uio device and a userspace component called iscsiuio to do > that. That's horrible, and not what the UIO api is for at all. Configure the device like any other normal kernel device, don't poke at raw memory values directly, that way lies madness. Have a pointer to the userspace tool anywhere? All I found looks like a full IP stack in userspace under that name, and surely that's not what this api is for... thanks, greg k-h
On Sat, Sep 30, 2023 at 02:10:50AM -0700, Jerry Snitselaar wrote:
> [1] https://github.com/open-iscsi/open-iscsi/blob/master/iscsiuio/README
That's IP offload, not what UIO is supposed to be for at all (yes, I
know DPDK abuses this api as well, and I hate it.) But this is on a
network card, it shouldn't need UIO. Why is iscsi somehow "special" for
this?
still confused,
greg k-h
On 9/30/23 20:28, Greg Kroah-Hartman wrote: > On Sat, Sep 30, 2023 at 11:19:20AM -0700, Chris Leech wrote: >> On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote: >>> On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote: >>>> Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap >>>> for dma_alloc_coherent buffers. >>> >>> Why are ethernet drivers messing around with UIO devices? That's not >>> what UIO is for, unless you are trying to do kernel bypass for these >>> devices without anyone noticing? >>> >>> confused, >> >> It's confusing. The bnx2 driver stack included a cnic (converged nic?) >> module that sits between the ethernet drivers (bnx2, bnx2x) and protocol >> offload drivers (iscsi, fcoe, rdma). >> >> The iscsi module (bnx2i) uses a passthrough interface from cnic to >> handle some network configuration that the device firmware doesn't do. >> It uses a uio device and a userspace component called iscsiuio to do >> that. > > That's horrible, and not what the UIO api is for at all. Configure the > device like any other normal kernel device, don't poke at raw memory > values directly, that way lies madness. > > Have a pointer to the userspace tool anywhere? All I found looks like a > full IP stack in userspace under that name, and surely that's not what > this api is for... > But that's how the interface is used, in particular for the bnx2i driver. Problem is that the bnx2i iSCSI offload is just that, an iSCSI offload. Not a TCP offload. So if the iSCSI interface is configured to acquire the IP address via DHCP, someone has to run the DHCP protocol. But the iSCSI offload can't, and the bnx2i PCI device is not a network device so that the normal network stack can't be used. And so the architects of the bnx2i card decided to use UIO to pass the network traffic to userspace, and used the userspace 'iscsiuio' application to run DHCP in userspace. But's been that way for several years now; so long, in fact, that the card itself has been out of support from Marvell (since quite some years, too, IIRC). And even the successor of that card (the qedi driver) is nearing EOL. Mind you, the qedi driver is using the same interface (by using UIO to run DHCP in userspace), so singling out the bnx2i for bad design can be construed as being unfair :-) I agree, though, that the design is a mess. Cheers, Hannes
On Sun, Oct 01, 2023 at 12:44:05PM +0200, Hannes Reinecke wrote: > On 9/30/23 20:28, Greg Kroah-Hartman wrote: > > On Sat, Sep 30, 2023 at 11:19:20AM -0700, Chris Leech wrote: > > > On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote: > > > > On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote: > > > > > Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap > > > > > for dma_alloc_coherent buffers. > > > > > > > > Why are ethernet drivers messing around with UIO devices? That's not > > > > what UIO is for, unless you are trying to do kernel bypass for these > > > > devices without anyone noticing? > > > > > > > > confused, > > > > > > It's confusing. The bnx2 driver stack included a cnic (converged nic?) > > > module that sits between the ethernet drivers (bnx2, bnx2x) and protocol > > > offload drivers (iscsi, fcoe, rdma). > > > > > > The iscsi module (bnx2i) uses a passthrough interface from cnic to > > > handle some network configuration that the device firmware doesn't do. > > > It uses a uio device and a userspace component called iscsiuio to do > > > that. > > > > That's horrible, and not what the UIO api is for at all. Configure the > > device like any other normal kernel device, don't poke at raw memory > > values directly, that way lies madness. > > > > Have a pointer to the userspace tool anywhere? All I found looks like a > > full IP stack in userspace under that name, and surely that's not what > > this api is for... > > > But that's how the interface is used, in particular for the bnx2i driver. > Problem is that the bnx2i iSCSI offload is just that, an iSCSI offload. Not > a TCP offload. So if the iSCSI interface is configured to > acquire the IP address via DHCP, someone has to run the DHCP protocol. > But the iSCSI offload can't, and the bnx2i PCI device is not a network > device so that the normal network stack can't be used. > And so the architects of the bnx2i card decided to use UIO to pass > the network traffic to userspace, and used the userspace 'iscsiuio' > application to run DHCP in userspace. > > But's been that way for several years now; so long, in fact, that > the card itself has been out of support from Marvell (since quite some > years, too, IIRC). And even the successor of that card (the qedi driver) > is nearing EOL. Mind you, the qedi driver is using the same interface (by > using UIO to run DHCP in userspace), so singling out the bnx2i for bad > design can be construed as being unfair :-) Ok, let's say they are all horrible! :) > I agree, though, that the design is a mess. Ok, so why are we papering over it and continuing to allow it to exist? What "broke" to suddenly require this UIO change? If this has been around for a very long time, what has caused this to now require the UIO layer to change? thanks, greg k-h
On Sun, Oct 01, 2023 at 01:57:25PM +0200, Greg Kroah-Hartman wrote: > On Sun, Oct 01, 2023 at 12:44:05PM +0200, Hannes Reinecke wrote: > > On 9/30/23 20:28, Greg Kroah-Hartman wrote: > > > On Sat, Sep 30, 2023 at 11:19:20AM -0700, Chris Leech wrote: > > > > On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote: > > > > > On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote: > > > > > > Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap > > > > > > for dma_alloc_coherent buffers. > > > > > > > > > > Why are ethernet drivers messing around with UIO devices? That's not > > > > > what UIO is for, unless you are trying to do kernel bypass for these > > > > > devices without anyone noticing? > > > > > > > > > > confused, > > > > > > > > It's confusing. The bnx2 driver stack included a cnic (converged nic?) > > > > module that sits between the ethernet drivers (bnx2, bnx2x) and protocol > > > > offload drivers (iscsi, fcoe, rdma). > > > > > > > > The iscsi module (bnx2i) uses a passthrough interface from cnic to > > > > handle some network configuration that the device firmware doesn't do. > > > > It uses a uio device and a userspace component called iscsiuio to do > > > > that. > > > > > > That's horrible, and not what the UIO api is for at all. Configure the > > > device like any other normal kernel device, don't poke at raw memory > > > values directly, that way lies madness. > > > > > > Have a pointer to the userspace tool anywhere? All I found looks like a > > > full IP stack in userspace under that name, and surely that's not what > > > this api is for... > > > > > But that's how the interface is used, in particular for the bnx2i driver. > > Problem is that the bnx2i iSCSI offload is just that, an iSCSI offload. Not > > a TCP offload. So if the iSCSI interface is configured to > > acquire the IP address via DHCP, someone has to run the DHCP protocol. > > But the iSCSI offload can't, and the bnx2i PCI device is not a network > > device so that the normal network stack can't be used. > > And so the architects of the bnx2i card decided to use UIO to pass > > the network traffic to userspace, and used the userspace 'iscsiuio' > > application to run DHCP in userspace. > > > > But's been that way for several years now; so long, in fact, that > > the card itself has been out of support from Marvell (since quite some > > years, too, IIRC). And even the successor of that card (the qedi driver) > > is nearing EOL. Mind you, the qedi driver is using the same interface (by > > using UIO to run DHCP in userspace), so singling out the bnx2i for bad > > design can be construed as being unfair :-) > > Ok, let's say they are all horrible! :) > > > I agree, though, that the design is a mess. > > Ok, so why are we papering over it and continuing to allow it to exist? > > What "broke" to suddenly require this UIO change? If this has been > around for a very long time, what has caused this to now require the UIO > layer to change? > > thanks, > > greg k-h Changes last year to the dma-mapping api to no longer allow __GFP_COMP, in particular these two (from the e529d3507a93 dma-mapping pull for 6.2): ffcb75458460 dma-mapping: reject __GFP_COMP in dma_alloc_attrs | 2022-11-21 | (Christoph Hellwig) bb73955c0b1d cnic: don't pass bogus GFP_ flags to dma_alloc_coherent | 2022-11-21 | (Christoph Hellwig) Regards, Jerry
On Sun, Oct 01, 2023 at 07:22:36AM -0700, Jerry Snitselaar wrote: > Changes last year to the dma-mapping api to no longer allow __GFP_COMP, > in particular these two (from the e529d3507a93 dma-mapping pull for > 6.2): That's complete BS. The driver was broken since day 1 and always ignored the DMA API requirement to never try to grab the page from the dma coherent allocation because you generally speaking can't. It just happened to accidentally work the trivial dma coherent allocator that is used on x86.
On Mon, Oct 02, 2023 at 08:04:24AM +0200, Christoph Hellwig wrote: > On Sun, Oct 01, 2023 at 07:22:36AM -0700, Jerry Snitselaar wrote: > > Changes last year to the dma-mapping api to no longer allow __GFP_COMP, > > in particular these two (from the e529d3507a93 dma-mapping pull for > > 6.2): > > That's complete BS. The driver was broken since day 1 and always > ignored the DMA API requirement to never try to grab the page from the > dma coherent allocation because you generally speaking can't. It just > happened to accidentally work the trivial dma coherent allocator that > is used on x86. > re-sending since gmail decided to not send plain text: Yes, I agree that it has been broken and misusing the API. Greg's question was what changed though, and it was the clean up of __GFP_COMP in dma-mapping that brought the problem in the driver to light. I already said the other day that cnic has been doing this for 14 years. I'm not blaming you or your __GFP_COMP cleanup commits, they just uncovered that cnic was doing something wrong. My apologies if you took it that way. Regards, Jerry
On Mon, Oct 02, 2023 at 12:50:21AM -0700, Jerry Snitselaar wrote: > On Mon, Oct 02, 2023 at 08:04:24AM +0200, Christoph Hellwig wrote: > > On Sun, Oct 01, 2023 at 07:22:36AM -0700, Jerry Snitselaar wrote: > > > Changes last year to the dma-mapping api to no longer allow __GFP_COMP, > > > in particular these two (from the e529d3507a93 dma-mapping pull for > > > 6.2): > > > > That's complete BS. The driver was broken since day 1 and always > > ignored the DMA API requirement to never try to grab the page from the > > dma coherent allocation because you generally speaking can't. It just > > happened to accidentally work the trivial dma coherent allocator that > > is used on x86. > > > > re-sending since gmail decided to not send plain text: > > Yes, I agree that it has been broken and misusing the API. Greg's > question was what changed though, and it was the clean up of > __GFP_COMP in dma-mapping that brought the problem in the driver to > light. > > I already said the other day that cnic has been doing this for 14 > years. I'm not blaming you or your __GFP_COMP cleanup commits, they > just uncovered that cnic was doing something wrong. My apologies if > you took it that way. As these devices aren't being made anymore, and this api is really not a good idea in the first place, why don't we just leave it broken and see if anyone notices? thanks, greg k-h
On 10/2/23 10:46, Greg Kroah-Hartman wrote: > On Mon, Oct 02, 2023 at 12:50:21AM -0700, Jerry Snitselaar wrote: >> On Mon, Oct 02, 2023 at 08:04:24AM +0200, Christoph Hellwig wrote: >>> On Sun, Oct 01, 2023 at 07:22:36AM -0700, Jerry Snitselaar wrote: >>>> Changes last year to the dma-mapping api to no longer allow __GFP_COMP, >>>> in particular these two (from the e529d3507a93 dma-mapping pull for >>>> 6.2): >>> >>> That's complete BS. The driver was broken since day 1 and always >>> ignored the DMA API requirement to never try to grab the page from the >>> dma coherent allocation because you generally speaking can't. It just >>> happened to accidentally work the trivial dma coherent allocator that >>> is used on x86. >>> >> >> re-sending since gmail decided to not send plain text: >> >> Yes, I agree that it has been broken and misusing the API. Greg's >> question was what changed though, and it was the clean up of >> __GFP_COMP in dma-mapping that brought the problem in the driver to >> light. >> >> I already said the other day that cnic has been doing this for 14 >> years. I'm not blaming you or your __GFP_COMP cleanup commits, they >> just uncovered that cnic was doing something wrong. My apologies if >> you took it that way. > > As these devices aren't being made anymore, and this api is really not a > good idea in the first place, why don't we just leave it broken and see > if anyone notices? > Guess what triggered this mail thread. Some customers did notice. Problem is that these devices were built as the network interface in some bladecenter machines, so you can't just replace them with a different Ethernet card. Cheers, Hannes
On Mon, 2023-10-02 at 10:59 +0200, Hannes Reinecke wrote: > On 10/2/23 10:46, Greg Kroah-Hartman wrote: > > On Mon, Oct 02, 2023 at 12:50:21AM -0700, Jerry Snitselaar wrote: > > > On Mon, Oct 02, 2023 at 08:04:24AM +0200, Christoph Hellwig wrote: > > > > On Sun, Oct 01, 2023 at 07:22:36AM -0700, Jerry Snitselaar wrote: > > > > > Changes last year to the dma-mapping api to no longer allow __GFP_COMP, > > > > > in particular these two (from the e529d3507a93 dma-mapping pull for > > > > > 6.2): > > > > > > > > That's complete BS. The driver was broken since day 1 and always > > > > ignored the DMA API requirement to never try to grab the page from the > > > > dma coherent allocation because you generally speaking can't. It just > > > > happened to accidentally work the trivial dma coherent allocator that > > > > is used on x86. > > > > > > > > > > re-sending since gmail decided to not send plain text: > > > > > > Yes, I agree that it has been broken and misusing the API. Greg's > > > question was what changed though, and it was the clean up of > > > __GFP_COMP in dma-mapping that brought the problem in the driver to > > > light. > > > > > > I already said the other day that cnic has been doing this for 14 > > > years. I'm not blaming you or your __GFP_COMP cleanup commits, they > > > just uncovered that cnic was doing something wrong. My apologies if > > > you took it that way. > > > > As these devices aren't being made anymore, and this api is really not a > > good idea in the first place, why don't we just leave it broken and see > > if anyone notices? > > > Guess what triggered this mail thread. > Some customers did notice. > > Problem is that these devices were built as the network interface in > some bladecenter machines, so you can't just replace them with a > different Ethernet card. This route looks a no-go. Out of sheer ignorance, would the iommu hack hinted in the cover letter require similar controversial changes? Cheers, Paolo
diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c index 84a04eec654a..490f88ad3bd2 100644 --- a/drivers/net/ethernet/broadcom/bnx2.c +++ b/drivers/net/ethernet/broadcom/bnx2.c @@ -367,6 +367,7 @@ static void bnx2_setup_cnic_irq_info(struct bnx2 *bp) cp->irq_arr[0].status_blk = (void *) ((unsigned long) bnapi->status_blk.msi + (BNX2_SBLK_MSIX_ALIGN_SIZE * sb_id)); + cp->irq_arr[0].status_blk_map = bp->status_blk_mapping; cp->irq_arr[0].status_blk_num = sb_id; cp->num_irq = 1; } diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c index 2fcde42a05c1..dbaa90b7f889 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c @@ -14912,9 +14912,11 @@ void bnx2x_setup_cnic_irq_info(struct bnx2x *bp) else cp->irq_arr[0].status_blk = (void *)bp->cnic_sb.e1x_sb; + cp->irq_arr[0].status_blk_map = bp->cnic_sb_mapping; cp->irq_arr[0].status_blk_num = bnx2x_cnic_fw_sb_id(bp); cp->irq_arr[0].status_blk_num2 = bnx2x_cnic_igu_sb_id(bp); cp->irq_arr[1].status_blk = bp->def_status_blk; + cp->irq_arr[1].status_blk_map = bp->def_status_blk_mapping; cp->irq_arr[1].status_blk_num = DEF_SB_ID; cp->irq_arr[1].status_blk_num2 = DEF_SB_IGU_ID; diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c index 67ec397bd171..05e28e00e916 100644 --- a/drivers/net/ethernet/broadcom/cnic.c +++ b/drivers/net/ethernet/broadcom/cnic.c @@ -1106,8 +1106,8 @@ static int cnic_init_uio(struct cnic_dev *dev) if (test_bit(CNIC_F_BNX2_CLASS, &dev->flags)) { uinfo->mem[0].size = MB_GET_CID_ADDR(TX_TSS_CID + TX_MAX_TSS_RINGS + 1); - uinfo->mem[1].addr = (unsigned long) cp->status_blk.gen & - CNIC_PAGE_MASK; + uinfo->mem[1].dma_addr = cp->status_blk_map; + uinfo->mem[1].virtual_addr = cp->status_blk.gen; if (cp->ethdev->drv_state & CNIC_DRV_STATE_USING_MSIX) uinfo->mem[1].size = PAGE_ALIGN(BNX2_SBLK_MSIX_ALIGN_SIZE * 9); else @@ -1117,22 +1117,27 @@ static int cnic_init_uio(struct cnic_dev *dev) } else if (test_bit(CNIC_F_BNX2X_CLASS, &dev->flags)) { uinfo->mem[0].size = pci_resource_len(dev->pcidev, 0); - uinfo->mem[1].addr = (unsigned long) cp->bnx2x_def_status_blk & - CNIC_PAGE_MASK; + uinfo->mem[1].dma_addr = cp->status_blk_map; + uinfo->mem[1].virtual_addr = cp->bnx2x_def_status_blk; uinfo->mem[1].size = PAGE_ALIGN(sizeof(*cp->bnx2x_def_status_blk)); uinfo->name = "bnx2x_cnic"; } - uinfo->mem[1].memtype = UIO_MEM_LOGICAL; + uinfo->mem[1].dma_device = &dev->pcidev->dev; + uinfo->mem[1].memtype = UIO_MEM_DMA_COHERENT; - uinfo->mem[2].addr = (unsigned long) udev->l2_ring; + uinfo->mem[2].dma_addr = udev->l2_ring_map; + uinfo->mem[2].virtual_addr = udev->l2_ring; uinfo->mem[2].size = udev->l2_ring_size; - uinfo->mem[2].memtype = UIO_MEM_LOGICAL; + uinfo->mem[2].dma_device = &dev->pcidev->dev; + uinfo->mem[2].memtype = UIO_MEM_DMA_COHERENT; - uinfo->mem[3].addr = (unsigned long) udev->l2_buf; + uinfo->mem[3].dma_addr = udev->l2_buf_map; + uinfo->mem[3].virtual_addr = udev->l2_buf; uinfo->mem[3].size = udev->l2_buf_size; - uinfo->mem[3].memtype = UIO_MEM_LOGICAL; + uinfo->mem[3].dma_device = &dev->pcidev->dev; + uinfo->mem[3].memtype = UIO_MEM_DMA_COHERENT; uinfo->version = CNIC_MODULE_VERSION; uinfo->irq = UIO_IRQ_CUSTOM; @@ -1314,6 +1319,7 @@ static int cnic_alloc_bnx2x_resc(struct cnic_dev *dev) return 0; cp->bnx2x_def_status_blk = cp->ethdev->irq_arr[1].status_blk; + cp->status_blk_map = cp->ethdev->irq_arr[1].status_blk_map; cp->l2_rx_ring_size = 15; @@ -5324,6 +5330,7 @@ static int cnic_start_hw(struct cnic_dev *dev) pci_dev_get(dev->pcidev); cp->func = PCI_FUNC(dev->pcidev->devfn); cp->status_blk.gen = ethdev->irq_arr[0].status_blk; + cp->status_blk_map = ethdev->irq_arr[0].status_blk_map; cp->status_blk_num = ethdev->irq_arr[0].status_blk_num; err = cp->alloc_resc(dev); diff --git a/drivers/net/ethernet/broadcom/cnic.h b/drivers/net/ethernet/broadcom/cnic.h index 4baea81bae7a..fedc84ada937 100644 --- a/drivers/net/ethernet/broadcom/cnic.h +++ b/drivers/net/ethernet/broadcom/cnic.h @@ -260,6 +260,7 @@ struct cnic_local { #define SM_RX_ID 0 #define SM_TX_ID 1 } status_blk; + dma_addr_t status_blk_map; struct host_sp_status_block *bnx2x_def_status_blk; diff --git a/drivers/net/ethernet/broadcom/cnic_if.h b/drivers/net/ethernet/broadcom/cnic_if.h index 789e5c7e9311..49a11ec80b36 100644 --- a/drivers/net/ethernet/broadcom/cnic_if.h +++ b/drivers/net/ethernet/broadcom/cnic_if.h @@ -190,6 +190,7 @@ struct cnic_ops { struct cnic_irq { unsigned int vector; void *status_blk; + dma_addr_t status_blk_map; u32 status_blk_num; u32 status_blk_num2; u32 irq_flags;
Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap for dma_alloc_coherent buffers. The cnic l2_ring and l2_buf mmaps have caused page refcount issues since the misuse of the __GFP_COMP flag was removed from their dma_alloc_coherent calls. Fix that by having the uio device use dma_mmap_coherent. The bnx2 and bnx2x status block allocations are also dma_alloc_coherent, and should use dma_mmap_coherent. They didn't allocate multiple pages, but also didn't seem to work correctly with an iommu enabled. Fixes: bb73955c0b1d ("cnic: don't pass bogus GFP_ flags to dma_alloc_coherent") Signed-off-by: Chris Leech <cleech@redhat.com> --- drivers/net/ethernet/broadcom/bnx2.c | 1 + .../net/ethernet/broadcom/bnx2x/bnx2x_main.c | 2 ++ drivers/net/ethernet/broadcom/cnic.c | 25 ++++++++++++------- drivers/net/ethernet/broadcom/cnic.h | 1 + drivers/net/ethernet/broadcom/cnic_if.h | 1 + 5 files changed, 21 insertions(+), 9 deletions(-)