Message ID | 0-v2-7d3a384024cf+2060-ccw_mdev_jgg@nvidia.com (mailing list archive) |
---|---|
Headers | show |
Series | Move vfio_ccw to the new mdev API | expand |
On Thu, 2021-09-09 at 16:38 -0300, Jason Gunthorpe wrote: > This addresses Cornelia's remark on the earlier patch that ccw has a > confusing lifecycle. While it doesn't seem like the original attempt > was > functionally wrong, the result can be made better with a lot of > further > work. I thought I'd take a stab at seeing how this works with the hardware before looking at the code much. git couldn't apply patches 1, 6, or 9 to 5.15-rc1, but I was able to hand-fit them into place. Shutting down the guest and de-configuring a device ends up bringing my whole system down. I haven't looked at this any further; hopefully something jumps to mind for you: [ 64.585347] vfio_ccw 0.0.08fe: MDEV: Unregistering [ 64.585357] illegal operation: 0001 ilc:1 [#1] SMP [ 64.585362] Modules linked in: vhost_vsock vmw_vsock_virtio_transport_common vsock vhost [ 64.585364] vfio_ccw_mdev b50bbd4b-eab8-4f8c-9f0c-3cf636f936b9: Relaying device request to user (#0) [ 64.585364] vhost_iotlb lcs ctcm fsm kvm xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bridge stp llc dm_multipath dm_mod s390_trng eadm_sch zcrypt_cex4 qeth_l2 vfio_ccw mdev vfio_iommu_type1 vfio configfs zram zsmalloc ip_tables x_tables mlx5_core ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common pkey zcrypt rng_core autofs4 [ 64.585392] CPU: 14 PID: 4487 Comm: qemu-system-s39 Kdump: loaded Not tainted 5.15.0-rc1 #1 [ 64.585395] Hardware name: IBM 3906 M05 780 (LPAR) [ 64.585396] Krnl PSW : 0704c00180000000 0000000000000002 (0x2) [ 64.585404] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 [ 64.585407] Krnl GPRS: 0000000000000001 0000000000000000 00000000005f4800 0000000000000004 [ 64.585410] 0000000000000000 0000000000000002 0000000000000000 000002aa3e65085e [ 64.585412] 000000008de09100 0000000000003b6f 000003ff8017fa08 00000000005f4800 [ 64.585413] 0000000081450000 000003ff7c032310 000003ff80179db0 000003800bf53da0 [ 64.585418] Krnl Code:#0000000000000000: 0000 illegal >0000000000000002: 0000 illegal 0000000000000004: 0000 illegal 0000000000000006: 0000 illegal 0000000000000008: 0000 illegal 000000000000000a: 0000 illegal 000000000000000c: 0000 illegal 000000000000000e: 0000 illegal [ 64.585462] Call Trace: [ 64.585464] [<0000000000000002>] 0x2 [ 64.585467] ([<000003ff80179d74>] vfio_ccw_mdev_ioctl+0x84/0x318 [vfio_ccw]) [ 64.585476] [<00000000bb7adda6>] __s390x_sys_ioctl+0xbe/0x100 [ 64.585481] [<00000000bbfbf5e4>] __do_syscall+0x1bc/0x1e8 [ 64.585488] [<00000000bbfcc8d8>] system_call+0x78/0xa0 Eric > > Reorganize the driver so that the mdev owns the private memory and > controls the lifecycle, not the css_driver. The memory associated > with the > css_driver lifecycle is only the mdev_parent/mdev_type registration. > > Along the way we change when the sch is quiescent or not to be linked > to > the open/close_device lifetime of the vfio_device, which is sort of > what > it was tring to do already, just not completely. > > The troublesome racey lifecycle of the css_driver callbacks is made > clear > with simple vfio_device refcounting so a callback is only delivered > into a > registered vfio_device and has obvious correctness. > > Move the only per-css_driver state, the "available instance" counter, > into > the core code and share that logic with many of the other drivers. > The > value is kept in the mdev_type memory. > > v2: > - Clean up the lifecycle in ccw with 7 new patches > - Rebase > v1: > https://lore.kernel.org/all/7-v2-7667f42c9bad+935-vfio3_jgg@nvidia.com > > Jason Gunthorpe (9): > vfio/ccw: Use functions for alloc/free of the vfio_ccw_private > vfio/ccw: Pass vfio_ccw_private not mdev_device to various > functions > vfio/ccw: Convert to use vfio_register_group_dev() > vfio/ccw: Make the FSM complete and synchronize it to the mdev > vfio/mdev: Consolidate all the device_api sysfs into the core code > vfio/mdev: Add mdev available instance checking to the core > vfio/ccw: Remove private->mdev > vfio: Export vfio_device_try_get() > vfio/ccw: Move the lifecycle of the struct vfio_ccw_private to the > mdev > > drivers/gpu/drm/i915/gvt/kvmgt.c | 9 +- > drivers/s390/cio/vfio_ccw_drv.c | 282 +++++++++++------------- > -- > drivers/s390/cio/vfio_ccw_fsm.c | 152 ++++++++++---- > drivers/s390/cio/vfio_ccw_ops.c | 240 ++++++++++------------ > drivers/s390/cio/vfio_ccw_private.h | 42 +++- > drivers/s390/crypto/vfio_ap_ops.c | 41 +--- > drivers/s390/crypto/vfio_ap_private.h | 2 - > drivers/vfio/mdev/mdev_core.c | 13 +- > drivers/vfio/mdev/mdev_private.h | 2 + > drivers/vfio/mdev/mdev_sysfs.c | 64 +++++- > drivers/vfio/vfio.c | 3 +- > include/linux/mdev.h | 13 +- > include/linux/vfio.h | 1 + > samples/vfio-mdev/mbochs.c | 9 +- > samples/vfio-mdev/mdpy.c | 31 +-- > samples/vfio-mdev/mtty.c | 10 +- > 16 files changed, 470 insertions(+), 444 deletions(-) >
On Mon, Sep 13, 2021 at 01:40:34PM -0400, Eric Farman wrote: > On Thu, 2021-09-09 at 16:38 -0300, Jason Gunthorpe wrote: > > This addresses Cornelia's remark on the earlier patch that ccw has a > > confusing lifecycle. While it doesn't seem like the original attempt > > was > > functionally wrong, the result can be made better with a lot of > > further > > work. > > I thought I'd take a stab at seeing how this works with the hardware > before looking at the code much. git couldn't apply patches 1, 6, or 9 > to 5.15-rc1, but I was able to hand-fit them into place. Oh? Thats odd, I had no remarks from git when rebasing onto v5.15-rc1.. Maybe this is a situation where you need "b4 am --prep-3way" ... > [ 64.585462] Call Trace: > [ 64.585464] [<0000000000000002>] 0x2 > [ 64.585467] ([<000003ff80179d74>] vfio_ccw_mdev_ioctl+0x84/0x318 > [vfio_ccw]) > [ 64.585476] [<00000000bb7adda6>] __s390x_sys_ioctl+0xbe/0x100 > [ 64.585481] [<00000000bbfbf5e4>] __do_syscall+0x1bc/0x1e8 > [ 64.585488] [<00000000bbfcc8d8>] system_call+0x78/0xa0 I think it is this: diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c index df1490943b20ec..5ea392959c0711 100644 --- a/drivers/s390/cio/vfio_ccw_fsm.c +++ b/drivers/s390/cio/vfio_ccw_fsm.c @@ -441,6 +441,7 @@ fsm_func_t *vfio_ccw_jumptable[NR_VFIO_CCW_STATES][NR_VFIO_CCW_EVENTS] = { [VFIO_CCW_EVENT_IO_REQ] = fsm_io_error, [VFIO_CCW_EVENT_ASYNC_REQ] = fsm_async_error, [VFIO_CCW_EVENT_INTERRUPT] = fsm_disabled_irq, + [VFIO_CCW_EVENT_OPEN] = fsm_nop, [VFIO_CCW_EVENT_CLOSE] = fsm_nop, }, [VFIO_CCW_STATE_CLOSED] = { I rebased it and fixed it up here: https://github.com/jgunthorpe/linux/tree/vfio_ccw Can you try again? Thanks, Jason
On Mon, 2021-09-13 at 16:24 -0300, Jason Gunthorpe wrote: > On Mon, Sep 13, 2021 at 01:40:34PM -0400, Eric Farman wrote: > > On Thu, 2021-09-09 at 16:38 -0300, Jason Gunthorpe wrote: > > > This addresses Cornelia's remark on the earlier patch that ccw > > > has a > > > confusing lifecycle. While it doesn't seem like the original > > > attempt > > > was > > > functionally wrong, the result can be made better with a lot of > > > further > > > work. > > > > I thought I'd take a stab at seeing how this works with the > > hardware > > before looking at the code much. git couldn't apply patches 1, 6, > > or 9 > > to 5.15-rc1, but I was able to hand-fit them into place. > > Oh? Thats odd, I had no remarks from git when rebasing onto > v5.15-rc1.. > > Maybe this is a situation where you need "b4 am --prep-3way" ... Ah, that does indeed help, thanks. Still missing the vfio-ap patch that's elsewhere on the list, but I'm not caring about that at the moment. > > > [ 64.585462] Call Trace: > > [ 64.585464] [<0000000000000002>] 0x2 > > [ 64.585467] ([<000003ff80179d74>] vfio_ccw_mdev_ioctl+0x84/0x318 > > [vfio_ccw]) > > [ 64.585476] [<00000000bb7adda6>] __s390x_sys_ioctl+0xbe/0x100 > > [ 64.585481] [<00000000bbfbf5e4>] __do_syscall+0x1bc/0x1e8 > > [ 64.585488] [<00000000bbfcc8d8>] system_call+0x78/0xa0 > > I think it is this: > > diff --git a/drivers/s390/cio/vfio_ccw_fsm.c > b/drivers/s390/cio/vfio_ccw_fsm.c > index df1490943b20ec..5ea392959c0711 100644 > --- a/drivers/s390/cio/vfio_ccw_fsm.c > +++ b/drivers/s390/cio/vfio_ccw_fsm.c > @@ -441,6 +441,7 @@ fsm_func_t > *vfio_ccw_jumptable[NR_VFIO_CCW_STATES][NR_VFIO_CCW_EVENTS] = { > [VFIO_CCW_EVENT_IO_REQ] = fsm_io_error, > [VFIO_CCW_EVENT_ASYNC_REQ] = fsm_async_error, > [VFIO_CCW_EVENT_INTERRUPT] = fsm_disabled_irq, > + [VFIO_CCW_EVENT_OPEN] = fsm_nop, > [VFIO_CCW_EVENT_CLOSE] = fsm_nop, > }, > [VFIO_CCW_STATE_CLOSED] = { > > I rebased it and fixed it up here: > > https://github.com/jgunthorpe/linux/tree/vfio_ccw > > Can you try again? That does address the crash, but then why is it processing a BROKEN event? Seems problematic. All the configuration works fine, but the devices get ripped away once a guest is started that wants to open/use them. So, there's more problems to figure out. Eric > > Thanks, > Jason
On Mon, Sep 13, 2021 at 04:31:54PM -0400, Eric Farman wrote: > > I rebased it and fixed it up here: > > > > https://github.com/jgunthorpe/linux/tree/vfio_ccw > > > > Can you try again? > > That does address the crash, but then why is it processing a BROKEN > event? Seems problematic. The stuff related to the NOT_OPER looked really wonky to me. I'm guessing this is the issue - not sure about the pmcw.ena either.. diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c index 5ea392959c0711..0d4d4f425befac 100644 --- a/drivers/s390/cio/vfio_ccw_fsm.c +++ b/drivers/s390/cio/vfio_ccw_fsm.c @@ -380,29 +380,19 @@ static void fsm_open(struct vfio_ccw_private *private, spin_unlock_irq(sch->lock); } -static void fsm_close(struct vfio_ccw_private *private, - enum vfio_ccw_event event) +static int flush_sch(struct vfio_ccw_private *private) { struct subchannel *sch = private->sch; DECLARE_COMPLETION_ONSTACK(completion); int iretry, ret = 0; - spin_lock_irq(sch->lock); - if (!sch->schib.pmcw.ena) - goto err_unlock; - ret = cio_disable_subchannel(sch); - if (ret != -EBUSY) - goto err_unlock; - iretry = 255; do { - ret = cio_cancel_halt_clear(sch, &iretry); - if (ret == -EIO) { pr_err("vfio_ccw: could not quiesce subchannel 0.%x.%04x!\n", sch->schid.ssid, sch->schid.sch_no); - break; + return ret; } /* @@ -413,13 +403,28 @@ static void fsm_close(struct vfio_ccw_private *private, spin_unlock_irq(sch->lock); if (ret == -EBUSY) - wait_for_completion_timeout(&completion, 3*HZ); + wait_for_completion_timeout(&completion, 3 * HZ); private->completion = NULL; flush_workqueue(vfio_ccw_work_q); spin_lock_irq(sch->lock); ret = cio_disable_subchannel(sch); } while (ret == -EBUSY); + return ret; +} + +static void fsm_close(struct vfio_ccw_private *private, + enum vfio_ccw_event event) +{ + struct subchannel *sch = private->sch; + int ret; + + spin_lock_irq(sch->lock); + if (!sch->schib.pmcw.ena) + goto err_unlock; + ret = cio_disable_subchannel(sch); + if (ret == -EBUSY) + ret = flush_sch(private); if (ret) goto err_unlock; private->state = VFIO_CCW_STATE_CLOSED;
On Tue, Sep 14 2021, Jason Gunthorpe <jgg@nvidia.com> wrote: > On Mon, Sep 13, 2021 at 04:31:54PM -0400, Eric Farman wrote: >> > I rebased it and fixed it up here: >> > >> > https://github.com/jgunthorpe/linux/tree/vfio_ccw >> > >> > Can you try again? >> >> That does address the crash, but then why is it processing a BROKEN >> event? Seems problematic. > > The stuff related to the NOT_OPER looked really wonky to me. I'm > guessing this is the issue - not sure about the pmcw.ena either.. [I have still not been able to digest the whole series, sorry.] > > diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c > index 5ea392959c0711..0d4d4f425befac 100644 > --- a/drivers/s390/cio/vfio_ccw_fsm.c > +++ b/drivers/s390/cio/vfio_ccw_fsm.c > @@ -380,29 +380,19 @@ static void fsm_open(struct vfio_ccw_private *private, > spin_unlock_irq(sch->lock); > } > > -static void fsm_close(struct vfio_ccw_private *private, > - enum vfio_ccw_event event) > +static int flush_sch(struct vfio_ccw_private *private) > { > struct subchannel *sch = private->sch; > DECLARE_COMPLETION_ONSTACK(completion); > int iretry, ret = 0; > > - spin_lock_irq(sch->lock); > - if (!sch->schib.pmcw.ena) > - goto err_unlock; > - ret = cio_disable_subchannel(sch); > - if (ret != -EBUSY) > - goto err_unlock; > - > iretry = 255; > do { > - > ret = cio_cancel_halt_clear(sch, &iretry); > - > if (ret == -EIO) { > pr_err("vfio_ccw: could not quiesce subchannel 0.%x.%04x!\n", > sch->schid.ssid, sch->schid.sch_no); > - break; > + return ret; Looking at this, I wonder why we had special-cased -EIO -- for -ENODEV we should be done as well, as then the device is dead and we do not need to disable it. > } > > /* > @@ -413,13 +403,28 @@ static void fsm_close(struct vfio_ccw_private *private, > spin_unlock_irq(sch->lock); > > if (ret == -EBUSY) > - wait_for_completion_timeout(&completion, 3*HZ); > + wait_for_completion_timeout(&completion, 3 * HZ); > > private->completion = NULL; > flush_workqueue(vfio_ccw_work_q); > spin_lock_irq(sch->lock); > ret = cio_disable_subchannel(sch); > } while (ret == -EBUSY); > + return ret; > +} > + > +static void fsm_close(struct vfio_ccw_private *private, > + enum vfio_ccw_event event) > +{ > + struct subchannel *sch = private->sch; > + int ret; > + > + spin_lock_irq(sch->lock); > + if (!sch->schib.pmcw.ena) > + goto err_unlock; > + ret = cio_disable_subchannel(sch); cio_disable_subchannel() should be happy to disable an already disabled subchannel, so I guess we can just walk through this and end up in CLOSED state... unless entering with !ena actually indicates that we messed up somewhere else in the state machine. I still need to find time to read the patches. > + if (ret == -EBUSY) > + ret = flush_sch(private); > if (ret) > goto err_unlock; > private->state = VFIO_CCW_STATE_CLOSED;
On Fri, Sep 17, 2021 at 01:59:16PM +0200, Cornelia Huck wrote: > > ret = cio_cancel_halt_clear(sch, &iretry); > > - > > if (ret == -EIO) { > > pr_err("vfio_ccw: could not quiesce subchannel 0.%x.%04x!\n", > > sch->schid.ssid, sch->schid.sch_no); > > - break; > > + return ret; > > Looking at this, I wonder why we had special-cased -EIO -- for -ENODEV > we should be done as well, as then the device is dead and we do not need > to disable it. cio_cancel_halt_clear() should probably succeed in that case. > > @@ -413,13 +403,28 @@ static void fsm_close(struct vfio_ccw_private *private, > > spin_unlock_irq(sch->lock); > > > > if (ret == -EBUSY) > > - wait_for_completion_timeout(&completion, 3*HZ); > > + wait_for_completion_timeout(&completion, 3 * HZ); > > > > private->completion = NULL; > > flush_workqueue(vfio_ccw_work_q); > > spin_lock_irq(sch->lock); > > ret = cio_disable_subchannel(sch); > > } while (ret == -EBUSY); > > + return ret; > > +} > > + > > +static void fsm_close(struct vfio_ccw_private *private, > > + enum vfio_ccw_event event) > > +{ > > + struct subchannel *sch = private->sch; > > + int ret; > > + > > + spin_lock_irq(sch->lock); > > + if (!sch->schib.pmcw.ena) > > + goto err_unlock; > > + ret = cio_disable_subchannel(sch); > > cio_disable_subchannel() should be happy to disable an already disabled > subchannel, so I guess we can just walk through this and end up in > CLOSED state... unless entering with !ena actually indicates that we > messed up somewhere else in the state machine. I still need to find time > to read the patches. I don't know, I looked at that ena stuff for a bit and couldn't guess what it is trying to do. Arguably the channel should not be ripped away from vfio while the FSM is in the open states, so I'm not sure what a lot of this is for. Jason
On Fri, Sep 17 2021, Jason Gunthorpe <jgg@nvidia.com> wrote: > On Fri, Sep 17, 2021 at 01:59:16PM +0200, Cornelia Huck wrote: >> > ret = cio_cancel_halt_clear(sch, &iretry); >> > - >> > if (ret == -EIO) { >> > pr_err("vfio_ccw: could not quiesce subchannel 0.%x.%04x!\n", >> > sch->schid.ssid, sch->schid.sch_no); >> > - break; >> > + return ret; >> >> Looking at this, I wonder why we had special-cased -EIO -- for -ENODEV >> we should be done as well, as then the device is dead and we do not need >> to disable it. > > cio_cancel_halt_clear() should probably succeed in that case. It will actually give us -ENODEV, as the very first call in that function will already fail. > >> > @@ -413,13 +403,28 @@ static void fsm_close(struct vfio_ccw_private *private, >> > spin_unlock_irq(sch->lock); >> > >> > if (ret == -EBUSY) >> > - wait_for_completion_timeout(&completion, 3*HZ); >> > + wait_for_completion_timeout(&completion, 3 * HZ); >> > >> > private->completion = NULL; >> > flush_workqueue(vfio_ccw_work_q); >> > spin_lock_irq(sch->lock); >> > ret = cio_disable_subchannel(sch); >> > } while (ret == -EBUSY); >> > + return ret; >> > +} >> > + >> > +static void fsm_close(struct vfio_ccw_private *private, >> > + enum vfio_ccw_event event) >> > +{ >> > + struct subchannel *sch = private->sch; >> > + int ret; >> > + >> > + spin_lock_irq(sch->lock); >> > + if (!sch->schib.pmcw.ena) >> > + goto err_unlock; >> > + ret = cio_disable_subchannel(sch); >> >> cio_disable_subchannel() should be happy to disable an already disabled >> subchannel, so I guess we can just walk through this and end up in >> CLOSED state... unless entering with !ena actually indicates that we >> messed up somewhere else in the state machine. I still need to find time >> to read the patches. > > I don't know, I looked at that ena stuff for a bit and couldn't guess > what it is trying to do. It is one of the bits in the pmcw control block that can be modified; if it is 1, the subchannel is enabled and can be used for I/O, if it is 0, the subchannel is disabled and all instructions that initiate or stop I/O will fail. Basically, you enable the subchannel if you actually want to access the device associated with it. Online/offline for (normal usage) ccw devices maps (among other things) to associated subchannel enabled/disabled; for a subchannel that is supposed to be passed via vfio-ccw, we want to have it enabled so that it is actually usable. I think the ena checking had been inspired from what the ccw bus does. We could probably just forge ahead in any case and the called functions in the css bus would be able to handle it just fine, but I have not double checked. > Arguably the channel should not be ripped away from vfio while the FSM > is in the open states, so I'm not sure what a lot of this is for. We could have surprise removal (i.e. a subchannel in active use being ripped out), as that's what happens on real hardware as well. E.g. doing a device_del in QEMU.