Message ID | 20181122165432.4437-1-cohuck@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | vfio-ccw: support hsch/csch (kernel part) | expand |
On Thu, 22 Nov 2018 17:54:29 +0100 Cornelia Huck <cohuck@redhat.com> wrote: > [This is the Linux kernel part, git tree is available at > https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw.git > vfio-ccw-caps > > The companion QEMU patches are available at > https://github.com/cohuck/qemu vfio-ccw-caps] > > Currently, vfio-ccw only relays START SUBCHANNEL requests to the real > device. This tends to work well for the most common 'good path' > scenarios; however, as we emulate {HALT,CLEAR} SUBCHANNEL in QEMU, > things like clearing pending requests at the device is currently not > supported. This may be a problem for e.g. error recovery. > > This patch series introduces capabilities (similar to what vfio-pci > uses) and exposes a new async region for handling hsch/csch. I'm on vacation and could not do more than skim over it real quick. FWIW it looks very promising. I intend to give it an in depth review once I'm back (i.e. second half of next week). Regards, Halil
On Sat, 24 Nov 2018 22:07:57 +0100 Halil Pasic <pasic@linux.ibm.com> wrote: > On Thu, 22 Nov 2018 17:54:29 +0100 > Cornelia Huck <cohuck@redhat.com> wrote: > > > [This is the Linux kernel part, git tree is available at > > https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw.git > > vfio-ccw-caps > > > > The companion QEMU patches are available at > > https://github.com/cohuck/qemu vfio-ccw-caps] > > > > Currently, vfio-ccw only relays START SUBCHANNEL requests to the real > > device. This tends to work well for the most common 'good path' > > scenarios; however, as we emulate {HALT,CLEAR} SUBCHANNEL in QEMU, > > things like clearing pending requests at the device is currently not > > supported. This may be a problem for e.g. error recovery. > > > > This patch series introduces capabilities (similar to what vfio-pci > > uses) and exposes a new async region for handling hsch/csch. > > I'm on vacation and could not do more than skim over it real quick. FWIW > it looks very promising. I intend to give it an in depth review once I'm > back (i.e. second half of next week). Thanks! Please do not feel pressured, it did take me long enough to get this out :)
On 11/22/2018 11:54 AM, Cornelia Huck wrote: > [This is the Linux kernel part, git tree is available at > https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw.git vfio-ccw-caps > > The companion QEMU patches are available at > https://github.com/cohuck/qemu vfio-ccw-caps] > > Currently, vfio-ccw only relays START SUBCHANNEL requests to the real > device. This tends to work well for the most common 'good path' scenarios; > however, as we emulate {HALT,CLEAR} SUBCHANNEL in QEMU, things like > clearing pending requests at the device is currently not supported. > This may be a problem for e.g. error recovery. > > This patch series introduces capabilities (similar to what vfio-pci uses) > and exposes a new async region for handling hsch/csch. > > Very lightly tested (I can interact with a dasd as before; I have not > found a reliable way to trigger hsch/csch in the Linux dasd guest driver.) > I was able to trigger the guest DASD driver to issue a csch instruction, and from my brief testing it seems to be working just like it would on the LPAR. (I basically tested with 2 threads trying to issue DASD device reserve and release ioctls, on the same DASD device, in a busy loop). I am going to spend some time doing a deeper review. Thanks Farhan > Cornelia Huck (3): > vfio-ccw: add capabilities chain > s390/cio: export hsch to modules > vfio-ccw: add handling for asnyc channel instructions > > drivers/s390/cio/Makefile | 3 +- > drivers/s390/cio/ioasm.c | 1 + > drivers/s390/cio/vfio_ccw_async.c | 88 +++++++++++++ > drivers/s390/cio/vfio_ccw_drv.c | 48 +++++-- > drivers/s390/cio/vfio_ccw_fsm.c | 158 +++++++++++++++++++++- > drivers/s390/cio/vfio_ccw_ops.c | 195 ++++++++++++++++++++++++---- > drivers/s390/cio/vfio_ccw_private.h | 44 +++++++ > include/uapi/linux/vfio.h | 5 + > include/uapi/linux/vfio_ccw.h | 12 ++ > 9 files changed, 509 insertions(+), 45 deletions(-) > create mode 100644 drivers/s390/cio/vfio_ccw_async.c >
On Mon, 26 Nov 2018 13:57:06 -0500 Farhan Ali <alifm@linux.ibm.com> wrote: > On 11/22/2018 11:54 AM, Cornelia Huck wrote: > > [This is the Linux kernel part, git tree is available at > > https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw.git vfio-ccw-caps > > > > The companion QEMU patches are available at > > https://github.com/cohuck/qemu vfio-ccw-caps] > > > > Currently, vfio-ccw only relays START SUBCHANNEL requests to the real > > device. This tends to work well for the most common 'good path' scenarios; > > however, as we emulate {HALT,CLEAR} SUBCHANNEL in QEMU, things like > > clearing pending requests at the device is currently not supported. > > This may be a problem for e.g. error recovery. > > > > This patch series introduces capabilities (similar to what vfio-pci uses) > > and exposes a new async region for handling hsch/csch. > > > > Very lightly tested (I can interact with a dasd as before; I have not > > found a reliable way to trigger hsch/csch in the Linux dasd guest driver.) > > > > I was able to trigger the guest DASD driver to issue a csch instruction, > and from my brief testing it seems to be working just like it would on > the LPAR. (I basically tested with 2 threads trying to issue DASD device > reserve and release ioctls, on the same DASD device, in a busy loop). > > I am going to spend some time doing a deeper review. Cool, thanks a lot!
On Thu, 22 Nov 2018 17:54:29 +0100 Cornelia Huck <cohuck@redhat.com> wrote: > [This is the Linux kernel part, git tree is available at > https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw.git vfio-ccw-caps > > The companion QEMU patches are available at > https://github.com/cohuck/qemu vfio-ccw-caps] > > Currently, vfio-ccw only relays START SUBCHANNEL requests to the real > device. This tends to work well for the most common 'good path' scenarios; > however, as we emulate {HALT,CLEAR} SUBCHANNEL in QEMU, things like > clearing pending requests at the device is currently not supported. > This may be a problem for e.g. error recovery. I'm wondering: what about MODIFY SUBCHANNEL? Do we plan to add MSCH as well or is it supposed to remain 'userspace emulated'? AFAIR MSCH may have an effect on error recovery as well. BTW I would like to have the concurrency discussion sorted out before I proceed with my review, because reviewing the stuff without a fair idea of what exactly are we trying to achieve would yield poor results. Regards, Halil
On Tue, 4 Dec 2018 13:38:10 +0100 Halil Pasic <pasic@linux.ibm.com> wrote: > On Thu, 22 Nov 2018 17:54:29 +0100 > Cornelia Huck <cohuck@redhat.com> wrote: > > > [This is the Linux kernel part, git tree is available at > > https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw.git vfio-ccw-caps > > > > The companion QEMU patches are available at > > https://github.com/cohuck/qemu vfio-ccw-caps] > > > > Currently, vfio-ccw only relays START SUBCHANNEL requests to the real > > device. This tends to work well for the most common 'good path' scenarios; > > however, as we emulate {HALT,CLEAR} SUBCHANNEL in QEMU, things like > > clearing pending requests at the device is currently not supported. > > This may be a problem for e.g. error recovery. > > I'm wondering: what about MODIFY SUBCHANNEL? Do we plan to add MSCH > as well or is it supposed to remain 'userspace emulated'? AFAIR MSCH > may have an effect on error recovery as well. I think that would require a deeper change, as we have the requirement to enable the subchannel before handing it to userspace. IOW, the guest does not cause the subchannel to be enabled/disabled, but the host does. Parameters (like for channel measurements) are a different game. It is something we should look into, but it will need a different region. > BTW I would like to have the concurrency discussion sorted out before > I proceed with my review, because reviewing the stuff without a fair idea > of what exactly are we trying to achieve would yield poor results. I'm not sure what is unclear about what we're trying to achieve (enable the guest to issue halt/clear on real hardware)? But yes, we need to sort out that concurrency thing; I'm currently unsure if the core should do some things as well or if it's more of a vendor-driver thing.
On Tue, 4 Dec 2018 14:11:30 +0100 Cornelia Huck <cohuck@redhat.com> wrote: > On Tue, 4 Dec 2018 13:38:10 +0100 > Halil Pasic <pasic@linux.ibm.com> wrote: > > > On Thu, 22 Nov 2018 17:54:29 +0100 > > Cornelia Huck <cohuck@redhat.com> wrote: > > > > > [This is the Linux kernel part, git tree is available at > > > https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw.git vfio-ccw-caps > > > > > > The companion QEMU patches are available at > > > https://github.com/cohuck/qemu vfio-ccw-caps] > > > > > > Currently, vfio-ccw only relays START SUBCHANNEL requests to the real > > > device. This tends to work well for the most common 'good path' scenarios; > > > however, as we emulate {HALT,CLEAR} SUBCHANNEL in QEMU, things like > > > clearing pending requests at the device is currently not supported. > > > This may be a problem for e.g. error recovery. > > > > I'm wondering: what about MODIFY SUBCHANNEL? Do we plan to add MSCH > > as well or is it supposed to remain 'userspace emulated'? AFAIR MSCH > > may have an effect on error recovery as well. > > I think that would require a deeper change, as we have the requirement > to enable the subchannel before handing it to userspace. IOW, the guest > does not cause the subchannel to be enabled/disabled, but the host does. > My point is, when the subchannel is disabled, 'firmware' is responsible for suppressing interrupts and error conditions, and also for doing the appropriate recovery procedure, so to say under the hood. I think Jason has discovered some problems related to this while doing his DASD IPL with vfio-ccw work, but I don't quite remember any more. IMHO it may be possible to emulate enable/disable, but it seems way more error prone and complicated, than letting the guest enable/disable the host subchannel. I have no idea what was the reason for going with the initial design. I would appreciate any hints or explanations, but I'm well aware that it was a long time ago. > Parameters (like for channel measurements) are a different game. It is > something we should look into, but it will need a different region. Yes emulation only channel measurements seem even less likely than proper enable/disable. And 'that would need a different' region helps me understanding the scope of async_cmd_region. Maybe we should reconsider the comment '+ * @cmd_region: MMIO region for asynchronous I/O commands other than START'. > > > BTW I would like to have the concurrency discussion sorted out before > > I proceed with my review, because reviewing the stuff without a fair idea > > of what exactly are we trying to achieve would yield poor results. > > I'm not sure what is unclear about what we're trying to achieve (enable > the guest to issue halt/clear on real hardware)? Yeah, that is perfectly clear, but it ain't the complete story. E.g. are subsequent commands blocking until the preceding command finishes is part of the interface. And what is good implementation depends on the answer. What I mean, I first need to understand how things are supposed to work (together) so I can double check that against the implementation. Otherwise all I can do is nitpicking. To get more tangible: we are in the middle of processing an SSCH request (e.g. doing the translation) when a HSCH comes in. What should happen? Should we start processing HSCH after he instruction part of SSCH is done -- which currently includes translation? Or should we -EBUSY? Or do we abort START related activities and do the HALT stuff? > > But yes, we need to sort out that concurrency thing; I'm currently > unsure if the core should do some things as well or if it's more of a > vendor-driver thing. > By core you mean vfio-mdev core? If yes, I think it is a vendor-driver thing: limiting concurrency for all vfio-mdev does not make sense IMHO. Regards, Halil
On Tue, 4 Dec 2018 16:02:36 +0100 Halil Pasic <pasic@linux.ibm.com> wrote: > On Tue, 4 Dec 2018 14:11:30 +0100 > Cornelia Huck <cohuck@redhat.com> wrote: > > > On Tue, 4 Dec 2018 13:38:10 +0100 > > Halil Pasic <pasic@linux.ibm.com> wrote: > > > > > On Thu, 22 Nov 2018 17:54:29 +0100 > > > Cornelia Huck <cohuck@redhat.com> wrote: > > > > > > > [This is the Linux kernel part, git tree is available at > > > > https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw.git vfio-ccw-caps > > > > > > > > The companion QEMU patches are available at > > > > https://github.com/cohuck/qemu vfio-ccw-caps] > > > > > > > > Currently, vfio-ccw only relays START SUBCHANNEL requests to the real > > > > device. This tends to work well for the most common 'good path' scenarios; > > > > however, as we emulate {HALT,CLEAR} SUBCHANNEL in QEMU, things like > > > > clearing pending requests at the device is currently not supported. > > > > This may be a problem for e.g. error recovery. > > > > > > I'm wondering: what about MODIFY SUBCHANNEL? Do we plan to add MSCH > > > as well or is it supposed to remain 'userspace emulated'? AFAIR MSCH > > > may have an effect on error recovery as well. > > > > I think that would require a deeper change, as we have the requirement > > to enable the subchannel before handing it to userspace. IOW, the guest > > does not cause the subchannel to be enabled/disabled, but the host does. > > > > My point is, when the subchannel is disabled, 'firmware' is responsible > for suppressing interrupts and error conditions, and also for > doing the appropriate recovery procedure, so to say under the hood. I don't think there's actually much of a 'recovery' possible at a subchannel level (other than 'have you tried turning it off and on again?'); the interesting stuff is all at the device-specific level. > > I think Jason has discovered some problems related to this while doing > his DASD IPL with vfio-ccw work, but I don't quite remember any more. cc:ing Jason, in case he remembers :) > IMHO it may be possible to emulate enable/disable, but it seems way more > error prone and complicated, than letting the guest enable/disable the > host subchannel. > > I have no idea what was the reason for going with the initial design. > I would appreciate any hints or explanations, but I'm well aware that it > was a long time ago. I don't really remember either, and any non-public mails from that time are inaccessible to me :( It *might* be an artifact of the original design (which operated at the ccw_device rather than the subchannel level), though. > > Parameters (like for channel measurements) are a different game. It is > > something we should look into, but it will need a different region. > > Yes emulation only channel measurements seem even less likely than proper > enable/disable. And 'that would need a different' region helps me > understanding the scope of async_cmd_region. Maybe we should reconsider > the comment '+ * @cmd_region: MMIO region for asynchronous I/O commands > other than START'. What do you think is wrong with that comment? > > > BTW I would like to have the concurrency discussion sorted out before > > > I proceed with my review, because reviewing the stuff without a fair idea > > > of what exactly are we trying to achieve would yield poor results. > > > > I'm not sure what is unclear about what we're trying to achieve (enable > > the guest to issue halt/clear on real hardware)? > > Yeah, that is perfectly clear, but it ain't the complete story. E.g. > are subsequent commands blocking until the preceding command finishes > is part of the interface. And what is good implementation depends on the > answer. What I mean, I first need to understand how things are supposed > to work (together) so I can double check that against the > implementation. Otherwise all I can do is nitpicking. > > To get more tangible: we are in the middle of processing an SSCH request > (e.g. doing the translation) when a HSCH comes in. What should happen? > Should we start processing HSCH after he instruction part of SSCH is > done -- which currently includes translation? Or should we -EBUSY? Or do > we abort START related activities and do the HALT stuff? I think most of the sorting-out-the-operations stuff should be done by the hardware itself, and we should not really try to enforce anything special in our vfio code. For your example, it might be best if a hsch is always accepted and send on towards the hardware. Probably best to reflect back -EAGAIN if we're currently processing another instruction from another vcpu, so that the user space caller can retry. Same for ssch, if another ssch is already being processed. We *could* reflect cc 2 if the fctl bit is already set, but that won't do for csch, so it is probably best to have the hardware figure that out in any case. If I read the code correctly, we currently reflect -EBUSY and not -EAGAIN if we get a ssch request while already processing another one. QEMU hands that back to the guest as a cc 2, which is not 100% correct. In practice, we don't see this with Linux guests due to locking. > > But yes, we need to sort out that concurrency thing; I'm currently > > unsure if the core should do some things as well or if it's more of > > a vendor-driver thing. > > > > By core you mean vfio-mdev core? If yes, I think it is a vendor-driver > thing: limiting concurrency for all vfio-mdev does not make sense > IMHO. Also generic vfio. But I'm still unclear which guarantees we have. I suspect none; I'm wondering whether other vfio devices might have issues as well.
On 12/05/2018 07:54 AM, Cornelia Huck wrote: >> Yeah, that is perfectly clear, but it ain't the complete story. E.g. >> are subsequent commands blocking until the preceding command finishes >> is part of the interface. And what is good implementation depends on the >> answer. What I mean, I first need to understand how things are supposed >> to work (together) so I can double check that against the >> implementation. Otherwise all I can do is nitpicking. >> >> To get more tangible: we are in the middle of processing an SSCH request >> (e.g. doing the translation) when a HSCH comes in. What should happen? >> Should we start processing HSCH after he instruction part of SSCH is >> done -- which currently includes translation? Or should we -EBUSY? Or do >> we abort START related activities and do the HALT stuff? > I think most of the sorting-out-the-operations stuff should be done by > the hardware itself, and we should not really try to enforce anything > special in our vfio code. > > For your example, it might be best if a hsch is always accepted and > send on towards the hardware. Probably best to reflect back -EAGAIN if > we're currently processing another instruction from another vcpu, so > that the user space caller can retry. Same for ssch, if another ssch is > already being processed. We*could* reflect cc 2 if the fctl bit is > already set, but that won't do for csch, so it is probably best to have > the hardware figure that out in any case. > > If I read the code correctly, we currently reflect -EBUSY and not > -EAGAIN if we get a ssch request while already processing another one. > QEMU hands that back to the guest as a cc 2, which is not 100% correct. > In practice, we don't see this with Linux guests due to locking. > If we have a ssch and a csch immediately afterwards from userspace, will we end up issuing csch first and then ssch to the hardware? If I understand correctly, the ccw translation as part of the ssch can be a slow operation so it might be possible we issue the csch first? In that case we won't actually clear the original start function as intended. Thanks Farhan
On Wed, 5 Dec 2018 13:34:11 -0500 Farhan Ali <alifm@linux.ibm.com> wrote: > On 12/05/2018 07:54 AM, Cornelia Huck wrote: > >> Yeah, that is perfectly clear, but it ain't the complete story. E.g. > >> are subsequent commands blocking until the preceding command finishes > >> is part of the interface. And what is good implementation depends on the > >> answer. What I mean, I first need to understand how things are supposed > >> to work (together) so I can double check that against the > >> implementation. Otherwise all I can do is nitpicking. > >> > >> To get more tangible: we are in the middle of processing an SSCH request > >> (e.g. doing the translation) when a HSCH comes in. What should happen? > >> Should we start processing HSCH after he instruction part of SSCH is > >> done -- which currently includes translation? Or should we -EBUSY? Or do > >> we abort START related activities and do the HALT stuff? > > I think most of the sorting-out-the-operations stuff should be done by > > the hardware itself, and we should not really try to enforce anything > > special in our vfio code. > > > > For your example, it might be best if a hsch is always accepted and > > send on towards the hardware. Probably best to reflect back -EAGAIN if > > we're currently processing another instruction from another vcpu, so > > that the user space caller can retry. Same for ssch, if another ssch is > > already being processed. We*could* reflect cc 2 if the fctl bit is > > already set, but that won't do for csch, so it is probably best to have > > the hardware figure that out in any case. > > > > If I read the code correctly, we currently reflect -EBUSY and not > > -EAGAIN if we get a ssch request while already processing another one. > > QEMU hands that back to the guest as a cc 2, which is not 100% correct. > > In practice, we don't see this with Linux guests due to locking. > > > > If we have a ssch and a csch immediately afterwards from userspace, will > we end up issuing csch first and then ssch to the hardware? > > If I understand correctly, the ccw translation as part of the ssch can > be a slow operation so it might be possible we issue the csch first? > In that case we won't actually clear the original start function as > intended. When we start processing the ssch request (translation and so on), we set the state to BUSY. This means that any csch request will get a -EBUSY, no overtaking possible. (I think maybe I'll need to check what this series looks like if I rebase it on top of Pierre's rework, as he did some changes in the state machine.) My idea above was to return -EAGAIN instead of -EBUSY, so that user space can retry the operation.
On 12/06/2018 09:39 AM, Cornelia Huck wrote: > On Wed, 5 Dec 2018 13:34:11 -0500 > Farhan Ali <alifm@linux.ibm.com> wrote: > >> On 12/05/2018 07:54 AM, Cornelia Huck wrote: >>>> Yeah, that is perfectly clear, but it ain't the complete story. E.g. >>>> are subsequent commands blocking until the preceding command finishes >>>> is part of the interface. And what is good implementation depends on the >>>> answer. What I mean, I first need to understand how things are supposed >>>> to work (together) so I can double check that against the >>>> implementation. Otherwise all I can do is nitpicking. >>>> >>>> To get more tangible: we are in the middle of processing an SSCH request >>>> (e.g. doing the translation) when a HSCH comes in. What should happen? >>>> Should we start processing HSCH after he instruction part of SSCH is >>>> done -- which currently includes translation? Or should we -EBUSY? Or do >>>> we abort START related activities and do the HALT stuff? >>> I think most of the sorting-out-the-operations stuff should be done by >>> the hardware itself, and we should not really try to enforce anything >>> special in our vfio code. >>> >>> For your example, it might be best if a hsch is always accepted and >>> send on towards the hardware. Probably best to reflect back -EAGAIN if >>> we're currently processing another instruction from another vcpu, so >>> that the user space caller can retry. Same for ssch, if another ssch is >>> already being processed. We*could* reflect cc 2 if the fctl bit is >>> already set, but that won't do for csch, so it is probably best to have >>> the hardware figure that out in any case. >>> >>> If I read the code correctly, we currently reflect -EBUSY and not >>> -EAGAIN if we get a ssch request while already processing another one. >>> QEMU hands that back to the guest as a cc 2, which is not 100% correct. >>> In practice, we don't see this with Linux guests due to locking. >>> >> >> If we have a ssch and a csch immediately afterwards from userspace, will >> we end up issuing csch first and then ssch to the hardware? >> >> If I understand correctly, the ccw translation as part of the ssch can >> be a slow operation so it might be possible we issue the csch first? >> In that case we won't actually clear the original start function as >> intended. > > When we start processing the ssch request (translation and so on), we > set the state to BUSY. This means that any csch request will get a > -EBUSY, no overtaking possible. (I think maybe I'll need to check what > this series looks like if I rebase it on top of Pierre's rework, as he > did some changes in the state machine.) I think you meant the state is set to BOXED? otherwise the patch 3 says if state is BUSY and CLEAR event request comes in, we issue the clear instruction, no? > > My idea above was to return -EAGAIN instead of -EBUSY, so that user > space can retry the operation. > >
On Thu, 6 Dec 2018 10:26:12 -0500 Farhan Ali <alifm@linux.ibm.com> wrote: > On 12/06/2018 09:39 AM, Cornelia Huck wrote: > > On Wed, 5 Dec 2018 13:34:11 -0500 > > Farhan Ali <alifm@linux.ibm.com> wrote: > > > >> On 12/05/2018 07:54 AM, Cornelia Huck wrote: > >>>> Yeah, that is perfectly clear, but it ain't the complete story. E.g. > >>>> are subsequent commands blocking until the preceding command finishes > >>>> is part of the interface. And what is good implementation depends on the > >>>> answer. What I mean, I first need to understand how things are supposed > >>>> to work (together) so I can double check that against the > >>>> implementation. Otherwise all I can do is nitpicking. > >>>> > >>>> To get more tangible: we are in the middle of processing an SSCH request > >>>> (e.g. doing the translation) when a HSCH comes in. What should happen? > >>>> Should we start processing HSCH after he instruction part of SSCH is > >>>> done -- which currently includes translation? Or should we -EBUSY? Or do > >>>> we abort START related activities and do the HALT stuff? > >>> I think most of the sorting-out-the-operations stuff should be done by > >>> the hardware itself, and we should not really try to enforce anything > >>> special in our vfio code. > >>> > >>> For your example, it might be best if a hsch is always accepted and > >>> send on towards the hardware. Probably best to reflect back -EAGAIN if > >>> we're currently processing another instruction from another vcpu, so > >>> that the user space caller can retry. Same for ssch, if another ssch is > >>> already being processed. We*could* reflect cc 2 if the fctl bit is > >>> already set, but that won't do for csch, so it is probably best to have > >>> the hardware figure that out in any case. > >>> > >>> If I read the code correctly, we currently reflect -EBUSY and not > >>> -EAGAIN if we get a ssch request while already processing another one. > >>> QEMU hands that back to the guest as a cc 2, which is not 100% correct. > >>> In practice, we don't see this with Linux guests due to locking. > >>> > >> > >> If we have a ssch and a csch immediately afterwards from userspace, will > >> we end up issuing csch first and then ssch to the hardware? > >> > >> If I understand correctly, the ccw translation as part of the ssch can > >> be a slow operation so it might be possible we issue the csch first? > >> In that case we won't actually clear the original start function as > >> intended. > > > > When we start processing the ssch request (translation and so on), we > > set the state to BUSY. This means that any csch request will get a > > -EBUSY, no overtaking possible. (I think maybe I'll need to check what > > this series looks like if I rebase it on top of Pierre's rework, as he > > did some changes in the state machine.) > > I think you meant the state is set to BOXED? otherwise the patch 3 says > if state is BUSY and CLEAR event request comes in, we issue the clear > instruction, no? That's what I meant with "need to rebase" :) The BOXED state is gone; I just had not rebased on top of it. There's more changes in the state machine; if we are on the same page as to what should happen, I can start massaging the patches.
On 12/06/2018 11:21 AM, Cornelia Huck wrote: > On Thu, 6 Dec 2018 10:26:12 -0500 > Farhan Ali <alifm@linux.ibm.com> wrote: > >> On 12/06/2018 09:39 AM, Cornelia Huck wrote: >>> On Wed, 5 Dec 2018 13:34:11 -0500 >>> Farhan Ali <alifm@linux.ibm.com> wrote: >>> >>>> On 12/05/2018 07:54 AM, Cornelia Huck wrote: >>>>>> Yeah, that is perfectly clear, but it ain't the complete story. E.g. >>>>>> are subsequent commands blocking until the preceding command finishes >>>>>> is part of the interface. And what is good implementation depends on the >>>>>> answer. What I mean, I first need to understand how things are supposed >>>>>> to work (together) so I can double check that against the >>>>>> implementation. Otherwise all I can do is nitpicking. >>>>>> >>>>>> To get more tangible: we are in the middle of processing an SSCH request >>>>>> (e.g. doing the translation) when a HSCH comes in. What should happen? >>>>>> Should we start processing HSCH after he instruction part of SSCH is >>>>>> done -- which currently includes translation? Or should we -EBUSY? Or do >>>>>> we abort START related activities and do the HALT stuff? >>>>> I think most of the sorting-out-the-operations stuff should be done by >>>>> the hardware itself, and we should not really try to enforce anything >>>>> special in our vfio code. >>>>> >>>>> For your example, it might be best if a hsch is always accepted and >>>>> send on towards the hardware. Probably best to reflect back -EAGAIN if >>>>> we're currently processing another instruction from another vcpu, so >>>>> that the user space caller can retry. Same for ssch, if another ssch is >>>>> already being processed. We*could* reflect cc 2 if the fctl bit is >>>>> already set, but that won't do for csch, so it is probably best to have >>>>> the hardware figure that out in any case. >>>>> >>>>> If I read the code correctly, we currently reflect -EBUSY and not >>>>> -EAGAIN if we get a ssch request while already processing another one. >>>>> QEMU hands that back to the guest as a cc 2, which is not 100% correct. >>>>> In practice, we don't see this with Linux guests due to locking. >>>>> >>>> >>>> If we have a ssch and a csch immediately afterwards from userspace, will >>>> we end up issuing csch first and then ssch to the hardware? >>>> >>>> If I understand correctly, the ccw translation as part of the ssch can >>>> be a slow operation so it might be possible we issue the csch first? >>>> In that case we won't actually clear the original start function as >>>> intended. >>> >>> When we start processing the ssch request (translation and so on), we >>> set the state to BUSY. This means that any csch request will get a >>> -EBUSY, no overtaking possible. (I think maybe I'll need to check what >>> this series looks like if I rebase it on top of Pierre's rework, as he >>> did some changes in the state machine.) >> >> I think you meant the state is set to BOXED? otherwise the patch 3 says >> if state is BUSY and CLEAR event request comes in, we issue the clear >> instruction, no? > > That's what I meant with "need to rebase" :) The BOXED state is gone; I > just had not rebased on top of it. There's more changes in the state > machine; if we are on the same page as to what should happen, I can > start massaging the patches. > > Sorry maybe I missed it, but are you referring to Pierre's latest cleanup patches? I don't see him removing the BOXED state. I think returning -EAGAIN and asking the userspace to retry the operation sounds reasonable to me. But how do we handle the issue of protecting the cmd_region from simultaneous hsch and csch calls? Do we agree on Pierre's method of making write calls mutually exclusive?
On Wed, 5 Dec 2018 13:54:02 +0100 Cornelia Huck <cohuck@redhat.com> wrote: > On Tue, 4 Dec 2018 16:02:36 +0100 > Halil Pasic <pasic@linux.ibm.com> wrote: > > > On Tue, 4 Dec 2018 14:11:30 +0100 > > Cornelia Huck <cohuck@redhat.com> wrote: > > > > > On Tue, 4 Dec 2018 13:38:10 +0100 > > > Halil Pasic <pasic@linux.ibm.com> wrote: > > > > > > > On Thu, 22 Nov 2018 17:54:29 +0100 > > > > Cornelia Huck <cohuck@redhat.com> wrote: > > > > > > > > > [This is the Linux kernel part, git tree is available at > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw.git > > > > > vfio-ccw-caps > > > > > > > > > > The companion QEMU patches are available at > > > > > https://github.com/cohuck/qemu vfio-ccw-caps] > > > > > > > > > > Currently, vfio-ccw only relays START SUBCHANNEL requests to > > > > > the real device. This tends to work well for the most common > > > > > 'good path' scenarios; however, as we emulate {HALT,CLEAR} > > > > > SUBCHANNEL in QEMU, things like clearing pending requests at > > > > > the device is currently not supported. This may be a problem > > > > > for e.g. error recovery. > > > > > > > > I'm wondering: what about MODIFY SUBCHANNEL? Do we plan to add > > > > MSCH as well or is it supposed to remain 'userspace emulated'? > > > > AFAIR MSCH may have an effect on error recovery as well. > > > > > > I think that would require a deeper change, as we have the > > > requirement to enable the subchannel before handing it to > > > userspace. IOW, the guest does not cause the subchannel to be > > > enabled/disabled, but the host does. > > > > My point is, when the subchannel is disabled, 'firmware' is > > responsible for suppressing interrupts and error conditions, and > > also for doing the appropriate recovery procedure, so to say under > > the hood. > > I don't think there's actually much of a 'recovery' possible at a > subchannel level (other than 'have you tried turning it off and on > again?'); the interesting stuff is all at the device-specific level. > To clarify my concern let me quote from the PoP (SA22-7832-10 page 14-9): """ If a device presents unsolicited status while the associated subchannel is disabled, that status is discarded by the channel subsystem without generating an I/O-interruption condition. How- ever, if the status presented contains unit check, the channel subsystem issues the clear signal for the associated subchannel and does not gener- ate an I/O-interruption condition. This should be taken into account when the program uses MOD- IFY SUBCHANNEL to enable a subchannel. For example, the medium on the associated device that was present when the subchannel became disabled may have been replaced, and, there- fore, the program should verify the integrity of that medium. """ > > > > I think Jason has discovered some problems related to this while > > doing his DASD IPL with vfio-ccw work, but I don't quite remember > > any more. > > cc:ing Jason, in case he remembers :) > > > IMHO it may be possible to emulate enable/disable, but it seems way > > more error prone and complicated, than letting the guest > > enable/disable the host subchannel. > > > > I have no idea what was the reason for going with the initial design. > > I would appreciate any hints or explanations, but I'm well aware > > that it was a long time ago. > > I don't really remember either, and any non-public mails from that time > are inaccessible to me :( > > It *might* be an artifact of the original design (which operated at the > ccw_device rather than the subchannel level), though. > Interesting. > > > Parameters (like for channel measurements) are a different game. > > > It is something we should look into, but it will need a different > > > region. > > > > Yes emulation only channel measurements seem even less likely than > > proper enable/disable. And 'that would need a different' region > > helps me understanding the scope of async_cmd_region. Maybe we > > should reconsider the comment '+ * @cmd_region: MMIO region for > > asynchronous I/O commands other than START'. > > What do you think is wrong with that comment? > Well msch is also an async I/O command other than START. If msch does not belong here but needs it's own region, then this description seems too generic. > > > > BTW I would like to have the concurrency discussion sorted out > > > > before I proceed with my review, because reviewing the stuff > > > > without a fair idea of what exactly are we trying to achieve > > > > would yield poor results. > > > > > > I'm not sure what is unclear about what we're trying to achieve > > > (enable the guest to issue halt/clear on real hardware)? > > > > Yeah, that is perfectly clear, but it ain't the complete story. E.g. > > are subsequent commands blocking until the preceding command finishes > > is part of the interface. And what is good implementation depends on > > the answer. What I mean, I first need to understand how things are > > supposed to work (together) so I can double check that against the > > implementation. Otherwise all I can do is nitpicking. > > > > To get more tangible: we are in the middle of processing an SSCH > > request (e.g. doing the translation) when a HSCH comes in. What > > should happen? Should we start processing HSCH after he instruction > > part of SSCH is done -- which currently includes translation? Or > > should we -EBUSY? Or do we abort START related activities and do the > > HALT stuff? > > I think most of the sorting-out-the-operations stuff should be done by > the hardware itself, and we should not really try to enforce anything > special in our vfio code. > Sounds very reasonable to me. Does this mean you are against Pierre's '[PATCH v3 6/6] vfio: ccw: serialize the write system calls' as it does not let HW sort out stuff, but enforces sequencing? > For your example, it might be best if a hsch is always accepted and > send on towards the hardware. Nod. > Probably best to reflect back -EAGAIN if > we're currently processing another instruction from another vcpu, so > that the user space caller can retry. Hm, not sure how this works together with your previous sentence. > Same for ssch, if another ssch is > already being processed. We *could* reflect cc 2 if the fctl bit is > already set, but that won't do for csch, so it is probably best to have > the hardware figure that out in any case. > We just need to be careful about avoiding races if we let hw sort out things. If an ssch is issued with the start function pending the correct response is cc 2. > If I read the code correctly, we currently reflect -EBUSY and not > -EAGAIN if we get a ssch request while already processing another one. > QEMU hands that back to the guest as a cc 2, which is not 100% correct. > In practice, we don't see this with Linux guests due to locking. > Nod, does not happen because of BQL. We currently do the user-space counterpart of vfio_ccw_mdev_write() in BQL context or (i.e. we hold BQL until translation is done and our host ssch() comes back)? I think -EBUSY is the correct response for ssch while start pending set. I think we set start pending in QEMU before we issue 'start command/io request' to the kernel. I don't think -EAGAIN is a good idea. AFAIU we would expect user-space to loop on -EAGAIN e.g. at least until the processing of a 'start command' is done and the (fist) ssch by the host is issued. And then what? Translate the second channel program issue the second ssch in the host and probably get a non-zero cc? Or return -EBUSY? Or keep returning -EAGAIN? > > > But yes, we need to sort out that concurrency thing; I'm currently > > > unsure if the core should do some things as well or if it's more of > > > a vendor-driver thing. > > > > > > > By core you mean vfio-mdev core? If yes, I think it is a > > vendor-driver thing: limiting concurrency for all vfio-mdev does > > not make sense IMHO. > > Also generic vfio. But I'm still unclear which guarantees we have. I > suspect none; I'm wondering whether other vfio devices might have > issues as well. > My intuition says this is something left to the devices. Regards, Halil
On Thu, 6 Dec 2018 12:50:50 -0500 Farhan Ali <alifm@linux.ibm.com> wrote: > On 12/06/2018 11:21 AM, Cornelia Huck wrote: > > On Thu, 6 Dec 2018 10:26:12 -0500 > > Farhan Ali <alifm@linux.ibm.com> wrote: > > > >> On 12/06/2018 09:39 AM, Cornelia Huck wrote: > >>> On Wed, 5 Dec 2018 13:34:11 -0500 > >>> Farhan Ali <alifm@linux.ibm.com> wrote: > >>> > >>>> On 12/05/2018 07:54 AM, Cornelia Huck wrote: > >>>>>> Yeah, that is perfectly clear, but it ain't the complete story. E.g. > >>>>>> are subsequent commands blocking until the preceding command finishes > >>>>>> is part of the interface. And what is good implementation depends on the > >>>>>> answer. What I mean, I first need to understand how things are supposed > >>>>>> to work (together) so I can double check that against the > >>>>>> implementation. Otherwise all I can do is nitpicking. > >>>>>> > >>>>>> To get more tangible: we are in the middle of processing an SSCH request > >>>>>> (e.g. doing the translation) when a HSCH comes in. What should happen? > >>>>>> Should we start processing HSCH after he instruction part of SSCH is > >>>>>> done -- which currently includes translation? Or should we -EBUSY? Or do > >>>>>> we abort START related activities and do the HALT stuff? > >>>>> I think most of the sorting-out-the-operations stuff should be done by > >>>>> the hardware itself, and we should not really try to enforce anything > >>>>> special in our vfio code. > >>>>> > >>>>> For your example, it might be best if a hsch is always accepted and > >>>>> send on towards the hardware. Probably best to reflect back -EAGAIN if > >>>>> we're currently processing another instruction from another vcpu, so > >>>>> that the user space caller can retry. Same for ssch, if another ssch is > >>>>> already being processed. We*could* reflect cc 2 if the fctl bit is > >>>>> already set, but that won't do for csch, so it is probably best to have > >>>>> the hardware figure that out in any case. > >>>>> > >>>>> If I read the code correctly, we currently reflect -EBUSY and not > >>>>> -EAGAIN if we get a ssch request while already processing another one. > >>>>> QEMU hands that back to the guest as a cc 2, which is not 100% correct. > >>>>> In practice, we don't see this with Linux guests due to locking. > >>>>> > >>>> > >>>> If we have a ssch and a csch immediately afterwards from userspace, will > >>>> we end up issuing csch first and then ssch to the hardware? > >>>> > >>>> If I understand correctly, the ccw translation as part of the ssch can > >>>> be a slow operation so it might be possible we issue the csch first? > >>>> In that case we won't actually clear the original start function as > >>>> intended. > >>> > >>> When we start processing the ssch request (translation and so on), we > >>> set the state to BUSY. This means that any csch request will get a > >>> -EBUSY, no overtaking possible. (I think maybe I'll need to check what > >>> this series looks like if I rebase it on top of Pierre's rework, as he > >>> did some changes in the state machine.) > >> > >> I think you meant the state is set to BOXED? otherwise the patch 3 says > >> if state is BUSY and CLEAR event request comes in, we issue the clear > >> instruction, no? > > > > That's what I meant with "need to rebase" :) The BOXED state is gone; I > > just had not rebased on top of it. There's more changes in the state > > machine; if we are on the same page as to what should happen, I can > > start massaging the patches. > > > > > > Sorry maybe I missed it, but are you referring to Pierre's latest > cleanup patches? I don't see him removing the BOXED state. The "remove BOXED state" patch is currently on my vfio-ccw-staging branch. (That reminds me, will need to move it to my vfio-ccw branch and possibly send a pull request. I had hoped to collect more patches for the next release...) > > I think returning -EAGAIN and asking the userspace to retry the > operation sounds reasonable to me. > > But how do we handle the issue of protecting the cmd_region from > simultaneous hsch and csch calls? Do we agree on Pierre's method of > making write calls mutually exclusive? That's in his patch series, right? I did not yet have time to look at it...
On Thu, 6 Dec 2018 19:47:14 +0100 Halil Pasic <pasic@linux.ibm.com> wrote: > On Wed, 5 Dec 2018 13:54:02 +0100 > Cornelia Huck <cohuck@redhat.com> wrote: > > > On Tue, 4 Dec 2018 16:02:36 +0100 > > Halil Pasic <pasic@linux.ibm.com> wrote: > > > > > On Tue, 4 Dec 2018 14:11:30 +0100 > > > Cornelia Huck <cohuck@redhat.com> wrote: > > > > > > > On Tue, 4 Dec 2018 13:38:10 +0100 > > > > Halil Pasic <pasic@linux.ibm.com> wrote: > > > > > > > > > On Thu, 22 Nov 2018 17:54:29 +0100 > > > > > Cornelia Huck <cohuck@redhat.com> wrote: > > > > > > > > > > > [This is the Linux kernel part, git tree is available at > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw.git > > > > > > vfio-ccw-caps > > > > > > > > > > > > The companion QEMU patches are available at > > > > > > https://github.com/cohuck/qemu vfio-ccw-caps] > > > > > > > > > > > > Currently, vfio-ccw only relays START SUBCHANNEL requests to > > > > > > the real device. This tends to work well for the most common > > > > > > 'good path' scenarios; however, as we emulate {HALT,CLEAR} > > > > > > SUBCHANNEL in QEMU, things like clearing pending requests at > > > > > > the device is currently not supported. This may be a problem > > > > > > for e.g. error recovery. > > > > > > > > > > I'm wondering: what about MODIFY SUBCHANNEL? Do we plan to add > > > > > MSCH as well or is it supposed to remain 'userspace emulated'? > > > > > AFAIR MSCH may have an effect on error recovery as well. > > > > > > > > I think that would require a deeper change, as we have the > > > > requirement to enable the subchannel before handing it to > > > > userspace. IOW, the guest does not cause the subchannel to be > > > > enabled/disabled, but the host does. > > > > > > My point is, when the subchannel is disabled, 'firmware' is > > > responsible for suppressing interrupts and error conditions, and > > > also for doing the appropriate recovery procedure, so to say under > > > the hood. > > > > I don't think there's actually much of a 'recovery' possible at a > > subchannel level (other than 'have you tried turning it off and on > > again?'); the interesting stuff is all at the device-specific level. > > > > To clarify my concern let me quote from the PoP > (SA22-7832-10 page 14-9): > > """ > If a device presents unsolicited status while the > associated subchannel is disabled, that status is > discarded by the channel subsystem without > generating an I/O-interruption condition. How- > ever, if the status presented contains unit check, > the channel subsystem issues the clear signal for > the associated subchannel and does not gener- > ate an I/O-interruption condition. This should be > taken into account when the program uses MOD- > IFY SUBCHANNEL to enable a subchannel. For > example, the medium on the associated device > that was present when the subchannel became > disabled may have been replaced, and, there- > fore, the program should verify the integrity of > that medium. > """ Hm, so is your concern that we might have a status (unit check) if we have an enabled subchannel that might not be present if the subchannel had been disabled all the time? Is that a problem in practice? > > > > > > I think Jason has discovered some problems related to this while > > > doing his DASD IPL with vfio-ccw work, but I don't quite remember > > > any more. > > > > cc:ing Jason, in case he remembers :) Like in that case. Couldn't a unit check status also arrive just when the subchannel has been enabled, and the code therefore has to deal with it anyway? > > > > > IMHO it may be possible to emulate enable/disable, but it seems way > > > more error prone and complicated, than letting the guest > > > enable/disable the host subchannel. > > > > > > I have no idea what was the reason for going with the initial design. > > > I would appreciate any hints or explanations, but I'm well aware > > > that it was a long time ago. > > > > I don't really remember either, and any non-public mails from that time > > are inaccessible to me :( > > > > It *might* be an artifact of the original design (which operated at the > > ccw_device rather than the subchannel level), though. > > > > Interesting. > > > > > Parameters (like for channel measurements) are a different game. > > > > It is something we should look into, but it will need a different > > > > region. > > > > > > Yes emulation only channel measurements seem even less likely than > > > proper enable/disable. And 'that would need a different' region > > > helps me understanding the scope of async_cmd_region. Maybe we > > > should reconsider the comment '+ * @cmd_region: MMIO region for > > > asynchronous I/O commands other than START'. > > > > What do you think is wrong with that comment? > > > > Well msch is also an async I/O command other than START. If msch does not > belong here but needs it's own region, then this description seems too > generic. Why do you consider msch to be async? ssch, hsch, csch all have the potential to cause the execution of an asynchronous (start/halt/clear) function, while msch just (possibly) modifies the subchannel and is done. > > > > > > BTW I would like to have the concurrency discussion sorted out > > > > > before I proceed with my review, because reviewing the stuff > > > > > without a fair idea of what exactly are we trying to achieve > > > > > would yield poor results. > > > > > > > > I'm not sure what is unclear about what we're trying to achieve > > > > (enable the guest to issue halt/clear on real hardware)? > > > > > > Yeah, that is perfectly clear, but it ain't the complete story. E.g. > > > are subsequent commands blocking until the preceding command finishes > > > is part of the interface. And what is good implementation depends on > > > the answer. What I mean, I first need to understand how things are > > > supposed to work (together) so I can double check that against the > > > implementation. Otherwise all I can do is nitpicking. > > > > > > To get more tangible: we are in the middle of processing an SSCH > > > request (e.g. doing the translation) when a HSCH comes in. What > > > should happen? Should we start processing HSCH after he instruction > > > part of SSCH is done -- which currently includes translation? Or > > > should we -EBUSY? Or do we abort START related activities and do the > > > HALT stuff? > > > > I think most of the sorting-out-the-operations stuff should be done by > > the hardware itself, and we should not really try to enforce anything > > special in our vfio code. > > > > Sounds very reasonable to me. Does this mean you are against Pierre's > '[PATCH v3 6/6] vfio: ccw: serialize the write system calls' as it does > not let HW sort out stuff, but enforces sequencing? I have not yet had time to look at that, sorry. > > > > For your example, it might be best if a hsch is always accepted and > > send on towards the hardware. > > Nod. > > > Probably best to reflect back -EAGAIN if > > we're currently processing another instruction from another vcpu, so > > that the user space caller can retry. > > Hm, not sure how this works together with your previous sentence. The software layering. We have the kernel layer (drivers/s390/cio/vfio_ccw_*) that interacts with the hardware more or less directly, and the QEMU layer, which does some writes on regions. In the end, the goal is to act on behalf of the guest issuing a ssch/hsch/csch, which is from the guest's view a single instruction. We should not have the individual "instructions" compete with each other so that they run essentially in parallel (kernel layer), but we should also not try to impose an artificial ordering as to when instructions executed by different vcpus are executed (QEMU layer). Therefore, don't try to run an instruction in the kernel when another one is in progress for the same subchannel (exclusivity in the kernel), but retry in QEMU if needed (no ordering between vcpus imposed). In short, don't create strange concurrency issues in the "instruction" handling, but make it possible to execute instructions in a non-predictable order if the guest does not care about enforcing ordering on its side. > > > Same for ssch, if another ssch is > > already being processed. We *could* reflect cc 2 if the fctl bit is > > already set, but that won't do for csch, so it is probably best to have > > the hardware figure that out in any case. > > > > We just need to be careful about avoiding races if we let hw sort out > things. If an ssch is issued with the start function pending the correct > response is cc 2. But sending it on to the hardware will give us that cc 2, no? > > > If I read the code correctly, we currently reflect -EBUSY and not > > -EAGAIN if we get a ssch request while already processing another one. > > QEMU hands that back to the guest as a cc 2, which is not 100% correct. > > In practice, we don't see this with Linux guests due to locking. > > > > Nod, does not happen because of BQL. We currently do the user-space > counterpart of vfio_ccw_mdev_write() in BQL context or (i.e. we hold > BQL until translation is done and our host ssch() comes back)? The Linux kernel uses the subchannel lock to enforce exclusivity for subchannel instructions, so we won't see Linux guests issue instructions on different vcpus in parallel, that's what I meant. > > I think -EBUSY is the correct response for ssch while start pending set. > I think we set start pending in QEMU before we issue 'start command/io > request' to the kernel. I don't think -EAGAIN is a good idea. AFAIU we > would expect user-space to loop on -EAGAIN e.g. at least until the > processing of a 'start command' is done and the (fist) ssch by the host > is issued. And then what? Translate the second channel program issue > the second ssch in the host and probably get a non-zero cc? Or return > -EBUSY? Or keep returning -EAGAIN? My idea was: - return -EAGAIN if we're already processing a channel instruction - continue returning -EBUSY etc. if the instruction gets the respective return code from the hardware So, the second ssch would first get a -EAGAIN and then a -EBUSY if the first ssch is done, but the subchannel is still doing the start function. Just as you would expect when you do a ssch while your last request has not finished yet. > > > > > But yes, we need to sort out that concurrency thing; I'm currently > > > > unsure if the core should do some things as well or if it's more of > > > > a vendor-driver thing. > > > > > > > > > > By core you mean vfio-mdev core? If yes, I think it is a > > > vendor-driver thing: limiting concurrency for all vfio-mdev does > > > not make sense IMHO. > > > > Also generic vfio. But I'm still unclear which guarantees we have. I > > suspect none; I'm wondering whether other vfio devices might have > > issues as well. > > > > My intuition says this is something left to the devices. Probably, yes.
On Fri, 7 Dec 2018 11:05:29 +0100 Cornelia Huck <cohuck@redhat.com> wrote: [..] > > To clarify my concern let me quote from the PoP > > (SA22-7832-10 page 14-9): > > > > """ > > If a device presents unsolicited status while the > > associated subchannel is disabled, that status is > > discarded by the channel subsystem without > > generating an I/O-interruption condition. How- > > ever, if the status presented contains unit check, > > the channel subsystem issues the clear signal for > > the associated subchannel and does not gener- > > ate an I/O-interruption condition. This should be > > taken into account when the program uses MOD- > > IFY SUBCHANNEL to enable a subchannel. For > > example, the medium on the associated device > > that was present when the subchannel became > > disabled may have been replaced, and, there- > > fore, the program should verify the integrity of > > that medium. > > """ > > Hm, so is your concern that we might have a status (unit check) if we > have an enabled subchannel that might not be present if the subchannel > had been disabled all the time? Is that a problem in practice? > No idea if it is a problem in practice. > > > > > > > > I think Jason has discovered some problems related to this while > > > > doing his DASD IPL with vfio-ccw work, but I don't quite remember > > > > any more. > > > > > > cc:ing Jason, in case he remembers :) > > Like in that case. Couldn't a unit check status also arrive just when > the subchannel has been enabled, and the code therefore has to deal > with it anyway? > I assumed that programming note is there for a reason. Of course if it can not been proven it ain't cheating. I don't remember exactly this interacts with the rest of the architecture. In fact I asked my question, because my feeling was that tying the virtual an the backing subchannel together is simpler, than proving that we are fine without doing it. > > > > > > > IMHO it may be possible to emulate enable/disable, but it seems way > > > > more error prone and complicated, than letting the guest > > > > enable/disable the host subchannel. > > > > > > > > I have no idea what was the reason for going with the initial design. > > > > I would appreciate any hints or explanations, but I'm well aware > > > > that it was a long time ago. > > > > > > I don't really remember either, and any non-public mails from that time > > > are inaccessible to me :( > > > > > > It *might* be an artifact of the original design (which operated at the > > > ccw_device rather than the subchannel level), though. > > > > > > > Interesting. > > > > > > > Parameters (like for channel measurements) are a different game. > > > > > It is something we should look into, but it will need a different > > > > > region. > > > > > > > > Yes emulation only channel measurements seem even less likely than > > > > proper enable/disable. And 'that would need a different' region > > > > helps me understanding the scope of async_cmd_region. Maybe we > > > > should reconsider the comment '+ * @cmd_region: MMIO region for > > > > asynchronous I/O commands other than START'. > > > > > > What do you think is wrong with that comment? > > > > > > > Well msch is also an async I/O command other than START. If msch does not > > belong here but needs it's own region, then this description seems too > > generic. > > Why do you consider msch to be async? ssch, hsch, csch all have the > potential to cause the execution of an asynchronous (start/halt/clear) > function, while msch just (possibly) modifies the subchannel and is > done. > Right, my bad. Got confused by my Z channel io is async superstition. I did not quite understand what async means in this context. Regards, Halil
On Fri, 7 Dec 2018 11:05:29 +0100 Cornelia Huck <cohuck@redhat.com> wrote: > > > I think most of the sorting-out-the-operations stuff should be done by > > > the hardware itself, and we should not really try to enforce anything > > > special in our vfio code. > > > > > > > Sounds very reasonable to me. Does this mean you are against Pierre's > > '[PATCH v3 6/6] vfio: ccw: serialize the write system calls' as it does > > not let HW sort out stuff, but enforces sequencing? > > I have not yet had time to look at that, sorry. > > > > > > > > For your example, it might be best if a hsch is always accepted and > > > send on towards the hardware. > > > > Nod. > > > > > Probably best to reflect back -EAGAIN if > > > we're currently processing another instruction from another vcpu, so > > > that the user space caller can retry. > > > > Hm, not sure how this works together with your previous sentence. > > The software layering. We have the kernel layer > (drivers/s390/cio/vfio_ccw_*) that interacts with the hardware more or > less directly, and the QEMU layer, which does some writes on regions. > In the end, the goal is to act on behalf of the guest issuing a > ssch/hsch/csch, which is from the guest's view a single instruction. We > should not have the individual "instructions" compete with each other > so that they run essentially in parallel (kernel layer), but we should > also not try to impose an artificial ordering as to when instructions > executed by different vcpus are executed (QEMU layer). Therefore, don't > try to run an instruction in the kernel when another one is in progress > for the same subchannel (exclusivity in the kernel), but retry in QEMU > if needed (no ordering between vcpus imposed). > > In short, don't create strange concurrency issues in the "instruction" > handling, but make it possible to execute instructions in a > non-predictable order if the guest does not care about enforcing > ordering on its side. > I'm neither sold on this, nor am I violently opposing it. Will try to meditate on it some more if any spare cycles arise. Currently I don't see the benefit of the non-predictable order over plain FCFS. For example, let's assume we have a ssch "instruction" that 1 second to complete. Since normally ssch instruction does not have to process the channel program, and is thus kind of a constant time operation (now we do the translation and the pinning as a part of the "instruction), our strange guest gets jumpy and does a csch after T+0.2s. And on T+0.8 in desperation follows the whole up with a hsch. If I understand your proposal correctly, both userspace handlers would spin on -EAGAIN until T+1. When ssch is done the csch and the hsch would race for who can be the next. I don't quite get the value of that. > > > > > Same for ssch, if another ssch is > > > already being processed. We *could* reflect cc 2 if the fctl bit is > > > already set, but that won't do for csch, so it is probably best to have > > > the hardware figure that out in any case. > > > > > > > We just need to be careful about avoiding races if we let hw sort out > > things. If an ssch is issued with the start function pending the correct > > response is cc 2. > > But sending it on to the hardware will give us that cc 2, no? > > > > > > If I read the code correctly, we currently reflect -EBUSY and not > > > -EAGAIN if we get a ssch request while already processing another one. > > > QEMU hands that back to the guest as a cc 2, which is not 100% correct. > > > In practice, we don't see this with Linux guests due to locking. > > > > > > > Nod, does not happen because of BQL. We currently do the user-space > > counterpart of vfio_ccw_mdev_write() in BQL context or (i.e. we hold > > BQL until translation is done and our host ssch() comes back)? > > The Linux kernel uses the subchannel lock to enforce exclusivity for > subchannel instructions, so we won't see Linux guests issue > instructions on different vcpus in parallel, that's what I meant. > That is cool. Yet I think the situation with the BQL is relevant. Because while BQL is held, not only IO instructions on a single vfio-ccw device are mutually exclusive. AFAIU no other instruction QEMU instruction handler can engage. And a store subchannel for device A having to wait until the translation for the start subchannel on device B is done is not the most scary thing I can imagine. > > > > I think -EBUSY is the correct response for ssch while start pending set. > > I think we set start pending in QEMU before we issue 'start command/io > > request' to the kernel. I don't think -EAGAIN is a good idea. AFAIU we > > would expect user-space to loop on -EAGAIN e.g. at least until the > > processing of a 'start command' is done and the (fist) ssch by the host > > is issued. And then what? Translate the second channel program issue > > the second ssch in the host and probably get a non-zero cc? Or return > > -EBUSY? Or keep returning -EAGAIN? > > My idea was: > - return -EAGAIN if we're already processing a channel instruction > - continue returning -EBUSY etc. if the instruction gets the respective > return code from the hardware > > So, the second ssch would first get a -EAGAIN and then a -EBUSY if the > first ssch is done, but the subchannel is still doing the start > function. Just as you would expect when you do a ssch while your last > request has not finished yet. > But before you can issue the second ssch you have to do the translation for it. And we must assume the IO corresponding to the first ssch is not done yet -- so we still need the translated channel program of the first ssch. That is if we insist on doing the -EBUSY based on a return code from the hardware. I'm not sure we end up with a big simplification from making the "instructions" mutex on vfio-ccw device level in kernel as proposed above. But I'm not against it. If you have the time to write the patches I will find time to review them. Regards, Halil
On Fri, 7 Dec 2018 17:54:23 +0100 Halil Pasic <pasic@linux.ibm.com> wrote: > On Fri, 7 Dec 2018 11:05:29 +0100 > Cornelia Huck <cohuck@redhat.com> wrote: > > > > > I think most of the sorting-out-the-operations stuff should be done by > > > > the hardware itself, and we should not really try to enforce anything > > > > special in our vfio code. > > > > > > > > > > Sounds very reasonable to me. Does this mean you are against Pierre's > > > '[PATCH v3 6/6] vfio: ccw: serialize the write system calls' as it does > > > not let HW sort out stuff, but enforces sequencing? > > > > I have not yet had time to look at that, sorry. > > > > > > > > > > > > For your example, it might be best if a hsch is always accepted and > > > > send on towards the hardware. > > > > > > Nod. > > > > > > > Probably best to reflect back -EAGAIN if > > > > we're currently processing another instruction from another vcpu, so > > > > that the user space caller can retry. > > > > > > Hm, not sure how this works together with your previous sentence. > > > > The software layering. We have the kernel layer > > (drivers/s390/cio/vfio_ccw_*) that interacts with the hardware more or > > less directly, and the QEMU layer, which does some writes on regions. > > In the end, the goal is to act on behalf of the guest issuing a > > ssch/hsch/csch, which is from the guest's view a single instruction. We > > should not have the individual "instructions" compete with each other > > so that they run essentially in parallel (kernel layer), but we should > > also not try to impose an artificial ordering as to when instructions > > executed by different vcpus are executed (QEMU layer). Therefore, don't > > try to run an instruction in the kernel when another one is in progress > > for the same subchannel (exclusivity in the kernel), but retry in QEMU > > if needed (no ordering between vcpus imposed). > > > > In short, don't create strange concurrency issues in the "instruction" > > handling, but make it possible to execute instructions in a > > non-predictable order if the guest does not care about enforcing > > ordering on its side. > > > > I'm neither sold on this, nor am I violently opposing it. Will try to > meditate on it some more if any spare cycles arise. Currently I don't > see the benefit of the non-predictable order over plain FCFS. For > example, let's assume we have a ssch "instruction" that 1 second to > complete. Since normally ssch instruction does not have to process the > channel program, and is thus kind of a constant time operation (now we > do the translation and the pinning as a part of the "instruction), our > strange guest gets jumpy and does a csch after T+0.2s. And on T+0.8 in > desperation follows the whole up with a hsch. If I understand your > proposal correctly, both userspace handlers would spin on -EAGAIN until > T+1. When ssch is done the csch and the hsch would race for who can > be the next. I don't quite get the value of that. What would happen on real hardware for such a guest? I would expect that the csch and the hsch would be executed in a random order as well. My point is that it is up to the guest to impose an order on the execution of instructions, if wanted. We should not try to guess anything; I think that would make the implementation needlessly complex. > > > > > > > > Same for ssch, if another ssch is > > > > already being processed. We *could* reflect cc 2 if the fctl > > > > bit is already set, but that won't do for csch, so it is > > > > probably best to have the hardware figure that out in any case. > > > > > > > > > > We just need to be careful about avoiding races if we let hw sort > > > out things. If an ssch is issued with the start function pending > > > the correct response is cc 2. > > > > But sending it on to the hardware will give us that cc 2, no? > > > > > > > > > If I read the code correctly, we currently reflect -EBUSY and > > > > not -EAGAIN if we get a ssch request while already processing > > > > another one. QEMU hands that back to the guest as a cc 2, which > > > > is not 100% correct. In practice, we don't see this with Linux > > > > guests due to locking. > > > > > > Nod, does not happen because of BQL. We currently do the > > > user-space counterpart of vfio_ccw_mdev_write() in BQL context or > > > (i.e. we hold BQL until translation is done and our host ssch() > > > comes back)? > > > > The Linux kernel uses the subchannel lock to enforce exclusivity for > > subchannel instructions, so we won't see Linux guests issue > > instructions on different vcpus in parallel, that's what I meant. > > > > That is cool. Yet I think the situation with the BQL is relevant. > Because while BQL is held, not only IO instructions on a single > vfio-ccw device are mutually exclusive. AFAIU no other instruction > QEMU instruction handler can engage. And a store subchannel for > device A having to wait until the translation for the start > subchannel on device B is done is not the most scary thing I can > imagine. Yes. But we still need to be able to cope with a userspace that does not give us those guarantees. > > > > > > I think -EBUSY is the correct response for ssch while start > > > pending set. I think we set start pending in QEMU before we issue > > > 'start command/io request' to the kernel. I don't think -EAGAIN > > > is a good idea. AFAIU we would expect user-space to loop on > > > -EAGAIN e.g. at least until the processing of a 'start command' > > > is done and the (fist) ssch by the host is issued. And then > > > what? Translate the second channel program issue the second ssch > > > in the host and probably get a non-zero cc? Or return -EBUSY? Or > > > keep returning -EAGAIN? > > > > My idea was: > > - return -EAGAIN if we're already processing a channel instruction > > - continue returning -EBUSY etc. if the instruction gets the > > respective return code from the hardware > > > > So, the second ssch would first get a -EAGAIN and then a -EBUSY if > > the first ssch is done, but the subchannel is still doing the start > > function. Just as you would expect when you do a ssch while your > > last request has not finished yet. > > > > But before you can issue the second ssch you have to do the > translation for it. And we must assume the IO corresponding to the > first ssch is not done yet -- so we still need the translated channel > program of the first ssch. Yes, we need to be able to juggle different translated channel programs if we don't consider this part of the "instruction execution". But if we return -EAGAIN if the code is currently doing that translation, we should be fine, no? > That is if we insist on doing the -EBUSY > based on a return code from the hardware. I'm not sure we end up with > a big simplification from making the "instructions" mutex on vfio-ccw > device level in kernel as proposed above. I'm not sure we're not talking past each other here... the "translate and issue instruction" part should be mutually exclusive; I just don't want to return -EBUSY, but -EAGAIN, so that userspace knows it should try again. > But I'm not against it. If > you have the time to write the patches I will find time to review > them. Probably only on the new year...
On Wed, 19 Dec 2018 12:54:42 +0100 Cornelia Huck <cohuck@redhat.com> wrote: > On Fri, 7 Dec 2018 17:54:23 +0100 > Halil Pasic <pasic@linux.ibm.com> wrote: > > > On Fri, 7 Dec 2018 11:05:29 +0100 > > Cornelia Huck <cohuck@redhat.com> wrote: > > > > > > > I think most of the sorting-out-the-operations stuff should be done by > > > > > the hardware itself, and we should not really try to enforce anything > > > > > special in our vfio code. > > > > > > > > > > > > > Sounds very reasonable to me. Does this mean you are against Pierre's > > > > '[PATCH v3 6/6] vfio: ccw: serialize the write system calls' as it does > > > > not let HW sort out stuff, but enforces sequencing? > > > > > > I have not yet had time to look at that, sorry. > > > > > > > > > > > > > > > > For your example, it might be best if a hsch is always accepted and > > > > > send on towards the hardware. > > > > > > > > Nod. > > > > > > > > > Probably best to reflect back -EAGAIN if > > > > > we're currently processing another instruction from another vcpu, so > > > > > that the user space caller can retry. > > > > > > > > Hm, not sure how this works together with your previous sentence. > > > > > > The software layering. We have the kernel layer > > > (drivers/s390/cio/vfio_ccw_*) that interacts with the hardware more or > > > less directly, and the QEMU layer, which does some writes on regions. > > > In the end, the goal is to act on behalf of the guest issuing a > > > ssch/hsch/csch, which is from the guest's view a single instruction. We > > > should not have the individual "instructions" compete with each other > > > so that they run essentially in parallel (kernel layer), but we should > > > also not try to impose an artificial ordering as to when instructions > > > executed by different vcpus are executed (QEMU layer). Therefore, don't > > > try to run an instruction in the kernel when another one is in progress > > > for the same subchannel (exclusivity in the kernel), but retry in QEMU > > > if needed (no ordering between vcpus imposed). > > > > > > In short, don't create strange concurrency issues in the "instruction" > > > handling, but make it possible to execute instructions in a > > > non-predictable order if the guest does not care about enforcing > > > ordering on its side. > > > > > > > I'm neither sold on this, nor am I violently opposing it. Will try to > > meditate on it some more if any spare cycles arise. Currently I don't > > see the benefit of the non-predictable order over plain FCFS. For > > example, let's assume we have a ssch "instruction" that 1 second to > > complete. Since normally ssch instruction does not have to process the > > channel program, and is thus kind of a constant time operation (now we > > do the translation and the pinning as a part of the "instruction), our > > strange guest gets jumpy and does a csch after T+0.2s. And on T+0.8 in > > desperation follows the whole up with a hsch. If I understand your > > proposal correctly, both userspace handlers would spin on -EAGAIN until > > T+1. When ssch is done the csch and the hsch would race for who can > > be the next. I don't quite get the value of that. > > What would happen on real hardware for such a guest? I would expect > that the csch and the hsch would be executed in a random order as well. > Yes, they would be executed in random order, but would not wait until the ssch is done (and especially not wait until the channel program gets translated). AFAIR bot cancel the start function immediately -- if any pending. Furthermore the point where the race is decided is changing the function control bits -- the update needs to be an interlocked one obviously. What I want to say, there is no merit in waiting -- one second in the example. At some point it needs to be decided who is considered first, and artificially procrastinating this decision does not do us any good, because we may end up with otherwise unlikely behavior. > My point is that it is up to the guest to impose an order on the > execution of instructions, if wanted. We should not try to guess > anything; I think that would make the implementation needlessly complex. > I'm not for guessing stuff, but rather for sticking to the architecture. > > > > > > > > > > > Same for ssch, if another ssch is > > > > > already being processed. We *could* reflect cc 2 if the fctl > > > > > bit is already set, but that won't do for csch, so it is > > > > > probably best to have the hardware figure that out in any case. > > > > > > > > > > > > > We just need to be careful about avoiding races if we let hw sort > > > > out things. If an ssch is issued with the start function pending > > > > the correct response is cc 2. > > > > > > But sending it on to the hardware will give us that cc 2, no? > > > > > > > > > > > > If I read the code correctly, we currently reflect -EBUSY and > > > > > not -EAGAIN if we get a ssch request while already processing > > > > > another one. QEMU hands that back to the guest as a cc 2, which > > > > > is not 100% correct. In practice, we don't see this with Linux > > > > > guests due to locking. > > > > > > > > Nod, does not happen because of BQL. We currently do the > > > > user-space counterpart of vfio_ccw_mdev_write() in BQL context or > > > > (i.e. we hold BQL until translation is done and our host ssch() > > > > comes back)? > > > > > > The Linux kernel uses the subchannel lock to enforce exclusivity for > > > subchannel instructions, so we won't see Linux guests issue > > > instructions on different vcpus in parallel, that's what I meant. > > > > > > > That is cool. Yet I think the situation with the BQL is relevant. > > Because while BQL is held, not only IO instructions on a single > > vfio-ccw device are mutually exclusive. AFAIU no other instruction > > QEMU instruction handler can engage. And a store subchannel for > > device A having to wait until the translation for the start > > subchannel on device B is done is not the most scary thing I can > > imagine. > > Yes. But we still need to be able to cope with a userspace that does > not give us those guarantees. > I agree. The point I was trying to make is not that 'We are good, because qemu takes care of it!' on the contrary, I wanted to give voice to my concern that a guest that has a couple of vfio-ccw devices in use could experience performance problems because vfio-ccw holds BQL for long. > > > > > > > > I think -EBUSY is the correct response for ssch while start > > > > pending set. I think we set start pending in QEMU before we issue > > > > 'start command/io request' to the kernel. I don't think -EAGAIN > > > > is a good idea. AFAIU we would expect user-space to loop on > > > > -EAGAIN e.g. at least until the processing of a 'start command' > > > > is done and the (fist) ssch by the host is issued. And then > > > > what? Translate the second channel program issue the second ssch > > > > in the host and probably get a non-zero cc? Or return -EBUSY? Or > > > > keep returning -EAGAIN? > > > > > > My idea was: > > > - return -EAGAIN if we're already processing a channel instruction > > > - continue returning -EBUSY etc. if the instruction gets the > > > respective return code from the hardware > > > > > > So, the second ssch would first get a -EAGAIN and then a -EBUSY if > > > the first ssch is done, but the subchannel is still doing the start > > > function. Just as you would expect when you do a ssch while your > > > last request has not finished yet. > > > > > > > But before you can issue the second ssch you have to do the > > translation for it. And we must assume the IO corresponding to the > > first ssch is not done yet -- so we still need the translated channel > > program of the first ssch. > > Yes, we need to be able to juggle different translated channel programs > if we don't consider this part of the "instruction execution". But if > we return -EAGAIN if the code is currently doing that translation, we > should be fine, no? > As long as you return -EAGAIN we are fine. But AFAIU you proposed to do that until the I/O is submitted to the HW subchannel via ssch(). But that is not the case I'm talking about here. We have already translated the channel program for the first request, submitted it via ssch() and are awaiting an interrupt that tells us the I/O is done. While waiting for this interrupt we get a new ssch request. I understood, you don't want to give -EAGAIN for this one, but make the ssch decide. The problem is you still need the old translated channel program for the interrupt handling, and at the same time you need the new channel program translated as well, before doing the ssch for it in the host. > > That is if we insist on doing the -EBUSY > > based on a return code from the hardware. I'm not sure we end up with > > a big simplification from making the "instructions" mutex on vfio-ccw > > device level in kernel as proposed above. > > I'm not sure we're not talking past each other here... I'm afraid we do. > the "translate > and issue instruction" part should be mutually exclusive; I just don't > want to return -EBUSY, but -EAGAIN, so that userspace knows it should > try again. > I got it. But I wanted to point out, that we need the old channel program *beyond* the "translate and issue instruction". > > But I'm not against it. If > > you have the time to write the patches I will find time to review > > them. > > Probably only on the new year... I think the stuff is better discussed with code at hand. I'm happy to continue this discussion if you think it is useful to you. Otherwise I suggest do it the way you think is the best, and I will try to find and to point out the problems, if any. Regards, Halil
On Wed, 19 Dec 2018 15:17:19 +0100 Halil Pasic <pasic@linux.ibm.com> wrote: > On Wed, 19 Dec 2018 12:54:42 +0100 > Cornelia Huck <cohuck@redhat.com> wrote: > > > On Fri, 7 Dec 2018 17:54:23 +0100 > > Halil Pasic <pasic@linux.ibm.com> wrote: > > > > > On Fri, 7 Dec 2018 11:05:29 +0100 > > > Cornelia Huck <cohuck@redhat.com> wrote: > > > > > > > > > I think most of the sorting-out-the-operations stuff should be done by > > > > > > the hardware itself, and we should not really try to enforce anything > > > > > > special in our vfio code. > > > > > > > > > > > > > > > > Sounds very reasonable to me. Does this mean you are against Pierre's > > > > > '[PATCH v3 6/6] vfio: ccw: serialize the write system calls' as it does > > > > > not let HW sort out stuff, but enforces sequencing? > > > > > > > > I have not yet had time to look at that, sorry. > > > > > > > > > > > > > > > > > > > > For your example, it might be best if a hsch is always accepted and > > > > > > send on towards the hardware. > > > > > > > > > > Nod. > > > > > > > > > > > Probably best to reflect back -EAGAIN if > > > > > > we're currently processing another instruction from another vcpu, so > > > > > > that the user space caller can retry. > > > > > > > > > > Hm, not sure how this works together with your previous sentence. > > > > > > > > The software layering. We have the kernel layer > > > > (drivers/s390/cio/vfio_ccw_*) that interacts with the hardware more or > > > > less directly, and the QEMU layer, which does some writes on regions. > > > > In the end, the goal is to act on behalf of the guest issuing a > > > > ssch/hsch/csch, which is from the guest's view a single instruction. We > > > > should not have the individual "instructions" compete with each other > > > > so that they run essentially in parallel (kernel layer), but we should > > > > also not try to impose an artificial ordering as to when instructions > > > > executed by different vcpus are executed (QEMU layer). Therefore, don't > > > > try to run an instruction in the kernel when another one is in progress > > > > for the same subchannel (exclusivity in the kernel), but retry in QEMU > > > > if needed (no ordering between vcpus imposed). > > > > > > > > In short, don't create strange concurrency issues in the "instruction" > > > > handling, but make it possible to execute instructions in a > > > > non-predictable order if the guest does not care about enforcing > > > > ordering on its side. > > > > > > > > > > I'm neither sold on this, nor am I violently opposing it. Will try to > > > meditate on it some more if any spare cycles arise. Currently I don't > > > see the benefit of the non-predictable order over plain FCFS. For > > > example, let's assume we have a ssch "instruction" that 1 second to > > > complete. Since normally ssch instruction does not have to process the > > > channel program, and is thus kind of a constant time operation (now we > > > do the translation and the pinning as a part of the "instruction), our > > > strange guest gets jumpy and does a csch after T+0.2s. And on T+0.8 in > > > desperation follows the whole up with a hsch. If I understand your > > > proposal correctly, both userspace handlers would spin on -EAGAIN until > > > T+1. When ssch is done the csch and the hsch would race for who can > > > be the next. I don't quite get the value of that. > > > > What would happen on real hardware for such a guest? I would expect > > that the csch and the hsch would be executed in a random order as well. > > > > Yes, they would be executed in random order, but would not wait until the > ssch is done (and especially not wait until the channel program gets > translated). AFAIR bot cancel the start function immediately -- if any > pending. > > Furthermore the point where the race is decided is changing the function > control bits -- the update needs to be an interlocked one obviously. > > What I want to say, there is no merit in waiting -- one second in the > example. At some point it needs to be decided who is considered first, > and artificially procrastinating this decision does not do us any good, > because we may end up with otherwise unlikely behavior. You've really lost me here :( I fear you're criticizing something I don't want to implement; I'll write some code, that should make things much easier to discuss. > > > My point is that it is up to the guest to impose an order on the > > execution of instructions, if wanted. We should not try to guess > > anything; I think that would make the implementation needlessly complex. > > > > I'm not for guessing stuff, but rather for sticking to the architecture. > > > > > > > > > > > > > > > Same for ssch, if another ssch is > > > > > > already being processed. We *could* reflect cc 2 if the fctl > > > > > > bit is already set, but that won't do for csch, so it is > > > > > > probably best to have the hardware figure that out in any case. > > > > > > > > > > > > > > > > We just need to be careful about avoiding races if we let hw sort > > > > > out things. If an ssch is issued with the start function pending > > > > > the correct response is cc 2. > > > > > > > > But sending it on to the hardware will give us that cc 2, no? > > > > > > > > > > > > > > > If I read the code correctly, we currently reflect -EBUSY and > > > > > > not -EAGAIN if we get a ssch request while already processing > > > > > > another one. QEMU hands that back to the guest as a cc 2, which > > > > > > is not 100% correct. In practice, we don't see this with Linux > > > > > > guests due to locking. > > > > > > > > > > Nod, does not happen because of BQL. We currently do the > > > > > user-space counterpart of vfio_ccw_mdev_write() in BQL context or > > > > > (i.e. we hold BQL until translation is done and our host ssch() > > > > > comes back)? > > > > > > > > The Linux kernel uses the subchannel lock to enforce exclusivity for > > > > subchannel instructions, so we won't see Linux guests issue > > > > instructions on different vcpus in parallel, that's what I meant. > > > > > > > > > > That is cool. Yet I think the situation with the BQL is relevant. > > > Because while BQL is held, not only IO instructions on a single > > > vfio-ccw device are mutually exclusive. AFAIU no other instruction > > > QEMU instruction handler can engage. And a store subchannel for > > > device A having to wait until the translation for the start > > > subchannel on device B is done is not the most scary thing I can > > > imagine. > > > > Yes. But we still need to be able to cope with a userspace that does > > not give us those guarantees. > > > > I agree. The point I was trying to make is not that 'We are good, because > qemu takes care of it!' on the contrary, I wanted to give voice to my > concern that a guest that has a couple of vfio-ccw devices in use could > experience performance problems because vfio-ccw holds BQL for long. TBH, I have no idea how this will scale to many vfio-ccw devices. > > > > > > > > > > > I think -EBUSY is the correct response for ssch while start > > > > > pending set. I think we set start pending in QEMU before we issue > > > > > 'start command/io request' to the kernel. I don't think -EAGAIN > > > > > is a good idea. AFAIU we would expect user-space to loop on > > > > > -EAGAIN e.g. at least until the processing of a 'start command' > > > > > is done and the (fist) ssch by the host is issued. And then > > > > > what? Translate the second channel program issue the second ssch > > > > > in the host and probably get a non-zero cc? Or return -EBUSY? Or > > > > > keep returning -EAGAIN? > > > > > > > > My idea was: > > > > - return -EAGAIN if we're already processing a channel instruction > > > > - continue returning -EBUSY etc. if the instruction gets the > > > > respective return code from the hardware > > > > > > > > So, the second ssch would first get a -EAGAIN and then a -EBUSY if > > > > the first ssch is done, but the subchannel is still doing the start > > > > function. Just as you would expect when you do a ssch while your > > > > last request has not finished yet. > > > > > > > > > > But before you can issue the second ssch you have to do the > > > translation for it. And we must assume the IO corresponding to the > > > first ssch is not done yet -- so we still need the translated channel > > > program of the first ssch. > > > > Yes, we need to be able to juggle different translated channel programs > > if we don't consider this part of the "instruction execution". But if > > we return -EAGAIN if the code is currently doing that translation, we > > should be fine, no? > > > > As long as you return -EAGAIN we are fine. But AFAIU you proposed to > do that until the I/O is submitted to the HW subchannel via ssch(). But > that is not the case I'm talking about here. We have already translated > the channel program for the first request, submitted it via ssch() and > are awaiting an interrupt that tells us the I/O is done. While waiting > for this interrupt we get a new ssch request. I understood, you don't > want to give -EAGAIN for this one, but make the ssch decide. The problem > is you still need the old translated channel program for the interrupt > handling, and at the same time you need the new channel program > translated as well, before doing the ssch for it in the host. Why? You're not doing anything with that second ssch at all, it returns before translation is started. > > > > That is if we insist on doing the -EBUSY > > > based on a return code from the hardware. I'm not sure we end up with > > > a big simplification from making the "instructions" mutex on vfio-ccw > > > device level in kernel as proposed above. > > > > I'm not sure we're not talking past each other here... > > I'm afraid we do. > > > the "translate > > and issue instruction" part should be mutually exclusive; I just don't > > want to return -EBUSY, but -EAGAIN, so that userspace knows it should > > try again. > > > > I got it. But I wanted to point out, that we need the old channel program > *beyond* the "translate and issue instruction". Of course. But I don't want to start a new channel program. > > > > But I'm not against it. If > > > you have the time to write the patches I will find time to review > > > them. > > > > Probably only on the new year... > > I think the stuff is better discussed with code at hand. I'm happy to > continue this discussion if you think it is useful to you. Otherwise I > suggest do it the way you think is the best, and I will try to find and > to point out the problems, if any. I'll try to post something in January. Have a nice holiday break :)
On Fri, 21 Dec 2018 12:23:32 +0100 Cornelia Huck <cohuck@redhat.com> wrote: [..] > > You've really lost me here :( I fear you're criticizing something I > don't want to implement; I'll write some code, that should make things > much easier to discuss. > Nod. > TBH, I have no idea how this will scale to many vfio-ccw devices. > > > [..] > > As long as you return -EAGAIN we are fine. But AFAIU you proposed to > > do that until the I/O is submitted to the HW subchannel via ssch(). But > > that is not the case I'm talking about here. We have already translated > > the channel program for the first request, submitted it via ssch() and > > are awaiting an interrupt that tells us the I/O is done. While waiting > > for this interrupt we get a new ssch request. I understood, you don't > > want to give -EAGAIN for this one, but make the ssch decide. The problem > > is you still need the old translated channel program for the interrupt > > handling, and at the same time you need the new channel program > > translated as well, before doing the ssch for it in the host. > > Why? You're not doing anything with that second ssch at all, it returns > before translation is started. > OK apparently I misunderstood something -- it was a long and twisty discussion. Looking forward to the code ;). [..] > > I'll try to post something in January. Have a nice holiday break :) > Don't feel pressured. :) Have nice holidays as well! Regards, Halil