Message ID | 20170317031743.40128-15-bjsdjshi@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | New, archived |
On Fri, 17 Mar 2017 04:17:41 +0100 Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote: > Add file Documentation/s390/vfio-ccw.txt that includes details > of vfio-ccw. > > Acked-by: Pierre Morel <pmorel@linux.vnet.ibm.com> > Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> > --- > Documentation/s390/00-INDEX | 2 + > Documentation/s390/vfio-ccw.txt | 303 ++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 305 insertions(+) > create mode 100644 Documentation/s390/vfio-ccw.txt > > diff --git a/Documentation/s390/00-INDEX b/Documentation/s390/00-INDEX > index 9189535..317f037 100644 > --- a/Documentation/s390/00-INDEX > +++ b/Documentation/s390/00-INDEX > @@ -22,5 +22,7 @@ qeth.txt > - HiperSockets Bridge Port Support. > s390dbf.txt > - information on using the s390 debug feature. > +vfio-ccw.txt > + information on the vfio-ccw I/O subchannel driver. > zfcpdump.txt > - information on the s390 SCSI dump tool. > diff --git a/Documentation/s390/vfio-ccw.txt b/Documentation/s390/vfio-ccw.txt > new file mode 100644 > index 0000000..90b3dfe > --- /dev/null > +++ b/Documentation/s390/vfio-ccw.txt > @@ -0,0 +1,303 @@ > +vfio-ccw: the basic infrastructure > +================================== > + > +Introduction > +------------ > + > +Here we describe the vfio support for I/O subchannel devices for > +Linux/s390. Motivation for vfio-ccw is to passthrough subchannels to a > +virtual machine, while vfio is the means. > + > +Different than other hardware architectures, s390 has defined a unified > +I/O access method, which is so called Channel I/O. It has its own access > +patterns: > +- Channel programs run asynchronously on a separate (co)processor. > +- The channel subsystem will access any memory designated by the caller > + in the channel program directly, i.e. there is no iommu involved. > +Thus when we introduce vfio support for these devices, we realize it > +with a mediated device (mdev) implementation. 
The vfio mdev will be > +added to an iommu group, so as to make itself able to be managed by the > +vfio framework. And we add read/write callbacks for special vfio I/O > +regions to pass the channel programs from the mdev to its parent device > +(the real I/O subchannel device) to do further address translation and > +to perform I/O instructions. > + > +This document does not intend to explain the s390 I/O architecture in > +every detail. More information/reference could be found here: > +- A good start to know Channel I/O in general: > + https://en.wikipedia.org/wiki/Channel_I/O > +- s390 architecture: > + s390 Principles of Operation manual (IBM Form. No. SA22-7832) > +- The existing Qemu code which implements a simple emulated channel > + subsystem could also be a good reference. It makes it easier to follow > + the flow. > + qemu/hw/s390x/css.c > + > +For vfio mediated device framework: > +- Documentation/vfio-mediated-device.txt > + > +Motivation of vfio-ccw > +---------------------- > + > +Currently, a guest virtualized via qemu/kvm on s390 only sees > +paravirtualized virtio devices via the "Virtio Over Channel I/O > +(virtio-ccw)" transport. This makes virtio devices discoverable via > +standard operating system algorithms for handling channel devices. > + > +However this is not enough. On s390 for the majority of devices, which > +use the standard Channel I/O based mechanism, we also need to provide > +the functionality of passing through them to a Qemu virtual machine. > +This includes devices that don't have a virtio counterpart (e.g. tape > +drives) or that have specific characteristics which guests want to > +exploit. > + > +For passing a device to a guest, we want to use the same interface as > +everybody else, namely vfio. Thus, we would like to introduce vfio > +support for channel devices. And we would like to name this new vfio > +device "vfio-ccw". 
> + > +Access patterns of CCW devices > +------------------------------ > + > +s390 architecture has implemented a so called channel subsystem, that > +provides a unified view of the devices physically attached to the > +systems. Though the s390 hardware platform knows about a huge variety of > +different peripheral attachments like disk devices (aka. DASDs), tapes, > +communication controllers, etc. They can all be accessed by a well > +defined access method and they are presenting I/O completion a unified > +way: I/O interruptions. > + > +All I/O requires the use of channel command words (CCWs). A CCW is an > +instruction to a specialized I/O channel processor. A channel program is > +a sequence of CCWs which are executed by the I/O channel subsystem. To > +issue a channel program to the channel subsystem, it is required to > +build an operation request block (ORB), which can be used to point out > +the format of the CCW and other control information to the system. The > +operating system signals the I/O channel subsystem to begin executing > +the channel program with a SSCH (start sub-channel) instruction. The > +central processor is then free to proceed with non-I/O instructions > +until interrupted. The I/O completion result is received by the > +interrupt handler in the form of interrupt response block (IRB). > + > +Back to vfio-ccw, in short: > +- ORBs and channel programs are built in guest kernel (with guest > + physical addresses). > +- ORBs and channel programs are passed to the host kernel. > +- Host kernel translates the guest physical addresses to real addresses > + and starts the I/O with issuing a privileged Channel I/O instruction > + (e.g SSCH). > +- channel programs run asynchronously on a separate processor. > +- I/O completion will be signaled to the host with I/O interruptions. > + And it will be copied as IRB to user space to pass it back to the > + guest. 
> + > +Physical vfio ccw device and its child mdev > +------------------------------------------- > + > +As mentioned above, we realize vfio-ccw with a mdev implementation. > + > +Channel I/O does not have IOMMU hardware support, so the physical > +vfio-ccw device does not have an IOMMU level translation or isolation. > + > +Sub-channel I/O instructions are all privileged instructions, When > +handling the I/O instruction interception, vfio-ccw has the software > +policing and translation how the channel program is programmed before > +it gets sent to hardware. > + > +Within this implementation, we have two drivers for two types of > +devices: > +- The vfio_ccw driver for the physical subchannel device. > + This is an I/O subchannel driver for the real subchannel device. It > + realizes a group of callbacks and registers to the mdev framework as a > + parent (physical) device. As a consequence, mdev provides vfio_ccw a > + generic interface (sysfs) to create mdev devices. A vfio mdev could be > + created by vfio_ccw then and added to the mediated bus. It is the vfio > + device that added to an IOMMU group and a vfio group. > + vfio_ccw also provides an I/O region to accept channel program > + request from user space and store I/O interrupt result for user > + space to retrieve. To notify user space an I/O completion, it offers > + an interface to setup an eventfd fd for asynchronous signaling. > + > +- The vfio_mdev driver for the mediated vfio ccw device. > + This is provided by the mdev framework. It is a vfio device driver for > + the mdev that created by vfio_ccw. > + It realize a group of vfio device driver callbacks, adds itself to a > + vfio group, and registers itself to the mdev framework as a mdev > + driver. > + It uses a vfio iommu backend that uses the existing map and unmap > + ioctls, but rather than programming them into an IOMMU for a device, > + it simply stores the translations for use by later requests. 
This > + means that a device programmed in a VM with guest physical addresses > + can have the vfio kernel convert that address to process virtual > + address, pin the page and program the hardware with the host physical > + address in one step. > + For a mdev, the vfio iommu backend will not pin the pages during the > + VFIO_IOMMU_MAP_DMA ioctl. Mdev framework will only maintain a database > + of the iova<->vaddr mappings in this operation. And they export a > + vfio_pin_pages and a vfio_unpin_pages interfaces from the vfio iommu > + backend for the physical devices to pin and unpin pages by demand. > + > +Below is a high Level block diagram. > + > + +-------------+ > + | | > + | +---------+ | mdev_register_driver() +--------------+ > + | | Mdev | +<-----------------------+ | > + | | bus | | | vfio_mdev.ko | > + | | driver | +----------------------->+ |<-> VFIO user > + | +---------+ | probe()/remove() +--------------+ APIs > + | | > + | MDEV CORE | > + | MODULE | > + | mdev.ko | > + | +---------+ | mdev_register_device() +--------------+ > + | |Physical | +<-----------------------+ | > + | | device | | | vfio_ccw.ko |<-> subchannel > + | |interface| +----------------------->+ | device > + | +---------+ | callback +--------------+ > + +-------------+ > + > +The process of how these work together. > +1. vfio_ccw.ko drives the physical I/O subchannel, and registers the > + physical device (with callbacks) to mdev framework. > + When vfio_ccw probing the subchannel device, it registers device > + pointer and callbacks to the mdev framework. Mdev related file nodes > + under the device node in sysfs would be created for the subchannel > + device, namely 'mdev_create', 'mdev_destroy' and > + 'mdev_supported_types'. > +2. Create a mediated vfio ccw device. > + Use the 'mdev_create' sysfs file, we need to manually create one (and > + only one for our case) mediated device. > +3. vfio_mdev.ko drives the mediated ccw device. > + vfio_mdev is also the vfio device drvier. 
It will probe the mdev and
> +   add it to an iommu_group and a vfio_group. Then we could pass through
> +   the mdev to a guest.
> +
> +vfio-ccw I/O region
> +-------------------
> +
> +An I/O region is used to accept channel program request from user
> +space and store I/O interrupt result for user space to retrieve. The
> +defination of the region is:
> +
> +struct ccw_io_region {
> +#define ORB_AREA_SIZE 12
> +        __u8    orb_area[ORB_AREA_SIZE];
> +#define SCSW_AREA_SIZE 12
> +        __u8    scsw_area[SCSW_AREA_SIZE];
> +#define IRB_AREA_SIZE 96
> +        __u8    irb_area[IRB_AREA_SIZE];
> +        __u32   ret_code;
> +} __packed;
> +
> +While starting an I/O request, orb_area should be filled with the
> +guest ORB, and scsw_area should be filled with the SCSW of the Virtual
> +Subchannel.
> +
> +irb_area stores the I/O result.
> +
> +ret_code stores a return code for each access of the region.

Pardon if these questions expose my lack of familiarity with S390:

So I/O requests are asynchronous, the user is notified via interrupt
when completed, can more than one request be queued at a time?  The
communication format doesn't seem like it'd easily support that.  Is it
possible?  A future enhancement that we should design for now?

I'm also a little unclear what sort of I/O a user has access to via
this interface and how the kernel polices that access.  For instance,
are multiple tape or disk devices available through a single I/O
channel?  How does the user configure which devices a user has access
to when creating the vfio-ccw device?

Otherwise I think the interface looks great.  Thanks,

Alex
* Alex Williamson <alex.williamson@redhat.com> [2017-03-21 12:47:16 -0600]:

[...]

> > +vfio-ccw I/O region
> > +-------------------
> > +
> > +An I/O region is used to accept channel program request from user
> > +space and store I/O interrupt result for user space to retrieve. The
> > +defination of the region is:
> > +
> > +struct ccw_io_region {
> > +#define ORB_AREA_SIZE 12
> > +        __u8    orb_area[ORB_AREA_SIZE];
> > +#define SCSW_AREA_SIZE 12
> > +        __u8    scsw_area[SCSW_AREA_SIZE];
> > +#define IRB_AREA_SIZE 96
> > +        __u8    irb_area[IRB_AREA_SIZE];
> > +        __u32   ret_code;
> > +} __packed;
> > +
> > +While starting an I/O request, orb_area should be filled with the
> > +guest ORB, and scsw_area should be filled with the SCSW of the Virtual
> > +Subchannel.
> > +
> > +irb_area stores the I/O result.
> > +
> > +ret_code stores a return code for each access of the region.
Hi Alex,

>
> Pardon if these questions expose my lack of familiarity with S390:
>
> So I/O requests are asynchronous, the user is notified via interrupt
> when completed, can more than one request be queued at a time?
The answer is no. The subchannel stays in a state that prohibits a new
request while a previous request is still being processed. We also need
to issue an explicit I/O instruction to retrieve and/or clear the
pending interruption before issuing another I/O request.

> The communication format doesn't seem like it'd easily support that.
> Is it possible?  A future enhancement that we should design for now?
As stated above, it's not possible.

>
> I'm also a little unclear what sort of I/O a user has access to via
> this interface and how the kernel polices that access.  For instance,
> are multiple tape or disk devices available through a single I/O
> channel?
No. An I/O subchannel is dedicated to one device, and...

> How does the user configure which devices a user has access to when
> creating the vfio-ccw device?
...this mapping is usually determined/configured before machine startup
by the administrator of the upper-level hypervisor. So when creating the
vfio-ccw device, we do not configure or modify this mapping. When the
guest I/O subchannel driver probes, it issues a SENSE command on the
subchannel to find out what kind of device is behind the subchannel, and
then uses the corresponding CCW device driver to serve the I/O device.

@Conny, please correct me if my understanding is not right.

>
> Otherwise I think the interface looks great.  Thanks,
This is good news. :>

Thanks!

>
> Alex
>
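
[Editorial sketch, not part of the thread] The "one request at a time"
rule described in the reply above can be pictured as a small state
machine. The following is a minimal model in C, not code from the patch:
the state names, the struct and the three functions are hypothetical,
and the return values assume the usual start-function condition codes
(0 = started, 1 = status pending, 2 = busy).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states; the names are illustrative, not from the patch. */
enum sch_state {
    SCH_IDLE,        /* no request outstanding */
    SCH_BUSY,        /* a channel program is still being processed */
    SCH_IRQ_PENDING  /* completion status not yet retrieved/cleared */
};

/* Hypothetical model of a single subchannel. */
struct subchannel_model {
    enum sch_state state;
};

/*
 * Try to start a request. Returns an assumed condition code:
 * 0 = started, 1 = status pending, 2 = busy.
 */
int model_start(struct subchannel_model *sch)
{
    assert(sch != NULL);
    switch (sch->state) {
    case SCH_IDLE:
        sch->state = SCH_BUSY;
        return 0;
    case SCH_IRQ_PENDING:
        return 1;
    default:
        return 2;
    }
}

/* The device finished: an I/O interruption becomes pending. */
void model_interrupt(struct subchannel_model *sch)
{
    assert(sch != NULL);
    if (sch->state == SCH_BUSY)
        sch->state = SCH_IRQ_PENDING;
}

/*
 * Retrieve and clear the pending status, which is what allows the
 * next request. Returns false if nothing was pending.
 */
bool model_clear_pending(struct subchannel_model *sch)
{
    assert(sch != NULL);
    if (sch->state != SCH_IRQ_PENDING)
        return false;
    sch->state = SCH_IDLE;
    return true;
}
```

In this model a second model_start() is refused until both
model_interrupt() and model_clear_pending() have run, which is why the
single-request region format is sufficient.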
On Wed, 22 Mar 2017 10:34:22 +0800
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> * Alex Williamson <alex.williamson@redhat.com> [2017-03-21 12:47:16 -0600]:
>
> [...]
>
> > > +vfio-ccw I/O region
> > > +-------------------
> > > +
> > > +An I/O region is used to accept channel program request from user
> > > +space and store I/O interrupt result for user space to retrieve. The
> > > +defination of the region is:
> > > +
> > > +struct ccw_io_region {
> > > +#define ORB_AREA_SIZE 12
> > > +        __u8    orb_area[ORB_AREA_SIZE];
> > > +#define SCSW_AREA_SIZE 12
> > > +        __u8    scsw_area[SCSW_AREA_SIZE];
> > > +#define IRB_AREA_SIZE 96
> > > +        __u8    irb_area[IRB_AREA_SIZE];
> > > +        __u32   ret_code;
> > > +} __packed;
> > > +
> > > +While starting an I/O request, orb_area should be filled with the
> > > +guest ORB, and scsw_area should be filled with the SCSW of the Virtual
> > > +Subchannel.
> > > +
> > > +irb_area stores the I/O result.
> > > +
> > > +ret_code stores a return code for each access of the region.
> Hi Alex,
>
> >
> > Pardon if these questions expose my lack of familiarity with S390:
> >
> > So I/O requests are asynchronous, the user is notified via interrupt
> > when completed, can more than one request be queued at a time?
> The answer is no. The subchannel will stay in a state that prohibiting
> from a new request if there is processing for a previous request
> ongoing. And we need to issue an explit I/O instruction to retrieve and
> (or) clear the pending interruption before issue another I/O request.
>
> > The communication format doesn't seem like it'd easily support that.
> > Is it possible?  A future enhancement that we should design for now?
> As the above statements said, it's not possible.
>
> >
> > I'm also a little unclear what sort of I/O a user has access to via
> > this interface and how the kernel polices that access.  For instance,
> > are multiple tape or disk devices available through a single I/O
> > channel?
> No. An I/O subchannel is dedicated to one device, and...
>
> > How does the user configure which devices a user has access to when
> > creating the vfio-ccw device?
> ...this mapping is usually determined/configured before the machine
> startup by the administrtor of the upper level hypervisor. So when
> creating the vfio-ccw device, we do not configure/modify this mapping.
> When the guest I/O subchannel driver probing, it will issue a SENSE
> command on the subchannel to recognize/find what kind of device is
> behind the subchannel, and then it uses corresponding CCW device driver
> serving the I/O device.
>
> @Conny, please correct me if my understanding is not right.

No, it's fine.

FWIW:
https://virtualpenguins.blogspot.com/2017/02/channel-io-demystified.html

(I plan to write more in the future.)

>
> >
> > Otherwise I think the interface looks great.  Thanks,
> This is good news. :>
>
> Thanks!
>
> >
> > Alex
> >
>
* Cornelia Huck <cornelia.huck@de.ibm.com> [2017-03-28 10:16:23 +0200]:

[...]

> >
> > @Conny, please correct me if my understanding is not right.
>
> No, it's fine.
Thanks!

>
> FWIW:
> https://virtualpenguins.blogspot.com/2017/02/channel-io-demystified.html
>
> (I plan to write more in the future.)
That's a very good write-up.

>
[...]
diff --git a/Documentation/s390/00-INDEX b/Documentation/s390/00-INDEX index 9189535..317f037 100644 --- a/Documentation/s390/00-INDEX +++ b/Documentation/s390/00-INDEX @@ -22,5 +22,7 @@ qeth.txt - HiperSockets Bridge Port Support. s390dbf.txt - information on using the s390 debug feature. +vfio-ccw.txt + information on the vfio-ccw I/O subchannel driver. zfcpdump.txt - information on the s390 SCSI dump tool. diff --git a/Documentation/s390/vfio-ccw.txt b/Documentation/s390/vfio-ccw.txt new file mode 100644 index 0000000..90b3dfe --- /dev/null +++ b/Documentation/s390/vfio-ccw.txt @@ -0,0 +1,303 @@ +vfio-ccw: the basic infrastructure +================================== + +Introduction +------------ + +Here we describe the vfio support for I/O subchannel devices for +Linux/s390. Motivation for vfio-ccw is to passthrough subchannels to a +virtual machine, while vfio is the means. + +Different than other hardware architectures, s390 has defined a unified +I/O access method, which is so called Channel I/O. It has its own access +patterns: +- Channel programs run asynchronously on a separate (co)processor. +- The channel subsystem will access any memory designated by the caller + in the channel program directly, i.e. there is no iommu involved. +Thus when we introduce vfio support for these devices, we realize it +with a mediated device (mdev) implementation. The vfio mdev will be +added to an iommu group, so as to make itself able to be managed by the +vfio framework. And we add read/write callbacks for special vfio I/O +regions to pass the channel programs from the mdev to its parent device +(the real I/O subchannel device) to do further address translation and +to perform I/O instructions. + +This document does not intend to explain the s390 I/O architecture in +every detail. 
More information/reference could be found here: +- A good start to know Channel I/O in general: + https://en.wikipedia.org/wiki/Channel_I/O +- s390 architecture: + s390 Principles of Operation manual (IBM Form. No. SA22-7832) +- The existing Qemu code which implements a simple emulated channel + subsystem could also be a good reference. It makes it easier to follow + the flow. + qemu/hw/s390x/css.c + +For vfio mediated device framework: +- Documentation/vfio-mediated-device.txt + +Motivation of vfio-ccw +---------------------- + +Currently, a guest virtualized via qemu/kvm on s390 only sees +paravirtualized virtio devices via the "Virtio Over Channel I/O +(virtio-ccw)" transport. This makes virtio devices discoverable via +standard operating system algorithms for handling channel devices. + +However this is not enough. On s390 for the majority of devices, which +use the standard Channel I/O based mechanism, we also need to provide +the functionality of passing through them to a Qemu virtual machine. +This includes devices that don't have a virtio counterpart (e.g. tape +drives) or that have specific characteristics which guests want to +exploit. + +For passing a device to a guest, we want to use the same interface as +everybody else, namely vfio. Thus, we would like to introduce vfio +support for channel devices. And we would like to name this new vfio +device "vfio-ccw". + +Access patterns of CCW devices +------------------------------ + +s390 architecture has implemented a so called channel subsystem, that +provides a unified view of the devices physically attached to the +systems. Though the s390 hardware platform knows about a huge variety of +different peripheral attachments like disk devices (aka. DASDs), tapes, +communication controllers, etc. They can all be accessed by a well +defined access method and they are presenting I/O completion a unified +way: I/O interruptions. + +All I/O requires the use of channel command words (CCWs). 
A CCW is an +instruction to a specialized I/O channel processor. A channel program is +a sequence of CCWs which are executed by the I/O channel subsystem. To +issue a channel program to the channel subsystem, it is required to +build an operation request block (ORB), which can be used to point out +the format of the CCW and other control information to the system. The +operating system signals the I/O channel subsystem to begin executing +the channel program with a SSCH (start sub-channel) instruction. The +central processor is then free to proceed with non-I/O instructions +until interrupted. The I/O completion result is received by the +interrupt handler in the form of interrupt response block (IRB). + +Back to vfio-ccw, in short: +- ORBs and channel programs are built in guest kernel (with guest + physical addresses). +- ORBs and channel programs are passed to the host kernel. +- Host kernel translates the guest physical addresses to real addresses + and starts the I/O with issuing a privileged Channel I/O instruction + (e.g SSCH). +- channel programs run asynchronously on a separate processor. +- I/O completion will be signaled to the host with I/O interruptions. + And it will be copied as IRB to user space to pass it back to the + guest. + +Physical vfio ccw device and its child mdev +------------------------------------------- + +As mentioned above, we realize vfio-ccw with a mdev implementation. + +Channel I/O does not have IOMMU hardware support, so the physical +vfio-ccw device does not have an IOMMU level translation or isolation. + +Sub-channel I/O instructions are all privileged instructions, When +handling the I/O instruction interception, vfio-ccw has the software +policing and translation how the channel program is programmed before +it gets sent to hardware. + +Within this implementation, we have two drivers for two types of +devices: +- The vfio_ccw driver for the physical subchannel device. 
+  This is an I/O subchannel driver for the real subchannel device.  It
+  realizes a group of callbacks and registers to the mdev framework as a
+  parent (physical) device. As a consequence, mdev provides vfio_ccw a
+  generic interface (sysfs) to create mdev devices. A vfio mdev can then
+  be created by vfio_ccw and added to the mediated bus. It is the vfio
+  device that is added to an IOMMU group and a vfio group.
+  vfio_ccw also provides an I/O region to accept channel program
+  requests from user space and store I/O interrupt results for user
+  space to retrieve. To notify user space of an I/O completion, it
+  offers an interface to set up an eventfd for asynchronous signaling.
+
+- The vfio_mdev driver for the mediated vfio ccw device.
+  This is provided by the mdev framework. It is a vfio device driver for
+  the mdev that is created by vfio_ccw.
+  It realizes a group of vfio device driver callbacks, adds itself to a
+  vfio group, and registers itself to the mdev framework as an mdev
+  driver.
+  It uses a vfio iommu backend that uses the existing map and unmap
+  ioctls, but rather than programming them into an IOMMU for a device,
+  it simply stores the translations for use by later requests. This
+  means that a device programmed in a VM with guest physical addresses
+  can have the vfio kernel convert that address to a process virtual
+  address, pin the page and program the hardware with the host physical
+  address in one step.
+  For an mdev, the vfio iommu backend will not pin the pages during the
+  VFIO_IOMMU_MAP_DMA ioctl. The mdev framework will only maintain a
+  database of the iova<->vaddr mappings in this operation. The vfio
+  iommu backend exports the vfio_pin_pages and vfio_unpin_pages
+  interfaces for the physical devices to pin and unpin pages on demand.
+
+Below is a high-level block diagram.
+
+ +-------------+
+ |             |
+ | +---------+ | mdev_register_driver() +--------------+
+ | |  Mdev   | +<-----------------------+              |
+ | |  bus    | |                        | vfio_mdev.ko |
+ | | driver  | +----------------------->+              |<-> VFIO user
+ | +---------+ |    probe()/remove()    +--------------+    APIs
+ |             |
+ |  MDEV CORE  |
+ |   MODULE    |
+ |   mdev.ko   |
+ | +---------+ | mdev_register_device() +--------------+
+ | |Physical | +<-----------------------+              |
+ | | device  | |                        |  vfio_ccw.ko |<-> subchannel
+ | |interface| +----------------------->+              |     device
+ | +---------+ |       callback         +--------------+
+ +-------------+
+
+The process of how these work together.
+1. vfio_ccw.ko drives the physical I/O subchannel, and registers the
+   physical device (with callbacks) to the mdev framework.
+   When vfio_ccw probes the subchannel device, it registers the device
+   pointer and callbacks to the mdev framework. Mdev related file nodes
+   under the device node in sysfs are created for the subchannel
+   device, namely 'mdev_create', 'mdev_destroy' and
+   'mdev_supported_types'.
+2. Create a mediated vfio ccw device.
+   Using the 'mdev_create' sysfs file, we need to manually create one
+   (and only one for our case) mediated device.
+3. vfio_mdev.ko drives the mediated ccw device.
+   vfio_mdev is also the vfio device driver. It will probe the mdev and
+   add it to an iommu_group and a vfio_group. Then we can pass the mdev
+   through to a guest.
+
+vfio-ccw I/O region
+-------------------
+
+An I/O region is used to accept channel program requests from user
+space and store I/O interrupt results for user space to retrieve. The
+definition of the region is:
+
+struct ccw_io_region {
+#define ORB_AREA_SIZE 12
+        __u8    orb_area[ORB_AREA_SIZE];
+#define SCSW_AREA_SIZE 12
+        __u8    scsw_area[SCSW_AREA_SIZE];
+#define IRB_AREA_SIZE 96
+        __u8    irb_area[IRB_AREA_SIZE];
+        __u32   ret_code;
+} __packed;
+
+While starting an I/O request, orb_area should be filled with the
+guest ORB, and scsw_area should be filled with the SCSW of the Virtual
+Subchannel.
+
+irb_area stores the I/O result.
+
+ret_code stores a return code for each access of the region.
+
+vfio-ccw patches overview
+-------------------------
+
+For now, our patches are rebased on the latest mdev implementation.
+vfio-ccw follows what vfio-pci did on the s390 platform and uses
+vfio-iommu-type1 as the vfio iommu backend. It's a good start to launch
+the code review for vfio-ccw. Note that the implementation is far from
+complete yet; but we'd like to get feedback for the general
+architecture.
+
+* CCW translation APIs
+- Description:
+  These introduce a group of APIs (starting with 'cp_') to do CCW
+  translation. The CCWs passed in by a user space program are
+  organized with their guest physical memory addresses. These APIs
+  will copy the CCWs into the kernel space, and assemble a runnable
+  kernel channel program by updating the guest physical addresses with
+  their corresponding host physical addresses.
+- Patches:
+  vfio: ccw: introduce channel program interfaces
+
+* vfio_ccw device driver
+- Description:
+  The following patches utilize the CCW translation APIs and introduce
+  vfio_ccw, which is the driver for the I/O subchannel devices you want
+  to pass through.
+  vfio_ccw implements the following vfio ioctls:
+    VFIO_DEVICE_GET_INFO
+    VFIO_DEVICE_GET_IRQ_INFO
+    VFIO_DEVICE_GET_REGION_INFO
+    VFIO_DEVICE_RESET
+    VFIO_DEVICE_SET_IRQS
+  This provides an I/O region, so that the user space program can pass a
+  channel program to the kernel, to do further CCW translation before
+  issuing them to a real device.
+  This also provides the SET_IRQ ioctl to set up an event notifier to
+  notify the user space program of the I/O completion in an asynchronous
+  way.
+- Patches:
+  vfio: ccw: basic implementation for vfio_ccw driver
+  vfio: ccw: introduce ccw_io_region
+  vfio: ccw: realize VFIO_DEVICE_GET_REGION_INFO ioctl
+  vfio: ccw: realize VFIO_DEVICE_RESET ioctl
+  vfio: ccw: realize VFIO_DEVICE_G(S)ET_IRQ_INFO ioctls
+
+The user of vfio-ccw is not limited to Qemu, while Qemu is definitely a
+good example to understand how these patches work. Here is a little
+bit more detail on how an I/O request triggered by the Qemu guest will
+be handled (without error handling).
+
+Explanation:
+Q1-Q7: Qemu side process.
+K1-K6: Kernel side process.
+
+Q1. Get I/O region info during initialization.
+Q2. Set up event notifier and handler to handle I/O completion.
+
+... ...
+
+Q3. Intercept a ssch instruction.
+Q4. Write the guest channel program and ORB to the I/O region.
+    K1. Copy from guest to kernel.
+    K2. Translate the guest channel program to a host kernel space
+        channel program, which becomes runnable for a real device.
+    K3. With the necessary information contained in the orb passed in
+        by Qemu, issue the ccwchain to the device.
+    K4. Return the ssch CC code.
+Q5. Return the CC code to the guest.
+
+... ...
+
+    K5. Interrupt handler gets the I/O result and writes the result to
+        the I/O region.
+    K6. Signal Qemu to retrieve the result.
+Q6. Get the signal and event handler reads out the result from the I/O
+    region.
+Q7. Update the irb for the guest.
+
+Limitations
+-----------
+
+The current vfio-ccw implementation focuses on supporting basic commands
+needed to implement block device functionality (read/write) of DASD/ECKD
+device only. Some commands may need special handling in the future, for
+example, anything related to path grouping.
+
+DASD is a kind of storage device, while ECKD is a data recording format.
+More information for DASD and ECKD could be found here:
+https://en.wikipedia.org/wiki/Direct-access_storage_device
+https://en.wikipedia.org/wiki/Count_key_data
+
+Together with the corresponding work in Qemu, we can bring the passed
+through DASD/ECKD device online in a guest now and use it as a block
+device.
+
+Reference
+---------
+1. ESA/s390 Principles of Operation manual (IBM Form. No. SA22-7832)
+2. ESA/390 Common I/O Device Commands manual (IBM Form. No. SA22-7204)
+3. https://en.wikipedia.org/wiki/Channel_I/O
+4. Documentation/s390/cds.txt
+5. Documentation/vfio.txt
+6. Documentation/vfio-mediated-device.txt
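
[Editorial sketch, not part of the patch] The I/O region described in the
document is a packed, fixed-layout buffer, so a user space program can
mirror it and check the offsets it will read and write. The sketch below
assumes a GCC/Clang toolchain and substitutes uint8_t/uint32_t for the
kernel's __u8/__u32; region_prepare() is a hypothetical helper that is
not defined anywhere in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ORB_AREA_SIZE  12
#define SCSW_AREA_SIZE 12
#define IRB_AREA_SIZE  96

/* User space mirror of struct ccw_io_region as defined in the patch. */
struct ccw_io_region {
    uint8_t  orb_area[ORB_AREA_SIZE];   /* guest ORB, filled by the user */
    uint8_t  scsw_area[SCSW_AREA_SIZE]; /* SCSW of the virtual subchannel */
    uint8_t  irb_area[IRB_AREA_SIZE];   /* I/O result, filled by the kernel */
    uint32_t ret_code;                  /* return code of the last access */
} __attribute__((packed));

/* 12 + 12 + 96 + 4 bytes, with no padding because the struct is packed. */
_Static_assert(sizeof(struct ccw_io_region) == 124,
               "unexpected ccw_io_region size");

/*
 * Hypothetical helper: clear the region and fill in the two request
 * fields, as a user would do before writing the region to the device.
 */
void region_prepare(struct ccw_io_region *region,
                    const uint8_t orb[ORB_AREA_SIZE],
                    const uint8_t scsw[SCSW_AREA_SIZE])
{
    assert(region != NULL && orb != NULL && scsw != NULL);
    memset(region, 0, sizeof(*region));
    memcpy(region->orb_area, orb, ORB_AREA_SIZE);
    memcpy(region->scsw_area, scsw, SCSW_AREA_SIZE);
}
```

With this layout the request fields occupy bytes 0-23, the IRB occupies
bytes 24-119 and ret_code sits at offset 120. How the buffer is then
written to and read from the vfio device file descriptor is outside the
scope of this sketch.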