
[v4,14/16] docs: add documentation for vfio-ccw

Message ID 20170317031743.40128-15-bjsdjshi@linux.vnet.ibm.com (mailing list archive)
State New, archived

Commit Message

Dong Jia Shi March 17, 2017, 3:17 a.m. UTC
Add file Documentation/s390/vfio-ccw.txt that includes details
of vfio-ccw.

Acked-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
---
 Documentation/s390/00-INDEX     |   2 +
 Documentation/s390/vfio-ccw.txt | 303 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 305 insertions(+)
 create mode 100644 Documentation/s390/vfio-ccw.txt

Comments

Alex Williamson March 21, 2017, 6:47 p.m. UTC | #1
On Fri, 17 Mar 2017 04:17:41 +0100
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> Add file Documentation/s390/vfio-ccw.txt that includes details
> of vfio-ccw.
> 
> Acked-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> ---
>  Documentation/s390/00-INDEX     |   2 +
>  Documentation/s390/vfio-ccw.txt | 303 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 305 insertions(+)
>  create mode 100644 Documentation/s390/vfio-ccw.txt
> 
> diff --git a/Documentation/s390/00-INDEX b/Documentation/s390/00-INDEX
> index 9189535..317f037 100644
> --- a/Documentation/s390/00-INDEX
> +++ b/Documentation/s390/00-INDEX
> @@ -22,5 +22,7 @@ qeth.txt
>  	- HiperSockets Bridge Port Support.
>  s390dbf.txt
>  	- information on using the s390 debug feature.
> +vfio-ccw.txt
> +	  information on the vfio-ccw I/O subchannel driver.
>  zfcpdump.txt
>  	- information on the s390 SCSI dump tool.
> diff --git a/Documentation/s390/vfio-ccw.txt b/Documentation/s390/vfio-ccw.txt
> new file mode 100644
> index 0000000..90b3dfe
> --- /dev/null
> +++ b/Documentation/s390/vfio-ccw.txt
> @@ -0,0 +1,303 @@
> +vfio-ccw: the basic infrastructure
> +==================================
> +
> +Introduction
> +------------
> +
> +Here we describe the vfio support for I/O subchannel devices for
> +Linux/s390. Motivation for vfio-ccw is to passthrough subchannels to a
> +virtual machine, while vfio is the means.
> +
> +Different than other hardware architectures, s390 has defined a unified
> +I/O access method, which is so called Channel I/O. It has its own access
> +patterns:
> +- Channel programs run asynchronously on a separate (co)processor.
> +- The channel subsystem will access any memory designated by the caller
> +  in the channel program directly, i.e. there is no iommu involved.
> +Thus when we introduce vfio support for these devices, we realize it
> +with a mediated device (mdev) implementation. The vfio mdev will be
> +added to an iommu group, so as to make itself able to be managed by the
> +vfio framework. And we add read/write callbacks for special vfio I/O
> +regions to pass the channel programs from the mdev to its parent device
> +(the real I/O subchannel device) to do further address translation and
> +to perform I/O instructions.
> +
> +This document does not intend to explain the s390 I/O architecture in
> +every detail. More information/reference could be found here:
> +- A good start to know Channel I/O in general:
> +  https://en.wikipedia.org/wiki/Channel_I/O
> +- s390 architecture:
> +  s390 Principles of Operation manual (IBM Form. No. SA22-7832)
> +- The existing Qemu code which implements a simple emulated channel
> +  subsystem could also be a good reference. It makes it easier to follow
> +  the flow.
> +  qemu/hw/s390x/css.c
> +
> +For vfio mediated device framework:
> +- Documentation/vfio-mediated-device.txt
> +
> +Motivation of vfio-ccw
> +----------------------
> +
> +Currently, a guest virtualized via qemu/kvm on s390 only sees
> +paravirtualized virtio devices via the "Virtio Over Channel I/O
> +(virtio-ccw)" transport. This makes virtio devices discoverable via
> +standard operating system algorithms for handling channel devices.
> +
> +However this is not enough. On s390 for the majority of devices, which
> +use the standard Channel I/O based mechanism, we also need to provide
> +the functionality of passing through them to a Qemu virtual machine.
> +This includes devices that don't have a virtio counterpart (e.g. tape
> +drives) or that have specific characteristics which guests want to
> +exploit.
> +
> +For passing a device to a guest, we want to use the same interface as
> +everybody else, namely vfio. Thus, we would like to introduce vfio
> +support for channel devices. And we would like to name this new vfio
> +device "vfio-ccw".
> +
> +Access patterns of CCW devices
> +------------------------------
> +
> +s390 architecture has implemented a so called channel subsystem, that
> +provides a unified view of the devices physically attached to the
> +systems. Though the s390 hardware platform knows about a huge variety of
> +different peripheral attachments like disk devices (aka. DASDs), tapes,
> +communication controllers, etc. They can all be accessed by a well
> +defined access method and they are presenting I/O completion a unified
> +way: I/O interruptions.
> +
> +All I/O requires the use of channel command words (CCWs). A CCW is an
> +instruction to a specialized I/O channel processor. A channel program is
> +a sequence of CCWs which are executed by the I/O channel subsystem.  To
> +issue a channel program to the channel subsystem, it is required to
> +build an operation request block (ORB), which can be used to point out
> +the format of the CCW and other control information to the system. The
> +operating system signals the I/O channel subsystem to begin executing
> +the channel program with a SSCH (start sub-channel) instruction. The
> +central processor is then free to proceed with non-I/O instructions
> +until interrupted. The I/O completion result is received by the
> +interrupt handler in the form of interrupt response block (IRB).
> +
> +Back to vfio-ccw, in short:
> +- ORBs and channel programs are built in guest kernel (with guest
> +  physical addresses).
> +- ORBs and channel programs are passed to the host kernel.
> +- Host kernel translates the guest physical addresses to real addresses
> +  and starts the I/O with issuing a privileged Channel I/O instruction
> +  (e.g SSCH).
> +- channel programs run asynchronously on a separate processor.
> +- I/O completion will be signaled to the host with I/O interruptions.
> +  And it will be copied as IRB to user space to pass it back to the
> +  guest.
> +
> +Physical vfio ccw device and its child mdev
> +-------------------------------------------
> +
> +As mentioned above, we realize vfio-ccw with a mdev implementation.
> +
> +Channel I/O does not have IOMMU hardware support, so the physical
> +vfio-ccw device does not have an IOMMU level translation or isolation.
> +
> +Sub-channel I/O instructions are all privileged instructions, When
> +handling the I/O instruction interception, vfio-ccw has the software
> +policing and translation how the channel program is programmed before
> +it gets sent to hardware.
> +
> +Within this implementation, we have two drivers for two types of
> +devices:
> +- The vfio_ccw driver for the physical subchannel device.
> +  This is an I/O subchannel driver for the real subchannel device.  It
> +  realizes a group of callbacks and registers to the mdev framework as a
> +  parent (physical) device. As a consequence, mdev provides vfio_ccw a
> +  generic interface (sysfs) to create mdev devices. A vfio mdev could be
> +  created by vfio_ccw then and added to the mediated bus. It is the vfio
> +  device that added to an IOMMU group and a vfio group.
> +  vfio_ccw also provides an I/O region to accept channel program
> +  request from user space and store I/O interrupt result for user
> +  space to retrieve. To notify user space an I/O completion, it offers
> +  an interface to setup an eventfd fd for asynchronous signaling.
> +
> +- The vfio_mdev driver for the mediated vfio ccw device.
> +  This is provided by the mdev framework. It is a vfio device driver for
> +  the mdev that created by vfio_ccw.
> +  It realize a group of vfio device driver callbacks, adds itself to a
> +  vfio group, and registers itself to the mdev framework as a mdev
> +  driver.
> +  It uses a vfio iommu backend that uses the existing map and unmap
> +  ioctls, but rather than programming them into an IOMMU for a device,
> +  it simply stores the translations for use by later requests. This
> +  means that a device programmed in a VM with guest physical addresses
> +  can have the vfio kernel convert that address to process virtual
> +  address, pin the page and program the hardware with the host physical
> +  address in one step.
> +  For a mdev, the vfio iommu backend will not pin the pages during the
> +  VFIO_IOMMU_MAP_DMA ioctl. Mdev framework will only maintain a database
> +  of the iova<->vaddr mappings in this operation. And they export a
> +  vfio_pin_pages and a vfio_unpin_pages interfaces from the vfio iommu
> +  backend for the physical devices to pin and unpin pages by demand.
> +
> +Below is a high Level block diagram.
> +
> + +-------------+
> + |             |
> + | +---------+ | mdev_register_driver() +--------------+
> + | |  Mdev   | +<-----------------------+              |
> + | |  bus    | |                        | vfio_mdev.ko |
> + | | driver  | +----------------------->+              |<-> VFIO user
> + | +---------+ |    probe()/remove()    +--------------+    APIs
> + |             |
> + |  MDEV CORE  |
> + |   MODULE    |
> + |   mdev.ko   |
> + | +---------+ | mdev_register_device() +--------------+
> + | |Physical | +<-----------------------+              |
> + | | device  | |                        |  vfio_ccw.ko |<-> subchannel
> + | |interface| +----------------------->+              |     device
> + | +---------+ |       callback         +--------------+
> + +-------------+
> +
> +The process of how these work together.
> +1. vfio_ccw.ko drives the physical I/O subchannel, and registers the
> +   physical device (with callbacks) to mdev framework.
> +   When vfio_ccw probing the subchannel device, it registers device
> +   pointer and callbacks to the mdev framework. Mdev related file nodes
> +   under the device node in sysfs would be created for the subchannel
> +   device, namely 'mdev_create', 'mdev_destroy' and
> +   'mdev_supported_types'.
> +2. Create a mediated vfio ccw device.
> +   Use the 'mdev_create' sysfs file, we need to manually create one (and
> +   only one for our case) mediated device.
> +3. vfio_mdev.ko drives the mediated ccw device.
> +   vfio_mdev is also the vfio device driver. It will probe the mdev and
> +   add it to an iommu_group and a vfio_group. Then we could pass through
> +   the mdev to a guest.
> +
> +vfio-ccw I/O region
> +-------------------
> +
> +An I/O region is used to accept channel program request from user
> +space and store I/O interrupt result for user space to retrieve. The
> +definition of the region is:
> +
> +struct ccw_io_region {
> +#define ORB_AREA_SIZE 12
> +	__u8	orb_area[ORB_AREA_SIZE];
> +#define SCSW_AREA_SIZE 12
> +	__u8	scsw_area[SCSW_AREA_SIZE];
> +#define IRB_AREA_SIZE 96
> +	__u8	irb_area[IRB_AREA_SIZE];
> +	__u32	ret_code;
> +} __packed;
> +
> +While starting an I/O request, orb_area should be filled with the
> +guest ORB, and scsw_area should be filled with the SCSW of the Virtual
> +Subchannel.
> +
> +irb_area stores the I/O result.
> +
> +ret_code stores a return code for each access of the region.

Pardon if these questions expose my lack of familiarity with S390:

So I/O requests are asynchronous, the user is notified via interrupt
when completed, can more than one request be queued at a time?  The
communication format doesn't seem like it'd easily support that.  Is it
possible?  A future enhancement that we should design for now?

I'm also a little unclear what sort of I/O a user has access to via
this interface and how the kernel polices that access.  For instance,
are multiple tape or disk devices available through a single I/O
channel?  How does the user configure which devices a user has access
to when creating the vfio-ccw device?

Otherwise I think the interface looks great.  Thanks,

Alex
Dong Jia Shi March 22, 2017, 2:34 a.m. UTC | #2
* Alex Williamson <alex.williamson@redhat.com> [2017-03-21 12:47:16 -0600]:

[...]

> > +vfio-ccw I/O region
> > +-------------------
> > +
> > +An I/O region is used to accept channel program request from user
> > +space and store I/O interrupt result for user space to retrieve. The
> > +definition of the region is:
> > +
> > +struct ccw_io_region {
> > +#define ORB_AREA_SIZE 12
> > +	__u8	orb_area[ORB_AREA_SIZE];
> > +#define SCSW_AREA_SIZE 12
> > +	__u8	scsw_area[SCSW_AREA_SIZE];
> > +#define IRB_AREA_SIZE 96
> > +	__u8	irb_area[IRB_AREA_SIZE];
> > +	__u32	ret_code;
> > +} __packed;
> > +
> > +While starting an I/O request, orb_area should be filled with the
> > +guest ORB, and scsw_area should be filled with the SCSW of the Virtual
> > +Subchannel.
> > +
> > +irb_area stores the I/O result.
> > +
> > +ret_code stores a return code for each access of the region.
Hi Alex,

> 
> Pardon if these questions expose my lack of familiarity with S390:
> 
> So I/O requests are asynchronous, the user is notified via interrupt
> when completed, can more than one request be queued at a time?
The answer is no. The subchannel stays in a state that prohibits a new
request while a previous request is still being processed. And we need
to issue an explicit I/O instruction to retrieve and (or) clear the
pending interruption before issuing another I/O request.
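
To make the one-request-at-a-time rule concrete, here is a minimal sketch
in C of a subchannel model that refuses a new start while a request is in
flight or an interruption is still pending. The type and function names
(sch_model, sch_start, sch_complete, sch_test) are made up for
illustration and do not come from the patch; sch_test only stands in for
the retrieve-and-clear step, which on real hardware would be an I/O
instruction such as TSCH.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-subchannel state; names are illustrative only. */
enum sch_state { SCH_IDLE, SCH_BUSY, SCH_IRQ_PENDING };

struct sch_model {
	enum sch_state state;
};

/* Accept a start request only when nothing is in flight or pending. */
static int sch_start(struct sch_model *sch)
{
	if (sch->state != SCH_IDLE)
		return -EBUSY;
	sch->state = SCH_BUSY;
	return 0;
}

/* The device finished its work: an interruption is now pending. */
static void sch_complete(struct sch_model *sch)
{
	if (sch->state == SCH_BUSY)
		sch->state = SCH_IRQ_PENDING;
}

/* Retrieve and clear the pending interruption; returns true if one was pending. */
static bool sch_test(struct sch_model *sch)
{
	if (sch->state != SCH_IRQ_PENDING)
		return false;
	sch->state = SCH_IDLE;
	return true;
}
```

In this model a second sch_start() fails with -EBUSY until sch_test() has
consumed the pending interruption, which is why a single region with one
ORB slot is enough.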

> The communication format doesn't seem like it'd easily support that.
> Is it possible?  A future enhancement that we should design for now?
As stated above, it's not possible.

> 
> I'm also a little unclear what sort of I/O a user has access to via
> this interface and how the kernel polices that access. For instance,
> are multiple tape or disk devices available through a single I/O
> channel?
No. An I/O subchannel is dedicated to one device, and...

> How does the user configure which devices a user has access to when
> creating the vfio-ccw device?
...this mapping is usually determined/configured before machine
startup by the administrator of the upper level hypervisor. So when
creating the vfio-ccw device, we do not configure/modify this mapping.
When the guest I/O subchannel driver probes, it issues a SENSE
command on the subchannel to find out what kind of device is
behind the subchannel, and then uses the corresponding CCW device
driver to serve the I/O device.
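
As a rough illustration of the probing step, a guest driver might build a
single format-1 CCW like the one below to ask the device for its
identification. The struct and helper names (ccw1_sketch, build_sense_id)
are hypothetical. The command code 0xE4 (Sense ID) and the SLI flag value
0x20 are assumed to match the definitions in the Linux s390 cio headers,
so please check them against arch/s390/include/asm/cio.h before relying
on them. The sketch also stores fields in native byte order, while real
CCWs are big-endian.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CCW_CMD_SENSE_ID 0xE4	/* assumed Sense ID command code */
#define CCW_FLAG_SLI     0x20	/* assumed: suppress incorrect-length indication */

/* Format-1 CCW layout: command, flags, byte count, 31-bit data address. */
struct ccw1_sketch {
	uint8_t  cmd_code;
	uint8_t  flags;
	uint16_t count;
	uint32_t cda;
} __attribute__((packed, aligned(8)));

/* Build one CCW that asks the device to store its ID data at buf_addr. */
static struct ccw1_sketch build_sense_id(uint32_t buf_addr, uint16_t buf_len)
{
	struct ccw1_sketch ccw;

	memset(&ccw, 0, sizeof(ccw));
	ccw.cmd_code = CCW_CMD_SENSE_ID;
	ccw.flags = CCW_FLAG_SLI;		/* tolerate a short response */
	ccw.count = buf_len;
	ccw.cda = buf_addr & 0x7fffffffu;	/* data address is 31 bits wide */
	return ccw;
}
```

With vfio-ccw, the address in such a CCW is a guest physical address,
which is exactly what the host kernel has to translate before the channel
program can run on the real subchannel.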

@Conny, please correct me if my understanding is not right.

> 
> Otherwise I think the interface looks great.  Thanks,
This is good news. :>

Thanks!
> 
> Alex
>
Cornelia Huck March 28, 2017, 8:16 a.m. UTC | #3
On Wed, 22 Mar 2017 10:34:22 +0800
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> * Alex Williamson <alex.williamson@redhat.com> [2017-03-21 12:47:16 -0600]:
> 
> [...]
> 
> > > +vfio-ccw I/O region
> > > +-------------------
> > > +
> > > +An I/O region is used to accept channel program request from user
> > > +space and store I/O interrupt result for user space to retrieve. The
> > > +definition of the region is:
> > > +
> > > +struct ccw_io_region {
> > > +#define ORB_AREA_SIZE 12
> > > +	__u8	orb_area[ORB_AREA_SIZE];
> > > +#define SCSW_AREA_SIZE 12
> > > +	__u8	scsw_area[SCSW_AREA_SIZE];
> > > +#define IRB_AREA_SIZE 96
> > > +	__u8	irb_area[IRB_AREA_SIZE];
> > > +	__u32	ret_code;
> > > +} __packed;
> > > +
> > > +While starting an I/O request, orb_area should be filled with the
> > > +guest ORB, and scsw_area should be filled with the SCSW of the Virtual
> > > +Subchannel.
> > > +
> > > +irb_area stores the I/O result.
> > > +
> > > +ret_code stores a return code for each access of the region.
> Hi Alex,
> 
> > 
> > Pardon if these questions expose my lack of familiarity with S390:
> > 
> > So I/O requests are asynchronous, the user is notified via interrupt
> > when completed, can more than one request be queued at a time?
> The answer is no. The subchannel stays in a state that prohibits a new
> request while a previous request is still being processed. And we need
> to issue an explicit I/O instruction to retrieve and (or) clear the
> pending interruption before issuing another I/O request.
> 
> > The communication format doesn't seem like it'd easily support that.
> > Is it possible?  A future enhancement that we should design for now?
> As stated above, it's not possible.
> 
> > 
> > I'm also a little unclear what sort of I/O a user has access to via
> > this interface and how the kernel polices that access. For instance,
> > are multiple tape or disk devices available through a single I/O
> > channel?
> No. An I/O subchannel is dedicated to one device, and...
> 
> > How does the user configure which devices a user has access to when
> > creating the vfio-ccw device?
> ...this mapping is usually determined/configured before machine
> startup by the administrator of the upper level hypervisor. So when
> creating the vfio-ccw device, we do not configure/modify this mapping.
> When the guest I/O subchannel driver probes, it issues a SENSE
> command on the subchannel to find out what kind of device is
> behind the subchannel, and then uses the corresponding CCW device
> driver to serve the I/O device.
> 
> @Conny, please correct me if my understanding is not right.

No, it's fine.

FWIW:
https://virtualpenguins.blogspot.com/2017/02/channel-io-demystified.html

(I plan to write more in the future.)

> 
> > 
> > Otherwise I think the interface looks great.  Thanks,
> This is good news. :>
> 
> Thanks!
> > 
> > Alex
> > 
>
Dong Jia Shi March 28, 2017, 8:49 a.m. UTC | #4
* Cornelia Huck <cornelia.huck@de.ibm.com> [2017-03-28 10:16:23 +0200]:

[...]
> > 
> > @Conny, please correct me if my understanding is not right.
> 
> No, it's fine.
Thanks!

> 
> FWIW:
> https://virtualpenguins.blogspot.com/2017/02/channel-io-demystified.html
> 
> (I plan to write more in the future.)
That's a very good write-up, thanks for sharing.

> 
[...]

Patch

diff --git a/Documentation/s390/00-INDEX b/Documentation/s390/00-INDEX
index 9189535..317f037 100644
--- a/Documentation/s390/00-INDEX
+++ b/Documentation/s390/00-INDEX
@@ -22,5 +22,7 @@  qeth.txt
 	- HiperSockets Bridge Port Support.
 s390dbf.txt
 	- information on using the s390 debug feature.
+vfio-ccw.txt
+	- information on the vfio-ccw I/O subchannel driver.
 zfcpdump.txt
 	- information on the s390 SCSI dump tool.
diff --git a/Documentation/s390/vfio-ccw.txt b/Documentation/s390/vfio-ccw.txt
new file mode 100644
index 0000000..90b3dfe
--- /dev/null
+++ b/Documentation/s390/vfio-ccw.txt
@@ -0,0 +1,303 @@ 
+vfio-ccw: the basic infrastructure
+==================================
+
+Introduction
+------------
+
+Here we describe the vfio support for I/O subchannel devices for
+Linux/s390. The motivation for vfio-ccw is to pass subchannels through
+to a virtual machine, while vfio is the means.
+
+Unlike other hardware architectures, s390 has defined a unified I/O
+access method, the so-called Channel I/O. It has its own access
+patterns:
+- Channel programs run asynchronously on a separate (co)processor.
+- The channel subsystem will access any memory designated by the caller
+  in the channel program directly, i.e. there is no iommu involved.
+Thus when we introduce vfio support for these devices, we realize it
+with a mediated device (mdev) implementation. The vfio mdev will be
+added to an iommu group, so as to make itself able to be managed by the
+vfio framework. And we add read/write callbacks for special vfio I/O
+regions to pass the channel programs from the mdev to its parent device
+(the real I/O subchannel device) to do further address translation and
+to perform I/O instructions.
+
+This document does not intend to explain the s390 I/O architecture in
+every detail. More information/reference could be found here:
+- A good start to know Channel I/O in general:
+  https://en.wikipedia.org/wiki/Channel_I/O
+- s390 architecture:
+  s390 Principles of Operation manual (IBM Form. No. SA22-7832)
+- The existing Qemu code which implements a simple emulated channel
+  subsystem could also be a good reference. It makes it easier to follow
+  the flow.
+  qemu/hw/s390x/css.c
+
+For vfio mediated device framework:
+- Documentation/vfio-mediated-device.txt
+
+Motivation of vfio-ccw
+----------------------
+
+Currently, a guest virtualized via qemu/kvm on s390 only sees
+paravirtualized virtio devices via the "Virtio Over Channel I/O
+(virtio-ccw)" transport. This makes virtio devices discoverable via
+standard operating system algorithms for handling channel devices.
+
+However this is not enough. On s390 for the majority of devices, which
+use the standard Channel I/O based mechanism, we also need to provide
+the functionality of passing them through to a Qemu virtual machine.
+This includes devices that don't have a virtio counterpart (e.g. tape
+drives) or that have specific characteristics which guests want to
+exploit.
+
+For passing a device to a guest, we want to use the same interface as
+everybody else, namely vfio. Thus, we would like to introduce vfio
+support for channel devices. And we would like to name this new vfio
+device "vfio-ccw".
+
+Access patterns of CCW devices
+------------------------------
+
+The s390 architecture implements a so-called channel subsystem that
+provides a unified view of the devices physically attached to the
+system. Though the s390 hardware platform knows about a huge variety of
+different peripheral attachments like disk devices (aka DASDs), tapes,
+communication controllers, etc., they can all be accessed by a
+well-defined access method and they all present I/O completion in a
+unified way: I/O interruptions.
+
+All I/O requires the use of channel command words (CCWs). A CCW is an
+instruction to a specialized I/O channel processor. A channel program is
+a sequence of CCWs which are executed by the I/O channel subsystem.  To
+issue a channel program to the channel subsystem, it is required to
+build an operation request block (ORB), which can be used to point out
+the format of the CCW and other control information to the system. The
+operating system signals the I/O channel subsystem to begin executing
+the channel program with a SSCH (start sub-channel) instruction. The
+central processor is then free to proceed with non-I/O instructions
+until interrupted. The I/O completion result is received by the
+interrupt handler in the form of an interrupt response block (IRB).
+
+Back to vfio-ccw, in short:
+- ORBs and channel programs are built in guest kernel (with guest
+  physical addresses).
+- ORBs and channel programs are passed to the host kernel.
+- Host kernel translates the guest physical addresses to real addresses
+  and starts the I/O by issuing a privileged Channel I/O instruction
+  (e.g. SSCH).
+- Channel programs run asynchronously on a separate processor.
+- I/O completion will be signaled to the host with I/O interruptions.
+  And it will be copied as IRB to user space to pass it back to the
+  guest.
+
+Physical vfio ccw device and its child mdev
+-------------------------------------------
+
+As mentioned above, we realize vfio-ccw with a mdev implementation.
+
+Channel I/O does not have IOMMU hardware support, so the physical
+vfio-ccw device does not have an IOMMU level translation or isolation.
+
+Sub-channel I/O instructions are all privileged instructions. When
+handling the I/O instruction interception, vfio-ccw polices and
+translates the channel program in software before it gets sent to
+hardware.
+
+Within this implementation, we have two drivers for two types of
+devices:
+- The vfio_ccw driver for the physical subchannel device.
+  This is an I/O subchannel driver for the real subchannel device.  It
+  realizes a group of callbacks and registers to the mdev framework as a
+  parent (physical) device. As a consequence, mdev provides vfio_ccw a
+  generic interface (sysfs) to create mdev devices. A vfio mdev can then
+  be created by vfio_ccw and added to the mediated bus. It is the vfio
+  device that is added to an IOMMU group and a vfio group.
+  vfio_ccw also provides an I/O region to accept channel program
+  requests from user space and store I/O interrupt results for user
+  space to retrieve. To notify user space of an I/O completion, it
+  offers an interface to set up an eventfd for asynchronous signaling.
+
+- The vfio_mdev driver for the mediated vfio ccw device.
+  This is provided by the mdev framework. It is a vfio device driver for
+  the mdev that is created by vfio_ccw.
+  It realizes a group of vfio device driver callbacks, adds itself to a
+  vfio group, and registers itself to the mdev framework as a mdev
+  driver.
+  It uses a vfio iommu backend that uses the existing map and unmap
+  ioctls, but rather than programming them into an IOMMU for a device,
+  it simply stores the translations for use by later requests. This
+  means that a device programmed in a VM with guest physical addresses
+  can have the vfio kernel convert that address to process virtual
+  address, pin the page and program the hardware with the host physical
+  address in one step.
+  For a mdev, the vfio iommu backend will not pin the pages during the
+  VFIO_IOMMU_MAP_DMA ioctl. Mdev framework will only maintain a database
+  of the iova<->vaddr mappings in this operation. It also exports the
+  vfio_pin_pages and vfio_unpin_pages interfaces from the vfio iommu
+  backend for the physical devices to pin and unpin pages on demand.
+
+Below is a high-level block diagram.
+
+ +-------------+
+ |             |
+ | +---------+ | mdev_register_driver() +--------------+
+ | |  Mdev   | +<-----------------------+              |
+ | |  bus    | |                        | vfio_mdev.ko |
+ | | driver  | +----------------------->+              |<-> VFIO user
+ | +---------+ |    probe()/remove()    +--------------+    APIs
+ |             |
+ |  MDEV CORE  |
+ |   MODULE    |
+ |   mdev.ko   |
+ | +---------+ | mdev_register_device() +--------------+
+ | |Physical | +<-----------------------+              |
+ | | device  | |                        |  vfio_ccw.ko |<-> subchannel
+ | |interface| +----------------------->+              |     device
+ | +---------+ |       callback         +--------------+
+ +-------------+
+
+The process of how these work together:
+1. vfio_ccw.ko drives the physical I/O subchannel, and registers the
+   physical device (with callbacks) to mdev framework.
+   When vfio_ccw probes the subchannel device, it registers the device
+   pointer and callbacks to the mdev framework. Mdev related file nodes
+   under the device node in sysfs would be created for the subchannel
+   device, namely 'mdev_create', 'mdev_destroy' and
+   'mdev_supported_types'.
+2. Create a mediated vfio ccw device.
+   Using the 'mdev_create' sysfs file, we need to manually create one
+   (and only one for our case) mediated device.
+3. vfio_mdev.ko drives the mediated ccw device.
+   vfio_mdev is also the vfio device driver. It will probe the mdev and
+   add it to an iommu_group and a vfio_group. Then we could pass through
+   the mdev to a guest.
+
+vfio-ccw I/O region
+-------------------
+
+An I/O region is used to accept channel program requests from user
+space and store I/O interrupt results for user space to retrieve. The
+definition of the region is:
+
+struct ccw_io_region {
+#define ORB_AREA_SIZE 12
+	__u8	orb_area[ORB_AREA_SIZE];
+#define SCSW_AREA_SIZE 12
+	__u8	scsw_area[SCSW_AREA_SIZE];
+#define IRB_AREA_SIZE 96
+	__u8	irb_area[IRB_AREA_SIZE];
+	__u32	ret_code;
+} __packed;
+
+While starting an I/O request, orb_area should be filled with the
+guest ORB, and scsw_area should be filled with the SCSW of the Virtual
+Subchannel.
+
+irb_area stores the I/O result.
+
+ret_code stores a return code for each access of the region.
+
+vfio-ccw patches overview
+-------------------------
+
+For now, our patches are rebased on the latest mdev implementation.
+vfio-ccw follows what vfio-pci did on the s390 platform and uses
+vfio-iommu-type1 as the vfio iommu backend. It's a good start to launch
+the code review for vfio-ccw. Note that the implementation is far from
+complete yet; but we'd like to get feedback for the general
+architecture.
+
+* CCW translation APIs
+- Description:
+  These introduce a group of APIs (starting with 'cp_') to do CCW
+  translation. The CCWs passed in by a user space program are
+  organized with their guest physical memory addresses. These APIs
+  will copy the CCWs into the kernel space, and assemble a runnable
+  kernel channel program by updating the guest physical addresses with
+  their corresponding host physical addresses.
+- Patches:
+  vfio: ccw: introduce channel program interfaces
+
+* vfio_ccw device driver
+- Description:
+  The following patches utilize the CCW translation APIs and introduce
+  vfio_ccw, which is the driver for the I/O subchannel devices you want
+  to pass through.
+  vfio_ccw implements the following vfio ioctls:
+    VFIO_DEVICE_GET_INFO
+    VFIO_DEVICE_GET_IRQ_INFO
+    VFIO_DEVICE_GET_REGION_INFO
+    VFIO_DEVICE_RESET
+    VFIO_DEVICE_SET_IRQS
+  This provides an I/O region, so that the user space program can pass a
+  channel program to the kernel, to do further CCW translation before
+  issuing them to a real device.
+  This also provides the SET_IRQS ioctl to set up an event notifier to
+  notify the user space program of the I/O completion in an
+  asynchronous way.
+- Patches:
+  vfio: ccw: basic implementation for vfio_ccw driver
+  vfio: ccw: introduce ccw_io_region
+  vfio: ccw: realize VFIO_DEVICE_GET_REGION_INFO ioctl
+  vfio: ccw: realize VFIO_DEVICE_RESET ioctl
+  vfio: ccw: realize VFIO_DEVICE_G(S)ET_IRQ_INFO ioctls
+
+The user of vfio-ccw is not limited to Qemu, while Qemu is definitely a
+good example for understanding how these patches work. Here is a little
+bit more detail on how an I/O request triggered by the Qemu guest will
+be handled (without error handling).
+
+Explanation:
+Q1-Q7: Qemu side process.
+K1-K6: Kernel side process.
+
+Q1. Get I/O region info during initialization.
+Q2. Setup event notifier and handler to handle I/O completion.
+
+... ...
+
+Q3. Intercept a ssch instruction.
+Q4. Write the guest channel program and ORB to the I/O region.
+    K1. Copy from guest to kernel.
+    K2. Translate the guest channel program to a host kernel space
+        channel program, which becomes runnable for a real device.
+    K3. With the necessary information contained in the orb passed in
+        by Qemu, issue the ccwchain to the device.
+    K4. Return the ssch CC code.
+Q5. Return the CC code to the guest.
+
+... ...
+
+    K5. Interrupt handler gets the I/O result and writes the result to
+        the I/O region.
+    K6. Signal Qemu to retrieve the result.
+Q6. Get the signal and event handler reads out the result from the I/O
+    region.
+Q7. Update the irb for the guest.
+
+Limitations
+-----------
+
+The current vfio-ccw implementation focuses on supporting basic commands
+needed to implement block device functionality (read/write) of DASD/ECKD
+devices only. Some commands may need special handling in the future, for
+example, anything related to path grouping.
+
+DASD is a kind of storage device, while ECKD is a data recording format.
+More information on DASD and ECKD can be found here:
+https://en.wikipedia.org/wiki/Direct-access_storage_device
+https://en.wikipedia.org/wiki/Count_key_data
+
+Together with the corresponding work in Qemu, we can bring the passed
+through DASD/ECKD device online in a guest now and use it as a block
+device.
+
+Reference
+---------
+1. ESA/s390 Principles of Operation manual (IBM Form. No. SA22-7832)
+2. ESA/390 Common I/O Device Commands manual (IBM Form. No. SA22-7204)
+3. https://en.wikipedia.org/wiki/Channel_I/O
+4. Documentation/s390/cds.txt
+5. Documentation/vfio.txt
+6. Documentation/vfio-mediated-device.txt
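
For readers following steps Q3/Q4 of the flow in the patch, here is a
small user-space sketch in C of how the I/O region might be filled and
handed to the kernel. The struct mirrors ccw_io_region from the patch
(with __u8/__u32 swapped for stdint types). The helpers fill_request and
submit_request are hypothetical, and so is the assumption that a request
is submitted with one pwrite() of the whole region at the offset reported
by VFIO_DEVICE_GET_REGION_INFO; the patch text itself only says that the
region is accessed through read/write callbacks.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define ORB_AREA_SIZE  12
#define SCSW_AREA_SIZE 12
#define IRB_AREA_SIZE  96

/* User-space mirror of the region layout described in the patch. */
struct ccw_io_region {
	uint8_t  orb_area[ORB_AREA_SIZE];
	uint8_t  scsw_area[SCSW_AREA_SIZE];
	uint8_t  irb_area[IRB_AREA_SIZE];
	uint32_t ret_code;
} __attribute__((packed));

/* 12 + 12 + 96 + 4 bytes, no padding because the struct is packed. */
_Static_assert(sizeof(struct ccw_io_region) == 124, "unexpected region size");

/* Copy the guest ORB and the virtual subchannel's SCSW into a clean region. */
static void fill_request(struct ccw_io_region *region,
			 const uint8_t *orb, const uint8_t *scsw)
{
	memset(region, 0, sizeof(*region));
	memcpy(region->orb_area, orb, ORB_AREA_SIZE);
	memcpy(region->scsw_area, scsw, SCSW_AREA_SIZE);
}

/* Hypothetical submit step: write the whole region at region_offset. */
static int submit_request(int device_fd, off_t region_offset,
			  const struct ccw_io_region *region)
{
	ssize_t n = pwrite(device_fd, region, sizeof(*region), region_offset);

	return n == (ssize_t)sizeof(*region) ? 0 : -1;
}
```

After the completion signal arrives (steps K6/Q6), the same region would
be read back and irb_area copied into the guest's IRB.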