[6/6] spec/vhost-user spec: Add IOMMU support

Message ID	20170511123246.31308-7-maxime.coquelin@redhat.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com E3A3980C22 From: Maxime Coquelin <maxime.coquelin@redhat.com> To: mst@redhat.com, peterx@redhat.com, marcandre.lureau@gmail.com, vkaplans@redhat.com, jasowang@redhat.com, wexu@redhat.com, yuanhan.liu@linux.intel.com, qemu-devel@nongnu.org, jfreiman@redhat.com Date: Thu, 11 May 2017 14:32:46 +0200 Message-Id: <20170511123246.31308-7-maxime.coquelin@redhat.com> In-Reply-To: <20170511123246.31308-1-maxime.coquelin@redhat.com> References: <20170511123246.31308-1-maxime.coquelin@redhat.com> Subject: [Qemu-devel] [PATCH 6/6] spec/vhost-user spec: Add IOMMU support Precedence: list Cc: Maxime Coquelin <maxime.coquelin@redhat.com> Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>

Maxime Coquelin May 11, 2017, 12:32 p.m. UTC

This patch specifies and implements the master/slave communication
to support device IOTLB in slave.

The vhost_iotlb_msg structure introduced for kernel backends is
re-used, making the design close between the two backends.

An exception is the use of the secondary channel to enable the
slave to send IOTLB miss requests to the master.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 docs/specs/vhost-user.txt | 75 +++++++++++++++++++++++++++++++++++++++++++++++
 hw/virtio/vhost-user.c    | 31 ++++++++++++++++++++
 2 files changed, 106 insertions(+)

Michael S. Tsirkin May 11, 2017, 6:25 p.m. UTC | #1

On Thu, May 11, 2017 at 02:32:46PM +0200, Maxime Coquelin wrote:
> This patch specifies and implements the master/slave communication
> to support device IOTLB in slave.
> 
> The vhost_iotlb_msg structure introduced for kernel backends is
> re-used, making the design close between the two backends.
> 
> An exception is the use of the secondary channel to enable the
> slave to send IOTLB miss requests to the master.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  docs/specs/vhost-user.txt | 75 +++++++++++++++++++++++++++++++++++++++++++++++
>  hw/virtio/vhost-user.c    | 31 ++++++++++++++++++++
>  2 files changed, 106 insertions(+)
> 
> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> index 5fa7016..4a1f0c3 100644
> --- a/docs/specs/vhost-user.txt
> +++ b/docs/specs/vhost-user.txt
> @@ -97,6 +97,23 @@ Depending on the request type, payload can be:
>     log offset: offset from start of supplied file descriptor
>         where logging starts (i.e. where guest address 0 would be logged)
>  
> + * An IOTLB message
> +   ---------------------------------------------------------
> +   | iova | size | user address | permissions flags | type |
> +   ---------------------------------------------------------
> +
> +   IOVA: a 64-bit guest I/O virtual address

guest -> VM

> +   Size: a 64-bit size

How do you specify "all memory"? give special meaning to size 0?

> +   User address: a 64-bit user address
> +   Permissions flags: a 8-bit bit field:
> +    - Bit 0: Read access
> +    - Bit 1: Write access

Can both bits be set? Can none?

> +   Type: a 8-bit IOTLB message type:
> +    - 1: IOTLB miss
> +    - 2: IOTLB update
> +    - 3: IOTLB invalidate
> +    - 4: IOTLB access fail
> +
>  In QEMU the vhost-user message is implemented with the following struct:
>  
>  typedef struct VhostUserMsg {
> @@ -109,6 +126,7 @@ typedef struct VhostUserMsg {
>          struct vhost_vring_addr addr;
>          VhostUserMemory memory;
>          VhostUserLog log;
> +        struct vhost_iotlb_msg iotlb;
>      };
>  } QEMU_PACKED VhostUserMsg;
>  
> @@ -253,6 +271,31 @@ Once the source has finished migration, rings will be stopped by
>  the source. No further update must be done before rings are
>  restarted.
>  
> +IOMMU support
> +-------------
> +
> +When the VIRTIO_F_IOMMU_PLATFORM feature has been negotiated, the master has
> +to send IOTLB entries update & invalidation by sending VHOST_USER_IOTLB_MSG
> +requests to the slave with a struct vhost_iotlb_msg payload.

Always? This seems a bit strange since iommu can be enabled/disabled
dynamically. Closing channel seems like a wrong thing to do for this.

> For update events,
> +the iotlb payload has to be filled with the update message type (2), the I/O
> +virtual address, the size, the user virtual address, and the permissions
> +flags. For invalidation events, the iotlb payload has to be filled with the
> +invalidation message type (3), the I/O virtual address and the size. On
> +success, the slave is expected to reply with a zero payload, non-zero
> +otherwise.
> +
> +When the VHOST_USER_PROTOCOL_F_SLAVE_REQ is supported by the slave, and the
> +master initiated the slave to master communication channel using the
> +VHOST_USER_SET_SLAVE_REQ_FD request, the slave can send IOTLB miss and access
> +failure events by sending VHOST_USER_SLAVE_IOTLB_MSG requests to the master
> +with a struct vhost_iotlb_msg payload. For miss events, the iotlb payload has
> +to be filled with the miss message type (1), the I/O virtual address and the
> +permissions flags. For access failure event, the iotlb payload has to be
> +filled with the access failure message type (4), the I/O virtual address and
> +the permissions flags. For synchronization purpose, the slave may rely on the
> +reply-ack feature, so the master may send a reply when operation is completed
> +if the reply-ack feature is negotiated and slaves requests a reply.
> +
>  Slave communication
>  -------------------
>  
> @@ -514,6 +557,38 @@ Master message types
>        If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, slave must respond
>        with zero for success, non-zero otherwise.
>  
> + * VHOST_USER_IOTLB_MSG
> +
> +      Id: 22
> +      Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
> +      Master payload: struct vhost_iotlb_msg
> +      Slave payload: u64
> +
> +      Send IOTLB messages with struct vhost_iotlb_msg as payload.
> +      Master sends such requests to update and invalidate entries in the device
> +      IOTLB. The slave has to acknowledge the request with sending zero as u64
> +      payload for success, non-zero otherwise.
> +      This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
> +      has been successfully negotiated.
> +
> +Slave message types
> +-------------------
> +
> + * VHOST_USER_SLAVE_IOTLB_MSG
> +
> +      Id: 1
> +      Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
> +      Slave payload: struct vhost_iotlb_msg
> +      Master payload: N/A
> +
> +      Send IOTLB messages with struct vhost_iotlb_msg as payload.
> +      Slave sends such requests to notify of an IOTLB miss, or an IOTLB
> +      access failure. If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated,
> +      and slave set the VHOST_USER_NEED_REPLY flag, master must respond with
> +      zero when operation is successfully completed, or non-zero otherwise.
> +      This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
> +      has been successfully negotiated.
> +

Are there limitations on number of messages in flight?


>  VHOST_USER_PROTOCOL_F_REPLY_ACK:
>  -------------------------------
>  The original vhost-user specification only demands replies for certain
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index fbc09fa..2c93181 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -63,11 +63,13 @@ typedef enum VhostUserRequest {
>      VHOST_USER_SEND_RARP = 19,
>      VHOST_USER_NET_SET_MTU = 20,
>      VHOST_USER_SET_SLAVE_REQ_FD = 21,
> +    VHOST_USER_IOTLB_MSG = 22,
>      VHOST_USER_MAX
>  } VhostUserRequest;
>  
>  typedef enum VhostUserSlaveRequest {
>      VHOST_USER_SLAVE_NONE = 0,
> +    VHOST_USER_SLAVE_IOTLB_MSG = 1,
>      VHOST_USER_SLAVE_MAX
>  }  VhostUserSlaveRequest;
>  
> @@ -105,6 +107,7 @@ typedef struct VhostUserMsg {
>          struct vhost_vring_addr addr;
>          VhostUserMemory memory;
>          VhostUserLog log;
> +        struct vhost_iotlb_msg iotlb;
>      } payload;
>  } QEMU_PACKED VhostUserMsg;
>  
> @@ -611,6 +614,9 @@ static void slave_read(void *opaque)
>      }
>  
>      switch (msg.request) {
> +    case VHOST_USER_SLAVE_IOTLB_MSG:
> +        ret = vhost_backend_handle_iotlb_msg(dev, &msg.payload.iotlb);
> +        break;
>      default:
>          error_report("Received unexpected msg type.");
>          ret = -EINVAL;
> @@ -858,6 +864,29 @@ static int vhost_user_net_set_mtu(struct vhost_dev *dev, uint16_t mtu)
>      return 0;
>  }
>  
> +static int vhost_user_send_device_iotlb_msg(struct vhost_dev *dev,
> +                                            struct vhost_iotlb_msg *imsg)
> +{
> +    VhostUserMsg msg = {
> +        .request = VHOST_USER_IOTLB_MSG,
> +        .size = sizeof(msg.payload.iotlb),
> +        .flags = VHOST_USER_VERSION | VHOST_USER_NEED_REPLY_MASK,
> +        .payload.iotlb = *imsg,
> +    };
> +
> +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> +        return -EFAULT;
> +    }
> +
> +    return process_message_reply(dev, msg.request);
> +}
> +
> +
> +static void vhost_user_set_iotlb_callback(struct vhost_dev *dev, int enabled)
> +{
> +    /* No-op as the receive channel is not dedicated to IOTLB messages. */
> +}
> +
>  const VhostOps user_ops = {
>          .backend_type = VHOST_BACKEND_TYPE_USER,
>          .vhost_backend_init = vhost_user_init,
> @@ -882,4 +911,6 @@ const VhostOps user_ops = {
>          .vhost_migration_done = vhost_user_migration_done,
>          .vhost_backend_can_merge = vhost_user_can_merge,
>          .vhost_net_set_mtu = vhost_user_net_set_mtu,
> +        .vhost_set_iotlb_callback = vhost_user_set_iotlb_callback,
> +        .vhost_send_device_iotlb_msg = vhost_user_send_device_iotlb_msg,
>  };
> -- 
> 2.9.3

Maxime Coquelin May 12, 2017, 2:21 p.m. UTC | #2

On 05/11/2017 08:25 PM, Michael S. Tsirkin wrote:
> On Thu, May 11, 2017 at 02:32:46PM +0200, Maxime Coquelin wrote:
>> This patch specifies and implements the master/slave communication
>> to support device IOTLB in slave.
>>
>> The vhost_iotlb_msg structure introduced for kernel backends is
>> re-used, making the design close between the two backends.
>>
>> An exception is the use of the secondary channel to enable the
>> slave to send IOTLB miss requests to the master.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>   docs/specs/vhost-user.txt | 75 +++++++++++++++++++++++++++++++++++++++++++++++
>>   hw/virtio/vhost-user.c    | 31 ++++++++++++++++++++
>>   2 files changed, 106 insertions(+)
>>
>> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
>> index 5fa7016..4a1f0c3 100644
>> --- a/docs/specs/vhost-user.txt
>> +++ b/docs/specs/vhost-user.txt
>> @@ -97,6 +97,23 @@ Depending on the request type, payload can be:
>>      log offset: offset from start of supplied file descriptor
>>          where logging starts (i.e. where guest address 0 would be logged)
>>   
>> + * An IOTLB message
>> +   ---------------------------------------------------------
>> +   | iova | size | user address | permissions flags | type |
>> +   ---------------------------------------------------------
>> +
>> +   IOVA: a 64-bit guest I/O virtual address
> 
> guest -> VM

Ok.

> 
>> +   Size: a 64-bit size
> 
> How do you specify "all memory"? give special meaning to size 0?

Good point, it does not support all memory currently.
It is not vhost-user specific, but general to the vhost implementation.

>> +   User address: a 64-bit user address
>> +   Permissions flags: a 8-bit bit field:
>> +    - Bit 0: Read access
>> +    - Bit 1: Write access
> 
> Can both bits be set? Can none?

Both. I will change it by listing values directly:
  - 0 : No access
  - 1 : Read
  - 2 : Write
  - 3 : Read Write

>> +   Type: a 8-bit IOTLB message type:
>> +    - 1: IOTLB miss
>> +    - 2: IOTLB update
>> +    - 3: IOTLB invalidate
>> +    - 4: IOTLB access fail
>> +
>>   In QEMU the vhost-user message is implemented with the following struct:
>>   
>>   typedef struct VhostUserMsg {
>> @@ -109,6 +126,7 @@ typedef struct VhostUserMsg {
>>           struct vhost_vring_addr addr;
>>           VhostUserMemory memory;
>>           VhostUserLog log;
>> +        struct vhost_iotlb_msg iotlb;
>>       };
>>   } QEMU_PACKED VhostUserMsg;
>>   
>> @@ -253,6 +271,31 @@ Once the source has finished migration, rings will be stopped by
>>   the source. No further update must be done before rings are
>>   restarted.
>>   
>> +IOMMU support
>> +-------------
>> +
>> +When the VIRTIO_F_IOMMU_PLATFORM feature has been negotiated, the master has
>> +to send IOTLB entries update & invalidation by sending VHOST_USER_IOTLB_MSG
>> +requests to the slave with a struct vhost_iotlb_msg payload.
> 
> Always? This seems a bit strange since iommu can be enabled/disabled
> dynamically.
Ok, what about:

When the VIRTIO_F_IOMMU_PLATFORM feature has been negotiated and iommu
is enbaled, the master sends IOTLB entries update & invalidation via
VHOST_USER_IOTLB_MSG requests to the slave with a struct vhost_iotlb_msg
payload.


> Closing channel seems like a wrong thing to do for this.

Sorry, I'm not sure to get your comment.

>> For update events,
>> +the iotlb payload has to be filled with the update message type (2), the I/O
>> +virtual address, the size, the user virtual address, and the permissions
>> +flags. For invalidation events, the iotlb payload has to be filled with the
>> +invalidation message type (3), the I/O virtual address and the size. On
>> +success, the slave is expected to reply with a zero payload, non-zero
>> +otherwise.
>> +
>> +When the VHOST_USER_PROTOCOL_F_SLAVE_REQ is supported by the slave, and the
>> +master initiated the slave to master communication channel using the
>> +VHOST_USER_SET_SLAVE_REQ_FD request, the slave can send IOTLB miss and access
>> +failure events by sending VHOST_USER_SLAVE_IOTLB_MSG requests to the master
>> +with a struct vhost_iotlb_msg payload. For miss events, the iotlb payload has
>> +to be filled with the miss message type (1), the I/O virtual address and the
>> +permissions flags. For access failure event, the iotlb payload has to be
>> +filled with the access failure message type (4), the I/O virtual address and
>> +the permissions flags. For synchronization purpose, the slave may rely on the
>> +reply-ack feature, so the master may send a reply when operation is completed
>> +if the reply-ack feature is negotiated and slaves requests a reply.
>> +
>>   Slave communication
>>   -------------------
>>   
>> @@ -514,6 +557,38 @@ Master message types
>>         If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, slave must respond
>>         with zero for success, non-zero otherwise.
>>   
>> + * VHOST_USER_IOTLB_MSG
>> +
>> +      Id: 22
>> +      Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
>> +      Master payload: struct vhost_iotlb_msg
>> +      Slave payload: u64
>> +
>> +      Send IOTLB messages with struct vhost_iotlb_msg as payload.
>> +      Master sends such requests to update and invalidate entries in the device
>> +      IOTLB. The slave has to acknowledge the request with sending zero as u64
>> +      payload for success, non-zero otherwise.
>> +      This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
>> +      has been successfully negotiated.
>> +
>> +Slave message types
>> +-------------------
>> +
>> + * VHOST_USER_SLAVE_IOTLB_MSG
>> +
>> +      Id: 1
>> +      Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
>> +      Slave payload: struct vhost_iotlb_msg
>> +      Master payload: N/A
>> +
>> +      Send IOTLB messages with struct vhost_iotlb_msg as payload.
>> +      Slave sends such requests to notify of an IOTLB miss, or an IOTLB
>> +      access failure. If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated,
>> +      and slave set the VHOST_USER_NEED_REPLY flag, master must respond with
>> +      zero when operation is successfully completed, or non-zero otherwise.
>> +      This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
>> +      has been successfully negotiated.
>> +
> 
> Are there limitations on number of messages in flight?

I didn't think about this, I would say the maximum number of messages in
flight is dependent on the socket buffer size (which is kept to default
in this series).

You question highlights a bug in by DPDK prototype, as the MISS request
can be sent by multiple threads, and I didn't protected this with a lock
to prevent concurrent read on the socket when waiting for the REPLY_ACK.

Thanks,
Maxime

Michael S. Tsirkin May 13, 2017, 12:02 a.m. UTC | #3

On Fri, May 12, 2017 at 04:21:58PM +0200, Maxime Coquelin wrote:
> 
> 
> On 05/11/2017 08:25 PM, Michael S. Tsirkin wrote:
> > On Thu, May 11, 2017 at 02:32:46PM +0200, Maxime Coquelin wrote:
> > > This patch specifies and implements the master/slave communication
> > > to support device IOTLB in slave.
> > > 
> > > The vhost_iotlb_msg structure introduced for kernel backends is
> > > re-used, making the design close between the two backends.
> > > 
> > > An exception is the use of the secondary channel to enable the
> > > slave to send IOTLB miss requests to the master.
> > > 
> > > Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> > > ---
> > >   docs/specs/vhost-user.txt | 75 +++++++++++++++++++++++++++++++++++++++++++++++
> > >   hw/virtio/vhost-user.c    | 31 ++++++++++++++++++++
> > >   2 files changed, 106 insertions(+)
> > > 
> > > diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> > > index 5fa7016..4a1f0c3 100644
> > > --- a/docs/specs/vhost-user.txt
> > > +++ b/docs/specs/vhost-user.txt
> > > @@ -97,6 +97,23 @@ Depending on the request type, payload can be:
> > >      log offset: offset from start of supplied file descriptor
> > >          where logging starts (i.e. where guest address 0 would be logged)
> > > + * An IOTLB message
> > > +   ---------------------------------------------------------
> > > +   | iova | size | user address | permissions flags | type |
> > > +   ---------------------------------------------------------
> > > +
> > > +   IOVA: a 64-bit guest I/O virtual address
> > 
> > guest -> VM
> 
> Ok.
> 
> > 
> > > +   Size: a 64-bit size
> > 
> > How do you specify "all memory"? give special meaning to size 0?
> 
> Good point, it does not support all memory currently.
> It is not vhost-user specific, but general to the vhost implementation.

But iommu needs it to support passthrough.

> 
> > > +   User address: a 64-bit user address
> > > +   Permissions flags: a 8-bit bit field:
> > > +    - Bit 0: Read access
> > > +    - Bit 1: Write access
> > 
> > Can both bits be set? Can none?
> 
> Both. I will change it by listing values directly:
>  - 0 : No access
>  - 1 : Read
>  - 2 : Write
>  - 3 : Read Write
> 
> > > +   Type: a 8-bit IOTLB message type:
> > > +    - 1: IOTLB miss
> > > +    - 2: IOTLB update
> > > +    - 3: IOTLB invalidate
> > > +    - 4: IOTLB access fail
> > > +
> > >   In QEMU the vhost-user message is implemented with the following struct:
> > >   typedef struct VhostUserMsg {
> > > @@ -109,6 +126,7 @@ typedef struct VhostUserMsg {
> > >           struct vhost_vring_addr addr;
> > >           VhostUserMemory memory;
> > >           VhostUserLog log;
> > > +        struct vhost_iotlb_msg iotlb;
> > >       };
> > >   } QEMU_PACKED VhostUserMsg;
> > > @@ -253,6 +271,31 @@ Once the source has finished migration, rings will be stopped by
> > >   the source. No further update must be done before rings are
> > >   restarted.
> > > +IOMMU support
> > > +-------------
> > > +
> > > +When the VIRTIO_F_IOMMU_PLATFORM feature has been negotiated, the master has
> > > +to send IOTLB entries update & invalidation by sending VHOST_USER_IOTLB_MSG
> > > +requests to the slave with a struct vhost_iotlb_msg payload.
> > 
> > Always? This seems a bit strange since iommu can be enabled/disabled
> > dynamically.
> Ok, what about:
> 
> When the VIRTIO_F_IOMMU_PLATFORM feature has been negotiated and iommu
> is enbaled, the master sends IOTLB entries update & invalidation via
> VHOST_USER_IOTLB_MSG requests to the slave with a struct vhost_iotlb_msg
> payload.
> 
> 
> > Closing channel seems like a wrong thing to do for this.
> 
> Sorry, I'm not sure to get your comment.

What happens when guest disables the IOMMU?

> > > For update events,
> > > +the iotlb payload has to be filled with the update message type (2), the I/O
> > > +virtual address, the size, the user virtual address, and the permissions
> > > +flags. For invalidation events, the iotlb payload has to be filled with the
> > > +invalidation message type (3), the I/O virtual address and the size. On
> > > +success, the slave is expected to reply with a zero payload, non-zero
> > > +otherwise.
> > > +
> > > +When the VHOST_USER_PROTOCOL_F_SLAVE_REQ is supported by the slave, and the
> > > +master initiated the slave to master communication channel using the
> > > +VHOST_USER_SET_SLAVE_REQ_FD request, the slave can send IOTLB miss and access
> > > +failure events by sending VHOST_USER_SLAVE_IOTLB_MSG requests to the master
> > > +with a struct vhost_iotlb_msg payload. For miss events, the iotlb payload has
> > > +to be filled with the miss message type (1), the I/O virtual address and the
> > > +permissions flags. For access failure event, the iotlb payload has to be
> > > +filled with the access failure message type (4), the I/O virtual address and
> > > +the permissions flags. For synchronization purpose, the slave may rely on the
> > > +reply-ack feature, so the master may send a reply when operation is completed
> > > +if the reply-ack feature is negotiated and slaves requests a reply.
> > > +
> > >   Slave communication
> > >   -------------------
> > > @@ -514,6 +557,38 @@ Master message types
> > >         If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, slave must respond
> > >         with zero for success, non-zero otherwise.
> > > + * VHOST_USER_IOTLB_MSG
> > > +
> > > +      Id: 22
> > > +      Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
> > > +      Master payload: struct vhost_iotlb_msg
> > > +      Slave payload: u64
> > > +
> > > +      Send IOTLB messages with struct vhost_iotlb_msg as payload.
> > > +      Master sends such requests to update and invalidate entries in the device
> > > +      IOTLB. The slave has to acknowledge the request with sending zero as u64
> > > +      payload for success, non-zero otherwise.
> > > +      This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
> > > +      has been successfully negotiated.
> > > +
> > > +Slave message types
> > > +-------------------
> > > +
> > > + * VHOST_USER_SLAVE_IOTLB_MSG
> > > +
> > > +      Id: 1
> > > +      Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
> > > +      Slave payload: struct vhost_iotlb_msg
> > > +      Master payload: N/A
> > > +
> > > +      Send IOTLB messages with struct vhost_iotlb_msg as payload.
> > > +      Slave sends such requests to notify of an IOTLB miss, or an IOTLB
> > > +      access failure. If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated,
> > > +      and slave set the VHOST_USER_NEED_REPLY flag, master must respond with
> > > +      zero when operation is successfully completed, or non-zero otherwise.
> > > +      This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
> > > +      has been successfully negotiated.
> > > +
> > 
> > Are there limitations on number of messages in flight?
> 
> I didn't think about this, I would say the maximum number of messages in
> flight is dependent on the socket buffer size (which is kept to default
> in this series).
> 
> You question highlights a bug in by DPDK prototype, as the MISS request
> can be sent by multiple threads, and I didn't protected this with a lock
> to prevent concurrent read on the socket when waiting for the REPLY_ACK.
> 
> Thanks,
> Maxime

A way to make it work without a lock might possibly be valuable.
Can be done on top.

Jason Wang May 15, 2017, 5:45 a.m. UTC | #4

On 2017年05月13日 08:02, Michael S. Tsirkin wrote:
> On Fri, May 12, 2017 at 04:21:58PM +0200, Maxime Coquelin wrote:
>>
>> On 05/11/2017 08:25 PM, Michael S. Tsirkin wrote:
>>> On Thu, May 11, 2017 at 02:32:46PM +0200, Maxime Coquelin wrote:
>>>> This patch specifies and implements the master/slave communication
>>>> to support device IOTLB in slave.
>>>>
>>>> The vhost_iotlb_msg structure introduced for kernel backends is
>>>> re-used, making the design close between the two backends.
>>>>
>>>> An exception is the use of the secondary channel to enable the
>>>> slave to send IOTLB miss requests to the master.
>>>>
>>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>> ---
>>>>    docs/specs/vhost-user.txt | 75 +++++++++++++++++++++++++++++++++++++++++++++++
>>>>    hw/virtio/vhost-user.c    | 31 ++++++++++++++++++++
>>>>    2 files changed, 106 insertions(+)
>>>>
>>>> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
>>>> index 5fa7016..4a1f0c3 100644
>>>> --- a/docs/specs/vhost-user.txt
>>>> +++ b/docs/specs/vhost-user.txt
>>>> @@ -97,6 +97,23 @@ Depending on the request type, payload can be:
>>>>       log offset: offset from start of supplied file descriptor
>>>>           where logging starts (i.e. where guest address 0 would be logged)
>>>> + * An IOTLB message
>>>> +   ---------------------------------------------------------
>>>> +   | iova | size | user address | permissions flags | type |
>>>> +   ---------------------------------------------------------
>>>> +
>>>> +   IOVA: a 64-bit guest I/O virtual address
>>> guest -> VM
>> Ok.
>>
>>>> +   Size: a 64-bit size
>>> How do you specify "all memory"? give special meaning to size 0?
>> Good point, it does not support all memory currently.
>> It is not vhost-user specific, but general to the vhost implementation.
> But iommu needs it to support passthrough.

Probably not, we will just pass the mappings in vhost_memory_region to 
vhost. Its memory_size is also a __u64.

Thanks

Maxime Coquelin May 16, 2017, 8:19 a.m. UTC | #5

On 05/12/2017 04:21 PM, Maxime Coquelin wrote:
> 
> 
> On 05/11/2017 08:25 PM, Michael S. Tsirkin wrote:
>> On Thu, May 11, 2017 at 02:32:46PM +0200, Maxime Coquelin wrote:
>>> This patch specifies and implements the master/slave communication
>>> to support device IOTLB in slave.
>>>
>>> The vhost_iotlb_msg structure introduced for kernel backends is
>>> re-used, making the design close between the two backends.
>>>
>>> An exception is the use of the secondary channel to enable the
>>> slave to send IOTLB miss requests to the master.
>>>
>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>> ---
>>>   docs/specs/vhost-user.txt | 75 
>>> +++++++++++++++++++++++++++++++++++++++++++++++
>>>   hw/virtio/vhost-user.c    | 31 ++++++++++++++++++++
>>>   2 files changed, 106 insertions(+)
>>>
>>> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
>>> index 5fa7016..4a1f0c3 100644
>>> --- a/docs/specs/vhost-user.txt
>>> +++ b/docs/specs/vhost-user.txt
>>> @@ -97,6 +97,23 @@ Depending on the request type, payload can be:
>>>      log offset: offset from start of supplied file descriptor
>>>          where logging starts (i.e. where guest address 0 would be 
>>> logged)
>>> + * An IOTLB message
>>> +   ---------------------------------------------------------
>>> +   | iova | size | user address | permissions flags | type |
>>> +   ---------------------------------------------------------
>>> +
>>> +   IOVA: a 64-bit guest I/O virtual address
>>
>> guest -> VM
> 
> Ok.

It seems that VM is never used in the doc, but guest is:
$ grep -i guest docs/specs/vhost-user.txt | wc -l
13
grep -i vm docs/specs/vhost-user.txt | wc -l
0

I think I should keep "guest" for consistency?

Thanks,
Maxime

Michael S. Tsirkin May 16, 2017, 1:23 p.m. UTC | #6

On Tue, May 16, 2017 at 10:19:24AM +0200, Maxime Coquelin wrote:
> 
> 
> On 05/12/2017 04:21 PM, Maxime Coquelin wrote:
> > 
> > 
> > On 05/11/2017 08:25 PM, Michael S. Tsirkin wrote:
> > > On Thu, May 11, 2017 at 02:32:46PM +0200, Maxime Coquelin wrote:
> > > > This patch specifies and implements the master/slave communication
> > > > to support device IOTLB in slave.
> > > > 
> > > > The vhost_iotlb_msg structure introduced for kernel backends is
> > > > re-used, making the design close between the two backends.
> > > > 
> > > > An exception is the use of the secondary channel to enable the
> > > > slave to send IOTLB miss requests to the master.
> > > > 
> > > > Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> > > > ---
> > > >   docs/specs/vhost-user.txt | 75
> > > > +++++++++++++++++++++++++++++++++++++++++++++++
> > > >   hw/virtio/vhost-user.c    | 31 ++++++++++++++++++++
> > > >   2 files changed, 106 insertions(+)
> > > > 
> > > > diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> > > > index 5fa7016..4a1f0c3 100644
> > > > --- a/docs/specs/vhost-user.txt
> > > > +++ b/docs/specs/vhost-user.txt
> > > > @@ -97,6 +97,23 @@ Depending on the request type, payload can be:
> > > >      log offset: offset from start of supplied file descriptor
> > > >          where logging starts (i.e. where guest address 0 would
> > > > be logged)
> > > > + * An IOTLB message
> > > > +   ---------------------------------------------------------
> > > > +   | iova | size | user address | permissions flags | type |
> > > > +   ---------------------------------------------------------
> > > > +
> > > > +   IOVA: a 64-bit guest I/O virtual address
> > > 
> > > guest -> VM
> > 
> > Ok.
> 
> It seems that VM is never used in the doc, but guest is:
> $ grep -i guest docs/specs/vhost-user.txt | wc -l
> 13
> grep -i vm docs/specs/vhost-user.txt | wc -l
> 0
> 
> I think I should keep "guest" for consistency?
> 
> Thanks,
> Maxime

"I/O virtual address programmed by the guest" ?

Michael S. Tsirkin May 16, 2017, 3:16 p.m. UTC | #7

On Mon, May 15, 2017 at 01:45:28PM +0800, Jason Wang wrote:
> 
> 
> On 2017年05月13日 08:02, Michael S. Tsirkin wrote:
> > On Fri, May 12, 2017 at 04:21:58PM +0200, Maxime Coquelin wrote:
> > > 
> > > On 05/11/2017 08:25 PM, Michael S. Tsirkin wrote:
> > > > On Thu, May 11, 2017 at 02:32:46PM +0200, Maxime Coquelin wrote:
> > > > > This patch specifies and implements the master/slave communication
> > > > > to support device IOTLB in slave.
> > > > > 
> > > > > The vhost_iotlb_msg structure introduced for kernel backends is
> > > > > re-used, making the design close between the two backends.
> > > > > 
> > > > > An exception is the use of the secondary channel to enable the
> > > > > slave to send IOTLB miss requests to the master.
> > > > > 
> > > > > Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> > > > > ---
> > > > >    docs/specs/vhost-user.txt | 75 +++++++++++++++++++++++++++++++++++++++++++++++
> > > > >    hw/virtio/vhost-user.c    | 31 ++++++++++++++++++++
> > > > >    2 files changed, 106 insertions(+)
> > > > > 
> > > > > diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> > > > > index 5fa7016..4a1f0c3 100644
> > > > > --- a/docs/specs/vhost-user.txt
> > > > > +++ b/docs/specs/vhost-user.txt
> > > > > @@ -97,6 +97,23 @@ Depending on the request type, payload can be:
> > > > >       log offset: offset from start of supplied file descriptor
> > > > >           where logging starts (i.e. where guest address 0 would be logged)
> > > > > + * An IOTLB message
> > > > > +   ---------------------------------------------------------
> > > > > +   | iova | size | user address | permissions flags | type |
> > > > > +   ---------------------------------------------------------
> > > > > +
> > > > > +   IOVA: a 64-bit guest I/O virtual address
> > > > guest -> VM
> > > Ok.
> > > 
> > > > > +   Size: a 64-bit size
> > > > How do you specify "all memory"? give special meaning to size 0?
> > > Good point, it does not support all memory currently.
> > > It is not vhost-user specific, but general to the vhost implementation.
> > But iommu needs it to support passthrough.
> 
> Probably not, we will just pass the mappings in vhost_memory_region to
> vhost. Its memory_size is also a __u64.
> 
> Thanks

That's different since that's chunks of qemu virtual memory.

IOMMU maps IOVA to GPA.

Jason Wang May 17, 2017, 2:53 a.m. UTC | #8

On 2017年05月16日 23:16, Michael S. Tsirkin wrote:
> On Mon, May 15, 2017 at 01:45:28PM +0800, Jason Wang wrote:
>>
>> On 2017年05月13日 08:02, Michael S. Tsirkin wrote:
>>> On Fri, May 12, 2017 at 04:21:58PM +0200, Maxime Coquelin wrote:
>>>> On 05/11/2017 08:25 PM, Michael S. Tsirkin wrote:
>>>>> On Thu, May 11, 2017 at 02:32:46PM +0200, Maxime Coquelin wrote:
>>>>>> This patch specifies and implements the master/slave communication
>>>>>> to support device IOTLB in slave.
>>>>>>
>>>>>> The vhost_iotlb_msg structure introduced for kernel backends is
>>>>>> re-used, making the design close between the two backends.
>>>>>>
>>>>>> An exception is the use of the secondary channel to enable the
>>>>>> slave to send IOTLB miss requests to the master.
>>>>>>
>>>>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>> ---
>>>>>>     docs/specs/vhost-user.txt | 75 +++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>     hw/virtio/vhost-user.c    | 31 ++++++++++++++++++++
>>>>>>     2 files changed, 106 insertions(+)
>>>>>>
>>>>>> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
>>>>>> index 5fa7016..4a1f0c3 100644
>>>>>> --- a/docs/specs/vhost-user.txt
>>>>>> +++ b/docs/specs/vhost-user.txt
>>>>>> @@ -97,6 +97,23 @@ Depending on the request type, payload can be:
>>>>>>        log offset: offset from start of supplied file descriptor
>>>>>>            where logging starts (i.e. where guest address 0 would be logged)
>>>>>> + * An IOTLB message
>>>>>> +   ---------------------------------------------------------
>>>>>> +   | iova | size | user address | permissions flags | type |
>>>>>> +   ---------------------------------------------------------
>>>>>> +
>>>>>> +   IOVA: a 64-bit guest I/O virtual address
>>>>> guest -> VM
>>>> Ok.
>>>>
>>>>>> +   Size: a 64-bit size
>>>>> How do you specify "all memory"? give special meaning to size 0?
>>>> Good point, it does not support all memory currently.
>>>> It is not vhost-user specific, but general to the vhost implementation.
>>> But iommu needs it to support passthrough.
>> Probably not, we will just pass the mappings in vhost_memory_region to
>> vhost. Its memory_size is also a __u64.
>>
>> Thanks
> That's different since that's chunks of qemu virtual memory.
>
> IOMMU maps IOVA to GPA.
>

But we're in fact cache IOVA -> HVA mapping in the remote IOTLB. When 
passthrough mode is enabled, IOVA == GPA, so passing mappings in 
vhost_memory_region should be fine.

The only possible "issue" with "all memory" is if you can not use a 
single TLB invalidation to invalidate all caches in remote TLB. But this 
is only theoretical problem since it only happen when we have a 1 byte 
mapping [2^64 - 1, 2^64) cached in remote TLB. Consider:

- E.g intel IOMMU has a range limitation for invalidation (1G currently)
- Looks like all existed IOMMU use page aligned mappings

It was probably not a big issue. And for safety we could use two 
invalidations to make sure all caches were flushed remotely. Or just 
change the protocol from start, size to start, end. Vhost-kernel is 
probably too late for this change, but I'm still not quite sure it is 
worthwhile.

Thanks

Maxime Coquelin May 17, 2017, 2:10 p.m. UTC | #9

On 05/17/2017 04:53 AM, Jason Wang wrote:
> 
> 
> On 2017年05月16日 23:16, Michael S. Tsirkin wrote:
>> On Mon, May 15, 2017 at 01:45:28PM +0800, Jason Wang wrote:
>>>
>>> On 2017年05月13日 08:02, Michael S. Tsirkin wrote:
>>>> On Fri, May 12, 2017 at 04:21:58PM +0200, Maxime Coquelin wrote:
>>>>> On 05/11/2017 08:25 PM, Michael S. Tsirkin wrote:
>>>>>> On Thu, May 11, 2017 at 02:32:46PM +0200, Maxime Coquelin wrote:
>>>>>>> This patch specifies and implements the master/slave communication
>>>>>>> to support device IOTLB in slave.
>>>>>>>
>>>>>>> The vhost_iotlb_msg structure introduced for kernel backends is
>>>>>>> re-used, making the design close between the two backends.
>>>>>>>
>>>>>>> An exception is the use of the secondary channel to enable the
>>>>>>> slave to send IOTLB miss requests to the master.
>>>>>>>
>>>>>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>>> ---
>>>>>>>     docs/specs/vhost-user.txt | 75 
>>>>>>> +++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>     hw/virtio/vhost-user.c    | 31 ++++++++++++++++++++
>>>>>>>     2 files changed, 106 insertions(+)
>>>>>>>
>>>>>>> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
>>>>>>> index 5fa7016..4a1f0c3 100644
>>>>>>> --- a/docs/specs/vhost-user.txt
>>>>>>> +++ b/docs/specs/vhost-user.txt
>>>>>>> @@ -97,6 +97,23 @@ Depending on the request type, payload can be:
>>>>>>>        log offset: offset from start of supplied file descriptor
>>>>>>>            where logging starts (i.e. where guest address 0 would 
>>>>>>> be logged)
>>>>>>> + * An IOTLB message
>>>>>>> +   ---------------------------------------------------------
>>>>>>> +   | iova | size | user address | permissions flags | type |
>>>>>>> +   ---------------------------------------------------------
>>>>>>> +
>>>>>>> +   IOVA: a 64-bit guest I/O virtual address
>>>>>> guest -> VM
>>>>> Ok.
>>>>>
>>>>>>> +   Size: a 64-bit size
>>>>>> How do you specify "all memory"? give special meaning to size 0?
>>>>> Good point, it does not support all memory currently.
>>>>> It is not vhost-user specific, but general to the vhost 
>>>>> implementation.
>>>> But iommu needs it to support passthrough.
>>> Probably not, we will just pass the mappings in vhost_memory_region to
>>> vhost. Its memory_size is also a __u64.
>>>
>>> Thanks
>> That's different since that's chunks of qemu virtual memory.
>>
>> IOMMU maps IOVA to GPA.
>>
> 
> But we're in fact cache IOVA -> HVA mapping in the remote IOTLB. When 
> passthrough mode is enabled, IOVA == GPA, so passing mappings in 
> vhost_memory_region should be fine.

Not sure this is a good idea, because when configured in passthrough,
QEMU will see the IOMMU as enabled, so the the VIRTIO_F_IOMMU_PLATFORM
feature will be negotiated if both guest and backend support it.
So how the backend will know whether it should directly pick the
translation directly into the vhost_memory_region, or translate it
through the device IOTLB?

Maybe the solution would be for QEMU to wrap "all memory" IOTLB updates
& invalidations to vhost_memory_regions, since the backend won't anyway
be able to perform accesses outside these regions?

> The only possible "issue" with "all memory" is if you can not use a 
> single TLB invalidation to invalidate all caches in remote TLB.

If needed, maybe we could introduce a new VHOST_IOTLB_INVALIDATE message
type?
For older kernel backend that doesn't support it, -EINVAL will be
returned, so QEMU could handle it another way in this case.

> But this 
> is only theoretical problem since it only happen when we have a 1 byte 
> mapping [2^64 - 1, 2^64) cached in remote TLB. Consider:
> 
> - E.g intel IOMMU has a range limitation for invalidation (1G currently)
> - Looks like all existed IOMMU use page aligned mappings
> 
> It was probably not a big issue. And for safety we could use two 
> invalidations to make sure all caches were flushed remotely. Or just 
> change the protocol from start, size to start, end. Vhost-kernel is 
> probably too late for this change, but I'm still not quite sure it is 
> worthwhile.

I'm not for diverging the protocol between kernel & user backends.

Thanks,
Maxime

> Thanks

Maxime Coquelin May 17, 2017, 3:27 p.m. UTC | #10

On 05/13/2017 02:02 AM, Michael S. Tsirkin wrote:
>>>> @@ -253,6 +271,31 @@ Once the source has finished migration, rings will be stopped by
>>>>    the source. No further update must be done before rings are
>>>>    restarted.
>>>> +IOMMU support
>>>> +-------------
>>>> +
>>>> +When the VIRTIO_F_IOMMU_PLATFORM feature has been negotiated, the master has
>>>> +to send IOTLB entries update & invalidation by sending VHOST_USER_IOTLB_MSG
>>>> +requests to the slave with a struct vhost_iotlb_msg payload.
>>> Always? This seems a bit strange since iommu can be enabled/disabled
>>> dynamically.
>> Ok, what about:
>>
>> When the VIRTIO_F_IOMMU_PLATFORM feature has been negotiated and iommu
>> is enbaled, the master sends IOTLB entries update & invalidation via
>> VHOST_USER_IOTLB_MSG requests to the slave with a struct vhost_iotlb_msg
>> payload.
>>
>>
>>> Closing channel seems like a wrong thing to do for this.
>> Sorry, I'm not sure to get your comment.
> What happens when guest disables the IOMMU?

I think you mean for example when rebooting the guest with IOMMU
disabled. In this case, I think that the user backend should clean its
IOTLB cache on the next SET_FEATURE call, as the kernel backend does.

Note that I was wrong when stating:
"When the VIRTIO_F_IOMMU_PLATFORM feature has been negotiated and iommu
is enabled..."

Actually, VIRTIO_F_IOMMU_PLATFORM is negotiated if both supported by the
backend and the guest, and if an iommu device is attached. If the guest
kernel doesn't enable the IOMMU (e.g. intel_iommu=off in kernel
cmdline), the master replies to IOTLB misses with identity IOTLB
entries updates (Passthrough) (it requires recent fixes from Peter to
work[0]).

Maxime

[0]: http://patchwork.ozlabs.org/patch/763379/

Michael S. Tsirkin May 17, 2017, 4:48 p.m. UTC | #11

On Thu, May 11, 2017 at 02:32:46PM +0200, Maxime Coquelin wrote:
> This patch specifies and implements the master/slave communication
> to support device IOTLB in slave.
> 
> The vhost_iotlb_msg structure introduced for kernel backends is
> re-used, making the design close between the two backends.
> 
> An exception is the use of the secondary channel to enable the
> slave to send IOTLB miss requests to the master.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  docs/specs/vhost-user.txt | 75 +++++++++++++++++++++++++++++++++++++++++++++++
>  hw/virtio/vhost-user.c    | 31 ++++++++++++++++++++
>  2 files changed, 106 insertions(+)
> 
> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> index 5fa7016..4a1f0c3 100644
> --- a/docs/specs/vhost-user.txt
> +++ b/docs/specs/vhost-user.txt
> @@ -97,6 +97,23 @@ Depending on the request type, payload can be:
>     log offset: offset from start of supplied file descriptor
>         where logging starts (i.e. where guest address 0 would be logged)
>  
> + * An IOTLB message
> +   ---------------------------------------------------------
> +   | iova | size | user address | permissions flags | type |
> +   ---------------------------------------------------------
> +
> +   IOVA: a 64-bit guest I/O virtual address
> +   Size: a 64-bit size
> +   User address: a 64-bit user address
> +   Permissions flags: a 8-bit bit field:
> +    - Bit 0: Read access
> +    - Bit 1: Write access
> +   Type: a 8-bit IOTLB message type:
> +    - 1: IOTLB miss
> +    - 2: IOTLB update
> +    - 3: IOTLB invalidate
> +    - 4: IOTLB access fail
> +
>  In QEMU the vhost-user message is implemented with the following struct:
>  
>  typedef struct VhostUserMsg {
> @@ -109,6 +126,7 @@ typedef struct VhostUserMsg {
>          struct vhost_vring_addr addr;
>          VhostUserMemory memory;
>          VhostUserLog log;
> +        struct vhost_iotlb_msg iotlb;
>      };
>  } QEMU_PACKED VhostUserMsg;
>  
> @@ -253,6 +271,31 @@ Once the source has finished migration, rings will be stopped by
>  the source. No further update must be done before rings are
>  restarted.
>  
> +IOMMU support
> +-------------
> +
> +When the VIRTIO_F_IOMMU_PLATFORM feature has been negotiated, the master has
> +to send IOTLB entries update & invalidation by sending VHOST_USER_IOTLB_MSG
> +requests to the slave with a struct vhost_iotlb_msg payload. For update events,
> +the iotlb payload has to be filled with the update message type (2), the I/O
> +virtual address, the size, the user virtual address, and the permissions
> +flags. For invalidation events, the iotlb payload has to be filled with the
> +invalidation message type (3), the I/O virtual address and the size. On
> +success, the slave is expected to reply with a zero payload, non-zero
> +otherwise.
> +
> +When the VHOST_USER_PROTOCOL_F_SLAVE_REQ is supported by the slave, and the
> +master initiated the slave to master communication channel using the
> +VHOST_USER_SET_SLAVE_REQ_FD request, the slave can send IOTLB miss and access
> +failure events by sending VHOST_USER_SLAVE_IOTLB_MSG requests to the master
> +with a struct vhost_iotlb_msg payload. For miss events, the iotlb payload has
> +to be filled with the miss message type (1), the I/O virtual address and the
> +permissions flags. For access failure event, the iotlb payload has to be
> +filled with the access failure message type (4), the I/O virtual address and
> +the permissions flags.

I don't think slave should cache invalid entries. If it does not,
how can it detect access failure as opposed to a miss?

> For synchronization purpose, the slave may rely on the
> +reply-ack feature, so the master may send a reply when operation is completed

What does completed mean in this context?

> +if the reply-ack feature is negotiated and slaves requests a reply.
> +

This is not very clear to me.

So slave sends an access to master. Master finds a pte that
overlaps. What does it send to guest? Initial PTE?
All PTEs to cover the request? Part of the PTE that overlaps
the request?


>  Slave communication
>  -------------------
>  
> @@ -514,6 +557,38 @@ Master message types
>        If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated, slave must respond
>        with zero for success, non-zero otherwise.
>  
> + * VHOST_USER_IOTLB_MSG
> +
> +      Id: 22
> +      Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
> +      Master payload: struct vhost_iotlb_msg
> +      Slave payload: u64
> +
> +      Send IOTLB messages with struct vhost_iotlb_msg as payload.
> +      Master sends such requests to update and invalidate entries in the device
> +      IOTLB. The slave has to acknowledge the request with sending zero as u64
> +      payload for success, non-zero otherwise.
> +      This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
> +      has been successfully negotiated.
> +
> +Slave message types
> +-------------------
> +
> + * VHOST_USER_SLAVE_IOTLB_MSG
> +
> +      Id: 1
> +      Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
> +      Slave payload: struct vhost_iotlb_msg
> +      Master payload: N/A
> +
> +      Send IOTLB messages with struct vhost_iotlb_msg as payload.
> +      Slave sends such requests to notify of an IOTLB miss, or an IOTLB
> +      access failure. If VHOST_USER_PROTOCOL_F_REPLY_ACK is negotiated,
> +      and slave set the VHOST_USER_NEED_REPLY flag, master must respond with
> +      zero when operation is successfully completed, or non-zero otherwise.
> +      This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
> +      has been successfully negotiated.
> +
>  VHOST_USER_PROTOCOL_F_REPLY_ACK:
>  -------------------------------
>  The original vhost-user specification only demands replies for certain
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index fbc09fa..2c93181 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -63,11 +63,13 @@ typedef enum VhostUserRequest {
>      VHOST_USER_SEND_RARP = 19,
>      VHOST_USER_NET_SET_MTU = 20,
>      VHOST_USER_SET_SLAVE_REQ_FD = 21,
> +    VHOST_USER_IOTLB_MSG = 22,
>      VHOST_USER_MAX
>  } VhostUserRequest;
>  
>  typedef enum VhostUserSlaveRequest {
>      VHOST_USER_SLAVE_NONE = 0,
> +    VHOST_USER_SLAVE_IOTLB_MSG = 1,
>      VHOST_USER_SLAVE_MAX
>  }  VhostUserSlaveRequest;
>  
> @@ -105,6 +107,7 @@ typedef struct VhostUserMsg {
>          struct vhost_vring_addr addr;
>          VhostUserMemory memory;
>          VhostUserLog log;
> +        struct vhost_iotlb_msg iotlb;
>      } payload;
>  } QEMU_PACKED VhostUserMsg;
>  
> @@ -611,6 +614,9 @@ static void slave_read(void *opaque)
>      }
>  
>      switch (msg.request) {
> +    case VHOST_USER_SLAVE_IOTLB_MSG:
> +        ret = vhost_backend_handle_iotlb_msg(dev, &msg.payload.iotlb);
> +        break;
>      default:
>          error_report("Received unexpected msg type.");
>          ret = -EINVAL;
> @@ -858,6 +864,29 @@ static int vhost_user_net_set_mtu(struct vhost_dev *dev, uint16_t mtu)
>      return 0;
>  }
>  
> +static int vhost_user_send_device_iotlb_msg(struct vhost_dev *dev,
> +                                            struct vhost_iotlb_msg *imsg)
> +{
> +    VhostUserMsg msg = {
> +        .request = VHOST_USER_IOTLB_MSG,
> +        .size = sizeof(msg.payload.iotlb),
> +        .flags = VHOST_USER_VERSION | VHOST_USER_NEED_REPLY_MASK,
> +        .payload.iotlb = *imsg,
> +    };
> +
> +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> +        return -EFAULT;
> +    }
> +
> +    return process_message_reply(dev, msg.request);
> +}
> +
> +
> +static void vhost_user_set_iotlb_callback(struct vhost_dev *dev, int enabled)
> +{
> +    /* No-op as the receive channel is not dedicated to IOTLB messages. */
> +}
> +
>  const VhostOps user_ops = {
>          .backend_type = VHOST_BACKEND_TYPE_USER,
>          .vhost_backend_init = vhost_user_init,
> @@ -882,4 +911,6 @@ const VhostOps user_ops = {
>          .vhost_migration_done = vhost_user_migration_done,
>          .vhost_backend_can_merge = vhost_user_can_merge,
>          .vhost_net_set_mtu = vhost_user_net_set_mtu,
> +        .vhost_set_iotlb_callback = vhost_user_set_iotlb_callback,
> +        .vhost_send_device_iotlb_msg = vhost_user_send_device_iotlb_msg,
>  };
> -- 
> 2.9.3

Maxime Coquelin May 18, 2017, 8:43 a.m. UTC | #12

On 05/17/2017 06:48 PM, Michael S. Tsirkin wrote:
> On Thu, May 11, 2017 at 02:32:46PM +0200, Maxime Coquelin wrote:
>> This patch specifies and implements the master/slave communication
>> to support device IOTLB in slave.
>>
>> The vhost_iotlb_msg structure introduced for kernel backends is
>> re-used, making the design close between the two backends.
>>
>> An exception is the use of the secondary channel to enable the
>> slave to send IOTLB miss requests to the master.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>   docs/specs/vhost-user.txt | 75 +++++++++++++++++++++++++++++++++++++++++++++++
>>   hw/virtio/vhost-user.c    | 31 ++++++++++++++++++++
>>   2 files changed, 106 insertions(+)
>>
>> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
>> index 5fa7016..4a1f0c3 100644
>> --- a/docs/specs/vhost-user.txt
>> +++ b/docs/specs/vhost-user.txt
>> @@ -97,6 +97,23 @@ Depending on the request type, payload can be:
>>      log offset: offset from start of supplied file descriptor
>>          where logging starts (i.e. where guest address 0 would be logged)
>>   
>> + * An IOTLB message
>> +   ---------------------------------------------------------
>> +   | iova | size | user address | permissions flags | type |
>> +   ---------------------------------------------------------
>> +
>> +   IOVA: a 64-bit guest I/O virtual address
>> +   Size: a 64-bit size
>> +   User address: a 64-bit user address
>> +   Permissions flags: a 8-bit bit field:
>> +    - Bit 0: Read access
>> +    - Bit 1: Write access
>> +   Type: a 8-bit IOTLB message type:
>> +    - 1: IOTLB miss
>> +    - 2: IOTLB update
>> +    - 3: IOTLB invalidate
>> +    - 4: IOTLB access fail
>> +
>>   In QEMU the vhost-user message is implemented with the following struct:
>>   
>>   typedef struct VhostUserMsg {
>> @@ -109,6 +126,7 @@ typedef struct VhostUserMsg {
>>           struct vhost_vring_addr addr;
>>           VhostUserMemory memory;
>>           VhostUserLog log;
>> +        struct vhost_iotlb_msg iotlb;
>>       };
>>   } QEMU_PACKED VhostUserMsg;
>>   
>> @@ -253,6 +271,31 @@ Once the source has finished migration, rings will be stopped by
>>   the source. No further update must be done before rings are
>>   restarted.
>>   
>> +IOMMU support
>> +-------------
>> +
>> +When the VIRTIO_F_IOMMU_PLATFORM feature has been negotiated, the master has
>> +to send IOTLB entries update & invalidation by sending VHOST_USER_IOTLB_MSG
>> +requests to the slave with a struct vhost_iotlb_msg payload. For update events,
>> +the iotlb payload has to be filled with the update message type (2), the I/O
>> +virtual address, the size, the user virtual address, and the permissions
>> +flags. For invalidation events, the iotlb payload has to be filled with the
>> +invalidation message type (3), the I/O virtual address and the size. On
>> +success, the slave is expected to reply with a zero payload, non-zero
>> +otherwise.
>> +
>> +When the VHOST_USER_PROTOCOL_F_SLAVE_REQ is supported by the slave, and the
>> +master initiated the slave to master communication channel using the
>> +VHOST_USER_SET_SLAVE_REQ_FD request, the slave can send IOTLB miss and access
>> +failure events by sending VHOST_USER_SLAVE_IOTLB_MSG requests to the master
>> +with a struct vhost_iotlb_msg payload. For miss events, the iotlb payload has
>> +to be filled with the miss message type (1), the I/O virtual address and the
>> +permissions flags. For access failure event, the iotlb payload has to be
>> +filled with the access failure message type (4), the I/O virtual address and
>> +the permissions flags.
> 
> I don't think slave should cache invalid entries. If it does not,
> how can it detect access failure as opposed to a miss?

Of course, invalid cache entries should not be cached.
The VHOST_IOTLB_ACCESS_FAIL has been specified for the Kernel backend,
even if the latter does not implement it yet.

In the case of user backend, I think it could be used without caching
entries.

After the backend sends a miss requests, the master will send an update
requests.

If for some reasons, the update entry is invalid (e.g. is outside the
memory ranges shared by set_mem_table), it won't be inserted into the
cache. So, when looking again at the cache, the backend will still face
a miss, and so can notify the master of the failing access.

That said, I specified this message type to mimic the kernel backend, I
don't what the master could do of such information.

> 
>> For synchronization purpose, the slave may rely on the
>> +reply-ack feature, so the master may send a reply when operation is completed
> 
> What does completed mean in this context?

Completed means either the master has sent the IOTLB entry update to the
slave through the master->slave channel, or the IOVA is invalid (so
nothing sent).

> 
>> +if the reply-ack feature is negotiated and slaves requests a reply.
>> +
> 
> This is not very clear to me.
> So slave sends an access to master. Master finds a pte that
> overlaps. What does it send to guest? Initial PTE?
> All PTEs to cover the request? Part of the PTE that overlaps
> the request?

Actually, the slave only sends the IOVA and access permissions, not the
size. So the master will send the PTE overlapping this IOVA if
permissions match.
When received the slave will again lookup into its cache, and may send
another request with the next first missing IOVA if needed.

Maxime

Jason Wang May 19, 2017, 6:48 a.m. UTC | #13

On 2017年05月17日 22:10, Maxime Coquelin wrote:
>
>
> On 05/17/2017 04:53 AM, Jason Wang wrote:
>>
>>
>> On 2017年05月16日 23:16, Michael S. Tsirkin wrote:
>>> On Mon, May 15, 2017 at 01:45:28PM +0800, Jason Wang wrote:
>>>>
>>>> On 2017年05月13日 08:02, Michael S. Tsirkin wrote:
>>>>> On Fri, May 12, 2017 at 04:21:58PM +0200, Maxime Coquelin wrote:
>>>>>> On 05/11/2017 08:25 PM, Michael S. Tsirkin wrote:
>>>>>>> On Thu, May 11, 2017 at 02:32:46PM +0200, Maxime Coquelin wrote:
>>>>>>>> This patch specifies and implements the master/slave communication
>>>>>>>> to support device IOTLB in slave.
>>>>>>>>
>>>>>>>> The vhost_iotlb_msg structure introduced for kernel backends is
>>>>>>>> re-used, making the design close between the two backends.
>>>>>>>>
>>>>>>>> An exception is the use of the secondary channel to enable the
>>>>>>>> slave to send IOTLB miss requests to the master.
>>>>>>>>
>>>>>>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>>>> ---
>>>>>>>>     docs/specs/vhost-user.txt | 75 
>>>>>>>> +++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>     hw/virtio/vhost-user.c    | 31 ++++++++++++++++++++
>>>>>>>>     2 files changed, 106 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
>>>>>>>> index 5fa7016..4a1f0c3 100644
>>>>>>>> --- a/docs/specs/vhost-user.txt
>>>>>>>> +++ b/docs/specs/vhost-user.txt
>>>>>>>> @@ -97,6 +97,23 @@ Depending on the request type, payload can be:
>>>>>>>>        log offset: offset from start of supplied file descriptor
>>>>>>>>            where logging starts (i.e. where guest address 0 
>>>>>>>> would be logged)
>>>>>>>> + * An IOTLB message
>>>>>>>> + ---------------------------------------------------------
>>>>>>>> +   | iova | size | user address | permissions flags | type |
>>>>>>>> + ---------------------------------------------------------
>>>>>>>> +
>>>>>>>> +   IOVA: a 64-bit guest I/O virtual address
>>>>>>> guest -> VM
>>>>>> Ok.
>>>>>>
>>>>>>>> +   Size: a 64-bit size
>>>>>>> How do you specify "all memory"? give special meaning to size 0?
>>>>>> Good point, it does not support all memory currently.
>>>>>> It is not vhost-user specific, but general to the vhost 
>>>>>> implementation.
>>>>> But iommu needs it to support passthrough.
>>>> Probably not, we will just pass the mappings in vhost_memory_region to
>>>> vhost. Its memory_size is also a __u64.
>>>>
>>>> Thanks
>>> That's different since that's chunks of qemu virtual memory.
>>>
>>> IOMMU maps IOVA to GPA.
>>>
>>
>> But we're in fact cache IOVA -> HVA mapping in the remote IOTLB. When 
>> passthrough mode is enabled, IOVA == GPA, so passing mappings in 
>> vhost_memory_region should be fine.
>
> Not sure this is a good idea, because when configured in passthrough,
> QEMU will see the IOMMU as enabled, so the the VIRTIO_F_IOMMU_PLATFORM
> feature will be negotiated if both guest and backend support it.
> So how the backend will know whether it should directly pick the
> translation directly into the vhost_memory_region, or translate it
> through the device IOTLB?

This no need for backend to know about this, since IOVA equals to GPA, 
vhost_memory_region stores IOVA -> HVA mapping. If we pass them, device 
IOTLB should work as usual?

>
> Maybe the solution would be for QEMU to wrap "all memory" IOTLB updates
> & invalidations to vhost_memory_regions, since the backend won't anyway
> be able to perform accesses outside these regions?

This is just what I mean, you can refer Peter's series.

>
>> The only possible "issue" with "all memory" is if you can not use a 
>> single TLB invalidation to invalidate all caches in remote TLB.
>
> If needed, maybe we could introduce a new VHOST_IOTLB_INVALIDATE message
> type?
> For older kernel backend that doesn't support it, -EINVAL will be
> returned, so QEMU could handle it another way in this case.

We could, but not sure it was really needed.

>
>> But this is only theoretical problem since it only happen when we 
>> have a 1 byte mapping [2^64 - 1, 2^64) cached in remote TLB. Consider:
>>
>> - E.g intel IOMMU has a range limitation for invalidation (1G currently)
>> - Looks like all existed IOMMU use page aligned mappings
>>
>> It was probably not a big issue. And for safety we could use two 
>> invalidations to make sure all caches were flushed remotely. Or just 
>> change the protocol from start, size to start, end. Vhost-kernel is 
>> probably too late for this change, but I'm still not quite sure it is 
>> worthwhile.
>
> I'm not for diverging the protocol between kernel & user backends.
>
> Thanks,
> Maxime

Yes, agree.

Thanks

>
>> Thanks

Jason Wang May 19, 2017, 7:46 a.m. UTC | #14

On 2017年05月18日 16:43, Maxime Coquelin wrote:
>>>
>>> +When the VHOST_USER_PROTOCOL_F_SLAVE_REQ is supported by the slave, 
>>> and the
>>> +master initiated the slave to master communication channel using the
>>> +VHOST_USER_SET_SLAVE_REQ_FD request, the slave can send IOTLB miss 
>>> and access
>>> +failure events by sending VHOST_USER_SLAVE_IOTLB_MSG requests to 
>>> the master
>>> +with a struct vhost_iotlb_msg payload. For miss events, the iotlb 
>>> payload has
>>> +to be filled with the miss message type (1), the I/O virtual 
>>> address and the
>>> +permissions flags. For access failure event, the iotlb payload has 
>>> to be
>>> +filled with the access failure message type (4), the I/O virtual 
>>> address and
>>> +the permissions flags.
>>
>> I don't think slave should cache invalid entries. If it does not,
>> how can it detect access failure as opposed to a miss?
>
> Of course, invalid cache entries should not be cached.
> The VHOST_IOTLB_ACCESS_FAIL has been specified for the Kernel backend,
> even if the latter does not implement it yet. 

Yes, I leave this for future use e.g reporting copy_to_user() failure to 
userspace.

Thanks

Maxime Coquelin May 19, 2017, 8:35 a.m. UTC | #15

On 05/19/2017 08:48 AM, Jason Wang wrote:
> 
> 
> On 2017年05月17日 22:10, Maxime Coquelin wrote:
>>
>>
>> On 05/17/2017 04:53 AM, Jason Wang wrote:
>>>
>>>
>>> On 2017年05月16日 23:16, Michael S. Tsirkin wrote:
>>>> On Mon, May 15, 2017 at 01:45:28PM +0800, Jason Wang wrote:
>>>>>
>>>>> On 2017年05月13日 08:02, Michael S. Tsirkin wrote:
>>>>>> On Fri, May 12, 2017 at 04:21:58PM +0200, Maxime Coquelin wrote:
>>>>>>> On 05/11/2017 08:25 PM, Michael S. Tsirkin wrote:
>>>>>>>> On Thu, May 11, 2017 at 02:32:46PM +0200, Maxime Coquelin wrote:
>>>>>>>>> This patch specifies and implements the master/slave communication
>>>>>>>>> to support device IOTLB in slave.
>>>>>>>>>
>>>>>>>>> The vhost_iotlb_msg structure introduced for kernel backends is
>>>>>>>>> re-used, making the design close between the two backends.
>>>>>>>>>
>>>>>>>>> An exception is the use of the secondary channel to enable the
>>>>>>>>> slave to send IOTLB miss requests to the master.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>>>>> ---
>>>>>>>>>     docs/specs/vhost-user.txt | 75 
>>>>>>>>> +++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>     hw/virtio/vhost-user.c    | 31 ++++++++++++++++++++
>>>>>>>>>     2 files changed, 106 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
>>>>>>>>> index 5fa7016..4a1f0c3 100644
>>>>>>>>> --- a/docs/specs/vhost-user.txt
>>>>>>>>> +++ b/docs/specs/vhost-user.txt
>>>>>>>>> @@ -97,6 +97,23 @@ Depending on the request type, payload can be:
>>>>>>>>>        log offset: offset from start of supplied file descriptor
>>>>>>>>>            where logging starts (i.e. where guest address 0 
>>>>>>>>> would be logged)
>>>>>>>>> + * An IOTLB message
>>>>>>>>> + ---------------------------------------------------------
>>>>>>>>> +   | iova | size | user address | permissions flags | type |
>>>>>>>>> + ---------------------------------------------------------
>>>>>>>>> +
>>>>>>>>> +   IOVA: a 64-bit guest I/O virtual address
>>>>>>>> guest -> VM
>>>>>>> Ok.
>>>>>>>
>>>>>>>>> +   Size: a 64-bit size
>>>>>>>> How do you specify "all memory"? give special meaning to size 0?
>>>>>>> Good point, it does not support all memory currently.
>>>>>>> It is not vhost-user specific, but general to the vhost 
>>>>>>> implementation.
>>>>>> But iommu needs it to support passthrough.
>>>>> Probably not, we will just pass the mappings in vhost_memory_region to
>>>>> vhost. Its memory_size is also a __u64.
>>>>>
>>>>> Thanks
>>>> That's different since that's chunks of qemu virtual memory.
>>>>
>>>> IOMMU maps IOVA to GPA.
>>>>
>>>
>>> But we're in fact cache IOVA -> HVA mapping in the remote IOTLB. When 
>>> passthrough mode is enabled, IOVA == GPA, so passing mappings in 
>>> vhost_memory_region should be fine.
>>
>> Not sure this is a good idea, because when configured in passthrough,
>> QEMU will see the IOMMU as enabled, so the the VIRTIO_F_IOMMU_PLATFORM
>> feature will be negotiated if both guest and backend support it.
>> So how the backend will know whether it should directly pick the
>> translation directly into the vhost_memory_region, or translate it
>> through the device IOTLB?
> 
> This no need for backend to know about this, since IOVA equals to GPA, 
> vhost_memory_region stores IOVA -> HVA mapping. If we pass them, device 
> IOTLB should work as usual?

Ok, I think there were a misunderstanding. I understood you said there
were no need to use the device IOTLB in this case.

> 
>>
>> Maybe the solution would be for QEMU to wrap "all memory" IOTLB updates
>> & invalidations to vhost_memory_regions, since the backend won't anyway
>> be able to perform accesses outside these regions?
> 
> This is just what I mean, you can refer Peter's series. >
>>
>>> The only possible "issue" with "all memory" is if you can not use a 
>>> single TLB invalidation to invalidate all caches in remote TLB.
>>
>> If needed, maybe we could introduce a new VHOST_IOTLB_INVALIDATE message
>> type?
>> For older kernel backend that doesn't support it, -EINVAL will be
>> returned, so QEMU could handle it another way in this case.
> 
> We could, but not sure it was really needed.

I meant VHOST_IOTLB_INVALIDATE_ALL, and yes, I'm not sure this is
needed. But this is an option we have it turns out to be at some point.

Thanks,
Maxime

Michael S. Tsirkin May 19, 2017, 4:42 p.m. UTC | #16

On Fri, May 19, 2017 at 03:46:36PM +0800, Jason Wang wrote:
> 
> 
> On 2017年05月18日 16:43, Maxime Coquelin wrote:
> > > > 
> > > > +When the VHOST_USER_PROTOCOL_F_SLAVE_REQ is supported by the
> > > > slave, and the
> > > > +master initiated the slave to master communication channel using the
> > > > +VHOST_USER_SET_SLAVE_REQ_FD request, the slave can send IOTLB
> > > > miss and access
> > > > +failure events by sending VHOST_USER_SLAVE_IOTLB_MSG requests
> > > > to the master
> > > > +with a struct vhost_iotlb_msg payload. For miss events, the
> > > > iotlb payload has
> > > > +to be filled with the miss message type (1), the I/O virtual
> > > > address and the
> > > > +permissions flags. For access failure event, the iotlb payload
> > > > has to be
> > > > +filled with the access failure message type (4), the I/O
> > > > virtual address and
> > > > +the permissions flags.
> > > 
> > > I don't think slave should cache invalid entries. If it does not,
> > > how can it detect access failure as opposed to a miss?
> > 
> > Of course, invalid cache entries should not be cached.
> > The VHOST_IOTLB_ACCESS_FAIL has been specified for the Kernel backend,
> > even if the latter does not implement it yet.
> 
> Yes, I leave this for future use e.g reporting copy_to_user() failure to
> userspace.
> 
> Thanks

Interesting. And it's not handled now.
So let's add a text "reserved for reporting internal access
errors in the future. Should not be used for now.".

[6/6] spec/vhost-user spec: Add IOMMU support

Commit Message

Comments

Patch