Message ID | 1464509494-159509-5-git-send-email-wei.w.wang@intel.com (mailing list archive) |
---|---|
State | New, archived |
On 05/29/2016 04:11 PM, Wei Wang wrote:
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> ---
>  Details | 324 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 324 insertions(+)
>  create mode 100644 Details
>
> diff --git a/Details b/Details
> new file mode 100644
> index 0000000..4ea2252
> --- /dev/null
> +++ b/Details
> @@ -0,0 +1,324 @@
> +1 Device ID
> +TBD
> +
> +2 Virtqueues
> +0 controlq
> +
> +3 Feature Bits
> +3.1 Local Feature Bits
> +Currently no local feature bits are defined, so the standard virtio feature
> +bits negation will always be successful and complete.
> +
> +3.2 Remote Feature Bits
> +The remote feature bits are obtained from the frontend virtio device and
> +negotiated with the vhost-pci driver via the controlq. The negotiation steps
> +are described in 4.5 Device Initialization.
> +
> +4 Device Configuration Layout
> +struct vhost_pci_config {
> +	#define VHOST_PCI_CONTROLQ_MEMORY_INFO_ACK 0
> +	#define VHOST_PCI_CONTROLQ_DEVICE_INFO_ACK 1
> +	#define VHOST_PCI_CONTROLQ_FEATURE_BITS_ACK 2
> +	u32 ack_type;
> +	u32 ack_device_type;
> +	u64 ack_device_id;
> +	union {
> +		#define VHOST_PCI_CONTROLQ_ACK_ADD_DONE 0
> +		#define VHOST_PCI_CONTROLQ_ACK_ADD_FAIL 1
> +		#define VHOST_PCI_CONTROLQ_ACK_DEL_DONE 2
> +		#define VHOST_PCI_CONTROLQ_ACK_DEL_FAIL 3
> +		u64 ack_memory_info;
> +		u64 ack_device_info;
> +		u64 ack_feature_bits;
> +	};
> +};

Do you need to write all 4 of these fields to ack the operation? That seems
inefficient, and it is not flexible if the driver needs to offer more data to
the device in the future. Can we dedicate a vq for this purpose?

BTW, the current approach cannot handle the case where there are multiple
requests of the same kind in the control queue, e.g., two memory-add requests
in the control queue.

> +
> +The configuration fields are currently used for the vhost-pci driver to
> +acknowledge to the vhost-pci device after it receives controlq messages.
> +
> +4.5 Device Initialization
> +When a device VM boots, it creates a vhost-pci server socket.
> +
> +When a virtio device on the driver VM is created with specifying the use of a
> +vhost-pci device as a backend, a client socket is created and connected to the
> +corresponding vhost-pci server for message exchanges.
> +
> +The messages passed to the vhost-pci server is proceeded by the following
> +header:
> +struct vhost_pci_socket_hdr {
> +	#define VHOST_PCI_SOCKET_MEMORY_INFO 0
> +	#define VHOST_PCI_SOCKET_MEMORY_INFO_ACK 1
> +	#define VHOST_PCI_SOCKET_DEVICE_INFO 2
> +	#define VHOST_PCI_SOCKET_DEVICE_INFO_ACK 3
> +	#define VHOST_PCI_SOCKET_FEATURE_BITS 4
> +	#define VHOST_PCI_SOCKET_FEATURE_BITS_ACK 5
> +	u16 msg_type;
> +	u16 msg_version;
> +	u32 msg_len;
> +	u64 qemu_pid;
> +};
> +
> +The payload of the above message types can be constructed using the structures
> +below:
> +/* VHOST_PCI_SOCKET_MEMORY_INFO message */
> +struct vhost_pci_socket_memory_info {
> +	#define VHOST_PCI_ADD_MEMORY 0
> +	#define VHOST_PCI_DEL_MEMORY 1
> +	u16 ops;
> +	u32 nregions;
> +	struct vhost_pci_memory_region {
> +		int fd;
> +		u64 guest_phys_addr;
> +		u64 memory_size;
> +		u64 mmap_offset;
> +	} regions[VHOST_PCI_MAX_NREGIONS];
> +};
> +
> +/* VHOST_PCI_SOCKET_DEVICE_INFO message */
> +struct vhost_pci_device_info {
> +	#define VHOST_PCI_ADD_FRONTEND_DEVICE 0
> +	#define VHOST_PCI_DEL_FRONTEND_DEVICE 1
> +	u16 ops;
> +	u32 nvirtq;
> +	#define VHOST_PCI_FRONTEND_DEVICE_NET 1
> +	#define VHOST_PCI_FRONTEND_DEVICE_BLK 2
> +	#define VHOST_PCI_FRONTEND_DEVICE_CONSOLE 3
> +	#define VHOST_PCI_FRONTEND_DEVICE_ENTROPY 4
> +	#define VHOST_PCI_FRONTEND_DEVICE_BALLOON 5
> +	#define VHOST_PCI_FRONTEND_DEVICE_SCSI 8
> +	u32 device_type;
> +	u64 device_id;
> +	struct virtq exotic_virtq[VHOST_PCI_MAX_NVIRTQ];
> +};
> +The device_id field identifies the device. For example, it can be used to
> +store a MAC address if the device_type is VHOST_PCI_FRONTEND_DEVICE_NET.
> +
> +/* VHOST_PCI_SOCKET_FEATURE_BITS message*/
> +struct vhost_pci_feature_bits {
> +	u64 feature_bits;
> +};

We have not only the 'socket feature bits' but also the feature bits of each
virtio device plugged in on the vhost-pci device side.

E.g., if there are two virtio devices (say, a NIC and a BLK device) and both
need to communicate directly with another VM, the feature bits of each device
need to be negotiated with that VM separately. And you cannot put these
feature bits in the vhost_pci_device_info struct, as its vq has not been
created at that time.

> +
> +/* VHOST_PCI_SOCKET_xx_ACK messages */
> +struct vhost_pci_socket_ack {
> +	#define VHOST_PCI_SOCKET_ACK_ADD_DONE 0
> +	#define VHOST_PCI_SOCKET_ACK_ADD_FAIL 1
> +	#define VHOST_PCI_SOCKET_ACK_DEL_DONE 2
> +	#define VHOST_PCI_SOCKET_ACK_DEL_FAIL 3
> +	u64 ack;
> +};
> +
> +The driver update message passed via the controlq is preceded by the following
> +header:
> +struct vhost_pci_controlq_hdr {
> +	#define VHOST_PCI_CONTROLQ_MEMORY_INFO 0
> +	#define VHOST_PCI_CONTROLQ_DEVICE_INFO 1
> +	#define VHOST_PCI_CONTROLQ_FEATURE_BITS 2
> +	#define VHOST_PCI_CONTROLQ_UPDATE_DONE 3
> +	u16 msg_type;
> +	u16 msg_version;
> +	u32 msg_len;
> +};
> +
> +The payload of a VHOST_PCI_CONTROLQ_MEMORY_INFO message can be constructed
> +using the following structure:
> +/* VHOST_PCI_CONTROLQ_MEMORY_INFO message */
> +struct vhost_pci_controlq_memory_info {
> +	#define VHOST_PCI_ADD_MEMORY 0
> +	#define VHOST_PCI_DEL_MEMORY 1
> +	u16 ops;
> +	u32 nregion;
> +	struct exotic_memory_region {
> +		u64 region_base_xgpa;
> +		u64 size;
> +		u64 offset_in_bar_area;
> +	} region[VHOST_PCI_MAX_NREGIONS];
> +};
> +
> +The payload of VHOST_PCI_CONTROLQ_DEVICE_INFO and
> +VHOST_PCI_CONTROLQ_FEATURE_BITS messages can be constructed using the
> +vhost_pci_device_info structure and the vhost_pci_feature_bits structure
> +respectively.
> +
> +The payload of a VHOST_PCI_CONTROLQ_UPDATE_DONE message can be constructed
> +using the structure below:
> +struct vhost_pci_controlq_update_done {
> +	u32 device_type;
> +	u64 device_id;
> +};
> +
> +Fig. 1 shows the initialization steps.
> +
> +When the vhost-pci server receives a VHOST_PCI_SOCKET_MEMORY_INFO(ADD) message,
> +it checks if a vhost-pci device has been created for the requesting VM whose
> +QEMU process id is qemu_pid. If yes, it will simply update the subsequent
> +received messages to the vhost-pci driver via the controlq. Otherwise, the
> +server creates a new vhost-pci device, and continues the following
> +initialization steps.

qemu_pid is not stable: an existing VM can be killed silently, and a new
vhost-pci driver reusing the same qemu_pid may ask to join before the
vhost-pci device learns that the previous one has gone.

> +
> +The vhost-pci server adds up all the memory region size, and uses a 64-bit
> +device bar for the mapping of all the memory regions obtained from the socket
> +message. To better support memory hot-plugging of the driver VM, the bar is
> +configured with a double size of the driver VM's memory. The server maps the
> +received memory info via the QEMU MemoryRegion mechanism, and then the new
> +created vhost-pci device is hot-plugged to the VM.
> +
> +When the device status is updated with DRIVER_OK, a
> +VHOST_PCI_CONTROLQ_MEMORY_INFO(ADD) message, which is stemed from the memory
> +info socket message, is put on the controlq and a controlq interrupt is injected
> +to the VM.
> +
> +When the vhost-pci server receives a
> +VHOST_PCI_CONTROLQ_MEMORY_INFO_ACK(ADD_DONE) acknowledgement from the driver,
> +it sends a VHOST_PCI_SOCKET_MEMORY_INFO_ACK(ADD_DONE) message to the client
> +that is identified by the ack_device_type and ack_device_id fields.
> +
> +When the vhost-pci server receives a
> +VHOST_PCI_SOCKET_FEATURE_BITS(feature bits) message, a
> +VHOST_PCI_CONTROLQ_FEATURE_BITS(feature bits) message is put on the controlq
> +and a controlq interrupt is injected to the VM.
> +
> +If the vhost-pci server notices that the driver fully accepted the offered
> +feature bits, it sends a VHOST_PCI_SOCKET_FEATURE_BITS_ACK(ADD_DONE) message
> +to the client. If the vhost-pci server notices that the vhost-pci driver only
> +accepted a subset of the offered feature bits, it sends a
> +VHOST_PCI_SOCKET_FEATURE_BITS(accepted feature bits) message back to the
> +client. The client side virtio device re-negotiates the new feature bits with
> +its driver, and sends back a VHOST_PCI_SOCKET_FEATURE_BITS_ACK(ADD_DONE)
> +message to the server.
> +
> +Either when the vhost-pci driver fully accepted the offered feature bits or a
> +VHOST_PCI_SOCKET_FEATURE_BITS_ACK(ADD_DONE) message is received from the
> +client, the vhost-pci server puts a VHOST_PCI_CONTROLQ_UPDATE_DONE message on
> +the controlq, and a controlq interrupt is injected to the VM.

Why is VHOST_PCI_CONTROLQ_UPDATE_DONE needed?

> +
> +When the vhost-pci server receives a VHOST_PCI_SOCKET_DEVICE_INFO(ADD) message,
> +a VHOST_PCI_CONTROLQ_DEVICE_INFO(ADD) message is put on the controlq and a
> +controlq interrupt is injected to the VM.
> +
> +When the vhost-pci server receives a
> +VHOST_PCI_CONTROLQ_DEVICE_INFO_ACK(ADD_DONE) acknowledgement from the driver,
> +it sends a VHOST_PCI_SOCKET_DEVICE_INFO_ACK(ADD_DONE) message to the
> +corresponding client.
> +
> +4.5.1 Device Requirements: Device Initialization
> +To let a VM be capable of creating vhost-pci devices, a vhost-pci server MUST
> +be created when it boots.
> +
> +The vhost-pci server socket path SHOULD be provided to a virtio client socket
> +for the connection to the vhost-pci server.
> +
> +The virtio device MUST finish the feature bits negotiation with its driver
> +before negotiating them with the vhost-pci device.
> +
> +If the client receives a VHOST_PCI_SOCKET_FEATURE_BITS(feature bits) message,
> +it MUST reset the device to go into backwards capability mode, re-negotiate
> +the received feature bits with its driver, and send back a
> +VHOST_PCI_SOCKET_FEATURE_BITS_ACK(ADD_DONE) message to the server.
> +
> +In any cases that an acknowledgement from the vhost-pci driver indicates a
> +FAIL, the vhost-pci server SHOULD send a FAIL socket message to the client.
> +
> +In any cases that the msg_type is different between the sender and the
> +receiver, the receiver SHOULD acknowledge a FAIL to the sender or convert the
> +message to its version if the converted version is still functionally usable.
> +
> +4.5.2 Driver Requirements: Device Initialization
> +The vhost-pci driver MUST NOT accept any feature bits that are not offered by
> +the remote feature bits, and SHOULD acknowledge to the device of the accepted
> +feature bits by writing them to the vhost_pci_config fields.
> +
> +When the vhost-pci driver receives a VHOST_PCI_CONTROLQ_UPDATE_DONE message
> +from the controlq, the vhost-pci driver MUST initialize the corresponding
> +driver interface of the device_type if it has not been initialized, and add
> +the device_id to the frontend device list that records all the frontend virtio
> +devices being supported by vhost-pci for inter-VM communications.

Okay, I see how it is used here. But once the driver gets
VHOST_PCI_CONTROLQ_DEVICE_INFO(ADD), it knows how to communicate with the
virtio device on the other VM. Why do we postpone the initialization until it
gets VHOST_PCI_CONTROLQ_UPDATE_DONE?

> +
> +The vhost-pci driver SHOULD acknowledge to the device that the device and
> +memory info update (add or delete) is DONE or FAIL by writing the
> +acknowledgement (DONE or FAIL) to the vhost_pci_config fields.
> +
> +The vhost-pci driver MUST ensure that writing to the vhost_pci_config fields
> +to be atomic.
> +
> +4.6 Device Operation
> +4.6.1 Device Requirements: Device Operation
> +4.6.1.1 Frontend Device Info Update
> +When the frontend virtio device changes any info (e.g. device_id, virtq
> +address) that it has sent to the vhost-pci device, it SHOULD send a
> +VHOST_PCI_SOCKET_DEVICE_INFO(ADD) message, which contains the new device info,
> +to the vhost-pci server. The vhost-pci device SHOULD insert a
> +VHOST_PCI_CONTROLQ_DEVICE_INFO(ADD) to the controlq and inject a contrlq
> +interrupt to the VM.
> +
> +When the vhost-pci device receives a
> +VHOST_PCI_CONTROLQ_DEVICE_INFO_ACK(ADD_DONE) acknowledgement from the driver,
> +it SHOULD send a VHOST_PCI_SOCKET_DEVICE_INFO_ACK(ADD_DONE) message to the
> +client that is identified by the ack_device_type and ack_device_id fields, to
> +indicate that the vhost-pci driver has finished the handling of the device
> +info update.

If VHOST_PCI_CONTROLQ_UPDATE_DONE is really needed, you missed it here.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
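
The feature-bit rules quoted in this review (the driver MUST NOT accept bits that were not offered; the server answers with an ACK on full acceptance and with the accepted subset otherwise) can be sketched in a few lines of C. This is only an illustration under stated assumptions: the message type and ack values are copied from the quoted structures, while `struct feature_reply`, `driver_accept()` and `server_reply()` are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the quoted vhost_pci_socket_hdr / vhost_pci_socket_ack. */
#define VHOST_PCI_SOCKET_FEATURE_BITS      4
#define VHOST_PCI_SOCKET_FEATURE_BITS_ACK  5
#define VHOST_PCI_SOCKET_ACK_ADD_DONE      0

/* Hypothetical: the socket message the server would send to the client next. */
struct feature_reply {
	uint16_t msg_type; /* FEATURE_BITS or FEATURE_BITS_ACK */
	uint64_t payload;  /* accepted bits, or the ack value for an ACK */
};

/* Driver side: keep only the bits that were actually offered. */
static uint64_t driver_accept(uint64_t offered, uint64_t driver_supported)
{
	uint64_t accepted = offered & driver_supported;

	/* Postcondition of the MUST NOT rule: no bit outside the offer. */
	assert((accepted & ~offered) == 0);
	return accepted;
}

/* Server side: ACK on full acceptance, otherwise return the subset. */
static struct feature_reply server_reply(uint64_t offered, uint64_t accepted)
{
	struct feature_reply r;

	accepted &= offered; /* defensively ignore bits that were never offered */
	if (accepted == offered) {
		r.msg_type = VHOST_PCI_SOCKET_FEATURE_BITS_ACK;
		r.payload = VHOST_PCI_SOCKET_ACK_ADD_DONE;
	} else {
		r.msg_type = VHOST_PCI_SOCKET_FEATURE_BITS;
		r.payload = accepted;
	}
	return r;
}
```

With an offer of 0x7 and a driver that supports 0x5, `driver_accept()` yields 0x5 and `server_reply()` produces a FEATURE_BITS message carrying 0x5, which is the case where the client has to re-negotiate with its own driver.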
T24gV2VkIDYvMS8yMDE2IDQ6MTUgUE0sIFhpYW8gR3Vhbmdyb25nIHdyb3RlOg0KPiBPbiAwNS8y OS8yMDE2IDA0OjExIFBNLCBXZWkgV2FuZyB3cm90ZToNCj4gPiBTaWduZWQtb2ZmLWJ5OiBXZWkg V2FuZyA8d2VpLncud2FuZ0BpbnRlbC5jb20+DQo+ID4gLS0tDQo+ID4gICBEZXRhaWxzIHwgMzI0 DQo+ICsrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysr KysrKysrKysrKysNCj4gPiAgIDEgZmlsZSBjaGFuZ2VkLCAzMjQgaW5zZXJ0aW9ucygrKQ0KPiA+ ICAgY3JlYXRlIG1vZGUgMTAwNjQ0IERldGFpbHMNCj4gPg0KPiA+IGRpZmYgLS1naXQgYS9EZXRh aWxzIGIvRGV0YWlscw0KPiA+IG5ldyBmaWxlIG1vZGUgMTAwNjQ0DQo+ID4gaW5kZXggMDAwMDAw MC4uNGVhMjI1Mg0KPiA+IC0tLSAvZGV2L251bGwNCj4gPiArKysgYi9EZXRhaWxzDQo+ID4gQEAg LTAsMCArMSwzMjQgQEANCj4gPiArMSBEZXZpY2UgSUQNCj4gPiArVEJEDQo+ID4gKw0KPiA+ICsy IFZpcnRxdWV1ZXMNCj4gPiArMCBjb250cm9scQ0KPiA+ICsNCj4gPiArMyBGZWF0dXJlIEJpdHMN Cj4gPiArMy4xIExvY2FsIEZlYXR1cmUgQml0cw0KPiA+ICtDdXJyZW50bHkgbm8gbG9jYWwgZmVh dHVyZSBiaXRzIGFyZSBkZWZpbmVkLCBzbyB0aGUgc3RhbmRhcmQgdmlydGlvDQo+ID4gK2ZlYXR1 cmUgYml0cyBuZWdhdGlvbiB3aWxsIGFsd2F5cyBiZSBzdWNjZXNzZnVsIGFuZCBjb21wbGV0ZS4N Cj4gPiArDQo+ID4gKzMuMiBSZW1vdGUgRmVhdHVyZSBCaXRzDQo+ID4gK1RoZSByZW1vdGUgZmVh dHVyZSBiaXRzIGFyZSBvYnRhaW5lZCBmcm9tIHRoZSBmcm9udGVuZCB2aXJ0aW8gZGV2aWNlDQo+ ID4gK2FuZCBuZWdvdGlhdGVkIHdpdGggdGhlIHZob3N0LXBjaSBkcml2ZXIgdmlhIHRoZSBjb250 cm9scS4gVGhlDQo+ID4gK25lZ290aWF0aW9uIHN0ZXBzIGFyZSBkZXNjcmliZWQgaW4gNC41IERl dmljZSBJbml0aWFsaXphdGlvbi4NCj4gPiArDQo+ID4gKzQgRGV2aWNlIENvbmZpZ3VyYXRpb24g TGF5b3V0DQo+ID4gK3N0cnVjdCB2aG9zdF9wY2lfY29uZmlnIHsNCj4gPiArCSNkZWZpbmUgVkhP U1RfUENJX0NPTlRST0xRX01FTU9SWV9JTkZPX0FDSyAwDQo+ID4gKwkjZGVmaW5lIFZIT1NUX1BD SV9DT05UUk9MUV9ERVZJQ0VfSU5GT19BQ0sgMQ0KPiA+ICsJI2RlZmluZSBWSE9TVF9QQ0lfQ09O VFJPTFFfRkVBVFVSRV9CSVRTX0FDSyAyDQo+ID4gKwl1MzIgYWNrX3R5cGU7DQo+ID4gKwl1MzIg YWNrX2RldmljZV90eXBlOw0KPiA+ICsJdTY0IGFja19kZXZpY2VfaWQ7DQo+ID4gKwl1bmlvbiB7 DQo+ID4gKwkJI2RlZmluZSBWSE9TVF9QQ0lfQ09OVFJPTFFfQUNLX0FERF9ET05FIDANCj4gPiAr CQkjZGVmaW5lIFZIT1NUX1BDSV9DT05UUk9MUV9BQ0tfQUREX0ZBSUwgMQ0KPiA+ICsJCSNkZWZp 
bmUgVkhPU1RfUENJX0NPTlRST0xRX0FDS19ERUxfRE9ORSAyDQo+ID4gKwkJI2RlZmluZSBWSE9T VF9QQ0lfQ09OVFJPTFFfQUNLX0RFTF9GQUlMIDMNCj4gPiArCQl1NjQgYWNrX21lbW9yeV9pbmZv Ow0KPiA+ICsJCXU2NCBhY2tfZGV2aWNlX2luZm87DQo+ID4gKwkJdTY0IGFja19mZWF0dXJlX2Jp dHM7DQo+ID4gKwl9Ow0KPiA+ICt9Ow0KPiANCj4gRG8geW91IG5lZWQgdG8gd3JpdGUgYWxsIHRo ZXNlIDQgZmllbGQgdG8gYWNrIHRoZSBvcGVyYXRpb24/IEl0IHNlZW1zIGl0IGlzIG5vdA0KPiBl ZmZpY2llbnQgYW5kIGl0IGlzIG5vdCBmbGV4aWJsZSBpZiB0aGUgZHJpdmVyIG5lZWQgdG8gb2Zm ZXIgbW9yZSBkYXRhIHRvIHRoZSBkZXZpY2UNCj4gaW4gdGhlIGZ1cnRoZXIuIENhbiB3ZSBkZWRp Y2F0ZSBhIHZxIGZvciB0aGlzIHB1cnBvc2XvvJ8NCg0KWWVzLCBhbGwgdGhlIDQgZmllbGRzIGFy ZSByZXF1aXJlZCB0byBiZSB3cml0dGVuLiBUaGUgdmhvc3QtcGNpIHNlcnZlciB1c3VhbGx5IGNv bm5lY3RzIHRvIG11bHRpcGxlIGNsaWVudHMsIGFuZCB0aGUgImFja19kZXZpY2VfdHlwZSIgYW5k ICJhY2tfZGV2aWNlX2lkIiBmaWVsZHMgYXJlIHVzZWQgdG8gaWRlbnRpZnkgdGhlbS4NCg0KQWdy ZWUsIGFub3RoZXIgY29udHJvbHEgZm9yIHRoZSBndWVzdC0+aG9zdCBkaXJlY3Rpb24gbG9va3Mg YmV0dGVyLCBhbmQgdGhlIGFib3ZlIGZpbGVkcyBjYW4gYmUgY29udmVydGVkIHRvIGJlIHRoZSBj b250cm9scSBtZXNzYWdlIGhlYWRlci4NCg0KPiANCj4gQlRXLCBjdXJyZW50IGFwcHJvYWNoIGNh biBub3QgaGFuZGxlIHRoZSBjYXNlIGlmIHRoZXJlIGFyZSBtdWx0aXBsZSBzYW1lIGtpbmQNCj4g b2YgcmVxdWVzdHMgaW4gdGhlIGNvbnRyb2wgcXVldWUsIGUuZywgaWYgdGhlcmUgYXJlIHR3byBt ZW1vcnktYWRkIHJlcXVlc3QgaW4NCj4gdGhlIGNvbnRyb2wgcXVldWUuDQoNCkEgdmhvc3QtcGNp IGRldmljZSBjb3JyZXNwb25kcyB0byBhIGRyaXZlciBWTS4gVGhlIHR3byBtZW1vcnktYWRkIHJl cXVlc3RzIG9uIHRoZSBjb250cm9scSBhcmUgYWxsIGZvciB0aGUgc2FtZSBkcml2ZXIgVk0uICBN ZW1vcnktYWRkIHJlcXVlc3RzIGZvciBkaWZmZXJlbnQgZHJpdmVyIFZNcyAgY291bGRu4oCZdCBi ZSBwcmVzZW50IG9uIHRoZSBzYW1lIGNvbnRyb2xxLiBJIGhhdmVu4oCZdCBzZWVuIHRoZSBpc3N1 ZSB5ZXQuIENhbiB5b3UgcGxlYXNlIGV4cGxhaW4gbW9yZT8gVGhhbmtzLg0KDQoNCj4gPiArDQo+ ID4gK1RoZSBjb25maWd1cmF0aW9uIGZpZWxkcyBhcmUgY3VycmVudGx5IHVzZWQgZm9yIHRoZSB2 aG9zdC1wY2kgZHJpdmVyDQo+ID4gK3RvIGFja25vd2xlZGdlIHRvIHRoZSB2aG9zdC1wY2kgZGV2 aWNlIGFmdGVyIGl0IHJlY2VpdmVzIGNvbnRyb2xxIG1lc3NhZ2VzLg0KPiA+ICsNCj4gPiArNC41 
IERldmljZSBJbml0aWFsaXphdGlvbg0KPiA+ICtXaGVuIGEgZGV2aWNlIFZNIGJvb3RzLCBpdCBj cmVhdGVzIGEgdmhvc3QtcGNpIHNlcnZlciBzb2NrZXQuDQo+ID4gKw0KPiA+ICtXaGVuIGEgdmly dGlvIGRldmljZSBvbiB0aGUgZHJpdmVyIFZNIGlzIGNyZWF0ZWQgd2l0aCBzcGVjaWZ5aW5nIHRo ZQ0KPiA+ICt1c2Ugb2YgYSB2aG9zdC1wY2kgZGV2aWNlIGFzIGEgYmFja2VuZCwgYSBjbGllbnQg c29ja2V0IGlzIGNyZWF0ZWQNCj4gPiArYW5kIGNvbm5lY3RlZCB0byB0aGUgY29ycmVzcG9uZGlu ZyB2aG9zdC1wY2kgc2VydmVyIGZvciBtZXNzYWdlIGV4Y2hhbmdlcy4NCj4gPiArDQo+ID4gK1Ro ZSBtZXNzYWdlcyBwYXNzZWQgdG8gdGhlIHZob3N0LXBjaSBzZXJ2ZXIgaXMgcHJvY2VlZGVkIGJ5 IHRoZQ0KPiA+ICtmb2xsb3dpbmcNCj4gPiAraGVhZGVyOg0KPiA+ICtzdHJ1Y3Qgdmhvc3RfcGNp X3NvY2tldF9oZHIgew0KPiA+ICsJI2RlZmluZSBWSE9TVF9QQ0lfU09DS0VUX01FTU9SWV9JTkZP IDANCj4gPiArCSNkZWZpbmUgVkhPU1RfUENJX1NPQ0tFVF9NRU1PUllfSU5GT19BQ0sgMQ0KPiA+ ICsJI2RlZmluZSBWSE9TVF9QQ0lfU09DS0VUX0RFVklDRV9JTkZPIDINCj4gPiArCSNkZWZpbmUg VkhPU1RfUENJX1NPQ0tFVF9ERVZJQ0VfSU5GT19BQ0sgMw0KPiA+ICsJI2RlZmluZSBWSE9TVF9Q Q0lfU09DS0VUX0ZFQVRVUkVfQklUUyA0DQo+ID4gKwkjZGVmaW5lIFZIT1NUX1BDSV9TT0NLRVRf RkVBVFVSRV9CSVRTX0FDSyA1DQo+ID4gKwl1MTYgbXNnX3R5cGU7DQo+ID4gKwl1MTYgbXNnX3Zl cnNpb247DQo+ID4gKwl1MzIgbXNnX2xlbjsNCj4gPiArCXU2NCBxZW11X3BpZDsNCj4gPiArfTsN Cj4gPiArDQo+ID4gK1RoZSBwYXlsb2FkIG9mIHRoZSBhYm92ZSBtZXNzYWdlIHR5cGVzIGNhbiBi ZSBjb25zdHJ1Y3RlZCB1c2luZyB0aGUNCj4gPiArc3RydWN0dXJlcw0KPiA+ICtiZWxvdzoNCj4g PiArLyogVkhPU1RfUENJX1NPQ0tFVF9NRU1PUllfSU5GTyBtZXNzYWdlICovIHN0cnVjdA0KPiA+ ICt2aG9zdF9wY2lfc29ja2V0X21lbW9yeV9pbmZvIHsNCj4gPiArCSNkZWZpbmUgVkhPU1RfUENJ X0FERF9NRU1PUlkgMA0KPiA+ICsJI2RlZmluZSBWSE9TVF9QQ0lfREVMX01FTU9SWSAxDQo+ID4g Kwl1MTYgb3BzOw0KPiA+ICsJdTMyIG5yZWdpb25zOw0KPiA+ICsJc3RydWN0IHZob3N0X3BjaV9t ZW1vcnlfcmVnaW9uIHsNCj4gPiArCQlpbnQgZmQ7DQo+ID4gKwkJdTY0IGd1ZXN0X3BoeXNfYWRk cjsNCj4gPiArCQl1NjQgbWVtb3J5X3NpemU7DQo+ID4gKwkJdTY0IG1tYXBfb2Zmc2V0Ow0KPiA+ ICsJfSByZWdpb25zW1ZIT1NUX1BDSV9NQVhfTlJFR0lPTlNdOw0KPiA+ICt9Ow0KPiA+ICsNCj4g PiArLyogVkhPU1RfUENJX1NPQ0tFVF9ERVZJQ0VfSU5GTyBtZXNzYWdlICovIHN0cnVjdA0KPiA+ 
ICt2aG9zdF9wY2lfZGV2aWNlX2luZm8gew0KPiA+ICsJI2RlZmluZSBWSE9TVF9QQ0lfQUREX0ZS T05URU5EX0RFVklDRSAwDQo+ID4gKwkjZGVmaW5lIFZIT1NUX1BDSV9ERUxfRlJPTlRFTkRfREVW SUNFIDENCj4gPiArCXUxNiAgICBvcHM7DQo+ID4gKwl1MzIgICAgbnZpcnRxOw0KPiA+ICsJI2Rl ZmluZSBWSE9TVF9QQ0lfRlJPTlRFTkRfREVWSUNFX05FVCAxDQo+ID4gKwkjZGVmaW5lIFZIT1NU X1BDSV9GUk9OVEVORF9ERVZJQ0VfQkxLIDINCj4gPiArCSNkZWZpbmUgVkhPU1RfUENJX0ZST05U RU5EX0RFVklDRV9DT05TT0xFIDMNCj4gPiArCSNkZWZpbmUgVkhPU1RfUENJX0ZST05URU5EX0RF VklDRV9FTlRST1BZIDQNCj4gPiArCSNkZWZpbmUgVkhPU1RfUENJX0ZST05URU5EX0RFVklDRV9C QUxMT09OIDUNCj4gPiArCSNkZWZpbmUgVkhPU1RfUENJX0ZST05URU5EX0RFVklDRV9TQ1NJIDgN Cj4gPiArCXUzMiAgICBkZXZpY2VfdHlwZTsNCj4gPiArCXU2NCAgICBkZXZpY2VfaWQ7DQo+ID4g KwlzdHJ1Y3QgdmlydHEgZXhvdGljX3ZpcnRxW1ZIT1NUX1BDSV9NQVhfTlZJUlRRXTsNCj4gPiAr fTsNCj4gPiArVGhlIGRldmljZV9pZCBmaWVsZCBpZGVudGlmaWVzIHRoZSBkZXZpY2UuIEZvciBl eGFtcGxlLCBpdCBjYW4gYmUNCj4gPiArdXNlZCB0byBzdG9yZSBhIE1BQyBhZGRyZXNzIGlmIHRo ZSBkZXZpY2VfdHlwZSBpcw0KPiBWSE9TVF9QQ0lfRlJPTlRFTkRfREVWSUNFX05FVC4NCj4gPiAr DQo+ID4gKy8qIFZIT1NUX1BDSV9TT0NLRVRfRkVBVFVSRV9CSVRTIG1lc3NhZ2UqLyBzdHJ1Y3QN Cj4gPiArdmhvc3RfcGNpX2ZlYXR1cmVfYml0cyB7DQo+ID4gKwl1NjQgZmVhdHVyZV9iaXRzOw0K PiA+ICt9Ow0KPiANCj4gV2Ugbm90IG9ubHkgaGF2ZSAnc29ja2V0IGZlYXR1cmUgYml0cycgYnV0 IGFsc28gdGhlIGZlYXR1cmUgYml0cyBmb3IgcGVyIHZpcnRpbw0KPiBkZXZpY2UgcGx1Z2dlZCBp biBvbiB0aGUgc2lkZSBvZiB2aG9zdC1wY2kgZGV2aWNlLg0KDQpZZXMuIEl0IGlzIG1lbnRpb25l ZCBpbiAiMyBGZWF0dXJlIEJpdHMiLiBUaGUgc29ja2V0IGZlYXR1cmUgYml0cyBoZXJlIGFyZSBh Y3R1YWxseSB0aGUgcmVtb3RlIGZlYXR1cmUgYml0cyAoZ290IGZyb20gYSBzb2NrZXQgbWVzc2Fn ZSkuDQoNCj4gDQo+IEUuZzogaWYgdGhlcmUgYXJlIHR3byB2aXJ0aW8gZGV2aWNlcyAoZS5nLCBh IE5JQyBhbmQgQkxLKSBib3RoIG9mIHRoZW0gbmVlZCB0bw0KPiBkaXJlY3RseSBjb21tdW5pY2F0 ZSB3aXRoIGFub3RoZXIgVk0uIFRoZSBmZWF0dXJlIGJpdHMgb2YgdGhlc2UgdHdvIGRldmljZXMN Cj4gbmVlZCB0byBiZSBuZWdvdGlhdGVkIHdpdGggdGhhdCBWTSByZXNwZWN0aXZlbHkuIEFuZCB5 b3UgY2FuIG5vdCBwdXQgdGhlc2UNCj4gZmVhdHVyZSBiaXRzIGluIHZob3N0X3BjaV9kZXZpY2Vf 
aW5mbyBzdHJ1Y3QgYXMgaXRzIHZxIGlzIG5vdCBjcmVhdGVkIGF0IHRoYXQgdGltZS4NCg0KUmln aHQuIElmIHlvdSBjaGVjayB0aGUgaW5pdGlhbGl6YXRpb24gc3RlcHMgYmVsb3csIHRoZXJlIGlz IGEgc3RhdGVtZW50ICJXaGVuIHRoZSBkZXZpY2Ugc3RhdHVzIGlzIHVwZGF0ZWQgd2l0aCBEUklW RVJfT0siLg0KDQo+ID4gKw0KPiA+ICsvKiBWSE9TVF9QQ0lfU09DS0VUX3h4X0FDSyBtZXNzYWdl cyAqLyBzdHJ1Y3Qgdmhvc3RfcGNpX3NvY2tldF9hY2sgew0KPiA+ICsJI2RlZmluZSBWSE9TVF9Q Q0lfU09DS0VUX0FDS19BRERfRE9ORSAwDQo+ID4gKwkjZGVmaW5lIFZIT1NUX1BDSV9TT0NLRVRf QUNLX0FERF9GQUlMIDENCj4gPiArCSNkZWZpbmUgVkhPU1RfUENJX1NPQ0tFVF9BQ0tfREVMX0RP TkUgMg0KPiA+ICsJI2RlZmluZSBWSE9TVF9QQ0lfU09DS0VUX0FDS19ERUxfRkFJTCAzDQo+ID4g Kwl1NjQgYWNrOw0KPiA+ICt9Ow0KPiA+ICsNCj4gPiArVGhlIGRyaXZlciB1cGRhdGUgbWVzc2Fn ZSBwYXNzZWQgdmlhIHRoZSBjb250cm9scSBpcyBwcmVjZWRlZCBieSB0aGUNCj4gPiArZm9sbG93 aW5nDQo+ID4gK2hlYWRlcjoNCj4gPiArc3RydWN0IHZob3N0X3BjaV9jb250cm9scV9oZHIgew0K PiA+ICsJI2RlZmluZSBWSE9TVF9QQ0lfQ09OVFJPTFFfTUVNT1JZX0lORk8gMA0KPiA+ICsJI2Rl ZmluZSBWSE9TVF9QQ0lfQ09OVFJPTFFfREVWSUNFX0lORk8gMQ0KPiA+ICsJI2RlZmluZSBWSE9T VF9QQ0lfQ09OVFJPTFFfRkVBVFVSRV9CSVRTIDINCj4gPiArCSNkZWZpbmUgVkhPU1RfUENJX0NP TlRST0xRX1VQREFURV9ET05FIDMNCj4gPiArCXUxNiBtc2dfdHlwZTsNCj4gPiArCXUxNiBtc2df dmVyc2lvbjsNCj4gPiArCXUzMiBtc2dfbGVuOw0KPiA+ICt9Ow0KPiA+ICsNCj4gPiArVGhlIHBh eWxvYWQgb2YgYSBWSE9TVF9QQ0lfQ09OVFJPTFFfTUVNT1JZX0lORk8gbWVzc2FnZSBjYW4gYmUN Cj4gPiArY29uc3RydWN0ZWQgdXNpbmcgdGhlIGZvbGxvd2luZyBzdHJ1Y3R1cmU6DQo+ID4gKy8q IFZIT1NUX1BDSV9DT05UUk9MUV9NRU1PUllfSU5GTyBtZXNzYWdlICovIHN0cnVjdA0KPiA+ICt2 aG9zdF9wY2lfY29udHJvbHFfbWVtb3J5X2luZm8gew0KPiA+ICsJI2RlZmluZSBWSE9TVF9QQ0lf QUREX01FTU9SWSAwDQo+ID4gKwkjZGVmaW5lIFZIT1NUX1BDSV9ERUxfTUVNT1JZIDENCj4gPiAr CXUxNiAgb3BzOw0KPiA+ICsJdTMyIG5yZWdpb247DQo+ID4gKwlzdHJ1Y3QgZXhvdGljX21lbW9y eV9yZWdpb24gew0KPiA+ICsJCXU2NCAgIHJlZ2lvbl9iYXNlX3hncGE7DQo+ID4gKwkJdTY0ICAg c2l6ZTsNCj4gPiArCQl1NjQgICBvZmZzZXRfaW5fYmFyX2FyZWE7DQo+ID4gKwl9IHJlZ2lvbltW SE9TVF9QQ0lfTUFYX05SRUdJT05TXTsNCj4gPiArfTsNCj4gPiArDQo+ID4gK1RoZSBwYXlsb2Fk 
IG9mIFZIT1NUX1BDSV9DT05UUk9MUV9ERVZJQ0VfSU5GTyBhbmQNCj4gPiArVkhPU1RfUENJX0NP TlRST0xRX0ZFQVRVUkVfQklUUyBtZXNzYWdlcyBjYW4gYmUgY29uc3RydWN0ZWQgdXNpbmcNCj4g dGhlDQo+ID4gK3Zob3N0X3BjaV9kZXZpY2VfaW5mbyBzdHJ1Y3R1cmUgYW5kIHRoZSB2aG9zdF9w Y2lfZmVhdHVyZV9iaXRzDQo+ID4gK3N0cnVjdHVyZSByZXNwZWN0aXZlbHkuDQo+ID4gKw0KPiA+ ICtUaGUgcGF5bG9hZCBvZiBhIFZIT1NUX1BDSV9DT05UUk9MUV9VUERBVEVfRE9ORSBtZXNzYWdl IGNhbiBiZQ0KPiA+ICtjb25zdHJ1Y3RlZCB1c2luZyB0aGUgc3RydWN0dXJlIGJlbG93Og0KPiA+ ICtzdHJ1Y3Qgdmhvc3RfcGNpX2NvbnRyb2xxX3VwZGF0ZV9kb25lIHsNCj4gPiArCXUzMiAgICBk ZXZpY2VfdHlwZTsNCj4gPiArCXU2NCAgICBkZXZpY2VfaWQ7DQo+ID4gK307DQo+ID4gKw0KPiA+ ICtGaWcuIDEgc2hvd3MgdGhlIGluaXRpYWxpemF0aW9uIHN0ZXBzLg0KPiA+ICsNCj4gPiArV2hl biB0aGUgdmhvc3QtcGNpIHNlcnZlciByZWNlaXZlcyBhDQo+ID4gK1ZIT1NUX1BDSV9TT0NLRVRf TUVNT1JZX0lORk8oQUREKSBtZXNzYWdlLCBpdCBjaGVja3MgaWYgYSB2aG9zdC1wY2kNCj4gPiAr ZGV2aWNlIGhhcyBiZWVuIGNyZWF0ZWQgZm9yIHRoZSByZXF1ZXN0aW5nIFZNIHdob3NlIFFFTVUg cHJvY2VzcyBpZA0KPiA+ICtpcyBxZW11X3BpZC4gSWYgeWVzLCBpdCB3aWxsIHNpbXBseSB1cGRh dGUgdGhlIHN1YnNlcXVlbnQgcmVjZWl2ZWQNCj4gPiArbWVzc2FnZXMgdG8gdGhlIHZob3N0LXBj aSBkcml2ZXIgdmlhIHRoZSBjb250cm9scS4gT3RoZXJ3aXNlLCB0aGUNCj4gPiArc2VydmVyIGNy ZWF0ZXMgYSBuZXcgdmhvc3QtcGNpIGRldmljZSwgYW5kIGNvbnRpbnVlcyB0aGUgZm9sbG93aW5n DQo+IGluaXRpYWxpemF0aW9uIHN0ZXBzLg0KPiANCj4gDQo+IHFlbXUtcGlkIGlzIG5vdCBzdGFi bGUgYXMgdGhlIGV4aXN0aW5nIFZNIHdpbGwgYmUga2lsbGVkIHNpbGVudGx5IGFuZCB0aGUgbmV3 DQo+IHZob3N0LXBjaSBkcml2ZXIgcmV1c2luZyB0aGUgc2FtZSBxZW11LXBpZCB3aWxsIGFzayB0 byBqb2luIGJlZm9yZSB0aGUgdmhvc3QtDQo+IGRldmljZSBnZXRzIHRvIGtub3cgdGhlIHByZXZp b3VzIG9uZSBoYXMgZ29uZS4NCg0KV291bGQgaXQgYmUgYSBub3JtYWwgYW5kIGxlZ2FsIG9wZXJh dGlvbiB0byBzaWxlbnRseSBraWxsIGEgUUVNVT8gSSBndWVzcyBvbmx5IHRoZSBzeXN0ZW0gYWRt aW4gY2FuIGRvIHRoYXQsIHJpZ2h0Pw0KDQpJZiB0aGF0J3MgdHJ1ZSwgSSB0aGluayB3ZSBjYW4g YWRkIGEgbmV3IGZpZWxkLCAidTY0IHRzY19vZl9iaXJ0aCIgdG8gdGhlIHZob3N0X3BjaV9zb2Nr ZXRfaGRyIHN0cnVjdHVyZS4gSXQgcmVjb3JkcyB0aGUgdHNjIHdoZW4gdGhlIFFFTVUgaXMgY3Jl 
YXRlZC4gDQpJZiB0aGF0J3MgdHJ1ZSwgYW5vdGhlciBwcm9ibGVtIHdvdWxkIGJlIHRoZSByZW1v dmUgb2YgdGhlIHZob3N0LXBjaSBkZXZpY2UgZm9yIGEgc2lsZW50bHkga2lsbGVkIGRyaXZlciBW TS4NClRoZSB2aG9zdC1wY2kgc2VydmVyIG1heSBuZWVkIHRvIHBlcmlvZGljYWxseSBzZW5kIGEg Y2hlY2tpbmcgbWVzc2FnZSB0byBjaGVjayBpZiB0aGUgZHJpdmVyIFZNIGlzIHNpbGVudGx5IGtp bGxlZC4gSWYgdGhhdCByZWFsbHkgaGFwcGVucywgaXQgc2hvdWxkIHJlbW92ZSB0aGUgcmVsYXRl ZCB2aG9zdC1wY2kgZGV2aWNlLg0KIA0KPiA+ICsNCj4gPiArVGhlIHZob3N0LXBjaSBzZXJ2ZXIg YWRkcyB1cCBhbGwgdGhlIG1lbW9yeSByZWdpb24gc2l6ZSwgYW5kIHVzZXMgYQ0KPiA+ICs2NC1i aXQgZGV2aWNlIGJhciBmb3IgdGhlIG1hcHBpbmcgb2YgYWxsIHRoZSBtZW1vcnkgcmVnaW9ucyBv YnRhaW5lZA0KPiA+ICtmcm9tIHRoZSBzb2NrZXQgbWVzc2FnZS4gVG8gYmV0dGVyIHN1cHBvcnQg bWVtb3J5IGhvdC1wbHVnZ2luZyBvZiB0aGUNCj4gPiArZHJpdmVyIFZNLCB0aGUgYmFyIGlzIGNv bmZpZ3VyZWQgd2l0aCBhIGRvdWJsZSBzaXplIG9mIHRoZSBkcml2ZXINCj4gPiArVk0ncyBtZW1v cnkuIFRoZSBzZXJ2ZXIgbWFwcyB0aGUgcmVjZWl2ZWQgbWVtb3J5IGluZm8gdmlhIHRoZSBRRU1V DQo+ID4gK01lbW9yeVJlZ2lvbiBtZWNoYW5pc20sIGFuZCB0aGVuIHRoZSBuZXcgY3JlYXRlZCB2 aG9zdC1wY2kgZGV2aWNlIGlzDQo+IGhvdC1wbHVnZ2VkIHRvIHRoZSBWTS4NCj4gPiArDQo+ID4g K1doZW4gdGhlIGRldmljZSBzdGF0dXMgaXMgdXBkYXRlZCB3aXRoIERSSVZFUl9PSywgYQ0KPiA+ ICtWSE9TVF9QQ0lfQ09OVFJPTFFfTUVNT1JZX0lORk8oQUREKSBtZXNzYWdlLCB3aGljaCBpcyBz dGVtZWQNCj4gZnJvbSB0aGUNCj4gPiArbWVtb3J5IGluZm8gc29ja2V0IG1lc3NhZ2UsIGlzIHB1 dCBvbiB0aGUgY29udHJvbHEgYW5kIGEgY29udHJvbHENCj4gPiAraW50ZXJydXB0IGlzIGluamVj dGVkIHRvIHRoZSBWTS4NCj4gPiArDQo+ID4gK1doZW4gdGhlIHZob3N0LXBjaSBzZXJ2ZXIgcmVj ZWl2ZXMgYQ0KPiA+ICtWSE9TVF9QQ0lfQ09OVFJPTFFfTUVNT1JZX0lORk9fQUNLKEFERF9ET05F KQ0KPiBhY2tub3dsZWRnZW1lbnQgZnJvbSB0aGUNCj4gPiArZHJpdmVyLCBpdCBzZW5kcyBhIFZI T1NUX1BDSV9TT0NLRVRfTUVNT1JZX0lORk9fQUNLKEFERF9ET05FKQ0KPiBtZXNzYWdlDQo+ID4g K3RvIHRoZSBjbGllbnQgdGhhdCBpcyBpZGVudGlmaWVkIGJ5IHRoZSBhY2tfZGV2aWNlX3R5cGUg YW5kIGFja19kZXZpY2VfaWQgZmllbGRzLg0KPiA+ICsNCj4gPiArV2hlbiB0aGUgdmhvc3QtcGNp IHNlcnZlciByZWNlaXZlcyBhDQo+ID4gK1ZIT1NUX1BDSV9TT0NLRVRfRkVBVFVSRV9CSVRTKGZl 
YXR1cmUgYml0cykgbWVzc2FnZSwgYQ0KPiA+ICtWSE9TVF9QQ0lfQ09OVFJPTFFfRkVBVFVSRV9C SVRTKGZlYXR1cmUgYml0cykgbWVzc2FnZSBpcyBwdXQgb24gdGhlDQo+ID4gK2NvbnRyb2xxIGFu ZCBhIGNvbnRyb2xxIGludGVycnVwdCBpcyBpbmplY3RlZCB0byB0aGUgVk0uDQo+ID4gKw0KPiA+ ICtJZiB0aGUgdmhvc3QtcGNpIHNlcnZlciBub3RpY2VzIHRoYXQgdGhlIGRyaXZlciBmdWxseSBh Y2NlcHRlZCB0aGUNCj4gPiArb2ZmZXJlZCBmZWF0dXJlIGJpdHMsIGl0IHNlbmRzIGENCj4gPiAr VkhPU1RfUENJX1NPQ0tFVF9GRUFUVVJFX0JJVFNfQUNLKEFERF9ET05FKSBtZXNzYWdlIHRvIHRo ZSBjbGllbnQuDQo+IElmDQo+ID4gK3RoZSB2aG9zdC1wY2kgc2VydmVyIG5vdGljZXMgdGhhdCB0 aGUgdmhvc3QtcGNpIGRyaXZlciBvbmx5IGFjY2VwdGVkDQo+ID4gK2Egc3Vic2V0IG9mIHRoZSBv ZmZlcmVkIGZlYXR1cmUgYml0cywgaXQgc2VuZHMgYQ0KPiA+ICtWSE9TVF9QQ0lfU09DS0VUX0ZF QVRVUkVfQklUUyhhY2NlcHRlZCBmZWF0dXJlIGJpdHMpIG1lc3NhZ2UgYmFjayB0bw0KPiA+ICt0 aGUgY2xpZW50LiBUaGUgY2xpZW50IHNpZGUgdmlydGlvIGRldmljZSByZS1uZWdvdGlhdGVzIHRo ZSBuZXcNCj4gPiArZmVhdHVyZSBiaXRzIHdpdGggaXRzIGRyaXZlciwgYW5kIHNlbmRzIGJhY2sg YQ0KPiA+ICtWSE9TVF9QQ0lfU09DS0VUX0ZFQVRVUkVfQklUU19BQ0soQUREX0RPTkUpDQo+ID4g K21lc3NhZ2UgdG8gdGhlIHNlcnZlci4NCj4gPiArDQo+ID4gK0VpdGhlciB3aGVuIHRoZSB2aG9z dC1wY2kgZHJpdmVyIGZ1bGx5IGFjY2VwdGVkIHRoZSBvZmZlcmVkIGZlYXR1cmUNCj4gPiArYml0 cyBvciBhDQo+ID4gK1ZIT1NUX1BDSV9TT0NLRVRfRkVBVFVSRV9CSVRTX0FDSyhBRERfRE9ORSkg bWVzc2FnZSBpcyByZWNlaXZlZA0KPiBmcm9tDQo+ID4gK3RoZSBjbGllbnQsIHRoZSB2aG9zdC1w Y2kgc2VydmVyIHB1dHMgYQ0KPiA+ICtWSE9TVF9QQ0lfQ09OVFJPTFFfVVBEQVRFX0RPTkUgbWVz c2FnZSBvbiB0aGUgY29udHJvbHEsIGFuZCBhDQo+IGNvbnRyb2xxIGludGVycnVwdCBpcyBpbmpl Y3RlZCB0byB0aGUgVk0uDQo+IA0KPiBXaHkgVkhPU1RfUENJX0NPTlRST0xRX1VQREFURV9ET05F IGlzIG5lZWRlZD8NCg0KT0ssIHRoaXMgb25lIGxvb2tzIHJlZHVuZGFudC4gV2UgY2FuIHNldCB1 cCB0aGUgcmVsYXRlZCBzdXBwb3J0IGZvciB0aGF0IGZyb250ZW5kIGRldmljZSB3aGVuIHRoZSBk ZXZpY2UgaW5mbyBpcyByZWNlaXZlZCB2aWEgdGhlIGNvbnRyb2xxLg0KDQpCZXN0LA0KV2VpDQog DQo+ID4gKw0KPiA+ICtXaGVuIHRoZSB2aG9zdC1wY2kgc2VydmVyIHJlY2VpdmVzIGENCj4gPiAr VkhPU1RfUENJX1NPQ0tFVF9ERVZJQ0VfSU5GTyhBREQpIG1lc3NhZ2UsIGENCj4gPiArVkhPU1Rf 
PCI_CONTROLQ_DEVICE_INFO(ADD) message is put on the controlq
> and a controlq interrupt is injected to the VM.
> > +
> > +When the vhost-pci server receives a
> > +VHOST_PCI_CONTROLQ_DEVICE_INFO_ACK(ADD_DONE) acknowledgement
> from the
> > +driver, it sends a VHOST_PCI_SOCKET_DEVICE_INFO_ACK(ADD_DONE)
> message
> > +to the corresponding client.
> > +
> > +4.5.1 Device Requirements: Device Initialization To let a VM be
> > +capable of creating vhost-pci devices, a vhost-pci server MUST be
> > +created when it boots.
> > +
> > +The vhost-pci server socket path SHOULD be provided to a virtio
> > +client socket for the connection to the vhost-pci server.
> > +
> > +The virtio device MUST finish the feature bits negotiation with its
> > +driver before negotiating them with the vhost-pci device.
> > +
> > +If the client receives a VHOST_PCI_SOCKET_FEATURE_BITS(feature bits)
> > +message, it MUST reset the device to go into backwards capability
> > +mode, re-negotiate the received feature bits with its driver, and
> > +send back a
> > +VHOST_PCI_SOCKET_FEATURE_BITS_ACK(ADD_DONE) message to the server.
> > +
> > +In any cases that an acknowledgement from the vhost-pci driver
> > +indicates a FAIL, the vhost-pci server SHOULD send a FAIL socket message to
> the client.
> > +
> > +In any cases that the msg_type is different between the sender and
> > +the receiver, the receiver SHOULD acknowledge a FAIL to the sender or
> > +convert the message to its version if the converted version is still functionally
> usable.
> > +
> > +4.5.2 Driver Requirements: Device Initialization The vhost-pci driver
> > +MUST NOT accept any feature bits that are not offered by the remote
> > +feature bits, and SHOULD acknowledge to the device of the accepted
> > +feature bits by writing them to the vhost_pci_config fields.
> > +
> > +When the vhost-pci driver receives a
> VHOST_PCI_CONTROLQ_UPDATE_DONE
> > +message from the controlq, the vhost-pci driver MUST initialize the
> > +corresponding driver interface of the device_type if it has not been
> > +initialized, and add the device_id to the frontend device list that
> > +records all the frontend virtio devices being supported by vhost-pci for inter-VM communications.
>
> Okay, i saw how to use it here. But, once the driver gets
> VHOST_PCI_CONTROLQ_DEVICE_INFO(ADD) then it knows how to
> communicate with the virtio device on another VM. Why we postpone the
> initialize until it gets VHOST_PCI_CONTROLQ_UPDATE_DONE?
>
> > +
> > +The vhost-pci driver SHOULD acknowledge to the device that the device
> > +and memory info update (add or delete) is DONE or FAIL by writing the
> > +acknowledgement (DONE or FAIL) to the vhost_pci_config fields.
> > +
> > +The vhost-pci driver MUST ensure that writing to the vhost_pci_config
> > +fields to be atomic.
> > +
> > +4.6 Device Operation
> > +4.6.1 Device Requirements: Device Operation
> > +4.6.1.1 Frontend Device Info Update
> > +When the frontend virtio device changes any info (e.g. device_id,
> > +virtq
> > +address) that it has sent to the vhost-pci device, it SHOULD send a
> > +VHOST_PCI_SOCKET_DEVICE_INFO(ADD) message, which contains the new
> > +device info, to the vhost-pci server. The vhost-pci device SHOULD
> > +insert a
> > +VHOST_PCI_CONTROLQ_DEVICE_INFO(ADD) to the controlq and inject a
> > +contrlq interrupt to the VM.
> > +
> > +When the vhost-pci device receives a
> > +VHOST_PCI_CONTROLQ_DEVICE_INFO_ACK(ADD_DONE) acknowledgement
> from the
> > +driver, it SHOULD send a
> VHOST_PCI_SOCKET_DEVICE_INFO_ACK(ADD_DONE)
> > +message to the client that is identified by the ack_device_type and
> > +ack_device_id fields, to indicate that the vhost-pci driver has
> > +finished the handling of the device info update.
>
> If VHOST_PCI_CONTROLQ_UPDATE_DONE is really needed, you missed it here.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
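The driver requirement in 4.5.2 (the driver must not accept feature bits that were not offered) and the "fully accepted" versus "subset accepted" branches of the initialization flow can be sketched as plain bitmask operations. This is only a minimal illustration: the helper names and the driver_supported mask are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical driver-side helper: accept only the bits that the remote
 * (frontend) device offered AND that the vhost-pci driver supports. */
static uint64_t vhost_pci_accept_features(uint64_t offered,
                                          uint64_t driver_supported)
{
    return offered & driver_supported;
}

/* Hypothetical device-side check: reject an acknowledgement that claims
 * bits which were never offered. */
static bool vhost_pci_ack_is_valid(uint64_t offered, uint64_t accepted)
{
    return (accepted & ~offered) == 0;
}

/* "Fully accepted" means the accepted set equals the offered set. In the
 * flow described in 4.5, a strict subset would instead be sent back to the
 * client for re-negotiation. */
static bool vhost_pci_fully_accepted(uint64_t offered, uint64_t accepted)
{
    return accepted == offered;
}
```

For example, with offered = 0x7 and driver_supported = 0x5 the driver would acknowledge 0x5, which is valid but not a full acceptance, so the server would take the re-negotiation branch.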
On 06/02/2016 11:15 AM, Wang, Wei W wrote: > On Wed 6/1/2016 4:15 PM, Xiao Guangrong wrote: >> On 05/29/2016 04:11 PM, Wei Wang wrote: >>> Signed-off-by: Wei Wang <wei.w.wang@intel.com> >>> --- >>> Details | 324 >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>> 1 file changed, 324 insertions(+) >>> create mode 100644 Details >>> >>> diff --git a/Details b/Details >>> new file mode 100644 >>> index 0000000..4ea2252 >>> --- /dev/null >>> +++ b/Details >>> @@ -0,0 +1,324 @@ >>> +1 Device ID >>> +TBD >>> + >>> +2 Virtqueues >>> +0 controlq >>> + >>> +3 Feature Bits >>> +3.1 Local Feature Bits >>> +Currently no local feature bits are defined, so the standard virtio >>> +feature bits negation will always be successful and complete. >>> + >>> +3.2 Remote Feature Bits >>> +The remote feature bits are obtained from the frontend virtio device >>> +and negotiated with the vhost-pci driver via the controlq. The >>> +negotiation steps are described in 4.5 Device Initialization. >>> + >>> +4 Device Configuration Layout >>> +struct vhost_pci_config { >>> + #define VHOST_PCI_CONTROLQ_MEMORY_INFO_ACK 0 >>> + #define VHOST_PCI_CONTROLQ_DEVICE_INFO_ACK 1 >>> + #define VHOST_PCI_CONTROLQ_FEATURE_BITS_ACK 2 >>> + u32 ack_type; >>> + u32 ack_device_type; >>> + u64 ack_device_id; >>> + union { >>> + #define VHOST_PCI_CONTROLQ_ACK_ADD_DONE 0 >>> + #define VHOST_PCI_CONTROLQ_ACK_ADD_FAIL 1 >>> + #define VHOST_PCI_CONTROLQ_ACK_DEL_DONE 2 >>> + #define VHOST_PCI_CONTROLQ_ACK_DEL_FAIL 3 >>> + u64 ack_memory_info; >>> + u64 ack_device_info; >>> + u64 ack_feature_bits; >>> + }; >>> +}; >> >> Do you need to write all these 4 field to ack the operation? It seems it is not >> efficient and it is not flexible if the driver need to offer more data to the device >> in the further. Can we dedicate a vq for this purpose? > > Yes, all the 4 fields are required to be written. 
The vhost-pci server usually connects to multiple clients, and the "ack_device_type" and "ack_device_id" fields are used to identify them.
>
> Agree, another controlq for the guest->host direction looks better, and the above fileds can be converted to be the controlq message header.
>

Thanks.

>>
>> BTW, current approach can not handle the case if there are multiple same kind
>> of requests in the control queue, e.g, if there are two memory-add request in
>> the control queue.
>
> A vhost-pci device corresponds to a driver VM. The two memory-add requests on the controlq are all for the same driver VM. Memory-add requests for different driver VMs couldn’t be present on the same controlq. I haven’t seen the issue yet. Can you please explain more? Thanks.

The issue is caused precisely by "The two memory-add requests on the controlq are all for the same driver VM": the driver needs to ACK each of these requests separately. However, the two requests carry the same ack_type, device_type, device_id and ack_memory_info, so QEMU is not able to figure out which request has been acked.

>
>
>>> +
>>> +The configuration fields are currently used for the vhost-pci driver
>>> +to acknowledge to the vhost-pci device after it receives controlq messages.
>>> +
>>> +4.5 Device Initialization
>>> +When a device VM boots, it creates a vhost-pci server socket.
>>> +
>>> +When a virtio device on the driver VM is created with specifying the
>>> +use of a vhost-pci device as a backend, a client socket is created
>>> +and connected to the corresponding vhost-pci server for message exchanges.
>>> + >>> +The messages passed to the vhost-pci server is proceeded by the >>> +following >>> +header: >>> +struct vhost_pci_socket_hdr { >>> + #define VHOST_PCI_SOCKET_MEMORY_INFO 0 >>> + #define VHOST_PCI_SOCKET_MEMORY_INFO_ACK 1 >>> + #define VHOST_PCI_SOCKET_DEVICE_INFO 2 >>> + #define VHOST_PCI_SOCKET_DEVICE_INFO_ACK 3 >>> + #define VHOST_PCI_SOCKET_FEATURE_BITS 4 >>> + #define VHOST_PCI_SOCKET_FEATURE_BITS_ACK 5 >>> + u16 msg_type; >>> + u16 msg_version; >>> + u32 msg_len; >>> + u64 qemu_pid; >>> +}; >>> + >>> +The payload of the above message types can be constructed using the >>> +structures >>> +below: >>> +/* VHOST_PCI_SOCKET_MEMORY_INFO message */ struct >>> +vhost_pci_socket_memory_info { >>> + #define VHOST_PCI_ADD_MEMORY 0 >>> + #define VHOST_PCI_DEL_MEMORY 1 >>> + u16 ops; >>> + u32 nregions; >>> + struct vhost_pci_memory_region { >>> + int fd; >>> + u64 guest_phys_addr; >>> + u64 memory_size; >>> + u64 mmap_offset; >>> + } regions[VHOST_PCI_MAX_NREGIONS]; >>> +}; >>> + >>> +/* VHOST_PCI_SOCKET_DEVICE_INFO message */ struct >>> +vhost_pci_device_info { >>> + #define VHOST_PCI_ADD_FRONTEND_DEVICE 0 >>> + #define VHOST_PCI_DEL_FRONTEND_DEVICE 1 >>> + u16 ops; >>> + u32 nvirtq; >>> + #define VHOST_PCI_FRONTEND_DEVICE_NET 1 >>> + #define VHOST_PCI_FRONTEND_DEVICE_BLK 2 >>> + #define VHOST_PCI_FRONTEND_DEVICE_CONSOLE 3 >>> + #define VHOST_PCI_FRONTEND_DEVICE_ENTROPY 4 >>> + #define VHOST_PCI_FRONTEND_DEVICE_BALLOON 5 >>> + #define VHOST_PCI_FRONTEND_DEVICE_SCSI 8 >>> + u32 device_type; >>> + u64 device_id; >>> + struct virtq exotic_virtq[VHOST_PCI_MAX_NVIRTQ]; >>> +}; >>> +The device_id field identifies the device. For example, it can be >>> +used to store a MAC address if the device_type is >> VHOST_PCI_FRONTEND_DEVICE_NET. 
>>> +
>>> +/* VHOST_PCI_SOCKET_FEATURE_BITS message*/ struct
>>> +vhost_pci_feature_bits {
>>> +        u64 feature_bits;
>>> +};
>>
>> We not only have 'socket feature bits' but also the feature bits for per virtio
>> device plugged in on the side of vhost-pci device.
>
> Yes. It is mentioned in "3 Feature Bits". The socket feature bits here are actually the remote feature bits (got from a socket message).

Hmm, there are two issues:
1) The device-related info (e.g. device-type, device-id, etc.) is not included
   there, so the vhost-pci driver cannot write proper values to the fields of
   vhost_pci_config.
2) Different virtio devices may have different feature bits, so
   VHOST_PCI_SOCKET_FEATURE_BITS and VHOST_PCI_CONTROLQ_FEATURE_BITS should be
   able to negotiate the feature bits of each virtio device separately.

>
>>
>> E.g: if there are two virtio devices (e.g, a NIC and BLK) both of them need to
>> directly communicate with another VM. The feature bits of these two devices
>> need to be negotiated with that VM respectively. And you can not put these
>> feature bits in vhost_pci_device_info struct as its vq is not created at that time.
>
> Right. If you check the initialization steps below, there is a statement "When the device status is updated with DRIVER_OK".
> >>> + >>> +/* VHOST_PCI_SOCKET_xx_ACK messages */ struct vhost_pci_socket_ack { >>> + #define VHOST_PCI_SOCKET_ACK_ADD_DONE 0 >>> + #define VHOST_PCI_SOCKET_ACK_ADD_FAIL 1 >>> + #define VHOST_PCI_SOCKET_ACK_DEL_DONE 2 >>> + #define VHOST_PCI_SOCKET_ACK_DEL_FAIL 3 >>> + u64 ack; >>> +}; >>> + >>> +The driver update message passed via the controlq is preceded by the >>> +following >>> +header: >>> +struct vhost_pci_controlq_hdr { >>> + #define VHOST_PCI_CONTROLQ_MEMORY_INFO 0 >>> + #define VHOST_PCI_CONTROLQ_DEVICE_INFO 1 >>> + #define VHOST_PCI_CONTROLQ_FEATURE_BITS 2 >>> + #define VHOST_PCI_CONTROLQ_UPDATE_DONE 3 >>> + u16 msg_type; >>> + u16 msg_version; >>> + u32 msg_len; >>> +}; >>> + >>> +The payload of a VHOST_PCI_CONTROLQ_MEMORY_INFO message can be >>> +constructed using the following structure: >>> +/* VHOST_PCI_CONTROLQ_MEMORY_INFO message */ struct >>> +vhost_pci_controlq_memory_info { >>> + #define VHOST_PCI_ADD_MEMORY 0 >>> + #define VHOST_PCI_DEL_MEMORY 1 >>> + u16 ops; >>> + u32 nregion; >>> + struct exotic_memory_region { >>> + u64 region_base_xgpa; >>> + u64 size; >>> + u64 offset_in_bar_area; >>> + } region[VHOST_PCI_MAX_NREGIONS]; >>> +}; >>> + >>> +The payload of VHOST_PCI_CONTROLQ_DEVICE_INFO and >>> +VHOST_PCI_CONTROLQ_FEATURE_BITS messages can be constructed using >> the >>> +vhost_pci_device_info structure and the vhost_pci_feature_bits >>> +structure respectively. >>> + >>> +The payload of a VHOST_PCI_CONTROLQ_UPDATE_DONE message can be >>> +constructed using the structure below: >>> +struct vhost_pci_controlq_update_done { >>> + u32 device_type; >>> + u64 device_id; >>> +}; >>> + >>> +Fig. 1 shows the initialization steps. >>> + >>> +When the vhost-pci server receives a >>> +VHOST_PCI_SOCKET_MEMORY_INFO(ADD) message, it checks if a vhost-pci >>> +device has been created for the requesting VM whose QEMU process id >>> +is qemu_pid. If yes, it will simply update the subsequent received >>> +messages to the vhost-pci driver via the controlq. 
Otherwise, the
>>> +server creates a new vhost-pci device, and continues the following
>> initialization steps.
>>
>>
>> qemu-pid is not stable as the existing VM will be killed silently and the new
>> vhost-pci driver reusing the same qemu-pid will ask to join before the vhost-
>> device gets to know the previous one has gone.
>
> Would it be a normal and legal operation to silently kill a QEMU? I guess only the system admin can do that, right?

Yup, it is a valid operation: a problematic VM may be killed and, as you said, the admin can kill it silently. In any case, the design should have a way to handle this properly.

>
> If that's true, I think we can add a new field, "u64 tsc_of_birth" to the vhost_pci_socket_hdr structure. It records the tsc when the QEMU is created.

You only need to identify a VM, so could a UUID be used instead?

> If that's true, another problem would be the remove of the vhost-pci device for a silently killed driver VM.
> The vhost-pci server may need to periodically send a checking message to check if the driver VM is silently killed. If that really happens, it should remove the related vhost-pci device.

Yes.

>
>>> +
>>> +The vhost-pci server adds up all the memory region size, and uses a
>>> +64-bit device bar for the mapping of all the memory regions obtained
>>> +from the socket message. To better support memory hot-plugging of the
>>> +driver VM, the bar is configured with a double size of the driver
>>> +VM's memory. The server maps the received memory info via the QEMU
>>> +MemoryRegion mechanism, and then the new created vhost-pci device is
>> hot-plugged to the VM.
>>> +
>>> +When the device status is updated with DRIVER_OK, a
>>> +VHOST_PCI_CONTROLQ_MEMORY_INFO(ADD) message, which is stemed
>> from the
>>> +memory info socket message, is put on the controlq and a controlq
>>> +interrupt is injected to the VM.
>>> +
>>> +When the vhost-pci server receives a
>>> +VHOST_PCI_CONTROLQ_MEMORY_INFO_ACK(ADD_DONE)
>> acknowledgement from the
>>> +driver, it sends a VHOST_PCI_SOCKET_MEMORY_INFO_ACK(ADD_DONE)
>> message
>>> +to the client that is identified by the ack_device_type and ack_device_id fields.
>>> +
>>> +When the vhost-pci server receives a
>>> +VHOST_PCI_SOCKET_FEATURE_BITS(feature bits) message, a
>>> +VHOST_PCI_CONTROLQ_FEATURE_BITS(feature bits) message is put on the
>>> +controlq and a controlq interrupt is injected to the VM.
>>> +
>>> +If the vhost-pci server notices that the driver fully accepted the
>>> +offered feature bits, it sends a
>>> +VHOST_PCI_SOCKET_FEATURE_BITS_ACK(ADD_DONE) message to the client.
>> If
>>> +the vhost-pci server notices that the vhost-pci driver only accepted
>>> +a subset of the offered feature bits, it sends a
>>> +VHOST_PCI_SOCKET_FEATURE_BITS(accepted feature bits) message back to
>>> +the client. The client side virtio device re-negotiates the new
>>> +feature bits with its driver, and sends back a
>>> +VHOST_PCI_SOCKET_FEATURE_BITS_ACK(ADD_DONE)
>>> +message to the server.
>>> +
>>> +Either when the vhost-pci driver fully accepted the offered feature
>>> +bits or a
>>> +VHOST_PCI_SOCKET_FEATURE_BITS_ACK(ADD_DONE) message is received
>> from
>>> +the client, the vhost-pci server puts a
>>> +VHOST_PCI_CONTROLQ_UPDATE_DONE message on the controlq, and a
>> controlq interrupt is injected to the VM.
>>
>> Why VHOST_PCI_CONTROLQ_UPDATE_DONE is needed?
>
> OK, this one looks redundant. We can set up the related support for that frontend device when the device info is received via the controlq.
>

Great.
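To make the UUID suggestion above concrete, here is a minimal sketch of how a vhost-pci server might key its per-VM state on a 16-byte UUID instead of qemu_pid, so that a new QEMU that happens to reuse a pid is not mistaken for the killed one. Everything here (the table, MAX_DRIVER_VMS, the function names) is a hypothetical illustration, not part of the patch, and it assumes the client supplies the UUID in the socket header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_DRIVER_VMS 8            /* hypothetical limit for this sketch */

struct vm_slot {
    uint8_t uuid[16];               /* identity supplied by the client */
    int     in_use;
};

static struct vm_slot slots[MAX_DRIVER_VMS];

/* Return the slot index of the vhost-pci device serving this VM, or -1. */
static int vm_lookup(const uint8_t uuid[16])
{
    for (int i = 0; i < MAX_DRIVER_VMS; i++)
        if (slots[i].in_use && memcmp(slots[i].uuid, uuid, 16) == 0)
            return i;
    return -1;
}

/* Reuse the existing device for a known VM, otherwise create one.
 * Returns -1 when the table is full. */
static int vm_lookup_or_create(const uint8_t uuid[16])
{
    int idx = vm_lookup(uuid);
    if (idx >= 0)
        return idx;
    for (int i = 0; i < MAX_DRIVER_VMS; i++) {
        if (!slots[i].in_use) {
            memcpy(slots[i].uuid, uuid, 16);
            slots[i].in_use = 1;
            return i;
        }
    }
    return -1;
}

/* Called once the server concludes the driver VM is gone, e.g. after a
 * periodic check message goes unanswered. */
static void vm_remove(const uint8_t uuid[16])
{
    int idx = vm_lookup(uuid);
    if (idx >= 0)
        slots[idx].in_use = 0;
}
```

A restarted VM that presents a fresh UUID gets a new slot even if its pid matches the old one; the stale slot is reclaimed by vm_remove() when the liveness check fails.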
On Thu 6/2/2016 11:52 AM, Xiao Guangrong wrote: > On 06/02/2016 11:15 AM, Wang, Wei W wrote: > > On Wed 6/1/2016 4:15 PM, Xiao Guangrong wrote: > >> On 05/29/2016 04:11 PM, Wei Wang wrote: > >>> Signed-off-by: Wei Wang <wei.w.wang@intel.com> > >>> --- > >>> Details | 324 > >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > >>> 1 file changed, 324 insertions(+) > >>> create mode 100644 Details > >>> > >>> diff --git a/Details b/Details > >>> new file mode 100644 > >>> index 0000000..4ea2252 > >>> --- /dev/null > >>> +++ b/Details > >>> @@ -0,0 +1,324 @@ > >>> +1 Device ID > >>> +TBD > >>> + > >>> +2 Virtqueues > >>> +0 controlq > >>> + > >>> +3 Feature Bits > >>> +3.1 Local Feature Bits > >>> +Currently no local feature bits are defined, so the standard virtio > >>> +feature bits negation will always be successful and complete. > >>> + > >>> +3.2 Remote Feature Bits > >>> +The remote feature bits are obtained from the frontend virtio > >>> +device and negotiated with the vhost-pci driver via the controlq. > >>> +The negotiation steps are described in 4.5 Device Initialization. > >>> + > >>> +4 Device Configuration Layout > >>> +struct vhost_pci_config { > >>> + #define VHOST_PCI_CONTROLQ_MEMORY_INFO_ACK 0 > >>> + #define VHOST_PCI_CONTROLQ_DEVICE_INFO_ACK 1 > >>> + #define VHOST_PCI_CONTROLQ_FEATURE_BITS_ACK 2 > >>> + u32 ack_type; > >>> + u32 ack_device_type; > >>> + u64 ack_device_id; > >>> + union { > >>> + #define VHOST_PCI_CONTROLQ_ACK_ADD_DONE 0 > >>> + #define VHOST_PCI_CONTROLQ_ACK_ADD_FAIL 1 > >>> + #define VHOST_PCI_CONTROLQ_ACK_DEL_DONE 2 > >>> + #define VHOST_PCI_CONTROLQ_ACK_DEL_FAIL 3 > >>> + u64 ack_memory_info; > >>> + u64 ack_device_info; > >>> + u64 ack_feature_bits; > >>> + }; > >>> +}; > >> > >> Do you need to write all these 4 field to ack the operation? It seems > >> it is not efficient and it is not flexible if the driver need to > >> offer more data to the device in the further. Can we dedicate a vq > >> for this purpose? 
> >
> > Yes, all the 4 fields are required to be written. The vhost-pci server usually
> connects to multiple clients, and the "ack_device_type" and "ack_device_id"
> fields are used to identify them.
> >
> > Agree, another controlq for the guest->host direction looks better, and the
> above fileds can be converted to be the controlq message header.
> >
>
> Thanks.
>
> >>
> >> BTW, current approach can not handle the case if there are multiple
> >> same kind of requests in the control queue, e.g, if there are two
> >> memory-add request in the control queue.
> >
> > A vhost-pci device corresponds to a driver VM. The two memory-add requests
> on the controlq are all for the same driver VM. Memory-add requests for
> different driver VMs couldn’t be present on the same controlq. I haven’t seen
> the issue yet. Can you please explain more? Thanks.
>
> The issue is caused by "The two memory-add requests on the controlq are all for
> the same driver VM", the driver need to ACK these request respectively, however,
> these two requests have the same ack_type, device_type, device_id,
> ack_memory_info, then QEMU is not able to figure out which request has been
> acked.

Normally, the pieces of memory info should be combined into one message (the structure includes multiple memory regions) and sent by the client. Consider a rare case like this: the driver VM hot-adds 1GB of memory, followed by hot-adding another 1GB of memory. The first piece of memory info is passed via the socket and controlq to the vhost-pci driver, then the second. Normally they won't get an opportunity to be put on the controlq at the same time.
Even if the implementation batches the controlq messages, there will be a sequence difference between the two messages on the controlq, right?

From QEMU's (the vhost-pci server's) perspective, it just sends back an ACK to the client whenever it receives an ACK from the vhost-pci driver.
From the client's perspective, it will receive two ACK messages in this example.
Since the two have a sequence difference, the client should be able to distinguish the two (first sent, first acked), right? Do you see a case where handling the first message is delayed and the second message is handled and ACK-ed first? > > > > > >>> + > >>> +The configuration fields are currently used for the vhost-pci > >>> +driver to acknowledge to the vhost-pci device after it receives controlq > messages. > >>> + > >>> +4.5 Device Initialization > >>> +When a device VM boots, it creates a vhost-pci server socket. > >>> + > >>> +When a virtio device on the driver VM is created with specifying > >>> +the use of a vhost-pci device as a backend, a client socket is > >>> +created and connected to the corresponding vhost-pci server for message > exchanges. > >>> + > >>> +The messages passed to the vhost-pci server is proceeded by the > >>> +following > >>> +header: > >>> +struct vhost_pci_socket_hdr { > >>> + #define VHOST_PCI_SOCKET_MEMORY_INFO 0 > >>> + #define VHOST_PCI_SOCKET_MEMORY_INFO_ACK 1 > >>> + #define VHOST_PCI_SOCKET_DEVICE_INFO 2 > >>> + #define VHOST_PCI_SOCKET_DEVICE_INFO_ACK 3 > >>> + #define VHOST_PCI_SOCKET_FEATURE_BITS 4 > >>> + #define VHOST_PCI_SOCKET_FEATURE_BITS_ACK 5 > >>> + u16 msg_type; > >>> + u16 msg_version; > >>> + u32 msg_len; > >>> + u64 qemu_pid; > >>> +}; > >>> + > >>> +The payload of the above message types can be constructed using the > >>> +structures > >>> +below: > >>> +/* VHOST_PCI_SOCKET_MEMORY_INFO message */ struct > >>> +vhost_pci_socket_memory_info { > >>> + #define VHOST_PCI_ADD_MEMORY 0 > >>> + #define VHOST_PCI_DEL_MEMORY 1 > >>> + u16 ops; > >>> + u32 nregions; > >>> + struct vhost_pci_memory_region { > >>> + int fd; > >>> + u64 guest_phys_addr; > >>> + u64 memory_size; > >>> + u64 mmap_offset; > >>> + } regions[VHOST_PCI_MAX_NREGIONS]; }; > >>> + > >>> +/* VHOST_PCI_SOCKET_DEVICE_INFO message */ struct > >>> +vhost_pci_device_info { > >>> + #define VHOST_PCI_ADD_FRONTEND_DEVICE 0 > >>> + #define 
VHOST_PCI_DEL_FRONTEND_DEVICE 1 > >>> + u16 ops; > >>> + u32 nvirtq; > >>> + #define VHOST_PCI_FRONTEND_DEVICE_NET 1 > >>> + #define VHOST_PCI_FRONTEND_DEVICE_BLK 2 > >>> + #define VHOST_PCI_FRONTEND_DEVICE_CONSOLE 3 > >>> + #define VHOST_PCI_FRONTEND_DEVICE_ENTROPY 4 > >>> + #define VHOST_PCI_FRONTEND_DEVICE_BALLOON 5 > >>> + #define VHOST_PCI_FRONTEND_DEVICE_SCSI 8 > >>> + u32 device_type; > >>> + u64 device_id; > >>> + struct virtq exotic_virtq[VHOST_PCI_MAX_NVIRTQ]; > >>> +}; > >>> +The device_id field identifies the device. For example, it can be > >>> +used to store a MAC address if the device_type is > >> VHOST_PCI_FRONTEND_DEVICE_NET. > >>> + > >>> +/* VHOST_PCI_SOCKET_FEATURE_BITS message*/ struct > >>> +vhost_pci_feature_bits { > >>> + u64 feature_bits; > >>> +}; > >> > >> We not only have 'socket feature bits' but also the feature bits for > >> per virtio device plugged in on the side of vhost-pci device. > > > > Yes. It is mentioned in "3 Feature Bits". The socket feature bits here are > actually the remote feature bits (got from a socket message). > > Hmm, there are two questions: > 1) The device related info (e.g, device-type, device-id, etc) has not been included > there, so the vhost-pci driver can not write proper values to the fields of > vhost_pci_config. Right. For the controlq feature bits messages, the data structure lacks the "device_type" and "device_id" fields. I will add them. I think "device_type" and "device_id" don’t need to be included in the socket feature bits message structure: each frontend virtio device has its own connection to the vhost-pci server, that is, the connection itself has actually already been associated with a "device_type+device_id" (the server should maintain a table for such a relationship after a device info socket message is received). 
> 2) different virtio device may have different feature bits, > VHOST_PCI_SOCKET_FEATURE_BITS > and VHOST_PCI_CONTROLQ_FEATURE_BITS should be able to negotiate the > feature bits for > different virtio device. Then, I think this shouldn’t be a problem after "device_type" and "device_id" are added to the controlq feature bits message structure. > > > >> > >> E.g: if there are two virtio devices (e.g, a NIC and BLK) both of > >> them need to directly communicate with another VM. The feature bits > >> of these two devices need to be negotiated with that VM respectively. > >> And you can not put these feature bits in vhost_pci_device_info struct as its > vq is not created at that time. > > > > Right. If you check the initialization steps below, there is a statement "When > the device status is updated with DRIVER_OK". > > > >>> + > >>> +/* VHOST_PCI_SOCKET_xx_ACK messages */ struct vhost_pci_socket_ack { > >>> + #define VHOST_PCI_SOCKET_ACK_ADD_DONE 0 > >>> + #define VHOST_PCI_SOCKET_ACK_ADD_FAIL 1 > >>> + #define VHOST_PCI_SOCKET_ACK_DEL_DONE 2 > >>> + #define VHOST_PCI_SOCKET_ACK_DEL_FAIL 3 > >>> + u64 ack; > >>> +}; > >>> + > >>> +The driver update message passed via the controlq is preceded by > >>> +the following > >>> +header: > >>> +struct vhost_pci_controlq_hdr { > >>> + #define VHOST_PCI_CONTROLQ_MEMORY_INFO 0 > >>> + #define VHOST_PCI_CONTROLQ_DEVICE_INFO 1 > >>> + #define VHOST_PCI_CONTROLQ_FEATURE_BITS 2 > >>> + #define VHOST_PCI_CONTROLQ_UPDATE_DONE 3 > >>> + u16 msg_type; > >>> + u16 msg_version; > >>> + u32 msg_len; > >>> +}; > >>> + > >>> +The payload of a VHOST_PCI_CONTROLQ_MEMORY_INFO message can be > >>> +constructed using the following structure: > >>> +/* VHOST_PCI_CONTROLQ_MEMORY_INFO message */ struct > >>> +vhost_pci_controlq_memory_info { > >>> + #define VHOST_PCI_ADD_MEMORY 0 > >>> + #define VHOST_PCI_DEL_MEMORY 1 > >>> + u16 ops; > >>> + u32 nregion; > >>> + struct exotic_memory_region { > >>> + u64 region_base_xgpa; > >>> + u64 size; > >>> + 
u64 offset_in_bar_area; > >>> + } region[VHOST_PCI_MAX_NREGIONS]; > >>> +}; > >>> + > >>> +The payload of VHOST_PCI_CONTROLQ_DEVICE_INFO and > >>> +VHOST_PCI_CONTROLQ_FEATURE_BITS messages can be constructed > using > >> the > >>> +vhost_pci_device_info structure and the vhost_pci_feature_bits > >>> +structure respectively. > >>> + > >>> +The payload of a VHOST_PCI_CONTROLQ_UPDATE_DONE message can be > >>> +constructed using the structure below: > >>> +struct vhost_pci_controlq_update_done { > >>> + u32 device_type; > >>> + u64 device_id; > >>> +}; > >>> + > >>> +Fig. 1 shows the initialization steps. > >>> + > >>> +When the vhost-pci server receives a > >>> +VHOST_PCI_SOCKET_MEMORY_INFO(ADD) message, it checks if a vhost- > pci > >>> +device has been created for the requesting VM whose QEMU process id > >>> +is qemu_pid. If yes, it will simply update the subsequent received > >>> +messages to the vhost-pci driver via the controlq. Otherwise, the > >>> +server creates a new vhost-pci device, and continues the following > >> initialization steps. > >> > >> > >> qemu-pid is not stable as the existing VM will be killed silently and > >> the new vhost-pci driver reusing the same qemu-pid will ask to join > >> before the vhost- device gets to know the previous one has gone. > > > > Would it be a normal and legal operation to silently kill a QEMU? I guess only > the system admin can do that, right? > > Yup, it is a valid operation as a problematic VM will be killed and as you said the > admin can kill it silently, anyway, the design should have a way to handle this > case properly. > > > > > If that's true, I think we can add a new field, "u64 tsc_of_birth" to the > vhost_pci_socket_hdr structure. It records the tsc when the QEMU is created. > > You only need to identify a VM, can use UUID instead? Yes. The two methods are actually similar. Best, Wei > > > If that's true, another problem would be the remove of the vhost-pci device > for a silently killed driver VM. 
> > The vhost-pci server may need to periodically send a checking message to > check if the driver VM is silently killed. If that really happens, it should remove > the related vhost-pci device. > > Yes. > > > > >>> + > >>> +The vhost-pci server adds up all the memory region size, and uses a > >>> +64-bit device bar for the mapping of all the memory regions > >>> +obtained from the socket message. To better support memory > >>> +hot-plugging of the driver VM, the bar is configured with a double > >>> +size of the driver VM's memory. The server maps the received memory > >>> +info via the QEMU MemoryRegion mechanism, and then the new created > >>> +vhost-pci device is > >> hot-plugged to the VM. > >>> + > >>> +When the device status is updated with DRIVER_OK, a > >>> +VHOST_PCI_CONTROLQ_MEMORY_INFO(ADD) message, which is stemed > >> from the > >>> +memory info socket message, is put on the controlq and a controlq > >>> +interrupt is injected to the VM. > >>> + > >>> +When the vhost-pci server receives a > >>> +VHOST_PCI_CONTROLQ_MEMORY_INFO_ACK(ADD_DONE) > >> acknowledgement from the > >>> +driver, it sends a VHOST_PCI_SOCKET_MEMORY_INFO_ACK(ADD_DONE) > >> message > >>> +to the client that is identified by the ack_device_type and ack_device_id > fields. > >>> + > >>> +When the vhost-pci server receives a > >>> +VHOST_PCI_SOCKET_FEATURE_BITS(feature bits) message, a > >>> +VHOST_PCI_CONTROLQ_FEATURE_BITS(feature bits) message is put on > the > >>> +controlq and a controlq interrupt is injected to the VM. > >>> + > >>> +If the vhost-pci server notices that the driver fully accepted the > >>> +offered feature bits, it sends a > >>> +VHOST_PCI_SOCKET_FEATURE_BITS_ACK(ADD_DONE) message to the > client. > >> If > >>> +the vhost-pci server notices that the vhost-pci driver only > >>> +accepted a subset of the offered feature bits, it sends a > >>> +VHOST_PCI_SOCKET_FEATURE_BITS(accepted feature bits) message back > >>> +to the client. 
The client side virtio device re-negotiates the new > >>> +feature bits with its driver, and sends back a > >>> +VHOST_PCI_SOCKET_FEATURE_BITS_ACK(ADD_DONE) > >>> +message to the server. > >>> + > >>> +Either when the vhost-pci driver fully accepted the offered feature > >>> +bits or a > >>> +VHOST_PCI_SOCKET_FEATURE_BITS_ACK(ADD_DONE) message is received > >> from > >>> +the client, the vhost-pci server puts a > >>> +VHOST_PCI_CONTROLQ_UPDATE_DONE message on the controlq, and a > >> controlq interrupt is injected to the VM. > >> > >> Why VHOST_PCI_CONTROLQ_UPDATE_DONE is needed? > > > > OK, this one looks redundant. We can set up the related support for that > frontend device when the device info is received via the controlq. > > > > Great.
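The two points in Wei's reply above (adding "device_type" and "device_id" to the controlq feature bits message, and having the server keep a table that ties each client connection to a device_type+device_id pair) could look roughly like the sketch below. The struct vhost_pci_controlq_feature_bits layout, the conn_table and both function names are assumptions made for illustration; only struct vhost_pci_feature_bits comes from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Socket-side payload, as defined in the patch. */
struct vhost_pci_feature_bits {
    uint64_t feature_bits;
};

/* Hypothetical controlq-side payload with the identifying fields added. */
struct vhost_pci_controlq_feature_bits {
    uint32_t device_type;
    uint64_t device_id;
    uint64_t feature_bits;
};

#define MAX_CONNS 16                /* hypothetical limit for this sketch */

struct conn_entry {
    int      sock_fd;               /* client connection */
    uint32_t device_type;
    uint64_t device_id;
    int      in_use;
};

static struct conn_entry conn_table[MAX_CONNS];

/* Record the association once a device info socket message has arrived.
 * Returns 0 on success, -1 if the table is full. */
static int conn_register(int sock_fd, uint32_t device_type, uint64_t device_id)
{
    for (int i = 0; i < MAX_CONNS; i++) {
        if (!conn_table[i].in_use) {
            conn_table[i] =
                (struct conn_entry){ sock_fd, device_type, device_id, 1 };
            return 0;
        }
    }
    return -1;
}

/* Build the controlq message for feature bits received on sock_fd.
 * Returns 0 on success, -1 if the connection is unknown. */
static int feature_bits_to_controlq(int sock_fd,
                                    const struct vhost_pci_feature_bits *in,
                                    struct vhost_pci_controlq_feature_bits *out)
{
    for (int i = 0; i < MAX_CONNS; i++) {
        if (conn_table[i].in_use && conn_table[i].sock_fd == sock_fd) {
            out->device_type  = conn_table[i].device_type;
            out->device_id    = conn_table[i].device_id;
            out->feature_bits = in->feature_bits;
            return 0;
        }
    }
    return -1;
}
```

With this shape the socket message stays unchanged, and the driver can tell a NIC's feature bits from a block device's because the server fills in the identity taken from the connection.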
On 06/02/2016 04:43 PM, Wang, Wei W wrote: > On Thu 6/2/2016 11:52 AM, Xiao Guangrong wrote: >> On 06/02/2016 11:15 AM, Wang, Wei W wrote: >>> On Wed 6/1/2016 4:15 PM, Xiao Guangrong wrote: >>>> On 05/29/2016 04:11 PM, Wei Wang wrote: >>>>> Signed-off-by: Wei Wang <wei.w.wang@intel.com> >>>>> --- >>>>> Details | 324 >>>> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>> 1 file changed, 324 insertions(+) >>>>> create mode 100644 Details >>>>> >>>>> diff --git a/Details b/Details >>>>> new file mode 100644 >>>>> index 0000000..4ea2252 >>>>> --- /dev/null >>>>> +++ b/Details >>>>> @@ -0,0 +1,324 @@ >>>>> +1 Device ID >>>>> +TBD >>>>> + >>>>> +2 Virtqueues >>>>> +0 controlq >>>>> + >>>>> +3 Feature Bits >>>>> +3.1 Local Feature Bits >>>>> +Currently no local feature bits are defined, so the standard virtio >>>>> +feature bits negation will always be successful and complete. >>>>> + >>>>> +3.2 Remote Feature Bits >>>>> +The remote feature bits are obtained from the frontend virtio >>>>> +device and negotiated with the vhost-pci driver via the controlq. >>>>> +The negotiation steps are described in 4.5 Device Initialization. >>>>> + >>>>> +4 Device Configuration Layout >>>>> +struct vhost_pci_config { >>>>> + #define VHOST_PCI_CONTROLQ_MEMORY_INFO_ACK 0 >>>>> + #define VHOST_PCI_CONTROLQ_DEVICE_INFO_ACK 1 >>>>> + #define VHOST_PCI_CONTROLQ_FEATURE_BITS_ACK 2 >>>>> + u32 ack_type; >>>>> + u32 ack_device_type; >>>>> + u64 ack_device_id; >>>>> + union { >>>>> + #define VHOST_PCI_CONTROLQ_ACK_ADD_DONE 0 >>>>> + #define VHOST_PCI_CONTROLQ_ACK_ADD_FAIL 1 >>>>> + #define VHOST_PCI_CONTROLQ_ACK_DEL_DONE 2 >>>>> + #define VHOST_PCI_CONTROLQ_ACK_DEL_FAIL 3 >>>>> + u64 ack_memory_info; >>>>> + u64 ack_device_info; >>>>> + u64 ack_feature_bits; >>>>> + }; >>>>> +}; >>>> >>>> Do you need to write all these 4 field to ack the operation? 
It seems >>>> it is not efficient and it is not flexible if the driver need to >>>> offer more data to the device in the further. Can we dedicate a vq >>>> for this purpose? >>> >>> Yes, all the 4 fields are required to be written. The vhost-pci server usually >> connects to multiple clients, and the "ack_device_type" and "ack_device_id" >> fields are used to identify them. >>> >>> Agree, another controlq for the guest->host direction looks better, and the >> above fileds can be converted to be the controlq message header. >>> >> >> Thanks. >> >>>> >>>> BTW, current approach can not handle the case if there are multiple >>>> same kind of requests in the control queue, e.g, if there are two >>>> memory-add request in the control queue. >>> >>> A vhost-pci device corresponds to a driver VM. The two memory-add requests >> on the controlq are all for the same driver VM. Memory-add requests for >> different driver VMs couldn’t be present on the same controlq. I haven’t seen >> the issue yet. Can you please explain more? Thanks. >> >> The issue is caused by "The two memory-add requests on the controlq are all for >> the same driver VM", the driver need to ACK these request respectively, however, >> these two requests have the same ack_type, device_type, device_id, >> ack_memory_info, then QEMU is not able to figure out which request has been >> acked. > > Normally pieces of memory info should be combined into one message (the structure includes multiple memory regions) and sent by the client. In a rare case like this: the driver VM hot-adds 1GB memory, followed by hot-adding another 1GB memory. The first piece of memory info is passed via the socket and controlq to the vhost-pci driver, then the second. Normally they won't get an opportunity to be put on the controlq at the same time. > Even the implementation batches the controlq messages, there will be a sequence difference between the two messages on the controlq, right? 
That assumes the driver should serially handle the control messages... > > From the QEMU's (vhost-pci server) perspective, it just sends back an ACK to the client whenever it receives an ACK from the vhost-pci driver. > From the client's perspective, it will receive two ACK messages in this example. > Since the two have a sequence difference, the client should be able to distinguish the two (first sent, first acked), right? That assumes that the vhost-pci server and remote virtio device should use serial mode too.
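To make the "serial mode" assumption concrete, here is a minimal C sketch of "first sent, first acked" matching. It is an illustration only and is not part of the posted patch: pending_fifo, pending_req, pending_push, pending_ack and the cookie field are hypothetical names. An ACK carries nothing that names a particular request here, so the match is only right if both ends handle messages strictly in order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PENDING_MAX 16 /* hypothetical queue depth */

/* Hypothetical record of a request that was put on the controlq. */
struct pending_req {
    uint32_t device_type;
    uint64_t device_id;
    uint64_t cookie; /* server-side bookkeeping handle for this request */
};

/* Ring buffer of requests still waiting for an ACK, oldest first. */
struct pending_fifo {
    struct pending_req slot[PENDING_MAX];
    size_t head;  /* index of the oldest outstanding request */
    size_t count; /* number of outstanding requests */
};

/* Record a request as sent; returns false if the FIFO is full. */
bool pending_push(struct pending_fifo *f, struct pending_req req)
{
    assert(f != NULL);
    if (f->count == PENDING_MAX)
        return false;
    f->slot[(f->head + f->count) % PENDING_MAX] = req;
    f->count++;
    return true;
}

/*
 * Match an incoming ACK to the oldest outstanding request.
 * Only correct if requests are handled and acked strictly in order.
 */
bool pending_ack(struct pending_fifo *f, struct pending_req *out)
{
    assert(f != NULL && out != NULL);
    if (f->count == 0)
        return false;
    *out = f->slot[f->head];
    f->head = (f->head + 1) % PENDING_MAX;
    f->count--;
    return true;
}
```

If the second of two memory-add requests were acked first, pending_ack() would still hand back the first one, which is the ambiguity raised above.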
On Thu 6/2/2016 7:13 PM, Xiao Guangrong wrote: > On 06/02/2016 04:43 PM, Wang, Wei W wrote: > > On Thu 6/2/2016 11:52 AM, Xiao Guangrong wrote: > >> On 06/02/2016 11:15 AM, Wang, Wei W wrote: > >>> On Wed 6/1/2016 4:15 PM, Xiao Guangrong wrote: > >>>> On 05/29/2016 04:11 PM, Wei Wang wrote: > >>>>> Signed-off-by: Wei Wang <wei.w.wang@intel.com> > >>>>> --- > >>>>> Details | 324 > >>>> > >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > >>>>> 1 file changed, 324 insertions(+) > >>>>> create mode 100644 Details > >>>>> > >>>>> diff --git a/Details b/Details > >>>>> new file mode 100644 > >>>>> index 0000000..4ea2252 > >>>>> --- /dev/null > >>>>> +++ b/Details > >>>>> @@ -0,0 +1,324 @@ > >>>>> +1 Device ID > >>>>> +TBD > >>>>> + > >>>>> +2 Virtqueues > >>>>> +0 controlq > >>>>> + > >>>>> +3 Feature Bits > >>>>> +3.1 Local Feature Bits > >>>>> +Currently no local feature bits are defined, so the standard > >>>>> +virtio feature bits negation will always be successful and complete. > >>>>> + > >>>>> +3.2 Remote Feature Bits > >>>>> +The remote feature bits are obtained from the frontend virtio > >>>>> +device and negotiated with the vhost-pci driver via the controlq. > >>>>> +The negotiation steps are described in 4.5 Device Initialization. 
> >>>>> + > >>>>> +4 Device Configuration Layout > >>>>> +struct vhost_pci_config { > >>>>> + #define VHOST_PCI_CONTROLQ_MEMORY_INFO_ACK 0 > >>>>> + #define VHOST_PCI_CONTROLQ_DEVICE_INFO_ACK 1 > >>>>> + #define VHOST_PCI_CONTROLQ_FEATURE_BITS_ACK 2 > >>>>> + u32 ack_type; > >>>>> + u32 ack_device_type; > >>>>> + u64 ack_device_id; > >>>>> + union { > >>>>> + #define VHOST_PCI_CONTROLQ_ACK_ADD_DONE 0 > >>>>> + #define VHOST_PCI_CONTROLQ_ACK_ADD_FAIL 1 > >>>>> + #define VHOST_PCI_CONTROLQ_ACK_DEL_DONE 2 > >>>>> + #define VHOST_PCI_CONTROLQ_ACK_DEL_FAIL 3 > >>>>> + u64 ack_memory_info; > >>>>> + u64 ack_device_info; > >>>>> + u64 ack_feature_bits; > >>>>> + }; > >>>>> +}; > >>>> > >>>> Do you need to write all these 4 field to ack the operation? It > >>>> seems it is not efficient and it is not flexible if the driver need > >>>> to offer more data to the device in the further. Can we dedicate a > >>>> vq for this purpose? > >>> > >>> Yes, all the 4 fields are required to be written. The vhost-pci > >>> server usually > >> connects to multiple clients, and the "ack_device_type" and "ack_device_id" > >> fields are used to identify them. > >>> > >>> Agree, another controlq for the guest->host direction looks better, > >>> and the > >> above fileds can be converted to be the controlq message header. > >>> > >> > >> Thanks. > >> > >>>> > >>>> BTW, current approach can not handle the case if there are multiple > >>>> same kind of requests in the control queue, e.g, if there are two > >>>> memory-add request in the control queue. > >>> > >>> A vhost-pci device corresponds to a driver VM. The two memory-add > >>> requests > >> on the controlq are all for the same driver VM. Memory-add requests > >> for different driver VMs couldn’t be present on the same controlq. I > >> haven’t seen the issue yet. Can you please explain more? Thanks. 
> >> > >> The issue is caused by "The two memory-add requests on the controlq > >> are all for the same driver VM", the driver need to ACK these request > >> respectively, however, these two requests have the same ack_type, > >> device_type, device_id, ack_memory_info, then QEMU is not able to > >> figure out which request has been acked. > > > > Normally pieces of memory info should be combined into one message (the > structure includes multiple memory regions) and sent by the client. In a rare case > like this: the driver VM hot-adds 1GB memory, followed by hot-adding another > 1GB memory. The first piece of memory info is passed via the socket and > controlq to the vhost-pci driver, then the second. Normally they won't get an > opportunity to be put on the controlq at the same time. > > Even the implementation batches the controlq messages, there will be a > sequence difference between the two messages on the controlq, right? > > That assumes the driver should serially handle the control messages... > > > > > From the QEMU's (vhost-pci server) perspective, it just sends back an ACK to > the client whenever it receives an ACK from the vhost-pci driver. > > From the client's perspective, it will receive two ACK messages in this example. > > Since the two have a sequence difference, the client should be able to > distinguish the two (first sent, first acked), right? > > That assumes that the vhost-pci server and remote virtio device should use serial > mode too.

Before adding more fields to support that, I have two more questions to discuss here:

In my understanding, socket messages are sent one by one. How would a client send two messages in parallel to the server?

Regarding the controlq, I understand that there are optimizations to parallelize ring operations, but those are mostly done to increase data plane performance. Is such parallelism necessary for control message operations, given that it would complicate the design?

Best,
Wei
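For comparison, one way "adding more fields" could look is a request id that the driver echoes back in its ack. This is a sketch under assumptions, not something proposed in the thread: the req_id field, struct ack_hdr, struct inflight_table and the two helpers are hypothetical, and the remaining header fields only mirror the names used in vhost_pci_config.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INFLIGHT 8 /* hypothetical limit on unacked requests */

/* Hypothetical guest->host ack header; req_id is NOT in the posted patch. */
struct ack_hdr {
    uint32_t ack_type;
    uint32_t device_type;
    uint64_t device_id;
    uint64_t req_id; /* echoes the id of the request being acked */
    uint64_t ack;    /* e.g. an ADD_DONE / ADD_FAIL style code */
};

/* Server-side table of requests that have not been acked yet. */
struct inflight_table {
    uint64_t req_id[MAX_INFLIGHT];
    bool     used[MAX_INFLIGHT];
    uint64_t next_id;
};

/* Allocate an id for a new request; returns 0 if the table is full. */
uint64_t inflight_add(struct inflight_table *t)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_INFLIGHT; i++) {
        if (!t->used[i]) {
            t->used[i] = true;
            t->req_id[i] = ++t->next_id; /* ids start at 1; 0 means "none" */
            return t->req_id[i];
        }
    }
    return 0;
}

/* Complete the request named by the ack; returns false for unknown ids. */
bool inflight_complete(struct inflight_table *t, const struct ack_hdr *ack)
{
    assert(t != NULL && ack != NULL);
    for (size_t i = 0; i < MAX_INFLIGHT; i++) {
        if (t->used[i] && t->req_id[i] == ack->req_id) {
            t->used[i] = false;
            return true;
        }
    }
    return false;
}
```

With an id in the ack, the order in which the driver acks two memory-add requests no longer matters, at the cost of one more header field and a lookup table on the server side.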
diff --git a/Details b/Details
new file mode 100644
index 0000000..4ea2252
--- /dev/null
+++ b/Details
@@ -0,0 +1,324 @@
+1 Device ID
+TBD
+
+2 Virtqueues
+0 controlq
+
+3 Feature Bits
+3.1 Local Feature Bits
+Currently no local feature bits are defined, so the standard virtio feature
+bits negotiation will always be successful and complete.
+
+3.2 Remote Feature Bits
+The remote feature bits are obtained from the frontend virtio device and
+negotiated with the vhost-pci driver via the controlq. The negotiation steps
+are described in 4.5 Device Initialization.
+
+4 Device Configuration Layout
+struct vhost_pci_config {
+    #define VHOST_PCI_CONTROLQ_MEMORY_INFO_ACK 0
+    #define VHOST_PCI_CONTROLQ_DEVICE_INFO_ACK 1
+    #define VHOST_PCI_CONTROLQ_FEATURE_BITS_ACK 2
+    u32 ack_type;
+    u32 ack_device_type;
+    u64 ack_device_id;
+    union {
+        #define VHOST_PCI_CONTROLQ_ACK_ADD_DONE 0
+        #define VHOST_PCI_CONTROLQ_ACK_ADD_FAIL 1
+        #define VHOST_PCI_CONTROLQ_ACK_DEL_DONE 2
+        #define VHOST_PCI_CONTROLQ_ACK_DEL_FAIL 3
+        u64 ack_memory_info;
+        u64 ack_device_info;
+        u64 ack_feature_bits;
+    };
+};
+
+The configuration fields are currently used by the vhost-pci driver to
+acknowledge controlq messages to the vhost-pci device after receiving them.
+
+4.5 Device Initialization
+When a device VM boots, it creates a vhost-pci server socket.
+
+When a virtio device on the driver VM is created with a vhost-pci device
+specified as its backend, a client socket is created and connected to the
+corresponding vhost-pci server for message exchanges.
+
+The messages passed to the vhost-pci server are preceded by the following
+header:
+struct vhost_pci_socket_hdr {
+    #define VHOST_PCI_SOCKET_MEMORY_INFO 0
+    #define VHOST_PCI_SOCKET_MEMORY_INFO_ACK 1
+    #define VHOST_PCI_SOCKET_DEVICE_INFO 2
+    #define VHOST_PCI_SOCKET_DEVICE_INFO_ACK 3
+    #define VHOST_PCI_SOCKET_FEATURE_BITS 4
+    #define VHOST_PCI_SOCKET_FEATURE_BITS_ACK 5
+    u16 msg_type;
+    u16 msg_version;
+    u32 msg_len;
+    u64 qemu_pid;
+};
+
+The payload of the above message types can be constructed using the structures
+below:
+/* VHOST_PCI_SOCKET_MEMORY_INFO message */
+struct vhost_pci_socket_memory_info {
+    #define VHOST_PCI_ADD_MEMORY 0
+    #define VHOST_PCI_DEL_MEMORY 1
+    u16 ops;
+    u32 nregions;
+    struct vhost_pci_memory_region {
+        int fd;
+        u64 guest_phys_addr;
+        u64 memory_size;
+        u64 mmap_offset;
+    } regions[VHOST_PCI_MAX_NREGIONS];
+};
+
+/* VHOST_PCI_SOCKET_DEVICE_INFO message */
+struct vhost_pci_device_info {
+    #define VHOST_PCI_ADD_FRONTEND_DEVICE 0
+    #define VHOST_PCI_DEL_FRONTEND_DEVICE 1
+    u16 ops;
+    u32 nvirtq;
+    #define VHOST_PCI_FRONTEND_DEVICE_NET 1
+    #define VHOST_PCI_FRONTEND_DEVICE_BLK 2
+    #define VHOST_PCI_FRONTEND_DEVICE_CONSOLE 3
+    #define VHOST_PCI_FRONTEND_DEVICE_ENTROPY 4
+    #define VHOST_PCI_FRONTEND_DEVICE_BALLOON 5
+    #define VHOST_PCI_FRONTEND_DEVICE_SCSI 8
+    u32 device_type;
+    u64 device_id;
+    struct virtq exotic_virtq[VHOST_PCI_MAX_NVIRTQ];
+};
+The device_id field identifies the device. For example, it can be used to
+store a MAC address if the device_type is VHOST_PCI_FRONTEND_DEVICE_NET.
+
+/* VHOST_PCI_SOCKET_FEATURE_BITS message */
+struct vhost_pci_feature_bits {
+    u64 feature_bits;
+};
+
+/* VHOST_PCI_SOCKET_xx_ACK messages */
+struct vhost_pci_socket_ack {
+    #define VHOST_PCI_SOCKET_ACK_ADD_DONE 0
+    #define VHOST_PCI_SOCKET_ACK_ADD_FAIL 1
+    #define VHOST_PCI_SOCKET_ACK_DEL_DONE 2
+    #define VHOST_PCI_SOCKET_ACK_DEL_FAIL 3
+    u64 ack;
+};
+
+The driver update message passed via the controlq is preceded by the following
+header:
+struct vhost_pci_controlq_hdr {
+    #define VHOST_PCI_CONTROLQ_MEMORY_INFO 0
+    #define VHOST_PCI_CONTROLQ_DEVICE_INFO 1
+    #define VHOST_PCI_CONTROLQ_FEATURE_BITS 2
+    #define VHOST_PCI_CONTROLQ_UPDATE_DONE 3
+    u16 msg_type;
+    u16 msg_version;
+    u32 msg_len;
+};
+
+The payload of a VHOST_PCI_CONTROLQ_MEMORY_INFO message can be constructed
+using the following structure:
+/* VHOST_PCI_CONTROLQ_MEMORY_INFO message */
+struct vhost_pci_controlq_memory_info {
+    #define VHOST_PCI_ADD_MEMORY 0
+    #define VHOST_PCI_DEL_MEMORY 1
+    u16 ops;
+    u32 nregion;
+    struct exotic_memory_region {
+        u64 region_base_xgpa;
+        u64 size;
+        u64 offset_in_bar_area;
+    } region[VHOST_PCI_MAX_NREGIONS];
+};
+
+The payload of VHOST_PCI_CONTROLQ_DEVICE_INFO and
+VHOST_PCI_CONTROLQ_FEATURE_BITS messages can be constructed using the
+vhost_pci_device_info structure and the vhost_pci_feature_bits structure
+respectively.
+
+The payload of a VHOST_PCI_CONTROLQ_UPDATE_DONE message can be constructed
+using the structure below:
+struct vhost_pci_controlq_update_done {
+    u32 device_type;
+    u64 device_id;
+};
+
+Fig. 1 shows the initialization steps.
+
+When the vhost-pci server receives a VHOST_PCI_SOCKET_MEMORY_INFO(ADD) message,
+it checks if a vhost-pci device has been created for the requesting VM whose
+QEMU process id is qemu_pid. If yes, it simply forwards subsequently
+received messages to the vhost-pci driver via the controlq. Otherwise, the
+server creates a new vhost-pci device, and continues with the following
+initialization steps.
+
+The vhost-pci server adds up all the memory region sizes, and uses a 64-bit
+device bar for the mapping of all the memory regions obtained from the socket
+message. To better support memory hot-plugging of the driver VM, the bar is
+configured with double the size of the driver VM's memory. The server maps the
+received memory info via the QEMU MemoryRegion mechanism, and then the newly
+created vhost-pci device is hot-plugged to the VM.
+
+When the device status is updated with DRIVER_OK, a
+VHOST_PCI_CONTROLQ_MEMORY_INFO(ADD) message, which is derived from the memory
+info socket message, is put on the controlq and a controlq interrupt is injected
+to the VM.
+
+When the vhost-pci server receives a
+VHOST_PCI_CONTROLQ_MEMORY_INFO_ACK(ADD_DONE) acknowledgement from the driver,
+it sends a VHOST_PCI_SOCKET_MEMORY_INFO_ACK(ADD_DONE) message to the client
+that is identified by the ack_device_type and ack_device_id fields.
+
+When the vhost-pci server receives a
+VHOST_PCI_SOCKET_FEATURE_BITS(feature bits) message, a
+VHOST_PCI_CONTROLQ_FEATURE_BITS(feature bits) message is put on the controlq
+and a controlq interrupt is injected to the VM.
+
+If the vhost-pci server notices that the driver fully accepted the offered
+feature bits, it sends a VHOST_PCI_SOCKET_FEATURE_BITS_ACK(ADD_DONE) message
+to the client. If the vhost-pci server notices that the vhost-pci driver only
+accepted a subset of the offered feature bits, it sends a
+VHOST_PCI_SOCKET_FEATURE_BITS(accepted feature bits) message back to the
+client. The client side virtio device re-negotiates the new feature bits with
+its driver, and sends back a VHOST_PCI_SOCKET_FEATURE_BITS_ACK(ADD_DONE)
+message to the server.
+
+When either the vhost-pci driver has fully accepted the offered feature bits or a
+VHOST_PCI_SOCKET_FEATURE_BITS_ACK(ADD_DONE) message is received from the
+client, the vhost-pci server puts a VHOST_PCI_CONTROLQ_UPDATE_DONE message on
+the controlq, and a controlq interrupt is injected to the VM.
+
+When the vhost-pci server receives a VHOST_PCI_SOCKET_DEVICE_INFO(ADD) message,
+a VHOST_PCI_CONTROLQ_DEVICE_INFO(ADD) message is put on the controlq and a
+controlq interrupt is injected to the VM.
+
+When the vhost-pci server receives a
+VHOST_PCI_CONTROLQ_DEVICE_INFO_ACK(ADD_DONE) acknowledgement from the driver,
+it sends a VHOST_PCI_SOCKET_DEVICE_INFO_ACK(ADD_DONE) message to the
+corresponding client.
+
+4.5.1 Device Requirements: Device Initialization
+To let a VM be capable of creating vhost-pci devices, a vhost-pci server MUST
+be created when it boots.
+
+The vhost-pci server socket path SHOULD be provided to a virtio client socket
+for the connection to the vhost-pci server.
+
+The virtio device MUST finish the feature bits negotiation with its driver
+before negotiating them with the vhost-pci device.
+
+If the client receives a VHOST_PCI_SOCKET_FEATURE_BITS(feature bits) message,
+it MUST reset the device to go into backwards compatibility mode, re-negotiate
+the received feature bits with its driver, and send back a
+VHOST_PCI_SOCKET_FEATURE_BITS_ACK(ADD_DONE) message to the server.
+
+In any case where an acknowledgement from the vhost-pci driver indicates a
+FAIL, the vhost-pci server SHOULD send a FAIL socket message to the client.
+
+In any case where the msg_version differs between the sender and the
+receiver, the receiver SHOULD acknowledge a FAIL to the sender or convert the
+message to its version if the converted version is still functionally usable.
+
+4.5.2 Driver Requirements: Device Initialization
+The vhost-pci driver MUST NOT accept any feature bits that are not offered in
+the remote feature bits, and SHOULD acknowledge the accepted feature bits to
+the device by writing them to the vhost_pci_config fields.
+
+When the vhost-pci driver receives a VHOST_PCI_CONTROLQ_UPDATE_DONE message
+from the controlq, the vhost-pci driver MUST initialize the corresponding
+driver interface of the device_type if it has not been initialized, and add
+the device_id to the frontend device list that records all the frontend virtio
+devices being supported by vhost-pci for inter-VM communications.
+
+The vhost-pci driver SHOULD acknowledge to the device that the device and
+memory info update (add or delete) is DONE or FAIL by writing the
+acknowledgement (DONE or FAIL) to the vhost_pci_config fields.
+
+The vhost-pci driver MUST ensure that writes to the vhost_pci_config fields
+are atomic.
+
+4.6 Device Operation
+4.6.1 Device Requirements: Device Operation
+4.6.1.1 Frontend Device Info Update
+When the frontend virtio device changes any info (e.g. device_id, virtq
+address) that it has sent to the vhost-pci device, it SHOULD send a
+VHOST_PCI_SOCKET_DEVICE_INFO(ADD) message, which contains the new device info,
+to the vhost-pci server. The vhost-pci device SHOULD put a
+VHOST_PCI_CONTROLQ_DEVICE_INFO(ADD) message on the controlq and inject a
+controlq interrupt to the VM.
+
+When the vhost-pci device receives a
+VHOST_PCI_CONTROLQ_DEVICE_INFO_ACK(ADD_DONE) acknowledgement from the driver,
+it SHOULD send a VHOST_PCI_SOCKET_DEVICE_INFO_ACK(ADD_DONE) message to the
+client that is identified by the ack_device_type and ack_device_id fields, to
+indicate that the vhost-pci driver has finished the handling of the device
+info update.
+
+4.6.1.2 Frontend Device Remove
+When the frontend virtio device is removed (e.g. hot-unplugged), the client
+SHOULD send a VHOST_PCI_SOCKET_DEVICE_INFO(DEL) message to the vhost-pci
+server. The vhost-pci device SHOULD put a VHOST_PCI_CONTROLQ_DEVICE_INFO(DEL)
+message on the controlq and inject a controlq interrupt to the VM.
+
+When the vhost-pci device receives a VHOST_PCI_CONTROLQ_DEVICE_INFO_ACK(DEL_DONE),
+it SHOULD send a VHOST_PCI_SOCKET_DEVICE_INFO_ACK(DEL_DONE) message to the
+corresponding client to indicate that the vhost-pci driver has removed the
+vhost-pci based inter-VM communication support for the requesting virtio
+device.
+
+4.6.1.3 Driver VM Shutdown and Migration
+Before the driver VM is destroyed or migrated, all the clients that connect to
+the vhost-pci server SHOULD send a VHOST_PCI_SOCKET_DEVICE_INFO(DEL) message to
+the vhost-pci server. The destruction or migration MUST wait until all
+the VHOST_PCI_SOCKET_DEL_CONNECTION_ACK(DEL_DONE) messages are received.
+
+When a vhost-pci device has no frontend devices, the vhost-pci device SHOULD be
+destroyed.
+
+4.6.1.4 Driver VM Memory Hot-plug
+When the vhost-pci server receives a VHOST_PCI_SOCKET_MEMORY_INFO(DEL) message,
+a VHOST_PCI_CONTROLQ_MEMORY_INFO(DEL) message SHOULD be put on the controlq and
+a controlq interrupt SHOULD be injected to the VM. When the vhost-pci server receives
+a VHOST_PCI_CONTROLQ_MEMORY_INFO_ACK(DEL_DONE) acknowledgement from the driver,
+it SHOULD unmap that memory region and send a
+VHOST_PCI_SOCKET_MEMORY_INFO_ACK(DEL_DONE) message to the client.
+
+When the vhost-pci server receives a VHOST_PCI_SOCKET_MEMORY_INFO(ADD) message,
+and the received memory info is not part of what has already been mapped, it
+calculates the total received memory size.
+
+If the new memory size plus the mapped memory size is smaller than the address
+space size reserved by the bar, the server SHOULD map the new memory and expose
+it to the VM via the QEMU MemoryRegion mechanism. Then it SHOULD put the new
+memory info on the controlq, and inject a controlq interrupt to the VM.
+
+If the new memory size plus the mapped memory size is larger than the address
+space size reserved by the bar, the server clones out a new vhost-pci device,
+configures the bar size to be double the current memory size, hot-unplugs the
+old vhost-pci device, and hot-plugs the new vhost-pci device into the VM. The
+initialization steps SHOULD follow 4.5 Device Initialization, except that the
+interaction between the server and client is not needed.
+
+When the vhost-pci server receives a
+VHOST_PCI_CONTROLQ_MEMORY_INFO_ACK(ADD_DONE) acknowledgement from the driver,
+it SHOULD send a VHOST_PCI_SOCKET_MEMORY_INFO_ACK(ADD_DONE) message to the
+client.
+
+4.6.2 Driver Requirements: Device Operation
+The vhost-pci driver SHOULD acknowledge to the vhost-pci device by writing
+VHOST_PCI_CONTROLQ_DEVICE_INFO_ACK(ADD_DONE) to the vhost_pci_config fields
+when it finishes handling the device info update.
+
+The vhost-pci driver SHOULD ensure that all the CPUs are notified of the
+device info update before acknowledging to the vhost-pci device.
+
+The vhost-pci driver SHOULD acknowledge to the vhost-pci device by writing
+VHOST_PCI_CONTROLQ_DEVICE_INFO_ACK(DEL_DONE) to the vhost_pci_config fields when
+it finishes removing the vhost-pci support for the requesting virtio device.
+
+The vhost-pci driver SHOULD ensure that all the CPUs are notified of the
+removal of the vhost-pci support for the requesting virtio device before
+acknowledging to the vhost-pci device.
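The hot-add rule in 4.6.1.4 can be sketched as a small decision function. This is only an illustration under stated assumptions: plan_hotadd and the hotadd_* names are hypothetical, the case where the sizes are exactly equal is not covered by the text and is treated as fitting here, and "double the current memory" is assumed to mean twice the mapped plus newly added size. Rounding the result up to a power of two, which PCI BAR sizes require, is left out.

```c
#include <assert.h>
#include <stdint.h>

enum hotadd_action {
    HOTADD_MAP_IN_PLACE,  /* map the new memory into the existing bar */
    HOTADD_REPLACE_DEVICE /* clone a device with a larger bar and swap it in */
};

struct hotadd_plan {
    enum hotadd_action action;
    uint64_t bar_size; /* bar size in bytes after the operation */
};

/* Decide how to handle a memory ADD given the current bar reservation. */
struct hotadd_plan plan_hotadd(uint64_t bar_size, uint64_t mapped,
                               uint64_t added)
{
    struct hotadd_plan plan;
    uint64_t total = mapped + added;

    assert(mapped <= bar_size);      /* what is mapped must fit the bar */
    assert(total >= mapped);         /* no overflow in the sum */
    assert(total <= UINT64_MAX / 2); /* no overflow when doubling */

    if (total <= bar_size) {
        /* Assumption: an exact fit is treated like the "smaller" case. */
        plan.action = HOTADD_MAP_IN_PLACE;
        plan.bar_size = bar_size;
    } else {
        /* Assumption: "current memory" means mapped plus newly added. */
        plan.action = HOTADD_REPLACE_DEVICE;
        plan.bar_size = 2 * total;
    }
    return plan;
}
```

For example, with a 4 GiB bar and 3 GiB already mapped, adding 2 GiB no longer fits, so the sketch asks for a replacement device with a 10 GiB bar.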
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
---
 Details | 324 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 324 insertions(+)
 create mode 100644 Details
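The remote feature bit handling described in 4.5 of the patch can be sketched as a function that picks the server's reply to the client. This is a hedged illustration, not code from the patch: feature_reply_for and the FEATURE_REPLY_* names are hypothetical, and answering with a failure when the driver accepts a bit that was not offered is an assumption, since the text only states that the driver MUST NOT do so.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum feature_reply {
    FEATURE_REPLY_ACK,         /* send FEATURE_BITS_ACK(ADD_DONE) */
    FEATURE_REPLY_RENEGOTIATE, /* send FEATURE_BITS(accepted feature bits) */
    FEATURE_REPLY_FAIL         /* assumption: driver broke the subset rule */
};

/*
 * Compare what was offered to the vhost-pci driver with what it accepted.
 * On RENEGOTIATE, *payload holds the bits to send back to the client;
 * otherwise *payload is set to 0.
 */
enum feature_reply feature_reply_for(uint64_t offered, uint64_t accepted,
                                     uint64_t *payload)
{
    assert(payload != NULL);
    *payload = 0;

    if (accepted & ~offered)
        return FEATURE_REPLY_FAIL; /* accepted a bit that was never offered */
    if (accepted == offered)
        return FEATURE_REPLY_ACK;  /* fully accepted */

    *payload = accepted;           /* strict subset: client re-negotiates */
    return FEATURE_REPLY_RENEGOTIATE;
}
```

In the subset case the client is expected to re-negotiate with its own driver and then send its ACK, after which the server can put the UPDATE_DONE message on the controlq.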