From patchwork Wed Oct 12 21:44:57 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Williamson X-Patchwork-Id: 9373839 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 691A660487 for ; Wed, 12 Oct 2016 21:45:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5979E29744 for ; Wed, 12 Oct 2016 21:45:20 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4DDCB2974E; Wed, 12 Oct 2016 21:45:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, T_TVD_MIME_EPI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1CD2D2974C for ; Wed, 12 Oct 2016 21:45:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932454AbcJLVpF (ORCPT ); Wed, 12 Oct 2016 17:45:05 -0400 Received: from mx1.redhat.com ([209.132.183.28]:60118 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754980AbcJLVpA (ORCPT ); Wed, 12 Oct 2016 17:45:00 -0400 Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D5411635C1; Wed, 12 Oct 2016 21:44:59 +0000 (UTC) Received: from t450s.home (ovpn03.gateway.prod.ext.phx2.redhat.com [10.5.9.3]) by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u9CLiwkB014944; Wed, 12 Oct 2016 17:44:58 -0400 Date: Wed, 12 Oct 2016 15:44:57 -0600 From: Alex Williamson To: Kirti Wankhede Cc: "Tian, Kevin" , "Daniel P. Berrange" , "pbonzini@redhat.com" , "kraxel@redhat.com" , "cjia@nvidia.com" , "Song, Jike" , "kvm@vger.kernel.org" , "qemu-devel@nongnu.org" , "bjsdjshi@linux.vnet.ibm.com" , Laine Stump Subject: Re: [Qemu-devel] [PATCH v8 4/6] docs: Add Documentation for Mediated devices Message-ID: <20161012154457.5a05b4fb@t450s.home> In-Reply-To: References: <1476131317-358-1-git-send-email-kwankhede@nvidia.com> <1476131317-358-5-git-send-email-kwankhede@nvidia.com> <20161011141411.GH14917@redhat.com> <08c939a3-c78e-a457-1978-794ffb50ee2c@nvidia.com> <6814afb2-3e39-f1c9-8944-489391e948e2@nvidia.com> <20161012095955.192affc5@t450s.home> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.68 on 10.5.11.26 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Wed, 12 Oct 2016 21:45:00 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Thu, 13 Oct 2016 00:32:48 +0530 Kirti Wankhede wrote: > On 10/12/2016 9:29 PM, Alex Williamson wrote: > > On Wed, 12 Oct 2016 20:43:48 +0530 > > Kirti Wankhede wrote: > > > >> On 10/12/2016 7:22 AM, Tian, Kevin wrote: > >>>> From: Kirti Wankhede [mailto:kwankhede@nvidia.com] > >>>> Sent: Wednesday, October 12, 2016 4:45 AM > >>>>>> +* mdev_supported_types: > >>>>>> + List of current supported mediated device types and its details are added > >>>>>> +in this directory in following format: > >>>>>> + > >>>>>> +|- > >>>>>> +|--- Vendor-specific-attributes [optional] > >>>>>> +|--- mdev_supported_types > >>>>>> +| |--- > >>>>>> +| | |--- create > >>>>>> +| | |--- name > >>>>>> +| | |--- available_instances > >>>>>> +| | |--- description /class > >>>>>> +| | |--- [devices] > >>>>>> +| |--- > >>>>>> +| | |--- create > >>>>>> +| | |--- name > >>>>>> +| | |--- available_instances > >>>>>> +| | |--- description /class > >>>>>> +| | |--- [devices] > >>>>>> +| |--- > >>>>>> +| |--- create > >>>>>> +| |--- name > >>>>>> +| |--- available_instances > >>>>>> +| |--- description /class > >>>>>> +| |--- [devices] > >>>>>> + > >>>>>> +[TBD : description or class is yet to be decided. This will change.] > >>>>> > >>>>> I thought that in previous discussions we had agreed to drop > >>>>> the concept and use the name as the unique identifier. > >>>>> When reporting these types in libvirt we won't want to report > >>>>> the type id values - we'll want the name strings to be unique. > >>>>> > >>>> > >>>> The 'name' might not be unique but type_id will be. For example that Neo > >>>> pointed out in earlier discussion, virtual devices can come from two > >>>> different physical devices, end user would be presented with what they > >>>> had selected but there will be internal implementation differences. In > >>>> that case 'type_id' will be unique. > >>>> > >>> > >>> Hi, Kirti, my understanding is that Neo agreed to use an unique type > >>> string (if you still called it ), and then no need of additional > >>> 'name' field which can be put inside 'description' field. See below quote: > >>> > >> > >> We had internal discussions about this within NVIDIA and found that > >> 'name' might not be unique where as 'type_id' would be unique. I'm > >> refering to Neo's mail after that, where Neo do pointed that out. > >> > >> https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07714.html > > > > Everyone not privy to those internal discussions, including me, seems to > > think we dropped type_id and that if a vendor does not have a stable > > name, they can compose some sort of stable type description based on the > > name+id, or even vendor+id, ex. NVIDIA-11. So please share why we > > haven't managed to kill off type_id yet. No matter what internal > > representation each vendor driver has of "type_id" it seems possible > > for it to come up with stable string to define a given configuration. > > > The 'type_id' is unique and the 'name' are not, the name is just a > virtual device name/ human readable name. Because at this moment Intel > can't define a proper GPU class, we have to add a 'description' field > there as well to represent the features of this virtual device, once we > have all agreed with the GPU class and its mandatory attributes, the > 'description' field can be removed. Here is an example, > type_id/type_name = NVIDIA_11, > name=M60-M0Q, > description=2560x1600, 2 displays, 512MB" > > Neo's previous comment only applies to the situation where we will have > the GPU class or optional attributes defined and recognized by libvirt, > since that is not going to happen any time soon, we will have to have > the new 'description' field, and we don't want to have it mixed up with > 'name' field. > > We can definitely have something like name+id as Alex recommended to > remove the 'name' field, but it will just require libvirt to have more > logic to parse that string. Let's use the mtty example driver provided in patch 5 so we can all more clearly see how the interfaces work. I'll start from the beginning of my experience and work my way to the type/name thing. (please add a modules_install target to the Makefile) # modprobe mtty Now what? It seems like I need to have prior knowledge that this drivers supports mdev devices and I need to go hunt for them. We need to create a class (ex. /sys/class/mdev/) where a user can find all the devices that participate in this mediated device infrastructure. That would point me to /sys/devices/mtty. # tree /sys/devices/mtty /sys/devices/mtty |-- mdev_supported_types | `-- mtty1 | |-- available_instances (1) | |-- create | |-- devices | `-- name ("Dual-port-serial") |-- mtty_dev | `-- sample_mtty_dev ("This is phy device") |-- power | |-- async | |-- autosuspend_delay_ms | |-- control | |-- runtime_active_kids | |-- runtime_active_time | |-- runtime_enabled | |-- runtime_status | |-- runtime_suspended_time | `-- runtime_usage `-- uevent Ok, but that was boring, we really need to have at least 2 supported types to validate the interface, so without changing the actual device backing, I pretended to have a single port vs dual port: /sys/devices/mtty |-- mdev_supported_types | |-- mtty1 | | |-- available_instances (24) | | |-- create | | |-- devices | | `-- name (Single-port-serial) | `-- mtty2 | |-- available_instances (12) | |-- create | |-- devices | `-- name (Dual-port-serial) [snip] I arbitrarily decides I have 24 ports and each single port uses 1 port and each dual port uses 2 ports. Before I start creating devices, what are we going to key the libvirt XML on? Can we do anything to prevent vendors from colliding or do we have any way to suggest meaningful and unique type_ids? Presumably if we had a PCI device hosting this, we would be rooted at that parent device, ex. /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0. Maybe the type_id should automatically be prefixed by the vendor module name, ex. mtty-1, i915-foo, nvidia-bar. There's something missing for deterministically creating a "XYZ" device and knowing exactly what that means and finding a parent device that supports it. Let's get to mdev creating... # uuidgen > mdev_supported_types/mtty2/create # tree /sys/devices/mtty /sys/devices/mtty |-- e68189be-700e-41f7-93a3-b5351e79c470 | |-- driver -> ../../../bus/mdev/drivers/vfio_mdev | |-- iommu_group -> ../../../kernel/iommu_groups/63 | |-- mtty2 -> ../mdev_supported_types/mtty2 | |-- power | | |-- async | | |-- autosuspend_delay_ms | | |-- control | | |-- runtime_active_kids | | |-- runtime_active_time | | |-- runtime_enabled | | |-- runtime_status | | |-- runtime_suspended_time | | `-- runtime_usage | |-- remove | |-- subsystem -> ../../../bus/mdev | |-- uevent | `-- vendor | `-- sample_mdev_dev ("This is MDEV e68189be-700e-41f7-93a3-b5351e79c470") |-- mdev_supported_types | |-- mtty1 | | |-- available_instances (22) | | |-- create | | |-- devices | | `-- name | `-- mtty2 | |-- available_instances (11) | |-- create | |-- devices | | `-- e68189be-700e-41f7-93a3-b5351e79c470 -> ../../../e68189be-700e-41f7-93a3-b5351e79c470 | `-- name The mdev device was created directly under the parent, which seems like it's going to get messy to me (ie. imagine dropping a bunch of uuids into a PCI parent device's sysfs directory, how does a user know what they are?). Under the device we have "mtty2", shouldn't that be "mdev_supported_type", which then links to mtty2? Otherwise a user needs to decode from the link what this attribute is. Also here's an example of those vendor sysfs entries per device. So long as the vendor never expects a tool like libvirt to manipulate attributes there, I can see how that could be pretty powerful. Moving down to the mdev_supported_types, I've updated mtty so that it actually adjusts available instance, and we can now see a link under the devices for mtty2. Also worth noting is that a link for the device appears in /sys/bus/mdev/devices. BTW, specifying this device for QEMU vfio-pci is where the sysfsdev option comes into play: -device vfio-pci,sysfsdev=/sys/devices/mtty/e68189be-700e-41f7-93a3-b5351e79c470 Which raises another question, we can tell through the vfio interfaces that this is exposes as a PCI device, by creating a container (open(/dev/vfio/vfio)), setting an iommu (ioctl(VFIO_SET_IOMMU)), adding the group to the container (ioctl(VFIO_GROUP_SET_CONTAINER)), getting the device (ioctl(VFIO_GROUP_GET_DEVICE_FD)), and finally getting the device info (ioctl(VFIO_DEVICE_GET_INFO)) and checking the flag bit that says the API is PCI. That's a long path to go and has stumbling blocks like the type of iommu that's available for the given platform. How do we make that manageable? I don't think we want to create some artificial relationship that the type of the parent necessarily match the type of the child mdev, we've already broken that with a simple mdev tty driver. One more: # uuidgen > mdev_supported_types/mtty1/create # tree /sys/devices/mtty /sys/devices/mtty |-- a7ae17d1-2de4-44c2-ae58-20ae0a0befe8 | |-- driver -> ../../../bus/mdev/drivers/vfio_mdev | |-- iommu_group -> ../../../kernel/iommu_groups/64 | |-- mtty1 -> ../mdev_supported_types/mtty1 | |-- power [snip] | |-- remove | |-- subsystem -> ../../../bus/mdev | |-- uevent | `-- vendor | `-- sample_mdev_dev ("This is MDEV a7ae17d1-2de4-44c2-ae58-20ae0a0befe8") |-- e68189be-700e-41f7-93a3-b5351e79c470 | |-- driver -> ../../../bus/mdev/drivers/vfio_mdev | |-- iommu_group -> ../../../kernel/iommu_groups/63 | |-- mtty2 -> ../mdev_supported_types/mtty2 | |-- power [snip] | |-- remove | |-- subsystem -> ../../../bus/mdev | |-- uevent | `-- vendor | `-- sample_mdev_dev ("This is MDEV e68189be-700e-41f7-93a3-b5351e79c470") |-- mdev_supported_types | |-- mtty1 | | |-- available_instances (21) | | |-- create | | |-- devices | | | `-- a7ae17d1-2de4-44c2-ae58-20ae0a0befe8 -> ../../../a7ae17d1-2de4-44c2-ae58-20ae0a0befe8 | | `-- name | `-- mtty2 | |-- available_instances (10) | |-- create | |-- devices | | `-- e68189be-700e-41f7-93a3-b5351e79c470 -> ../../../e68189be-700e-41f7-93a3-b5351e79c470 | `-- name Hopefully as expected with the caveats for the first example. # echo 1 > a7ae17d1-2de4-44c2-ae58-20ae0a0befe8/remove # echo 1 > e68189be-700e-41f7-93a3-b5351e79c470/remove These do what they're supposed to, the devices are gone. Ok, I've identified some issues, let's figure out how to resolve them. Thanks, Alex (hack multi-port mtty patch attached) diff --git a/Documentation/vfio-mdev/Makefile b/Documentation/vfio-mdev/Makefile index ff6f8a3..721daf0 100644 --- a/Documentation/vfio-mdev/Makefile +++ b/Documentation/vfio-mdev/Makefile @@ -8,6 +8,9 @@ obj-m:=mtty.o default: $(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules +modules_install: + $(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules_install + clean: @rm -rf .*.cmd *.mod.c *.o *.ko .tmp* @rm -rf Module.* Modules.* modules.* .tmp_versions diff --git a/Documentation/vfio-mdev/mtty.c b/Documentation/vfio-mdev/mtty.c index 497c90e..05ab40d 100644 --- a/Documentation/vfio-mdev/mtty.c +++ b/Documentation/vfio-mdev/mtty.c @@ -141,8 +141,11 @@ struct mdev_state { struct serial_port s[2]; struct mutex rxtx_lock; struct vfio_device_info dev_info; + int nr_ports; }; +#define MAX_MTTYS 24 + struct mutex mdev_list_lock; struct list_head mdev_devices_list; @@ -723,6 +726,11 @@ int mtty_create(struct kobject *kobj, struct mdev_device *mdev) if (mdev_state == NULL) return -ENOMEM; + if (!strcmp(kobj->name, "mtty1")) + mdev_state->nr_ports = 1; + else if (!strcmp(kobj->name, "mtty2")) + mdev_state->nr_ports = 2; + mdev_state->irq_index = -1; mdev_state->s[0].max_fifo_size = MAX_FIFO_SIZE; mdev_state->s[1].max_fifo_size = MAX_FIFO_SIZE; @@ -1224,7 +1232,12 @@ const struct attribute_group *mdev_dev_groups[] = { static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf) { - return sprintf(buf, "Dual-port-serial\n"); + if (!strcmp(kobj->name, "mtty1")) + return sprintf(buf, "Single-port-serial\n"); + if (!strcmp(kobj->name, "mtty2")) + return sprintf(buf, "Dual-port-serial\n"); + + return -EINVAL; } MDEV_TYPE_ATTR_RO(name); @@ -1232,7 +1245,20 @@ MDEV_TYPE_ATTR_RO(name); static ssize_t available_instances_show(struct kobject *kobj, struct device *dev, char *buf) { - return sprintf(buf, "1\n"); + struct mdev_state *mds; + int ports, used = 0; + + if (!strcmp(kobj->name, "mtty1")) + ports = 1; + else if (!strcmp(kobj->name, "mtty2")) + ports = 2; + else + return -EINVAL; + + list_for_each_entry(mds, &mdev_devices_list, next) { + used += mds->nr_ports; + } + return sprintf(buf, "%d\n", (MAX_MTTYS - used)/ports); } MDEV_TYPE_ATTR_RO(available_instances); @@ -1243,13 +1269,19 @@ static struct attribute *mdev_types_attrs[] = { NULL, }; -static struct attribute_group mdev_type_group = { +static struct attribute_group mdev_type_group1 = { .name = "mtty1", .attrs = mdev_types_attrs, }; +static struct attribute_group mdev_type_group2 = { + .name = "mtty2", + .attrs = mdev_types_attrs, +}; + struct attribute_group *mdev_type_groups[] = { - &mdev_type_group, + &mdev_type_group1, + &mdev_type_group2, NULL, };