Message ID | 20161012154457.5a05b4fb@t450s.home (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 10/13/2016 3:14 AM, Alex Williamson wrote: > On Thu, 13 Oct 2016 00:32:48 +0530 > Kirti Wankhede <kwankhede@nvidia.com> wrote: > >> On 10/12/2016 9:29 PM, Alex Williamson wrote: >>> On Wed, 12 Oct 2016 20:43:48 +0530 >>> Kirti Wankhede <kwankhede@nvidia.com> wrote: >>> >>>> On 10/12/2016 7:22 AM, Tian, Kevin wrote: >>>>>> From: Kirti Wankhede [mailto:kwankhede@nvidia.com] >>>>>> Sent: Wednesday, October 12, 2016 4:45 AM >>>>>>>> +* mdev_supported_types: >>>>>>>> + List of current supported mediated device types and its details are added >>>>>>>> +in this directory in following format: >>>>>>>> + >>>>>>>> +|- <parent phy device> >>>>>>>> +|--- Vendor-specific-attributes [optional] >>>>>>>> +|--- mdev_supported_types >>>>>>>> +| |--- <type id> >>>>>>>> +| | |--- create >>>>>>>> +| | |--- name >>>>>>>> +| | |--- available_instances >>>>>>>> +| | |--- description /class >>>>>>>> +| | |--- [devices] >>>>>>>> +| |--- <type id> >>>>>>>> +| | |--- create >>>>>>>> +| | |--- name >>>>>>>> +| | |--- available_instances >>>>>>>> +| | |--- description /class >>>>>>>> +| | |--- [devices] >>>>>>>> +| |--- <type id> >>>>>>>> +| |--- create >>>>>>>> +| |--- name >>>>>>>> +| |--- available_instances >>>>>>>> +| |--- description /class >>>>>>>> +| |--- [devices] >>>>>>>> + >>>>>>>> +[TBD : description or class is yet to be decided. This will change.] >>>>>>> >>>>>>> I thought that in previous discussions we had agreed to drop >>>>>>> the <type id> concept and use the name as the unique identifier. >>>>>>> When reporting these types in libvirt we won't want to report >>>>>>> the type id values - we'll want the name strings to be unique. >>>>>>> >>>>>> >>>>>> The 'name' might not be unique but type_id will be. For example that Neo >>>>>> pointed out in earlier discussion, virtual devices can come from two >>>>>> different physical devices, end user would be presented with what they >>>>>> had selected but there will be internal implementation differences. In >>>>>> that case 'type_id' will be unique. >>>>>> >>>>> >>>>> Hi, Kirti, my understanding is that Neo agreed to use an unique type >>>>> string (if you still called it <type id>), and then no need of additional >>>>> 'name' field which can be put inside 'description' field. See below quote: >>>>> >>>> >>>> We had internal discussions about this within NVIDIA and found that >>>> 'name' might not be unique where as 'type_id' would be unique. I'm >>>> refering to Neo's mail after that, where Neo do pointed that out. >>>> >>>> https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07714.html >>> >>> Everyone not privy to those internal discussions, including me, seems to >>> think we dropped type_id and that if a vendor does not have a stable >>> name, they can compose some sort of stable type description based on the >>> name+id, or even vendor+id, ex. NVIDIA-11. So please share why we >>> haven't managed to kill off type_id yet. No matter what internal >>> representation each vendor driver has of "type_id" it seems possible >>> for it to come up with stable string to define a given configuration. >> >> >> The 'type_id' is unique and the 'name' are not, the name is just a >> virtual device name/ human readable name. Because at this moment Intel >> can't define a proper GPU class, we have to add a 'description' field >> there as well to represent the features of this virtual device, once we >> have all agreed with the GPU class and its mandatory attributes, the >> 'description' field can be removed. Here is an example, >> type_id/type_name = NVIDIA_11, >> name=M60-M0Q, >> description=2560x1600, 2 displays, 512MB" >> >> Neo's previous comment only applies to the situation where we will have >> the GPU class or optional attributes defined and recognized by libvirt, >> since that is not going to happen any time soon, we will have to have >> the new 'description' field, and we don't want to have it mixed up with >> 'name' field. >> >> We can definitely have something like name+id as Alex recommended to >> remove the 'name' field, but it will just require libvirt to have more >> logic to parse that string. > > Let's use the mtty example driver provided in patch 5 so we can all > more clearly see how the interfaces work. I'll start from the > beginning of my experience and work my way to the type/name thing. > Thanks for looking into it and getting feel of it. And I hope this helps to understand that 'name' and 'type_id' are different. > (please add a modules_install target to the Makefile) > This is an example and I feel it should not be installed in /lib/modules/../build path. This should be used to understand the interface and the flow of mdev device management life cycle. User can use insmod to load driver: # insmod mtty.ko > # modprobe mtty > > Now what? It seems like I need to have prior knowledge that this > drivers supports mdev devices and I need to go hunt for them. We need > to create a class (ex. /sys/class/mdev/) where a user can find all the > devices that participate in this mediated device infrastructure. That > would point me to /sys/devices/mtty. > You can find devices registered to mdev framework by searching for 'mdev_supported_types' directory at the leaf nodes of devices in /sys/devices directory. Yes, we can have 'mdev' class and links to devices which are registered to mdev framework in /sys/class/mdev/. > # tree /sys/devices/mtty > /sys/devices/mtty > |-- mdev_supported_types > | `-- mtty1 > | |-- available_instances (1) > | |-- create > | |-- devices > | `-- name ("Dual-port-serial") > |-- mtty_dev > | `-- sample_mtty_dev ("This is phy device") > |-- power > | |-- async > | |-- autosuspend_delay_ms > | |-- control > | |-- runtime_active_kids > | |-- runtime_active_time > | |-- runtime_enabled > | |-- runtime_status > | |-- runtime_suspended_time > | `-- runtime_usage > `-- uevent > > Ok, but that was boring, we really need to have at least 2 supported > types to validate the interface, so without changing the actual device > backing, I pretended to have a single port vs dual port: > > /sys/devices/mtty > |-- mdev_supported_types > | |-- mtty1 > | | |-- available_instances (24) > | | |-- create > | | |-- devices > | | `-- name (Single-port-serial) > | `-- mtty2 > | |-- available_instances (12) > | |-- create > | |-- devices > | `-- name (Dual-port-serial) > [snip] > > I arbitrarily decides I have 24 ports and each single port uses 1 port > and each dual port uses 2 ports. > > Before I start creating devices, what are we going to key the libvirt > XML on? Can we do anything to prevent vendors from colliding or do we > have any way to suggest meaningful and unique type_ids? Libvirt would have parent and type_id in XML. No two vendors can own same parent device. So I don't think vendors would collide even having same type_id, since <parent, type_id> pair would always be unique. Presumably if > we had a PCI device hosting this, we would be rooted at that parent > device, ex. /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0. Maybe > the type_id should automatically be prefixed by the vendor module name, > ex. mtty-1, i915-foo, nvidia-bar. There's something missing for > deterministically creating a "XYZ" device and knowing exactly what that > means and finding a parent device that supports it. > We can prefix type_id with module name, i.e using dev->driver->name, but <parent, type_id> pair is unique so I don't see much benefit in doing that. > Let's get to mdev creating... > > # uuidgen > mdev_supported_types/mtty2/create > # tree /sys/devices/mtty > /sys/devices/mtty > |-- e68189be-700e-41f7-93a3-b5351e79c470 > | |-- driver -> ../../../bus/mdev/drivers/vfio_mdev > | |-- iommu_group -> ../../../kernel/iommu_groups/63 > | |-- mtty2 -> ../mdev_supported_types/mtty2 > | |-- power > | | |-- async > | | |-- autosuspend_delay_ms > | | |-- control > | | |-- runtime_active_kids > | | |-- runtime_active_time > | | |-- runtime_enabled > | | |-- runtime_status > | | |-- runtime_suspended_time > | | `-- runtime_usage > | |-- remove > | |-- subsystem -> ../../../bus/mdev > | |-- uevent > | `-- vendor > | `-- sample_mdev_dev ("This is MDEV e68189be-700e-41f7-93a3-b5351e79c470") > |-- mdev_supported_types > | |-- mtty1 > | | |-- available_instances (22) > | | |-- create > | | |-- devices > | | `-- name > | `-- mtty2 > | |-- available_instances (11) > | |-- create > | |-- devices > | | `-- e68189be-700e-41f7-93a3-b5351e79c470 -> ../../../e68189be-700e-41f7-93a3-b5351e79c470 > | `-- name > > The mdev device was created directly under the parent, which seems like > it's going to get messy to me (ie. imagine dropping a bunch of uuids > into a PCI parent device's sysfs directory, how does a user know what > they are?). > That is the way devices are placed in sysfs. For example below devices: 80:01.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 1a (rev 07) 80:02.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 2a (rev 07) 80:03.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 3a in PCI Express Mode (rev 07) 80:04.0 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 0 (rev 07) 80:04.1 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 1 (rev 07) 80:04.2 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 2 (rev 07) 80:04.3 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 3 (rev 07) 80:04.4 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 4 (rev 07) 80:04.5 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 5 (rev 07) 80:04.6 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 6 (rev 07) 80:04.7 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel 7 (rev 07) 80:05.0 System peripheral: Intel Corporation Xeon E5/Core i7 Address Map, VTd_Misc, System Management (rev 07) 80:05.2 System peripheral: Intel Corporation Xeon E5/Core i7 Control Status and Global Errors (rev 07) 80:05.4 PIC: Intel Corporation Xeon E5/Core i7 I/O APIC (rev 07) In sysfs, those are in same parent folder of its parent root port: # ls /sys/devices/pci0000\:80/ -l total 0 drwxr-xr-x 8 root root 0 Oct 13 12:08 0000:80:01.0 drwxr-xr-x 7 root root 0 Oct 13 12:08 0000:80:02.0 drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:03.0 drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.0 drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.1 drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.2 drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.3 drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.4 drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.5 drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.6 drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.7 drwxr-xr-x 3 root root 0 Oct 13 12:08 0000:80:05.0 drwxr-xr-x 3 root root 0 Oct 13 12:08 0000:80:05.2 drwxr-xr-x 3 root root 0 Oct 13 12:08 0000:80:05.4 lrwxrwxrwx 1 root root 0 Oct 13 13:25 firmware_node -> ../LNXSYSTM:00/LNXSYBUS:00/PNP0A08:01 drwxr-xr-x 3 root root 0 Oct 13 12:08 pci_bus drwxr-xr-x 2 root root 0 Oct 13 12:08 power -rw-r--r-- 1 root root 4096 Oct 13 13:25 uevent > Under the device we have "mtty2", shouldn't that be > "mdev_supported_type", which then links to mtty2? Otherwise a user > needs to decode from the link what this attribute is. > I thought it should show type, so that by looking at 'ls' output user should be able to find type_id. > Also here's an example of those vendor sysfs entries per device. So > long as the vendor never expects a tool like libvirt to manipulate > attributes there, I can see how that could be pretty powerful. > Yes, it is good to have vendor specific entries, libvirt might not report/use it. That would be more useful for system admin to get extra information manually that libvirt doesn't report. > Moving down to the mdev_supported_types, I've updated mtty so that it > actually adjusts available instance, and we can now see a link under > the devices for mtty2. > > Also worth noting is that a link for the device appears > in /sys/bus/mdev/devices. > > BTW, specifying this device for QEMU vfio-pci is where the sysfsdev > option comes into play: > > -device > vfio-pci,sysfsdev=/sys/devices/mtty/e68189be-700e-41f7-93a3-b5351e79c470 > > Which raises another question, we can tell through the vfio interfaces > that this is exposes as a PCI device, by creating a container > (open(/dev/vfio/vfio)), setting an iommu (ioctl(VFIO_SET_IOMMU)), > adding the group to the container (ioctl(VFIO_GROUP_SET_CONTAINER)), > getting the device (ioctl(VFIO_GROUP_GET_DEVICE_FD)), and finally > getting the device info (ioctl(VFIO_DEVICE_GET_INFO)) and checking the > flag bit that says the API is PCI. That's a long path to go and has > stumbling blocks like the type of iommu that's available for the given > platform. How do we make that manageable? Do you want device type to be expressed in sysfs? Then that should be done from vendor driver. vfio_mdev module is now a shim layer, so mdev core or vfio_mdev module don't know what device type flag vendor driver had set. Thanks, Kirti > I don't think we want to > create some artificial relationship that the type of the parent > necessarily match the type of the child mdev, we've already broken that > with a simple mdev tty driver. > > One more: > > # uuidgen > mdev_supported_types/mtty1/create > # tree /sys/devices/mtty > /sys/devices/mtty > |-- a7ae17d1-2de4-44c2-ae58-20ae0a0befe8 > | |-- driver -> ../../../bus/mdev/drivers/vfio_mdev > | |-- iommu_group -> ../../../kernel/iommu_groups/64 > | |-- mtty1 -> ../mdev_supported_types/mtty1 > | |-- power > [snip] > | |-- remove > | |-- subsystem -> ../../../bus/mdev > | |-- uevent > | `-- vendor > | `-- sample_mdev_dev ("This is MDEV a7ae17d1-2de4-44c2-ae58-20ae0a0befe8") > |-- e68189be-700e-41f7-93a3-b5351e79c470 > | |-- driver -> ../../../bus/mdev/drivers/vfio_mdev > | |-- iommu_group -> ../../../kernel/iommu_groups/63 > | |-- mtty2 -> ../mdev_supported_types/mtty2 > | |-- power > [snip] > | |-- remove > | |-- subsystem -> ../../../bus/mdev > | |-- uevent > | `-- vendor > | `-- sample_mdev_dev ("This is MDEV e68189be-700e-41f7-93a3-b5351e79c470") > |-- mdev_supported_types > | |-- mtty1 > | | |-- available_instances (21) > | | |-- create > | | |-- devices > | | | `-- a7ae17d1-2de4-44c2-ae58-20ae0a0befe8 -> ../../../a7ae17d1-2de4-44c2-ae58-20ae0a0befe8 > | | `-- name > | `-- mtty2 > | |-- available_instances (10) > | |-- create > | |-- devices > | | `-- e68189be-700e-41f7-93a3-b5351e79c470 -> ../../../e68189be-700e-41f7-93a3-b5351e79c470 > | `-- name > > Hopefully as expected with the caveats for the first example. > > # echo 1 > a7ae17d1-2de4-44c2-ae58-20ae0a0befe8/remove > # echo 1 > e68189be-700e-41f7-93a3-b5351e79c470/remove > > These do what they're supposed to, the devices are gone. > > Ok, I've identified some issues, let's figure out how to resolve them. > Thanks, > > Alex > > (hack multi-port mtty patch attached) >
On Thu, 13 Oct 2016 14:52:09 +0530 Kirti Wankhede <kwankhede@nvidia.com> wrote: > On 10/13/2016 3:14 AM, Alex Williamson wrote: > > On Thu, 13 Oct 2016 00:32:48 +0530 > > Kirti Wankhede <kwankhede@nvidia.com> wrote: > > > >> On 10/12/2016 9:29 PM, Alex Williamson wrote: > >>> On Wed, 12 Oct 2016 20:43:48 +0530 > >>> Kirti Wankhede <kwankhede@nvidia.com> wrote: > >>> > >>>> On 10/12/2016 7:22 AM, Tian, Kevin wrote: > >>>>>> From: Kirti Wankhede [mailto:kwankhede@nvidia.com] > >>>>>> Sent: Wednesday, October 12, 2016 4:45 AM > >>>>>>>> +* mdev_supported_types: > >>>>>>>> + List of current supported mediated device types and its details are added > >>>>>>>> +in this directory in following format: > >>>>>>>> + > >>>>>>>> +|- <parent phy device> > >>>>>>>> +|--- Vendor-specific-attributes [optional] > >>>>>>>> +|--- mdev_supported_types > >>>>>>>> +| |--- <type id> > >>>>>>>> +| | |--- create > >>>>>>>> +| | |--- name > >>>>>>>> +| | |--- available_instances > >>>>>>>> +| | |--- description /class > >>>>>>>> +| | |--- [devices] > >>>>>>>> +| |--- <type id> > >>>>>>>> +| | |--- create > >>>>>>>> +| | |--- name > >>>>>>>> +| | |--- available_instances > >>>>>>>> +| | |--- description /class > >>>>>>>> +| | |--- [devices] > >>>>>>>> +| |--- <type id> > >>>>>>>> +| |--- create > >>>>>>>> +| |--- name > >>>>>>>> +| |--- available_instances > >>>>>>>> +| |--- description /class > >>>>>>>> +| |--- [devices] > >>>>>>>> + > >>>>>>>> +[TBD : description or class is yet to be decided. This will change.] > >>>>>>> > >>>>>>> I thought that in previous discussions we had agreed to drop > >>>>>>> the <type id> concept and use the name as the unique identifier. > >>>>>>> When reporting these types in libvirt we won't want to report > >>>>>>> the type id values - we'll want the name strings to be unique. > >>>>>>> > >>>>>> > >>>>>> The 'name' might not be unique but type_id will be. For example that Neo > >>>>>> pointed out in earlier discussion, virtual devices can come from two > >>>>>> different physical devices, end user would be presented with what they > >>>>>> had selected but there will be internal implementation differences. In > >>>>>> that case 'type_id' will be unique. > >>>>>> > >>>>> > >>>>> Hi, Kirti, my understanding is that Neo agreed to use an unique type > >>>>> string (if you still called it <type id>), and then no need of additional > >>>>> 'name' field which can be put inside 'description' field. See below quote: > >>>>> > >>>> > >>>> We had internal discussions about this within NVIDIA and found that > >>>> 'name' might not be unique where as 'type_id' would be unique. I'm > >>>> refering to Neo's mail after that, where Neo do pointed that out. > >>>> > >>>> https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07714.html > >>> > >>> Everyone not privy to those internal discussions, including me, seems to > >>> think we dropped type_id and that if a vendor does not have a stable > >>> name, they can compose some sort of stable type description based on the > >>> name+id, or even vendor+id, ex. NVIDIA-11. So please share why we > >>> haven't managed to kill off type_id yet. No matter what internal > >>> representation each vendor driver has of "type_id" it seems possible > >>> for it to come up with stable string to define a given configuration. > >> > >> > >> The 'type_id' is unique and the 'name' are not, the name is just a > >> virtual device name/ human readable name. Because at this moment Intel > >> can't define a proper GPU class, we have to add a 'description' field > >> there as well to represent the features of this virtual device, once we > >> have all agreed with the GPU class and its mandatory attributes, the > >> 'description' field can be removed. Here is an example, > >> type_id/type_name = NVIDIA_11, > >> name=M60-M0Q, > >> description=2560x1600, 2 displays, 512MB" > >> > >> Neo's previous comment only applies to the situation where we will have > >> the GPU class or optional attributes defined and recognized by libvirt, > >> since that is not going to happen any time soon, we will have to have > >> the new 'description' field, and we don't want to have it mixed up with > >> 'name' field. > >> > >> We can definitely have something like name+id as Alex recommended to > >> remove the 'name' field, but it will just require libvirt to have more > >> logic to parse that string. > > > > Let's use the mtty example driver provided in patch 5 so we can all > > more clearly see how the interfaces work. I'll start from the > > beginning of my experience and work my way to the type/name thing. > > > > Thanks for looking into it and getting feel of it. And I hope this helps > to understand that 'name' and 'type_id' are different. > > > > (please add a modules_install target to the Makefile) > > > > This is an example and I feel it should not be installed in > /lib/modules/../build path. This should be used to understand the > interface and the flow of mdev device management life cycle. User can > use insmod to load driver: > > # insmod mtty.ko It's not built by default, that's sufficient. Providing a modules_install target makes it more accessible for testing and allows easier testing of module dependencies with modprobe. insmod does not exercise the automatic module dependency loading. > > # modprobe mtty > > > > Now what? It seems like I need to have prior knowledge that this > > drivers supports mdev devices and I need to go hunt for them. We need > > to create a class (ex. /sys/class/mdev/) where a user can find all the > > devices that participate in this mediated device infrastructure. That > > would point me to /sys/devices/mtty. > > > > You can find devices registered to mdev framework by searching for > 'mdev_supported_types' directory at the leaf nodes of devices in > /sys/devices directory. Yes, we can have 'mdev' class and links to > devices which are registered to mdev framework in /sys/class/mdev/. > > > > # tree /sys/devices/mtty > > /sys/devices/mtty > > |-- mdev_supported_types > > | `-- mtty1 > > | |-- available_instances (1) > > | |-- create > > | |-- devices > > | `-- name ("Dual-port-serial") > > |-- mtty_dev > > | `-- sample_mtty_dev ("This is phy device") > > |-- power > > | |-- async > > | |-- autosuspend_delay_ms > > | |-- control > > | |-- runtime_active_kids > > | |-- runtime_active_time > > | |-- runtime_enabled > > | |-- runtime_status > > | |-- runtime_suspended_time > > | `-- runtime_usage > > `-- uevent > > > > Ok, but that was boring, we really need to have at least 2 supported > > types to validate the interface, so without changing the actual device > > backing, I pretended to have a single port vs dual port: > > > > /sys/devices/mtty > > |-- mdev_supported_types > > | |-- mtty1 > > | | |-- available_instances (24) > > | | |-- create > > | | |-- devices > > | | `-- name (Single-port-serial) > > | `-- mtty2 > > | |-- available_instances (12) > > | |-- create > > | |-- devices > > | `-- name (Dual-port-serial) > > [snip] > > > > I arbitrarily decides I have 24 ports and each single port uses 1 port > > and each dual port uses 2 ports. > > > > Before I start creating devices, what are we going to key the libvirt > > XML on? Can we do anything to prevent vendors from colliding or do we > > have any way to suggest meaningful and unique type_ids? > > Libvirt would have parent and type_id in XML. No two vendors can own > same parent device. So I don't think vendors would collide even having > same type_id, since <parent, type_id> pair would always be unique. We have a goal of supporting migration with mdev devices, Intel has already shown this is possible. Tying the XML representation of an mdev device to a parent device is directly contradictory to that goal. libvirt needs a token which is unique across vendor to be able to instantiate an mdev device. <parent, type_id> is unacceptable. > Presumably if > > we had a PCI device hosting this, we would be rooted at that parent > > device, ex. /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0. Maybe > > the type_id should automatically be prefixed by the vendor module name, > > ex. mtty-1, i915-foo, nvidia-bar. There's something missing for > > deterministically creating a "XYZ" device and knowing exactly what that > > means and finding a parent device that supports it. > > > > We can prefix type_id with module name, i.e using dev->driver->name, but > <parent, type_id> pair is unique so I don't see much benefit in doing > that. > > > > Let's get to mdev creating... > > > > # uuidgen > mdev_supported_types/mtty2/create > > # tree /sys/devices/mtty > > /sys/devices/mtty > > |-- e68189be-700e-41f7-93a3-b5351e79c470 > > | |-- driver -> ../../../bus/mdev/drivers/vfio_mdev > > | |-- iommu_group -> ../../../kernel/iommu_groups/63 > > | |-- mtty2 -> ../mdev_supported_types/mtty2 > > | |-- power > > | | |-- async > > | | |-- autosuspend_delay_ms > > | | |-- control > > | | |-- runtime_active_kids > > | | |-- runtime_active_time > > | | |-- runtime_enabled > > | | |-- runtime_status > > | | |-- runtime_suspended_time > > | | `-- runtime_usage > > | |-- remove > > | |-- subsystem -> ../../../bus/mdev > > | |-- uevent > > | `-- vendor > > | `-- sample_mdev_dev ("This is MDEV e68189be-700e-41f7-93a3-b5351e79c470") > > |-- mdev_supported_types > > | |-- mtty1 > > | | |-- available_instances (22) > > | | |-- create > > | | |-- devices > > | | `-- name > > | `-- mtty2 > > | |-- available_instances (11) > > | |-- create > > | |-- devices > > | | `-- e68189be-700e-41f7-93a3-b5351e79c470 -> ../../../e68189be-700e-41f7-93a3-b5351e79c470 > > | `-- name > > > > The mdev device was created directly under the parent, which seems like > > it's going to get messy to me (ie. imagine dropping a bunch of uuids > > into a PCI parent device's sysfs directory, how does a user know what > > they are?). > > > > That is the way devices are placed in sysfs. For example below devices: > > 80:01.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express > Root Port 1a (rev 07) > 80:02.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express > Root Port 2a (rev 07) > 80:03.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express > Root Port 3a in PCI Express Mode (rev 07) > 80:04.0 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel > 0 (rev 07) > 80:04.1 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel > 1 (rev 07) > 80:04.2 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel > 2 (rev 07) > 80:04.3 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel > 3 (rev 07) > 80:04.4 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel > 4 (rev 07) > 80:04.5 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel > 5 (rev 07) > 80:04.6 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel > 6 (rev 07) > 80:04.7 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel > 7 (rev 07) > 80:05.0 System peripheral: Intel Corporation Xeon E5/Core i7 Address > Map, VTd_Misc, System Management (rev 07) > 80:05.2 System peripheral: Intel Corporation Xeon E5/Core i7 Control > Status and Global Errors (rev 07) > 80:05.4 PIC: Intel Corporation Xeon E5/Core i7 I/O APIC (rev 07) > > In sysfs, those are in same parent folder of its parent root port: > > # ls /sys/devices/pci0000\:80/ -l > total 0 > drwxr-xr-x 8 root root 0 Oct 13 12:08 0000:80:01.0 > drwxr-xr-x 7 root root 0 Oct 13 12:08 0000:80:02.0 > drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:03.0 > drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.0 > drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.1 > drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.2 > drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.3 > drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.4 > drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.5 > drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.6 > drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.7 > drwxr-xr-x 3 root root 0 Oct 13 12:08 0000:80:05.0 > drwxr-xr-x 3 root root 0 Oct 13 12:08 0000:80:05.2 > drwxr-xr-x 3 root root 0 Oct 13 12:08 0000:80:05.4 > lrwxrwxrwx 1 root root 0 Oct 13 13:25 firmware_node -> > ../LNXSYSTM:00/LNXSYBUS:00/PNP0A08:01 > drwxr-xr-x 3 root root 0 Oct 13 12:08 pci_bus > drwxr-xr-x 2 root root 0 Oct 13 12:08 power > -rw-r--r-- 1 root root 4096 Oct 13 13:25 uevent I agree, that's also messy, but a PCI device has a standard PCI address format, so just by looking at those you know it's a child PCI device. We can write code that understands how to parse that and understands what it is by the address. On the other hand we're dropping uuids into a directory. We can write code that understands how to parse it, but how do we know that it's a child device vs some other attribute for the parent? > > Under the device we have "mtty2", shouldn't that be > > "mdev_supported_type", which then links to mtty2? Otherwise a user > > needs to decode from the link what this attribute is. > > > > I thought it should show type, so that by looking at 'ls' output user > should be able to find type_id. The type_id should be shown by actually reading the link, not by the link name itself, the same way that the iommu_group link for a device isn't the group number, it links to the group number but uses a standard link name. > > Also here's an example of those vendor sysfs entries per device. So > > long as the vendor never expects a tool like libvirt to manipulate > > attributes there, I can see how that could be pretty powerful. > > > > Yes, it is good to have vendor specific entries, libvirt might not > report/use it. That would be more useful for system admin to get extra > information manually that libvirt doesn't report. > > > > Moving down to the mdev_supported_types, I've updated mtty so that it > > actually adjusts available instance, and we can now see a link under > > the devices for mtty2. > > > > Also worth noting is that a link for the device appears > > in /sys/bus/mdev/devices. > > > > BTW, specifying this device for QEMU vfio-pci is where the sysfsdev > > option comes into play: > > > > -device > > vfio-pci,sysfsdev=/sys/devices/mtty/e68189be-700e-41f7-93a3-b5351e79c470 > > > > Which raises another question, we can tell through the vfio interfaces > > that this is exposes as a PCI device, by creating a container > > (open(/dev/vfio/vfio)), setting an iommu (ioctl(VFIO_SET_IOMMU)), > > adding the group to the container (ioctl(VFIO_GROUP_SET_CONTAINER)), > > getting the device (ioctl(VFIO_GROUP_GET_DEVICE_FD)), and finally > > getting the device info (ioctl(VFIO_DEVICE_GET_INFO)) and checking the > > flag bit that says the API is PCI. That's a long path to go and has > > stumbling blocks like the type of iommu that's available for the given > > platform. How do we make that manageable? > > Do you want device type to be expressed in sysfs? Then that should be > done from vendor driver. vfio_mdev module is now a shim layer, so mdev > core or vfio_mdev module don't know what device type flag vendor driver > had set. Right, the vendor driver would need to expose this, the mdev layers are device agnostic, they don't know or care which device API is being exposed. The other question is whether it needs to be part of the initial implementation or can we assume pci for now and add something later. I guess we already have our proof to the contrary with the IBM ccw device that libvirt can't simply assume pci. I see that many devices in sysfs have a subsystem link, which seems rather appropriate, but we're not creating a real pci device, so linking to /sys/bus/pci or /sys/class/pci_bus both seem invalid. Is that a dead end? We could always expose vfio_device_info.flags, but that seems pretty ugly as well, plus the sysfs mdev interface is not vfio specific. What if we had a "device_api" attribute which the vendor driver would show as "vfio-pci"? Therefore the mdev interface is not tied to vfio, but we clearly show that a given type_id clearly exports a vfio-pci interface. Thanks, Alex
On 13/10/2016 16:36, Alex Williamson wrote: >> > >> > Libvirt would have parent and type_id in XML. No two vendors can own >> > same parent device. So I don't think vendors would collide even having >> > same type_id, since <parent, type_id> pair would always be unique. > > We have a goal of supporting migration with mdev devices, Intel has > already shown this is possible. Tying the XML representation of an > mdev device to a parent device is directly contradictory to that goal. > libvirt needs a token which is unique across vendor to be able to > instantiate an mdev device. <parent, type_id> is unacceptable. Would the vGPU's PCI vendor and device ID be acceptable and unique? Paolo
On Thu, 13 Oct 2016 18:00:07 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote: > On 13/10/2016 16:36, Alex Williamson wrote: > >> > > >> > Libvirt would have parent and type_id in XML. No two vendors can own > >> > same parent device. So I don't think vendors would collide even having > >> > same type_id, since <parent, type_id> pair would always be unique. > > > > We have a goal of supporting migration with mdev devices, Intel has > > already shown this is possible. Tying the XML representation of an > > mdev device to a parent device is directly contradictory to that goal. > > libvirt needs a token which is unique across vendor to be able to > > instantiate an mdev device. <parent, type_id> is unacceptable. > > Would the vGPU's PCI vendor and device ID be acceptable and unique? No, the PCI vendor:device ID doesn't fully describe the device, just look at a given GeForce configuration on the market today, a single NVIDIA PCI device ID can have different clocks, different memory sizes, different cooling profiles, different output ports, etc. configurable by the card vendor. An mdev vGPU can be the same. The type_id needs to encompass the entire virtual hardware configuration of the device. Personally I think we should create a type-id the pre-pends the vendor driver name, giving each vendor a unique namespace, and then let the vendor driver manage the rest. For example, an "nvidia-xyz" type_id should define a unique mdev configuration that may be implemented across multiple physical hardware SKUs. libvirt xml (with managed='yes') would list an "nvidia-xyz" device as required for the VM and search the host for mdev parent device capable of creating such a device. Based on utilization, locality, or whatever parameters it determines, libvirt would create (or re-use, if a pool) an instance of that type_id. Specifying a specific parent might be something we want to allow to give the user full placement control, but I don't think it should be required based on the sysfs API we provide to the user. Thanks, Alex
On 10/13/2016 8:06 PM, Alex Williamson wrote: > On Thu, 13 Oct 2016 14:52:09 +0530 > Kirti Wankhede <kwankhede@nvidia.com> wrote: > >> On 10/13/2016 3:14 AM, Alex Williamson wrote: >>> On Thu, 13 Oct 2016 00:32:48 +0530 >>> Kirti Wankhede <kwankhede@nvidia.com> wrote: >>> >>>> On 10/12/2016 9:29 PM, Alex Williamson wrote: >>>>> On Wed, 12 Oct 2016 20:43:48 +0530 >>>>> Kirti Wankhede <kwankhede@nvidia.com> wrote: >>>>> >>>>>> On 10/12/2016 7:22 AM, Tian, Kevin wrote: >>>>>>>> From: Kirti Wankhede [mailto:kwankhede@nvidia.com] >>>>>>>> Sent: Wednesday, October 12, 2016 4:45 AM >>>>>>>>>> +* mdev_supported_types: >>>>>>>>>> + List of current supported mediated device types and its details are added >>>>>>>>>> +in this directory in following format: >>>>>>>>>> + >>>>>>>>>> +|- <parent phy device> >>>>>>>>>> +|--- Vendor-specific-attributes [optional] >>>>>>>>>> +|--- mdev_supported_types >>>>>>>>>> +| |--- <type id> >>>>>>>>>> +| | |--- create >>>>>>>>>> +| | |--- name >>>>>>>>>> +| | |--- available_instances >>>>>>>>>> +| | |--- description /class >>>>>>>>>> +| | |--- [devices] >>>>>>>>>> +| |--- <type id> >>>>>>>>>> +| | |--- create >>>>>>>>>> +| | |--- name >>>>>>>>>> +| | |--- available_instances >>>>>>>>>> +| | |--- description /class >>>>>>>>>> +| | |--- [devices] >>>>>>>>>> +| |--- <type id> >>>>>>>>>> +| |--- create >>>>>>>>>> +| |--- name >>>>>>>>>> +| |--- available_instances >>>>>>>>>> +| |--- description /class >>>>>>>>>> +| |--- [devices] >>>>>>>>>> + >>>>>>>>>> +[TBD : description or class is yet to be decided. This will change.] >>>>>>>>> >>>>>>>>> I thought that in previous discussions we had agreed to drop >>>>>>>>> the <type id> concept and use the name as the unique identifier. >>>>>>>>> When reporting these types in libvirt we won't want to report >>>>>>>>> the type id values - we'll want the name strings to be unique. >>>>>>>>> >>>>>>>> >>>>>>>> The 'name' might not be unique but type_id will be. For example that Neo >>>>>>>> pointed out in earlier discussion, virtual devices can come from two >>>>>>>> different physical devices, end user would be presented with what they >>>>>>>> had selected but there will be internal implementation differences. In >>>>>>>> that case 'type_id' will be unique. >>>>>>>> >>>>>>> >>>>>>> Hi, Kirti, my understanding is that Neo agreed to use an unique type >>>>>>> string (if you still called it <type id>), and then no need of additional >>>>>>> 'name' field which can be put inside 'description' field. See below quote: >>>>>>> >>>>>> >>>>>> We had internal discussions about this within NVIDIA and found that >>>>>> 'name' might not be unique where as 'type_id' would be unique. I'm >>>>>> refering to Neo's mail after that, where Neo do pointed that out. >>>>>> >>>>>> https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07714.html >>>>> >>>>> Everyone not privy to those internal discussions, including me, seems to >>>>> think we dropped type_id and that if a vendor does not have a stable >>>>> name, they can compose some sort of stable type description based on the >>>>> name+id, or even vendor+id, ex. NVIDIA-11. So please share why we >>>>> haven't managed to kill off type_id yet. No matter what internal >>>>> representation each vendor driver has of "type_id" it seems possible >>>>> for it to come up with stable string to define a given configuration. >>>> >>>> >>>> The 'type_id' is unique and the 'name' are not, the name is just a >>>> virtual device name/ human readable name. Because at this moment Intel >>>> can't define a proper GPU class, we have to add a 'description' field >>>> there as well to represent the features of this virtual device, once we >>>> have all agreed with the GPU class and its mandatory attributes, the >>>> 'description' field can be removed. Here is an example, >>>> type_id/type_name = NVIDIA_11, >>>> name=M60-M0Q, >>>> description=2560x1600, 2 displays, 512MB" >>>> >>>> Neo's previous comment only applies to the situation where we will have >>>> the GPU class or optional attributes defined and recognized by libvirt, >>>> since that is not going to happen any time soon, we will have to have >>>> the new 'description' field, and we don't want to have it mixed up with >>>> 'name' field. >>>> >>>> We can definitely have something like name+id as Alex recommended to >>>> remove the 'name' field, but it will just require libvirt to have more >>>> logic to parse that string. >>> >>> Let's use the mtty example driver provided in patch 5 so we can all >>> more clearly see how the interfaces work. I'll start from the >>> beginning of my experience and work my way to the type/name thing. >>> >> >> Thanks for looking into it and getting feel of it. And I hope this helps >> to understand that 'name' and 'type_id' are different. >> >> >>> (please add a modules_install target to the Makefile) >>> >> >> This is an example and I feel it should not be installed in >> /lib/modules/../build path. This should be used to understand the >> interface and the flow of mdev device management life cycle. User can >> use insmod to load driver: >> >> # insmod mtty.ko > > It's not built by default, that's sufficient. Providing a > modules_install target makes it more accessible for testing and allows > easier testing of module dependencies with modprobe. insmod does not > exercise the automatic module dependency loading. > >>> # modprobe mtty >>> >>> Now what? It seems like I need to have prior knowledge that this >>> drivers supports mdev devices and I need to go hunt for them. We need >>> to create a class (ex. /sys/class/mdev/) where a user can find all the >>> devices that participate in this mediated device infrastructure. That >>> would point me to /sys/devices/mtty. >>> >> >> You can find devices registered to mdev framework by searching for >> 'mdev_supported_types' directory at the leaf nodes of devices in >> /sys/devices directory. Yes, we can have 'mdev' class and links to >> devices which are registered to mdev framework in /sys/class/mdev/. >> >> >>> # tree /sys/devices/mtty >>> /sys/devices/mtty >>> |-- mdev_supported_types >>> | `-- mtty1 >>> | |-- available_instances (1) >>> | |-- create >>> | |-- devices >>> | `-- name ("Dual-port-serial") >>> |-- mtty_dev >>> | `-- sample_mtty_dev ("This is phy device") >>> |-- power >>> | |-- async >>> | |-- autosuspend_delay_ms >>> | |-- control >>> | |-- runtime_active_kids >>> | |-- runtime_active_time >>> | |-- runtime_enabled >>> | |-- runtime_status >>> | |-- runtime_suspended_time >>> | `-- runtime_usage >>> `-- uevent >>> >>> Ok, but that was boring, we really need to have at least 2 supported >>> types to validate the interface, so without changing the actual device >>> backing, I pretended to have a single port vs dual port: >>> >>> /sys/devices/mtty >>> |-- mdev_supported_types >>> | |-- mtty1 >>> | | |-- available_instances (24) >>> | | |-- create >>> | | |-- devices >>> | | `-- name (Single-port-serial) >>> | `-- mtty2 >>> | |-- available_instances (12) >>> | |-- create >>> | |-- devices >>> | `-- name (Dual-port-serial) >>> [snip] >>> >>> I arbitrarily decides I have 24 ports and each single port uses 1 port >>> and each dual port uses 2 ports. >>> >>> Before I start creating devices, what are we going to key the libvirt >>> XML on? Can we do anything to prevent vendors from colliding or do we >>> have any way to suggest meaningful and unique type_ids? >> >> Libvirt would have parent and type_id in XML. No two vendors can own >> same parent device. So I don't think vendors would collide even having >> same type_id, since <parent, type_id> pair would always be unique. > > > We have a goal of supporting migration with mdev devices, Intel has > already shown this is possible. Tying the XML representation of an > mdev device to a parent device is directly contradictory to that goal. > libvirt needs a token which is unique across vendor to be able to > instantiate an mdev device. <parent, type_id> is unacceptable. > Ok, I'll have dev->driver->name prefix for type_id from mdev core module. In vendor driver attribute functions, they should also use same format to identify the type. >> Presumably if >>> we had a PCI device hosting this, we would be rooted at that parent >>> device, ex. /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0. Maybe >>> the type_id should automatically be prefixed by the vendor module name, >>> ex. mtty-1, i915-foo, nvidia-bar. There's something missing for >>> deterministically creating a "XYZ" device and knowdev->driver->nameing exactly what that >>> means and finding a parent device that supports it. >>> >> >> We can prefix type_id with module name, i.e using , but >> <parent, type_id> pair is unique so I don't see much benefit in doing >> that. >> >> >>> Let's get to mdev creating... >>> >>> # uuidgen > mdev_supported_types/mtty2/create >>> # tree /sys/devices/mtty >>> /sys/devices/mtty >>> |-- e68189be-700e-41f7-93a3-b5351e79c470 >>> | |-- driver -> ../../../bus/mdev/drivers/vfio_mdev >>> | |-- iommu_group -> ../../../kernel/iommu_groups/63 >>> | |-- mtty2 -> ../mdev_supported_types/mtty2 >>> | |-- power >>> | | |-- async >>> | | |-- autosuspend_delay_ms >>> | | |-- control >>> | | |-- runtime_active_kids >>> | | |-- runtime_active_time >>> | | |-- runtime_enabled >>> | | |-- runtime_status >>> | | |-- runtime_suspended_time >>> | | `-- runtime_usage >>> | |-- remove >>> | |-- subsystem -> ../../../bus/mdev >>> | |-- uevent >>> | `-- vendor >>> | `-- sample_mdev_dev ("This is MDEV e68189be-700e-41f7-93a3-b5351e79c470") >>> |-- mdev_supported_types >>> | |-- mtty1 >>> | | |-- available_instances (22) >>> | | |-- create >>> | | |-- devices >>> | | `-- name >>> | `-- mtty2 >>> | |-- available_instances (11) >>> | |-- create >>> | |-- devices >>> | | `-- e68189be-700e-41f7-93a3-b5351e79c470 -> ../../../e68189be-700e-41f7-93a3-b5351e79c470 >>> | `-- name >>> >>> The mdev device was created directly under the parent, which seems like >>> it's going to get messy to me (ie. imagine dropping a bunch of uuids >>> into a PCI parent device's sysfs directory, how does a user know what >>> they are?). >>> >> >> That is the way devices are placed in sysfs. For example below devices: >> >> 80:01.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express >> Root Port 1a (rev 07) >> 80:02.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express >> Root Port 2a (rev 07) >> 80:03.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express >> Root Port 3a in PCI Express Mode (rev 07) >> 80:04.0 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel >> 0 (rev 07) >> 80:04.1 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel >> 1 (rev 07) >> 80:04.2 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel >> 2 (rev 07) >> 80:04.3 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel >> 3 (rev 07) >> 80:04.4 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel >> 4 (rev 07) >> 80:04.5 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel >> 5 (rev 07) >> 80:04.6 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel >> 6 (rev 07) >> 80:04.7 System peripheral: Intel Corporation Xeon E5/Core i7 DMA Channel >> 7 (rev 07) >> 80:05.0 System peripheral: Intel Corporation Xeon E5/Core i7 Address >> Map, VTd_Misc, System Management (rev 07) >> 80:05.2 System peripheral: Intel Corporation Xeon E5/Core i7 Control >> Status and Global Errors (rev 07) >> 80:05.4 PIC: Intel Corporation Xeon E5/Core i7 I/O APIC (rev 07) >> >> In sysfs, those are in same parent folder of its parent root port: >> >> # ls /sys/devices/pci0000\:80/ -l >> total 0 >> drwxr-xr-x 8 root root 0 Oct 13 12:08 0000:80:01.0 >> drwxr-xr-x 7 root root 0 Oct 13 12:08 0000:80:02.0 >> drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:03.0 >> drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.0 >> drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.1 >> drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.2 >> drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.3 >> drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.4 >> drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.5 >> drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.6 >> drwxr-xr-x 6 root root 0 Oct 13 12:08 0000:80:04.7 >> drwxr-xr-x 3 root root 0 Oct 13 12:08 0000:80:05.0 >> drwxr-xr-x 3 root root 0 Oct 13 12:08 0000:80:05.2 >> drwxr-xr-x 3 root root 0 Oct 13 12:08 0000:80:05.4 >> lrwxrwxrwx 1 root root 0 Oct 13 13:25 firmware_node -> >> ../LNXSYSTM:00/LNXSYBUS:00/PNP0A08:01 >> drwxr-xr-x 3 root root 0 Oct 13 12:08 pci_bus >> drwxr-xr-x 2 root root 0 Oct 13 12:08 power >> -rw-r--r-- 1 root root 4096 Oct 13 13:25 uevent > > I agree, that's also messy, but a PCI device has a standard PCI address > format, so just by looking at those you know it's a child PCI device. > We can write code that understands how to parse that and understands > what it is by the address. On the other hand we're dropping uuids into > a directory. We can write code that understands how to parse it, but > how do we know that it's a child device vs some other attribute for the > parent? > mdev child will have 'mdev_supported_type' link in its directory. That should be sufficient to identify it's child and not other attribute. >>> Under the device we have "mtty2", shouldn't that be >>> "mdev_supported_type", which then links to mtty2? Otherwise a user >>> needs to decode from the link what this attribute is. >>> >> >> I thought it should show type, so that by looking at 'ls' output user >> should be able to find type_id. > > The type_id should be shown by actually reading the link, not by the > link name itself, the same way that the iommu_group link for a device > isn't the group number, it links to the group number but uses a > standard link name. > Ok. I'll rename the link name to 'mdev_supported_type' >>> Also here's an example of those vendor sysfs entries per device. So >>> long as the vendor never expects a tool like libvirt to manipulate >>> attributes there, I can see how that could be pretty powerful. >>> >> >> Yes, it is good to have vendor specific entries, libvirt might not >> report/use it. That would be more useful for system admin to get extra >> information manually that libvirt doesn't report. >> >> >>> Moving down to the mdev_supported_types, I've updated mtty so that it >>> actually adjusts available instance, and we can now see a link under >>> the devices for mtty2. >>> >>> Also worth noting is that a link for the device appears >>> in /sys/bus/mdev/devices. >>> >>> BTW, specifying this device for QEMU vfio-pci is where the sysfsdev >>> option comes into play: >>> >>> -device >>> vfio-pci,sysfsdev=/sys/devices/mtty/e68189be-700e-41f7-93a3-b5351e79c470 >>> >>> Which raises another question, we can tell through the vfio interfaces >>> that this is exposes as a PCI device, by creating a container >>> (open(/dev/vfio/vfio)), setting an iommu (ioctl(VFIO_SET_IOMMU)), >>> adding the group to the container (ioctl(VFIO_GROUP_SET_CONTAINER)), >>> getting the device (ioctl(VFIO_GROUP_GET_DEVICE_FD)), and finally >>> getting the device info (ioctl(VFIO_DEVICE_GET_INFO)) and checking the >>> flag bit that says the API is PCI. That's a long path to go and has >>> stumbling blocks like the type of iommu that's available for the given >>> platform. How do we make that manageable? >> >> Do you want device type to be expressed in sysfs? Then that should be >> done from vendor driver. vfio_mdev module is now a shim layer, so mdev >> core or vfio_mdev module don't know what device type flag vendor driver >> had set. > > Right, the vendor driver would need to expose this, the mdev layers are > device agnostic, they don't know or care which device API is being > exposed. The other question is whether it needs to be part of the > initial implementation or can we assume pci for now and add something > later. I guess we already have our proof to the contrary with the > IBM ccw device that libvirt can't simply assume pci. I see that many > devices in sysfs have a subsystem link, which seems rather appropriate, > but we're not creating a real pci device, so linking to /sys/bus/pci > or /sys/class/pci_bus both seem invalid. Is that a dead end? We could > always expose vfio_device_info.flags, but that seems pretty ugly as > well, plus the sysfs mdev interface is not vfio specific. What if we > had a "device_api" attribute which the vendor driver would show as > "vfio-pci"? Therefore the mdev interface is not tied to vfio, but we > clearly show that a given type_id clearly exports a vfio-pci > interface. Thanks, > We can't use subsystem for mdev device. Kernel's device core framework adds subsystem link to mdev device folder as: subsystem -> ../../../../../../../bus/mdev We will have an mandatory attribute in 'supported_type_groups' which should show "vfio-pci" or "vfio-platform" based on the device flag vendor driver is going to set for that type. Thanks, Kirti
On Fri, 14 Oct 2016 09:01:01 +0530 Kirti Wankhede <kwankhede@nvidia.com> wrote: > On 10/13/2016 8:06 PM, Alex Williamson wrote: > > On Thu, 13 Oct 2016 14:52:09 +0530 > > Kirti Wankhede <kwankhede@nvidia.com> wrote: > > > >> On 10/13/2016 3:14 AM, Alex Williamson wrote: > >>> Under the device we have "mtty2", shouldn't that be > >>> "mdev_supported_type", which then links to mtty2? Otherwise a user > >>> needs to decode from the link what this attribute is. > >>> > >> > >> I thought it should show type, so that by looking at 'ls' output user > >> should be able to find type_id. > > > > The type_id should be shown by actually reading the link, not by the > > link name itself, the same way that the iommu_group link for a device > > isn't the group number, it links to the group number but uses a > > standard link name. > > > > Ok. I'll rename the link name to 'mdev_supported_type' Hmm, if we have a device, then clearly it's a supported type, we can probably reduce this to 'mdev_type'. Sorry for not catching that. BTW, please include the linux-kernel <linux-kernel@vger.kernel.org> mailing list on the CC in your next posting. Thanks, Alex
diff --git a/Documentation/vfio-mdev/Makefile b/Documentation/vfio-mdev/Makefile index ff6f8a3..721daf0 100644 --- a/Documentation/vfio-mdev/Makefile +++ b/Documentation/vfio-mdev/Makefile @@ -8,6 +8,9 @@ obj-m:=mtty.o default: $(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules +modules_install: + $(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules_install + clean: @rm -rf .*.cmd *.mod.c *.o *.ko .tmp* @rm -rf Module.* Modules.* modules.* .tmp_versions diff --git a/Documentation/vfio-mdev/mtty.c b/Documentation/vfio-mdev/mtty.c index 497c90e..05ab40d 100644 --- a/Documentation/vfio-mdev/mtty.c +++ b/Documentation/vfio-mdev/mtty.c @@ -141,8 +141,11 @@ struct mdev_state { struct serial_port s[2]; struct mutex rxtx_lock; struct vfio_device_info dev_info; + int nr_ports; }; +#define MAX_MTTYS 24 + struct mutex mdev_list_lock; struct list_head mdev_devices_list; @@ -723,6 +726,11 @@ int mtty_create(struct kobject *kobj, struct mdev_device *mdev) if (mdev_state == NULL) return -ENOMEM; + if (!strcmp(kobj->name, "mtty1")) + mdev_state->nr_ports = 1; + else if (!strcmp(kobj->name, "mtty2")) + mdev_state->nr_ports = 2; + mdev_state->irq_index = -1; mdev_state->s[0].max_fifo_size = MAX_FIFO_SIZE; mdev_state->s[1].max_fifo_size = MAX_FIFO_SIZE; @@ -1224,7 +1232,12 @@ const struct attribute_group *mdev_dev_groups[] = { static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf) { - return sprintf(buf, "Dual-port-serial\n"); + if (!strcmp(kobj->name, "mtty1")) + return sprintf(buf, "Single-port-serial\n"); + if (!strcmp(kobj->name, "mtty2")) + return sprintf(buf, "Dual-port-serial\n"); + + return -EINVAL; } MDEV_TYPE_ATTR_RO(name); @@ -1232,7 +1245,20 @@ MDEV_TYPE_ATTR_RO(name); static ssize_t available_instances_show(struct kobject *kobj, struct device *dev, char *buf) { - return sprintf(buf, "1\n"); + struct mdev_state *mds; + int ports, used = 0; + + if (!strcmp(kobj->name, "mtty1")) + ports = 1; + else if (!strcmp(kobj->name, "mtty2")) + ports = 2; + else + return -EINVAL; + + list_for_each_entry(mds, &mdev_devices_list, next) { + used += mds->nr_ports; + } + return sprintf(buf, "%d\n", (MAX_MTTYS - used)/ports); } MDEV_TYPE_ATTR_RO(available_instances); @@ -1243,13 +1269,19 @@ static struct attribute *mdev_types_attrs[] = { NULL, }; -static struct attribute_group mdev_type_group = { +static struct attribute_group mdev_type_group1 = { .name = "mtty1", .attrs = mdev_types_attrs, }; +static struct attribute_group mdev_type_group2 = { + .name = "mtty2", + .attrs = mdev_types_attrs, +}; + struct attribute_group *mdev_type_groups[] = { - &mdev_type_group, + &mdev_type_group1, + &mdev_type_group2, NULL, };