From patchwork Tue Oct 18 12:38:21 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jike Song
X-Patchwork-Id: 9382099
Message-ID: <580617BD.8000300@intel.com>
Date: Tue, 18 Oct 2016 20:38:21 +0800
From: Jike Song
User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:17.0) Gecko/20130801 Thunderbird/17.0.8
To: Alex Williamson
References: <1259cdba-c137-c3da-abe2-ecf51aec6738@linux.intel.com>
 <523e1446-75f1-fe3a-d818-f7d238d57751@redhat.com>
 <5800B579.9000705@intel.com>
 <20161014084158.623087aa@t450s.home>
 <20161014084601.2a50ba87@t450s.home>
 <20161014163545.GA6121@nvidia.com>
 <20161014105124.42b438a6@t450s.home>
 <20161014221901.GA8865@nvidia.com>
 <20161017100229.1474ae33@t450s.home>
In-Reply-To: <20161017100229.1474ae33@t450s.home>
Subject: Re: [Qemu-devel] [PATCH 1/2] KVM: page track: add a new notifier type: track_flush_slot
Cc: "Tian, Kevin", Neo Jia, kvm@vger.kernel.org, guangrong.xiao@intel.com,
 Xiao Guangrong, qemu-devel, Xiaoguang Chen, Kirti Wankhede, Paolo Bonzini

On 10/18/2016 12:02 AM, Alex Williamson wrote:
> On Fri, 14 Oct 2016 15:19:01 -0700
> Neo Jia wrote:
>
>> On Fri, Oct 14, 2016 at 10:51:24AM -0600, Alex Williamson wrote:
>>> On Fri, 14 Oct 2016 09:35:45 -0700
>>> Neo Jia wrote:
>>>
>>>> On Fri, Oct 14, 2016 at 08:46:01AM -0600, Alex Williamson wrote:
>>>>> On Fri, 14 Oct 2016 08:41:58 -0600
>>>>> Alex Williamson wrote:
>>>>>
>>>>>> On Fri, 14 Oct 2016 18:37:45 +0800
>>>>>> Jike Song wrote:
>>>>>>
>>>>>>> On 10/11/2016 05:47 PM, Paolo Bonzini wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11/10/2016 11:21, Xiao Guangrong wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10/11/2016 04:54 PM, Paolo Bonzini wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 11/10/2016 04:39, Xiao Guangrong wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 10/11/2016 02:32 AM, Paolo Bonzini wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/10/2016 20:01, Neo Jia wrote:
>>>>>>>>>>>>>> Hi Neo,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> AFAIK this is needed because KVMGT doesn't paravirtualize the PPGTT,
>>>>>>>>>>>>>> while nVidia does.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Paolo and Xiaoguang,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am just wondering how device driver can register a notifier so he
>>>>>>>>>>>>> can be
>>>>>>>>>>>>> notified for write-protected pages when writes are happening.
>>>>>>>>>>>>
>>>>>>>>>>>> It can't yet, but the API is ready for that. kvm_vfio_set_group is
>>>>>>>>>>>> currently where a struct kvm_device* and struct vfio_group* touch.
>>>>>>>>>>>> Given
>>>>>>>>>>>> a struct kvm_device*, dev->kvm provides the struct kvm to be passed to
>>>>>>>>>>>> kvm_page_track_register_notifier. So I guess you could add a callback
>>>>>>>>>>>> that passes the struct kvm_device* to the mdev device.
>>>>>>>>>>>>
>>>>>>>>>>>> Xiaoguang and Guangrong, what were your plans? We discussed it briefly
>>>>>>>>>>>> at KVM Forum but I don't remember the details.
>>>>>>>>>>>
>>>>>>>>>>> Your suggestion was that pass kvm fd to KVMGT via VFIO, so that we can
>>>>>>>>>>> figure out the kvm instance based on the fd.
>>>>>>>>>>>
>>>>>>>>>>> We got a new idea, how about search the kvm instance by mm_struct, it
>>>>>>>>>>> can work as KVMGT is running in the vcpu context and it is much more
>>>>>>>>>>> straightforward.
>>>>>>>>>>
>>>>>>>>>> Perhaps I didn't understand your suggestion, but the same mm_struct can
>>>>>>>>>> have more than 1 struct kvm so I'm not sure that it can work.
>>>>>>>>>
>>>>>>>>> vcpu->pid is valid during vcpu running so that it can be used to figure
>>>>>>>>> out which kvm instance owns the vcpu whose pid is the one as current
>>>>>>>>> thread, i think it can work. :)
>>>>>>>>
>>>>>>>> No, don't do that. There's no reason for a thread to run a single VCPU,
>>>>>>>> and if you can have multiple VCPUs you can also have multiple VCPUs from
>>>>>>>> multiple VMs.
>>>>>>>>
>>>>>>>> Passing file descriptors around are the right way to connect subsystems.
>>>>>>>
>>>>>>> [CC Alex, Kevin and Qemu-devel]
>>>>>>>
>>>>>>> Hi Paolo & Alex,
>>>>>>>
>>>>>>> IIUC, passing file descriptors means touching QEMU and the UAPI between
>>>>>>> QEMU and VFIO. Would you guys have a look at below draft patch? If it's
>>>>>>> on the correct direction, I'll send the split ones. Thanks!
>>>>>>>
>>>>>>> --
>>>>>>> Thanks,
>>>>>>> Jike
>>>>>>>
>>>>>>>
>>>>>>> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
>>>>>>> index bec694c..f715d37 100644
>>>>>>> --- a/hw/vfio/pci-quirks.c
>>>>>>> +++ b/hw/vfio/pci-quirks.c
>>>>>>> @@ -10,12 +10,14 @@
>>>>>>>   * the COPYING file in the top-level directory.
>>>>>>>   */
>>>>>>>
>>>>>>> +#include
>>>>>>>  #include "qemu/osdep.h"
>>>>>>>  #include "qemu/error-report.h"
>>>>>>>  #include "qemu/range.h"
>>>>>>>  #include "qapi/error.h"
>>>>>>>  #include "hw/nvram/fw_cfg.h"
>>>>>>>  #include "pci.h"
>>>>>>> +#include "sysemu/kvm.h"
>>>>>>>  #include "trace.h"
>>>>>>>
>>>>>>>  /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
>>>>>>> @@ -1844,3 +1846,15 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
>>>>>>>          break;
>>>>>>>      }
>>>>>>>  }
>>>>>>> +
>>>>>>> +void vfio_quirk_kvmgt(VFIOPCIDevice *vdev)
>>>>>>> +{
>>>>>>> +    int vmfd;
>>>>>>> +
>>>>>>> +    if (!kvm_enabled() || !vdev->kvmgt)
>>>>>>> +        return;
>>>>>>> +
>>>>>>> +    /* Tell the device what KVM it attached */
>>>>>>> +    vmfd = kvm_get_vmfd(kvm_state);
>>>>>>> +    ioctl(vdev->vbasedev.fd, VFIO_SET_KVMFD, vmfd);
>>>>>>> +}
>>>>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>>>>> index a5a620a..8732552 100644
>>>>>>> --- a/hw/vfio/pci.c
>>>>>>> +++ b/hw/vfio/pci.c
>>>>>>> @@ -2561,6 +2561,8 @@ static int vfio_initfn(PCIDevice *pdev)
>>>>>>>          return ret;
>>>>>>>      }
>>>>>>>
>>>>>>> +    vfio_quirk_kvmgt(vdev);
>>>>>>> +
>>>>>>>      /* Get a copy of config space */
>>>>>>>      ret = pread(vdev->vbasedev.fd, vdev->pdev.config,
>>>>>>>                  MIN(pci_config_size(&vdev->pdev), vdev->config_size),
>>>>>>> @@ -2832,6 +2834,7 @@ static Property vfio_pci_dev_properties[] = {
>>>>>>>      DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
>>>>>>>                         sub_device_id, PCI_ANY_ID),
>>>>>>>      DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
>>>>>>> +    DEFINE_PROP_BOOL("kvmgt", VFIOPCIDevice, kvmgt, false),
>>>>>>
>>>>>> Just a side note, device options are a headache, users are prone to get
>>>>>> them wrong and minimally it requires an entire round to get libvirt
>>>>>> support. We should be able to detect from the device or vfio API
>>>>>> whether such a call is required. Obviously if we can use the existing
>>>>>> kvm-vfio device, that's the better option anyway. Thanks,
>>>>>
>>>>> Also, vfio devices currently have no hard dependencies on KVM, if kvmgt
>>>>> does, it needs to produce a device failure when unavailable. Thanks,
>>>>
>>>> Also, I would like to see this as an generic feature instead of
>>>> kvmgt specific interface, so we don't have to add new options to QEMU and it is
>>>> up to the vendor driver to proceed with or without it.
>>>
>>> In general this should be decided by lack of some required feature
>>> exclusively provided by KVM. I would not want to add a generic opt-out
>>> for mdev vendor drivers to decide that they arbitrarily want to disable
>>> that path. Thanks,
>>
>> IIUC, you are suggesting that this path should be controlled by KVM feature cap
>> and it will be accessible to VFIO users when such checking is satisfied.
>
> Maybe we're getting too loose with our pronouns here, I'm starting to
> lose track of what "this" is referring to. I agree that there's no
> reason for the ioctl, as proposed to be kvmgt specific. I would hope
> that going through the kvm-vfio device to create that linkage would
> eliminate that, but we'll need to see what Jike can come up with to
> plumb between KVM and vfio. Vendor drivers can implement their own
> ioctls, now that we pass them through the mdev layer, but someone needs
> to call those ioctls. Ideally we want something programmatic to
> trigger that, without requiring a user to pass an extra device
> parameter. Additionally, if there is any hope of making use of the
> device with userspace drivers other than QEMU, hard dependencies on KVM
> should be avoided. Thanks,
>
> Alex
>

Thanks for the advice, so I cooked another patch for your comments.
Basically, a 'void *usrdata' field is added to vfio_group. External users
can set it (kvm) or get it (kvm, or other users such as kvmgt). BTW, in the
device model, the open method will return failure to vfio-mdev if this kvm
information is not available.

---
Thanks,
Jike

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index d1d70e0..6b8d1d2 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -86,6 +86,7 @@ struct vfio_group {
 	struct mutex			unbound_lock;
 	atomic_t			opened;
 	bool				noiommu;
+	void				*usrdata;
 };
 
 struct vfio_device {
@@ -447,14 +448,13 @@ static struct vfio_group *vfio_group_try_get(struct vfio_group *group)
 }
 
 static
-struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
+struct vfio_group *__vfio_group_get_from_iommu(struct iommu_group *iommu_group)
 {
 	struct vfio_group *group;
 
 	mutex_lock(&vfio.group_lock);
 	list_for_each_entry(group, &vfio.group_list, vfio_next) {
 		if (group->iommu_group == iommu_group) {
-			vfio_group_get(group);
 			mutex_unlock(&vfio.group_lock);
 			return group;
 		}
@@ -464,6 +464,17 @@ struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
 	return NULL;
 }
 
+static
+struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
+{
+	struct vfio_group *group = __vfio_group_get_from_iommu(iommu_group);
+	if (!group)
+		return NULL;
+
+	vfio_group_get(group);
+	return group;
+}
+
 static struct vfio_group *vfio_group_get_from_minor(int minor)
 {
 	struct vfio_group *group;
@@ -1728,6 +1739,31 @@ long vfio_external_check_extension(struct vfio_group *group, unsigned long arg)
 }
 EXPORT_SYMBOL_GPL(vfio_external_check_extension);
 
+void vfio_group_set_usrdata(struct vfio_group *group, void *data)
+{
+	group->usrdata = data;
+}
+EXPORT_SYMBOL_GPL(vfio_group_set_usrdata);
+
+void *vfio_group_get_usrdata(struct vfio_group *group)
+{
+	return group->usrdata;
+}
+EXPORT_SYMBOL_GPL(vfio_group_get_usrdata);
+
+void *vfio_group_get_usrdata_by_device(struct device *dev)
+{
+	struct vfio_group *vfio_group;
+
+	vfio_group = __vfio_group_get_from_iommu(dev->iommu_group);
+	if (!vfio_group)
+		return NULL;
+
+	return vfio_group_get_usrdata(vfio_group);
+}
+EXPORT_SYMBOL_GPL(vfio_group_get_usrdata_by_device);
+
+
 /**
  * Sub-module support
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 0ecae0b..712588f 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -91,6 +91,10 @@ extern void vfio_unregister_iommu_driver(
 extern int vfio_external_user_iommu_id(struct vfio_group *group);
 extern long vfio_external_check_extension(struct vfio_group *group,
 					  unsigned long arg);
+extern void vfio_group_set_usrdata(struct vfio_group *group, void *data);
+extern void *vfio_group_get_usrdata(struct vfio_group *group);
+extern void *vfio_group_get_usrdata_by_device(struct device *dev);
+
 
 /*
  * Sub-module helpers
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 1dd087d..e00d401 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -60,6 +60,20 @@ static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
 	symbol_put(vfio_group_put_external_user);
 }
 
+static void kvm_vfio_group_set_kvm(struct vfio_group *group, void *kvm)
+{
+	void (*fn)(struct vfio_group *, void *);
+
+	fn = symbol_get(vfio_group_set_usrdata);
+	if (!fn)
+		return;
+
+	fn(group, kvm);
+	kvm_get_kvm(kvm);
+
+	symbol_put(vfio_group_set_usrdata);
+}
+
 static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group)
 {
 	long (*fn)(struct vfio_group *, unsigned long);
@@ -161,6 +175,8 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
 
 		kvm_vfio_update_coherency(dev);
 
+		kvm_vfio_group_set_kvm(vfio_group, dev->kvm);
+
 		return 0;
 
 	case KVM_DEV_VFIO_GROUP_DEL:
@@ -200,6 +216,8 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
 
 		kvm_vfio_update_coherency(dev);
 
+		kvm_put_kvm(dev->kvm);
+
 		return ret;
 	}
 
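
To make the intended handoff easier to discuss, here is a minimal userspace
sketch of the set/get pattern described above. It is not the patch itself:
every model_* name is hypothetical, the structs are stand-ins for struct kvm
and struct vfio_group, and the ordering (take the reference before publishing
the pointer, clear the pointer before dropping the reference) is an
assumption about how the pairing could be kept balanced, not something the
draft above does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm: only the reference count matters. */
struct model_kvm {
	int refcount;
};

/* Hypothetical stand-in for struct vfio_group with the new usrdata field. */
struct model_group {
	void *usrdata;	/* opaque pointer published by the "kvm" side */
};

static void model_kvm_get(struct model_kvm *kvm)
{
	kvm->refcount++;
}

static void model_kvm_put(struct model_kvm *kvm)
{
	assert(kvm->refcount > 0);
	kvm->refcount--;
}

/* GROUP_ADD analogue: take the reference first, then publish the pointer. */
static void model_group_attach(struct model_group *group, struct model_kvm *kvm)
{
	model_kvm_get(kvm);
	group->usrdata = kvm;
}

/* GROUP_DEL analogue: unpublish the pointer first, then drop the reference. */
static void model_group_detach(struct model_group *group)
{
	struct model_kvm *kvm = group->usrdata;

	if (!kvm)
		return;

	group->usrdata = NULL;
	model_kvm_put(kvm);
}

/* Vendor-driver open() analogue: fail when no kvm has been attached. */
static int model_device_open(struct model_group *group, struct model_kvm **out)
{
	if (!group->usrdata)
		return -1;

	*out = group->usrdata;
	return 0;
}
```

In this model, open fails before attach and after detach, which mirrors the
"open returns failure when the kvm information is not available" behaviour
mentioned above. Locking is left out on purpose; a real implementation would
need to decide what protects usrdata against a concurrent detach.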