Message ID | 20090716151945.29318.10882.stgit@dev.haskins.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Sam Ravnborg wrote: >> diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild >> index ad8ec35..9f50cc3 100644 >> --- a/arch/x86/Kbuild >> +++ b/arch/x86/Kbuild >> @@ -1,5 +1,7 @@ >> >> -obj-$(CONFIG_KVM) += kvm/ >> +ifdef CONFIG_KVM >> +obj-y += kvm/ >> +endif >> > > What was wrong with the old version? > If this is because xinterface.o is builtin to the kernel Bingo > then we should > always visit kvm/ and use: > > obj-y += kvm/ > > unconditionally. > > Ok, easy enough. Thanks Sam, -Greg
Gregory Haskins wrote: > +/* > + * ------------ > + * XINTERFACE (External Interface) > + * ------------- > + */ > + > +static struct kvm * > +intf_to_kvm(struct kvm_xinterface *intf) > +{ > + return container_of(intf, struct kvm, xinterface); > +} > + > +static unsigned long > +xinterface_gpa_to_hva(struct kvm_xinterface *intf, unsigned long gpa) > +{ > + struct kvm *kvm = intf_to_kvm(intf); > + unsigned long addr; > + > + addr = gfn_to_hva(kvm, gpa >> PAGE_SHIFT); > + if (kvm_is_error_hva(addr)) > + return 0; > + > + return addr + offset_in_page(gpa); > +} > + > +static struct page * > +xinterface_gpa_to_page(struct kvm_xinterface *intf, unsigned long gpa) > +{ > + struct kvm *kvm = intf_to_kvm(intf); > + struct page *page; > + > + page = gfn_to_page(kvm, gpa >> PAGE_SHIFT); > + if (page == bad_page) > + return ERR_PTR(-EINVAL); > + > + return page; > +} > + > +static void > +xinterface_release(struct kvm_xinterface *intf) > +{ > + struct kvm *kvm = intf_to_kvm(intf); > + > + kvm_put_kvm(kvm); > +} > + > +struct kvm_xinterface_ops _kvm_xinterface_ops = { > + .gpa_to_hva = xinterface_gpa_to_hva, > + .gpa_to_page = xinterface_gpa_to_page, > + .release = xinterface_release, > +}; > How do you deal with locking? The mappings (gpa_to_page) are not fixed. They can and do change very often. The interface doesn't seem to attempt to cope with this. Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Anthony Liguori wrote: > Gregory Haskins wrote: >> +/* >> + * ------------ >> + * XINTERFACE (External Interface) >> + * ------------- >> + */ >> + >> +static struct kvm * >> +intf_to_kvm(struct kvm_xinterface *intf) >> +{ >> + return container_of(intf, struct kvm, xinterface); >> +} >> + >> +static unsigned long >> +xinterface_gpa_to_hva(struct kvm_xinterface *intf, unsigned long gpa) >> +{ >> + struct kvm *kvm = intf_to_kvm(intf); >> + unsigned long addr; >> + >> + addr = gfn_to_hva(kvm, gpa >> PAGE_SHIFT); >> + if (kvm_is_error_hva(addr)) >> + return 0; >> + >> + return addr + offset_in_page(gpa); >> +} >> + >> +static struct page * >> +xinterface_gpa_to_page(struct kvm_xinterface *intf, unsigned long gpa) >> +{ >> + struct kvm *kvm = intf_to_kvm(intf); >> + struct page *page; >> + >> + page = gfn_to_page(kvm, gpa >> PAGE_SHIFT); >> + if (page == bad_page) >> + return ERR_PTR(-EINVAL); >> + >> + return page; >> +} >> + >> +static void >> +xinterface_release(struct kvm_xinterface *intf) >> +{ >> + struct kvm *kvm = intf_to_kvm(intf); >> + >> + kvm_put_kvm(kvm); >> +} >> + >> +struct kvm_xinterface_ops _kvm_xinterface_ops = { >> + .gpa_to_hva = xinterface_gpa_to_hva, >> + .gpa_to_page = xinterface_gpa_to_page, >> + .release = xinterface_release, >> +}; >> > > How do you deal with locking? > > The mappings (gpa_to_page) are not fixed. They can and do change very > often. The interface doesn't seem to attempt to cope with this. Hmm, well I used to need gpa_to_page() in the older version of vbus that did explicit hypercalls, but I don't actually use it today in v4. I left it in because it seems like it might be useful in general (perhaps for Michael). However, if this ends up being a real problem I suppose we can just drop that vtable entry and let the guy that actually needs it deal with the issues ;) I really only require gpa_to_hva() in the short-term. That said, I think the assumption that was made when I was using this was that a proper ref for the page was acquired by the gfn_to_page() and dropped by the caller. This was always used in the context of a hypercall/vmexit so presumably the gpa should be considered stable across that call. Is that not true? Regards, -Greg
Gregory Haskins wrote: > That said, I think the assumption that was made when I was using this > was that a proper ref for the page was acquired by the gfn_to_page() and > dropped by the caller. This was always used in the context of a > hypercall/vmexit so presumably the gpa should be considered stable > across that call. Is that not true? > If you're in kvm.ko, then yes, that's a safe assumption to make because the guest VCPU cannot run while you are running. But you're opening this interface to any caller so the VCPU is likely to be running while someone calls this function > Regards, > -Greg > > >
On Thursday 16 July 2009, Gregory Haskins wrote: > Background: The original vbus code was tightly integrated with kvm.ko. Avi > suggested that we abstract the interfaces such that it could live outside > of kvm. The code is still highly kvm-specific, you would not be able to use it with another hypervisor like lguest or vmware player, right? > Example usage: QEMU instantiates a guest, and an external module "foo" > that desires the ability to interface with the guest (say via > open("/dev/foo")). QEMU may then issue a KVM_GET_VMID operation to acquire > the u64-based vmid, and pass it to ioctl(foofd, FOO_SET_VMID, &vmid). > Upon receipt, the foo module can issue kvm_xinterface_find(vmid) to acquire > the proper context. Internally, the struct kvm* and associated > struct module* will remain pinned at least until the foo module calls > kvm_xinterface_put(). Your approach allows passing the vmid from a process that does not own the kvm context. This looks like an intentional feature, but I can't see what this gains us. > As a final measure, we link the xinterface code statically > into the kernel so that callers are guaranteed a stable interface to > kvm_xinterface_find() without implicitly pinning kvm.ko or racing against > it. I also don't understand this. Are you worried about driver modules breaking when an externally-compiled kvm.ko is loaded? The same could be achieved by defining your data structures kvm_xinterface_ops and kvm_xinterface in a kernel header that is not shipped by kvm-kmod but always taken from the kernel headers. It does not matter if the entry points are build into the kernel or exported from a kvm.ko as long as you define a fixed ABI. What is the problem with pinning kvm.ko from another module using its features? Can't you simply provide a function call to lookup the kvm context pointer from the file descriptor to achieve the same functionality? To take that thought further, maybe the dependency can be turned around: If every user (pci-uio, virtio-net, ...) exposes a file descriptor based interface to user space, you can have a kvm ioctl to register the object behind that file descriptor with an existing kvm context to associate it with a guest. That would nicely solve the life time questions by pinning the external object for the life time of the kvm context rather than the other way round, and it would be completely separate from kvm in that each such object could be used by other subsystems independent of kvm. Arnd <>< -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Arnd Bergmann wrote: > On Thursday 16 July 2009, Gregory Haskins wrote: > >> Background: The original vbus code was tightly integrated with kvm.ko. Avi >> suggested that we abstract the interfaces such that it could live outside >> of kvm. >> > > The code is still highly kvm-specific, you would not be able to use > it with another hypervisor like lguest or vmware player, right? > In its current form, it is kvm specific primarily because it was not a explicit design constraint of mine to support others. However, there is no reason why we could not generalize the interface if that is a desirable trait. Ultimately I would like to have my project support other things like lguest, so this is not a bad idea. Otherwise, someone will invariably be proposing an "lguest_xinterface" next ;) > >> Example usage: QEMU instantiates a guest, and an external module "foo" >> that desires the ability to interface with the guest (say via >> open("/dev/foo")). QEMU may then issue a KVM_GET_VMID operation to acquire >> the u64-based vmid, and pass it to ioctl(foofd, FOO_SET_VMID, &vmid). >> Upon receipt, the foo module can issue kvm_xinterface_find(vmid) to acquire >> the proper context. Internally, the struct kvm* and associated >> struct module* will remain pinned at least until the foo module calls >> kvm_xinterface_put(). >> > > Your approach allows passing the vmid from a process that does > not own the kvm context. This looks like an intentional feature, > but I can't see what this gains us. This work is towards the implementation of lockless-shared-memory subsystems, which includes ring constructs such as virtio-ring, VJ-netchannels, and vbus-ioq. I find that these designs perform optimally when you allow two distinct contexts (producer + consumer) to process the ring concurrently, which implies a disparate context from the guest in question. Note that the infrastructure we are discussing does not impose a requirement for the contexts to be unique: it will work equally well from the same or a different process. For an example of this "producer/consumer" dynamic over shared memory in action, please refer to my previous posting re: "vbus" http://lkml.org/lkml/2009/4/21/408 I am working on v4 now, and this patch is part of the required support. > > > >> As a final measure, we link the xinterface code statically >> into the kernel so that callers are guaranteed a stable interface to >> kvm_xinterface_find() without implicitly pinning kvm.ko or racing against >> it. >> > > I also don't understand this. Are you worried about driver modules > breaking when an externally-compiled kvm.ko is loaded? The same could > be achieved by defining your data structures kvm_xinterface_ops and > kvm_xinterface in a kernel header that is not shipped by kvm-kmod but > always taken from the kernel headers. > It does not matter if the entry points are build into the kernel or > exported from a kvm.ko as long as you define a fixed ABI. > > What is the problem with pinning kvm.ko from another module using > its features? > Well, there is always the chance that I am doing something dumb or missing the point ;) But my rationale was as follows: The problem is that kvm is a little weird in the module ref department: If I were to do it the standard way and link xinterface.o into kvm.o (and have any xinterface_find() users do a tristate+"depends on KVM"), this would work as I believe you are suggesting. That is to say: whenever I loaded "foo.ko", insmod would automatically up the reference of kvm.ko. The issue is that is not quite what I really want ;) I want to hold the reference to the entire .text chain, which includes kvm.ko + [kvm-intel.ko | kvm-amd.ko]. If you look carefully, the ops->owner that is assigned is actually the arch.ko. In addition, I wanted the kvm.ko lifetime to be associated with the lifetime of its contexts actually in use, not the lifetime of its installed dependencies. Therefore, I did it this way. But to your point, I suppose the dependency lifetime thing is not a huge deal. I could therefore modify the patch to simply link xinterface.o into kvm.ko and still achieve the primary objective by retaining ops->owner. Note that if we are going to generalize the interface to support other guests as you may have been suggesting above, it should probably stay statically linked (and perhaps live in ./lib or something) > Can't you simply provide a function call to lookup the kvm context > pointer from the file descriptor to achieve the same functionality? > You mean so have: struct kvm_xinterface *kvm_xinterface_find(int fd) (instead of creating our own vmid namespace) ? Or are you suggesting using fget() instead of kvm_xinterface_find()? > To take that thought further, maybe the dependency can be turned > around: If every user (pci-uio, virtio-net, ...) exposes a file > descriptor based interface to user space, you can have a kvm > ioctl to register the object behind that file descriptor with > an existing kvm context to associate it with a guest. FWIW: We do that already for the signaling path (see irqfd and ioeventfd in kvm.git). Each side exposes interfaces that accept eventfds, and the fds are passed around that way. However, for the functions we are talking about now, I don't think it really works well to go the other way. I could be misunderstanding what you mean, though. What I mean is that it's KVM that is providing a service to the other modules (in this case, translating memory pointers), so what would an inverse interface look like for that? And even if you came up with one, it seems to me that its just "6 of one, half-dozen of the other" kind of thing. > That would > nicely solve the life time questions by pinning the external > object for the life time of the kvm context I suppose that is nice, but note that in practice both objects (the kvm-guest and the "foo" module) are managed by the same entity (i.e. QEMU) and therefore share the same approximate lifetime. Kind Regards, -Greg
Gregory Haskins wrote: > Note that if we are going to generalize the interface to support other > guests as you may have been suggesting above, it should probably stay > statically linked (and perhaps live in ./lib or something) > More specifically, it can no longer live in kvm.ko. I guess it technically doesn't have to be statically linked, though (e.g. xinterface.ko is fine, too). Regards, -Greg
Gregory Haskins wrote: > Gregory Haskins wrote: >> Note that if we are going to generalize the interface to support other >> guests as you may have been suggesting above, it should probably stay >> statically linked (and perhaps live in ./lib or something) >> > > More specifically, it can no longer live in kvm.ko. I guess it > technically doesn't have to be statically linked, though (e.g. > xinterface.ko is fine, too). Speaking as someone who knows nothing about this, if this is going to be a module and visible in places like lsmod, could you name it something else? I see "xinterface.ko" and the first thing in my head is "something to do with the X Window System." How about kvm_xinterface or vguest_xinterface?
On Thursday 16 July 2009, Gregory Haskins wrote: > Arnd Bergmann wrote: > > Your approach allows passing the vmid from a process that does > > not own the kvm context. This looks like an intentional feature, > > but I can't see what this gains us. > > This work is towards the implementation of lockless-shared-memory > subsystems, which includes ring constructs such as virtio-ring, > VJ-netchannels, and vbus-ioq. I find that these designs perform > optimally when you allow two distinct contexts (producer + consumer) to > process the ring concurrently, which implies a disparate context from > the guest in question. Note that the infrastructure we are discussing > does not impose a requirement for the contexts to be unique: it will > work equally well from the same or a different process. > > For an example of this "producer/consumer" dynamic over shared memory in > action, please refer to my previous posting re: "vbus" > > http://lkml.org/lkml/2009/4/21/408 > > I am working on v4 now, and this patch is part of the required support. Ok. I can see how your approach gives you more flexibility in this regard, but it does not seem critical. > But to your point, I suppose the dependency lifetime thing is not a huge > deal. I could therefore modify the patch to simply link xinterface.o > into kvm.ko and still achieve the primary objective by retaining ops->owner. Right. And even if it's a separate module, holding an extra reference on kvm.ko will not cause any harm. > > Can't you simply provide a function call to lookup the kvm context > > pointer from the file descriptor to achieve the same functionality? > > > You mean so have: struct kvm_xinterface *kvm_xinterface_find(int fd) > > (instead of creating our own vmid namespace) ? > > Or are you suggesting using fget() instead of kvm_xinterface_find()? I guess they are roughly equivalent. Either you pass a fd to kvm_xinterface_find, or pass the struct file pointer you get from fget. The latter is probably more convenient because it allows you to pass around the struct file in kernel contexts that don't have that file descriptor open. > > To take that thought further, maybe the dependency can be turned > > around: If every user (pci-uio, virtio-net, ...) exposes a file > > descriptor based interface to user space, you can have a kvm > > ioctl to register the object behind that file descriptor with > > an existing kvm context to associate it with a guest. > > FWIW: We do that already for the signaling path (see irqfd and ioeventfd > in kvm.git). Each side exposes interfaces that accept eventfds, and the > fds are passed around that way. > > However, for the functions we are talking about now, I don't think it > really works well to go the other way. I could be misunderstanding what > you mean, though. What I mean is that it's KVM that is providing a > service to the other modules (in this case, translating memory > pointers), so what would an inverse interface look like for that? And > even if you came up with one, it seems to me that its just "6 of one, > half-dozen of the other" kind of thing. I mean something like int kvm_ioctl_register_service(struct file *filp, unsigned long arg) { struct file *service = fget(arg); struct kvm *kvm = filp->private_data; if (!service->f_ops->new_xinterface_register) return -EINVAL; return service->f_ops->new_xinterface_register(service, (void*)kvm, &kvm_xinterface_ops); } This would assume that we define a new file_operation specifically for this, which would simplify the code, but there are other ways to achieve the same. It would even mean that you don't need any static code as an interface layer. Arnd <>< -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Zan Lynx wrote: > Gregory Haskins wrote: >> Gregory Haskins wrote: >>> Note that if we are going to generalize the interface to support other >>> guests as you may have been suggesting above, it should probably stay >>> statically linked (and perhaps live in ./lib or something) >>> >> >> More specifically, it can no longer live in kvm.ko. I guess it >> technically doesn't have to be statically linked, though (e.g. >> xinterface.ko is fine, too). > > Speaking as someone who knows nothing about this, if this is going to > be a module and visible in places like lsmod, could you name it > something else? > > I see "xinterface.ko" and the first thing in my head is "something to > do with the X Window System." > > How about kvm_xinterface or vguest_xinterface? > Heh, I totally agree. That was just pseudo-naming, anyway. ;) If it was going to be generic, I would do something like "virt-xinterface.ko". Kind Regards, -Greg
Arnd Bergmann wrote: > On Thursday 16 July 2009, Gregory Haskins wrote: > >> Arnd Bergmann wrote: >> > > >>> Your approach allows passing the vmid from a process that does >>> not own the kvm context. This looks like an intentional feature, >>> but I can't see what this gains us. >>> >> This work is towards the implementation of lockless-shared-memory >> subsystems, which includes ring constructs such as virtio-ring, >> VJ-netchannels, and vbus-ioq. I find that these designs perform >> optimally when you allow two distinct contexts (producer + consumer) to >> process the ring concurrently, which implies a disparate context from >> the guest in question. Note that the infrastructure we are discussing >> does not impose a requirement for the contexts to be unique: it will >> work equally well from the same or a different process. >> >> For an example of this "producer/consumer" dynamic over shared memory in >> action, please refer to my previous posting re: "vbus" >> >> http://lkml.org/lkml/2009/4/21/408 >> >> I am working on v4 now, and this patch is part of the required support. >> > > Ok. I can see how your approach gives you more flexibility in this > regard, but it does not seem critical. > Yeah, and I think I missed your point the first time around. I thought you were asking about how the resulting memory API was used, but now I see you were referring to the scope of the vmid namespace (vs file-descriptors). On that topic, yes the vmid implementation here differs from file-descriptors in the sense that the namespace is global to the kernel, instead of per-process. And yes, I also agree with you that this is not critical to the design, per se. For instance, the use cases I have in mind would work fine with the per-task fd namespace as well since I will be within the same QEMU instance. So ultimately, I think that means the fd idea you presented would work equally well. > >> But to your point, I suppose the dependency lifetime thing is not a huge >> deal. I could therefore modify the patch to simply link xinterface.o >> into kvm.ko and still achieve the primary objective by retaining ops->owner. >> > > Right. And even if it's a separate module, holding an extra reference > on kvm.ko will not cause any harm. > Yep, agreed. > >>> Can't you simply provide a function call to lookup the kvm context >>> pointer from the file descriptor to achieve the same functionality? >>> >>> >> You mean so have: struct kvm_xinterface *kvm_xinterface_find(int fd) >> >> (instead of creating our own vmid namespace) ? >> >> Or are you suggesting using fget() instead of kvm_xinterface_find()? >> > > I guess they are roughly equivalent. Either you pass a fd to > kvm_xinterface_find, or pass the struct file pointer you get > from fget. The latter is probably more convenient because it > allows you to pass around the struct file in kernel contexts > that don't have that file descriptor open. > Interesting. I like it. > >>> To take that thought further, maybe the dependency can be turned >>> around: If every user (pci-uio, virtio-net, ...) exposes a file >>> descriptor based interface to user space, you can have a kvm >>> ioctl to register the object behind that file descriptor with >>> an existing kvm context to associate it with a guest. >>> >> FWIW: We do that already for the signaling path (see irqfd and ioeventfd >> in kvm.git). Each side exposes interfaces that accept eventfds, and the >> fds are passed around that way. >> >> However, for the functions we are talking about now, I don't think it >> really works well to go the other way. I could be misunderstanding what >> you mean, though. What I mean is that it's KVM that is providing a >> service to the other modules (in this case, translating memory >> pointers), so what would an inverse interface look like for that? And >> even if you came up with one, it seems to me that its just "6 of one, >> half-dozen of the other" kind of thing. >> > > I mean something like > > int kvm_ioctl_register_service(struct file *filp, unsigned long arg) > { > struct file *service = fget(arg); > struct kvm *kvm = filp->private_data; > > if (!service->f_ops->new_xinterface_register) > return -EINVAL; > > return service->f_ops->new_xinterface_register(service, (void*)kvm, > &kvm_xinterface_ops); > } > > This would assume that we define a new file_operation specifically for this, > which would simplify the code, but there are other ways to achieve the same. > It would even mean that you don't need any static code as an interface layer. > I suspect I will have a much harder time getting that type of modification accepted in mainline ;) I think I will go with your suggestion for the fd/file interface, tho. I can get rid of the rbtree logic and re-use the lookup via fget, which is a nice win. Thanks Arnd! -Greg
diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild index ad8ec35..9f50cc3 100644 --- a/arch/x86/Kbuild +++ b/arch/x86/Kbuild @@ -1,5 +1,7 @@ -obj-$(CONFIG_KVM) += kvm/ +ifdef CONFIG_KVM +obj-y += kvm/ +endif # Xen paravirtualization support obj-$(CONFIG_XEN) += xen/ diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index afaaa76..80d951d 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -17,3 +17,7 @@ kvm-amd-y += svm.o obj-$(CONFIG_KVM) += kvm.o obj-$(CONFIG_KVM_INTEL) += kvm-intel.o obj-$(CONFIG_KVM_AMD) += kvm-amd.o + +ifdef CONFIG_KVM +obj-y += $(addprefix ../../../virt/kvm/, xinterface.o) +endif diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 48567fa..5725527 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1208,6 +1208,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_IOEVENTFD: case KVM_CAP_PIT2: case KVM_CAP_PIT_STATE2: + case KVM_CAP_XINTERFACE: r = 1; break; case KVM_CAP_COALESCED_MMIO: diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 230a91a..7790894 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -435,6 +435,7 @@ struct kvm_ioeventfd { #define KVM_CAP_PIT_STATE2 35 #endif #define KVM_CAP_IOEVENTFD 36 +#define KVM_CAP_XINTERFACE 37 #ifdef KVM_CAP_IRQ_ROUTING @@ -544,6 +545,7 @@ struct kvm_irqfd { #define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct kvm_pit_config) #define KVM_SET_BOOT_CPU_ID _IO(KVMIO, 0x78) #define KVM_IOEVENTFD _IOW(KVMIO, 0x79, struct kvm_ioeventfd) +#define KVM_GET_VMID _IOR(KVMIO, 0x7a, __u64) /* * ioctls for vcpu fds diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index f244f11..0ee95df 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -23,6 +23,7 @@ #include <linux/kvm_para.h> #include <linux/kvm_types.h> +#include <linux/kvm_xinterface.h> #include <asm/kvm_host.h> @@ -175,6 +176,7 @@ struct kvm { unsigned long mmu_notifier_seq; long mmu_notifier_count; #endif + struct kvm_xinterface xinterface; /* interface for external modules */ }; /* The guest did something we don't support. */ @@ -199,6 +201,10 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i) idx < atomic_read(&kvm->online_vcpus) && vcpup; \ vcpup = kvm_get_vcpu(kvm, ++idx)) +void kvm_xinterface_register(struct kvm_xinterface *intf, + const struct kvm_xinterface_ops *ops); +void kvm_xinterface_unregister(struct kvm_xinterface *intf); + int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id); void kvm_vcpu_uninit(struct kvm_vcpu *vcpu); diff --git a/include/linux/kvm_xinterface.h b/include/linux/kvm_xinterface.h new file mode 100644 index 0000000..858acfd --- /dev/null +++ b/include/linux/kvm_xinterface.h @@ -0,0 +1,58 @@ +#ifndef __KVM_XINTERFACE_H +#define __KVM_XINTERFACE_H + +/* + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + */ + +#include <linux/kref.h> +#include <linux/module.h> +#include <linux/rbtree.h> + +struct kvm_xinterface; + +struct kvm_xinterface_ops { + struct module *owner; + + unsigned long (*gpa_to_hva)(struct kvm_xinterface *, unsigned long gpa); + struct page* (*gpa_to_page)(struct kvm_xinterface *, unsigned long gpa); + void (*release)(struct kvm_xinterface *); +}; + +struct kvm_xinterface { + struct kref kref; + const struct kvm_xinterface_ops *ops; + struct rb_node node; +}; + +static inline void +kvm_xinterface_get(struct kvm_xinterface *intf) +{ + kref_get(&intf->kref); +} + +static inline void +_kvm_xinterface_release(struct kref *kref) +{ + struct kvm_xinterface *intf; + struct module *owner; + + intf = container_of(kref, struct kvm_xinterface, kref); + + owner = intf->ops->owner; + rmb(); + + intf->ops->release(intf); + module_put(owner); +} + +static inline void +kvm_xinterface_put(struct kvm_xinterface *intf) +{ + kref_put(&intf->kref, _kvm_xinterface_release); +} + +struct kvm_xinterface *kvm_xinterface_find(long vmid); + +#endif /* __KVM_XINTERFACE_H */ diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 7cd1c10..058cb6c 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -935,6 +935,58 @@ static const struct mmu_notifier_ops kvm_mmu_notifier_ops = { }; #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */ +/* + * ------------ + * XINTERFACE (External Interface) + * ------------- + */ + +static struct kvm * +intf_to_kvm(struct kvm_xinterface *intf) +{ + return container_of(intf, struct kvm, xinterface); +} + +static unsigned long +xinterface_gpa_to_hva(struct kvm_xinterface *intf, unsigned long gpa) +{ + struct kvm *kvm = intf_to_kvm(intf); + unsigned long addr; + + addr = gfn_to_hva(kvm, gpa >> PAGE_SHIFT); + if (kvm_is_error_hva(addr)) + return 0; + + return addr + offset_in_page(gpa); +} + +static struct page * +xinterface_gpa_to_page(struct kvm_xinterface *intf, unsigned long gpa) +{ + struct kvm *kvm = intf_to_kvm(intf); + struct page *page; + + page = gfn_to_page(kvm, gpa >> PAGE_SHIFT); + if (page == bad_page) + return ERR_PTR(-EINVAL); + + return page; +} + +static void +xinterface_release(struct kvm_xinterface *intf) +{ + struct kvm *kvm = intf_to_kvm(intf); + + kvm_put_kvm(kvm); +} + +struct kvm_xinterface_ops _kvm_xinterface_ops = { + .gpa_to_hva = xinterface_gpa_to_hva, + .gpa_to_page = xinterface_gpa_to_page, + .release = xinterface_release, +}; + static struct kvm *kvm_create_vm(void) { struct kvm *kvm = kvm_arch_create_vm(); @@ -991,6 +1043,8 @@ static struct kvm *kvm_create_vm(void) #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET kvm_coalesced_mmio_init(kvm); #endif + kvm_get_kvm(kvm); /* the xinterface needs another ref */ + kvm_xinterface_register(&kvm->xinterface, &_kvm_xinterface_ops); out: return kvm; } @@ -1073,6 +1127,7 @@ static int kvm_vm_release(struct inode *inode, struct file *filp) struct kvm *kvm = filp->private_data; kvm_irqfd_release(kvm); + kvm_xinterface_unregister(&kvm->xinterface); kvm_put_kvm(kvm); return 0; @@ -2289,6 +2344,22 @@ static long kvm_vm_ioctl(struct file *filp, mutex_unlock(&kvm->lock); break; #endif + case KVM_GET_VMID: { + u64 vmid = (u64)&kvm->xinterface.node; + + /* + * our vmid is simply the address of our rb_node in the + * registry, which is guaranteed unique. This also simplifies + * the registry map-lookup since we dont need to do a deep + * decode on the pointer to figure out if we have a match + */ + + r = -EFAULT; + if (copy_to_user(argp, &vmid, (sizeof vmid))) + goto out; + r = 0; + break; + } default: r = kvm_arch_vm_ioctl(filp, ioctl, arg); } @@ -2761,6 +2832,7 @@ int kvm_init(void *opaque, unsigned int vcpu_size, kvm_chardev_ops.owner = module; kvm_vm_fops.owner = module; kvm_vcpu_fops.owner = module; + _kvm_xinterface_ops.owner = module; r = misc_register(&kvm_dev); if (r) { diff --git a/virt/kvm/xinterface.c b/virt/kvm/xinterface.c new file mode 100644 index 0000000..fe9a214 --- /dev/null +++ b/virt/kvm/xinterface.c @@ -0,0 +1,147 @@ +/* + * KVM module interface - Allows external modules to interface with a guest + * + * This code is designed to be statically linked to the kernel, regardless + * of the configuration of kvm.ko. This allows the kvm_xinterface_find + * routine to be stably exported without dependencies on, or race conditions + * against acquiring the kvm.ko module itself. + * + * Copyright 2009 Novell. All Rights Reserved. + * + * Author: + * Gregory Haskins <ghaskins@novell.com> + * + * This file is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software Foundation, + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA. + */ + +#include <linux/module.h> +#include <linux/mutex.h> +#include <linux/kvm_host.h> +#include <linux/kvm_xinterface.h> + +struct kvm_registry { + struct mutex lock; + struct rb_root root; +}; + +/* system wide registry of kvm based VMs */ +static struct kvm_registry kvm_registry = { + .lock = __MUTEX_INITIALIZER(kvm_registry.lock), + .root = RB_ROOT, +}; + +static struct kvm_xinterface * +to_intf(struct rb_node *node) +{ + return node ? container_of(node, struct kvm_xinterface, node) : NULL; +} + +struct kvm_xinterface * +kvm_xinterface_find(long vmid) +{ + struct rb_node *node; + struct kvm_xinterface *intf; + + mutex_lock(&kvm_registry.lock); + + node = kvm_registry.root.rb_node; + + while (node) { + long val; + + val = vmid - (long)node; + if (val < 0) + node = node->rb_left; + else if (val > 0) + node = node->rb_right; + else + break; + } + + intf = to_intf(node); + if (intf) + kvm_xinterface_get(intf); + + mutex_unlock(&kvm_registry.lock); + + return intf; +} +EXPORT_SYMBOL_GPL(kvm_xinterface_find); + +/* + * ------------------------------------------ + * register/unregister + * ------------------------------------------ + * + * These functions are private to the API and are only to be called + * by the KVM core + * ------------------------------------------ + */ + +/* caller must hold intf->ops->owner */ +void +kvm_xinterface_register(struct kvm_xinterface *intf, + const struct kvm_xinterface_ops *ops) +{ + struct rb_root *root; + struct rb_node **new, *parent = NULL; + struct rb_node *node; + + memset(intf, 0, sizeof(*intf)); + kref_init(&intf->kref); + intf->ops = ops; + + mutex_lock(&kvm_registry.lock); + + root = &kvm_registry.root; + new = &(root->rb_node); + node = &intf->node; + + /* Figure out where to put new node */ + while (*new) { + long val; + + parent = *new; + + val = node - parent; + if (val < 0) + new = &((*new)->rb_left); + else if (val > 0) + new = &((*new)->rb_right); + else + panic("kvm_xinterface: duplicate entry: %ld\n", val); + } + + /* Add new node and rebalance tree. */ + rb_link_node(node, parent, new); + rb_insert_color(node, root); + + /* released when the last xinterface reference is released */ + __module_get(intf->ops->owner); + + mutex_unlock(&kvm_registry.lock); +} +EXPORT_SYMBOL_GPL(kvm_xinterface_register); + +/* caller must hold intf->ops->owner */ +void +kvm_xinterface_unregister(struct kvm_xinterface *intf) +{ + mutex_lock(&kvm_registry.lock); + rb_erase(&intf->node, &kvm_registry.root); + mutex_unlock(&kvm_registry.lock); + + kvm_xinterface_put(intf); +} +EXPORT_SYMBOL_GPL(kvm_xinterface_unregister);
What: xinterface is a mechanism that allows kernel modules external to the kvm.ko proper to interface with a running guest. It accomplishes this by creating an abstracted interface which does not expose any private details of the guest or its related KVM structures, and provides a mechanism to find and bind to this interface at run-time. This binding mechanism uses a userspace friendly token "u64 vmid" as a handle. This vmid acts similar to a file-descriptor in the sense that it can be extracted from a guest, passed to an end-point of interest, and finally, converted back to a vtable pointer using a stable interface. Why: There are various subsystems that would like to interact with a KVM guest which are ideally suited to exist outside the domain of the kvm.ko core logic. For instance, external pci-passthrough, virtual-bus, and virtio-net modules are currently under development. In order for these modules to successfully interact with the guest, they need, at the very least, various interfaces for signaling IO events, pointer translation, and possibly memory mapping. The signaling case is covered by the recent introduction of the irqfd/ioeventfd mechanisms. This patch provides a mechanism to cover the other cases. Note that today we only expose pointer-translation related functions, but more could be added at a future date as needs arise. Security considerations: This concept is not believed to expose KVM to any kind of additional security risk. The vmid token itself can only be acquired via an open handle to the vmfd (i.e. qemu-kvm), and the interface is only available within the kernel. Therefore the xinterface admission policy is delegated to the kernel/lkm admission policy, which must be assumed secure or the system is already compromised independent of this work. Additionally, the xinterface design is hardened against malformed vmid tokens, as well as race conditions against valid tokens (e.g. guest exiting before the token is redeemed). It is additionally hardened against races in the kvm.ko module itself by acquiring proper module references. As a final measure, we link the xinterface code statically into the kernel so that callers are guaranteed a stable interface to kvm_xinterface_find() without implicitly pinning kvm.ko or racing against it. Example usage: QEMU instantiates a guest, and an external module "foo" that desires the ability to interface with the guest (say via open("/dev/foo")). QEMU may then issue a KVM_GET_VMID operation to acquire the u64-based vmid, and pass it to ioctl(foofd, FOO_SET_VMID, &vmid). Upon receipt, the foo module can issue kvm_xinterface_find(vmid) to acquire the proper context. Internally, the struct kvm* and associated struct module* will remain pinned at least until the foo module calls kvm_xinterface_put(). Signed-off-by: Gregory Haskins <ghaskins@novell.com> --- arch/x86/Kbuild | 4 + arch/x86/kvm/Makefile | 4 + arch/x86/kvm/x86.c | 1 include/linux/kvm.h | 2 + include/linux/kvm_host.h | 6 ++ include/linux/kvm_xinterface.h | 58 ++++++++++++++++ virt/kvm/kvm_main.c | 72 ++++++++++++++++++++ virt/kvm/xinterface.c | 147 ++++++++++++++++++++++++++++++++++++++++ 8 files changed, 293 insertions(+), 1 deletions(-) create mode 100644 include/linux/kvm_xinterface.h create mode 100644 virt/kvm/xinterface.c -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html