
[10/10] Introduce MSI message sending interface that bypasses IRQ routing.

Message ID 1249821671-32356-11-git-send-email-gleb@redhat.com (mailing list archive)
State New, archived

Commit Message

Gleb Natapov Aug. 9, 2009, 12:41 p.m. UTC
Sending an MSI through IRQ routing is an artificial concept, and the
potentially large number of MSIs (2048 per device) also makes it
inefficient. This patch adds an interface to inject MSI messages from
userspace directly into the lapic logic. The patch also reduces the
maximum number of IRQ routing entries to 128, since MSIs will no longer
go there and 128 entries cover 5 ioapics, which ought to be enough for
anybody.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/ia64/kvm/kvm-ia64.c |   26 ++++++++++++++++++++++++++
 arch/x86/kvm/x86.c       |   26 ++++++++++++++++++++++++++
 include/linux/kvm.h      |   10 ++++++++--
 include/linux/kvm_host.h |    3 ++-
 virt/kvm/irq_comm.c      |   23 ++++++++++++++---------
 5 files changed, 76 insertions(+), 12 deletions(-)
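
For concreteness, here is a minimal sketch of how userspace could invoke
the proposed KVM_MSI_MSG ioctl. It assumes a linux/kvm.h with this patch
applied (stock headers lack KVM_MSI_MSG and the status union member);
vm_fd and the address/data values are illustrative, not part of the patch.

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Inject one MSI message into the guest; returns the delivery status
 * written back by the kernel, or -1 if the ioctl itself failed. */
static int inject_msi(int vm_fd, __u32 addr_lo, __u32 addr_hi, __u32 data)
{
	struct kvm_irq_routing_msi msg;

	memset(&msg, 0, sizeof(msg));
	msg.address_lo = addr_lo;   /* e.g. 0xfee01000: dest id 1 */
	msg.address_hi = addr_hi;
	msg.data = data;            /* vector, delivery mode, trigger */
	msg.flags = 0;              /* the handler rejects nonzero flags */

	if (ioctl(vm_fd, KVM_MSI_MSG, &msg) < 0)
		return -1;          /* EFAULT or EINVAL from the handler */

	/* On success the kernel overwrites the data/status union with
	 * the result of kvm_set_msi(). */
	return msg.status;
}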

Comments

Gleb Natapov Aug. 9, 2009, 2:52 p.m. UTC | #1
On Sun, Aug 09, 2009 at 05:56:30PM +0300, Avi Kivity wrote:
> On 08/09/2009 03:41 PM, Gleb Natapov wrote:
> >Sending an MSI through IRQ routing is an artificial concept, and the
> >potentially large number of MSIs (2048 per device) also makes it
> >inefficient. This patch adds an interface to inject MSI messages from
> >userspace directly into the lapic logic. The patch also reduces the
> >maximum number of IRQ routing entries to 128, since MSIs will no longer
> >go there and 128 entries cover 5 ioapics, which ought to be enough for
> >anybody.
> 
> In the future many MSIs will be triggered via irqfds, and those
> require irq routing.
> 
Why? My plan is to change irqfd to use the MSI functions.

--
			Gleb.
Avi Kivity Aug. 9, 2009, 2:56 p.m. UTC | #2
On 08/09/2009 03:41 PM, Gleb Natapov wrote:
> Sending an MSI through IRQ routing is an artificial concept, and the
> potentially large number of MSIs (2048 per device) also makes it
> inefficient. This patch adds an interface to inject MSI messages from
> userspace directly into the lapic logic. The patch also reduces the
> maximum number of IRQ routing entries to 128, since MSIs will no longer
> go there and 128 entries cover 5 ioapics, which ought to be enough for
> anybody.
>    

In the future many MSIs will be triggered via irqfds, and those require 
irq routing.
Avi Kivity Aug. 9, 2009, 3:01 p.m. UTC | #3
On 08/09/2009 05:52 PM, Gleb Natapov wrote:
> On Sun, Aug 09, 2009 at 05:56:30PM +0300, Avi Kivity wrote:
>    
>> On 08/09/2009 03:41 PM, Gleb Natapov wrote:
>>      
>>> Sending an MSI through IRQ routing is an artificial concept, and the
>>> potentially large number of MSIs (2048 per device) also makes it
>>> inefficient. This patch adds an interface to inject MSI messages from
>>> userspace directly into the lapic logic. The patch also reduces the
>>> maximum number of IRQ routing entries to 128, since MSIs will no longer
>>> go there and 128 entries cover 5 ioapics, which ought to be enough for
>>> anybody.
>>>        
>> In the future many MSIs will be triggered via irqfds, and those
>> require irq routing.
>>
>>      
> Why? My plan is to change irqfd to use the MSI functions.
>
>    

It's still an "install handle, call handle" interface.  Maybe it would 
have been better to start off with your new interface, but having both 
is too much for too little gain.
Gleb Natapov Aug. 9, 2009, 3:07 p.m. UTC | #4
On Sun, Aug 09, 2009 at 06:01:15PM +0300, Avi Kivity wrote:
> On 08/09/2009 05:52 PM, Gleb Natapov wrote:
>> On Sun, Aug 09, 2009 at 05:56:30PM +0300, Avi Kivity wrote:
>>    
>>> On 08/09/2009 03:41 PM, Gleb Natapov wrote:
>>>      
>>>> Sending an MSI through IRQ routing is an artificial concept, and the
>>>> potentially large number of MSIs (2048 per device) also makes it
>>>> inefficient. This patch adds an interface to inject MSI messages from
>>>> userspace directly into the lapic logic. The patch also reduces the
>>>> maximum number of IRQ routing entries to 128, since MSIs will no longer
>>>> go there and 128 entries cover 5 ioapics, which ought to be enough for
>>>> anybody.
>>>>        
>>> In the future many MSIs will be triggered via irqfds, and those
>>> require irq routing.
>>>
>>>      
>> Why? My plan is to change irqfd to use the MSI functions.
>>
>>    
>
> It's still an "install handle, call handle" interface.  Maybe it would  
> have been better to start off with your new interface, but having both  
> is too much for too little gain.
>
Is it not too late to change the interface? There has been no released
kernel with irqfd yet. And this just adds another level of indirection
and one more point of false cache sharing.

--
			Gleb.
Gleb Natapov Aug. 9, 2009, 3:14 p.m. UTC | #5
On Sun, Aug 09, 2009 at 06:14:32PM +0300, Avi Kivity wrote:
> On 08/09/2009 06:07 PM, Gleb Natapov wrote:
>>> It's still an "install handle, call handle" interface.  Maybe it would
>>> have been better to start off with your new interface, but having both
>>> is too much for too little gain.
>>>
>>>      
>> Is it not too late to change the interface? There has been no released
>> kernel with irqfd yet. And this just adds another level of indirection
>> and one more point of false cache sharing.
>>    
>
> No irqfd, but we do have irq routing.  The sharing isn't a problem since  
> it's read only.
>
We don't have many users of the old interface now. So what I propose is
to retain the old interface for use with old binaries, but limit the
number of GSI entries to a small number. New code should use the new
interface.

Correct, the sharing is not the problem, but two levels of indirection
still are (one indirection to find the list element, another to call the
function through a pointer).

--
			Gleb.
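
A toy userspace model of the two hops Gleb describes, versus the direct
call the patch proposes; all names here are illustrative stand-ins for
the structures in virt/kvm/irq_comm.c, not the kernel's actual types.

#include <stdio.h>

struct msi_msg { unsigned address_lo, address_hi, data; };

/* Stand-in for kvm_set_msi(): the direct delivery function. */
static int deliver_msi(struct msi_msg *m) { return (int)(m->data & 0xff); }

struct route_entry {
	int gsi;
	int (*set)(struct route_entry *e, struct msi_msg *m); /* hop 2 */
	struct route_entry *next;                              /* hop 1 */
};

static int set_via_entry(struct route_entry *e, struct msi_msg *m)
{
	return deliver_msi(m);
}

/* Routed path: walk a list to find the entry, then an indirect call. */
static int inject_routed(struct route_entry *head, int gsi, struct msi_msg *m)
{
	struct route_entry *e;

	for (e = head; e; e = e->next)
		if (e->gsi == gsi)
			return e->set(e, m);
	return -1;
}

int main(void)
{
	struct msi_msg m = { 0xfee01000, 0, 0x41 };
	struct route_entry e1 = { 25, set_via_entry, NULL };
	struct route_entry e0 = { 24, set_via_entry, &e1 };

	printf("routed: %d\n", inject_routed(&e0, 25, &m)); /* two hops */
	printf("direct: %d\n", deliver_msi(&m));            /* no hops  */
	return 0;
}
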
Avi Kivity Aug. 9, 2009, 3:14 p.m. UTC | #6
On 08/09/2009 06:07 PM, Gleb Natapov wrote:
>> It's still an "install handle, call handle" interface.  Maybe it would
>> have been better to start off with your new interface, but having both
>> is too much for too little gain.
>>
>>      
> Is it not too late to change the interface? There has been no released
> kernel with irqfd yet. And this just adds another level of indirection
> and one more point of false cache sharing.
>    

No irqfd, but we do have irq routing.  The sharing isn't a problem since 
it's read only.
Avi Kivity Aug. 9, 2009, 3:22 p.m. UTC | #7
On 08/09/2009 06:14 PM, Gleb Natapov wrote:
> We don't have many users of the old interface now. So what I propose is
> to retain the old interface for use with old binaries, but limit the
> number of GSI entries to a small number. New code should use the new
> interface.
>    

Still, we complicate the interface for userspace.

> Correct, the sharing is not the problem, but two levels of indirection
> still are (one indirection to find the list element, another to call the
> function through a pointer).
>
>    

irqfd can cache the entries locally if it proves to be a problem.

Patch

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 8f1fc3a..c136085 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -195,6 +195,7 @@  int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_IRQCHIP:
 	case KVM_CAP_MP_STATE:
 	case KVM_CAP_IRQ_INJECT_STATUS:
+	case KVM_CAP_MSI_MSG:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
@@ -1010,6 +1011,31 @@  long kvm_arch_vm_ioctl(struct file *filp,
 		}
 		break;
 		}
+	case KVM_MSI_MSG: {
+		struct kvm_irq_routing_msi msg;
+		struct msi_msg msi;
+
+		r = -EFAULT;
+		if (copy_from_user(&msg, argp, sizeof msg))
+			goto out;
+		r = -EINVAL;
+		if (!irqchip_in_kernel(kvm))
+			goto out;
+		if (msg.flags)
+			goto out;
+
+		msi.address_lo = msg.address_lo;
+		msi.address_hi = msg.address_hi;
+		msi.data = msg.data;
+
+		msg.status = kvm_set_msi(kvm, &msi);
+		if (copy_to_user(argp, &msg, sizeof msg)) {
+			r = -EFAULT;
+			goto out;
+		}
+		r = 0;
+		break;
+	}
 	case KVM_GET_IRQCHIP: {
 		/* 0: PIC master, 1: PIC slave, 2: IOAPIC */
 		struct kvm_irqchip chip;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d313462..0e5c946 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1211,6 +1211,7 @@  int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_PIT2:
 	case KVM_CAP_PIT_STATE2:
 	case KVM_CAP_SET_IDENTITY_MAP_ADDR:
+	case KVM_CAP_MSI_MSG:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
@@ -2296,6 +2297,31 @@  long kvm_arch_vm_ioctl(struct file *filp,
 		}
 		break;
 	}
+	case KVM_MSI_MSG: {
+		struct kvm_irq_routing_msi msg;
+		struct msi_msg msi;
+
+		r = -EFAULT;
+		if (copy_from_user(&msg, argp, sizeof msg))
+			goto out;
+		r = -EINVAL;
+		if (!irqchip_in_kernel(kvm))
+			goto out;
+		if (msg.flags)
+			goto out;
+
+		msi.address_lo = msg.address_lo;
+		msi.address_hi = msg.address_hi;
+		msi.data = msg.data;
+
+		msg.status = kvm_set_msi(kvm, &msi);
+		if (copy_to_user(argp, &msg, sizeof msg)) {
+			r = -EFAULT;
+			goto out;
+		}
+		r = 0;
+		break;
+	}
 	case KVM_GET_IRQCHIP: {
 		/* 0: PIC master, 1: PIC slave, 2: IOAPIC */
 		struct kvm_irqchip *chip = kmalloc(sizeof(*chip), GFP_KERNEL);
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index f8f8900..a326034 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -436,6 +436,7 @@  struct kvm_ioeventfd {
 #endif
 #define KVM_CAP_IOEVENTFD 36
 #define KVM_CAP_SET_IDENTITY_MAP_ADDR 37
+#define KVM_CAP_MSI_MSG 38
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -447,8 +448,11 @@  struct kvm_irq_routing_irqchip {
 struct kvm_irq_routing_msi {
 	__u32 address_lo;
 	__u32 address_hi;
-	__u32 data;
-	__u32 pad;
+	union {
+		__u32 data;
+		__s32 status;
+	};
+	__u32 flags;
 };
 
 /* gsi routing entry types */
@@ -593,6 +597,8 @@  struct kvm_irqfd {
 #define KVM_X86_SETUP_MCE         _IOW(KVMIO,  0x9c, __u64)
 #define KVM_X86_GET_MCE_CAP_SUPPORTED _IOR(KVMIO,  0x9d, __u64)
 #define KVM_X86_SET_MCE           _IOW(KVMIO,  0x9e, struct kvm_x86_mce)
+#define KVM_MSI_MSG					\
+	_IOWR(KVMIO,  0x9f, struct kvm_irq_routing_msi)
 
 /*
  * Deprecated interfaces
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b3db9bf..857f2f4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -407,6 +407,7 @@  void kvm_get_intr_delivery_bitmask(struct kvm_ioapic *ioapic,
 #endif
 int kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level);
 int kvm_notifier_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level);
+int kvm_set_msi(struct kvm *kvm, struct msi_msg *msg);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
 				   struct kvm_irq_ack_notifier *kian);
@@ -524,7 +525,7 @@  static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_se
 
 #ifdef CONFIG_HAVE_KVM_IRQCHIP
 
-#define KVM_MAX_IRQ_ROUTES 1024
+#define KVM_MAX_IRQ_ROUTES 128
 
 int kvm_setup_default_irq_routing(struct kvm *kvm);
 int kvm_set_irq_routing(struct kvm *kvm,
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index dd9a5ca..963d7a3 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -49,6 +49,12 @@  static int kvm_set_ioapic_irq(struct kvm_kernel_irq_routing_entry *e,
 		notifier);
 }
 
+static int kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+		    struct kvm *kvm, int level, bool notifier)
+{
+	return kvm_set_msi(kvm, &e->msi);
+}
+
 inline static bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq)
 {
 #ifdef CONFIG_IA64
@@ -95,20 +101,19 @@  int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 	return r;
 }
 
-static int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
-		       struct kvm *kvm, int level)
+int kvm_set_msi(struct kvm *kvm, struct msi_msg *msi)
 {
 	struct kvm_lapic_irq irq;
 
-	trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
+	trace_kvm_msi_set_irq(msi->address_lo, msi->data);
 
-	irq.dest_id = (e->msi.address_lo &
+	irq.dest_id = (msi->address_lo &
 			MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
-	irq.vector = (e->msi.data &
+	irq.vector = (msi->data &
 			MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
-	irq.dest_mode = (1 << MSI_ADDR_DEST_MODE_SHIFT) & e->msi.address_lo;
-	irq.trig_mode = (1 << MSI_DATA_TRIGGER_SHIFT) & e->msi.data;
-	irq.delivery_mode = e->msi.data & 0x700;
+	irq.dest_mode = (1 << MSI_ADDR_DEST_MODE_SHIFT) & msi->address_lo;
+	irq.trig_mode = (1 << MSI_DATA_TRIGGER_SHIFT) & msi->data;
+	irq.delivery_mode = msi->data & 0x700;
 	irq.level = 1;
 	irq.shorthand = 0;
 
@@ -320,7 +325,7 @@  static int setup_routing_entry(struct kvm_irq_routing_table *rt,
 		rt->chip[ue->u.irqchip.irqchip][e->irqchip.pin] = ue->gsi;
 		break;
 	case KVM_IRQ_ROUTING_MSI:
-		e->set = kvm_set_msi;
+		e->set = kvm_set_msi_irq;
 		e->msi.address_lo = ue->u.msi.address_lo;
 		e->msi.address_hi = ue->u.msi.address_hi;
 		e->msi.data = ue->u.msi.data;