Message ID | 20090629132926.GB20289@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, 2009-06-29 at 06:29 -0700, Gleb Natapov wrote: > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index d1430ef..3e5b6ea 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -260,12 +260,15 @@ config SMP > > config X86_X2APIC > bool "Support x2apic" > - depends on X86_LOCAL_APIC && X86_64 && INTR_REMAP > + depends on X86_LOCAL_APIC && X86_64 Thinking more, probably we shouldn't remove this dependency. This might encourage people (knowingly or unknowingly) to enable x2apic without interrupt-remapping. Can we remove this? KVM mode will still work even if we fail to enable interrupt-remapping. So this shouldn't be an issue, correct? Sorry I should have commented about this before. > > ioapic_entries = alloc_ioapic_entries(); > if (!ioapic_entries) { > - pr_info("Allocate ioapic_entries failed: %d\n", ret); > - goto end; > + pr_info("Allocate ioapic_entries failed\n"); > + return; We should go to ir_failed .. > } > > ret = save_IO_APIC_setup(ioapic_entries); > if (ret) { > pr_info("Saving IO-APIC state failed: %d\n", ret); > - goto end; > + free_ioapic_entries(ioapic_entries); > + return; same here. In few hours I will be on 24 hour flight. So please bear with me for delayed responses. thanks, suresh -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 29, 2009 at 07:58:39AM -0700, Suresh Siddha wrote: > On Mon, 2009-06-29 at 06:29 -0700, Gleb Natapov wrote: > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > > index d1430ef..3e5b6ea 100644 > > --- a/arch/x86/Kconfig > > +++ b/arch/x86/Kconfig > > @@ -260,12 +260,15 @@ config SMP > > > > config X86_X2APIC > > bool "Support x2apic" > > - depends on X86_LOCAL_APIC && X86_64 && INTR_REMAP > > + depends on X86_LOCAL_APIC && X86_64 > > Thinking more, probably we shouldn't remove this dependency. This might > encourage people (knowingly or unknowingly) to enable x2apic without > interrupt-remapping. Can we remove this? KVM mode will still work even > if we fail to enable interrupt-remapping. So this shouldn't be an issue, > correct? > Yes, KVM will still work. I don't have strong fillings one way or the other, but why mandate an option that is no longer mandatory. What others think? > Sorry I should have commented about this before. > > > > > ioapic_entries = alloc_ioapic_entries(); > > if (!ioapic_entries) { > > - pr_info("Allocate ioapic_entries failed: %d\n", ret); > > - goto end; > > + pr_info("Allocate ioapic_entries failed\n"); > > + return; > > > We should go to ir_failed .. Why? There is not more ir_failed. > > > } > > > > ret = save_IO_APIC_setup(ioapic_entries); > > if (ret) { > > pr_info("Saving IO-APIC state failed: %d\n", ret); > > - goto end; > > + free_ioapic_entries(ioapic_entries); > > + return; > > same here. > Why? What ir_failed should do? Call free_ioapic_entries(ioapic_entries) and exit? -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2009-06-29 at 08:10 -0700, Gleb Natapov wrote: > On Mon, Jun 29, 2009 at 07:58:39AM -0700, Suresh Siddha wrote: > > Thinking more, probably we shouldn't remove this dependency. This might > > encourage people (knowingly or unknowingly) to enable x2apic without > > interrupt-remapping. Can we remove this? KVM mode will still work even > > if we fail to enable interrupt-remapping. So this shouldn't be an issue, > > correct? > > > Yes, KVM will still work. I don't have strong fillings one way or the > other, but why mandate an option that is no longer mandatory. What > others think? Only under the presence of KVM we are breaking this dependency. And typically the same kernel runs natively and under the presence of kvm correct. So thats why we shouldn't break this dependency. > > > ioapic_entries = alloc_ioapic_entries(); > > > if (!ioapic_entries) { > > > - pr_info("Allocate ioapic_entries failed: %d\n", ret); > > > - goto end; > > > + pr_info("Allocate ioapic_entries failed\n"); > > > + return; > > > > > > We should go to ir_failed .. > Why? There is not more ir_failed. oops. Not literally. We should goto the point where we should report the x2apic or ir enabling failed and check for x2apic pre-enabled etc at the end. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 29, 2009 at 08:15:05AM -0700, Suresh Siddha wrote: > On Mon, 2009-06-29 at 08:10 -0700, Gleb Natapov wrote: > > On Mon, Jun 29, 2009 at 07:58:39AM -0700, Suresh Siddha wrote: > > > Thinking more, probably we shouldn't remove this dependency. This might > > > encourage people (knowingly or unknowingly) to enable x2apic without > > > interrupt-remapping. Can we remove this? KVM mode will still work even > > > if we fail to enable interrupt-remapping. So this shouldn't be an issue, > > > correct? > > > > > Yes, KVM will still work. I don't have strong fillings one way or the > > other, but why mandate an option that is no longer mandatory. What > > others think? > > Only under the presence of KVM we are breaking this dependency. And > typically the same kernel runs natively and under the presence of kvm > correct. So thats why we shouldn't break this dependency. > OK, I'll drop Kconfig part in the next version. Unless someone will object till tomorrow. > > > > > ioapic_entries = alloc_ioapic_entries(); > > > > if (!ioapic_entries) { > > > > - pr_info("Allocate ioapic_entries failed: %d\n", ret); > > > > - goto end; > > > > + pr_info("Allocate ioapic_entries failed\n"); > > > > + return; > > > > > > > > > We should go to ir_failed .. > > Why? There is not more ir_failed. > > oops. Not literally. We should goto the point where we should report the > x2apic or ir enabling failed and check for x2apic pre-enabled etc at the > end. > Ah, OK. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Gleb Natapov <gleb@redhat.com> writes: > KVM would like to provide x2APIC interface to a guest without emulating > interrupt remapping device. The reason KVM prefers guest to use x2APIC > is that x2APIC interface is better virtualizable and provides better > performance than mmio xAPIC interface: > > - msr exits are faster than mmio (no page table walk, emulation) > - no need to read back ICR to look at the busy bit > - one 64 bit ICR write instead of two 32 bit writes > - shared code with the Hyper-V paravirt interface > > Included patch changes x2APIC enabling logic to enable it even if IR > initialization failed, but kernel runs under KVM and no apic id is > greater than 255 (if there is one spec requires BIOS to move to x2apic > mode before starting an OS). How common is hotplug hardware in kvm? In particular hotplug cpus? To support that seriously you need interrupt remapping. Eric -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jun 30, 2009 at 02:24:05AM -0700, Eric W. Biederman wrote: > Gleb Natapov <gleb@redhat.com> writes: > > > KVM would like to provide x2APIC interface to a guest without emulating > > interrupt remapping device. The reason KVM prefers guest to use x2APIC > > is that x2APIC interface is better virtualizable and provides better > > performance than mmio xAPIC interface: > > > > - msr exits are faster than mmio (no page table walk, emulation) > > - no need to read back ICR to look at the busy bit > > - one 64 bit ICR write instead of two 32 bit writes > > - shared code with the Hyper-V paravirt interface > > > > Included patch changes x2APIC enabling logic to enable it even if IR > > initialization failed, but kernel runs under KVM and no apic id is > > greater than 255 (if there is one spec requires BIOS to move to x2apic > > mode before starting an OS). > > > How common is hotplug hardware in kvm? In particular hotplug cpus? > It works for Linux guests. > To support that seriously you need interrupt remapping. > Can you explain why? -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Gleb Natapov <gleb@redhat.com> writes: > On Tue, Jun 30, 2009 at 02:24:05AM -0700, Eric W. Biederman wrote: >> Gleb Natapov <gleb@redhat.com> writes: >> >> > KVM would like to provide x2APIC interface to a guest without emulating >> > interrupt remapping device. The reason KVM prefers guest to use x2APIC >> > is that x2APIC interface is better virtualizable and provides better >> > performance than mmio xAPIC interface: >> > >> > - msr exits are faster than mmio (no page table walk, emulation) >> > - no need to read back ICR to look at the busy bit >> > - one 64 bit ICR write instead of two 32 bit writes >> > - shared code with the Hyper-V paravirt interface >> > >> > Included patch changes x2APIC enabling logic to enable it even if IR >> > initialization failed, but kernel runs under KVM and no apic id is >> > greater than 255 (if there is one spec requires BIOS to move to x2apic >> > mode before starting an OS). >> >> >> How common is hotplug hardware in kvm? In particular hotplug cpus? >> > It works for Linux guests. > >> To support that seriously you need interrupt remapping. >> > Can you explain why? Because ioapics don't fully function according to spec, and the interrupt code on the hotplug path is a horrible terrible broken hack for ioapics. It is better than nothing but it certainly is not something I would expect to work all of the time. Interrupt remapping is the one case where we have hardware that works according to spec and that works reasonably well. Eric -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 06/30/2009 07:44 PM, Eric W. Biederman wrote: >>> To support that seriously you need interrupt remapping. >>> >>> >> Can you explain why? >> > > Because ioapics don't fully function according to spec, > and the interrupt code on the hotplug path is a horrible > terrible broken hack for ioapics. > > It is better than nothing but it certainly is not something > I would expect to work all of the time. > Can you elaborate? For kvm guests, the hardware is reasonably will implemented and if not we will fix it. We need not cripple a feature just because some hardware is broken.
On Tue, Jun 30, 2009 at 09:44:54AM -0700, Eric W. Biederman wrote: > Gleb Natapov <gleb@redhat.com> writes: > > > On Tue, Jun 30, 2009 at 02:24:05AM -0700, Eric W. Biederman wrote: > >> Gleb Natapov <gleb@redhat.com> writes: > >> > >> > KVM would like to provide x2APIC interface to a guest without emulating > >> > interrupt remapping device. The reason KVM prefers guest to use x2APIC > >> > is that x2APIC interface is better virtualizable and provides better > >> > performance than mmio xAPIC interface: > >> > > >> > - msr exits are faster than mmio (no page table walk, emulation) > >> > - no need to read back ICR to look at the busy bit > >> > - one 64 bit ICR write instead of two 32 bit writes > >> > - shared code with the Hyper-V paravirt interface > >> > > >> > Included patch changes x2APIC enabling logic to enable it even if IR > >> > initialization failed, but kernel runs under KVM and no apic id is > >> > greater than 255 (if there is one spec requires BIOS to move to x2apic > >> > mode before starting an OS). > >> > >> > >> How common is hotplug hardware in kvm? In particular hotplug cpus? > >> > > It works for Linux guests. > > > > >> To support that seriously you need interrupt remapping. > >> > > Can you explain why? > > Because ioapics don't fully function according to spec, > and the interrupt code on the hotplug path is a horrible > terrible broken hack for ioapics. > Considering that interrupt remapping is fairly new feature are you saying that hotplug (pci and cpu) on x86 is a horrible hack on Linux? > It is better than nothing but it certainly is not something > I would expect to work all of the time. > Because of horrible code or non-complaint ioapic implementation out there? If later this is not a big issue for KVM. > Interrupt remapping is the one case where we have hardware > that works according to spec and that works reasonably well. > I am sure when there was only one ioapic implementation in existence it worked according to a spec (and if not spec was changed :)) Give interrupt remapping some time and than we will see how above statement holds. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Avi Kivity <avi@redhat.com> writes: > On 06/30/2009 07:44 PM, Eric W. Biederman wrote: >>>> To support that seriously you need interrupt remapping. >>>> >>>> >>> Can you explain why? >>> >> >> Because ioapics don't fully function according to spec, >> and the interrupt code on the hotplug path is a horrible >> terrible broken hack for ioapics. >> >> It is better than nothing but it certainly is not something >> I would expect to work all of the time. >> > > Can you elaborate? For kvm guests, the hardware is reasonably will implemented > and if not we will fix it. We need not cripple a feature just because some > hardware is broken. The short version is I don't know what work arounds we will ultimately decide to deploy to work with real hardware. I have been seriously contemplating causing a cpu hot-unplug request to fail if we are in ioapic mode and we have irqs routed to the cpu that is being unplugged. Even with perfectly working hardware it is not possible in the general case to migrate an ioapic irq from one cpu to another outside of an interrupt handler without without risking dropping an interrupt. There is no general way to know you have seen the last interrupt floating around your system. PCI ordering rules don't help because the ioapics can potentially take an out of band channel. The interrupt remapping hardware is pci and is only present on systems with in-band interrupts, so we can flush in-flight interrupts using pci reads. We can safely and do reprogram the hardware to point at different cpus at arbitrary times. So if you possibly can please emulate hardware that is known to work reliably for cpu hotplug. Eric -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 06/30/2009 10:08 PM, Eric W. Biederman wrote: >> Can you elaborate? For kvm guests, the hardware is reasonably will implemented >> and if not we will fix it. We need not cripple a feature just because some >> hardware is broken. >> > > The short version is I don't know what work arounds we will ultimately > decide to deploy to work with real hardware. > > I have been seriously contemplating causing a cpu hot-unplug request > to fail if we are in ioapic mode and we have irqs routed to the cpu > that is being unplugged. > Well, obviously we need to disassociate any irqs from such a cpu. Could be done from the kernel or only enforced by the kernel. > Even with perfectly working hardware it is not possible in the general > case to migrate an ioapic irq from one cpu to another outside of an > interrupt handler without without risking dropping an interrupt. > Can't you generate a spurious interrupt immediately after the migration? An extra interrupt shouldn't hurt. > There is no general way to know you have seen the last interrupt > floating around your system. PCI ordering rules don't help because > the ioapics can potentially take an out of band channel. > Can you describe the problem scenario? an ioapic->lapic message delivered to a dead cpu?
Avi Kivity <avi@redhat.com> writes: > On 06/30/2009 10:08 PM, Eric W. Biederman wrote: >>> Can you elaborate? For kvm guests, the hardware is reasonably will implemented >>> and if not we will fix it. We need not cripple a feature just because some >>> hardware is broken. >>> >> >> The short version is I don't know what work arounds we will ultimately >> decide to deploy to work with real hardware. >> >> I have been seriously contemplating causing a cpu hot-unplug request >> to fail if we are in ioapic mode and we have irqs routed to the cpu >> that is being unplugged. >> > > Well, obviously we need to disassociate any irqs from such a cpu. Could be done > from the kernel or only enforced by the kernel. Using the normal irq migration path we can move irqs off of a cpu reliably there just aren't any progress guarantees. >> Even with perfectly working hardware it is not possible in the general >> case to migrate an ioapic irq from one cpu to another outside of an >> interrupt handler without without risking dropping an interrupt. >> > > Can't you generate a spurious interrupt immediately after the migration? An > extra interrupt shouldn't hurt. Nope. The ioapics can't be told to send an interrupt. >> There is no general way to know you have seen the last interrupt >> floating around your system. PCI ordering rules don't help because >> the ioapics can potentially take an out of band channel. >> > > Can you describe the problem scenario? an ioapic->lapic message delivered to a > dead cpu? Dropped irqs.. Driver hangs because it is waiting for an irq. Hardware hangs because it is waiting for the cpu to process the irq. Potentially we get a level triggered irq that is never acked by the cpu that won't arm until the cpu send an ack, and we can't send an ack from another cpu. Eric -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Gleb Natapov <gleb@redhat.com> writes: > Considering that interrupt remapping is fairly new feature > are you saying that hotplug (pci and cpu) on x86 is a horrible > hack on Linux? Just the current cpu hotplug path. >> It is better than nothing but it certainly is not something >> I would expect to work all of the time. >> > Because of horrible code or non-complaint ioapic implementation out > there? If later this is not a big issue for KVM. Both. And even in spec ioapics can't be made to work 100% reliably. >> Interrupt remapping is the one case where we have hardware >> that works according to spec and that works reasonably well. >> > I am sure when there was only one ioapic implementation in existence > it worked according to a spec (and if not spec was changed :)) Give > interrupt remapping some time and than we will see how above statement > holds. What we need for cpu hotplug/unplug is on the tested path now. That is a significant difference. Eric -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 06/30/2009 10:36 PM, Eric W. Biederman wrote: >>> The short version is I don't know what work arounds we will ultimately >>> decide to deploy to work with real hardware. >>> >>> I have been seriously contemplating causing a cpu hot-unplug request >>> to fail if we are in ioapic mode and we have irqs routed to the cpu >>> that is being unplugged. >>> >>> >> Well, obviously we need to disassociate any irqs from such a cpu. Could be done >> from the kernel or only enforced by the kernel. >> > > Using the normal irq migration path we can move irqs off of a cpu reliably > there just aren't any progress guarantees. > Program the ioapic to the new cpu. Wait a few milliseconds. If it takes more than that to get an interrupt from the ioapic to the local apic, the machine has much bigger problems. >>> Even with perfectly working hardware it is not possible in the general >>> case to migrate an ioapic irq from one cpu to another outside of an >>> interrupt handler without without risking dropping an interrupt. >>> >>> >> Can't you generate a spurious interrupt immediately after the migration? An >> extra interrupt shouldn't hurt. >> > > Nope. The ioapics can't be told to send an interrupt. > You can program the local apic ICR to generate an interrupt with the same vector. >>> There is no general way to know you have seen the last interrupt >>> floating around your system. PCI ordering rules don't help because >>> the ioapics can potentially take an out of band channel. >>> >>> >> Can you describe the problem scenario? an ioapic->lapic message delivered to a >> dead cpu? >> > > Dropped irqs.. Driver hangs because it is waiting for an irq. Hardware > hangs because it is waiting for the cpu to process the irq. > > Potentially we get a level triggered irq that is never acked by > the cpu that won't arm until the cpu send an ack, and we can't > send an ack from another cpu. > > I think a spurious interrupt generated through the local apic solves that problem. For level-triggered interrupts, mask them before offlining the cpu.
Avi Kivity <avi@redhat.com> writes: > On 06/30/2009 10:36 PM, Eric W. Biederman wrote: >>>> The short version is I don't know what work arounds we will ultimately >>>> decide to deploy to work with real hardware. >>>> >>>> I have been seriously contemplating causing a cpu hot-unplug request >>>> to fail if we are in ioapic mode and we have irqs routed to the cpu >>>> that is being unplugged. >>>> >>>> >>> Well, obviously we need to disassociate any irqs from such a cpu. Could be done >>> from the kernel or only enforced by the kernel. >>> >> >> Using the normal irq migration path we can move irqs off of a cpu reliably >> there just aren't any progress guarantees. >> > > Program the ioapic to the new cpu. Wait a few milliseconds. If it takes more > than that to get an interrupt from the ioapic to the local apic, the machine has > much bigger problems. In general you can not reprogram an ioapic safely unless the interrupt is blocked at the source. Which is why you either need the originating device disabled or you have to do it in interrupt context. I forget all of the details. I just know in real hardware I experimented with it a lot, and wound up hanging the ioapic state machine of both intel and amd ioapics. Migrating ioapic irqs in interrupt context sucks. It just happens to be what works reliably. I do think the wait an eternity in computer time a short while in human time is a valid technique when you can do nothing better. If flushing the interrupt was my only problem that would solve it. >>>> Even with perfectly working hardware it is not possible in the general >>>> case to migrate an ioapic irq from one cpu to another outside of an >>>> interrupt handler without without risking dropping an interrupt. >>>> >>>> >>> Can't you generate a spurious interrupt immediately after the migration? An >>> extra interrupt shouldn't hurt. >>> >> >> Nope. The ioapics can't be told to send an interrupt. >> > > You can program the local apic ICR to generate an interrupt with the same > vector. But you can not program the apic ICR to generate a level triggered interrupt with the same vector. So you don't get the broadcast behavior when you ack the apic. >>>> There is no general way to know you have seen the last interrupt >>>> floating around your system. PCI ordering rules don't help because >>>> the ioapics can potentially take an out of band channel. >>>> >>>> >>> Can you describe the problem scenario? an ioapic->lapic message delivered to a >>> dead cpu? >>> >> >> Dropped irqs.. Driver hangs because it is waiting for an irq. Hardware >> hangs because it is waiting for the cpu to process the irq. >> >> Potentially we get a level triggered irq that is never acked by >> the cpu that won't arm until the cpu send an ack, and we can't >> send an ack from another cpu. >> >> > > I think a spurious interrupt generated through the local apic solves that > problem. For level-triggered interrupts, mask them before offlining the cpu. So now we have a masked unacked irq. It doesn't help in the slightest that the cpu migration code puts irq migration last and request that we do it all with interrupts disabled. You might be right that by application of extreme ingenuity and completely in spec ioapics there is a path that makes this all work. Currently I don't have fully in spec ioapcis, and I don't have anyone interested enough in cpu hotplug to be willing to rip things apart until interrupt migration is a reasonable deal on x86. Instead all I see are patches that mitigate the worst of the brokenness. At the same time with the interrupt remapping hardware because it doesn't need the irq disabled at the source when we reprogram it I can make everything stable much more easily. Eric -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2009-06-30 at 12:36 -0700, Eric W. Biederman wrote: > Dropped irqs.. Driver hangs because it is waiting for an irq. Hardware > hangs because it is waiting for the cpu to process the irq. > > Potentially we get a level triggered irq that is never acked by > the cpu that won't arm until the cpu send an ack, and we can't > send an ack from another cpu. Eric, Among number of experiments you have tried in the past to fix this, have you tried the experiment of explicitly clearing the remoteIRR by changing the trigger mode to edge and then back to level. Is there a problem with this? We can send a spurious IPI (after the RTE migration) with the new vector to another cpu and handler which services the level interrupt will check if we saw the edge mode for a level interrupt and then the handler can explicitly restore the level trigger and reset the remote IRR by mask +edge and unmask+level. We might have to work with some rough edges but do you recollect any major issue with this approach.. thanks, suresh -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Suresh Siddha <suresh.b.siddha@intel.com> writes: > On Tue, 2009-06-30 at 12:36 -0700, Eric W. Biederman wrote: >> Dropped irqs.. Driver hangs because it is waiting for an irq. Hardware >> hangs because it is waiting for the cpu to process the irq. >> >> Potentially we get a level triggered irq that is never acked by >> the cpu that won't arm until the cpu send an ack, and we can't >> send an ack from another cpu. > > Eric, > > Among number of experiments you have tried in the past to fix this, have > you tried the experiment of explicitly clearing the remoteIRR by > changing the trigger mode to edge and then back to level. > > Is there a problem with this? The problem I had wasn't remoteIRR getting stuck, but the symptoms were largely the same. I did try changing the trigger mode to edge and back and that did not unstick the ioapic in all cases. > We can send a spurious IPI (after the RTE migration) with the new vector > to another cpu and handler which services the level interrupt will check > if we saw the edge mode for a level interrupt and then the handler can > explicitly restore the level trigger and reset the remote IRR by mask > +edge and unmask+level. > > We might have to work with some rough edges but do you recollect any > major issue with this approach.. This is coming up enough recently I expect it is time to cook up a patch that does the ioapic migration in process context plus some user space code that stress tests things. Just so people can repeat my experiments and see what I am trying to avoid. Eric -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2009-07-01 at 17:17 -0700, Eric W. Biederman wrote: > Suresh Siddha <suresh.b.siddha@intel.com> writes: > > Among number of experiments you have tried in the past to fix this, have > > you tried the experiment of explicitly clearing the remoteIRR by > > changing the trigger mode to edge and then back to level. > > > > Is there a problem with this? > > The problem I had wasn't remoteIRR getting stuck, but the symptoms > were largely the same. I did try changing the trigger mode to edge > and back and that did not unstick the ioapic in all cases. > I did some experiments locally here yesterday and on my old ICH5 based system, I couldn't reset the remoteIRR by changing the trigger mode to edge and then back to level. However what worked was an explicit eoi to the io-apic using the respective vector. I guess we need to try both the things based on perhaps io-apic version etc. But what I am nervous about is, did you try both these things aswell and still saw stuck interrupts? I will cleanup my code and post it, so that we can try on different systems. If this still doesn't work on certain HW platforms, atleast our experiments of what works and what doesn't work and on what platforms will be documented on the web. thanks, suresh -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Suresh Siddha <suresh.b.siddha@intel.com> writes: > On Wed, 2009-07-01 at 17:17 -0700, Eric W. Biederman wrote: >> Suresh Siddha <suresh.b.siddha@intel.com> writes: >> > Among number of experiments you have tried in the past to fix this, have >> > you tried the experiment of explicitly clearing the remoteIRR by >> > changing the trigger mode to edge and then back to level. >> > >> > Is there a problem with this? >> >> The problem I had wasn't remoteIRR getting stuck, but the symptoms >> were largely the same. I did try changing the trigger mode to edge >> and back and that did not unstick the ioapic in all cases. >> > > I did some experiments locally here yesterday and on my old ICH5 based > system, I couldn't reset the remoteIRR by changing the trigger mode to > edge and then back to level. > > However what worked was an explicit eoi to the io-apic using the > respective vector. > > I guess we need to try both the things based on perhaps io-apic version > etc. In part the deep problems I ran into was something other than RemoteIRR. If it was something as simple as the ioapic being in a documented state I would have kept looking. At the very least the RemoteIRR bit was not set in some of the cases I encountered. > But what I am nervous about is, did you try both these things aswell and > still saw stuck interrupts? Yes. The work arounds were in the code so I tried them. > I will cleanup my code and post it, so that we can try on different > systems. If this still doesn't work on certain HW platforms, atleast our > experiments of what works and what doesn't work and on what platforms > will be documented on the web. Sounds reasonable. My apologies for the long delayed reply. I want to work on this but I have higher priorities. Eric -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index d1430ef..3e5b6ea 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -260,12 +260,15 @@ config SMP config X86_X2APIC bool "Support x2apic" - depends on X86_LOCAL_APIC && X86_64 && INTR_REMAP + depends on X86_LOCAL_APIC && X86_64 ---help--- This enables x2apic support on CPUs that have this feature. This allows 32-bit apic IDs (so it can support very large systems), - and accesses the local apic via MSRs not via mmio. + and accesses the local apic via MSRs not via mmio. CONFIG_INTR_REMAP + is required for x2apic to take affect when running on physical + machine. If you intend to run the kernel as KVM guest x2apic can + be used without interrupt remapping. If you don't know what to do here, say N. diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index 8c7c042..f5f02c3 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -49,6 +49,7 @@ #include <asm/mtrr.h> #include <asm/smp.h> #include <asm/mce.h> +#include <asm/kvm_para.h> unsigned int num_processors; @@ -1363,52 +1364,76 @@ void enable_x2apic(void) } #endif /* CONFIG_X86_X2APIC */ -void __init enable_IR_x2apic(void) +int __init enable_IR(void) { #ifdef CONFIG_INTR_REMAP int ret; - unsigned long flags; - struct IO_APIC_route_entry **ioapic_entries = NULL; ret = dmar_table_init(); if (ret) { pr_debug("dmar_table_init() failed with %d:\n", ret); - goto ir_failed; + return 0; } if (!intr_remapping_supported()) { pr_debug("intr-remapping not supported\n"); - goto ir_failed; + return 0; } - if (!x2apic_preenabled && skip_ioapic_setup) { pr_info("Skipped enabling intr-remap because of skipping " "io-apic setup\n"); - return; + return 0; } + if (enable_intr_remapping(x2apic_supported())) + return 0; + + pr_info("Enabled Interrupt-remapping\n"); + + return 1; + +#endif + return 0; +} + +void __init enable_IR_x2apic(void) +{ + unsigned long flags; + struct IO_APIC_route_entry **ioapic_entries = NULL; + int ret, x2apic_enabled = 0; + ioapic_entries = alloc_ioapic_entries(); if (!ioapic_entries) { - pr_info("Allocate ioapic_entries failed: %d\n", ret); - goto end; + pr_info("Allocate ioapic_entries failed\n"); + return; } ret = save_IO_APIC_setup(ioapic_entries); if (ret) { pr_info("Saving IO-APIC state failed: %d\n", ret); - goto end; + free_ioapic_entries(ioapic_entries); + return; } local_irq_save(flags); - mask_IO_APIC_setup(ioapic_entries); mask_8259A(); + mask_IO_APIC_setup(ioapic_entries); - ret = enable_intr_remapping(x2apic_supported()); - if (ret) - goto end_restore; + ret = enable_IR(); + if (!ret) { + /* IR is required if x2apic is enabled by BIOS even + * when running in kvm since this indicates present + * of APIC ID > 255 */ + if (x2apic_preenabled || !kvm_para_available()) + goto nox2apic; + else + /* without IR all CPUs can be addressed by IOAPIC/MSI + * only in physical mode */ + x2apic_phys = 1; + } - pr_info("Enabled Interrupt-remapping\n"); + x2apic_enabled = 1; if (x2apic_supported() && !x2apic_mode) { x2apic_mode = 1; @@ -1416,41 +1441,28 @@ void __init enable_IR_x2apic(void) pr_info("Enabled x2apic\n"); } -end_restore: - if (ret) - /* - * IR enabling failed - */ +nox2apic: + if (!ret) /* IR enabling failed */ restore_IO_APIC_setup(ioapic_entries); - unmask_8259A(); local_irq_restore(flags); -end: - if (ioapic_entries) - free_ioapic_entries(ioapic_entries); + free_ioapic_entries(ioapic_entries); - if (!ret) + if (x2apic_enabled) return; -ir_failed: - if (x2apic_preenabled) + if (x2apic_preenabled) { +#ifdef CONFIG_INTR_REMAP panic("x2apic enabled by bios. But IR enabling failed"); - else if (cpu_has_x2apic) - pr_info("Not enabling x2apic,Intr-remapping\n"); #else - if (!cpu_has_x2apic) - return; - - if (x2apic_preenabled) panic("x2apic enabled prior OS handover," - " enable CONFIG_X86_X2APIC, CONFIG_INTR_REMAP"); + " enable CONFIG_X86_X2APIC, CONFIG_INTR_REMAP"); #endif - - return; + } else if (cpu_has_x2apic) + pr_info("Not enabling x2apic,Intr-remapping\n"); } - #ifdef CONFIG_X86_64 /* * Detect and enable local APICs on non-SMP boards.
KVM would like to provide x2APIC interface to a guest without emulating interrupt remapping device. The reason KVM prefers guest to use x2APIC is that x2APIC interface is better virtualizable and provides better performance than mmio xAPIC interface: - msr exits are faster than mmio (no page table walk, emulation) - no need to read back ICR to look at the busy bit - one 64 bit ICR write instead of two 32 bit writes - shared code with the Hyper-V paravirt interface Included patch changes x2APIC enabling logic to enable it even if IR initialization failed, but kernel runs under KVM and no apic id is greater than 255 (if there is one spec requires BIOS to move to x2apic mode before starting an OS). Signed-off-by: Gleb Natapov <gleb@redhat.com> --- v2: Add help message to Kconfig Force physical mode if IR is not available. v3: disable io-apic in outer function. Enable x2apic mode with masked io-apic -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html