Message ID | 569BB278.8080603@citrix.com (mailing list archive) |
---|---|
State | New, archived |
On 22/01/2016 08:57, Håkon Alstadheim wrote:
> On 17 Jan 2016 16:25, Andrew Cooper wrote:
>> On 17/01/16 15:16, Andrew Cooper wrote:
>>>>> This isn't the first time we have seen this on Haswell processors. Do
>>>>> you have microcode loading set up?
>>>>>
>>>>> ~Andrew
>>>>>
>>>> Still happening with kernel-genkernel-x86_64-4.1.15-gentoo and updated
>>>> cpu microcode, using microcode from 20151106.
> ...
>>>> Actually, this will be more useful:
>>>>
>>>> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
>>>> index 1228568..4e75b03 100644
>>>> --- a/xen/arch/x86/irq.c
>>>> +++ b/xen/arch/x86/irq.c
>>>> @@ -1165,6 +1165,15 @@ static void __do_IRQ_guest(int irq)
>>>>      if ( action->ack_type == ACKTYPE_EOI )
>>>>      {
>>>>          sp = pending_eoi_sp(peoi);
>>>> +        if ( unlikely(!((sp == 0) || (peoi[sp-1].vector < vector))) )
>>>> +        {
>>>> +            int p;
>>>> +
>>>> +            printk("** sp %d, irq %d, vec %#x\n", sp, irq, vector);
>>>> +            for ( p = sp; p > 0; --p )
>>>> +                printk("**peoi[%d] = {%d, %#x, %d}\n",
>>>> +                       p-1, peoi[p-1].irq, peoi[p-1].vector, peoi[p-1].ready);
>>>> +        }
>>>>          ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
>>>>          ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
>>>>          peoi[sp].irq = irq;
>>>>
>>>>
>>>>
> Got one again. dom5 is my desktop, dom1 is my
> mail-server/router/firewall. (planning to split that up ... ) . Is there
> any additional info that would be useful?
>
> Running now with gentoo xen 4.6.0-r8 and xen-tools 4.6.0-r7. dom0 kernel
> is gentoo-sources-4.1.15-r1 , and the above patch.
>
> I tried running with maxcpus=6 for a while, but I had to disable some
> services to get that running. So, when nothing happened for a while I
> re-enabled all my cores (two cpus, 12 cores, 24 threads). I was running
> with two cpu-pools, one for each cpu. I have not re-enabled that.

grant_table.c:1491:d1v3 Expanding dom (1) grant table from (12) to (13) frames.
** sp 1, irq 107, vec 0x3b
**peoi[0] = {107, 0x3b, 0}
Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1172
----[ Xen-4.6.0 x86_64 debug=y Tainted: C ]----
<snip>
Xen call trace:
[<ffff82d080170205>] do_IRQ+0x451/0x6ea
[<ffff82d08023b132>] common_interrupt+0x62/0x70
[<ffff82d0801af1ea>] mwait_idle+0x2cb/0x315
[<ffff82d0801607bc>] idle_loop+0x51/0x6b

So we have been interrupted with an interrupt we already believe to be
pending. I wonder if there is an erratum to do with going to sleep with
a pending interrupt.

I will see about extending the debugging patch to stash the IRR/ISR
before going to sleep.

~Andrew
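
The follow-up idea in the message above, stashing the local APIC request and in-service registers (IRR/ISR) before idling, could be approached roughly as in the standalone sketch below. This is not Xen code and not the actual follow-up patch: `struct apic_snapshot`, `vector_is_set` and `report_if_already_pending` are hypothetical names, and the registers are modelled as plain arrays rather than read from hardware. The only thing it relies on is the architectural layout, in which IRR and ISR are each 256 bits wide and split into eight 32-bit registers, so vector v is bit v % 32 of register v / 32.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* IRR and ISR are each 256 bits, exposed as eight 32-bit registers. */
#define APIC_REG_WORDS 8

/* Hypothetical container for a copy of IRR/ISR taken just before idling. */
struct apic_snapshot {
    uint32_t irr[APIC_REG_WORDS];
    uint32_t isr[APIC_REG_WORDS];
};

/* Return true if the bit for 'vector' is set in a 256-bit register copy. */
static bool vector_is_set(const uint32_t regs[APIC_REG_WORDS],
                          unsigned int vector)
{
    if ( vector >= APIC_REG_WORDS * 32 )
        return false;
    return (regs[vector / 32] >> (vector % 32)) & 1u;
}

/*
 * Report whether 'vector' was already requested or in service when the
 * snapshot was taken. Returns a bitmask: bit 0 = in IRR, bit 1 = in ISR.
 */
static int report_if_already_pending(const struct apic_snapshot *snap,
                                     unsigned int vector)
{
    bool in_irr = vector_is_set(snap->irr, vector);
    bool in_isr = vector_is_set(snap->isr, vector);

    if ( in_irr || in_isr )
        printf("** vec %#x set before idle: IRR=%d ISR=%d\n",
               vector, in_irr, in_isr);

    return (in_irr ? 1 : 0) | (in_isr ? 2 : 0);
}
```

For the vector in the log, 0x3b (decimal 59), the relevant bit is bit 27 of the second 32-bit register, so a snapshot with that bit set in its IRR copy would be reported as already requested before the CPU went to sleep.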
>>> On 22.01.16 at 10:20, <andrew.cooper3@citrix.com> wrote:
> ** sp 1, irq 107, vec 0x3b
> **peoi[0] = {107, 0x3b, 0}
> Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1172
> ----[ Xen-4.6.0 x86_64 debug=y Tainted: C ]----
> <snip>
> Xen call trace:
> [<ffff82d080170205>] do_IRQ+0x451/0x6ea
> [<ffff82d08023b132>] common_interrupt+0x62/0x70
> [<ffff82d0801af1ea>] mwait_idle+0x2cb/0x315
> [<ffff82d0801607bc>] idle_loop+0x51/0x6b
>
> So we have been interrupted with an interrupt we already believe to be
> pending. I wonder if there is an erratum to do with going to sleep with
> a pending interrupt.

An immediate way to check whether that's (part of) the problem would be
to run with "cpuidle=0" for a while.

Jan
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 1228568..4e75b03 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1165,6 +1165,15 @@ static void __do_IRQ_guest(int irq)
     if ( action->ack_type == ACKTYPE_EOI )
     {
         sp = pending_eoi_sp(peoi);
+        if ( unlikely(!((sp == 0) || (peoi[sp-1].vector < vector))) )
+        {
+            int p;
+
+            printk("** sp %d, irq %d, vec %#x\n", sp, irq, vector);
+            for ( p = sp; p > 0; --p )
+                printk("**peoi[%d] = {%d, %#x, %d}\n",
+                       p-1, peoi[p-1].irq, peoi[p-1].vector, peoi[p-1].ready);
+        }
         ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
         ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
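
To make the condition checked by the debugging patch easier to follow, here is a small userspace model of the pending-EOI stack. It is only a sketch under stated assumptions: `struct eoi_stack`, `push_is_ordered`, `push_pending_eoi` and `MAX_PENDING` are hypothetical names invented for illustration, and the real per-CPU stack in xen/arch/x86/irq.c is laid out differently. The model keeps the same invariant as the ASSERT, namely that each newly pushed vector must be strictly greater than the one on top of the stack, and prints the same diagnostics when that does not hold.

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative bound only; this is not Xen's NR_DYNAMIC_VECTORS. */
#define MAX_PENDING 16

/* One stack entry, mirroring the {irq, vector, ready} triple in the log. */
struct pending_eoi {
    int irq;
    unsigned int vector;
    int ready;
};

/* Hypothetical model of a per-CPU stack of interrupts awaiting EOI. */
struct eoi_stack {
    struct pending_eoi entry[MAX_PENDING];
    int sp;
};

/* Same condition as the ASSERT: empty stack, or top vector below the new one. */
static bool push_is_ordered(const struct eoi_stack *s, unsigned int vector)
{
    return (s->sp == 0) || (s->entry[s->sp - 1].vector < vector);
}

/*
 * Push a new pending EOI. On an ordering violation, dump the stack in the
 * same format as the debugging patch and refuse the push (the hypervisor
 * would hit the ASSERT at this point instead).
 */
static bool push_pending_eoi(struct eoi_stack *s, int irq, unsigned int vector)
{
    if ( !push_is_ordered(s, vector) )
    {
        int p;

        printf("** sp %d, irq %d, vec %#x\n", s->sp, irq, vector);
        for ( p = s->sp; p > 0; --p )
            printf("**peoi[%d] = {%d, %#x, %d}\n",
                   p - 1, s->entry[p - 1].irq, s->entry[p - 1].vector,
                   s->entry[p - 1].ready);
        return false;
    }

    if ( s->sp >= MAX_PENDING - 1 )
        return false;

    s->entry[s->sp].irq = irq;
    s->entry[s->sp].vector = vector;
    s->entry[s->sp].ready = 0;
    s->sp++;
    return true;
}
```

Pushing irq 107 with vector 0x3b twice in a row reproduces the reported situation in this model: the first push succeeds, and the second is rejected because the top of the stack already holds vector 0x3b, which is not strictly less than the incoming vector.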