KVM induced panic on 2.6.38[2367] & 2.6.39

Message ID	1307505541.3102.12.camel@edumazet-laptop (mailing list archive)
State	New, archived
Headers	show Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by demeter2.kernel.org (8.14.4/8.14.4) with ESMTP id p5842XVh014373 for <patchwork-kvm@patchwork.kernel.org>; Wed, 8 Jun 2011 04:02:34 GMT Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754132Ab1FHD7L (ORCPT <rfc822;patchwork-kvm@patchwork.kernel.org>); Tue, 7 Jun 2011 23:59:11 -0400 Received: from mail-wy0-f174.google.com ([74.125.82.174]:44890 "EHLO mail-wy0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752728Ab1FHD7H (ORCPT <rfc822; kvm@vger.kernel.org>); Tue, 7 Jun 2011 23:59:07 -0400 Received: by wya21 with SMTP id 21so61100wya.19 for <multiple recipients>; Tue, 07 Jun 2011 20:59:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:subject:from:to:cc:in-reply-to:references :content-type:date:message-id:mime-version:x-mailer :content-transfer-encoding; bh=TQocmbX4PQLlHdfjrWfCh9CPxASqFLkKYHq1mhA3HCU=; b=eitE13wivF+qOngghABWqOhRhTLwuYWy9evIWl7z7QSb3F0DzmXFsfxG9tihvJLBGx puHYu8/Ux6ZGQLKqnmo3jtdbz/Vcu6eOnM8sKqW0M7nQ2qas5azxocXjRZwpI1rHXODa JXcu265svF0AEp/Lu+q/Lvvm3srIASbUvkLlo= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=subject:from:to:cc:in-reply-to:references:content-type:date :message-id:mime-version:x-mailer:content-transfer-encoding; b=gvTLy3olxLZ2x3cJRUmwun2N3SmeG4spgD5N7TXxJgllPQpv9GmZM7smZ7m+SEZ07n FoPkqkNfuU2+8RYYoiTbxa6TUF7K8p9ITjIIsICGuQFbvq/YL60ehv0IAUoyAHx0yZ5t QOpMIjhYjzQy56Larxbaxj+eDhtkxo5EE+Tik= Received: by 10.216.245.4 with SMTP id n4mr4500058wer.83.1307505545484; Tue, 07 Jun 2011 20:59:05 -0700 (PDT) Received: from [10.150.51.210] (gw0.net.jmsp.net [212.23.165.14]) by mx.google.com with ESMTPS id c17sm82850wbh.29.2011.06.07.20.59.03 (version=SSLv3 cipher=OTHER); Tue, 07 Jun 2011 20:59:04 -0700 (PDT) Subject: Re: KVM induced panic on 2.6.38[2367] & 2.6.39 From: Eric Dumazet <eric.dumazet@gmail.com> To: Brad Campbell <brad@fnarfbargle.com> Cc: Patrick McHardy <kaber@trash.net>, Bart De Schuymer <bdschuym@pandora.be>, kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, netfilter-devel@vger.kernel.org In-Reply-To: <4DEEBFC2.4060102@fnarfbargle.com> References: <20110601011527.GN19505@random.random> <alpine.LSU.2.00.1105312120530.22808@sister.anvils> <4DE5DCA8.7070704@fnarfbargle.com> <4DE5E29E.7080009@redhat.com> <4DE60669.9050606@fnarfbargle.com> <4DE60918.3010008@redhat.com> <4DE60940.1070107@redhat.com> <4DE61A2B.7000008@fnarfbargle.com> <20110601111841.GB3956@zip.com.au> <4DE62801.9080804@fnarfbargle.com> <20110601230342.GC3956@zip.com.au> <4DE8E3ED.7080004@fnarfbargle.com> <isavsg$3or$1@dough.gmane.org> <4DE906C0.6060901@fnarfbargle.com> <4DED344D.7000005@pandora.be> <4DED9C23.2030408@fnarfbargle.com> <4DEE27DE.7060004@trash.net> <4DEE3859.6070808@fnarfbargle.com> <4DEE4538.1020404@trash.net> <1307471484.3091.43.camel@edumazet-laptop> <4DEEACC3.3030509@trash.net> <4DEEBFC2.4060102@fnarfbargle.com> Content-Type: text/plain; charset="UTF-8" Date: Wed, 08 Jun 2011 05:59:01 +0200 Message-ID: <1307505541.3102.12.camel@edumazet-laptop> Mime-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 8bit Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: <kvm.vger.kernel.org> X-Mailing-List: kvm@vger.kernel.org X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by milter-greylist-4.2.6 (demeter2.kernel.org [140.211.167.43]); Wed, 08 Jun 2011 04:02:34 +0000 (UTC)

Eric Dumazet June 8, 2011, 3:59 a.m. UTC

Le mercredi 08 juin 2011 à 08:18 +0800, Brad Campbell a écrit :
> On 08/06/11 06:57, Patrick McHardy wrote:
> > On 07.06.2011 20:31, Eric Dumazet wrote:
> >> Le mardi 07 juin 2011 à 17:35 +0200, Patrick McHardy a écrit :
> >>
> >>> The main suspects would be NAT and TCPMSS. Did you also try whether
> >>> the crash occurs with only one of these these rules?
> >>>
> >>>> I've just compiled out CONFIG_BRIDGE_NETFILTER and can no longer access
> >>>> the address the way I was doing it, so that's a no-go for me.
> >>>
> >>> That's really weird since you're apparently not using any bridge
> >>> netfilter features. It shouldn't have any effect besides changing
> >>> at which point ip_tables is invoked. How are your network devices
> >>> configured (specifically any bridges)?
> >>
> >> Something in the kernel does
> >>
> >> u16 *ptr = addr (given by kmalloc())
> >>
> >> ptr[-1] = 0;
> >>
> >> Could be an off-one error in a memmove()/memcopy() or loop...
> >>
> >> I cant see a network issue here.
> >
> > So far me neither, but netfilter appears to trigger the bug.
> 
> Would it help if I tried some older kernels? This issue only surfaced 
> for me recently as I only installed the VM's in question about 12 weeks 
> ago and have only just started really using them in anger. I could try 
> reproducing it on progressively older kernels to see if I can find one 
> that works and then bisect from there.

Well, a bisection definitely should help, but needs a lot of time in
your case.

Could you try following patch, because this is the 'usual suspect' I had
yesterday :



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Brad Campbell June 8, 2011, 5:02 p.m. UTC | #1

On 08/06/11 11:59, Eric Dumazet wrote:

> Well, a bisection definitely should help, but needs a lot of time in
> your case.

Yes. compile, test, crash, walk out to the other building to press 
reset, lather, rinse, repeat.

I need a reset button on the end of a 50M wire, or a hardware watchdog!

Actually it's not so bad. If I turn off slub debugging the kernel panics 
and reboots itself.

This.. :
[    2.913034] netconsole: remote ethernet address 00:16:cb:a7:dd:d1
[    2.913066] netconsole: device eth0 not up yet, forcing it
[    3.660062] Refined TSC clocksource calibration: 3213.422 MHz.
[    3.660118] Switching to clocksource tsc
[   63.200273] r8169 0000:03:00.0: eth0: unable to load firmware patch 
rtl_nic/rtl8168e-1.fw (-2)
[   63.223513] r8169 0000:03:00.0: eth0: link down
[   63.223556] r8169 0000:03:00.0: eth0: link down

..is slowing down reboots considerably. 3.0-rc does _not_ like some 
timing hardware in my machine. Having said that, at least it does not 
randomly panic on SCSI like 2.6.39 does.

Ok, I've ruled out TCPMSS. Found out where it was being set and neutered 
it. I've replicated it with only the single DNAT rule.

> Could you try following patch, because this is the 'usual suspect' I had
> yesterday :
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 46cbd28..9f548f9 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -792,6 +792,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>   		fastpath = atomic_read(&skb_shinfo(skb)->dataref) == delta;
>   	}
>
> +#if 0
>   	if (fastpath&&
>   	size + sizeof(struct skb_shared_info)<= ksize(skb->head)) {
>   		memmove(skb->head + size, skb_shinfo(skb),
> @@ -802,7 +803,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>   		off = nhead;
>   		goto adjust_others;
>   	}
> -
> +#endif
>   	data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask);
>   	if (!data)
>   		goto nodata;
>
>
>

Nope.. that's not it. <sigh> That might have changed the characteristic 
of the fault slightly, but unfortunately I got caught with a couple of 
fsck's, so I only got to test it 3 times tonight.

It's unfortunate that this is a production system, so I can only take it 
down between about 9pm and 1am. That would normally be pretty 
productive, except that an fsck of a 14TB ext4 can take 30 minutes if it 
panics at the wrong time.

I'm out of time tonight, but I'll have a crack at some bisection 
tomorrow night. Now I just have to go back far enough that it works, and 
be near enough not to have to futz around with /proc /sys or drivers.

I really, really, really appreciate you guys helping me with this. It 
has been driving me absolutely bonkers. If I'm ever in the same town as 
any of you, dinner and drinks are on me.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Eric Dumazet June 8, 2011, 9:22 p.m. UTC | #2

Le jeudi 09 juin 2011 à 01:02 +0800, Brad Campbell a écrit :
> On 08/06/11 11:59, Eric Dumazet wrote:
> 
> > Well, a bisection definitely should help, but needs a lot of time in
> > your case.
> 
> Yes. compile, test, crash, walk out to the other building to press 
> reset, lather, rinse, repeat.
> 
> I need a reset button on the end of a 50M wire, or a hardware watchdog!
> 
> Actually it's not so bad. If I turn off slub debugging the kernel panics 
> and reboots itself.
> 
> This.. :
> [    2.913034] netconsole: remote ethernet address 00:16:cb:a7:dd:d1
> [    2.913066] netconsole: device eth0 not up yet, forcing it
> [    3.660062] Refined TSC clocksource calibration: 3213.422 MHz.
> [    3.660118] Switching to clocksource tsc
> [   63.200273] r8169 0000:03:00.0: eth0: unable to load firmware patch 
> rtl_nic/rtl8168e-1.fw (-2)
> [   63.223513] r8169 0000:03:00.0: eth0: link down
> [   63.223556] r8169 0000:03:00.0: eth0: link down
> 
> ..is slowing down reboots considerably. 3.0-rc does _not_ like some 
> timing hardware in my machine. Having said that, at least it does not 
> randomly panic on SCSI like 2.6.39 does.
> 
> Ok, I've ruled out TCPMSS. Found out where it was being set and neutered 
> it. I've replicated it with only the single DNAT rule.
> 
> 
> > Could you try following patch, because this is the 'usual suspect' I had
> > yesterday :
> >
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 46cbd28..9f548f9 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -792,6 +792,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
> >   		fastpath = atomic_read(&skb_shinfo(skb)->dataref) == delta;
> >   	}
> >
> > +#if 0
> >   	if (fastpath&&
> >   	size + sizeof(struct skb_shared_info)<= ksize(skb->head)) {
> >   		memmove(skb->head + size, skb_shinfo(skb),
> > @@ -802,7 +803,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
> >   		off = nhead;
> >   		goto adjust_others;
> >   	}
> > -
> > +#endif
> >   	data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask);
> >   	if (!data)
> >   		goto nodata;
> >
> >
> >
> 
> Nope.. that's not it. <sigh> That might have changed the characteristic 
> of the fault slightly, but unfortunately I got caught with a couple of 
> fsck's, so I only got to test it 3 times tonight.
> 
> It's unfortunate that this is a production system, so I can only take it 
> down between about 9pm and 1am. That would normally be pretty 
> productive, except that an fsck of a 14TB ext4 can take 30 minutes if it 
> panics at the wrong time.
> 
> I'm out of time tonight, but I'll have a crack at some bisection 
> tomorrow night. Now I just have to go back far enough that it works, and 
> be near enough not to have to futz around with /proc /sys or drivers.
> 
> I really, really, really appreciate you guys helping me with this. It 
> has been driving me absolutely bonkers. If I'm ever in the same town as 
> any of you, dinner and drinks are on me.

Hmm, I wonder if kmemcheck could help you, but its slow as hell, so not
appropriate for production :(



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Simon Horman June 10, 2011, 2:52 a.m. UTC | #3

On Thu, Jun 09, 2011 at 01:02:13AM +0800, Brad Campbell wrote:
> On 08/06/11 11:59, Eric Dumazet wrote:
> 
> >Well, a bisection definitely should help, but needs a lot of time in
> >your case.
> 
> Yes. compile, test, crash, walk out to the other building to press
> reset, lather, rinse, repeat.
> 
> I need a reset button on the end of a 50M wire, or a hardware watchdog!

Not strictly on-topic, but in situations where I have machines
that either don't have lights-out facilities or have broken ones
I find that network controlled power switches to be very useful.

At one point I would have need an 8000km long wire to the reset switch :-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Mark Lord June 10, 2011, 12:37 p.m. UTC | #4

On 11-06-09 10:52 PM, Simon Horman wrote:
> On Thu, Jun 09, 2011 at 01:02:13AM +0800, Brad Campbell wrote:
>> On 08/06/11 11:59, Eric Dumazet wrote:
>>
>>> Well, a bisection definitely should help, but needs a lot of time in
>>> your case.
>>
>> Yes. compile, test, crash, walk out to the other building to press
>> reset, lather, rinse, repeat.
>>
>> I need a reset button on the end of a 50M wire, or a hardware watchdog!


Something many of us don't realize is that nearly all Intel chipsets
have a built-in hardware watchdog timer.  This includes chipset for
consumer desktop boards as well as the big iron server stuff.

It's the "i8xx_tco" driver in the kernel enables use of them:

    modprobe i8xx_tco

Cheers
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Henrique de Moraes Holschuh June 10, 2011, 4:43 p.m. UTC | #5

On Fri, 10 Jun 2011, Mark Lord wrote:
> Something many of us don't realize is that nearly all Intel chipsets
> have a built-in hardware watchdog timer.  This includes chipset for
> consumer desktop boards as well as the big iron server stuff.
> 
> It's the "i8xx_tco" driver in the kernel enables use of them:

That's the old module name, but yes, it is very useful in desktops and
laptops (when it works).   Server-class hardware will have a baseboard
management unit that can really power-cycle the system instead of just
rebooting.

And test it first before you depend on it triggering at a remote location,
as the firmware might cause the Intel chipset watchdog to actually hang the
box instead of causing a proper reboot (happens on the IBM thinkpad T43, for
example).

Avi Kivity June 12, 2011, 3:38 p.m. UTC | #6

On 06/10/2011 05:52 AM, Simon Horman wrote:
> At one point I would have need an 8000km long wire to the reset switch :-)

Even more off-topic, there has been a case when a 200,000,000 km long 
wire to the reset button was needed.  IIRC they got away with a watchdog.

KVM induced panic on 2.6.38[2367] & 2.6.39

Commit Message

Comments

Patch