Message-ID: <50612747.6090805@inktank.com>
Date: Mon, 24 Sep 2012 22:38:47 -0500
From: Alex Elder
To: Christian Huang
CC: ceph-devel
Subject: Re: CEPH RBD client kernel panic when OSD connection is lost on kernel 3.2, 3.5, 3.5.4
References: <5060E5B5.8040807@inktank.com>

On 09/24/2012 08:25 PM, Christian Huang wrote:
> Hi Alex,
> we have used several kernel versions, some built from source,
> some stock kernel, from ubuntu repository.
>
> for the version you are referring to, we used a stock kernel from
> ubuntu repository.
>
> for building from source, we follow instructions from this page
> http://blog.avirtualhome.com/compile-linux-kernel-3-2-for-ubuntu-11-10/
> and use the following tag from precise git repo.
> Ubuntu-3.2.0-29.46

These two bits of information:

> please also note that we reproduced the issue with kernel 3.5.4
> from kernel ppa
> http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.5.4-quantal/
>
> it seems the following version does not have the issue
> http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.6-rc7-quantal/

...are very helpful. A very important bug was fixed between those
two releases, and its symptoms resemble what you are reporting.
I can't say with 100% confidence that you are hitting this bug, but
it appears you could be.

The fix is very simple, and you should be able to apply it to your
own kernel source to see whether it makes the problem go away. If
you do, please report back whether it fixes the problem.
Tomorrow I'll see whether I can tie the particulars of the problem
you are reporting to this issue.

					-Alex

From 02f7c002c9af475df6b2a1b64066bcdaf53cb7dc Mon Sep 17 00:00:00 2001
From: "Yan, Zheng"
Date: Wed, 6 Jun 2012 19:35:55 -0500
Subject: [PATCH] rbd: Clear ceph_msg->bio_iter for retransmitted message

The bug can cause NULL pointer dereference in write_partial_msg_pages

Signed-off-by: Zheng Yan
Reviewed-by: Alex Elder
(cherry picked from commit 43643528cce60ca184fe8197efa8e8da7c89a037)
---
 net/ceph/messenger.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index f0e34ff..d372b34 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -563,6 +563,10 @@ static void prepare_write_message(struct ceph_connection *con)
 		m->hdr.seq = cpu_to_le64(++con->out_seq);
 		m->needs_out_seq = false;
 	}
+#ifdef CONFIG_BLOCK
+	else
+		m->bio_iter = NULL;
+#endif
 
 	dout("prepare_write_message %p seq %lld type %d len %d+%d+%d %d pgs\n",
 	     m, con->out_seq, le16_to_cpu(m->hdr.type),