From patchwork Tue Mar 25 13:29:19 2014
X-Patchwork-Submitter: Alex Elder
X-Patchwork-Id: 3887821
Message-ID: <533184AF.9050101@ieee.org>
Date: Tue, 25 Mar 2014 08:29:19 -0500
From: Alex Elder
To: Olivier Bonvalet, Ilya Dryomov
CC: Ceph Development
Subject: Re: Issue #5876 : assertion failure in rbd_img_obj_callback()
References: <1395736765.2823.29.camel@localhost> <53316D18.7040103@ieee.org> <53317BC2.9010700@ieee.org> <1395753516.2823.37.camel@localhost>
In-Reply-To: <1395753516.2823.37.camel@localhost>
X-Mailing-List: ceph-devel@vger.kernel.org

On 03/25/2014 08:18 AM, Olivier Bonvalet wrote:
>
>
> On Tuesday, 25 March 2014 at 14:57 +0200, Ilya Dryomov wrote:
>> On Tue, Mar 25, 2014 at 2:51 PM, Alex Elder wrote:
>>> On 03/25/2014 07:34 AM, Ilya Dryomov wrote:
>>>>> On 03/25/2014 04:04 AM, Ilya Dryomov wrote:
>>>>>> On Tue, Mar 25, 2014 at 10:39 AM, Olivier Bonvalet wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> what can/should I do to help fix that problem?
>>>>>>>
>>>>>>> for now, RBD kernel client hangs on:
>>>>>>> Assertion failure in rbd_img_obj_callback() at line 2131:
>>>>>>>         rbd_assert(which >= img_request->next_completion);
>>>>>
>>>>> If you can build your own kernel as Ilya says I'd like to
>>>>> see the values of which and img_request->next_completion
>>>>> here.
>>>>
>>>> Looks like which was 1, which means that next_completion had to be 2 or
>>>> greater.  I miss Solaris crash dumps ...
>>>>
>>>> On a different note, why are we asserting next_completion outside of
>>>> a spinlock which is supposed to protect next_completion?
>>>
>>> That's a very good point (which could be easily remedied by moving
>>> the assertion down a couple lines).  The image object request (#1)
>>> in this case will have been marked done at this point; it's possible
>>> that request #2 (or later) was concurrently getting handled by the
>>> for_each_obj_request_from() loop below in that same function, but
>>> may not have updated next_completion yet.
>>>
>>> So that *could* explain the tripped assertion.  The assertion
>>> should be moved in any case, it's a bug.
>>>
>>> That being said, it doesn't explain the other assertion:
>>>         rbd_assert(img_request != NULL);
>>> So there's at least one other thing going on.
>>
>> Yeah, exactly my thoughts.
>>
>> Thanks,
>>
>>                 Ilya
>
> So, could this patch be a (partial) fix?
>
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -2123,6 +2123,7 @@ static void rbd_img_obj_callback(struct rbd_obj_request *obj_request)
>  	rbd_assert(obj_request_img_data_test(obj_request));
>  	img_request = obj_request->img_request;
>
> +	spin_lock_irq(&img_request->completion_lock);
>  	dout("%s: img %p obj %p\n", __func__, img_request, obj_request);
>  	rbd_assert(img_request != NULL);
>  	rbd_assert(img_request->obj_request_count > 0);
> @@ -2130,7 +2131,6 @@ static void rbd_img_obj_callback(struct rbd_obj_request *obj_request)
>  	rbd_assert(which < img_request->obj_request_count);
>  	rbd_assert(which >= img_request->next_completion);
>
> -	spin_lock_irq(&img_request->completion_lock);
>  	if (which != img_request->next_completion)
>  		goto out;

Yes, roughly.  I'd do the following instead.  It would be great
to learn whether it eliminates the one form of assertion failure
you were seeing.

					-Alex

--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -2128,11 +2128,11 @@ static void rbd_img_obj_callback(struct
 	rbd_assert(img_request->obj_request_count > 0);
 	rbd_assert(which != BAD_WHICH);
 	rbd_assert(which < img_request->obj_request_count);
-	rbd_assert(which >= img_request->next_completion);
 
 	spin_lock_irq(&img_request->completion_lock);
 	if (which != img_request->next_completion)
 		goto out;
+	rbd_assert(which >= img_request->next_completion);
 
 	for_each_obj_request_from(img_request, obj_request) {
 		rbd_assert(more);
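
For anyone who wants to play with the locking question outside the
kernel, here is a rough user-space sketch of the idea discussed above:
only look at next_completion (including in assertions) while holding
the lock that protects it.  Everything in it is an assumption made for
illustration, not driver code: struct img_request_model, MAX_OBJS,
img_model_init() and img_model_obj_callback() are hypothetical names,
and a pthread mutex stands in for spin_lock_irq().  It also sets the
per-object done flag inside the lock, which is a simplification and
not what the driver does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_OBJS 8	/* hypothetical limit, for this sketch only */

/* Hypothetical stand-in for the image request; not the kernel struct. */
struct img_request_model {
	pthread_mutex_t completion_lock;	/* protects next_completion */
	uint32_t next_completion;		/* lowest index not yet completed */
	uint32_t obj_request_count;		/* fixed after init */
	bool done[MAX_OBJS];			/* per-object "done" flags */
};

void img_model_init(struct img_request_model *img, uint32_t count)
{
	assert(count > 0 && count <= MAX_OBJS);
	pthread_mutex_init(&img->completion_lock, NULL);
	img->next_completion = 0;
	img->obj_request_count = count;
	for (uint32_t i = 0; i < MAX_OBJS; i++)
		img->done[i] = false;
}

/*
 * Called once per object when it finishes.  Completes, in order, every
 * consecutive finished object starting at next_completion, and returns
 * how many objects this call completed.
 */
uint32_t img_model_obj_callback(struct img_request_model *img, uint32_t which)
{
	uint32_t completed = 0;

	/* obj_request_count never changes after init, so no lock needed. */
	assert(which < img->obj_request_count);

	pthread_mutex_lock(&img->completion_lock);
	img->done[which] = true;	/* simplification: set under the lock */

	/* next_completion is only read with the lock held. */
	assert(which >= img->next_completion);
	if (which != img->next_completion)
		goto out;		/* out of order; an earlier object will pick us up */

	while (which < img->obj_request_count && img->done[which]) {
		which++;
		completed++;
	}
	img->next_completion = which;
out:
	pthread_mutex_unlock(&img->completion_lock);
	return completed;
}
```

With three objects finishing in the order 1, 0, 2, the first call
completes nothing, the second completes objects 0 and 1 in one pass,
and the third completes object 2.  Because the done flag and
next_completion change under the same lock here, the assertion cannot
trip in this model; that says nothing by itself about the driver,
where the done flag is set before the callback runs.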