From patchwork Wed Mar 26 00:00:22 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Alex Elder
X-Patchwork-Id: 3891511
Message-ID: <53321896.1080606@ieee.org>
Date: Tue, 25 Mar 2014 19:00:22 -0500
From: Alex Elder
To: Olivier Bonvalet
CC: Ilya Dryomov, Ceph Development
Subject: Re: Issue #5876 : assertion failure in rbd_img_obj_callback()
In-Reply-To: <1395788695.2076.35.camel@localhost>
X-Mailing-List: ceph-devel@vger.kernel.org

On 03/25/2014 06:04 PM, Olivier Bonvalet wrote:
> On Tuesday, 25 March 2014 at 17:46 -0500, Alex Elder wrote:
>> On 03/25/2014 05:17 PM, Olivier Bonvalet wrote:
>>>
>>> I now have this one very often (here 5 minutes after the host boot) :
>>
>> I am fairly sure this
>> indicates a use-after-free scenario,
>> likely caused by something getting deleted before every
>> user was done with it.
>>
>> I believe Ilya is done for the night; I'm going to spend some
>> time looking at this to try to determine the cause. If you
>> are willing I'd love to have you try whatever fix I come up
>> with. I'd rather find a fix than just collect more information,
>> but I may need to get more, we'll see.
>>
>> Thank you for all your reports, they help a lot.
>>
>> -Alex
>
> Ok. I can apply some patch to help debug that yes.
> I will try to reproduce on a different host, without customer data.
>
> But I think I will stop here for the night too.
>
> Thanks for your time,
> Olivier

Here's something that will provide a few more pieces of
information. If you're around and able to try it out, it
might confirm that something had likely been destroyed.

I'll keep sending stuff as I come up with it (even though
I realize you may not be around until morning).

					-Alex

---
Index: b/drivers/block/rbd.c
===================================================================
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -2132,6 +2132,35 @@ static void rbd_img_obj_callback(struct
 	spin_lock_irq(&img_request->completion_lock);
 	if (which > img_request->next_completion)
 		goto out;
+	if (which != img_request->next_completion) {
+		printk("%s: bad image object request information:\n", __func__);
+		printk("obj_request %p\n", obj_request);
+		printk(" ->object_name <%s>\n", obj_request->object_name);
+		printk(" ->offset %llu\n", obj_request->offset);
+		printk(" ->length %llu\n", obj_request->length);
+		printk(" ->type 0x%x\n", (u32)obj_request->type);
+		printk(" ->flags 0x%lx\n", obj_request->flags);
+		printk(" ->img_request %p\n", obj_request->img_request);
+		printk(" ->which %u\n", obj_request->which);
+		printk(" ->xferred %llu\n", obj_request->xferred);
+		printk(" ->result %d\n", obj_request->result);
+		printk(" ->kref %d\n", atomic_read(&obj_request->kref.refcount));
+
+		printk("img_request %p\n", img_request);
+		printk(" ->snap 0x%016llx\n", img_request->snap_id);
+		printk(" ->offset %llu\n", img_request->offset);
+		printk(" ->length %llu\n", img_request->length);
+		printk(" ->flags 0x%lx\n", img_request->flags);
+		printk(" ->obj_request_count %u\n",
+				img_request->obj_request_count);
+		printk(" ->next_completion %u\n",
+				img_request->next_completion);
+		printk(" ->xferred %llu\n", img_request->xferred);
+		printk(" ->result %d\n", img_request->result);
+		printk(" ->obj_requests head %p\n",
+				img_request->obj_requests.next);
+		printk(" ->kref %d\n", atomic_read(&img_request->kref.refcount));
+	}
 	rbd_assert(which == img_request->next_completion);
 
 	for_each_obj_request_from(img_request, obj_request) {