From patchwork Sun May 15 13:59:39 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dyweni - Ceph-Devel
X-Patchwork-Id: 786022
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by demeter1.kernel.org (8.14.4/8.14.3) with ESMTP id p4FDxe39015202 for ; Sun, 15 May 2011 13:59:41 GMT
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754380Ab1EON7j (ORCPT ); Sun, 15 May 2011 09:59:39 -0400
Received: from pl1.haspere.com ([208.111.35.220]:40673 "EHLO pl1.haspere.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753617Ab1EON7j (ORCPT ); Sun, 15 May 2011 09:59:39 -0400
Received: from pl1.haspere.com (localhost [127.0.0.1]) by pl1.haspere.com (Postfix) with ESMTP id 2C8962B447; Sun, 15 May 2011 08:59:39 -0500 (CDT)
MIME-Version: 1.0
Date: Sun, 15 May 2011 08:59:39 -0500
From: Dyweni - Ceph-Devel
To:
Cc: Ceph Devel
Subject: Re: Segfault when creating new cluster
Reply-To:
Mail-Reply-To:
In-Reply-To:
References:
Message-ID: <316e3ce8aebc72567c343c8117bbedc0@pl1.haspere.com>
X-Sender: YS3fpFE2ykfB@dyweni.com
User-Agent: Roundcube Webmail/0.5.1
Sender: ceph-devel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: ceph-devel@vger.kernel.org
X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by milter-greylist-4.2.6 (demeter1.kernel.org [140.211.167.41]); Sun, 15 May 2011 13:59:41 +0000 (UTC)

Hi List!

I have tracked down the bad commit to
de640d85fa3e0e5e5a31704eab5a8714a1ffe867.

I have also created a patch that fixes this error on my test cluster.
I am attaching it here for peer-review.

---
Thanks,
Dyweni

On Sat, 14 May 2011 19:17:42 -0500, Dyweni - Ceph-Devel wrote:
> Hi List!
>
> When creating a brand new cluster, I get the following segmentation
> fault:
>
> === osd.2 ===
> pushing conf and monmap to ceph2
> Warning: Permanently added 'ceph2' (ECDSA) to the list of known
> hosts.
> umount: /data/osd2: not mounted
> umount: /dev/sda: not mounted
>
> WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
> WARNING! - see http://btrfs.wiki.kernel.org [1] before using
>
> fs created label (null) on /dev/sda
> nodesize 4096 leafsize 4096 sectorsize 4096 size 74.53GB
> Btrfs Btrfs v0.19
> Scanning for Btrfs filesystems
> ** WARNING: Ceph is still under development. Any feedback can be
> directed **
> ** at ceph-devel@vger.kernel.org [2] or
> http://ceph.newdream.net/ [3]. **
> *** Caught signal (Segmentation fault) **
> in thread 0xb70f2b30
> ceph version 0.27.1-401-g6af0379
> (commit:6af0379e27ac71a7abd8c9ebb0145ae8b9f66cc4)
> 1: (ceph::BackTrace::BackTrace(int)+0x1f) [0x8465fcf]
> 2: /usr/bin/cosd() [0x84d8844]
> 3: [0xb77f1400]
> 4: (pthread_spin_lock()+0x6) [0xb77c38d6]
> 5: (ceph::Spinlock::lock()+0x20) [0x82e42e8]
> 6: (ceph::atomic_t::dec()+0x12) [0x82e4418]
> 7: (RefCountedObject::put()+0x15) [0x82e48d9]
> 8: (MonClient::get_monmap_privately()+0x5f2) [0x84c81ec]
> 9: (main()+0x976) [0x82e0cce]
> 10: (__libc_start_main()+0xd9) [0xb7109ba9]
> 11: /usr/bin/cosd() [0x82e0101]
> /usr/sbin/mkcephfs: line 239: 859 Segmentation fault (core
> dumped) $BINDIR/cosd -c $conf --monmap $dir/monmap -i $id --mkfs
> failed: 'ssh ceph2 /usr/sbin/mkcephfs -d /tmp/mkcephfs.6ySmaVjdFm
> --init-daemon osd.2'
>
> Here is the GDB backtrace:
>
> (gdb) bt
> #0 0xb77c6d6f in raise () from /lib/libpthread.so.0
> #1 0x084d870f in reraise_fatal (signum=11) at common/signal.cc:63
> #2 0x084d88ce in handle_fatal_signal (signum=11) at
> common/signal.cc:110
> #3
> #4 0xb77c38d6 in pthread_spin_lock () from /lib/libpthread.so.0
> #5 0x082e42e8 in ceph::Spinlock::lock (this=0x4) at
> include/Spinlock.h:97
> #6 0x082e4418 in ceph::atomic_t::dec (this=0x4) at
> include/atomic.h:75
> #7 0x082e48d9 in RefCountedObject::put (this=0x0) at
> msg/Message.h:160
> #8 0x084c81ec in MonClient::get_monmap_privately (this=0xbf81baf4) at
> mon/MonClient.cc:230
> #9 0x082e0cce in main (argc=8, argv=0xbf81c1f4) at cosd.cc:130
>
> My kernel is:
> Linux version 2.6.39-rc7-git5-20110514-0905 (root@phenom) (gcc
> version
> 4.4.5 (Gentoo 4.4.5 p1.2, pie-0.4.5) ) #1 SMP Sat May 14 09:07:07 CDT
> 2011
>
> --
> Thanks,
> Dyweni
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> in
> the body of a message to majordomo@vger.kernel.org [4]
> More majordomo info at http://vger.kernel.org/majordomo-info.html [5]

From acf86f21d3c11e8edd82692a4fa27a5b88c538b0 Mon Sep 17 00:00:00 2001
From: root
Date: Sun, 15 May 2011 08:54:13 -0500
Subject: [PATCH] fix segfault introduced by commit de640d85fa3e0e5e5a31704eab5a8714a1ffe867

That commit introduces the line 'cur_con->put()' which has the
possibility of being called while cur_con is not initialized.
---
 src/mon/MonClient.cc |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/mon/MonClient.cc b/src/mon/MonClient.cc
index 70e14e9..9707dfe 100644
--- a/src/mon/MonClient.cc
+++ b/src/mon/MonClient.cc
@@ -227,8 +227,10 @@ int MonClient::get_monmap_privately()
   hunting = true;  // reset this to true!
   cur_mon.clear();
 
-  cur_con->put();
-  cur_con = NULL;
+  if (cur_con) {
+    cur_con->put();
+    cur_con = NULL;
+  }
 
   if (monmap.epoch)
     return 0;
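
For reviewers, here is a small standalone sketch of the pattern the patch applies. In the GDB backtrace, frame #7 shows RefCountedObject::put called with this=0x0, and frames #5 and #6 show this=0x4, which is consistent with the counter member being reached through a null object pointer. The sketch below is not Ceph code: Counted and put_and_clear are hypothetical names made up for illustration, and the counter is a plain int rather than an atomic.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a reference-counted object.
// This is NOT Ceph's RefCountedObject; the counter is not atomic.
struct Counted {
  int nref;         // number of outstanding references
  bool *destroyed;  // optional flag, set when the last reference is dropped

  explicit Counted(bool *flag) : nref(1), destroyed(flag) {}

  // Drop one reference; free the object when none remain.
  // Calling this through a NULL pointer dereferences NULL to reach nref,
  // which is the kind of access that faults in the backtrace.
  void put() {
    assert(nref > 0);
    if (--nref == 0) {
      if (destroyed)
        *destroyed = true;
      delete this;
    }
  }
};

// Hypothetical helper: drop the reference held in `ref`, if any, and
// clear the pointer. Does nothing when `ref` is already NULL, which is
// the same guard the patch adds around cur_con->put().
template <typename T>
void put_and_clear(T *&ref) {
  if (ref) {
    ref->put();
    ref = NULL;
  }
}
```

With a helper like this, calling put_and_clear(con) on a pointer that was never set, or calling it twice on the same pointer, leaves the pointer NULL and never reaches put().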