From patchwork Thu Aug 22 06:40:22 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Alexandre Oliva X-Patchwork-Id: 2848077 Return-Path: X-Original-To: patchwork-ceph-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork1.web.kernel.org (Postfix) with ESMTP id EA0749F271 for ; Thu, 22 Aug 2013 06:46:17 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id EC4E220437 for ; Thu, 22 Aug 2013 06:46:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 5C3BC20318 for ; Thu, 22 Aug 2013 06:46:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753442Ab3HVGqO (ORCPT ); Thu, 22 Aug 2013 02:46:14 -0400 Received: from linux-libre.fsfla.org ([208.118.235.54]:41578 "EHLO linux-libre.fsfla.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753289Ab3HVGqO convert rfc822-to-8bit (ORCPT ); Thu, 22 Aug 2013 02:46:14 -0400 X-Greylist: delayed 306 seconds by postgrey-1.27 at vger.kernel.org; Thu, 22 Aug 2013 02:46:13 EDT Received: from freie (home.lxoliva.fsfla.org [172.31.160.22]) by linux-libre.fsfla.org (8.14.4/8.14.4/Debian-2ubuntu2) with ESMTP id r7M6f4WR027384 for ; Thu, 22 Aug 2013 06:41:05 GMT Received: from livre.home (livre.home [172.31.160.2]) by freie (8.14.7/8.14.6) with ESMTP id r7M6eO3M017544; Thu, 22 Aug 2013 03:40:26 -0300 From: Alexandre Oliva To: ceph-devel@vger.kernel.org Subject: [PATCH] enable mds rejoin with active inodes' old parent xattrs Organization: Free thinker, not speaking for the GNU Project Date: Thu, 22 Aug 2013 03:40:22 -0300 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux) MIME-Version: 1.0 Sender: ceph-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org X-Spam-Status: No, score=-9.7 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When the parent xattrs of active inodes that the mds attempts to open during rejoin lack pool info (struct_v < 5), this field will be filled in with -1, causing the mds to retry fetching a backtrace with a pool number that matches the expected value, which fails and causes the err==-ENOENT branch to be taken and retry pool 1, which succeeds, but with pool -1, and so keeps on bouncing between the two retry cases forever. This patch arranges for the mds to go along with pool -1 instead of insisting that it be refetched, enabling it to complete recovery instead of eating cpu, network bandwidth and metadata osd's resources like there's no tomorrow, in what AFAICT is an infinite and very busy loop. This is not a new problem: I've had it even before upgrading from Cuttlefish to Dumpling, I'd just never managed to track it down, and force-unmounting the filesystem and then restarting the mds was an easier (if inconvenient) work-around, particularly because it always hit when the filesystem was under active, heavy-ish use (or there wouldn't be much reason for caps recovery ;-) There are two issues not addressed in this patch, however. One is that nothing seems to proactively update the parent xattr when it is found to be outdated, so it remains out of date forever. Not even renaming top-level directories causes the xattrs to be recursively rewritten. AFAICT that's a bug. The other is that inodes that don't have a parent xattr (created by even older versions of ceph) are reported as non-existing in the mds rejoin message, because the absence of the parent xattr is signaled as a missing inode (“failed to reconnect caps for missing inodes”). I suppose this may cause more serious recovery problems. I suppose a global pass over the filesystem tree updating parent xattrs that are out-of-date would be desirable, if we find any parent xattrs still lacking current information; it might make sense to activate it as a background thread from the backtrace decoding function, when it finds a parent xattr that's too out-of-date, or as a separate client (ceph-fsck?). Signed-off-by: Alexandre Oliva Reviewed-by: Yan, Zheng --- src/mds/MDCache.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mds/MDCache.cc b/src/mds/MDCache.cc index e592dde..b6c37aec 100644 --- a/src/mds/MDCache.cc +++ b/src/mds/MDCache.cc @@ -7940,7 +7940,7 @@ void MDCache::_open_ino_backtrace_fetched(inodeno_t ino, bufferlist& bl, int err inode_backtrace_t backtrace; if (err == 0) { ::decode(backtrace, bl); - if (backtrace.pool != info.pool) { + if (backtrace.pool != info.pool && backtrace.pool != -1) { dout(10) << " old object in pool " << info.pool << ", retrying pool " << backtrace.pool << dendl; info.pool = backtrace.pool;