Date: Mon, 26 Nov 2012 16:54:32 +0800
From: "Yan, Zheng"
To: Sage Weil
CC: ceph-devel@vger.kernel.org
Subject: Re: [PATCH] mds: sort dentries when committing dir fragment

On 11/26/2012 10:19 AM, Yan, Zheng wrote:
> On 11/26/2012 06:32 AM, Sage Weil wrote:
>> I pushed an alternative approach to wip-tmap.
>>
>> This sorting is an artifact of tmap's crummy implementation, and the mds
>> workaround will need to get reverted when we switch to omap. Instead, fix
>> tmap so that it will tolerate unsorted keys. (Also, drop the ENOENT on rm
>> on missing key.)
>>
>> Eventually we can deprecate and remove tmap entirely...
>>
>> What do you think?
>
> This approach is cleaner than mine. But I think your fix isn't enough,
> because the MDS may pass a tmap that contains misordered items to the
> TMAPPUT method, and misordered items will confuse future TMAPUP
> operations. The fix is either to sort items when handling TMAPPUT, or to
> search forward for any potentially misordered items when TMAP_SET wants
> to add a new item or TMAP_RM fails to find an item.
>

How about the patch attached below?

-----
From e3c4fb68dc6c7592b7f53ab7a98b561167b567df Mon Sep 17 00:00:00 2001
From: "Yan, Zheng"
Date: Mon, 26 Nov 2012 12:28:30 +0800
Subject: [PATCH] osd: check misordered items in TMAP

There is a bug in the MDS that causes items in the TMAPs that store dir
fragments to be misordered. Misordered items confuse TMAP updates, because
do_tmapup() merges the update into the existing items on the assumption
that they are sorted.

Fix this by making do_tmapup() check for misordered items that may affect
the correctness of the TMAP update, and fall back to do_tmapup_slow() if
any are found.
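The misordering described above can be reproduced in a few lines. This sketch rests on two assumptions. First, each head dentry is stored under a key of the form `<name>_head`, which is what the patch's `find_last_of('_')` checks imply. Second, the MDS emitted dentries in dentry-name order when committing a dir fragment. The helper names `dentry_key()`, `keys_in_name_order()` and `strictly_sorted()` are hypothetical and do not come from Ceph.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: the TMAP key for a head dentry, assuming the
// "<name>_head" encoding implied by the patch's find_last_of('_') checks.
std::string dentry_key(const std::string& name) {
  assert(!name.empty());  // dentry names are never empty
  return name + "_head";
}

// Encode keys in dentry-name order, modelling (by assumption) the order in
// which the MDS emitted dentries when committing a dir fragment.
std::vector<std::string> keys_in_name_order(std::vector<std::string> names) {
  std::sort(names.begin(), names.end());
  std::vector<std::string> keys;
  keys.reserve(names.size());
  for (const std::string& name : names)
    keys.push_back(dentry_key(name));
  return keys;
}

// True if every key is strictly greater than the one before it, which is
// the order a merge-style TMAP update relies on.
bool strictly_sorted(const std::vector<std::string>& keys) {
  return std::adjacent_find(keys.begin(), keys.end(),
                            [](const std::string& a, const std::string& b) {
                              return !(a < b);  // a >= b breaks the order
                            }) == keys.end();
}
```

For example, `keys_in_name_order({"a_b", "a"})` yields `"a_head"` followed by `"a_b_head"`, and `strictly_sorted()` returns false for that sequence. The name `"a"` sorts before `"a_b"`. Once `"_head"` is appended, however, index 2 compares `'h'` (0x68) against `'b'` (0x62), so `"a_b_head"` sorts first. Sorting the encoded keys before they are written corresponds to the first option in the reply above, and it restores the order.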
Signed-off-by: Yan, Zheng
---
 src/osd/ReplicatedPG.cc | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/src/osd/ReplicatedPG.cc b/src/osd/ReplicatedPG.cc
index 48cba10..71f5363 100644
--- a/src/osd/ReplicatedPG.cc
+++ b/src/osd/ReplicatedPG.cc
@@ -1803,8 +1803,17 @@ int ReplicatedPG::do_tmapup(OpContext *ctx, bufferlist::iterator& bp, OSDOp& osd
         nkeys--;
       }
       if (!ip.end()) {
+        string last_key = nextkey;
+
         ::decode(nextkey, ip);
         ::decode(nextval, ip);
+
+        if (nextkey <= last_key) {
+          dout(0) << "tmapup warning: key '" << nextkey << "' < previous key '" << last_key
+                  << "', falling back to an inefficient (unsorted) update" << dendl;
+          bp = orig_bp;
+          return do_tmapup_slow(ctx, bp, osd_op, newop.outdata);
+        }
       } else {
         have_next = false;
       }
@@ -1848,6 +1857,35 @@ int ReplicatedPG::do_tmapup(OpContext *ctx, bufferlist::iterator& bp, OSDOp& osd
     ::encode(nextkey, newkeydata);
     ::encode(nextval, newkeydata);
     dout(20) << " keep " << nextkey << " " << nextval.length() << dendl;
+
+    /*
+     * TMAPs for storing dir fragments may contain misordered items.
+     * Only items corresponding to dentries that have the same name
+     * prefix can be out of order.
+     */
+    size_t lastlen = nextkey.find_last_of('_');
+    if (lastlen > 0 && lastlen != string::npos) {
+      string last_key = nextkey;
+      bufferlist::iterator tmp_ip = ip;
+      while (!tmp_ip.end()) {
+        ::decode(nextkey, tmp_ip);
+        ::decode(nextval, tmp_ip);
+
+        if (nextkey <= last_key) {
+          dout(0) << "tmapup warning: key '" << nextkey << "' < previous key '" << last_key
+                  << "', falling back to an inefficient (unsorted) update" << dendl;
+          bp = orig_bp;
+          return do_tmapup_slow(ctx, bp, osd_op, newop.outdata);
+        }
+
+        size_t len = nextkey.find_last_of('_');
+        if (len == 0 || len == string::npos)
+          break;
+        len = min(len, lastlen);
+        if (last_key.compare(0, len, nextkey, 0, len) < 0)
+          break;
+      }
+    }
   }
   if (!ip.end()) {
     bufferlist rest;
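The forward scan added in the second hunk can be modelled outside the OSD. It runs after the last key is kept and before the rest of the TMAP is copied. The sketch below is a simplification, not the OSD code. Bufferlist decoding is replaced by a `std::vector` of already-decoded keys, values are omitted, and `has_misordered_tail()` is a hypothetical name. It applies the same rules as the patch. Starting from the key just kept, it keeps reading forward while the next key shares its name prefix, measured up to the last `'_'`. It reports a misordering as soon as a key is not greater than the kept one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model of the patch's forward scan.
// `kept` is the last key copied to the output; `rest` holds the keys that
// remain in the existing TMAP. Returns true if a misordered key is found,
// i.e. the caller should fall back to the slow (unsorted) update.
bool has_misordered_tail(const std::string& kept,
                         const std::vector<std::string>& rest) {
  std::size_t lastlen = kept.find_last_of('_');
  if (lastlen == 0 || lastlen == std::string::npos)
    return false;  // no "<name>_" prefix to compare against
  for (const std::string& next : rest) {
    if (next <= kept)
      return true;  // out of order: the merge result cannot be trusted
    std::size_t len = next.find_last_of('_');
    if (len == 0 || len == std::string::npos)
      break;
    len = std::min(len, lastlen);
    // Stop once next's name prefix sorts after kept's prefix.
    if (kept.compare(0, len, next, 0, len) < 0)
      break;
  }
  return false;
}
```

Suppose `"a_head"` is the kept key and `"a_x_head"`, `"a_b_head"` remain. The scan steps past `"a_x_head"`, because the compared one-character prefix `"a"` is equal, and then flags `"a_b_head"`. If `"b_head"` remains instead, the scan stops at once because `"b"` sorts after `"a"`. Bounding the scan by the name prefix keeps the common case cheap, since only keys whose names share a prefix with the kept key are examined. This matches the patch comment's claim that only such items can be out of order.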