From patchwork Tue Dec 16 10:51:51 2014
X-Patchwork-Submitter: Ilya Dryomov
X-Patchwork-Id: 5500261
Date: Tue, 16 Dec 2014 13:51:51 +0300
Subject: Re: [PATCH] ceph: introduce global empty snap context
From: Ilya Dryomov
To: Yan, Zheng
Cc: Ceph Development
X-Mailing-List: ceph-devel@vger.kernel.org

On Tue, Dec 16, 2014 at 11:43 AM, Yan, Zheng wrote:
>
>> On Dec 15, 2014, at 21:00, Ilya Dryomov wrote:
>>
>> On Thu, Nov 13, 2014 at 11:23 PM, Ilya Dryomov wrote:
>>> This patch wasn't posted on ceph-devel, posting..
>>
>> Hi Zheng,
>>
>> Ping - I had some comments (below), mostly trivial but a potential
>> cached_context leak is what worries me.
>>
>> Thanks,
>>
>>                 Ilya
>
>
> here is the updated patch, thanks.
>
> ---
> From 6de279d244d56b78da70efe7968b815db0f3bb5f Mon Sep 17 00:00:00 2001
> From: "Yan, Zheng"
> Date: Thu, 6 Nov 2014 15:09:41 +0800
> Subject: [PATCH] ceph: introduce global empty snap context
>
> Current snapshot code does not properly handle moving an inode from one
> empty snap realm to another empty snap realm.
> After changing an inode's
> snap realm, some dirty pages' snap context can be not equal to the
> inode's i_head_snapc. This can trigger BUG() in
> ceph_put_wrbuffer_cap_refs().
>
> The fix is to introduce a global empty snap context shared by all empty
> snap realms. This avoids triggering the BUG() for filesystems with no
> snapshots.
>
> Fixes: http://tracker.ceph.com/issues/9928
> Signed-off-by: Yan, Zheng
> ---
>  fs/ceph/snap.c  | 28 ++++++++++++++++++++++++++--
>  fs/ceph/super.c | 10 ++++++++--
>  fs/ceph/super.h |  2 ++
>  3 files changed, 36 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> index c1cc993..3035b8f 100644
> --- a/fs/ceph/snap.c
> +++ b/fs/ceph/snap.c
> @@ -288,6 +288,9 @@ static int cmpu64_rev(const void *a, const void *b)
>          return 0;
>  }
>
> +
> +static struct ceph_snap_context *empty_snapc;
> +
>  /*
>   * build the snap context for a given realm.
>   */
> @@ -328,6 +331,12 @@ static int build_snap_context(struct ceph_snap_realm *realm)
>                  return 0;
>          }
>
> +        if (num == 0 && realm->seq == empty_snapc->seq) {
> +                ceph_get_snap_context(empty_snapc);
> +                snapc = empty_snapc;
> +                goto done;
> +        }
> +
>          /* alloc new snap context */
>          err = -ENOMEM;
>          if (num > (SIZE_MAX - sizeof(*snapc)) / sizeof(u64))
> @@ -364,7 +373,7 @@ static int build_snap_context(struct ceph_snap_realm *realm)
>          dout("build_snap_context %llx %p: %p seq %lld (%u snaps)\n",
>               realm->ino, realm, snapc, snapc->seq,
>               (unsigned int) snapc->num_snaps);
> -
> +done:
>          ceph_put_snap_context(realm->cached_context);
>          realm->cached_context = snapc;
>          return 0;
> @@ -465,6 +474,9 @@ void ceph_queue_cap_snap(struct ceph_inode_info *ci)
>                     cap_snap. lucky us. */
>                  dout("queue_cap_snap %p already pending\n", inode);
>                  kfree(capsnap);
> +        } else if (ci->i_snap_realm->cached_context == empty_snapc) {
> +                dout("queue_cap_snap %p empty snapc\n", inode);
> +                kfree(capsnap);
>          } else if (dirty & (CEPH_CAP_AUTH_EXCL|CEPH_CAP_XATTR_EXCL|
>                              CEPH_CAP_FILE_EXCL|CEPH_CAP_FILE_WR)) {
>                  struct ceph_snap_context *snapc = ci->i_head_snapc;
> @@ -925,5 +937,17 @@ out:
>          return;
>  }
>
> +int __init ceph_snap_init(void)
> +{
> +        empty_snapc = ceph_create_snap_context(0, GFP_NOFS);
> +        if (!empty_snapc)
> +                return -ENOMEM;
> +        empty_snapc->seq = 1;
> +        empty_snapc->num_snaps = 0;
> +        return 0;
> +}
>
> -
> +void ceph_snap_exit(void)
> +{
> +        ceph_put_snap_context(empty_snapc);
> +}
> diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> index f6e1237..3b5c1e3 100644
> --- a/fs/ceph/super.c
> +++ b/fs/ceph/super.c
> @@ -1028,15 +1028,20 @@ static int __init init_ceph(void)
>
>          ceph_flock_init();
>          ceph_xattr_init();
> +        ret = ceph_snap_init();
> +        if (ret)
> +                goto out_xattr;
>          ret = register_filesystem(&ceph_fs_type);
>          if (ret)
> -                goto out_icache;
> +                goto out_snap;
>
>          pr_info("loaded (mds proto %d)\n", CEPH_MDSC_PROTOCOL);
>
>          return 0;
>
> -out_icache:
> +out_snap:
> +        ceph_snap_exit();
> +out_xattr:
>          ceph_xattr_exit();
>          destroy_caches();
> out:
> @@ -1047,6 +1052,7 @@ static void __exit exit_ceph(void)
>  {
>          dout("exit_ceph\n");
>          unregister_filesystem(&ceph_fs_type);
> +        ceph_snap_exit();
>          ceph_xattr_exit();
>          destroy_caches();
>  }
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index aca2287..9f63f18 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -699,6 +699,8 @@ extern void ceph_queue_cap_snap(struct ceph_inode_info *ci);
>  extern int __ceph_finish_cap_snap(struct ceph_inode_info *ci,
>                                    struct ceph_cap_snap *capsnap);
>  extern void ceph_cleanup_empty_realms(struct ceph_mds_client *mdsc);
> +extern int __init ceph_snap_init(void);
> +extern void ceph_snap_exit(void);
>
>  /*
>   * a cap_snap is "pending" if it is still awaiting an in-progress
> --
> 1.9.3
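To see why sharing one context matters here, a minimal user-space model
may help (simplified names, not the kernel code): dirty pages hold a
reference to the snap context under which they were dirtied, and
writeback completion matches that context against i_head_snapc by
pointer identity. Two value-equal but distinct empty contexts fail that
match; a shared singleton cannot.

#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for struct ceph_snap_context: just a refcount and seq. */
struct snap_context {
        int nref;
        unsigned long long seq;
};

static struct snap_context *snap_context_create(unsigned long long seq)
{
        struct snap_context *sc = calloc(1, sizeof(*sc));

        if (sc) {
                sc->nref = 1;
                sc->seq = seq;
        }
        return sc;
}

static struct snap_context *snap_context_get(struct snap_context *sc)
{
        sc->nref++;
        return sc;
}

static void snap_context_put(struct snap_context *sc)
{
        if (sc && --sc->nref == 0)
                free(sc);
}

int main(void)
{
        /* Before the patch: each empty realm builds its own context.
         * Equal by value, but distinct objects, so an identity check
         * like the one in ceph_put_wrbuffer_cap_refs() fails. */
        struct snap_context *realm1 = snap_context_create(1);
        struct snap_context *realm2 = snap_context_create(1);

        printf("per-realm empty contexts identical? %s\n",
               realm1 == realm2 ? "yes" : "no");        /* no */

        /* After the patch: every empty realm shares one singleton, so
         * the identity check holds even after an inode moves realms. */
        struct snap_context *empty = snap_context_create(1);
        struct snap_context *r1 = snap_context_get(empty);
        struct snap_context *r2 = snap_context_get(empty);

        printf("shared empty context identical? %s\n",
               r1 == r2 ? "yes" : "no");                /* yes */

        snap_context_put(realm1);
        snap_context_put(realm2);
        snap_context_put(r1);
        snap_context_put(r2);
        snap_context_put(empty);
        return 0;
}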
>
>
>
>>
>>
>>> From d57f921fd3c7cbe768f13bc58e547d83274f47a8 Mon Sep 17 00:00:00 2001
>>> From: "Yan, Zheng"
>>> Date: Thu, 6 Nov 2014 15:09:41 +0800
>>> Subject: [PATCH] ceph: introduce global empty snap context
>>>
>>> Current snapshot code does not properly handle moving an inode from one
>>> empty snap realm to another empty snap realm. After changing an inode's
>>> snap realm, some dirty pages' snap context can be not equal to the
>>> inode's i_head_snapc. This can trigger BUG() in
>>> ceph_put_wrbuffer_cap_refs().
>>>
>>> The fix is to introduce a global empty snap context shared by all empty
>>> snap realms. This avoids triggering the BUG() for filesystems with no
>>> snapshots.
>>>
>>> Fixes: http://tracker.ceph.com/issues/9928
>>> Signed-off-by: Yan, Zheng
>>> ---
>>>  fs/ceph/snap.c  | 26 +++++++++++++++++++++++++-
>>>  fs/ceph/super.c |  6 ++++++
>>>  fs/ceph/super.h |  2 ++
>>>  3 files changed, 33 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
>>> index c1cc993225e3..24b454a0e64c 100644
>>> --- a/fs/ceph/snap.c
>>> +++ b/fs/ceph/snap.c
>>> @@ -288,6 +288,9 @@ static int cmpu64_rev(const void *a, const void *b)
>>>          return 0;
>>>  }
>>>
>>> +
>>> +static struct ceph_snap_context *empty_snapc;
>>> +
>>>  /*
>>>   * build the snap context for a given realm.
>>>   */
>>> @@ -328,6 +331,12 @@ static int build_snap_context(struct
>>> ceph_snap_realm *realm)
>>>                  return 0;
>>>          }
>>>
>>> +        if (num == 0 && realm->seq == empty_snapc->seq) {
>>> +                ceph_get_snap_context(empty_snapc);
>>> +                realm->cached_context = empty_snapc;
>>> +                return 0;
>>>
>>> Can there be a cached_context here that we would need to let go of?
>>> IOW, shouldn't this be
>>>
>>> +        if (num == 0 && realm->seq == empty_snapc->seq) {
>>> +                ceph_get_snap_context(empty_snapc);
>>> +                goto out;
>>> +        }
>>>
>>> ...
>>>
>>>               realm->ino, realm, snapc, snapc->seq,
>>>               (unsigned int) snapc->num_snaps);
>>>
>>> +out:
>>>          ceph_put_snap_context(realm->cached_context);
>>>          realm->cached_context = snapc;
>>>          return 0;
>>>
>>> ?
>>>
>>> +        }
>>> +
>>>          /* alloc new snap context */
>>>          err = -ENOMEM;
>>>          if (num > (SIZE_MAX - sizeof(*snapc)) / sizeof(u64))
>>> @@ -465,6 +474,9 @@ void ceph_queue_cap_snap(struct ceph_inode_info *ci)
>>>                     cap_snap. lucky us. */
>>>                  dout("queue_cap_snap %p already pending\n", inode);
>>>                  kfree(capsnap);
>>> +        } else if (ci->i_snap_realm->cached_context == empty_snapc) {
>>> +                dout("queue_cap_snap %p empty snapc\n", inode);
>>> +                kfree(capsnap);
>>>          } else if (dirty & (CEPH_CAP_AUTH_EXCL|CEPH_CAP_XATTR_EXCL|
>>>                              CEPH_CAP_FILE_EXCL|CEPH_CAP_FILE_WR)) {
>>>                  struct ceph_snap_context *snapc = ci->i_head_snapc;
>>> @@ -925,5 +937,17 @@ out:
>>>          return;
>>>  }
>>>
>>> +int ceph_snap_init(void)
>>>
>>> I see you put __init on the declaration.  While that seems to work
>>> I think it's preferred to have it on the definition:
>>>
>>> int __init ceph_snap_init(void)
>>>
>>> +{
>>> +        empty_snapc = ceph_create_snap_context(0, GFP_NOFS);
>>> +        if (!empty_snapc)
>>> +                return -ENOMEM;
>>> +        empty_snapc->seq = 1;
>>> +        empty_snapc->num_snaps = 0;
>>>
>>> Setting num_snaps seems unnecessary, ceph_create_snap_context() does
>>> that for you.
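(The allocator already zeroes the structure and records the requested
snap count itself, which is why the explicit assignment is redundant.
Roughly, a sketch of ceph_create_snap_context(); see net/ceph/snapshot.c
of this era for the authoritative version:)

struct ceph_snap_context *ceph_create_snap_context(u32 snap_count,
                                                   gfp_t gfp_flags)
{
        struct ceph_snap_context *snapc;
        size_t size;

        size = sizeof(struct ceph_snap_context);
        size += snap_count * sizeof(u64);
        /* kzalloc() zero-fills, so snaps[] and every field start at 0 */
        snapc = kzalloc(size, gfp_flags);
        if (!snapc)
                return NULL;

        atomic_set(&snapc->nref, 1);
        /* num_snaps comes from the argument: passing 0 yields 0 */
        snapc->num_snaps = snap_count;

        return snapc;
}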
>>>
>>> +        return 0;
>>> +}
>>>
>>> -
>>> +void ceph_snap_exit(void)
>>> +{
>>> +        ceph_put_snap_context(empty_snapc);
>>> +}
>>> diff --git a/fs/ceph/super.c b/fs/ceph/super.c
>>> index f6e12377335c..2b5481f1b9c7 100644
>>> --- a/fs/ceph/super.c
>>> +++ b/fs/ceph/super.c
>>> @@ -1028,6 +1028,9 @@ static int __init init_ceph(void)
>>>
>>>          ceph_flock_init();
>>>          ceph_xattr_init();
>>> +        ret = ceph_snap_init();
>>> +        if (ret)
>>> +                goto out_xattr;
>>>          ret = register_filesystem(&ceph_fs_type);
>>>          if (ret)
>>>                  goto out_icache;
>>> @@ -1037,6 +1040,8 @@ static int __init init_ceph(void)
>>>          return 0;
>>>
>>> out_icache:
>>>
>>> You could rename the nonsensical out_icache to out_snap while you are
>>> at it.
>>>
>>> +        ceph_snap_exit();
>>> +out_xattr:
>>>          ceph_xattr_exit();
>>>          destroy_caches();
>>> out:
>>> @@ -1047,6 +1052,7 @@ static void __exit exit_ceph(void)
>>>  {
>>>          dout("exit_ceph\n");
>>>          unregister_filesystem(&ceph_fs_type);
>>> +        ceph_snap_exit();
>>>          ceph_xattr_exit();
>>>          destroy_caches();
>>>  }
>>> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
>>> index aca22879b41f..9f63f18bb783 100644
>>> --- a/fs/ceph/super.h
>>> +++ b/fs/ceph/super.h
>>> @@ -699,6 +699,8 @@ extern void ceph_queue_cap_snap(struct ceph_inode_info *ci);
>>>  extern int __ceph_finish_cap_snap(struct ceph_inode_info *ci,
>>>                                    struct ceph_cap_snap *capsnap);
>>>  extern void ceph_cleanup_empty_realms(struct ceph_mds_client *mdsc);
>>> +extern int __init ceph_snap_init(void);
>>>
>>> extern int ceph_snap_init(void); will do.
>>>
>>> +extern void ceph_snap_exit(void);
>>>
>>>  /*
>>>   * a cap_snap is "pending" if it is still awaiting an in-progress
>>> --
>>> 2.1.1

I fixed it up in testing manually.  Here is the diff:

Please make sure I got it right.  With the above,

Reviewed-by: Ilya Dryomov

Thanks,

                Ilya

---
diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index 28571cfa7f40..ce35fbd4ba5d 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -333,8 +333,8 @@ static int build_snap_context(struct ceph_snap_realm *realm)

         if (num == 0 && realm->seq == empty_snapc->seq) {
                 ceph_get_snap_context(empty_snapc);
-                realm->cached_context = empty_snapc;
-                return 0;
+                snapc = empty_snapc;
+                goto done;
         }

         /* alloc new snap context */
@@ -374,6 +374,7 @@ static int build_snap_context(struct ceph_snap_realm *realm)
              realm->ino, realm, snapc, snapc->seq,
              (unsigned int) snapc->num_snaps);

+done:
         ceph_put_snap_context(realm->cached_context);
         realm->cached_context = snapc;
         return 0;
@@ -939,13 +940,12 @@ out:
         return;
 }

-int ceph_snap_init(void)
+int __init ceph_snap_init(void)
 {
         empty_snapc = ceph_create_snap_context(0, GFP_NOFS);
         if (!empty_snapc)
                 return -ENOMEM;
         empty_snapc->seq = 1;
-        empty_snapc->num_snaps = 0;
         return 0;
 }

diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index 8eef47c8909a..50f06cddc94b 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -1031,13 +1031,13 @@ static int __init init_ceph(void)
                 goto out_xattr;
         ret = register_filesystem(&ceph_fs_type);
         if (ret)
-                goto out_icache;
+                goto out_snap;

         pr_info("loaded (mds proto %d)\n", CEPH_MDSC_PROTOCOL);

         return 0;

-out_icache:
+out_snap:
         ceph_snap_exit();
 out_xattr:
         ceph_xattr_exit();
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 33885ad03203..e1aa32d0759d 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -701,7 +701,7 @@ extern void ceph_queue_cap_snap(struct ceph_inode_info *ci);
 extern int __ceph_finish_cap_snap(struct ceph_inode_info *ci,
                                   struct ceph_cap_snap *capsnap);
 extern void ceph_cleanup_empty_realms(struct ceph_mds_client *mdsc);
-extern int __init ceph_snap_init(void);
+extern int ceph_snap_init(void);
 extern void ceph_snap_exit(void);

 /*
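(On the __init placement point above: the annotation is what routes a
function into the kernel's .init.text section, which is discarded once
initialization completes, so the convention is to annotate the
definition and keep the header declaration plain. An illustrative
sketch, not taken from the patch itself:)

/* header (e.g. fs/ceph/super.h): plain declaration, no annotation */
extern int ceph_snap_init(void);

/* definition (e.g. fs/ceph/snap.c): __init places the code in
 * .init.text, which is freed after boot/module initialization */
int __init ceph_snap_init(void)
{
        /* one-time setup performed during init */
        return 0;
}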