From patchwork Wed Aug 21 08:27:34 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sha Zhengju
X-Patchwork-Id: 2847595
From: Sha Zhengju
To: linux-fsdevel@vger.kernel.org, ceph-devel@vger.kernel.org
Cc: sage@inktank.com, ukernel@gmail.com, mhocko@suse.cz, akpm@linux-foundation.org, Sha Zhengju
Subject: [PATCH V6] ceph: use vfs __set_page_dirty_nobuffers interface instead of doing it inside filesystem
Date: Wed, 21 Aug 2013 16:27:34 +0800
Message-Id: <1377073654-6232-1-git-send-email-handai.szj@taobao.com>
X-Mailer: git-send-email 1.7.9.5
Sender: ceph-devel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: ceph-devel@vger.kernel.org

Upcoming patches will add memcg dirty page accounting around
__set_page_dirty_{buffers,nobuffers} in the vfs layer, so ceph should use
the vfs interface instead of open-coding the same steps. This avoids
exporting those accounting details to filesystems.

Since the vfs set_page_dirty() is expected to be called under the page
lock, the elaborate code that handled the race with truncate is no longer
needed. Two WARN_ON()s are added to detect violations of that assumption.

Thanks very much to Sage and Yan Zheng for their coaching!

I tested this in a two-server ceph environment, with one server acting as
the client and the other as mds/osd/mon, and ran the following fsx tests
from xfstests:

  ./fsx 1MB -N 50000 -p 10000 -l 1048576
  ./fsx 10MB -N 50000 -p 10000 -l 10485760
  ./fsx 100MB -N 50000 -p 10000 -l 104857600

fsx performs many mmap-read/mmap-write/truncate operations, and the tests
completed successfully without triggering any WARN_ON.
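
As a reading aid, the control flow that ceph_set_page_dirty() ends up with
can be modelled in userspace roughly as below. This is only a sketch under
stated assumptions, not kernel code: struct model_page and both model_*
functions are hypothetical stand-ins, assert() stands in for BUG_ON() and
WARN_ON(), and the generic helper is assumed to return 1 only when the page
was newly dirtied. The early !mapping path and the snap context and cap
reference handling are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page; not the kernel type. */
struct model_page {
	bool locked;		/* models PageLocked() */
	bool dirty;		/* models PageDirty() */
	bool has_private;	/* models PagePrivate() */
	void *private_data;	/* models page->private (the snap context) */
	void *mapping;		/* models page->mapping */
};

/*
 * Models the generic helper. Assumption: it returns 1 if the page was
 * newly dirtied and 0 if it was already dirty.
 */
static int model_set_page_dirty_nobuffers(struct model_page *page)
{
	if (page->dirty)
		return 0;
	page->dirty = true;
	return 1;
}

/* Models the simplified flow of ceph_set_page_dirty() after the patch. */
static int model_ceph_set_page_dirty(struct model_page *page, void *snapc)
{
	int ret;

	if (page->dirty) {
		/* Already dirty: a snap context must already be attached. */
		assert(page->has_private);	/* stands in for BUG_ON() */
		return 0;
	}

	/* Attach the snap context before calling the generic helper. */
	assert(!page->has_private);		/* stands in for BUG_ON() */
	page->private_data = snapc;
	page->has_private = true;

	ret = model_set_page_dirty_nobuffers(page);

	/* The two new checks: the caller holds the page lock, no truncate race. */
	assert(page->locked);			/* stands in for WARN_ON() */
	assert(page->mapping != NULL);		/* stands in for WARN_ON() */

	return ret;
}
```

In this model, calling model_ceph_set_page_dirty() twice on a locked page
with a non-NULL mapping returns 1 the first time and 0 the second, and the
snap context pointer stays attached after both calls.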

Signed-off-by: Sha Zhengju
Reviewed-by: Sage Weil
---
 fs/ceph/addr.c | 42 ++++++++++++++----------------------------
 1 file changed, 14 insertions(+), 28 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index afb2fc2..01891f4 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -72,13 +72,15 @@ static int ceph_set_page_dirty(struct page *page)
 	struct ceph_inode_info *ci;
 	int undo = 0;
 	struct ceph_snap_context *snapc;
+	int ret;
 
 	if (unlikely(!mapping))
 		return !TestSetPageDirty(page);
 
-	if (TestSetPageDirty(page)) {
+	if (PageDirty(page)) {
 		dout("%p set_page_dirty %p idx %lu -- already dirty\n",
 		     mapping->host, page, page->index);
+		BUG_ON(!PagePrivate(page));
 		return 0;
 	}
 
@@ -107,35 +109,19 @@ static int ceph_set_page_dirty(struct page *page)
 	     snapc, snapc->seq, snapc->num_snaps);
 	spin_unlock(&ci->i_ceph_lock);
 
-	/* now adjust page */
-	spin_lock_irq(&mapping->tree_lock);
-	if (page->mapping) {	/* Race with truncate? */
-		WARN_ON_ONCE(!PageUptodate(page));
-		account_page_dirtied(page, page->mapping);
-		radix_tree_tag_set(&mapping->page_tree,
-				page_index(page), PAGECACHE_TAG_DIRTY);
-
-		/*
-		 * Reference snap context in page->private. Also set
-		 * PagePrivate so that we get invalidatepage callback.
-		 */
-		page->private = (unsigned long)snapc;
-		SetPagePrivate(page);
-	} else {
-		dout("ANON set_page_dirty %p (raced truncate?)\n", page);
-		undo = 1;
-	}
-
-	spin_unlock_irq(&mapping->tree_lock);
-
-	if (undo)
-		/* whoops, we failed to dirty the page */
-		ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
+	/*
+	 * Reference snap context in page->private. Also set
+	 * PagePrivate so that we get invalidatepage callback.
+	 */
+	BUG_ON(PagePrivate(page));
+	page->private = (unsigned long)snapc;
+	SetPagePrivate(page);
 
-	__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
+	ret = __set_page_dirty_nobuffers(page);
+	WARN_ON(!PageLocked(page));
+	WARN_ON(!page->mapping);
 
-	BUG_ON(!PageDirty(page));
-	return 1;
+	return ret;
 }
 
 /*