From patchwork Mon Apr 6 19:58:34 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 6164421
From: Tejun Heo
To: axboe@kernel.dk
Cc: linux-kernel@vger.kernel.org, jack@suse.cz, hch@infradead.org,
 hannes@cmpxchg.org, linux-fsdevel@vger.kernel.org, vgoyal@redhat.com,
 lizefan@huawei.com, cgroups@vger.kernel.org, linux-mm@kvack.org,
 mhocko@suse.cz, clm@fb.com, fengguang.wu@intel.com, david@fromorbit.com,
 gthelen@google.com, Tejun Heo
Subject: [PATCH 45/49] writeback: make writeback initiation functions handle
 multiple bdi_writeback's
Date: Mon, 6 Apr 2015 15:58:34 -0400
Message-Id: <1428350318-8215-46-git-send-email-tj@kernel.org>
X-Mailer: git-send-email 2.1.0
In-Reply-To: <1428350318-8215-1-git-send-email-tj@kernel.org>
References: <1428350318-8215-1-git-send-email-tj@kernel.org>
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

[try_]writeback_inodes_sb[_nr]() and sync_inodes_sb() currently only
handle dirty inodes on the root wb (bdi_writeback) of the target bdi.
This patch implements bdi_split_work_to_wbs() and uses it to make these
functions handle multiple wb's.

bdi_split_work_to_wbs() takes a base wb_writeback_work, creates clones
of it, and issues them to the wb's of the target bdi.  The base work's
nr_pages is distributed using wb_split_bdi_pages(), i.e. according to
each wb's share of the bdi's write bandwidth.

Cloning a work item involves memory allocation which may fail.
In such cases, bdi_split_work_to_wbs() issues the base work directly and
waits for its completion before proceeding to the next wb to guarantee
forward progress and correctness under memory pressure.

Signed-off-by: Tejun Heo
Cc: Jens Axboe
Cc: Jan Kara
---
 fs/fs-writeback.c | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 91 insertions(+), 5 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index f138680..9f42c14 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -279,6 +279,80 @@ static long wb_split_bdi_pages(struct bdi_writeback *wb, long nr_pages)
 	return DIV_ROUND_UP_ULL((u64)nr_pages * this_bw, tot_bw);
 }
 
+/**
+ * wb_clone_and_queue_work - clone a wb_writeback_work and issue it to a wb
+ * @wb: target bdi_writeback
+ * @base_work: source wb_writeback_work
+ *
+ * Try to make a clone of @base_work and issue it to @wb.  If cloning
+ * succeeds, %true is returned; otherwise, @base_work is issued directly
+ * and %false is returned.  In the latter case, the caller is required to
+ * wait for @base_work's completion using wb_wait_for_single_work().
+ *
+ * A clone is auto-freed on completion.  @base_work never is.
+ */
+static bool wb_clone_and_queue_work(struct bdi_writeback *wb,
+				    struct wb_writeback_work *base_work)
+{
+	struct wb_writeback_work *work;
+
+	work = kmalloc(sizeof(*work), GFP_ATOMIC);
+	if (work) {
+		*work = *base_work;
+		work->auto_free = 1;
+		work->single_wait = 0;
+	} else {
+		work = base_work;
+		work->auto_free = 0;
+		work->single_wait = 1;
+	}
+	work->single_done = 0;
+	wb_queue_work(wb, work);
+	return work != base_work;
+}
+
+/**
+ * bdi_split_work_to_wbs - split a wb_writeback_work to all wb's of a bdi
+ * @bdi: target backing_dev_info
+ * @base_work: wb_writeback_work to issue
+ * @skip_if_busy: skip wb's which already have writeback in progress
+ *
+ * Split and issue @base_work to all wb's (bdi_writeback's) of @bdi which
+ * have dirty inodes.  If @base_work->nr_pages isn't %LONG_MAX, it's
+ * distributed to the busy wbs according to each wb's proportion in the
+ * total active write bandwidth of @bdi.
+ */
+static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
+				  struct wb_writeback_work *base_work,
+				  bool skip_if_busy)
+{
+	long nr_pages = base_work->nr_pages;
+	int next_blkcg_id = 0;
+	struct bdi_writeback *wb;
+	struct wb_iter iter;
+
+	might_sleep();
+
+	if (!bdi_has_dirty_io(bdi))
+		return;
+restart:
+	rcu_read_lock();
+	bdi_for_each_wb(wb, bdi, &iter, next_blkcg_id) {
+		if (!wb_has_dirty_io(wb) ||
+		    (skip_if_busy && writeback_in_progress(wb)))
+			continue;
+
+		base_work->nr_pages = wb_split_bdi_pages(wb, nr_pages);
+		if (!wb_clone_and_queue_work(wb, base_work)) {
+			next_blkcg_id = wb->blkcg_css->id + 1;
+			rcu_read_unlock();
+			wb_wait_for_single_work(bdi, base_work);
+			goto restart;
+		}
+	}
+	rcu_read_unlock();
+}
+
 #else	/* CONFIG_CGROUP_WRITEBACK */
 
 static long wb_split_bdi_pages(struct bdi_writeback *wb, long nr_pages)
@@ -286,6 +360,21 @@ static long wb_split_bdi_pages(struct bdi_writeback *wb, long nr_pages)
 	return nr_pages;
 }
 
+static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
+				  struct wb_writeback_work *base_work,
+				  bool skip_if_busy)
+{
+	might_sleep();
+
+	if (bdi_has_dirty_io(bdi) &&
+	    (!skip_if_busy || !writeback_in_progress(&bdi->wb))) {
+		base_work->auto_free = 0;
+		base_work->single_wait = 0;
+		base_work->single_done = 0;
+		wb_queue_work(&bdi->wb, base_work);
+	}
+}
+
 #endif	/* CONFIG_CGROUP_WRITEBACK */
 
 void wb_start_writeback(struct bdi_writeback *wb, long nr_pages,
@@ -1518,10 +1607,7 @@ static void __writeback_inodes_sb_nr(struct super_block *sb, unsigned long nr,
 		return;
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));
 
-	if (skip_if_busy && writeback_in_progress(&bdi->wb))
-		return;
-
-	wb_queue_work(&bdi->wb, &work);
+	bdi_split_work_to_wbs(sb->s_bdi, &work, skip_if_busy);
 	wb_wait_for_completion(bdi, &done);
 }
 
@@ -1619,7 +1705,7 @@ void sync_inodes_sb(struct super_block *sb)
 		return;
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));
 
-	wb_queue_work(&bdi->wb, &work);
+	bdi_split_work_to_wbs(bdi, &work, false);
 	wb_wait_for_completion(bdi, &done);
 
 	wait_sb_inodes(sb);
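
For readers who want to see the arithmetic behind the proportional split, here is a minimal user-space sketch. It is not the kernel code: split_pages() is a hypothetical stand-in for wb_split_bdi_pages(), the bandwidth values are passed in as plain integers, and the two early returns (for LONG_MAX and for a zero or not-larger total bandwidth) are assumptions about the parts of that function this patch does not show. Only the final rounded-up division is taken from the context line of the diff.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical user-space stand-in for wb_split_bdi_pages().
 * this_bw: write bandwidth of one wb; tot_bw: total for the bdi.
 */
long split_pages(long nr_pages, unsigned long this_bw, unsigned long tot_bw)
{
	/* Assumption: LONG_MAX means "write everything", so it is not split. */
	if (nr_pages == LONG_MAX)
		return LONG_MAX;

	/* Assumption: without a meaningful ratio, fall back to the full count. */
	if (tot_bw == 0 || this_bw >= tot_bw)
		return nr_pages;

	/* Same rounding as DIV_ROUND_UP_ULL((u64)nr_pages * this_bw, tot_bw). */
	return (long)(((uint64_t)nr_pages * this_bw + tot_bw - 1) / tot_bw);
}
```

With a budget of 1000 pages, a wb holding 30 of 100 units of bandwidth gets 300 pages, and one holding 1 of 3 gets 334. Because every share is rounded up, the per-wb shares can add up to slightly more than the original nr_pages.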