From patchwork Mon Mar 23 04:54:39 2015
X-Patchwork-Submitter: Tejun Heo
X-Patchwork-Id: 6069671
From: Tejun Heo
To: axboe@kernel.dk
Cc: linux-kernel@vger.kernel.org, jack@suse.cz, hch@infradead.org,
 hannes@cmpxchg.org, linux-fsdevel@vger.kernel.org, vgoyal@redhat.com,
 lizefan@huawei.com, cgroups@vger.kernel.org, linux-mm@kvack.org,
 mhocko@suse.cz, clm@fb.com, fengguang.wu@intel.com, david@fromorbit.com,
 gthelen@google.com, Tejun Heo
Subject: [PATCH 28/48] writeback: implement and use mapping_congested()
Date: Mon, 23 Mar 2015 00:54:39 -0400
Message-Id: <1427086499-15657-29-git-send-email-tj@kernel.org>
In-Reply-To: <1427086499-15657-1-git-send-email-tj@kernel.org>
References: <1427086499-15657-1-git-send-email-tj@kernel.org>

In several places, bdi_congested() and its wrappers are used to
determine whether more IOs should be issued.  With cgroup writeback
support, this question can't be answered solely based on the bdi
(backing_dev_info).  It's dependent on whether the filesystem and bdi
support cgroup writeback and the blkcg the asking task belongs to.

This patch implements mapping_congested() and its wrappers which take
@mapping and @task and determine the congestion state considering
cgroup writeback for the combination.  The new functions replace
bdi_*congested() calls in places where the query is about a specific
mapping and task.

There are several filesystem users which also fit these criteria but
they should be updated when each filesystem implements cgroup
writeback support.

Signed-off-by: Tejun Heo
Cc: Jens Axboe
Cc: Jan Kara
Cc: Vivek Goyal
---
 fs/fs-writeback.c           | 39 +++++++++++++++++++++++++++++++++++++++
 include/linux/backing-dev.h | 27 +++++++++++++++++++++++++++
 mm/fadvise.c                |  2 +-
 mm/readahead.c              |  2 +-
 mm/vmscan.c                 | 12 ++++++------
 5 files changed, 74 insertions(+), 8 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 48db5e6..015f359 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -130,6 +130,45 @@ static void __wb_start_writeback(struct bdi_writeback *wb, long nr_pages,
 	wb_queue_work(wb, work);
 }
 
+#ifdef CONFIG_CGROUP_WRITEBACK
+
+/**
+ * mapping_congested - test whether a mapping is congested for a task
+ * @mapping: address space to test for congestion
+ * @task: task to test congestion for
+ * @cong_bits: mask of WB_[a]sync_congested bits to test
+ *
+ * Tests whether @mapping is congested for @task.  @cong_bits is the mask
+ * of congestion bits to test and the return value is the mask of set bits.
+ *
+ * If cgroup writeback is enabled for @mapping, its congestion state for
+ * @task is determined by whether the cgwb (cgroup bdi_writeback) for the
+ * blkcg of %current on @mapping->backing_dev_info is congested; otherwise,
+ * the root's congestion state is used.
+ */
+int mapping_congested(struct address_space *mapping,
+		      struct task_struct *task, int cong_bits)
+{
+	struct inode *inode = mapping->host;
+	struct backing_dev_info *bdi = inode_to_bdi(inode);
+	struct bdi_writeback *wb;
+	int ret = 0;
+
+	if (!inode || !inode_cgwb_enabled(inode))
+		return wb_congested(&bdi->wb, cong_bits);
+
+	rcu_read_lock();
+	wb = wb_find_current(bdi);
+	if (wb)
+		ret = wb_congested(wb, cong_bits);
+	rcu_read_unlock();
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(mapping_congested);
+
+#endif	/* CONFIG_CGROUP_WRITEBACK */
+
 /**
  * bdi_start_writeback - start writeback
  * @bdi: the backing device to write from
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 2c498a2..cfa23ab 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -230,6 +230,8 @@ struct bdi_writeback *wb_get_create(struct backing_dev_info *bdi,
 void __inode_attach_wb(struct inode *inode, struct page *page);
 void wb_memcg_offline(struct mem_cgroup *memcg);
 void wb_blkcg_offline(struct blkcg *blkcg);
+int mapping_congested(struct address_space *mapping, struct task_struct *task,
+		      int cong_bits);
 
 /**
  * inode_cgwb_enabled - test whether cgroup writeback is enabled on an inode
@@ -438,8 +440,33 @@ static inline void wb_blkcg_offline(struct blkcg *blkcg)
 {
 }
 
+static inline int mapping_congested(struct address_space *mapping,
+				    struct task_struct *task, int cong_bits)
+{
+	return wb_congested(&inode_to_bdi(mapping->host)->wb, cong_bits);
+}
+
 #endif	/* CONFIG_CGROUP_WRITEBACK */
 
+static inline int mapping_read_congested(struct address_space *mapping,
+					 struct task_struct *task)
+{
+	return mapping_congested(mapping, task, 1 << WB_sync_congested);
+}
+
+static inline int mapping_write_congested(struct address_space *mapping,
+					  struct task_struct *task)
+{
+	return mapping_congested(mapping, task, 1 << WB_async_congested);
+}
+
+static inline int mapping_rw_congested(struct address_space *mapping,
+				       struct task_struct *task)
+{
+	return mapping_congested(mapping, task, (1 << WB_sync_congested) |
+				 (1 << WB_async_congested));
+}
+
 static inline int bdi_congested(struct backing_dev_info *bdi, int cong_bits)
 {
 	return wb_congested(&bdi->wb, cong_bits);
diff --git a/mm/fadvise.c b/mm/fadvise.c
index 4a3907c..174727c 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -115,7 +115,7 @@ SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
 	case POSIX_FADV_NOREUSE:
 		break;
 	case POSIX_FADV_DONTNEED:
-		if (!bdi_write_congested(bdi))
+		if (!mapping_write_congested(mapping, current))
 			__filemap_fdatawrite_range(mapping, offset, endbyte,
 						   WB_SYNC_NONE);
 
diff --git a/mm/readahead.c b/mm/readahead.c
index 9356758..420a16a 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -541,7 +541,7 @@ page_cache_async_readahead(struct address_space *mapping,
 	/*
 	 * Defer asynchronous read-ahead on IO congestion.
 	 */
-	if (bdi_read_congested(inode_to_bdi(mapping->host)))
+	if (mapping_read_congested(mapping, current))
 		return;
 
 	/* do read-ahead */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7582f9f..9f8d3c0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -452,14 +452,14 @@ static inline int is_page_cache_freeable(struct page *page)
 	return page_count(page) - page_has_private(page) == 2;
 }
 
-static int may_write_to_queue(struct backing_dev_info *bdi,
-			      struct scan_control *sc)
+static int may_write_to_mapping(struct address_space *mapping,
+				struct scan_control *sc)
 {
 	if (current->flags & PF_SWAPWRITE)
 		return 1;
-	if (!bdi_write_congested(bdi))
+	if (!mapping_write_congested(mapping, current))
 		return 1;
-	if (bdi == current->backing_dev_info)
+	if (inode_to_bdi(mapping->host) == current->backing_dev_info)
 		return 1;
 	return 0;
 }
@@ -538,7 +538,7 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
 	}
 	if (mapping->a_ops->writepage == NULL)
 		return PAGE_ACTIVATE;
-	if (!may_write_to_queue(inode_to_bdi(mapping->host), sc))
+	if (!may_write_to_mapping(mapping, sc))
 		return PAGE_KEEP;
 
 	if (clear_page_dirty_for_io(page)) {
@@ -924,7 +924,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 */
 		mapping = page_mapping(page);
 		if (((dirty || writeback) && mapping &&
-		     bdi_write_congested(inode_to_bdi(mapping->host))) ||
+		     mapping_write_congested(mapping, current)) ||
 		    (writeback && PageReclaim(page)))
 			nr_congested++;
 
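
For readers who want to follow the selection logic outside the kernel
tree, here is a minimal userspace sketch of what mapping_congested()
and its read/write wrappers appear to do, based only on the patch
above.  It is a model, not kernel code: every model_* name, the
MODEL_WB_* bit numbers and the struct layouts are hypothetical
stand-ins.  The cgwb_enabled field stands in for inode_cgwb_enabled()
and the current_cgwb pointer stands in for the result of
wb_find_current(), which may be NULL.

```c
#include <stddef.h>

/* Hypothetical bit numbers; the kernel's real WB_* values may differ. */
enum {
	MODEL_WB_async_congested,	/* models the write (async) queue bit */
	MODEL_WB_sync_congested,	/* models the read (sync) queue bit */
};

/* Hypothetical stand-in for struct bdi_writeback: only the state word. */
struct model_wb {
	unsigned long state;
};

/* Hypothetical stand-in for a mapping and the bdi behind it. */
struct model_mapping {
	int cgwb_enabled;		/* models inode_cgwb_enabled() */
	struct model_wb root_wb;	/* models bdi->wb, the root wb */
	struct model_wb *current_cgwb;	/* models wb_find_current(); may be NULL */
};

/* Return the subset of cong_bits that is set on this wb. */
static inline int model_wb_congested(const struct model_wb *wb, int cong_bits)
{
	return (int)(wb->state & (unsigned long)cong_bits);
}

/*
 * Without cgroup writeback, ask the root wb.  With it, ask the wb of
 * the caller's cgroup, and report "not congested" if none exists.
 */
static inline int model_mapping_congested(const struct model_mapping *m,
					  int cong_bits)
{
	if (!m->cgwb_enabled)
		return model_wb_congested(&m->root_wb, cong_bits);
	if (m->current_cgwb)
		return model_wb_congested(m->current_cgwb, cong_bits);
	return 0;
}

static inline int model_read_congested(const struct model_mapping *m)
{
	return model_mapping_congested(m, 1 << MODEL_WB_sync_congested);
}

static inline int model_write_congested(const struct model_mapping *m)
{
	return model_mapping_congested(m, 1 << MODEL_WB_async_congested);
}
```

In this model, a mapping whose root wb has only the sync bit set and
whose per-cgroup wb has only the async bit set reports write
congestion while cgwb_enabled is nonzero and read congestion once it
is cleared, which is the per-cgroup distinction the commit message
describes.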