From patchwork Mon Apr 6 20:04:31 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tejun Heo X-Patchwork-Id: 6164671 Return-Path: X-Original-To: patchwork-linux-fsdevel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 5C7BA9F2EC for ; Mon, 6 Apr 2015 20:07:03 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 246DE20173 for ; Mon, 6 Apr 2015 20:07:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id CF87A200E0 for ; Mon, 6 Apr 2015 20:07:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754141AbbDFUGo (ORCPT ); Mon, 6 Apr 2015 16:06:44 -0400 Received: from mail-qk0-f174.google.com ([209.85.220.174]:35246 "EHLO mail-qk0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753686AbbDFUFH (ORCPT ); Mon, 6 Apr 2015 16:05:07 -0400 Received: by qkhg7 with SMTP id g7so31112345qkh.2; Mon, 06 Apr 2015 13:05:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references; bh=8+Jw2bjaCBi5/N4kYv/y4X1/dnms6UAUtkHd0jF8xQk=; b=EGQed/kfk1J12vQDuC5iRk3gTaz7Urs6QU1E+DT/X35bDBWJ27WaLARA90hl2hdJfp QSaKxo9pHKfQDNSbS7I/SEJCZY572lBUhxJPL3tB+ieVuuK7evbW2SMzOC2sOsVlrO0V YstNDWK27Jsw9PDXJfFLIEyWcJlJBvT1sncfeJ3gbInPsaTcjqDltg51Wq9PN7lvMY/U LHy1WwismX4wuwQA8S2joTd8HvMg3QN1xDtsFCzoxmIw+2rwmpuVmESUCtGChm3Cl6l0 Pw58T2BNYApWApYzHGn4avwbnQdpN0NwRCV7FZiVtulhFLdgLSd3D80VbrHaRQdQMAnp sTZQ== X-Received: by 10.55.26.29 with SMTP id a29mr31968200qka.0.1428350706101; Mon, 06 Apr 2015 13:05:06 -0700 (PDT) Received: from htj.duckdns.org.lan (207-38-238-8.c3-0.wsd-ubr1.qens-wsd.ny.cable.rcn.com. [207.38.238.8]) by mx.google.com with ESMTPSA id o3sm3909138qga.36.2015.04.06.13.05.04 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 06 Apr 2015 13:05:05 -0700 (PDT) From: Tejun Heo To: axboe@kernel.dk Cc: linux-kernel@vger.kernel.org, jack@suse.cz, hch@infradead.org, hannes@cmpxchg.org, linux-fsdevel@vger.kernel.org, vgoyal@redhat.com, lizefan@huawei.com, cgroups@vger.kernel.org, linux-mm@kvack.org, mhocko@suse.cz, clm@fb.com, fengguang.wu@intel.com, david@fromorbit.com, gthelen@google.com, Tejun Heo Subject: [PATCH 16/19] writeback: implement memcg wb_domain Date: Mon, 6 Apr 2015 16:04:31 -0400 Message-Id: <1428350674-8303-17-git-send-email-tj@kernel.org> X-Mailer: git-send-email 2.1.0 In-Reply-To: <1428350674-8303-1-git-send-email-tj@kernel.org> References: <1428350674-8303-1-git-send-email-tj@kernel.org> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Spam-Status: No, score=-6.8 required=5.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,T_DKIM_INVALID,T_RP_MATCHES_RCVD,UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Dirtyable memory is distributed to a wb (bdi_writeback) according to the relative bandwidth the wb is writing out in the whole system. This distribution is global - each wb is measured against all other wb's and gets the proportinately sized portion of the memory in the whole system. For cgroup writeback, the amount of dirtyable memory is scoped by memcg and thus each wb would need to be measured and controlled in its memcg. IOW, a wb will belong to two writeback domains - the global and memcg domains. The previous patches laid the groundwork to support the two wb_domains and this patch implements memcg wb_domain. memcg->cgwb_domain is initialized on css online and destroyed on css release, wb->memcg_completions is added, and __wb_writeout_inc() is updated to increment completions against both global and memcg wb_domains. The following patches will update balance_dirty_pages() and its subroutines to actually consider memcg wb_domain for throttling. Signed-off-by: Tejun Heo Cc: Jens Axboe Cc: Jan Kara Cc: Wu Fengguang Cc: Greg Thelen --- include/linux/backing-dev-defs.h | 1 + include/linux/memcontrol.h | 12 +++++++++++- include/linux/writeback.h | 3 +++ mm/backing-dev.c | 9 ++++++++- mm/memcontrol.c | 39 +++++++++++++++++++++++++++++++++++++++ mm/page-writeback.c | 25 +++++++++++++++++++++++++ 6 files changed, 87 insertions(+), 2 deletions(-) diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h index 97a92fa..8d470b7 100644 --- a/include/linux/backing-dev-defs.h +++ b/include/linux/backing-dev-defs.h @@ -118,6 +118,7 @@ struct bdi_writeback { #ifdef CONFIG_CGROUP_WRITEBACK struct percpu_ref refcnt; /* used only for !root wb's */ + struct fprop_local_percpu memcg_completions; struct cgroup_subsys_state *memcg_css; /* the associated memcg */ struct cgroup_subsys_state *blkcg_css; /* and blkcg */ struct list_head memcg_node; /* anchored at memcg->cgwb_list */ diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 662a953..e3177be 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -389,8 +389,18 @@ enum { }; #ifdef CONFIG_CGROUP_WRITEBACK + struct list_head *mem_cgroup_cgwb_list(struct mem_cgroup *memcg); -#endif +struct wb_domain *mem_cgroup_wb_domain(struct bdi_writeback *wb); + +#else /* CONFIG_CGROUP_WRITEBACK */ + +static inline struct wb_domain *mem_cgroup_wb_domain(struct bdi_writeback *wb) +{ + return NULL; +} + +#endif /* CONFIG_CGROUP_WRITEBACK */ struct sock; #if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM) diff --git a/include/linux/writeback.h b/include/linux/writeback.h index eb52f23..a9eb40f 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -167,6 +167,9 @@ static inline void laptop_sync_completion(void) { } void throttle_vm_writeout(gfp_t gfp_mask); bool zone_dirty_ok(struct zone *zone); int wb_domain_init(struct wb_domain *dom, gfp_t gfp); +#ifdef CONFIG_CGROUP_WRITEBACK +void wb_domain_exit(struct wb_domain *dom); +#endif extern struct wb_domain global_wb_domain; diff --git a/mm/backing-dev.c b/mm/backing-dev.c index 9c8b7b5..84ebf7c 100644 --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -482,6 +482,7 @@ static void cgwb_release_workfn(struct work_struct *work) css_put(wb->blkcg_css); wb_congested_put(wb->congested); + fprop_local_destroy_percpu(&wb->memcg_completions); percpu_ref_exit(&wb->refcnt); wb_exit(wb); kfree_rcu(wb, rcu); @@ -548,9 +549,13 @@ static int cgwb_create(struct backing_dev_info *bdi, if (ret) goto err_wb_exit; + ret = fprop_local_init_percpu(&wb->memcg_completions, gfp); + if (ret) + goto err_ref_exit; + wb->congested = wb_congested_get_create(bdi, blkcg_css->id, gfp); if (!wb->congested) - goto err_ref_exit; + goto err_fprop_exit; wb->memcg_css = memcg_css; wb->blkcg_css = blkcg_css; @@ -587,6 +592,8 @@ static int cgwb_create(struct backing_dev_info *bdi, err_put_congested: wb_congested_put(wb->congested); +err_fprop_exit: + fprop_local_destroy_percpu(&wb->memcg_completions); err_ref_exit: percpu_ref_exit(&wb->refcnt); err_wb_exit: diff --git a/mm/memcontrol.c b/mm/memcontrol.c index ab483e9..2a74cf3 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -344,6 +344,7 @@ struct mem_cgroup { #ifdef CONFIG_CGROUP_WRITEBACK struct list_head cgwb_list; + struct wb_domain cgwb_domain; #endif /* List of events which userspace want to receive */ @@ -4085,6 +4086,37 @@ struct list_head *mem_cgroup_cgwb_list(struct mem_cgroup *memcg) return &memcg->cgwb_list; } +static int memcg_wb_domain_init(struct mem_cgroup *memcg, gfp_t gfp) +{ + return wb_domain_init(&memcg->cgwb_domain, gfp); +} + +static void memcg_wb_domain_exit(struct mem_cgroup *memcg) +{ + wb_domain_exit(&memcg->cgwb_domain); +} + +struct wb_domain *mem_cgroup_wb_domain(struct bdi_writeback *wb) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(wb->memcg_css); + + if (!memcg->css.parent) + return NULL; + + return &memcg->cgwb_domain; +} + +#else /* CONFIG_CGROUP_WRITEBACK */ + +static int memcg_wb_domain_init(struct mem_cgroup *memcg, gfp_t gfp) +{ + return 0; +} + +static void memcg_wb_domain_exit(struct mem_cgroup *memcg) +{ +} + #endif /* CONFIG_CGROUP_WRITEBACK */ /* @@ -4471,9 +4503,15 @@ static struct mem_cgroup *mem_cgroup_alloc(void) memcg->stat = alloc_percpu(struct mem_cgroup_stat_cpu); if (!memcg->stat) goto out_free; + + if (memcg_wb_domain_init(memcg, GFP_KERNEL)) + goto out_free_stat; + spin_lock_init(&memcg->pcp_counter_lock); return memcg; +out_free_stat: + free_percpu(memcg->stat); out_free: kfree(memcg); return NULL; @@ -4500,6 +4538,7 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg) free_mem_cgroup_per_zone_info(memcg, node); free_percpu(memcg->stat); + memcg_wb_domain_exit(memcg); kfree(memcg); } diff --git a/mm/page-writeback.c b/mm/page-writeback.c index c08b053..138e20e 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -171,6 +171,11 @@ static struct dirty_throttle_control *mdtc_gdtc(struct dirty_throttle_control *m return mdtc->gdtc; } +static struct fprop_local_percpu *wb_memcg_completions(struct bdi_writeback *wb) +{ + return &wb->memcg_completions; +} + static void wb_min_max_ratio(struct bdi_writeback *wb, unsigned long *minp, unsigned long *maxp) { @@ -213,6 +218,11 @@ static struct dirty_throttle_control *mdtc_gdtc(struct dirty_throttle_control *m return NULL; } +static struct fprop_local_percpu *wb_memcg_completions(struct bdi_writeback *wb) +{ + return NULL; +} + static void wb_min_max_ratio(struct bdi_writeback *wb, unsigned long *minp, unsigned long *maxp) { @@ -530,9 +540,16 @@ static void wb_domain_writeout_inc(struct wb_domain *dom, */ static inline void __wb_writeout_inc(struct bdi_writeback *wb) { + struct wb_domain *cgdom; + __inc_wb_stat(wb, WB_WRITTEN); wb_domain_writeout_inc(&global_wb_domain, &wb->completions, wb->bdi->max_prop_frac); + + cgdom = mem_cgroup_wb_domain(wb); + if (cgdom) + wb_domain_writeout_inc(cgdom, wb_memcg_completions(wb), + wb->bdi->max_prop_frac); } void wb_writeout_inc(struct bdi_writeback *wb) @@ -583,6 +600,14 @@ int wb_domain_init(struct wb_domain *dom, gfp_t gfp) return fprop_global_init(&dom->completions, gfp); } +#ifdef CONFIG_CGROUP_WRITEBACK +void wb_domain_exit(struct wb_domain *dom) +{ + del_timer_sync(&dom->period_timer); + fprop_global_destroy(&dom->completions); +} +#endif + /* * bdi_min_ratio keeps the sum of the minimum dirty shares of all * registered backing devices, which, for obvious reasons, can not