From patchwork Wed Mar 25 22:26:11 2015
Date: Wed, 25 Mar 2015 18:26:11 -0400
From: Tejun Heo
To: axboe@kernel.dk
Cc: linux-kernel@vger.kernel.org, jack@suse.cz, hch@infradead.org,
	hannes@cmpxchg.org, linux-fsdevel@vger.kernel.org, vgoyal@redhat.com,
	lizefan@huawei.com, cgroups@vger.kernel.org, linux-mm@kvack.org,
	mhocko@suse.cz, clm@fb.com, fengguang.wu@intel.com,
	david@fromorbit.com, gthelen@google.com, Vladimir Davydov
Subject: [PATCH v2 18/18] mm: vmscan: disable memcg direct reclaim stalling
 if cgroup writeback support is in use
Message-ID: <20150325222611.GS3880@htj.duckdns.org>
References: <1427087267-16592-1-git-send-email-tj@kernel.org>
 <1427087267-16592-19-git-send-email-tj@kernel.org>
In-Reply-To: <1427087267-16592-19-git-send-email-tj@kernel.org>

From 3cde6e12925bff066d2bb6332df323fb38de979e Mon Sep 17 00:00:00 2001
From: Tejun Heo
Date: Wed, 25 Mar 2015 18:17:13 -0400

Because writeback wasn't cgroup aware before, the usual dirty throttling
mechanism in balance_dirty_pages() didn't work for processes under a
memcg limit.  The writeback path didn't know how much memory was
available or how fast dirty pages were being written out for a given
memcg, and balance_dirty_pages() didn't have any measure of IO back
pressure for the memcg.

To work around the issue, memcg implemented an ad-hoc dirty throttling
mechanism in the direct reclaim path by stalling on pages under
writeback which are encountered during the direct reclaim scan.  This is
rather ugly and crude - it has none of the configurability, fairness, or
bandwidth-proportional distribution of the normal path.

The previous patches implemented proper memcg-aware dirty throttling
when cgroup writeback is in use, making the ad-hoc mechanism
unnecessary.  This patch disables direct reclaim stalling in that case.

Note: I disabled the parts which seemed obvious and it behaves fine in
testing, but my understanding of this code path is rudimentary and it's
quite possible that I missed something.  Please let me know if I got
something wrong or if more global_reclaim() sites should be updated.

v2: The original patch removed the direct stalling mechanism, which
    breaks legacy hierarchies.  Conditionalize it instead of removing
    it.

Signed-off-by: Tejun Heo
Cc: Jens Axboe
Cc: Jan Kara
Cc: Wu Fengguang
Cc: Greg Thelen
Cc: Vladimir Davydov
---
The review git branch has been updated accordingly.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git
 review-cgroup-writeback-backpressure-20150322

A small standalone sketch of the resulting throttling decision is
appended after the diff for reference.

Thanks.

 mm/vmscan.c | 51 +++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 41 insertions(+), 10 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9f8d3c0..cc76c66 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -154,11 +154,42 @@ static bool global_reclaim(struct scan_control *sc)
 {
 	return !sc->target_mem_cgroup;
 }
+
+/**
+ * sane_reclaim - is the usual dirty throttling mechanism operational?
+ * @sc: scan_control in question
+ *
+ * The normal page dirty throttling mechanism in balance_dirty_pages() is
+ * completely broken with the legacy memcg and direct stalling in
+ * shrink_page_list() is used for throttling instead, which lacks all the
+ * niceties such as fairness, adaptive pausing, bandwidth proportional
+ * allocation and configurability.
+ *
+ * This function tests whether the vmscan currently in progress can assume
+ * that the normal dirty throttling mechanism is operational.
+ */
+static bool sane_reclaim(struct scan_control *sc)
+{
+	struct mem_cgroup *memcg = sc->target_mem_cgroup;
+
+	if (!memcg)
+		return true;
+#ifdef CONFIG_CGROUP_WRITEBACK
+	if (cgroup_on_dfl(mem_cgroup_css(memcg)->cgroup))
+		return true;
+#endif
+	return false;
+}
 #else
 static bool global_reclaim(struct scan_control *sc)
 {
 	return true;
 }
+
+static bool sane_reclaim(struct scan_control *sc)
+{
+	return true;
+}
 #endif
 
 static unsigned long zone_reclaimable_pages(struct zone *zone)
@@ -942,10 +973,10 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			 * note that the LRU is being scanned too quickly and the
 			 * caller can stall after page list has been processed.
 			 *
-			 * 2) Global reclaim encounters a page, memcg encounters a
-			 *    page that is not marked for immediate reclaim or
-			 *    the caller does not have __GFP_IO. In this case mark
-			 *    the page for immediate reclaim and continue scanning.
+			 * 2) Global or new memcg reclaim encounters a page that is
+			 *    not marked for immediate reclaim or the caller does not
+			 *    have __GFP_IO. In this case mark the page for immediate
+			 *    reclaim and continue scanning.
 			 *
 			 * __GFP_IO is checked because a loop driver thread might
 			 * enter reclaim, and deadlock if it waits on a page for
@@ -959,7 +990,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			 * grab_cache_page_write_begin(,,AOP_FLAG_NOFS), so testing
 			 * may_enter_fs here is liable to OOM on them.
 			 *
-			 * 3) memcg encounters a page that is not already marked
+			 * 3) Legacy memcg encounters a page that is not already marked
 			 *    PageReclaim. memcg does not have any dirty pages
 			 *    throttling so we could easily OOM just because too many
 			 *    pages are in writeback and there is nothing else to
@@ -974,7 +1005,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				goto keep_locked;
 
 			/* Case 2 above */
-			} else if (global_reclaim(sc) ||
+			} else if (sane_reclaim(sc) ||
 			    !PageReclaim(page) || !(sc->gfp_mask & __GFP_IO)) {
 				/*
 				 * This is slightly racy - end_page_writeback()
@@ -1423,7 +1454,7 @@ static int too_many_isolated(struct zone *zone, int file,
 	if (current_is_kswapd())
 		return 0;
 
-	if (!global_reclaim(sc))
+	if (!sane_reclaim(sc))
 		return 0;
 
 	if (file) {
@@ -1615,10 +1646,10 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 		set_bit(ZONE_WRITEBACK, &zone->flags);
 
 		/*
-		 * memcg will stall in page writeback so only consider forcibly
-		 * stalling for global reclaim
+		 * Legacy memcg will stall in page writeback so avoid forcibly
+		 * stalling here.
 		 */
-		if (global_reclaim(sc)) {
+		if (sane_reclaim(sc)) {
 			/*
 			 * Tag a zone as congested if all the dirty pages scanned were
 			 * backed by a congested BDI and wait_iff_congested will stall.
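
Appendix (not part of the patch): a minimal, self-contained userspace
sketch of the throttling decision above, assuming nothing beyond the C
standard library.  struct reclaim_ctx, writeback_action() and the field
names below are invented for illustration; the kernel code operates on
struct scan_control and struct page instead.

/*
 * Minimal userspace model of the post-patch logic.  All names here are
 * invented for the example; the kernel uses struct scan_control.
 */
#include <stdbool.h>
#include <stdio.h>

struct reclaim_ctx {
	bool target_is_memcg;      /* sc->target_mem_cgroup != NULL */
	bool on_default_hierarchy; /* cgroup_on_dfl(...) on the memcg */
	bool cgroup_writeback;     /* CONFIG_CGROUP_WRITEBACK=y */
};

/* Models sane_reclaim(): can balance_dirty_pages() throttle for us? */
static bool sane_reclaim(const struct reclaim_ctx *ctx)
{
	if (!ctx->target_is_memcg)
		return true;                /* global reclaim */
	if (ctx->cgroup_writeback && ctx->on_default_hierarchy)
		return true;                /* memcg with cgroup writeback */
	return false;                       /* legacy memcg */
}

/* Models the three PageWriteback cases in shrink_page_list(). */
static const char *writeback_action(const struct reclaim_ctx *ctx,
				    bool is_kswapd, bool zone_writeback,
				    bool page_reclaim, bool may_do_io)
{
	if (is_kswapd && page_reclaim && zone_writeback)
		return "case 1: count page, let kswapd stall after the scan";
	if (sane_reclaim(ctx) || !page_reclaim || !may_do_io)
		return "case 2: mark PageReclaim and keep scanning";
	return "case 3: legacy memcg, stall on the page (ad-hoc throttle)";
}

int main(void)
{
	struct reclaim_ctx legacy  = { true, false, true };
	struct reclaim_ctx unified = { true, true,  true };

	printf("legacy memcg:      %s\n",
	       writeback_action(&legacy,  false, false, true, true));
	printf("unified hierarchy: %s\n",
	       writeback_action(&unified, false, false, true, true));
	return 0;
}

On the legacy hierarchy the sketch falls through to case 3, the ad-hoc
stall; on the unified hierarchy sane_reclaim() short-circuits to case 2,
leaving the throttling to balance_dirty_pages().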