From patchwork Sat Oct 12 13:27:40 2019
X-Patchwork-Submitter: Hillf Danton
X-Patchwork-Id: 11186793
From: Hillf Danton <hdanton@sina.com>
To: linux-mm <linux-mm@kvack.org>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>, Andrew Morton,
    linux-kernel <linux-kernel@vger.kernel.org>, Roman Gushchin, Tejun Heo,
    Jan Kara, Johannes Weiner, Shakeel Butt, Minchan Kim, Mel Gorman,
    Hillf Danton
Subject: [RFC] writeback: add elastic bdi in cgwb bdp
Date: Sat, 12 Oct 2019 21:27:40 +0800
Message-Id: <20191012132740.12968-1-hdanton@sina.com>

The behaviors of the elastic bdi (ebdi), observed in the current cgwb
bandwidth measurement, include:

1. Like spinning disks on the market, ebdi can do ~128MB/s of IO for
   consecutive minutes in a few scenarios, or higher like an SSD, or
   lower like a USB key.

2. With ebdi, a bdi_writeback, wb-A, is able to do 80MB/s of writeout
   in the current 200ms time window, while it was 16MB/s in the
   previous one.

3. It will be either 100MB/s in the next time window if wb-B joins
   wb-A in writing pages out, or 18MB/s if wb-C also decides to chime
   in.

With the bandwidth gauged above, what is left in balancing dirty pages
(bdp) is to try to make wb-A's laundry speed catch up with its dirty
speed in every 200ms interval, without knowing what wb-B is doing.  No
heuristic is added in this work because ebdi does bdp without one.

Cc: Roman Gushchin
Cc: Tejun Heo
Cc: Jan Kara
Cc: Johannes Weiner
Cc: Shakeel Butt
Cc: Minchan Kim
Cc: Mel Gorman
Signed-off-by: Hillf Danton
---

--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -157,6 +157,9 @@ struct bdi_writeback {
 	struct list_head memcg_node;	/* anchored at memcg->cgwb_list */
 	struct list_head blkcg_node;	/* anchored at blkcg->cgwb_list */
+#ifdef CONFIG_CGWB_BDP_WITH_EBDI
+	struct wait_queue_head bdp_waitq;
+#endif
 
 	union {
 		struct work_struct release_work;
 		struct rcu_head rcu;
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -324,6 +324,10 @@ static int wb_init(struct bdi_writeback
 		goto out_destroy_stat;
 	}
 
+	if (IS_ENABLED(CONFIG_CGROUP_WRITEBACK) &&
+	    IS_ENABLED(CONFIG_CGWB_BDP_WITH_EBDI))
+		init_waitqueue_head(&wb->bdp_waitq);
+
 	return 0;
 
 out_destroy_stat:
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1551,6 +1551,45 @@ static inline void wb_dirty_limits(struc
 	}
 }
 
+#if defined(CONFIG_CGROUP_WRITEBACK) && defined(CONFIG_CGWB_BDP_WITH_EBDI)
+static bool cgwb_bdp_should_throttle(struct bdi_writeback *wb)
+{
+	struct dirty_throttle_control gdtc = { GDTC_INIT_NO_WB };
+
+	if (fatal_signal_pending(current))
+		return false;
+
+	gdtc.avail = global_dirtyable_memory();
+
+	domain_dirty_limits(&gdtc);
+
+	gdtc.dirty = global_node_page_state(NR_FILE_DIRTY) +
+		     global_node_page_state(NR_UNSTABLE_NFS) +
+		     global_node_page_state(NR_WRITEBACK);
+
+	if (gdtc.dirty < gdtc.bg_thresh)
+		return false;
+
+	if (!writeback_in_progress(wb))
+		wb_start_background_writeback(wb);
+
+	/*
+	 * throttle if laundry speed remarkably falls behind dirty speed
+	 * in the current time window of 200ms
+	 */
+	return gdtc.dirty > gdtc.thresh &&
+	       wb_stat(wb, WB_DIRTIED) >
+	       wb_stat(wb, WB_WRITTEN) +
+	       wb_stat_error();
+}
+
+static inline void cgwb_bdp(struct bdi_writeback *wb)
+{
+	wait_event_interruptible_timeout(wb->bdp_waitq,
+					 !cgwb_bdp_should_throttle(wb), HZ);
+}
+#endif
+
 /*
  * balance_dirty_pages() must be called by processes which are generating dirty
  * data.  It looks at the number of dirty pages in the machine and will force
@@ -1910,7 +1949,11 @@ void balance_dirty_pages_ratelimited(str
 	preempt_enable();
 
 	if (unlikely(current->nr_dirtied >= ratelimit))
-		balance_dirty_pages(wb, current->nr_dirtied);
+		if (IS_ENABLED(CONFIG_CGROUP_WRITEBACK) &&
+		    IS_ENABLED(CONFIG_CGWB_BDP_WITH_EBDI))
+			cgwb_bdp(wb);
+		else
+			balance_dirty_pages(wb, current->nr_dirtied);
 
 	wb_put(wb);
 }
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -632,6 +632,11 @@ void wbc_detach_inode(struct writeback_c
 	if (!wb)
 		return;
 
+	if (IS_ENABLED(CONFIG_CGROUP_WRITEBACK) &&
+	    IS_ENABLED(CONFIG_CGWB_BDP_WITH_EBDI))
+		if (waitqueue_active(&wb->bdp_waitq))
+			wake_up_all(&wb->bdp_waitq);
+
 	history = inode->i_wb_frn_history;
 	avg_time = inode->i_wb_frn_avg_time;
@@ -811,6 +816,9 @@ static long wb_split_bdi_pages(struct bd
 	if (nr_pages == LONG_MAX)
 		return LONG_MAX;
 
+	if (IS_ENABLED(CONFIG_CGROUP_WRITEBACK) &&
+	    IS_ENABLED(CONFIG_CGWB_BDP_WITH_EBDI))
+		return nr_pages;
 	/*
 	 * This may be called on clean wb's and proportional distribution
 	 * may not make sense, just use the original @nr_pages in those
@@ -1599,6 +1607,10 @@ static long writeback_chunk_size(struct
 	if (work->sync_mode == WB_SYNC_ALL || work->tagged_writepages)
 		pages = LONG_MAX;
 	else {
+		if (IS_ENABLED(CONFIG_CGROUP_WRITEBACK) &&
+		    IS_ENABLED(CONFIG_CGWB_BDP_WITH_EBDI))
+			return work->nr_pages;
+
 		pages = min(wb->avg_write_bandwidth / 2,
 			    global_wb_domain.dirty_limit / DIRTY_SCOPE);
 		pages = min(pages, work->nr_pages);
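
A minimal user-space sketch of the throttle test applied in
cgwb_bdp_should_throttle() above may help readers of the RFC.  The
struct wb_model, STAT_ERROR and the sample counts below are
illustrative assumptions, not kernel code; only the comparison
(dirtied pages running ahead of written pages by more than the stat
error margin while the global dirty threshold is exceeded) mirrors
the patch.

/*
 * Illustrative model: throttle a writer once global dirty pages cross
 * the dirty threshold AND this wb's dirtied counter runs ahead of its
 * written counter by more than the stat error slack.
 */
#include <stdbool.h>
#include <stdio.h>

struct wb_model {
	unsigned long nr_dirtied;	/* pages dirtied against this wb */
	unsigned long nr_written;	/* pages written back so far */
};

/* stand-in for wb_stat_error(): per-cpu counter slack, in pages */
#define STAT_ERROR	64UL

static bool should_throttle(unsigned long gdirty, unsigned long gthresh,
			    const struct wb_model *wb)
{
	if (gdirty <= gthresh)
		return false;	/* globally below the dirty limit */
	/* laundry speed falls behind dirty speed in this window */
	return wb->nr_dirtied > wb->nr_written + STAT_ERROR;
}

int main(void)
{
	/* wb-A dirties much faster than it writes out */
	struct wb_model wb_a = { .nr_dirtied = 4096, .nr_written = 820 };
	/* wb-B keeps up with its own dirtying */
	struct wb_model wb_b = { .nr_dirtied = 1024, .nr_written = 1000 };
	unsigned long gdirty = 200000, gthresh = 150000;

	printf("wb-A throttled: %d\n", should_throttle(gdirty, gthresh, &wb_a));
	printf("wb-B throttled: %d\n", should_throttle(gdirty, gthresh, &wb_b));
	return 0;
}

In the patch proper, a task that hits this condition sleeps on
wb->bdp_waitq for up to HZ via wait_event_interruptible_timeout() and
is woken from wbc_detach_inode() when writeback on an inode completes.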