From patchwork Sun Oct 20 13:00:13 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hillf Danton <hdanton@sina.com>
X-Patchwork-Id: 11200917
From: Hillf Danton <hdanton@sina.com>
To: linux-mm <linux-mm@kvack.org>
Cc: fsdev <linux-fsdevel@vger.kernel.org>,
    Andrew Morton <akpm@linux-foundation.org>,
    linux-kernel <linux-kernel@vger.kernel.org>,
    Tejun Heo <tj@kernel.org>,
    Jan Kara <jack@suse.com>,
    Fengguang Wu <fengguang.wu@intel.com>,
    Johannes Weiner <hannes@cmpxchg.org>,
    Shakeel Butt <shakeelb@google.com>,
    Minchan Kim <minchan@kernel.org>,
    Mel Gorman <mgorman@suse.de>,
    Hillf Danton <hdanton@sina.com>
Subject: [RFC v1] writeback: add elastic bdi in cgwb bdp
Date: Sun, 20 Oct 2019 21:00:13 +0800
Message-Id: <20191020130013.3500-1-hdanton@sina.com>
The elastic bdi (ebdi) is a mirror of the spinning disks, SSDs, USB
sticks and other storage devices on the market. The performance of an
ebdi goes up and down as the pattern of dispatched IO changes; it can
be approximated as

	P = j(..., IO pattern);

In ebdi's view, the bandwidth currently measured while balancing dirty
pages is closely related to its performance, because the former is a
component of the latter:

	B = y(P);

These relations suggest a possible layer violation: filesystems do not
care what is measured at the mm layer, and the quantity could be better
measured somewhere below fs. It is measured, however, to the extent
that keeps every judge happy, and it plays a role in dispatching IO
while the IO pattern, which is volatile in nature, is ignored entirely.
It also helps to throttle the dirty speed, ignoring the fact that DRAM
is in general roughly 10x faster than an ebdi. If B is half of P, for
instance, then pages drain at about 5% of the dirty speed (a half of
one tenth), only two points away from the 3% figure in the snippet
below.

/*
 * If ratelimit_pages is too high then we can get into dirty-data overload
 * if a large number of processes all perform writes at the same time.
 * If it is too low then SMP machines will call the (expensive)
 * get_writeback_state too often.
 *
 * Here we set ratelimit_pages to a level which ensures that when all CPUs are
 * dirtying in parallel, we cannot go more than 3% (1/32) over the dirty memory
 * thresholds.
 */

To prevent the dirty speed from running away from the laundry speed
while balancing dirty pages, ebdi suggests considering the walk-dog
method: keep the dirtier on a leash, on the assumption that a leash
churns the IO pattern less than bandwidth measurement does.

V1 is based on 5.4-rc3.

Changes since v0
- add CGWB_BDP_WITH_EBDI in mm/Kconfig
- drop wakeup in wbc_detach_inode()
- add wakeup in wb_workfn()

Cc: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Hillf Danton <hdanton@sina.com>
---
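Below the fold, not part of the change: the 5% arithmetic above,
spelled out as a trivial standalone program. The 10x DRAM-to-device
ratio is the changelog's assumption, not a measured value.

#include <stdio.h>

int main(void)
{
	double P = 1.0;		/* ebdi performance, normalized        */
	double dram = 10.0 * P;	/* DRAM assumed ~10x faster, as above  */
	double B = P / 2.0;	/* measured bandwidth at half of P     */

	/* pages can only drain at B, so dirtying is effectively capped */
	printf("laundry speed / dirty speed = %.1f%%\n", 100.0 * B / dram);
	/* prints 5.0%, vs the 3% (1/32) bound in the ratelimit comment */
	return 0;
}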
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -204,6 +204,14 @@ config ARCH_ENABLE_SPLIT_PMD_PTLOCK
 config MEMORY_BALLOON
 	bool
 
+config CGWB_BDP_WITH_EBDI
+	bool
+	help
+	  This puts the walk-dog method in balancing dirty pages
+	  instead of measuring bandwidth.
+
+	  Say N if unsure.
+
 #
 # support for memory balloon compaction
 config BALLOON_COMPACTION
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -170,6 +170,10 @@ struct bdi_writeback {
 
 	struct list_head bdi_node;	/* anchored at bdi->wb_list */
 
+#ifdef CONFIG_CGWB_BDP_WITH_EBDI
+	struct wait_queue_head bdp_waitq;
+#endif
+
 #ifdef CONFIG_CGROUP_WRITEBACK
 	struct percpu_ref refcnt;	/* used only for !root wb's */
 	struct fprop_local_percpu memcg_completions;
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -324,6 +324,9 @@ static int wb_init(struct bdi_writeback
 		goto out_destroy_stat;
 	}
 
+	if (IS_ENABLED(CONFIG_CGWB_BDP_WITH_EBDI))
+		init_waitqueue_head(&wb->bdp_waitq);
+
 	return 0;
 
 out_destroy_stat:
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1551,6 +1551,41 @@ static inline void wb_dirty_limits(struc
 	}
 }
 
+#ifdef CONFIG_CGWB_BDP_WITH_EBDI
+static bool cgwb_bdp_should_throttle(struct bdi_writeback *wb)
+{
+	struct dirty_throttle_control gdtc = { GDTC_INIT_NO_WB };
+
+	if (fatal_signal_pending(current))
+		return false;
+
+	gdtc.avail = global_dirtyable_memory();
+
+	domain_dirty_limits(&gdtc);
+
+	gdtc.dirty = global_node_page_state(NR_FILE_DIRTY) +
+		     global_node_page_state(NR_UNSTABLE_NFS) +
+		     global_node_page_state(NR_WRITEBACK);
+
+	if (gdtc.dirty < gdtc.bg_thresh)
+		return false;
+
+	if (!writeback_in_progress(wb))
+		wb_start_background_writeback(wb);
+
+	return gdtc.dirty > gdtc.thresh &&
+	       wb_stat(wb, WB_DIRTIED) >
+	       wb_stat(wb, WB_WRITTEN) +
+	       wb_stat_error();
+}
+
+static inline void cgwb_bdp(struct bdi_writeback *wb)
+{
+	wait_event_interruptible_timeout(wb->bdp_waitq,
+			!cgwb_bdp_should_throttle(wb), HZ);
+}
+#endif
+
 /*
  * balance_dirty_pages() must be called by processes which are generating dirty
  * data.  It looks at the number of dirty pages in the machine and will force
@@ -1910,7 +1945,10 @@ void balance_dirty_pages_ratelimited(str
 	preempt_enable();
 
 	if (unlikely(current->nr_dirtied >= ratelimit))
-		balance_dirty_pages(wb, current->nr_dirtied);
+		if (IS_ENABLED(CONFIG_CGWB_BDP_WITH_EBDI))
+			cgwb_bdp(wb);
+		else
+			balance_dirty_pages(wb, current->nr_dirtied);
 
 	wb_put(wb);
 }
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -811,6 +811,9 @@ static long wb_split_bdi_pages(struct bd
 	if (nr_pages == LONG_MAX)
 		return LONG_MAX;
 
+	if (IS_ENABLED(CONFIG_CGWB_BDP_WITH_EBDI))
+		return nr_pages;
+
 	/*
 	 * This may be called on clean wb's and proportional distribution
 	 * may not make sense, just use the original @nr_pages in those
@@ -1598,6 +1601,8 @@ static long writeback_chunk_size(struct
 	 */
 	if (work->sync_mode == WB_SYNC_ALL || work->tagged_writepages)
 		pages = LONG_MAX;
+	else if (IS_ENABLED(CONFIG_CGWB_BDP_WITH_EBDI))
+		pages = work->nr_pages;
 	else {
 		pages = min(wb->avg_write_bandwidth / 2,
 			    global_wb_domain.dirty_limit / DIRTY_SCOPE);
@@ -2092,6 +2097,10 @@ void wb_workfn(struct work_struct *work)
 		wb_wakeup_delayed(wb);
 
 	current->flags &= ~PF_SWAPWRITE;
+
+	if (IS_ENABLED(CONFIG_CGWB_BDP_WITH_EBDI))
+		if (waitqueue_active(&wb->bdp_waitq))
+			wake_up_all(&wb->bdp_waitq);
 }
 
 /*
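Appendix, not part of the patch: a minimal userspace sketch of the
leash, for anyone who wants to watch the walk-dog pacing without
building a kernel. The pthread mutex/condvar pair stands in for
bdp_waitq, should_throttle() collapses cgwb_bdp_should_throttle() to
its core WB_DIRTIED vs WB_WRITTEN comparison, and every constant is
made up for illustration.

/*
 * Toy model of the walk-dog throttle; not kernel code.  A writer
 * thread dirties pages at "DRAM speed" and waits, leashed, whenever
 * the backlog exceeds a threshold; a flusher thread cleans pages at
 * "device speed" and kicks the waiters after every pass, mirroring
 * cgwb_bdp() and the tail of wb_workfn().  Build: cc walkdog.c -lpthread
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define THRESH		1024	/* stand-in for gdtc.thresh        */
#define DIRTY_BATCH	64	/* pages dirtied per iteration     */
#define CLEAN_BATCH	8	/* pages cleaned per flusher pass  */
#define TOTAL		8192	/* pages the writer produces       */

static pthread_mutex_t bdp_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t bdp_cond = PTHREAD_COND_INITIALIZER; /* "bdp_waitq" */
static long dirtied, written;	/* mirror WB_DIRTIED / WB_WRITTEN  */
static bool done;

/* cgwb_bdp_should_throttle(), reduced to its core comparison */
static bool should_throttle(void)
{
	return dirtied - written > THRESH;
}

static void *writer(void *unused)
{
	for (long i = 0; i < TOTAL; i += DIRTY_BATCH) {
		pthread_mutex_lock(&bdp_lock);
		while (should_throttle()) {
			/* cgwb_bdp(): wait on the leash, HZ at most */
			struct timespec ts;

			clock_gettime(CLOCK_REALTIME, &ts);
			ts.tv_sec += 1;
			pthread_cond_timedwait(&bdp_cond, &bdp_lock, &ts);
		}
		dirtied += DIRTY_BATCH;
		pthread_mutex_unlock(&bdp_lock);
	}
	pthread_mutex_lock(&bdp_lock);
	done = true;
	pthread_mutex_unlock(&bdp_lock);
	return NULL;
}

static void *flusher(void *unused)
{
	bool finished;

	do {
		pthread_mutex_lock(&bdp_lock);
		if (written < dirtied)
			written += CLEAN_BATCH;	/* one writeback pass */
		/* wb_workfn() tail: wake every leashed writer */
		pthread_cond_broadcast(&bdp_cond);
		finished = done && written >= dirtied;
		pthread_mutex_unlock(&bdp_lock);
		usleep(1000);	/* the device is far slower than DRAM */
	} while (!finished);
	return NULL;
}

int main(void)
{
	pthread_t w, f;

	pthread_create(&w, NULL, writer, NULL);
	pthread_create(&f, NULL, flusher, NULL);
	pthread_join(w, NULL);
	pthread_join(f, NULL);
	printf("dirtied %ld, written %ld\n", dirtied, written);
	return 0;
}

The leash length (THRESH) bounds how far dirtying can run ahead of the
laundry speed, which is the job the bandwidth estimate does today, but
it does not churn as the IO pattern changes.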