From patchwork Thu Mar 7 18:08:32 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrea Righi
X-Patchwork-Id: 10843661
From: Andrea Righi
To: Josef Bacik, Tejun Heo
Cc: Li Zefan, Paolo Valente, Johannes Weiner, Jens Axboe, Vivek Goyal,
 Dennis Zhou, cgroups@vger.kernel.org, linux-block@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 1/3] blkcg: prevent priority inversion problem during sync()
Date: Thu, 7 Mar 2019 19:08:32 +0100
Message-Id: <20190307180834.22008-2-andrea.righi@canonical.com>
X-Mailer: git-send-email 2.19.1
In-Reply-To: <20190307180834.22008-1-andrea.righi@canonical.com>
References: <20190307180834.22008-1-andrea.righi@canonical.com>

Prevent a priority inversion when a high-priority blkcg issues a sync()
and is forced to wait for the completion of all the writeback I/O
generated by any other low-priority blkcg, causing massive latencies to
processes that shouldn't be I/O-throttled at all.

The idea is to keep a list of the blkcgs that are waiting for writeback:
every time a sync() is executed, the current blkcg is added to the list.

Then, when I/O is throttled, if a blkcg other than the current one is
waiting for writeback on the same block device, no throttling is applied.
(We can probably refine this logic later; e.g., a better policy could be
to adjust the throttling I/O rate using the blkcg with the highest speed
from the list of waiters - priority inheritance, kinda.)

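To illustrate the bookkeeping described above, here is a simplified
user-space sketch in C. It is not the kernel code in this patch: struct
group and struct device are hypothetical stand-ins for struct blkcg and
struct backing_dev_info, all function names are made up for the example,
the list is a plain singly linked list, and there is no locking or RCU
(the kernel code needs both).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct blkcg and struct backing_dev_info. */
struct group { int id; };
struct device { int id; };

/* One entry per (group, device) pair with at least one sync() in flight. */
struct waiter {
	const struct group *grp;
	const struct device *dev;
	unsigned int refcnt;	/* number of sync() calls currently waiting */
	struct waiter *next;
};

/* Global waiter list; this sketch is single-threaded, so no lock is taken. */
static struct waiter *waiters;

static struct waiter *waiter_find(const struct group *grp,
				  const struct device *dev)
{
	for (struct waiter *w = waiters; w; w = w->next)
		if (w->grp == grp && w->dev == dev)
			return w;
	return NULL;
}

/* Record that grp started a sync() on dev. Returns false if out of memory. */
static bool wb_wait_start(const struct group *grp, const struct device *dev)
{
	struct waiter *w = waiter_find(grp, dev);

	if (w) {
		w->refcnt++;
		return true;
	}
	w = calloc(1, sizeof(*w));
	if (!w)
		return false;
	w->grp = grp;
	w->dev = dev;
	w->refcnt = 1;
	w->next = waiters;
	waiters = w;
	return true;
}

/* Record that a sync() of grp on dev returned. False on an unbalanced call. */
static bool wb_wait_stop(const struct group *grp, const struct device *dev)
{
	for (struct waiter **pp = &waiters; *pp; pp = &(*pp)->next) {
		struct waiter *w = *pp;

		if (w->grp != grp || w->dev != dev)
			continue;
		assert(w->refcnt > 0);
		if (--w->refcnt == 0) {
			/* Last waiter for this pair: unlink and free the entry. */
			*pp = w->next;
			free(w);
		}
		return true;
	}
	return false;
}

/* True if any group other than grp is waiting for writeback on dev. */
static bool wb_waiters_on_dev(const struct group *grp, const struct device *dev)
{
	for (const struct waiter *w = waiters; w; w = w->next)
		if (w->dev == dev && w->grp != grp)
			return true;
	return false;
}

/* Throttle only if grp is over its limit and nobody else is stuck in sync(). */
static bool should_throttle(const struct group *grp, const struct device *dev,
			    bool over_limit)
{
	return over_limit && !wb_waiters_on_dev(grp, dev);
}
```

In this model a low-priority group that is over its limit stops being
throttled on a device for as long as another group has a sync() pending
on that same device, and is throttled again once the last pending sync()
on it returns.
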
Signed-off-by: Andrea Righi
---
 block/blk-cgroup.c               | 131 +++++++++++++++++++++++++++++++
 block/blk-throttle.c             |  11 ++-
 fs/fs-writeback.c                |   5 ++
 fs/sync.c                        |   8 +-
 include/linux/backing-dev-defs.h |   2 +
 include/linux/blk-cgroup.h       |  23 ++++++
 mm/backing-dev.c                 |   2 +
 7 files changed, 178 insertions(+), 4 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 2bed5725aa03..4305e78d1bb2 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1351,6 +1351,137 @@ struct cgroup_subsys io_cgrp_subsys = {
 };
 EXPORT_SYMBOL_GPL(io_cgrp_subsys);
 
+#ifdef CONFIG_CGROUP_WRITEBACK
+struct blkcg_wb_sleeper {
+	struct backing_dev_info *bdi;
+	struct blkcg *blkcg;
+	refcount_t refcnt;
+	struct list_head node;
+};
+
+static DEFINE_SPINLOCK(blkcg_wb_sleeper_lock);
+static LIST_HEAD(blkcg_wb_sleeper_list);
+
+static struct blkcg_wb_sleeper *
+blkcg_wb_sleeper_find(struct blkcg *blkcg, struct backing_dev_info *bdi)
+{
+	struct blkcg_wb_sleeper *bws;
+
+	list_for_each_entry(bws, &blkcg_wb_sleeper_list, node)
+		if (bws->blkcg == blkcg && bws->bdi == bdi)
+			return bws;
+	return NULL;
+}
+
+static void blkcg_wb_sleeper_add(struct blkcg_wb_sleeper *bws)
+{
+	list_add(&bws->node, &blkcg_wb_sleeper_list);
+}
+
+static void blkcg_wb_sleeper_del(struct blkcg_wb_sleeper *bws)
+{
+	list_del_init(&bws->node);
+}
+
+/**
+ * blkcg_wb_waiters_on_bdi - check for writeback waiters on a block device
+ * @blkcg: current blkcg cgroup
+ * @bdi: block device to check
+ *
+ * Return true if any other blkcg different than the current one is waiting for
+ * writeback on the target block device, false otherwise.
+ */
+bool blkcg_wb_waiters_on_bdi(struct blkcg *blkcg, struct backing_dev_info *bdi)
+{
+	struct blkcg_wb_sleeper *bws;
+	bool ret = false;
+
+	spin_lock(&blkcg_wb_sleeper_lock);
+	list_for_each_entry(bws, &blkcg_wb_sleeper_list, node)
+		if (bws->bdi == bdi && bws->blkcg != blkcg) {
+			ret = true;
+			break;
+		}
+	spin_unlock(&blkcg_wb_sleeper_lock);
+
+	return ret;
+}
+
+/**
+ * blkcg_start_wb_wait_on_bdi - add current blkcg to writeback waiters list
+ * @bdi: target block device
+ *
+ * Add current blkcg to the list of writeback waiters on target block device.
+ */
+void blkcg_start_wb_wait_on_bdi(struct backing_dev_info *bdi)
+{
+	struct blkcg_wb_sleeper *new_bws, *bws;
+	struct blkcg *blkcg;
+
+	new_bws = kzalloc(sizeof(*new_bws), GFP_KERNEL);
+	if (unlikely(!new_bws))
+		return;
+
+	rcu_read_lock();
+	blkcg = blkcg_from_current();
+	if (likely(blkcg)) {
+		/* Check if blkcg is already sleeping on bdi */
+		spin_lock(&blkcg_wb_sleeper_lock);
+		bws = blkcg_wb_sleeper_find(blkcg, bdi);
+		if (bws) {
+			refcount_inc(&bws->refcnt);
+		} else {
+			/* Add current blkcg as a new wb sleeper on bdi */
+			css_get(&blkcg->css);
+			new_bws->blkcg = blkcg;
+			new_bws->bdi = bdi;
+			refcount_set(&new_bws->refcnt, 1);
+			blkcg_wb_sleeper_add(new_bws);
+			new_bws = NULL;
+		}
+		spin_unlock(&blkcg_wb_sleeper_lock);
+	}
+	rcu_read_unlock();
+
+	kfree(new_bws);
+}
+
+/**
+ * blkcg_stop_wb_wait_on_bdi - remove current blkcg from writeback waiters list
+ * @bdi: target block device
+ *
+ * Remove current blkcg from the list of writeback waiters on target block
+ * device.
+ */
+void blkcg_stop_wb_wait_on_bdi(struct backing_dev_info *bdi)
+{
+	struct blkcg_wb_sleeper *bws = NULL;
+	struct blkcg *blkcg;
+
+	rcu_read_lock();
+	blkcg = blkcg_from_current();
+	if (!blkcg) {
+		rcu_read_unlock();
+		return;
+	}
+	spin_lock(&blkcg_wb_sleeper_lock);
+	bws = blkcg_wb_sleeper_find(blkcg, bdi);
+	if (unlikely(!bws)) {
+		/* blkcg_start/stop_wb_wait_on_bdi() mismatch */
+		WARN_ON(1);
+		goto out_unlock;
+	}
+	if (refcount_dec_and_test(&bws->refcnt)) {
+		blkcg_wb_sleeper_del(bws);
+		css_put(&blkcg->css);
+		kfree(bws);
+	}
+out_unlock:
+	spin_unlock(&blkcg_wb_sleeper_lock);
+	rcu_read_unlock();
+}
+#endif
+
 /**
  * blkcg_activate_policy - activate a blkcg policy on a request_queue
  * @q: request_queue of interest
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 1b97a73d2fb1..da817896cded 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -970,9 +970,13 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 {
 	bool rw = bio_data_dir(bio);
 	unsigned long bps_wait = 0, iops_wait = 0, max_wait = 0;
+	struct throtl_data *td = tg->td;
+	struct request_queue *q = td->queue;
+	struct backing_dev_info *bdi = q->backing_dev_info;
+	struct blkcg_gq *blkg = tg_to_blkg(tg);
 
 	/*
-	 * Currently whole state machine of group depends on first bio
+	 * Currently whole state machine of group depends on first bio
 	 * queued in the group bio list. So one should not be calling
 	 * this function with a different bio if there are other bios
 	 * queued.
@@ -981,8 +985,9 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 	       bio != throtl_peek_queued(&tg->service_queue.queued[rw]));
 
 	/* If tg->bps = -1, then BW is unlimited */
-	if (tg_bps_limit(tg, rw) == U64_MAX &&
-	    tg_iops_limit(tg, rw) == UINT_MAX) {
+	if (blkcg_wb_waiters_on_bdi(blkg->blkcg, bdi) ||
+	    (tg_bps_limit(tg, rw) == U64_MAX &&
+	    tg_iops_limit(tg, rw) == UINT_MAX)) {
 		if (wait)
 			*wait = 0;
 		return true;
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 36855c1f8daf..77c039a0ec25 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 
 /*
@@ -2446,6 +2447,8 @@ void sync_inodes_sb(struct super_block *sb)
 		return;
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));
 
+	blkcg_start_wb_wait_on_bdi(bdi);
+
 	/* protect against inode wb switch, see inode_switch_wbs_work_fn() */
 	bdi_down_write_wb_switch_rwsem(bdi);
 	bdi_split_work_to_wbs(bdi, &work, false);
@@ -2453,6 +2456,8 @@ void sync_inodes_sb(struct super_block *sb)
 	bdi_up_write_wb_switch_rwsem(bdi);
 
 	wait_sb_inodes(sb);
+
+	blkcg_stop_wb_wait_on_bdi(bdi);
 }
 EXPORT_SYMBOL(sync_inodes_sb);
 
diff --git a/fs/sync.c b/fs/sync.c
index b54e0541ad89..3958b8f98b85 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 
 #define VALID_FLAGS (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE| \
@@ -76,8 +77,13 @@ static void sync_inodes_one_sb(struct super_block *sb, void *arg)
 
 static void sync_fs_one_sb(struct super_block *sb, void *arg)
 {
-	if (!sb_rdonly(sb) && sb->s_op->sync_fs)
+	struct backing_dev_info *bdi = sb->s_bdi;
+
+	if (!sb_rdonly(sb) && sb->s_op->sync_fs) {
+		blkcg_start_wb_wait_on_bdi(bdi);
 		sb->s_op->sync_fs(sb, *(int *)arg);
+		blkcg_stop_wb_wait_on_bdi(bdi);
+	}
 }
 
 static void fdatawrite_one_bdev(struct block_device *bdev, void *arg)
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 07e02d6df5ad..095e4dd0427b 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -191,6 +191,8 @@ struct backing_dev_info {
 	struct rb_root cgwb_congested_tree; /* their congested states */
 	struct mutex cgwb_release_mutex; /* protect shutdown of wb structs */
 	struct rw_semaphore wb_switch_rwsem; /* no cgwb switch while syncing */
+	struct list_head cgwb_waiters; /* list of all waiters for writeback */
+	spinlock_t cgwb_waiters_lock; /* protect cgwb_waiters list */
 #else
 	struct bdi_writeback_congested *wb_congested;
 #endif
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 76c61318fda5..0f7dcb70e922 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -56,6 +56,7 @@ struct blkcg {
 
 	struct list_head all_blkcgs_node;
 #ifdef CONFIG_CGROUP_WRITEBACK
+	struct list_head cgwb_wait_node;
 	struct list_head cgwb_list;
 	refcount_t cgwb_refcnt;
 #endif
@@ -252,6 +253,12 @@ static inline struct blkcg *css_to_blkcg(struct cgroup_subsys_state *css)
 	return css ? container_of(css, struct blkcg, css) : NULL;
 }
 
+static inline struct blkcg *blkcg_from_current(void)
+{
+	WARN_ON_ONCE(!rcu_read_lock_held());
+	return css_to_blkcg(blkcg_css());
+}
+
 /**
  * __bio_blkcg - internal, inconsistent version to get blkcg
  *
@@ -454,6 +461,10 @@ static inline void blkcg_cgwb_put(struct blkcg *blkcg)
 		blkcg_destroy_blkgs(blkcg);
 }
 
+bool blkcg_wb_waiters_on_bdi(struct blkcg *blkcg, struct backing_dev_info *bdi);
+void blkcg_start_wb_wait_on_bdi(struct backing_dev_info *bdi);
+void blkcg_stop_wb_wait_on_bdi(struct backing_dev_info *bdi);
+
 #else
 
 static inline void blkcg_cgwb_get(struct blkcg *blkcg) { }
@@ -464,6 +475,14 @@ static inline void blkcg_cgwb_put(struct blkcg *blkcg)
 	blkcg_destroy_blkgs(blkcg);
 }
 
+static inline bool
+blkcg_wb_waiters_on_bdi(struct blkcg *blkcg, struct backing_dev_info *bdi)
+{
+	return false;
+}
+static inline void blkcg_start_wb_wait_on_bdi(struct backing_dev_info *bdi) { }
+static inline void blkcg_stop_wb_wait_on_bdi(struct backing_dev_info *bdi) { }
+
 #endif
 
 /**
@@ -772,6 +791,7 @@ static inline void blkcg_bio_issue_init(struct bio *bio)
 static inline bool blkcg_bio_issue_check(struct request_queue *q,
 					 struct bio *bio)
 {
+	struct backing_dev_info *bdi = q->backing_dev_info;
 	struct blkcg_gq *blkg;
 	bool throtl = false;
 
@@ -788,6 +808,9 @@ static inline bool blkcg_bio_issue_check(struct request_queue *q,
 
 	blkg = bio->bi_blkg;
 
+	if (blkcg_wb_waiters_on_bdi(blkg->blkcg, bdi))
+		bio_set_flag(bio, BIO_THROTTLED);
+
 	throtl = blk_throtl_bio(q, blkg, bio);
 
 	if (!throtl) {
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 72e6d0c55cfa..8848d26e8bf6 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -686,10 +686,12 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi)
 {
 	int ret;
 
+	INIT_LIST_HEAD(&bdi->cgwb_waiters);
 	INIT_RADIX_TREE(&bdi->cgwb_tree, GFP_ATOMIC);
 	bdi->cgwb_congested_tree = RB_ROOT;
 	mutex_init(&bdi->cgwb_release_mutex);
 	init_rwsem(&bdi->wb_switch_rwsem);
+	spin_lock_init(&bdi->cgwb_waiters_lock);
 
 	ret = wb_init(&bdi->wb, bdi, 1, GFP_KERNEL);
 	if (!ret) {

From patchwork Thu Mar 7 18:08:33 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrea Righi
X-Patchwork-Id: 10843663
From: Andrea Righi
To: Josef Bacik, Tejun Heo
Cc: Li Zefan, Paolo Valente, Johannes Weiner, Jens Axboe, Vivek Goyal,
 Dennis Zhou, cgroups@vger.kernel.org, linux-block@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 2/3] blkcg: introduce io.sync_isolation
Date: Thu, 7 Mar 2019 19:08:33 +0100
Message-Id: <20190307180834.22008-3-andrea.righi@canonical.com>
X-Mailer: git-send-email 2.19.1
In-Reply-To: <20190307180834.22008-1-andrea.righi@canonical.com>
References: <20190307180834.22008-1-andrea.righi@canonical.com>

Add a flag to blkcg cgroups that restricts sync()'ers in a cgroup to
writing out only the pages that have been dirtied by the cgroup itself.

The flag is disabled by default, so the existing behavior does not change
unless it is set.

When the flag is enabled, a cgroup can write out only the dirty pages
that belong to the cgroup itself (except for the root cgroup, which is
still able to write out all pages globally).

Signed-off-by: Andrea Righi
Reviewed-by: Josef Bacik
---
 Documentation/admin-guide/cgroup-v2.rst |  9 ++++++
 block/blk-throttle.c                    | 37 +++++++++++++++++++++++++
 include/linux/blk-cgroup.h              |  7 +++++
 3 files changed, 53 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 53d3288c328b..17fff0ee97b8 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1448,6 +1448,15 @@ IO Interface Files
 	Shows pressure stall information for IO. See
 	Documentation/accounting/psi.txt for details.
 
+  io.sync_isolation
+	A flag (0|1) that determines whether a cgroup is allowed to write out
+	only pages that have been dirtied by the cgroup itself. This option is
+	set to false (0) by default, meaning that any cgroup would try to write
+	out dirty pages globally, even those that have been dirtied by other
+	cgroups.
+
+	Setting this option to true (1) provides a better isolation across
+	cgroups that are doing an intense write I/O activity.
 
 Writeback
 ~~~~~~~~~
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index da817896cded..4bc3b40a4d93 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1704,6 +1704,35 @@ static ssize_t tg_set_limit(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
+#ifdef CONFIG_CGROUP_WRITEBACK
+static int sync_isolation_show(struct seq_file *sf, void *v)
+{
+	struct blkcg *blkcg = css_to_blkcg(seq_css(sf));
+
+	seq_printf(sf, "%d\n", test_bit(BLKCG_SYNC_ISOLATION, &blkcg->flags));
+	return 0;
+}
+
+static ssize_t sync_isolation_write(struct kernfs_open_file *of,
+				    char *buf, size_t nbytes, loff_t off)
+{
+	struct blkcg *blkcg = css_to_blkcg(of_css(of));
+	unsigned long val;
+	int err;
+
+	buf = strstrip(buf);
+	err = kstrtoul(buf, 0, &val);
+	if (err)
+		return err;
+	if (val)
+		set_bit(BLKCG_SYNC_ISOLATION, &blkcg->flags);
+	else
+		clear_bit(BLKCG_SYNC_ISOLATION, &blkcg->flags);
+
+	return nbytes;
+}
+#endif
+
 static struct cftype throtl_files[] = {
 #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
 	{
@@ -1721,6 +1750,14 @@ static struct cftype throtl_files[] = {
 		.write = tg_set_limit,
 		.private = LIMIT_MAX,
 	},
+#ifdef CONFIG_CGROUP_WRITEBACK
+	{
+		.name = "sync_isolation",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = sync_isolation_show,
+		.write = sync_isolation_write,
+	},
+#endif
 	{ } /* terminate */
 };
 
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 0f7dcb70e922..6ac5aa049334 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -44,6 +44,12 @@ enum blkg_rwstat_type {
 
 struct blkcg_gq;
 
+/* blkcg->flags */
+enum {
+	/* sync()'ers allowed to write out pages dirtied by the blkcg */
+	BLKCG_SYNC_ISOLATION,
+};
+
 struct blkcg {
 	struct cgroup_subsys_state css;
 	spinlock_t lock;
@@ -55,6 +61,7 @@ struct blkcg {
 	struct blkcg_policy_data *cpd[BLKCG_MAX_POLS];
 
 	struct list_head all_blkcgs_node;
+	unsigned long flags;
 #ifdef CONFIG_CGROUP_WRITEBACK
 	struct list_head cgwb_wait_node;
 	struct list_head cgwb_list;

From patchwork Thu Mar 7 18:08:34 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrea Righi
X-Patchwork-Id: 10843665
Received: from localhost.localdomain (host22-124-dynamic.46-79-r.retail.telecomitalia.it.
[79.46.124.22]) by smtp.gmail.com with ESMTPSA id a74sm7872747wma.22.2019.03.07.10.09.29 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 07 Mar 2019 10:09:30 -0800 (PST) From: Andrea Righi To: Josef Bacik , Tejun Heo Cc: Li Zefan , Paolo Valente , Johannes Weiner , Jens Axboe , Vivek Goyal , Dennis Zhou , cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 3/3] blkcg: implement sync() isolation Date: Thu, 7 Mar 2019 19:08:34 +0100 Message-Id: <20190307180834.22008-4-andrea.righi@canonical.com> X-Mailer: git-send-email 2.19.1 In-Reply-To: <20190307180834.22008-1-andrea.righi@canonical.com> References: <20190307180834.22008-1-andrea.righi@canonical.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Keep track of the inodes that have been dirtied by each blkcg cgroup and make sure that a blkcg issuing a sync() can trigger the writeback + wait of only those pages that belong to the cgroup itself. This behavior is applied only when io.sync_isolation is enabled in the cgroup, otherwise the old behavior is applied: sync() triggers the writeback of any dirty page. 
Signed-off-by: Andrea Righi
---
 block/blk-cgroup.c         | 47 ++++++++++++++++++++++++++++++++++
 fs/fs-writeback.c          | 52 +++++++++++++++++++++++++++++++++++---
 fs/inode.c                 |  1 +
 include/linux/blk-cgroup.h | 22 ++++++++++++++++
 include/linux/fs.h         |  4 +++
 mm/page-writeback.c        |  1 +
 6 files changed, 124 insertions(+), 3 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 4305e78d1bb2..7d3b26ba4575 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1480,6 +1480,53 @@ void blkcg_stop_wb_wait_on_bdi(struct backing_dev_info *bdi)
 	spin_unlock(&blkcg_wb_sleeper_lock);
 	rcu_read_unlock();
 }
+
+/**
+ * blkcg_set_mapping_dirty - set owner of a dirty mapping
+ * @mapping: target address space
+ *
+ * Set the current blkcg as the owner of the address space @mapping (the first
+ * blkcg that dirties @mapping becomes the owner).
+ */
+void blkcg_set_mapping_dirty(struct address_space *mapping)
+{
+	struct blkcg *curr_blkcg, *blkcg;
+
+	if (mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK) ||
+	    mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
+		return;
+
+	rcu_read_lock();
+	curr_blkcg = blkcg_from_current();
+	blkcg = blkcg_from_mapping(mapping);
+	if (curr_blkcg != blkcg) {
+		if (blkcg)
+			css_put(&blkcg->css);
+		css_get(&curr_blkcg->css);
+		rcu_assign_pointer(mapping->i_blkcg, curr_blkcg);
+	}
+	rcu_read_unlock();
+}
+
+/**
+ * blkcg_set_mapping_clean - clear the owner of a dirty mapping
+ * @mapping: target address space
+ *
+ * Unset the owner of @mapping when it becomes clean.
+ */
+
+void blkcg_set_mapping_clean(struct address_space *mapping)
+{
+	struct blkcg *blkcg;
+
+	rcu_read_lock();
+	blkcg = rcu_dereference(mapping->i_blkcg);
+	if (blkcg) {
+		css_put(&blkcg->css);
+		RCU_INIT_POINTER(mapping->i_blkcg, NULL);
+	}
+	rcu_read_unlock();
+}
 #endif

 /**
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 77c039a0ec25..d003d0593f41 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -58,6 +58,9 @@ struct wb_writeback_work {

 	struct list_head list;		/* pending work list */
 	struct wb_completion *done;	/* set if the caller waits */
+#ifdef CONFIG_CGROUP_WRITEBACK
+	struct blkcg *blkcg;
+#endif
 };

 /*
@@ -916,6 +919,29 @@ static int __init cgroup_writeback_init(void)
 }
 fs_initcall(cgroup_writeback_init);

+static void blkcg_set_sync_domain(struct wb_writeback_work *work)
+{
+	rcu_read_lock();
+	work->blkcg = blkcg_from_current();
+	rcu_read_unlock();
+}
+
+static bool blkcg_same_sync_domain(struct wb_writeback_work *work,
+				   struct address_space *mapping)
+{
+	struct blkcg *blkcg;
+
+	if (!work->blkcg || work->blkcg == &blkcg_root)
+		return true;
+	if (!test_bit(BLKCG_SYNC_ISOLATION, &work->blkcg->flags))
+		return true;
+	rcu_read_lock();
+	blkcg = blkcg_from_mapping(mapping);
+	rcu_read_unlock();
+
+	return blkcg == work->blkcg;
+}
+
 #else	/* CONFIG_CGROUP_WRITEBACK */

 static void bdi_down_write_wb_switch_rwsem(struct backing_dev_info *bdi) { }
@@ -959,6 +985,15 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
 	}
 }

+static void blkcg_set_sync_domain(struct wb_writeback_work *work)
+{
+}
+
+static bool blkcg_same_sync_domain(struct wb_writeback_work *work,
+				   struct address_space *mapping)
+{
+	return true;
+}
 #endif	/* CONFIG_CGROUP_WRITEBACK */

 /*
@@ -1131,7 +1166,7 @@ static int move_expired_inodes(struct list_head *delaying_queue,
 	LIST_HEAD(tmp);
 	struct list_head *pos, *node;
 	struct super_block *sb = NULL;
-	struct inode *inode;
+	struct inode *inode, *next;
 	int do_sb_sort = 0;
 	int moved = 0;

@@ -1141,11 +1176,12 @@ static int move_expired_inodes(struct list_head *delaying_queue,
 		expire_time = jiffies - (dirtytime_expire_interval * HZ);
 		older_than_this = &expire_time;
 	}
-	while (!list_empty(delaying_queue)) {
-		inode = wb_inode(delaying_queue->prev);
+	list_for_each_entry_safe(inode, next, delaying_queue, i_io_list) {
 		if (older_than_this &&
 		    inode_dirtied_after(inode, *older_than_this))
 			break;
+		if (!blkcg_same_sync_domain(work, inode->i_mapping))
+			continue;
 		list_move(&inode->i_io_list, &tmp);
 		moved++;
 		if (flags & EXPIRE_DIRTY_ATIME)
@@ -1560,6 +1596,15 @@ static long writeback_sb_inodes(struct super_block *sb,
 			break;
 		}

+		/*
+		 * Only write out inodes that belong to the blkcg that issued
+		 * the sync().
+		 */
+		if (!blkcg_same_sync_domain(work, inode->i_mapping)) {
+			redirty_tail(inode, wb);
+			continue;
+		}
+
 		/*
 		 * Don't bother with new inodes or inodes being freed, first
 		 * kind does not need periodic writeout yet, and for the latter
@@ -2447,6 +2492,7 @@ void sync_inodes_sb(struct super_block *sb)
 		return;
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));

+	blkcg_set_sync_domain(&work);
 	blkcg_start_wb_wait_on_bdi(bdi);

 	/* protect against inode wb switch, see inode_switch_wbs_work_fn() */
diff --git a/fs/inode.c b/fs/inode.c
index e9d97add2b36..b9659aaa8546 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -564,6 +564,7 @@ static void evict(struct inode *inode)
 		bd_forget(inode);
 	if (S_ISCHR(inode->i_mode) && inode->i_cdev)
 		cd_forget(inode);
+	blkcg_set_mapping_clean(&inode->i_data);

 	remove_inode_hash(inode);

diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 6ac5aa049334..a2bcc83c8c3e 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -441,6 +441,15 @@ extern void blkcg_destroy_blkgs(struct blkcg *blkcg);

 #ifdef CONFIG_CGROUP_WRITEBACK

+static inline struct blkcg *blkcg_from_mapping(struct address_space *mapping)
+{
+	WARN_ON_ONCE(!rcu_read_lock_held());
+	return rcu_dereference(mapping->i_blkcg);
+}
+
+void blkcg_set_mapping_dirty(struct address_space *mapping);
+void blkcg_set_mapping_clean(struct address_space *mapping);
+
 /**
  * blkcg_cgwb_get - get a reference for blkcg->cgwb_list
  * @blkcg: blkcg of interest
@@ -474,6 +483,19 @@ void blkcg_stop_wb_wait_on_bdi(struct backing_dev_info *bdi);

 #else

+static inline struct blkcg *blkcg_from_mapping(struct address_space *mapping)
+{
+	return NULL;
+}
+
+static inline void blkcg_set_mapping_dirty(struct address_space *mapping)
+{
+}
+
+static inline void blkcg_set_mapping_clean(struct address_space *mapping)
+{
+}
+
 static inline void blkcg_cgwb_get(struct blkcg *blkcg) { }

 static inline void blkcg_cgwb_put(struct blkcg *blkcg)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 08f26046233e..19e99b4a9fa2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -420,6 +420,7 @@ int pagecache_write_end(struct file *, struct address_space *mapping,
  * @nrpages: Number of page entries, protected by the i_pages lock.
  * @nrexceptional: Shadow or DAX entries, protected by the i_pages lock.
  * @writeback_index: Writeback starts here.
+ * @i_blkcg: blkcg owner (that dirtied the address_space)
  * @a_ops: Methods.
  * @flags: Error bits and flags (AS_*).
  * @wb_err: The most recent error which has occurred.
@@ -438,6 +439,9 @@ struct address_space {
 	unsigned long		nrexceptional;
 	pgoff_t			writeback_index;
 	const struct address_space_operations *a_ops;
+#ifdef CONFIG_CGROUP_WRITEBACK
+	struct blkcg __rcu	*i_blkcg;
+#endif
 	unsigned long		flags;
 	errseq_t		wb_err;
 	spinlock_t		private_lock;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 9f61dfec6a1f..e16574f946a7 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2418,6 +2418,7 @@ void account_page_dirtied(struct page *page, struct address_space *mapping)
 		inode_attach_wb(inode, page);
 		wb = inode_to_wb(inode);

+		blkcg_set_mapping_dirty(mapping);
 		__inc_lruvec_page_state(page, NR_FILE_DIRTY);
 		__inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
 		__inc_node_page_state(page, NR_DIRTIED);
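
Not part of the patch: the ownership rule and the sync-domain check that
this patch adds can be tried outside the kernel with a small userspace
model. The sketch below is only an illustration under simplifying
assumptions. Every model_* name and MODEL_SYNC_ISOLATION are hypothetical
stand-ins invented here; RCU, css reference counting and real writeback
are not modeled, and "writing back" a mapping just clears its dirty flag.
The decision order in model_same_sync_domain() is meant to follow
blkcg_same_sync_domain() from the diff above.

```c
/* Userspace model only: no RCU, no css refcounts, no real writeback. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MODEL_SYNC_ISOLATION (1UL << 0)	/* stands in for BLKCG_SYNC_ISOLATION */

struct model_blkcg {
	const char *name;
	unsigned long flags;
};

struct model_mapping {
	bool dirty;
	struct model_blkcg *owner;	/* stands in for mapping->i_blkcg */
};

/* Stand-in for blkcg_root: a sync() from the root is never filtered. */
static struct model_blkcg model_root = { "root", 0 };

/* The first cgroup that dirties a clean mapping becomes its owner. */
static void model_set_mapping_dirty(struct model_mapping *m,
				    struct model_blkcg *curr)
{
	if (m->dirty)
		return;
	m->dirty = true;
	m->owner = curr;
}

/* Same decision order as blkcg_same_sync_domain() in the patch. */
static bool model_same_sync_domain(const struct model_blkcg *syncer,
				   const struct model_mapping *m)
{
	if (!syncer || syncer == &model_root)
		return true;
	if (!(syncer->flags & MODEL_SYNC_ISOLATION))
		return true;
	return m->owner == syncer;
}

/* "Write back" every dirty mapping that falls in the syncer's domain. */
static size_t model_sync(const struct model_blkcg *syncer,
			 struct model_mapping *maps, size_t n)
{
	size_t written = 0;

	for (size_t i = 0; i < n; i++) {
		if (!maps[i].dirty || !model_same_sync_domain(syncer, &maps[i]))
			continue;
		maps[i].dirty = false;
		written++;
	}
	return written;
}

#ifdef SYNC_MODEL_DEMO
int main(void)
{
	struct model_blkcg a = { "A", MODEL_SYNC_ISOLATION };
	struct model_blkcg b = { "B", 0 };
	struct model_mapping maps[3] = { { false, NULL } };

	model_set_mapping_dirty(&maps[0], &a);
	model_set_mapping_dirty(&maps[1], &b);
	model_set_mapping_dirty(&maps[2], &b);

	size_t from_a = model_sync(&a, maps, 3);	/* isolated: own mapping only */
	size_t from_b = model_sync(&b, maps, 3);	/* not isolated: all that is left */

	assert(from_a == 1 && from_b == 2);
	printf("%s wrote %zu, %s wrote %zu\n", a.name, from_a, b.name, from_b);
	return 0;
}
#endif
```

Building with -DSYNC_MODEL_DEMO enables the small driver at the bottom.
In the model, a sync from the isolated cgroup A touches only the mapping
that A dirtied, while a sync from B, which does not have the flag set,
falls back to the old behavior and writes out everything still dirty.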