From patchwork Wed May 13 02:00:21 2009
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Gui Jianfeng
X-Patchwork-Id: 26255
Received: from hormel.redhat.com (hormel1.redhat.com [209.132.177.33])
	by demeter.kernel.org (8.14.2/8.14.2) with ESMTP id n4R0SZfX010716
	for ; Wed, 27 May 2009 00:28:36 GMT
Received: from listman.util.phx.redhat.com (listman.util.phx.redhat.com [10.8.4.110])
	by hormel.redhat.com (Postfix) with ESMTP id D68BB8E013A;
	Tue, 26 May 2009 20:28:34 -0400 (EDT)
Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254])
	by listman.util.phx.redhat.com (8.13.1/8.13.1) with ESMTP id n4D21ZRU005752
	for ; Tue, 12 May 2009 22:01:36 -0400
Received: from mx3.redhat.com (mx3.redhat.com [172.16.48.32])
	by int-mx1.corp.redhat.com (8.13.1/8.13.1) with ESMTP id n4D21XEj031148;
	Tue, 12 May 2009 22:01:33 -0400
Received: from song.cn.fujitsu.com (cn.fujitsu.com [222.73.24.84] (may be forged))
	by mx3.redhat.com (8.13.8/8.13.8) with ESMTP id n4D21Jqm024147;
	Tue, 12 May 2009 22:01:19 -0400
Received: from tang.cn.fujitsu.com (tang.cn.fujitsu.com [10.167.250.3])
	by song.cn.fujitsu.com (Postfix) with ESMTP id 111A0170142;
	Wed, 13 May 2009 10:02:21 +0800 (CST)
Received: from fnst.cn.fujitsu.com (localhost.localdomain [127.0.0.1])
	by tang.cn.fujitsu.com (8.13.1/8.13.1) with ESMTP id n4D21D3B028701;
	Wed, 13 May 2009 10:01:13 +0800
Received: from [127.0.0.1] (unknown [10.167.141.226])
	by fnst.cn.fujitsu.com (Postfix) with ESMTPA id 4BE7BD4038;
	Wed, 13 May 2009 10:10:28 +0800 (CST)
Message-ID: <4A0A29B5.7030109@cn.fujitsu.com>
Date: Wed, 13 May 2009 10:00:21 +0800
From: Gui Jianfeng
User-Agent: Thunderbird 2.0.0.5 (Windows/20070716)
MIME-Version: 1.0
To: Vivek Goyal
References: <1241553525-28095-1-git-send-email-vgoyal@redhat.com>
In-Reply-To: <1241553525-28095-1-git-send-email-vgoyal@redhat.com>
X-RedHat-Spam-Score: -0.697
X-Scanned-By: MIMEDefang 2.58 on 172.16.52.254
X-Scanned-By:
 MIMEDefang 2.63 on 172.16.48.32
X-loop: dm-devel@redhat.com
X-Mailman-Approved-At: Tue, 26 May 2009 20:28:27 -0400
Cc: dhaval@linux.vnet.ibm.com, snitzer@redhat.com, dm-devel@redhat.com,
	dpshah@google.com, jens.axboe@oracle.com, agk@redhat.com,
	balbir@linux.vnet.ibm.com, paolo.valente@unimore.it,
	fernando@oss.ntt.co.jp, mikew@google.com, jmoyer@redhat.com,
	nauman@google.com, m-ikeda@ds.jp.nec.com, lizf@cn.fujitsu.com,
	fchecconi@gmail.com, s-uchida@ap.jp.nec.com,
	containers@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, righi.andrea@gmail.com
Subject: [dm-devel] [PATCH] IO Controller: Add per-device weight and
	ioprio_class handling
X-BeenThere: dm-devel@redhat.com
X-Mailman-Version: 2.1.5
Precedence: junk
Reply-To: device-mapper development
List-Id: device-mapper development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: dm-devel-bounces@redhat.com
Errors-To: dm-devel-bounces@redhat.com

Hi Vivek,

This patch enables per-cgroup per-device weight and ioprio_class handling.
A new cgroup interface "policy" is introduced. You can use this file to
configure weight and ioprio_class for each device in a given cgroup.
The original "weight" and "ioprio_class" files are still available. If a
device has no specific configuration, "weight" and "ioprio_class" are used
as the default values for that device.

You can use the following format to play with the new interface.
# echo DEV:weight:ioprio_class > /path/to/cgroup/policy
weight=0 means removing the policy for DEV.

Examples:
Configure weight=300 ioprio_class=2 on /dev/hdb in this cgroup
# echo /dev/hdb:300:2 > io.policy
# cat io.policy
dev weight class
/dev/hdb 300 2

Configure weight=500 ioprio_class=1 on /dev/hda in this cgroup
# echo /dev/hda:500:1 > io.policy
# cat io.policy
dev weight class
/dev/hda 500 1
/dev/hdb 300 2

Remove the policy for /dev/hda in this cgroup
# echo /dev/hda:0:1 > io.policy
# cat io.policy
dev weight class
/dev/hdb 300 2

Signed-off-by: Gui Jianfeng
---
 block/elevator-fq.c | 239 +++++++++++++++++++++++++++++++++++++++++++++++++-
 block/elevator-fq.h | 11 +++
 2 files changed, 245 insertions(+), 5 deletions(-)

diff --git a/block/elevator-fq.c b/block/elevator-fq.c
index 69435ab..7c95d55 100644
--- a/block/elevator-fq.c
+++ b/block/elevator-fq.c
@@ -12,6 +12,9 @@
 #include "elevator-fq.h"
 #include
 #include
+#include
+#include
+
 
 /* Values taken from cfq */
 const int elv_slice_sync = HZ / 10;
@@ -1045,12 +1048,30 @@ struct io_group *io_lookup_io_group_current(struct request_queue *q)
 }
 EXPORT_SYMBOL(io_lookup_io_group_current);
 
-void io_group_init_entity(struct io_cgroup *iocg, struct io_group *iog)
+static struct policy_node *policy_search_node(const struct io_cgroup *iocg,
+					      void *key);
+
+void io_group_init_entity(struct io_cgroup *iocg, struct io_group *iog,
+			  void *key)
 {
 	struct io_entity *entity = &iog->entity;
+	struct policy_node *pn;
+
+	spin_lock_irq(&iocg->lock);
+	pn = policy_search_node(iocg, key);
+	if (pn) {
+		entity->weight = pn->weight;
+		entity->new_weight = pn->weight;
+		entity->ioprio_class = pn->ioprio_class;
+		entity->new_ioprio_class = pn->ioprio_class;
+	} else {
+		entity->weight = iocg->weight;
+		entity->new_weight = iocg->weight;
+		entity->ioprio_class = iocg->ioprio_class;
+		entity->new_ioprio_class = iocg->ioprio_class;
+	}
+	spin_unlock_irq(&iocg->lock);
 
-	entity->weight = entity->new_weight = iocg->weight;
-	entity->ioprio_class = entity->new_ioprio_class = iocg->ioprio_class;
 	entity->ioprio_changed = 1;
 	entity->my_sched_data = &iog->sched_data;
 }
@@ -1263,7 +1284,7 @@ struct io_group *io_group_chain_alloc(struct request_queue *q, void *key,
 		atomic_set(&iog->ref, 0);
 		iog->deleting = 0;
 
-		io_group_init_entity(iocg, iog);
+		io_group_init_entity(iocg, iog, key);
 		iog->my_entity = &iog->entity;
 #ifdef CONFIG_DEBUG_GROUP_IOSCHED
 		iog->iocg_id = css_id(&iocg->css);
@@ -1549,8 +1570,208 @@ struct io_group *io_alloc_root_group(struct request_queue *q,
 	return iog;
 }
 
+static int io_cgroup_policy_read(struct cgroup *cgrp, struct cftype *cft,
+				  struct seq_file *m)
+{
+	struct io_cgroup *iocg;
+	struct policy_node *pn;
+
+	iocg = cgroup_to_io_cgroup(cgrp);
+
+	if (list_empty(&iocg->list))
+		goto out;
+
+	seq_printf(m, "dev weight class\n");
+
+	spin_lock_irq(&iocg->lock);
+	list_for_each_entry(pn, &iocg->list, node) {
+		seq_printf(m, "%s %lu %lu\n", pn->dev_name,
+			   pn->weight, pn->ioprio_class);
+	}
+	spin_unlock_irq(&iocg->lock);
+out:
+	return 0;
+}
+
+static inline void policy_insert_node(struct io_cgroup *iocg,
+				      struct policy_node *pn)
+{
+	list_add(&pn->node, &iocg->list);
+}
+
+/* Must be called with iocg->lock held */
+static inline void policy_delete_node(struct policy_node *pn)
+{
+	list_del(&pn->node);
+}
+
+/* Must be called with iocg->lock held */
+static struct policy_node *policy_search_node(const struct io_cgroup *iocg,
+					      void *key)
+{
+	struct policy_node *pn;
+
+	if (list_empty(&iocg->list))
+		return NULL;
+
+	list_for_each_entry(pn, &iocg->list, node) {
+		if (pn->key == key)
+			return pn;
+	}
+
+	return NULL;
+}
+
+static void *devname_to_efqd(const char *buf)
+{
+	struct block_device *bdev;
+	void *key = NULL;
+	struct gendisk *disk;
+	int part;
+
+	bdev = lookup_bdev(buf);
+	if (IS_ERR(bdev))
+		return NULL;
+
+	disk = get_gendisk(bdev->bd_dev, &part);
+	key = (void *)&disk->queue->elevator->efqd;
+	bdput(bdev);
+
+	return key;
+}
+
+static int policy_parse_and_set(char *buf, struct policy_node *newpn)
+{
+	char *s[3];
+	char *p;
+	int ret;
+	int i = 0;
+
+	memset(s, 0, sizeof(s));
+	while (i < ARRAY_SIZE(s)) {
+		p = strsep(&buf, ":");
+		if (!p)
+			break;
+		if (!*p)
+			continue;
+		s[i++] = p;
+	}
+
+	newpn->key = devname_to_efqd(s[0]);
+	if (!newpn->key)
+		return -EINVAL;
+
+	strcpy(newpn->dev_name, s[0]);
+
+	ret = strict_strtoul(s[1], 10, &newpn->weight);
+	if (ret || newpn->weight > WEIGHT_MAX)
+		return -EINVAL;
+
+	ret = strict_strtoul(s[2], 10, &newpn->ioprio_class);
+	if (ret || newpn->ioprio_class < IOPRIO_CLASS_RT ||
+	    newpn->ioprio_class > IOPRIO_CLASS_IDLE)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int io_cgroup_policy_write(struct cgroup *cgrp, struct cftype *cft,
+				   const char *buffer)
+{
+	struct io_cgroup *iocg;
+	struct policy_node *newpn, *pn;
+	char *buf;
+	int ret = 0;
+	int keep_newpn = 0;
+	struct hlist_node *n;
+	struct io_group *iog;
+
+	buf = kstrdup(buffer, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	newpn = kzalloc(sizeof(*newpn), GFP_KERNEL);
+	if (!newpn) {
+		ret = -ENOMEM;
+		goto free_buf;
+	}
+
+	ret = policy_parse_and_set(buf, newpn);
+	if (ret)
+		goto free_newpn;
+
+	if (!cgroup_lock_live_group(cgrp)) {
+		ret = -ENODEV;
+		goto free_newpn;
+	}
+
+	iocg = cgroup_to_io_cgroup(cgrp);
+	spin_lock_irq(&iocg->lock);
+
+	pn = policy_search_node(iocg, newpn->key);
+	if (!pn) {
+		if (newpn->weight != 0) {
+			policy_insert_node(iocg, newpn);
+			keep_newpn = 1;
+		}
+		goto update_io_group;
+	}
+
+	if (newpn->weight == 0) {
+		/* weight == 0 means deleting a policy */
+		policy_delete_node(pn);
+		goto update_io_group;
+	}
+
+	pn->weight = newpn->weight;
+	pn->ioprio_class = newpn->ioprio_class;
+
+update_io_group:
+	hlist_for_each_entry(iog, n, &iocg->group_data, group_node) {
+		if (iog->key == newpn->key) {
+			if (newpn->weight) {
+				iog->entity.new_weight = newpn->weight;
+				iog->entity.new_ioprio_class =
+					newpn->ioprio_class;
+				/*
+				 * iog weight and ioprio_class updating
+				 * actually happens if ioprio_changed is set.
+				 * So ensure ioprio_changed is not set until
+				 * new weight and new ioprio_class are updated.
+				 */
+				smp_wmb();
+				iog->entity.ioprio_changed = 1;
+			} else {
+				iog->entity.new_weight = iocg->weight;
+				iog->entity.new_ioprio_class =
+					iocg->ioprio_class;
+
+				/* The same as above */
+				smp_wmb();
+				iog->entity.ioprio_changed = 1;
+			}
+		}
+	}
+	spin_unlock_irq(&iocg->lock);
+
+	cgroup_unlock();
+
+free_newpn:
+	if (!keep_newpn)
+		kfree(newpn);
+free_buf:
+	kfree(buf);
+	return ret;
+}
+
 struct cftype bfqio_files[] = {
 	{
+		.name = "policy",
+		.read_seq_string = io_cgroup_policy_read,
+		.write_string = io_cgroup_policy_write,
+		.max_write_len = 256,
+	},
+	{
 		.name = "weight",
 		.read_u64 = io_cgroup_weight_read,
 		.write_u64 = io_cgroup_weight_write,
@@ -1592,6 +1813,7 @@ struct cgroup_subsys_state *iocg_create(struct cgroup_subsys *subsys,
 	INIT_HLIST_HEAD(&iocg->group_data);
 	iocg->weight = IO_DEFAULT_GRP_WEIGHT;
 	iocg->ioprio_class = IO_DEFAULT_GRP_CLASS;
+	INIT_LIST_HEAD(&iocg->list);
 
 	return &iocg->css;
 }
@@ -1750,6 +1972,7 @@ void iocg_destroy(struct cgroup_subsys *subsys, struct cgroup *cgroup)
 	unsigned long flags, flags1;
 	int queue_lock_held = 0;
 	struct elv_fq_data *efqd;
+	struct policy_node *pn, *pntmp;
 
 	/*
 	 * io groups are linked in two lists. One list is maintained
@@ -1823,6 +2046,12 @@ locked:
 	BUG_ON(!hlist_empty(&iocg->group_data));
 
 	free_css_id(&io_subsys, &iocg->css);
+
+	list_for_each_entry_safe(pn, pntmp, &iocg->list, node) {
+		policy_delete_node(pn);
+		kfree(pn);
+	}
+
 	kfree(iocg);
 }
 
@@ -2137,7 +2366,7 @@ void elv_fq_unset_request_ioq(struct request_queue *q, struct request *rq)
 void bfq_init_entity(struct io_entity *entity, struct io_group *iog)
 {
 	entity->ioprio = entity->new_ioprio;
-	entity->weight = entity->new_weight;
+	entity->weight = entity->new_weight;
 	entity->ioprio_class = entity->new_ioprio_class;
 	entity->sched_data = &iog->sched_data;
 }
diff --git a/block/elevator-fq.h b/block/elevator-fq.h
index db3a347..0407633 100644
--- a/block/elevator-fq.h
+++ b/block/elevator-fq.h
@@ -253,6 +253,14 @@ struct io_group {
 #endif
 };
 
+struct policy_node {
+	struct list_head node;
+	char dev_name[32];
+	void *key;
+	unsigned long weight;
+	unsigned long ioprio_class;
+};
+
 /**
  * struct bfqio_cgroup - bfq cgroup data structure.
  * @css: subsystem state for bfq in the containing cgroup.
@@ -269,6 +277,9 @@ struct io_cgroup {
 
 	unsigned long weight, ioprio_class;
 
+	/* list of policy_node */
+	struct list_head list;
+
 	spinlock_t lock;
 	struct hlist_head group_data;
 };
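
For anyone who wants to try the "DEV:weight:ioprio_class" format outside the
kernel, the parsing and range checks done by policy_parse_and_set() can be
sketched in userspace C roughly as below. This is only an illustration under
stated assumptions, not code from the patch: sketch_policy,
sketch_parse_ulong(), sketch_policy_parse() and the SKETCH_* macros are
hypothetical names, the weight limit of 1000 is an assumed stand-in because
the value of WEIGHT_MAX is not shown in the patch, and the device name is
only copied, not resolved with lookup_bdev(). The sketch also adds explicit
checks for missing fields and for device names that do not fit in the
32-byte dev_name buffer; no such checks are visible in the posted code.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Assumed limit; the real WEIGHT_MAX value is not shown in the patch. */
#define SKETCH_WEIGHT_MAX 1000UL
/* Same numeric values as IOPRIO_CLASS_RT and IOPRIO_CLASS_IDLE. */
#define SKETCH_CLASS_RT   1UL
#define SKETCH_CLASS_IDLE 3UL

/* Hypothetical userspace mirror of the patch's struct policy_node. */
struct sketch_policy {
	char dev_name[32];
	unsigned long weight;
	unsigned long ioprio_class;
};

/* Strict decimal parse: digits only, no sign, no trailing characters. */
static int sketch_parse_ulong(const char *s, unsigned long *out)
{
	char *end;

	if (!isdigit((unsigned char)s[0]))
		return -EINVAL;
	errno = 0;
	*out = strtoul(s, &end, 10);
	if (errno || *end != '\0')
		return -EINVAL;
	return 0;
}

/* Parse "DEV:weight:ioprio_class"; returns 0 or -EINVAL. */
int sketch_policy_parse(const char *input, struct sketch_policy *out)
{
	char buf[256];	/* same bound as .max_write_len in the patch */
	char *field[3];
	char *cursor = buf;
	unsigned long weight, ioprio_class;
	size_t len = strlen(input);
	int i;

	if (len >= sizeof(buf))
		return -EINVAL;
	memcpy(buf, input, len + 1);
	/* Drop the trailing newline that "echo" appends. */
	if (len && buf[len - 1] == '\n')
		buf[len - 1] = '\0';

	/* Split into exactly three colon-separated fields. */
	for (i = 0; i < 3; i++) {
		char *colon = strchr(cursor, ':');

		field[i] = cursor;
		if (i < 2) {
			if (!colon)
				return -EINVAL;	/* missing field */
			*colon = '\0';
			cursor = colon + 1;
		} else if (colon) {
			return -EINVAL;		/* too many fields */
		}
	}

	if (field[0][0] == '\0' || strlen(field[0]) >= sizeof(out->dev_name))
		return -EINVAL;
	if (sketch_parse_ulong(field[1], &weight) || weight > SKETCH_WEIGHT_MAX)
		return -EINVAL;
	if (sketch_parse_ulong(field[2], &ioprio_class) ||
	    ioprio_class < SKETCH_CLASS_RT || ioprio_class > SKETCH_CLASS_IDLE)
		return -EINVAL;

	strcpy(out->dev_name, field[0]);	/* length checked above */
	out->weight = weight;			/* 0 means "remove policy" */
	out->ioprio_class = ioprio_class;
	return 0;
}
```

With this sketch, sketch_policy_parse("/dev/hdb:300:2", &p) returns 0 and
fills in weight 300 and class 2, while "/dev/hda:500" is rejected with
-EINVAL because the class field is missing.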