From patchwork Wed Nov 7 16:39:19 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tejun Heo X-Patchwork-Id: 1711611 Return-Path: X-Original-To: patchwork-linux-pm@patchwork.kernel.org Delivered-To: patchwork-process-083081@patchwork2.kernel.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by patchwork2.kernel.org (Postfix) with ESMTP id A01F6DFB7A for ; Wed, 7 Nov 2012 16:39:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754098Ab2KGQj1 (ORCPT ); Wed, 7 Nov 2012 11:39:27 -0500 Received: from mail-da0-f46.google.com ([209.85.210.46]:37893 "EHLO mail-da0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753916Ab2KGQj0 (ORCPT ); Wed, 7 Nov 2012 11:39:26 -0500 Received: by mail-da0-f46.google.com with SMTP id n41so759920dak.19 for ; Wed, 07 Nov 2012 08:39:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=F3sJHuFRBduunfheBGRUpHh2r385/Q8XiH/O8SaQ4Lk=; b=num3T3aKoJHwiNwhoAZalkJ9Hmy2Js3e5CadU/ut5qo6zrZWZJQ7tidWrnJzbw3lBR +eUgIEY8+jT/GBgWVML7YNUVOOHVouY/oX9oH8RcIveyxgMeGiyTZjlUYiyO//vChFe1 vLCyzD0152Id0NjhoQFp1yylDUAeLOww6iBAt6nTjf08aVCfrq2YrmL93LQsRk7LwVDB tSWJMU7xCxnUGE+PYb3ZCq1cTvUvKXQJ2XyCD6NSYy7QpN9glPFO+yPY+L4HX8bowoC9 aCSFAIKo64l1a9WVSIRb7J+/hgbQ7Ic+Vq5eSam6nYXRTWYwXQCcDFgUN3bD8p/EcefP 2GDg== Received: by 10.68.228.36 with SMTP id sf4mr15359086pbc.20.1352306365444; Wed, 07 Nov 2012 08:39:25 -0800 (PST) Received: from mtj.dyndns.org (50-78-106-165-static.hfc.comcastbusiness.net. [50.78.106.165]) by mx.google.com with ESMTPS id ph7sm12927478pbb.9.2012.11.07.08.39.23 (version=SSLv3 cipher=OTHER); Wed, 07 Nov 2012 08:39:24 -0800 (PST) Date: Wed, 7 Nov 2012 08:39:19 -0800 From: Tejun Heo To: lizefan@huawei.com, mhocko@suse.cz, rjw@sisk.pl Cc: containers@lists.linux-foundation.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, fweisbec@gmail.com Subject: [PATCH 9/9 v2] cgroup_freezer: implement proper hierarchy support Message-ID: <20121107163919.GC2660@mtj.dyndns.org> References: <1351931915-1701-1-git-send-email-tj@kernel.org> <1351931915-1701-10-git-send-email-tj@kernel.org> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <1351931915-1701-10-git-send-email-tj@kernel.org> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Up until now, cgroup_freezer didn't implement hierarchy properly. cgroups could be arranged in hierarchy but it didn't make any difference in how each cgroup_freezer behaved. They all operated separately. This patch implements proper hierarchy support. If a cgroup is frozen, all its descendants are frozen. A cgroup is thawed iff it and all its ancestors are THAWED. freezer.self_freezing shows the current freezing state for the cgroup itself. freezer.parent_freezing shows whether the cgroup is freezing because any of its ancestors is freezing. freezer_post_create() locks the parent and new cgroup and inherits the parent's state and freezer_change_state() applies new state top-down using cgroup_for_each_descendant_pre() which guarantees that no child can escape its parent's state. update_if_frozen() uses cgroup_for_each_descendant_post() to propagate frozen states bottom-up. Synchronization could be coarser and easier by using a single mutex to protect all hierarchy operations. Finer grained approach was used because it wasn't too difficult for cgroup_freezer and I think it's beneficial to have an example implementation and cgroup_freezer is rather simple and can serve a good one. As this makes cgroup_freezer properly hierarchical, freezer_subsys.broken_hierarchy marking is removed. Note that this patch changes userland visible behavior - freezing a cgroup now freezes all its descendants too. This behavior change is intended and has been warned via .broken_hierarchy. v2: Michal spotted a bug in freezer_change_state() - descendants were inheriting from the wrong ancestor. Fixed. Signed-off-by: Tejun Heo --- kernel/cgroup_freezer.c | 161 ++++++++++++++++++++++++++++++++++++------------ 1 file changed, 123 insertions(+), 38 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html --- a/kernel/cgroup_freezer.c +++ b/kernel/cgroup_freezer.c @@ -22,6 +22,13 @@ #include #include +/* + * A cgroup is freezing if any FREEZING flags are set. FREEZING_SELF is + * set if "FROZEN" is written to freezer.state cgroupfs file, and cleared + * for "THAWED". FREEZING_PARENT is set if the parent freezer is FREEZING + * for whatever reason. IOW, a cgroup has FREEZING_PARENT set if one of + * its ancestors has FREEZING_SELF set. + */ enum freezer_state_flags { CGROUP_FREEZER_ONLINE = (1 << 0), /* freezer is fully online */ CGROUP_FREEZING_SELF = (1 << 1), /* this freezer is freezing */ @@ -50,6 +57,15 @@ static inline struct freezer *task_freez struct freezer, css); } +static struct freezer *parent_freezer(struct freezer *freezer) +{ + struct cgroup *pcg = freezer->css.cgroup->parent; + + if (pcg) + return cgroup_freezer(pcg); + return NULL; +} + bool cgroup_freezing(struct task_struct *task) { bool ret; @@ -74,17 +90,6 @@ static const char *freezer_state_strs(un return "THAWED"; }; -/* - * State diagram - * Transitions are caused by userspace writes to the freezer.state file. - * The values in parenthesis are state labels. The rest are edge labels. - * - * (THAWED) --FROZEN--> (FREEZING) --FROZEN--> (FROZEN) - * ^ ^ | | - * | \_______THAWED_______/ | - * \__________________________THAWED____________/ - */ - struct cgroup_subsys freezer_subsys; static struct cgroup_subsys_state *freezer_create(struct cgroup *cgroup) @@ -103,15 +108,34 @@ static struct cgroup_subsys_state *freez * freezer_post_create - commit creation of a freezer cgroup * @cgroup: cgroup being created * - * We're committing to creation of @cgroup. Mark it online. + * We're committing to creation of @cgroup. Mark it online and inherit + * parent's freezing state while holding both parent's and our + * freezer->lock. */ static void freezer_post_create(struct cgroup *cgroup) { struct freezer *freezer = cgroup_freezer(cgroup); + struct freezer *parent = parent_freezer(freezer); + + /* + * The following double locking and freezing state inheritance + * guarantee that @cgroup can never escape ancestors' freezing + * states. See cgroup_for_each_descendant_pre() for details. + */ + if (parent) + spin_lock_irq(&parent->lock); + spin_lock_nested(&freezer->lock, SINGLE_DEPTH_NESTING); - spin_lock_irq(&freezer->lock); freezer->state |= CGROUP_FREEZER_ONLINE; - spin_unlock_irq(&freezer->lock); + + if (parent && (parent->state & CGROUP_FREEZING)) { + freezer->state |= CGROUP_FREEZING_PARENT | CGROUP_FROZEN; + atomic_inc(&system_freezing_cnt); + } + + spin_unlock(&freezer->lock); + if (parent) + spin_unlock_irq(&parent->lock); } /** @@ -153,6 +177,7 @@ static void freezer_attach(struct cgroup { struct freezer *freezer = cgroup_freezer(new_cgrp); struct task_struct *task; + bool clear_frozen = false; spin_lock_irq(&freezer->lock); @@ -172,10 +197,25 @@ static void freezer_attach(struct cgroup } else { freeze_task(task); freezer->state &= ~CGROUP_FROZEN; + clear_frozen = true; } } spin_unlock_irq(&freezer->lock); + + /* + * Propagate FROZEN clearing upwards. We may race with + * update_if_frozen(), but as long as both work bottom-up, either + * update_if_frozen() sees child's FROZEN cleared or we clear the + * parent's FROZEN later. No parent w/ !FROZEN children can be + * left FROZEN. + */ + while (clear_frozen && (freezer = parent_freezer(freezer))) { + spin_lock_irq(&freezer->lock); + freezer->state &= ~CGROUP_FROZEN; + clear_frozen = freezer->state & CGROUP_FREEZING; + spin_unlock_irq(&freezer->lock); + } } static void freezer_fork(struct task_struct *task) @@ -200,24 +240,47 @@ out: rcu_read_unlock(); } -/* - * We change from FREEZING to FROZEN lazily if the cgroup was only - * partially frozen when we exitted write. Caller must hold freezer->lock. +/** + * update_if_frozen - update whether a cgroup finished freezing + * @cgroup: cgroup of interest + * + * Once FREEZING is initiated, transition to FROZEN is lazily updated by + * calling this function. If the current state is FREEZING but not FROZEN, + * this function checks whether all tasks of this cgroup and the descendant + * cgroups finished freezing and, if so, sets FROZEN. + * + * The caller is responsible for grabbing RCU read lock and calling + * update_if_frozen() on all descendants prior to invoking this function. * * Task states and freezer state might disagree while tasks are being * migrated into or out of @cgroup, so we can't verify task states against * @freezer state here. See freezer_attach() for details. */ -static void update_if_frozen(struct freezer *freezer) +static void update_if_frozen(struct cgroup *cgroup) { - struct cgroup *cgroup = freezer->css.cgroup; + struct freezer *freezer = cgroup_freezer(cgroup); + struct cgroup *pos; struct cgroup_iter it; struct task_struct *task; + WARN_ON_ONCE(!rcu_read_lock_held()); + + spin_lock_irq(&freezer->lock); + if (!(freezer->state & CGROUP_FREEZING) || (freezer->state & CGROUP_FROZEN)) - return; + goto out_unlock; + /* are all (live) children frozen? */ + cgroup_for_each_child(pos, cgroup) { + struct freezer *child = cgroup_freezer(pos); + + if ((child->state & CGROUP_FREEZER_ONLINE) && + !(child->state & CGROUP_FROZEN)) + goto out_unlock; + } + + /* are all tasks frozen? */ cgroup_iter_start(cgroup, &it); while ((task = cgroup_iter_next(cgroup, &it))) { @@ -229,27 +292,32 @@ static void update_if_frozen(struct free * the usual frozen condition. */ if (!frozen(task) && !freezer_should_skip(task)) - goto notyet; + goto out_iter_end; } } freezer->state |= CGROUP_FROZEN; -notyet: +out_iter_end: cgroup_iter_end(cgroup, &it); +out_unlock: + spin_unlock_irq(&freezer->lock); } static int freezer_read(struct cgroup *cgroup, struct cftype *cft, struct seq_file *m) { - struct freezer *freezer = cgroup_freezer(cgroup); - unsigned int state; + struct cgroup *pos; - spin_lock_irq(&freezer->lock); - update_if_frozen(freezer); - state = freezer->state; - spin_unlock_irq(&freezer->lock); + rcu_read_lock(); - seq_puts(m, freezer_state_strs(state)); + /* update states bottom-up */ + cgroup_for_each_descendant_post(pos, cgroup) + update_if_frozen(pos); + update_if_frozen(cgroup); + + rcu_read_unlock(); + + seq_puts(m, freezer_state_strs(cgroup_freezer(cgroup)->state)); seq_putc(m, '\n'); return 0; } @@ -320,14 +388,39 @@ static void freezer_apply_state(struct f * @freezer: freezer of interest * @freeze: whether to freeze or thaw * - * Freeze or thaw @cgroup according to @freeze. + * Freeze or thaw @freezer according to @freeze. The operations are + * recursive - all descendants of @freezer will be affected. */ static void freezer_change_state(struct freezer *freezer, bool freeze) { + struct cgroup *pos; + /* update @freezer */ spin_lock_irq(&freezer->lock); freezer_apply_state(freezer, freeze, CGROUP_FREEZING_SELF); spin_unlock_irq(&freezer->lock); + + /* + * Update all its descendants in pre-order traversal. Each + * descendant will try to inherit its parent's FREEZING state as + * CGROUP_FREEZING_PARENT. + */ + rcu_read_lock(); + cgroup_for_each_descendant_pre(pos, freezer->css.cgroup) { + struct freezer *pos_f = cgroup_freezer(pos); + struct freezer *parent = parent_freezer(pos_f); + + /* + * Our update to @parent->state is already visible which is + * all we need. No need to lock @parent. For more info on + * synchronization, see freezer_post_create(). + */ + spin_lock_irq(&pos_f->lock); + freezer_apply_state(pos_f, parent->state & CGROUP_FREEZING, + CGROUP_FREEZING_PARENT); + spin_unlock_irq(&pos_f->lock); + } + rcu_read_unlock(); } static int freezer_write(struct cgroup *cgroup, struct cftype *cft, @@ -390,12 +483,4 @@ struct cgroup_subsys freezer_subsys = { .attach = freezer_attach, .fork = freezer_fork, .base_cftypes = files, - - /* - * freezer subsys doesn't handle hierarchy at all. Frozen state - * should be inherited through the hierarchy - if a parent is - * frozen, all its children should be frozen. Fix it and remove - * the following. - */ - .broken_hierarchy = true, };