From patchwork Wed Jul 22 08:39:41 2009
X-Patchwork-Submitter: KAMEZAWA Hiroyuki
X-Patchwork-Id: 36725
Date: Wed, 22 Jul 2009 17:39:41 +0900
From: KAMEZAWA Hiroyuki
To: Ryo Tsuruta
Message-Id: <20090722173941.7608387e.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20090722.172843.193696974.ryov@valinux.co.jp>
References: <20090721155636.GB25127@balbir.in.ibm.com>
	<20090722102058.c1cf731b.kamezawa.hiroyu@jp.fujitsu.com>
	<20090722110739.e00c0f18.kamezawa.hiroyu@jp.fujitsu.com>
	<20090722.172843.193696974.ryov@valinux.co.jp>
Organization: FUJITSU Co. LTD.
Subject: [dm-devel] Re: [PATCH 3/9] blkio-cgroup-v9: The new page_cgroup framework
Cc: xen-devel@lists.xensource.com, containers@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org,
	dm-devel@redhat.com, agk@redhat.com, balbir@linux.vnet.ibm.com

On Wed, 22 Jul 2009 17:28:43 +0900 (JST)
Ryo Tsuruta wrote:
> > But, the following is more straightforward (and what you do is not
> > different from this):
> > ==
> > struct page {
> > 	.....
> > #ifdef CONFIG_BLOCKIO_CGROUP
> > 	void *blockio_cgroup;
> > #endif
> > }
> > ==
>
> This increases the size of struct page. Could I get a consensus on
> this approach?
>
Just God knows ;)

To be honest, what I have been expecting from the blockio cgroup people
is something like the following, to make room for themselves in
page_cgroup. I'm now thinking of doing this myself and offering that
room to you, because the terrible bugs are gone now and I have time.

Balbir, if you have no concerns, I'll clean this up and send it to
mmotm. (Maybe the softlimit code accesses pc->page, and I'll have to
update this for that.)

Note: This is _not_ tested at all.

Thanks,
-Kame
==
From: KAMEZAWA Hiroyuki

page_cgroup keeps a pointer to the memmap entry it stands for, but
page_cgroup->page is never accessed in the fast path, is never
modified, and is not really necessary, so it does not have to be
maintained as a pointer. This patch removes "page" from struct
page_cgroup and adds a page_cgroup_to_page() function instead. It
encodes the placement information in the topmost FLAGS bits, the same
way struct page does. As a side effect, nid and zid can be obtained
from the page_cgroup itself.

Signed-off-by: KAMEZAWA Hiroyuki
---
 include/linux/page_cgroup.h |   19 ++++++++++++++++---
 mm/page_cgroup.c            |   42 ++++++++++++++++++++++++++++++++----------
 2 files changed, 48 insertions(+), 13 deletions(-)
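The bit-packing trick itself is easy to model outside the kernel. Below
is a minimal userspace sketch of the idea; the DEMO_* names are made-up
stand-ins for the kernel's NODES_PGSHIFT/NODES_MASK, ZONES_PGSHIFT and
__NR_PAGEFLAGS, whose real values are configuration dependent:
==
#include <assert.h>
#include <stdio.h>

/* Illustrative layout: 2 zone bits, then 6 node bits, at the top of the word. */
#define DEMO_FLAG_BITS   20	/* stand-in for __NR_PAGEFLAGS */
#define DEMO_ZONES_SHIFT (sizeof(unsigned long) * 8 - 2)
#define DEMO_ZONES_MASK  0x3UL
#define DEMO_NODES_SHIFT (DEMO_ZONES_SHIFT - 6)
#define DEMO_NODES_MASK  0x3fUL

/* Keep only the topmost placement bits, as __init_page_cgroup() does. */
static unsigned long copy_placement_bits(unsigned long page_flags)
{
	return page_flags & ~((1UL << DEMO_FLAG_BITS) - 1);
}

static unsigned long demo_nid(unsigned long flags)
{
	return (flags >> DEMO_NODES_SHIFT) & DEMO_NODES_MASK;
}

int main(void)
{
	/* zone 3, node 5, plus some ordinary flag bits in the low part */
	unsigned long page_flags = (3UL << DEMO_ZONES_SHIFT)
				 | (5UL << DEMO_NODES_SHIFT) | 0xabcUL;
	unsigned long pc_flags = copy_placement_bits(page_flags);

	assert(demo_nid(pc_flags) == 5);	/* node id survives the copy */
	assert((pc_flags & 0xabcUL) == 0);	/* ordinary flag bits do not */
	printf("nid recovered from the high bits: %lu\n", demo_nid(pc_flags));
	return 0;
}
==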
Index: mmotm-2.6.31-Jul16/include/linux/page_cgroup.h
===================================================================
--- mmotm-2.6.31-Jul16.orig/include/linux/page_cgroup.h
+++ mmotm-2.6.31-Jul16/include/linux/page_cgroup.h
@@ -13,7 +13,7 @@
 struct page_cgroup {
 	unsigned long flags;
 	struct mem_cgroup *mem_cgroup;
-	struct page *page;
+	/* block io tracking can use an extra unsigned long here */
 	struct list_head lru;		/* per cgroup LRU list */
 };
 
@@ -32,7 +32,12 @@ static inline void __init page_cgroup_in
 #endif
 
 struct page_cgroup *lookup_page_cgroup(struct page *page);
+struct page *page_cgroup_to_page(struct page_cgroup *pc);
 
+/*
+ * The topmost (NODES_SHIFT + ZONES_SHIFT) or SECTIONS_SHIFT bits of "flags"
+ * are used for detecting the pfn, as struct page does.
+ */
 enum {
 	/* flags for mem_cgroup */
 	PCG_LOCK,  /* page cgroup is locked */
@@ -71,14 +76,22 @@ CLEARPCGFLAG(AcctLRU, ACCT_LRU)
 TESTPCGFLAG(AcctLRU, ACCT_LRU)
 TESTCLEARPCGFLAG(AcctLRU, ACCT_LRU)
 
+#ifdef NODE_NOT_IN_PAGE_FLAGS
 static inline int page_cgroup_nid(struct page_cgroup *pc)
 {
-	return page_to_nid(pc->page);
+	struct page *page = page_cgroup_to_page(pc);
+	return page_to_nid(page);
 }
+#else
+static inline int page_cgroup_nid(struct page_cgroup *pc)
+{
+	return (pc->flags >> NODES_PGSHIFT) & NODES_MASK;
+}
+#endif
 
 static inline enum zone_type page_cgroup_zid(struct page_cgroup *pc)
 {
-	return page_zonenum(pc->page);
+	return (pc->flags >> ZONES_PGSHIFT) & ZONES_MASK;
 }
 
 static inline void lock_page_cgroup(struct page_cgroup *pc)
Index: mmotm-2.6.31-Jul16/mm/page_cgroup.c
===================================================================
--- mmotm-2.6.31-Jul16.orig/mm/page_cgroup.c
+++ mmotm-2.6.31-Jul16/mm/page_cgroup.c
@@ -13,9 +13,12 @@
 static void __meminit
 __init_page_cgroup(struct page_cgroup *pc, unsigned long pfn)
 {
-	pc->flags = 0;
+	unsigned long flags;
+
 	pc->mem_cgroup = NULL;
-	pc->page = pfn_to_page(pfn);
+	/* Copy NODE/ZONE/SECTION information from struct page */
+	flags = pfn_to_page(pfn)->flags;
+	pc->flags = flags & ~((1 << __NR_PAGEFLAGS) - 1);
 	INIT_LIST_HEAD(&pc->lru);
 }
 static unsigned long total_usage;
@@ -42,6 +45,18 @@ struct page_cgroup *lookup_page_cgroup(s
 	return base + offset;
 }
 
+struct page *page_cgroup_to_page(struct page_cgroup *pc)
+{
+	int nid = (pc->flags >> NODES_PGSHIFT) & NODES_MASK;
+	unsigned long pfn, offset;
+
+	offset = pc - NODE_DATA(nid)->node_page_cgroup;
+	pfn = NODE_DATA(nid)->node_start_pfn + offset;
+
+	return pfn_to_page(pfn);
+}
+
+
 static int __init alloc_node_page_cgroup(int nid)
 {
 	struct page_cgroup *base, *pc;
@@ -104,6 +119,18 @@ struct page_cgroup *lookup_page_cgroup(s
 	return section->page_cgroup + pfn;
 }
 
+struct page *page_cgroup_to_page(struct page_cgroup *pc)
+{
+	unsigned long pfn, sectionid;
+	struct mem_section *section;
+
+	sectionid = (pc->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK;
+	section = __nr_to_section(sectionid);
+
+	pfn = pc - section->page_cgroup;
+	return pfn_to_page(pfn);
+}
+
 /* __alloc_bootmem...() is protected by !slab_available() */
 static int __init_refok init_section_page_cgroup(unsigned long pfn)
 {
@@ -128,15 +155,10 @@ static int __init_refok init_section_pag
 		}
 	} else {
 		/*
-		 * We don't have to allocate page_cgroup again, but
-		 * address of memmap may be changed. So, we have to initialize
-		 * again.
+		 * We don't have to allocate page_cgroup again, and we don't
+		 * need to care about the address of the memmap any more.
 		 */
-		base = section->page_cgroup + pfn;
-		table_size = 0;
-		/* check address of memmap is changed or not. */
-		if (base->page == pfn_to_page(pfn))
-			return 0;
+		return 0;
 	}
 
 	if (!base) {
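In the SPARSEMEM variant above, lookup_page_cgroup() and
page_cgroup_to_page() have to be exact inverses, and the biased base
pointer set up by init_section_page_cgroup() (section->page_cgroup =
base - pfn) is what makes both directions plain pointer arithmetic
indexed by the absolute pfn. Here is a minimal userspace model of that
round trip; demo_pc, SECTION_START_PFN and SECTION_PAGES are made-up
names, and the biased pointer deliberately mirrors the same out-of-range
pointer arithmetic the kernel relies on:
==
#include <assert.h>
#include <stdlib.h>

struct demo_pc { unsigned long flags; };

#define SECTION_START_PFN	0x10000UL	/* first pfn of the section */
#define SECTION_PAGES		4096UL		/* pages per section */

int main(void)
{
	struct demo_pc *base = calloc(SECTION_PAGES, sizeof(*base));
	struct demo_pc *section_pc;
	unsigned long pfn;

	assert(base != NULL);
	/* Biased base, as in: section->page_cgroup = base - pfn; */
	section_pc = base - SECTION_START_PFN;

	for (pfn = SECTION_START_PFN;
	     pfn < SECTION_START_PFN + SECTION_PAGES; pfn++) {
		struct demo_pc *pc = section_pc + pfn;	/* lookup direction */
		unsigned long back = pc - section_pc;	/* reverse direction */
		assert(back == pfn);
	}
	free(base);
	return 0;
}
==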