From patchwork Fri Jun 1 07:11:15 2018
X-Patchwork-Submitter: Aaron Lu
X-Patchwork-Id: 10442555
Date: Fri, 1 Jun 2018 15:11:15 +0800
From: Aaron Lu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Johannes Weiner, kernel test robot, lkp@01.org, Tejun Heo,
 Michal Hocko, Andrew Morton
Subject: [RFC PATCH] mem_cgroup: make sure
 moving_account, move_lock_task and stat_cpu in the same cacheline
Message-ID: <20180601071115.GA27302@intel.com>
References: <20180508053451.GD30203@yexl-desktop>
 <20180508172640.GB24175@cmpxchg.org>
 <20180528085201.GA2918@intel.com>
 <20180529084816.GS27180@dhcp22.suse.cz>
 <20180530082752.GF14785@intel.com>
In-Reply-To: <20180530082752.GF14785@intel.com>

The LKP robot found a 27% will-it-scale/page_fault3 performance
regression introduced by commit e27be240df53 ("mm: memcg: make sure
memory.events is uptodate when waking pollers").

What the test does is:
1 mkstemp() a 128M file on a tmpfs;
2 start $nr_cpu processes, each looping over the following:
  2.1 mmap() this file in shared write mode;
  2.2 write 0 into the file in PAGE_SIZE steps until the end of the file;
  2.3 munmap() the file and repeat the process.
3 After 5 minutes, check how many loops the processes managed to
  complete; the higher, the better.

The commit itself looks innocent enough: it merely changed an event
counting mechanism, and this test does not trigger those events at all.
Perf shows increased cycles spent accessing root_mem_cgroup->stat_cpu in
count_memcg_event_mm() (called by handle_mm_fault()) and in
__mod_memcg_state() (called by page_add_file_rmap()). So the regression
is likely due to the changed layout of 'struct mem_cgroup', which either
made stat_cpu fall into a frequently modified cacheline or moved some
hot fields out of a shared cacheline.
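The page_fault3 procedure described above can be sketched in C as follows. This is a minimal single-process sketch under stated assumptions, not the will-it-scale source: the helpers make_test_file(), run_page_fault_loops() and pf3_demo() are hypothetical, the file is small and created under /tmp (which is not tmpfs on every system), and a fixed iteration count stands in for the 5-minute timer and the $nr_cpu parallel processes.

```c
/* Hypothetical sketch of one page_fault3-style worker (not will-it-scale). */
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create an unlinked temporary file of `size` bytes; returns an fd or -1. */
static int make_test_file(char *path_template, size_t size)
{
	int fd = mkstemp(path_template);
	if (fd < 0)
		return -1;
	unlink(path_template);		/* file lives until fd is closed */
	if (ftruncate(fd, (off_t)size) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Repeat `loops` times: mmap the file shared and writable, store 0 at
 * the start of every page, then munmap. Returns completed iterations.
 */
static long run_page_fault_loops(int fd, size_t size, long loops)
{
	long page = sysconf(_SC_PAGESIZE);
	long done = 0;

	for (long i = 0; i < loops; i++) {
		char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
			       MAP_SHARED, fd, 0);
		if (p == MAP_FAILED)
			break;
		/* The first store to each page of a fresh mapping faults. */
		for (size_t off = 0; off < size; off += (size_t)page)
			p[off] = 0;
		munmap(p, size);
		done++;
	}
	return done;
}

/* Hypothetical driver: `npages`-page file, `loops` iterations. */
static long pf3_demo(size_t npages, long loops)
{
	char path[] = "/tmp/pf3-XXXXXX";
	size_t size = npages * (size_t)sysconf(_SC_PAGESIZE);
	int fd = make_test_file(path, size);
	if (fd < 0)
		return -1;
	long done = run_page_fault_loops(fd, size, loops);
	close(fd);
	return done;
}
```

For example, pf3_demo(16, 10) maps a 16-page file ten times and should return 10. Each iteration takes one fault per page, which is where handle_mm_fault() and page_add_file_rmap() touch root_mem_cgroup->stat_cpu in the profile above.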
I verified this by moving memory_events[] back to where it was:

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d99b71b..c767db1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -205,7 +205,6 @@ struct mem_cgroup {
 	int oom_kill_disable;
 
 	/* memory.events */
-	atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS];
 	struct cgroup_file events_file;
 
 	/* protect arrays of thresholds */
@@ -238,6 +237,7 @@ struct mem_cgroup {
 	struct mem_cgroup_stat_cpu __percpu *stat_cpu;
 	atomic_long_t stat[MEMCG_NR_STAT];
 	atomic_long_t events[NR_VM_EVENT_ITEMS];
+	atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS];
 
 	unsigned long socket_pressure;
 

and performance was restored.

Later investigation found that as long as the three fields
moving_account, move_lock_task and stat_cpu are in the same cacheline,
performance is good. To avoid future performance surprises from other
commits that change the layout of 'struct mem_cgroup', this patch makes
sure these three fields stay in the same cacheline.

One concern with this approach is that moving_account and
move_lock_task can be modified when a process changes memory cgroup,
while stat_cpu is a field that is only ever read, so placing them in the
same cacheline might hurt. I assume it is rare for a process to change
memory cgroup, so this should be OK.
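Whether moving_account, move_lock_task and stat_cpu share a cacheline comes down to their byte offsets. The sketch below checks this with offsetof(), assuming 64-byte cachelines, a cacheline-aligned object and an LP64 target such as x86-64. The structs toy_good and toy_bad are hypothetical and differ only in the size of the fields preceding the three; they are not the real struct mem_cgroup. On a real kernel build, the pahole tool reports the same layout information, including cacheline boundaries, from debug info.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHELINE_BYTES 64	/* assumption: typical x86-64 line size */

/* Cacheline index of byte `off` within a cacheline-aligned object. */
static size_t line_of(size_t off)
{
	return off / CACHELINE_BYTES;
}

/* True if every byte of all `n` fields lies in one cacheline (n >= 1). */
static bool fields_share_line(const size_t *off, const size_t *len, size_t n)
{
	size_t line = line_of(off[0]);

	for (size_t i = 0; i < n; i++) {
		if (line_of(off[i]) != line ||
		    line_of(off[i] + len[i] - 1) != line)
			return false;
	}
	return true;
}

/* Hypothetical layouts: only the size of the preceding filler differs. */
struct toy_good {
	_Alignas(CACHELINE_BYTES) char before[40];
	int moving_account;
	void *move_lock_task;
	void *stat_cpu;
};

struct toy_bad {
	_Alignas(CACHELINE_BYTES) char before[48];
	int moving_account;
	void *move_lock_task;
	void *stat_cpu;
};

/* Checks the three hot fields of struct type T. */
#define HOT_TRIO_SHARE_LINE(T)						\
	fields_share_line(						\
		(const size_t[]){ offsetof(T, moving_account),		\
				  offsetof(T, move_lock_task),		\
				  offsetof(T, stat_cpu) },		\
		(const size_t[]){ sizeof(((T *)0)->moving_account),	\
				  sizeof(((T *)0)->move_lock_task),	\
				  sizeof(((T *)0)->stat_cpu) },		\
		3)
```

On LP64, 40 bytes of preceding fields put all three in bytes 40..63 of the first line, while 48 bytes push stat_cpu to bytes 64..71, i.e. the next line. An unrelated change earlier in the struct can therefore flip the result, which is the kind of layout sensitivity described above.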
LINK: https://lkml.kernel.org/r/20180528114019.GF9904@yexl-desktop
Reported-by: kernel test robot
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Tejun Heo
Signed-off-by: Aaron Lu
---
 include/linux/memcontrol.h | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d99b71bc2c66..c79972a78d6c 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -158,6 +158,15 @@ enum memcg_kmem_state {
 	KMEM_ONLINE,
 };
 
+#if defined(CONFIG_SMP)
+struct memcg_padding {
+	char x[0];
+} ____cacheline_internodealigned_in_smp;
+#define MEMCG_PADDING(name) struct memcg_padding name;
+#else
+#define MEMCG_PADDING(name)
+#endif
+
 /*
  * The memory controller data structure. The memory controller controls both
  * page cache and RSS per cgroup. We would eventually like to provide
@@ -225,17 +234,23 @@ struct mem_cgroup {
 	 * mem_cgroup ? And what type of charges should we move ?
 	 */
 	unsigned long move_charge_at_immigrate;
+	/* taken only while moving_account > 0 */
+	spinlock_t move_lock;
+	unsigned long move_lock_flags;
+
+	MEMCG_PADDING(_pad1_);
+
 	/*
 	 * set > 0 if pages under this cgroup are moving to other cgroup.
 	 */
 	atomic_t moving_account;
-	/* taken only while moving_account > 0 */
-	spinlock_t move_lock;
 	struct task_struct *move_lock_task;
-	unsigned long move_lock_flags;
 
 	/* memory.stat */
 	struct mem_cgroup_stat_cpu __percpu *stat_cpu;
+
+	MEMCG_PADDING(_pad2_);
+
 	atomic_long_t stat[MEMCG_NR_STAT];
 	atomic_long_t events[NR_VM_EVENT_ITEMS];
 
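The MEMCG_PADDING() technique from the patch can be reproduced in user space to see its effect on field offsets. This sketch assumes GCC or Clang, because zero-length arrays and __attribute__((aligned)) are GNU C extensions (as they are in the kernel). It also uses a fixed 64-byte alignment in place of ____cacheline_internodealigned_in_smp, which on SMP builds aligns to INTERNODE_CACHE_BYTES. struct toy_memcg and its stand-in field types are hypothetical and only follow the patched field order.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE_BYTES 64	/* assumption: stands in for INTERNODE_CACHE_BYTES */

/* Zero-sized (GNU C) but cacheline-aligned, like struct memcg_padding. */
struct toy_padding {
	char x[0];
} __attribute__((aligned(CACHELINE_BYTES)));

/* Hypothetical analogue of MEMCG_PADDING(); callers supply the ';'. */
#define TOY_PADDING(name) struct toy_padding name

/* Hypothetical struct following the patched field order (not the real one). */
struct toy_memcg {
	unsigned long move_charge_at_immigrate;
	int move_lock;			/* stands in for spinlock_t */
	unsigned long move_lock_flags;

	TOY_PADDING(_pad1_);		/* next field starts a new line */

	int moving_account;		/* stands in for atomic_t */
	void *move_lock_task;
	void *stat_cpu;

	TOY_PADDING(_pad2_);		/* stat[] starts another line */

	long stat[8];			/* stands in for atomic_long_t stat[] */
};
```

Because the padding member is aligned to a cacheline, moving_account, move_lock_task and stat_cpu begin on a fresh line of their own, and stat[] begins on the next one, regardless of what is added before them. The cost is up to 63 bytes of holes per padding member, plus cacheline alignment of the whole struct.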