From: Shakeel Butt
To: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, Yosry Ahmed
Cc: ying.huang@intel.com, feng.tang@intel.com, fengwei.yin@intel.com, oliver.sang@intel.com, kernel-team@meta.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2] memcg: rearrange fields of mem_cgroup_per_node
Date: Tue, 28 May 2024 09:40:50 -0700
Message-ID: <20240528164050.2625718-1-shakeel.butt@linux.dev>

Kernel test robot reported [1] a performance regression in the
will-it-scale test suite's page_fault2 test case for commit 70a64b7919cb
("memcg: dynamically allocate lruvec_stats"). On inspection, the commit
appears to have unintentionally introduced false cache sharing.

After the commit, the fields of mem_cgroup_per_node that are read on the
performance-critical path share a cacheline with fields that are updated
frequently on LRU page allocation and deallocation. This causes contention
on that cacheline, and workloads that manipulate many LRU pages regress, as
the test report shows.

The solution is to rearrange the fields of mem_cgroup_per_node so that the
false sharing is eliminated. Let's move all the read-only pointers to the
start of the struct, followed by the memcg-v1-only fields, with the
frequently updated fields at the end.

Experiment setup: ran fallocate1, fallocate2, page_fault1, page_fault2 and
page_fault3 from the will-it-scale test suite inside a three-level memcg,
with /tmp mounted as tmpfs, on two machines: one with a single NUMA node and
one with two NUMA nodes.

$ ./[testcase]_processes -t $NR_CPUS -s 50

Results for single node, 52 CPU machine:
Testcase        base        with-patch

fallocate1      1031081     1431291  (38.80 %)
fallocate2      1029993     1421421  (38.00 %)
page_fault1     2269440     3405788  (50.07 %)
page_fault2     2375799     3572868  (50.30 %)
page_fault3     28641143    28673950 ( 0.11 %)

Results for dual node, 80 CPU machine:
Testcase        base        with-patch

fallocate1      2976288     3641185  (22.33 %)
fallocate2      2979366     3638181  (22.11 %)
page_fault1     6221790     7748245  (24.53 %)
page_fault2     6482854     7847698  (21.05 %)
page_fault3     28804324    28991870 ( 0.65 %)

Fixes: 70a64b7919cb ("memcg: dynamically allocate lruvec_stats")
Reported-by: kernel test robot
Reviewed-by: Yosry Ahmed
Reviewed-by: Roman Gushchin
Signed-off-by: Shakeel Butt
---
Changes since v1:
- Added comment as requested by Yosry.
- Removed the Closes: tag to keep the regression report open while work
  on it continues.

 include/linux/memcontrol.h | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 3d1599146afe..7403dd5926eb 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -96,23 +96,29 @@ struct mem_cgroup_reclaim_iter {
  * per-node information in memory controller.
  */
 struct mem_cgroup_per_node {
-	struct lruvec		lruvec;
+	/* Keep the read-only fields at the start */
+	struct mem_cgroup	*memcg;		/* Back pointer, we cannot */
+						/* use container_of	   */
 
 	struct lruvec_stats_percpu __percpu	*lruvec_stats_percpu;
 	struct lruvec_stats			*lruvec_stats;
-
-	unsigned long		lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
-
-	struct mem_cgroup_reclaim_iter	iter;
-
 	struct shrinker_info __rcu	*shrinker_info;
 
+	/*
+	 * Memcg-v1 only stuff in middle as buffer between read mostly fields
+	 * and update often fields to avoid false sharing. Once v1 stuff is
+	 * moved in a separate struct, an explicit padding is needed.
+	 */
+
 	struct rb_node		tree_node;	/* RB tree node */
 	unsigned long		usage_in_excess;/* Set to the value by which */
 						/* the soft limit is exceeded*/
 	bool			on_tree;
-	struct mem_cgroup	*memcg;		/* Back pointer, we cannot */
-						/* use container_of	   */
+
+	/* Fields which get updated often at the end. */
+	struct lruvec		lruvec;
+	unsigned long		lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
+	struct mem_cgroup_reclaim_iter	iter;
 };
 
 struct mem_cgroup_threshold {