From: Raghavendra K T
Cc: Ingo Molnar, Peter Zijlstra, Mel Gorman, Andrew Morton, David Hildenbrand, Juri Lelli, Vincent Guittot, Bharata B Rao, Aithal Srikanth, kernel test robot, Raghavendra K T
Subject: [RFC PATCH V3 1/1] sched/numa: Fix disjoint set vma scan regression
Date: Wed, 31 May 2023 09:55:26 +0530
Message-ID: <8581ca937d4064b3cd138845d5bd418580d177da.1685506205.git.raghavendra.kt@amd.com>
X-Patchwork-Id: 13261475

With the numa scan enhancements [1], only the threads that had previously
accessed a vma are allowed to scan it.

While this significantly reduced system time overhead, there are corner
cases that genuinely need some relaxation. For example:

1) The concern raised by PeterZ: if there are N partition sets of vmas
belonging to tasks, then unfairness in allowing these threads to scan could
amplify the side effect of some of the vmas being left unscanned.

2) The reported LKP numa01 benchmark regression (see the Reported-by and
Closes tags below).

Currently this is handled by allowing the first two scans unconditionally,
as indicated by mm->numa_scan_seq. This is imprecise, since for some
benchmarks vma scanning might itself start at numa_scan_seq > 2.

Solution:
Allow unconditional scanning of vmas of tasks depending on vma size. This
is achieved by maintaining a per vma scan counter, where

f(allowed_to_scan) = f(scan_counter < vma_size / scan_size)

Result:
numa01_THREAD_ALLOC result on 6.4.0-rc2 (that has numascan enhancement)

                        base-numascan   base            base+fix
real                    1m1.507s        1m23.259s       1m2.632s
user                    213m51.336s     251m46.363s     220m35.528s
sys                     3m3.397s        0m12.492s       2m41.393s

numa_hit                5615517         4560123         4963875
numa_local              5615505         4560024         4963700
numa_other              12              99              175
numa_pte_updates        1822797         493             1559111
numa_hint_faults        1307113         523             1469031
numa_hint_faults_local  612617          488             884829
numa_pages_migrated     694370          35              584202

Summary: The regression in base is recovered by allowing scanning as
required.

[1] https://lore.kernel.org/lkml/cover.1677672277.git.raghavendra.kt@amd.com/T/#t

Fixes: fc137c0ddab2 ("sched/numa: enhance vma scanning logic")
Reported-by: Aithal Srikanth
Reported-by: kernel test robot
Closes: https://lore.kernel.org/lkml/db995c11-08ba-9abf-812f-01407f70a5d4@amd.com/T/
Signed-off-by: Raghavendra K T
---
 include/linux/mm_types.h |  1 +
 kernel/sched/fair.c      | 31 ++++++++++++++++++++++++-------
 2 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 306a3d1a0fa6..992e460a713e 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -479,6 +479,7 @@ struct vma_numab_state {
 	unsigned long next_scan;
 	unsigned long next_pid_reset;
 	unsigned long access_pids[2];
+	unsigned int scan_counter;
 };
 
 /*
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 373ff5f55884..4e71fb58085b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2931,17 +2931,30 @@ static void reset_ptenuma_scan(struct task_struct *p)
 static bool vma_is_accessed(struct vm_area_struct *vma)
 {
 	unsigned long pids;
+	unsigned int vma_size;
+	unsigned int scan_threshold;
+	unsigned int scan_size;
+
+	pids = vma->numab_state->access_pids[0] | vma->numab_state->access_pids[1];
+
+	if (test_bit(hash_32(current->pid, ilog2(BITS_PER_LONG)), &pids))
+		return true;
+
+	scan_size = READ_ONCE(sysctl_numa_balancing_scan_size);
+	/* vma size in MB */
+	vma_size = (vma->vm_end - vma->vm_start) >> 20;
+
+	/* Total scans needed to cover VMA */
+	scan_threshold = vma_size / scan_size;
+
 	/*
-	 * Allow unconditional access first two times, so that all the (pages)
-	 * of VMAs get prot_none fault introduced irrespective of accesses.
+	 * Allow the scanning of half of disjoint set's VMA to induce
+	 * prot_none fault irrespective of accesses.
 	 * This is also done to avoid any side effect of task scanning
 	 * amplifying the unfairness of disjoint set of VMAs' access.
 	 */
-	if (READ_ONCE(current->mm->numa_scan_seq) < 2)
-		return true;
-
-	pids = vma->numab_state->access_pids[0] | vma->numab_state->access_pids[1];
-	return test_bit(hash_32(current->pid, ilog2(BITS_PER_LONG)), &pids);
+	scan_threshold = 1 + (scan_threshold >> 1);
+	return (vma->numab_state->scan_counter < scan_threshold);
 }
 
 #define VMA_PID_RESET_PERIOD (4 * sysctl_numa_balancing_scan_delay)
@@ -3058,6 +3071,8 @@ static void task_numa_work(struct callback_head *work)
 			/* Reset happens after 4 times scan delay of scan start */
 			vma->numab_state->next_pid_reset = vma->numab_state->next_scan +
 				msecs_to_jiffies(VMA_PID_RESET_PERIOD);
+
+			vma->numab_state->scan_counter = 0;
 		}
 
 		/*
@@ -3084,6 +3099,8 @@ static void task_numa_work(struct callback_head *work)
 			vma->numab_state->access_pids[1] = 0;
 		}
 
+		vma->numab_state->scan_counter++;
+
 		do {
 			start = max(start, vma->vm_start);
 			end = ALIGN(start + (pages << PAGE_SHIFT), HPAGE_SIZE);
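
For readers following the arithmetic in vma_is_accessed(), here is a
minimal userspace sketch of the threshold computation. It is not kernel
code. The helper names scan_threshold() and allowed_to_scan() and the
task_accessed_vma parameter are hypothetical, and the access_pids hash
check is reduced to a plain boolean. Note that the code in the patch
allows one more than half of vma_size / scan_size scans, not the full
quotient given in the changelog formula. scan_size_mb is assumed to be
nonzero.

```c
#include <stdbool.h>

/*
 * Userspace sketch of the threshold arithmetic in the patched
 * vma_is_accessed(). Helper names are hypothetical, not kernel API.
 */

/* Number of scans allowed regardless of the task's access history. */
static unsigned int scan_threshold(unsigned long vm_start, unsigned long vm_end,
				   unsigned int scan_size_mb)
{
	/* VMA size in MB, as in the patch (>> 20). */
	unsigned int vma_size = (unsigned int)((vm_end - vm_start) >> 20);
	/* Total scans needed to cover the VMA; scan_size_mb must be nonzero. */
	unsigned int threshold = vma_size / scan_size_mb;

	/* Half of the total, plus one so that small VMAs still get one scan. */
	return 1 + (threshold >> 1);
}

static bool allowed_to_scan(bool task_accessed_vma, unsigned int scan_counter,
			    unsigned long vm_start, unsigned long vm_end,
			    unsigned int scan_size_mb)
{
	/* Stand-in for the access_pids test_bit() check in the patch. */
	if (task_accessed_vma)
		return true;

	return scan_counter < scan_threshold(vm_start, vm_end, scan_size_mb);
}
```

As a worked case under these assumptions: for a 1 GB VMA and a scan size
of 256 MB, vma_size is 1024 and vma_size / scan_size is 4, so the
threshold is 1 + (4 >> 1) = 3 and a task that never touched the VMA may
still scan it while scan_counter is 0, 1 or 2. For a 4 MB VMA the quotient
is 0 and the threshold is 1.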