From patchwork Wed Feb 8 07:35:29 2023
X-Patchwork-Submitter: Bharata B Rao
X-Patchwork-Id: 13132535
From: Bharata B Rao
Subject: [RFC PATCH 1/5] x86/ibs: In-kernel IBS driver for page access profiling
Date: Wed, 8 Feb 2023 13:05:29 +0530
Message-ID: <20230208073533.715-2-bharata@amd.com>
In-Reply-To: <20230208073533.715-1-bharata@amd.com>
References: <20230208073533.715-1-bharata@amd.com>

Use IBS (Instruction Based Sampling) feature present in AMD
processors for memory access tracking. The access information obtained
from IBS will be used in subsequent patches to drive NUMA balancing.

An NMI handler is registered to obtain the IBS data. The handler does
not do much yet: it filters out samples that are not useful and
collects some stats. This patch only builds the framework; IBS
execution sampling is enabled in a subsequent patch.

TODOs
-----
1. Perf also uses IBS. For the purpose of this prototype, the use of
   IBS in perf is simply disabled. This needs to be done cleanly.
2. Only the required MSR bits are defined here.

About IBS
---------
IBS can be programmed to provide data about instruction execution
periodically. This is done by programming a desired sample count
(number of ops) in a control register. When the programmed number of
ops has been dispatched, a micro-op gets tagged, various information
about the tagged micro-op's execution is populated in the IBS execution
MSRs and an interrupt is raised.

While IBS provides a lot of data for each sample, for the purpose of
memory access profiling we are interested in the linear and physical
addresses of memory accesses that reached DRAM. Recent AMD processors
provide further filtering where it is possible to limit the sampling to
those ops that had an L3 miss, which greatly reduces the number of
non-useful samples.

While IBS provides the capability to sample both instruction fetch and
execution, only IBS execution sampling is used here to collect data
about memory accesses that occur during instruction execution.

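To make the sample filtering described above concrete, here is a minimal
user-space sketch of how one IBS op sample might be classified. It is
not the handler from this patch: the bit positions are copied from the
msr-index.h hunk of this patch, but the enum, the function name
ibs_classify() and its verdict names are hypothetical and exist only
for illustration. The order of the checks follows the order used in
ibs_overflow_handler().

```c
#include <stdint.h>

/* Bit layout copied from the msr-index.h hunk of this patch. */
#define DATA2_DATASRC_MASK		0x7ULL
#define DATA2_DATASRC_DRAM		0x3ULL
#define DATA2_DATASRC_FAR_CCX_CACHE	0x5ULL
#define DATA3_LDOP		(1ULL << 0)
#define DATA3_STOP		(1ULL << 1)
#define DATA3_DCMISS		(1ULL << 7)
#define DATA3_LADDR_VALID	(1ULL << 17)
#define DATA3_PADDR_VALID	(1ULL << 18)
#define DATA3_L2MISS		(1ULL << 20)

/* Hypothetical verdicts; the patch bumps a vm event counter instead. */
enum sample_verdict {
	SAMPLE_USEFUL,
	SAMPLE_NOT_LOAD_STORE,
	SAMPLE_DC_L2_HIT,
	SAMPLE_NEAR_CACHE_HIT,
	SAMPLE_FAR_CACHE_HIT,
	SAMPLE_NO_LADDR,
	SAMPLE_KERNEL_LADDR,
	SAMPLE_NO_PADDR,
};

/*
 * Classify one sample from the raw values of IBSOPDATA2, IBSOPDATA3
 * and the linear address register (assumed to be already read).
 */
enum sample_verdict ibs_classify(uint64_t data2, uint64_t data3,
				 uint64_t laddr)
{
	uint64_t src = data2 & DATA2_DATASRC_MASK;

	/* Only loads and stores carry a data address. */
	if (!(data3 & (DATA3_LDOP | DATA3_STOP)))
		return SAMPLE_NOT_LOAD_STORE;
	/* Neither a DC miss nor an L2 miss: the access hit L1 or L2. */
	if (!(data3 & (DATA3_DCMISS | DATA3_L2MISS)))
		return SAMPLE_DC_L2_HIT;
	/* Data source below DRAM: served by a cache in the near CCX. */
	if (src < DATA2_DATASRC_DRAM)
		return SAMPLE_NEAR_CACHE_HIT;
	/* Served by a peer cache in a far CCX. */
	if (src == DATA2_DATASRC_FAR_CCX_CACHE)
		return SAMPLE_FAR_CACHE_HIT;
	if (!(data3 & DATA3_LADDR_VALID))
		return SAMPLE_NO_LADDR;
	/* Bit 63 set is treated as a kernel address, as in the handler. */
	if (laddr & (1ULL << 63))
		return SAMPLE_KERNEL_LADDR;
	if (!(data3 & DATA3_PADDR_VALID))
		return SAMPLE_NO_PADDR;
	return SAMPLE_USEFUL;
}
```

For example, a load that missed both DC and L2, was served from DRAM and
has valid linear and physical addresses for a user-space address comes
out as SAMPLE_USEFUL. Every other verdict corresponds to one of the
stats counters added by this patch.
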
More information about IBS is available in Sec 13.3 of AMD64 Architecture
Programmer's Manual, Volume 2: System Programming which is present at:
https://bugzilla.kernel.org/attachment.cgi?id=288923

Information about MSRs used for programming IBS can be found in
Sec 2.1.14.4 of PPR Vol 1 for AMD Family 19h Model 11h B1 which is
currently present at:
https://www.amd.com/system/files/TechDocs/55901_0.25.zip

Signed-off-by: Bharata B Rao
---
 arch/x86/events/amd/ibs.c        |   6 ++
 arch/x86/include/asm/msr-index.h |  12 +++
 arch/x86/mm/Makefile             |   1 +
 arch/x86/mm/ibs.c                | 169 +++++++++++++++++++++++++++++++
 include/linux/vm_event_item.h    |  11 ++
 mm/vmstat.c                      |  11 ++
 6 files changed, 210 insertions(+)
 create mode 100644 arch/x86/mm/ibs.c

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index da3f5ebac4e1..290e6d221844 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -1512,6 +1512,12 @@ static __init int amd_ibs_init(void)
 {
 	u32 caps;
 
+	/*
+	 * TODO: Find a clean way to disable perf IBS so that IBS
+	 * can be used for NUMA balancing.
+	 */
+	return 0;
+
 	caps = __get_ibs_caps();
 	if (!caps)
 		return -ENODEV;	/* ibs not supported by the cpu */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 37ff47552bcb..443d4cf73366 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -593,6 +593,18 @@
 /* AMD Last Branch Record MSRs */
 #define MSR_AMD64_LBR_SELECT 0xc000010e
 
+/* AMD IBS MSR bits */
+#define MSR_AMD64_IBSOPDATA2_DATASRC 0x7
+#define MSR_AMD64_IBSOPDATA2_DATASRC_DRAM 0x3
+#define MSR_AMD64_IBSOPDATA2_DATASRC_FAR_CCX_CACHE 0x5
+
+#define MSR_AMD64_IBSOPDATA3_LDOP BIT_ULL(0)
+#define MSR_AMD64_IBSOPDATA3_STOP BIT_ULL(1)
+#define MSR_AMD64_IBSOPDATA3_DCMISS BIT_ULL(7)
+#define MSR_AMD64_IBSOPDATA3_LADDR_VALID BIT_ULL(17)
+#define MSR_AMD64_IBSOPDATA3_PADDR_VALID BIT_ULL(18)
+#define MSR_AMD64_IBSOPDATA3_L2MISS BIT_ULL(20)
+
 /* Fam 17h MSRs */
 #define MSR_F17H_IRPERF 0xc00000e9
 
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index c80febc44cd2..e74b95a57d86 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -27,6 +27,7 @@ endif
 obj-y := init.o init_$(BITS).o fault.o ioremap.o extable.o mmap.o \
 	pgtable.o physaddr.o tlb.o cpu_entry_area.o maccess.o pgprot.o
 
+obj-$(CONFIG_NUMA_BALANCING) += ibs.o
 obj-y += pat/
 
 # Make sure __phys_addr has no stackprotector
diff --git a/arch/x86/mm/ibs.c b/arch/x86/mm/ibs.c
new file mode 100644
index 000000000000..411dba2a88d1
--- /dev/null
+++ b/arch/x86/mm/ibs.c
@@ -0,0 +1,169 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include
+
+#include
+#include /* TODO: Move defns like IBS_OP_ENABLE into non-perf header */
+#include
+
+static u64 ibs_config __read_mostly;
+
+static int ibs_overflow_handler(unsigned int cmd, struct pt_regs *regs)
+{
+	u64 ops_ctl, ops_data3, ops_data2;
+	u64 remote_access;
+	u64 laddr = -1, paddr = -1;
+	struct mm_struct *mm = current->mm;
+
+	rdmsrl(MSR_AMD64_IBSOPCTL, ops_ctl);
+
+	/*
+	 * When IBS sampling period is reprogrammed via read-modify-update
+	 * of MSR_AMD64_IBSOPCTL, overflow NMIs could be generated with
+	 * IBS_OP_ENABLE not set. For such cases, return as HANDLED.
+	 *
+	 * With this, the handler will say "handled" for all NMIs that
+	 * aren't related to this NMI. This stems from the limitation of
+	 * having both status and control bits in one MSR.
+	 */
+	if (!(ops_ctl & IBS_OP_VAL))
+		goto handled;
+
+	wrmsrl(MSR_AMD64_IBSOPCTL, ops_ctl & ~IBS_OP_VAL);
+
+	count_vm_event(IBS_NR_EVENTS);
+
+	if (!mm) {
+		count_vm_event(IBS_KTHREAD);
+		goto handled;
+	}
+
+	rdmsrl(MSR_AMD64_IBSOPDATA3, ops_data3);
+
+	/* Load/Store ops only */
+	if (!(ops_data3 & (MSR_AMD64_IBSOPDATA3_LDOP |
+			   MSR_AMD64_IBSOPDATA3_STOP))) {
+		count_vm_event(IBS_NON_LOAD_STORES);
+		goto handled;
+	}
+
+	/* Discard the sample if it was L1 or L2 hit */
+	if (!(ops_data3 & (MSR_AMD64_IBSOPDATA3_DCMISS |
+			   MSR_AMD64_IBSOPDATA3_L2MISS))) {
+		count_vm_event(IBS_DC_L2_HITS);
+		goto handled;
+	}
+
+	rdmsrl(MSR_AMD64_IBSOPDATA2, ops_data2);
+	remote_access = ops_data2 & MSR_AMD64_IBSOPDATA2_DATASRC;
+
+	/* Consider only DRAM accesses, exclude cache accesses from near ccx */
+	if (remote_access < MSR_AMD64_IBSOPDATA2_DATASRC_DRAM) {
+		count_vm_event(IBS_NEAR_CACHE_HITS);
+		goto handled;
+	}
+
+	/* Exclude hits from peer cache in far ccx */
+	if (remote_access == MSR_AMD64_IBSOPDATA2_DATASRC_FAR_CCX_CACHE) {
+		count_vm_event(IBS_FAR_CACHE_HITS);
+		goto handled;
+	}
+
+	/* Is linear addr valid? */
+	if (ops_data3 & MSR_AMD64_IBSOPDATA3_LADDR_VALID)
+		rdmsrl(MSR_AMD64_IBSDCLINAD, laddr);
+	else {
+		count_vm_event(IBS_LADDR_INVALID);
+		goto handled;
+	}
+
+	/* Discard kernel address accesses */
+	if (laddr & (1UL << 63)) {
+		count_vm_event(IBS_KERNEL_ADDR);
+		goto handled;
+	}
+
+	/* Is phys addr valid? */
+	if (ops_data3 & MSR_AMD64_IBSOPDATA3_PADDR_VALID)
+		rdmsrl(MSR_AMD64_IBSDCPHYSAD, paddr);
+	else
+		count_vm_event(IBS_PADDR_INVALID);
+
+handled:
+	return NMI_HANDLED;
+}
+
+static inline int get_ibs_lvt_offset(void)
+{
+	u64 val;
+
+	rdmsrl(MSR_AMD64_IBSCTL, val);
+	if (!(val & IBSCTL_LVT_OFFSET_VALID))
+		return -EINVAL;
+
+	return val & IBSCTL_LVT_OFFSET_MASK;
+}
+
+static void setup_APIC_ibs(void)
+{
+	int offset;
+
+	offset = get_ibs_lvt_offset();
+	if (offset < 0)
+		goto failed;
+
+	if (!setup_APIC_eilvt(offset, 0, APIC_EILVT_MSG_NMI, 0))
+		return;
+failed:
+	pr_warn("IBS APIC setup failed on cpu #%d\n",
+		smp_processor_id());
+}
+
+static void clear_APIC_ibs(void)
+{
+	int offset;
+
+	offset = get_ibs_lvt_offset();
+	if (offset >= 0)
+		setup_APIC_eilvt(offset, 0, APIC_EILVT_MSG_FIX, 1);
+}
+
+static int x86_amd_ibs_access_profile_startup(unsigned int cpu)
+{
+	setup_APIC_ibs();
+	return 0;
+}
+
+static int x86_amd_ibs_access_profile_teardown(unsigned int cpu)
+{
+	clear_APIC_ibs();
+	return 0;
+}
+
+int __init ibs_access_profiling_init(void)
+{
+	u32 caps;
+
+	ibs_config = IBS_OP_CNT_CTL | IBS_OP_ENABLE;
+
+	if (!boot_cpu_has(X86_FEATURE_IBS)) {
+		pr_info("IBS capability is unavailable for access profiling\n");
+		return 0;
+	}
+
+	caps = cpuid_eax(IBS_CPUID_FEATURES);
+	if (caps & IBS_CAPS_ZEN4)
+		ibs_config |= IBS_OP_L3MISSONLY;
+
+	register_nmi_handler(NMI_LOCAL, ibs_overflow_handler, 0, "ibs");
+
+	cpuhp_setup_state(CPUHP_AP_PERF_X86_AMD_IBS_STARTING,
+			  "x86/amd/ibs_access_profile:starting",
+			  x86_amd_ibs_access_profile_startup,
+			  x86_amd_ibs_access_profile_teardown);
+
+	pr_info("IBS access profiling setup for NUMA Balancing\n");
+	return 0;
+}
+
+arch_initcall(ibs_access_profiling_init);
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 7f5d1caf5890..1d55e347d16c 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -149,6 +149,17 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_X86
 	DIRECT_MAP_LEVEL2_SPLIT,
 	DIRECT_MAP_LEVEL3_SPLIT,
+#ifdef CONFIG_NUMA_BALANCING
+	IBS_NR_EVENTS,
+	IBS_KTHREAD,
+	IBS_NON_LOAD_STORES,
+	IBS_DC_L2_HITS,
+	IBS_NEAR_CACHE_HITS,
+	IBS_FAR_CACHE_HITS,
+	IBS_LADDR_INVALID,
+	IBS_KERNEL_ADDR,
+	IBS_PADDR_INVALID,
+#endif
 #endif
 	NR_VM_EVENT_ITEMS
 };
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 1ea6a5ce1c41..c7a9d0d9ade8 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1398,6 +1398,17 @@ const char * const vmstat_text[] = {
 #ifdef CONFIG_X86
 	"direct_map_level2_splits",
 	"direct_map_level3_splits",
+#ifdef CONFIG_NUMA_BALANCING
+	"ibs_nr_events",
+	"ibs_kthread",
+	"ibs_non_load_stores",
+	"ibs_dc_l2_hits",
+	"ibs_near_cache_hits",
+	"ibs_far_cache_hits",
+	"ibs_invalid_laddr",
+	"ibs_kernel_addr",
+	"ibs_invalid_paddr",
+#endif
 #endif
 #endif /* CONFIG_VM_EVENT_COUNTERS || CONFIG_MEMCG */
 };

From patchwork Wed Feb 8 07:35:30 2023
X-Patchwork-Submitter: Bharata B Rao
X-Patchwork-Id: 13132536
From: Bharata B Rao
Subject: [RFC PATCH 2/5] x86/ibs: Drive NUMA balancing via IBS access data
Date: Wed, 8 Feb 2023 13:05:30 +0530
Message-ID: <20230208073533.715-3-bharata@amd.com>
In-Reply-To: <20230208073533.715-1-bharata@amd.com>
References: <20230208073533.715-1-bharata@amd.com>

Feed the page access data obtained from IBS to NUMA balancing
as hint fault equivalents.
The existing per-task and per-group fault stats are now built from
IBS-provided page access information. With this it will not be
necessary to scan the address space to introduce NUMA hinting faults.

Use task_work framework to process the IBS sampled data.

Actual programming of IBS to generate page access information
isn't done yet.

Signed-off-by: Bharata B Rao
---
 arch/x86/mm/ibs.c             | 38 ++++++++++++++-
 include/linux/migrate.h       |  1 +
 include/linux/sched.h         |  1 +
 include/linux/vm_event_item.h |  1 +
 kernel/sched/fair.c           | 10 ++++
 mm/memory.c                   | 92 +++++++++++++++++++++++++++++++++++
 mm/vmstat.c                   |  1 +
 7 files changed, 143 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/ibs.c b/arch/x86/mm/ibs.c
index 411dba2a88d1..adbc587b1767 100644
--- a/arch/x86/mm/ibs.c
+++ b/arch/x86/mm/ibs.c
@@ -1,6 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0
 
 #include
+#include
+#include
 
 #include
 #include /* TODO: Move defns like IBS_OP_ENABLE into non-perf header */
@@ -8,12 +10,30 @@
 
 static u64 ibs_config __read_mostly;
 
+struct ibs_access_work {
+	struct callback_head work;
+	u64 laddr, paddr;
+};
+
+void task_ibs_access_work(struct callback_head *work)
+{
+	struct ibs_access_work *iwork = container_of(work, struct ibs_access_work, work);
+	struct task_struct *p = current;
+
+	u64 laddr = iwork->laddr;
+	u64 paddr = iwork->paddr;
+
+	kfree(iwork);
+	do_numa_access(p, laddr, paddr);
+}
+
 static int ibs_overflow_handler(unsigned int cmd, struct pt_regs *regs)
 {
 	u64 ops_ctl, ops_data3, ops_data2;
 	u64 remote_access;
 	u64 laddr = -1, paddr = -1;
 	struct mm_struct *mm = current->mm;
+	struct ibs_access_work *iwork;
 
 	rdmsrl(MSR_AMD64_IBSOPCTL, ops_ctl);
 
@@ -86,8 +106,24 @@ static int ibs_overflow_handler(unsigned int cmd, struct pt_regs *regs)
 	/* Is phys addr valid? */
 	if (ops_data3 & MSR_AMD64_IBSOPDATA3_PADDR_VALID)
 		rdmsrl(MSR_AMD64_IBSDCPHYSAD, paddr);
-	else
+	else {
 		count_vm_event(IBS_PADDR_INVALID);
+		goto handled;
+	}
+
+	/*
+	 * TODO: GFP_ATOMIC!
+ */ + iwork = kzalloc(sizeof(*iwork), GFP_ATOMIC); + if (!iwork) + goto handled; + + count_vm_event(IBS_USEFUL_SAMPLES); + + iwork->laddr = laddr; + iwork->paddr = paddr; + init_task_work(&iwork->work, task_ibs_access_work); + task_work_add(current, &iwork->work, TWA_RESUME); handled: return NMI_HANDLED; diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 3ef77f52a4f0..4dcce7885b0c 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -216,6 +216,7 @@ void migrate_device_pages(unsigned long *src_pfns, unsigned long *dst_pfns, unsigned long npages); void migrate_device_finalize(unsigned long *src_pfns, unsigned long *dst_pfns, unsigned long npages); +void do_numa_access(struct task_struct *p, u64 laddr, u64 paddr); #endif /* CONFIG_MIGRATION */ diff --git a/include/linux/sched.h b/include/linux/sched.h index 853d08f7562b..19dd4ee07436 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2420,4 +2420,5 @@ static inline void sched_core_fork(struct task_struct *p) { } extern void sched_set_stop_task(int cpu, struct task_struct *stop); +DECLARE_STATIC_KEY_FALSE(hw_access_hints); #endif diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index 1d55e347d16c..2ccc7dee3c13 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -159,6 +159,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, IBS_LADDR_INVALID, IBS_KERNEL_ADDR, IBS_PADDR_INVALID, + IBS_USEFUL_SAMPLES, #endif #endif NR_VM_EVENT_ITEMS diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 0f8736991427..c9b9e62da779 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -47,6 +47,7 @@ #include #include #include +#include #include @@ -3125,6 +3126,8 @@ void init_numa_balancing(unsigned long clone_flags, struct task_struct *p) } } +DEFINE_STATIC_KEY_FALSE(hw_access_hints); + /* * Drive the periodic memory faults.. 
*/ @@ -3133,6 +3136,13 @@ static void task_tick_numa(struct rq *rq, struct task_struct *curr) struct callback_head *work = &curr->numa_work; u64 period, now; + /* + * If we are using access hints from hardware (like using + * IBS), don't scan the address space. + */ + if (static_branch_unlikely(&hw_access_hints)) + return; + /* * We don't care about NUMA placement if we don't have memory. */ diff --git a/mm/memory.c b/mm/memory.c index aad226daf41b..79096aba197c 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4668,6 +4668,98 @@ int numa_migrate_prep(struct page *page, struct vm_area_struct *vma, return mpol_misplaced(page, vma, addr); } +/* + * Called from task_work context to act upon the page access. + * + * Physical address (provided by IBS) is used directly instead + * of walking the page tables to get to the PTE/page. Hence we + * don't check if PTE is writable for the TNF_NO_GROUP + * optimization, which means RO pages are considered for grouping. + */ +void do_numa_access(struct task_struct *p, u64 laddr, u64 paddr) +{ + struct mm_struct *mm = p->mm; + struct vm_area_struct *vma; + struct page *page = NULL; + int page_nid = NUMA_NO_NODE; + int last_cpupid; + int target_nid; + int flags = 0; + + if (!mm) + return; + + if (!mmap_read_trylock(mm)) + return; + + vma = find_vma(mm, laddr); + if (!vma) + goto out_unlock; + + if (!vma_migratable(vma) || !vma_policy_mof(vma) || + is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) + goto out_unlock; + + if (!vma->vm_mm || + (vma->vm_file && (vma->vm_flags & (VM_READ|VM_WRITE)) == (VM_READ))) + goto out_unlock; + + if (!vma_is_accessible(vma)) + goto out_unlock; + + page = pfn_to_online_page(PHYS_PFN(paddr)); + if (!page || is_zone_device_page(page)) + goto out_unlock; + + if (unlikely(!PageLRU(page))) + goto out_unlock; + + /* TODO: handle PTE-mapped THP */ + if (PageCompound(page)) + goto out_unlock; + + /* + * Flag if the page is shared between multiple address spaces. 
This + * is later used when determining whether to group tasks together + */ + if (page_mapcount(page) > 1 && (vma->vm_flags & VM_SHARED)) + flags |= TNF_SHARED; + + last_cpupid = page_cpupid_last(page); + page_nid = page_to_nid(page); + + /* + * For memory tiering mode, cpupid of slow memory page is used + * to record page access time. So use default value. + */ + if ((sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) && + !node_is_toptier(page_nid)) + last_cpupid = (-1 & LAST_CPUPID_MASK); + else + last_cpupid = page_cpupid_last(page); + + target_nid = numa_migrate_prep(page, vma, laddr, page_nid, &flags); + if (target_nid == NUMA_NO_NODE) { + put_page(page); + goto out; + } + + /* Migrate to the requested node */ + if (migrate_misplaced_page(page, vma, target_nid)) { + page_nid = target_nid; + flags |= TNF_MIGRATED; + } else { + flags |= TNF_MIGRATE_FAIL; + } + +out: + if (page_nid != NUMA_NO_NODE) + task_numa_fault(last_cpupid, page_nid, 1, flags); + +out_unlock: + mmap_read_unlock(mm); +} + static vm_fault_t do_numa_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; diff --git a/mm/vmstat.c b/mm/vmstat.c index c7a9d0d9ade8..33738426ae48 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1408,6 +1408,7 @@ const char * const vmstat_text[] = { "ibs_invalid_laddr", "ibs_kernel_addr", "ibs_invalid_paddr", + "ibs_useful_samples", #endif #endif #endif /* CONFIG_VM_EVENT_COUNTERS || CONFIG_MEMCG */ From patchwork Wed Feb 8 07:35:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bharata B Rao X-Patchwork-Id: 13132537 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CB180C05027 for ; Wed, 8 Feb 2023 07:36:49 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6B07B6B0075; Wed, 8 Feb 2023 02:36:49 
-0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 65F596B0078; Wed, 8 Feb 2023 02:36:49 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 501226B007B; Wed, 8 Feb 2023 02:36:49 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 3E7BE6B0075 for ; Wed, 8 Feb 2023 02:36:49 -0500 (EST) Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 05731C03F8 for ; Wed, 8 Feb 2023 07:36:48 +0000 (UTC) X-FDA: 80443317738.22.F231E95 Received: from NAM10-MW2-obe.outbound.protection.outlook.com (mail-mw2nam10on2060.outbound.protection.outlook.com [40.107.94.60]) by imf20.hostedemail.com (Postfix) with ESMTP id 15D901C000A for ; Wed, 8 Feb 2023 07:36:45 +0000 (UTC) Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=amd.com header.s=selector1 header.b=vIoOqa4m; arc=pass ("microsoft.com:s=arcselector9901:i=1"); spf=pass (imf20.hostedemail.com: domain of bharata@amd.com designates 40.107.94.60 as permitted sender) smtp.mailfrom=bharata@amd.com; dmarc=pass (policy=quarantine) header.from=amd.com ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1675841806; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=MdjzojHcuBT96abEwzKozCOPIPJAXDc3x8XJ/bPfy5Q=; b=AIkHfyLIxLxWDAT4DsFjERauaB0ohjAqzHxwB3TYhPcaOA/y3nuBoPeDqv9p/lHTToYq+t +TJkvSJ3lNDrsNmChtwkOQoHabtf/ydniOV9KDnZfoFRZS62ijVC/g0+UIzVIUkpSL2Jxa UfAw8GdupAWCLJ6ec7OQuWjrFw5s3qg= ARC-Authentication-Results: i=2; imf20.hostedemail.com; dkim=pass header.d=amd.com header.s=selector1 
header.b=vIoOqa4m; arc=pass ("microsoft.com:s=arcselector9901:i=1"); spf=pass (imf20.hostedemail.com: domain of bharata@amd.com designates 40.107.94.60 as permitted sender) smtp.mailfrom=bharata@amd.com; dmarc=pass (policy=quarantine) header.from=amd.com ARC-Seal: i=2; s=arc-20220608; d=hostedemail.com; t=1675841806; a=rsa-sha256; cv=pass; b=hCJLEwid7n+yZMXMoZuUo/uIa8kDN2hat7dnRz4JYpWgDs1SjscqDWIfeFRZO9zmnE8TBE Ba2UYP12PmrVyQ19i5WIdOn7Pp1cyaT3mynnrJ4spWUNaVRIFsQGjTMSmvlosKk1wgw6MD uGQNqOfsH3B7eeS95LfHMvKdQet03CA= ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=W/1oo5aZfjIYzZIXXNpj5NSRuFEmAeI7wA+mrspqDZCNK2nKW5q3AGr9nkPP1+LMJ9EvALNCgZ/wFbhSEGgMemDgPJVoMviWfkVfWeCIl998x1DvxTXcu/g4htXJwZu3NVQUndHvd3nWnFDywR/KWfHrWDXKOXt8eBFhYKJpT4yuEeaWf/jFl035iBar0FMgp4UEv5gRL8VEV1HF269OoB/GjVPUrelzoMER7QpglH6jNYTTDagKdCkX+YkiWtIGTmnZZwmOmIk4MeOKKTVjYBPw+6WAIXyas0adSySCBaVk1FhmnhfuDFa94mCBYuQxQbAHzWIp1UpCyMYzJuRyMw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=MdjzojHcuBT96abEwzKozCOPIPJAXDc3x8XJ/bPfy5Q=; b=k7DA+t6UglK+1Q+c9VK3ZnFgRi6g7w0ty0OGJOcFatc9inttjcWmIP8c16761gjP93kgRRf3OEGvguYZ0gUw0LlMEpG8gZVNfFtum8DvbCvn/CKK/YRxXTADD/JsAAM/Jo+hH/VWvqMmNkS7Gyvym3ZjB1kO/3G/3pcRKsf2ffkmb41oN2HceuVm5qFNokahQDyVcKLdZPXuplM4kKsV7ODjqGvXCF+FKSMR5zpffpAkyvvKn06mjYzy3b9j85AYl0eVQV1O2KnZf+FYOZymI5BwEQa3Fpaoctw2LV+KwmsDYWoHyL+dK0PwqBk5/aXS+9Vwvy1dnOleEOzi/kSEHQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; 
h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=MdjzojHcuBT96abEwzKozCOPIPJAXDc3x8XJ/bPfy5Q=; b=vIoOqa4mvMhnwt7c9P4c9z1Ys0wh9nq+NxBSf7EVsPkeOVlBRxplLZ4WHsOSOjdAikzJi8jqzdKY/dfVAkJhrE49wJ/MXuEDejTbMQWr7ltffgt1Tup+wfYnDJwN+z6Xpze7x6m+ir2+uT3xsKhN02toNYeIkfEDR9zeJsOJGHs= Received: from BN1PR10CA0021.namprd10.prod.outlook.com (2603:10b6:408:e0::26) by DS7PR12MB5766.namprd12.prod.outlook.com (2603:10b6:8:75::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6064.34; Wed, 8 Feb 2023 07:36:42 +0000 Received: from BN8NAM11FT031.eop-nam11.prod.protection.outlook.com (2603:10b6:408:e0:cafe::b5) by BN1PR10CA0021.outlook.office365.com (2603:10b6:408:e0::26) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6086.17 via Frontend Transport; Wed, 8 Feb 2023 07:36:42 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by BN8NAM11FT031.mail.protection.outlook.com (10.13.177.25) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.6086.17 via Frontend Transport; Wed, 8 Feb 2023 07:36:42 +0000 Received: from BLR-5CG1133937.amd.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.34; Wed, 8 Feb 2023 01:36:37 -0600 From: Bharata B Rao To: , CC: , , , , , , , , , , , Bharata B Rao Subject: [RFC PATCH 3/5] x86/ibs: Enable per-process IBS from sched switch path Date: Wed, 8 Feb 2023 13:05:31 +0530 Message-ID: 
<20230208073533.715-4-bharata@amd.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230208073533.715-1-bharata@amd.com> References: <20230208073533.715-1-bharata@amd.com> MIME-Version: 1.0 X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BN8NAM11FT031:EE_|DS7PR12MB5766:EE_ X-MS-Office365-Filtering-Correlation-Id: 5a363116-817e-43b1-5c51-08db09a738df X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: IdTCdkx5UyzbNDKrYc1PoxsP1Y2dCzUNsXnJggY4dBOSLyvfsHBaLBlXfiqGlVWtz3H0vdl+1OfrMRVCiJ3dZSrgKDKNWeCVnXy2Mlp1Dter99h1KbDBCYOSiQiJXIOcArvAMyaac1JOOc5iFWbxArQt60cxmdYOvbTSxez1UYu7RrtYs3PO88dxIX61z3xOYTkZVs6kNuwa+eXd4VvbC5RR5EVS9jDZL6eZnxNENQT+7dmZZ+jl79GL1hdKnVG9kMETXnvmoKJjmNS8tQ6iIZF5aisd/aVvvYvfUA0E7ohhsC6LgdF6lecxQbyy+ppTRagg9EK9qUK0dtFBSRDjB5azQavyyQR11nc1K9kxst10pNZgufjRdTt+gWOgUVjJ4qEhFuZQKlXA7ZajfwVJU7eS5ouheYXXaBrX3wBKjEuwRVaR0DJkxlLIglWoFg4MMIb01NUeQQpzCWvR/CcA4PVKD468yAketZqRFR+p00TMnWqzItIqq9eZt/WURbzT1rbjK0fY28AogW9j12EzD0ot84BABkdAzwTzLMi0G1RVTgnQuTiM/IYGPpimzxGc4ocz6+sL0Am8TaOZ9X46+o3Anz6NHkkTnLqnOrnAnMyxZnAH3O762kFyetNnNILxLnAqLIPiD0xLlknC2lVSOSFqcWBP3EbTi4kb3aRZI4cIBAVEYi7ia1H5CxzNTTBNi7afe2O/N0ASS4B6nwJiRTgRRPqLce2FJ09uxH0LXC6qgpAJ6sN/wh0p0kfp5Wu+ X-Forefront-Antispam-Report: 
CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230025)(4636009)(346002)(136003)(376002)(396003)(39860400002)(451199018)(36840700001)(40470700004)(46966006)(81166007)(426003)(83380400001)(47076005)(36756003)(5660300002)(2906002)(7416002)(1076003)(186003)(16526019)(82310400005)(26005)(8936002)(6666004)(41300700001)(478600001)(40460700003)(7696005)(86362001)(82740400003)(2616005)(40480700001)(336012)(356005)(110136005)(54906003)(4326008)(70206006)(70586007)(316002)(8676002)(36860700001)(2101003)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 08 Feb 2023 07:36:42.1095 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 5a363116-817e-43b1-5c51-08db09a738df X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: BN8NAM11FT031.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DS7PR12MB5766 X-Rspam-User: X-Rspamd-Server: rspam03 X-Stat-Signature: mjbqhwin1pw84jpubshe4jju4x7mnpie X-Rspamd-Queue-Id: 15D901C000A X-HE-Tag: 1675841805-912634 X-HE-Meta: 
U2FsdGVkX18ZiXm/3bLFSCxQPFHbi8NeQc6JIuma1ZPkKt+vmYbNWEtZ7JONmEJn/dzQAZwSe3fs6dRYGv95iPII2LJHp5JjMlSRvfbG289NleX9tIdE4yalfTpkXXBH5myK7yYuLkQ4jbmNYGiMeJAsc2F8w7iCcb/8M70HtXQhQKS82K+7CKZnasO9sIFGJLcSokDjN8GppGAe2HgsEd7ghYsFLWbNnCDeMZPwSqVqZZmv0kZVe2lymEGX1eHy02UiznhG4TMCI87tFnSybOouOExH+yd4J3vsqWc6NFe1QYoVkN81nDN0uTeSJc11we5lX3LcSDzW/ETgCzLb3+IFMhYgf/Bf5j3BtoECVhWxR+Minu+41r/T8cM2sPFnbB+n/oReCSepxrEM85wGdafP9MTFWQkPzmJIsF9Wj+Epv/NZbEJEuoaUx3x/2xdcA2UeLz/k6a4j+wtZogAwnTqK2EHVaIoHEuSO+SrQzUU2gxF8sW/OWV+FwLgl+eTzhOqB8o0PSYR+fXRuCZ/CTo76uYQK6MESUokEi/C+QgiRA/n3usJxZ464/w1DnQxjzhqGP/5hEN+Z0qXf8/r7Gn5KYZ/A5Son6FfKMVJhs77Ael7Fa9DeC9Ttjr2Y3B60PMQX2DA9fPKZhGx3WlHsyFE0u1Jp51t8I9J084VdMMfw5z+qg8c+71vAJCC0LIsSEFA977tcG3DQPfmXDfThmfu5YqUsZ49EuiJKY3RREcQzwxJFNjs03RO99CBNBN1Uo/XeB+QgVK/9o3ms9H8F7D3MYNaG7zm3qocMmznE1ea6igci4FIxhnXDepxmmxL+0VOUppraI0elay57vG+facZh+865OqOa5o29tTES/x3oQrVD/c/BwcRuoR9LRQF1g1qpDTwXTlpfrIwl6EYe6U3P1R2sxO8vGHe6IhxagZAihlqpMzrGSc6LQoTYjiQ1Lzu3I4OVB13bNjhJxnC xggnTUVn QWJQIhQzbF5oo1ciMs/UBVFpJcMBW9MW5HTf5OdsmnTTim/MNUVTksihXqdef5gdE2m4Fol3wtr5pd2r5FhKFmuPCiYHVSLAATtsdUXq+akdAYtKSLKlkjMJhI8h/qlf4xVC96hM+Ew8lyG/WVmDMbcDoF5r0WpNYJtoO7f1AHZW8FCjyLxPOZ1V56ED4iKNP8e4Etk+L14gSwP4OLuxGE7xCzqQzqTEPjMU0K/54xZgADjQlNoOhO1WKy5SWnSvwq5j4Pg3JN8pJwRZZPrfu0Ek6XDoU7x3lJAa06al5n9jJIKhHXLHn1mQNSMDlWJ1G5FC9s9xkT8L4nI9UsdR8hITuleLJuyo/bon2BYfFMkOkm2hEXM/CBq3mud2mCBtqQzBh3FXifEZ9Jhs= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Program IBS for access profiling for threads from the task sched switch path. IBS is programmed with a period that corresponds to the incoming thread. Kernel threads are excluded from this. The sample period is currently kept at a fixed value of 10000. 
Signed-off-by: Bharata B Rao --- arch/x86/mm/ibs.c | 27 +++++++++++++++++++++++++++ include/linux/sched.h | 1 + kernel/sched/core.c | 1 + kernel/sched/fair.c | 1 + kernel/sched/sched.h | 5 +++++ 5 files changed, 35 insertions(+) diff --git a/arch/x86/mm/ibs.c b/arch/x86/mm/ibs.c index adbc587b1767..a479029e9262 100644 --- a/arch/x86/mm/ibs.c +++ b/arch/x86/mm/ibs.c @@ -8,6 +8,7 @@ #include /* TODO: Move defns like IBS_OP_ENABLE into non-perf header */ #include +#define IBS_SAMPLE_PERIOD 10000 static u64 ibs_config __read_mostly; struct ibs_access_work { @@ -15,6 +16,31 @@ struct ibs_access_work { u64 laddr, paddr; }; +void hw_access_sched_in(struct task_struct *prev, struct task_struct *curr) +{ + u64 config = 0; + unsigned int period; + + if (!static_branch_unlikely(&hw_access_hints)) + return; + + /* Disable IBS for kernel thread */ + if (!curr->mm) + goto out; + + if (curr->numa_sample_period) + period = curr->numa_sample_period; + else + period = IBS_SAMPLE_PERIOD; + + + config = (period >> 4) & IBS_OP_MAX_CNT; + config |= (period & IBS_OP_MAX_CNT_EXT_MASK); + config |= ibs_config; +out: + wrmsrl(MSR_AMD64_IBSOPCTL, config); +} + void task_ibs_access_work(struct callback_head *work) { struct ibs_access_work *iwork = container_of(work, struct ibs_access_work, work); @@ -198,6 +224,7 @@ int __init ibs_access_profiling_init(void) x86_amd_ibs_access_profile_startup, x86_amd_ibs_access_profile_teardown); + static_branch_enable(&hw_access_hints); pr_info("IBS access profiling setup for NUMA Balancing\n"); return 0; } diff --git a/include/linux/sched.h b/include/linux/sched.h index 19dd4ee07436..66c532418d38 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1254,6 +1254,7 @@ struct task_struct { int numa_scan_seq; unsigned int numa_scan_period; unsigned int numa_scan_period_max; + unsigned int numa_sample_period; int numa_preferred_nid; unsigned long numa_migrate_retry; /* Migration stamp: */ diff --git a/kernel/sched/core.c b/kernel/sched/core.c 
index e838feb6adc5..1c13fed8bebc 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5165,6 +5165,7 @@ static struct rq *finish_task_switch(struct task_struct *prev) prev_state = READ_ONCE(prev->__state); vtime_task_switch(prev); perf_event_task_sched_in(prev, current); + hw_access_sched_in(prev, current); finish_task(prev); tick_nohz_task_switch(); finish_lock_switch(rq); diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index c9b9e62da779..3f617c799821 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3094,6 +3094,7 @@ void init_numa_balancing(unsigned long clone_flags, struct task_struct *p) p->node_stamp = 0; p->numa_scan_seq = mm ? mm->numa_scan_seq : 0; p->numa_scan_period = sysctl_numa_balancing_scan_delay; + p->numa_sample_period = 0; p->numa_migrate_retry = 0; /* Protect against double add, see task_tick_numa and task_numa_work */ p->numa_work.next = &p->numa_work; diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 771f8ddb7053..953d16c802d6 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1723,11 +1723,16 @@ extern int migrate_task_to(struct task_struct *p, int cpu); extern int migrate_swap(struct task_struct *p, struct task_struct *t, int cpu, int scpu); extern void init_numa_balancing(unsigned long clone_flags, struct task_struct *p); +void hw_access_sched_in(struct task_struct *prev, struct task_struct *curr); #else static inline void init_numa_balancing(unsigned long clone_flags, struct task_struct *p) { } +static inline void hw_access_sched_in(struct task_struct *prev, + struct task_struct *curr) +{ +} #endif /* CONFIG_NUMA_BALANCING */ #ifdef CONFIG_SMP From patchwork Wed Feb 8 07:35:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bharata B Rao X-Patchwork-Id: 13132538 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org 
(kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 43435C636CC for ; Wed, 8 Feb 2023 07:36:55 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CC9576B007B; Wed, 8 Feb 2023 02:36:54 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id C7A0C6B007D; Wed, 8 Feb 2023 02:36:54 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B1AAB6B007E; Wed, 8 Feb 2023 02:36:54 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 9E40E6B007B for ; Wed, 8 Feb 2023 02:36:54 -0500 (EST) Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 7C644A0387 for ; Wed, 8 Feb 2023 07:36:54 +0000 (UTC) X-FDA: 80443317948.16.6A1E17A Received: from NAM11-BN8-obe.outbound.protection.outlook.com (mail-bn8nam11on2041.outbound.protection.outlook.com [40.107.236.41]) by imf03.hostedemail.com (Postfix) with ESMTP id 816C12000A for ; Wed, 8 Feb 2023 07:36:51 +0000 (UTC) Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=amd.com header.s=selector1 header.b=SixPSUxf; arc=pass ("microsoft.com:s=arcselector9901:i=1"); spf=pass (imf03.hostedemail.com: domain of bharata@amd.com designates 40.107.236.41 as permitted sender) smtp.mailfrom=bharata@amd.com; dmarc=pass (policy=quarantine) header.from=amd.com ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1675841811; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=FZkhBw3sxi8I3UCl/z/e/2WFGv+lgV25p4wF9bJi1T0=; b=4DkNqd9EtJrZXXI+Pyfd/Uis93xNNBLZdR2INcf+I/Pkdf1F8+qSTiwzqG4HnxQ3meUalr 
vqXL1txhkY2n4yMZ5MySfv/JXT7oNJWrJxMrrQBDWdXTl5YuwJ2V2LnAwa201FoLnwtCDm mVoxr9oTNZReSVVUyRWxHtIJM+jD0uQ= ARC-Authentication-Results: i=2; imf03.hostedemail.com; dkim=pass header.d=amd.com header.s=selector1 header.b=SixPSUxf; arc=pass ("microsoft.com:s=arcselector9901:i=1"); spf=pass (imf03.hostedemail.com: domain of bharata@amd.com designates 40.107.236.41 as permitted sender) smtp.mailfrom=bharata@amd.com; dmarc=pass (policy=quarantine) header.from=amd.com ARC-Seal: i=2; s=arc-20220608; d=hostedemail.com; t=1675841811; a=rsa-sha256; cv=pass; b=HtsLPRn6hrZz6CHNo+STamwe+rW67ngoOM5Y4832zKGd13yBF62zQRWRHEki+vXhg6qz3h Ek8nGUFTlIu4gMHUUvfA0C8ne9NT+8eQGg6SK1wtdqCT6ftpDyH27Y6kBZ7uM8Q2PR5ljM 7SihbBJ2Vb7TfW4xpc/vbIjVhYvMys8= ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=bZd/jWjDorDrpLnBdh6JKxIin6tcG6TfVfGReEiHSKV9rIB12xtNFy+RXD6UZitFTkVsul72R2Z5Pk/rxfgwWr9r58tv8+SKJrSEVL6dht4ySwefRieIP+xw5UwBVAUbOa6WDdAY+KKPs3V/oxryHOTFNY2tNxy0fd4EOcGCyR00vDdnIUW8lCKVdsNMp4ORuvr3lwngI/vuWT4YIYASSjrBqFn6X/7pK6aqpQ1M5Gi/70XIOCs+iSxeBVV7VwZJta1LbwXJu0Mp95NxiPsGyL/PgrvjkMyDL1JSGIf2/RupGgKQybulzjFxvEc5cYMdojmuXacbPckXvfbfP5fRxQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=FZkhBw3sxi8I3UCl/z/e/2WFGv+lgV25p4wF9bJi1T0=; b=GXbjyIwTjlq0BbxO/Ratyi0hFgdXRVN4qSRcwHpmr/gXRV4u/71ef0/PDj73gDZrCznqf08Dq9eUF2yBlUSC2zNV+VSwZKfu4cpYKiDYb6uin/gM6p/XUg0yDpAx5vvoLNN0GycbXT31KMopgc4wV99B8mt4VOpcDU5Yr/AM9QrDp6K4RmR8ogt3xnSyLUh/IZ549O9o4w/14V/793fcGqYBrkIwMCORLdMYn1WibVpORZgvG+CSnEIwyWXp94Ts2AsiQ/5+2JWhFzEhygi4n/6JVseEgHlCNI30Bl7k7/d4Sr/x2UwceEbTWRt5ALxX/5UxvaRYU9SYSUuMSwJgmw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine 
sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=FZkhBw3sxi8I3UCl/z/e/2WFGv+lgV25p4wF9bJi1T0=; b=SixPSUxfPmOApGwqC7n05HkJvW0mkfJSZSYwJUJuo8RPkrxj4skVqLegGnvbITNK2SJMVPy3kX/6VyjjLmS+U3OqzjaN2/rdIeGKsfyKsPDMLVtovOVs+6jc6CfkMT+ew7O/86kqZTMfXpAv8DTrnkL+FDjEvNGucJtfcZ1Ih84= Received: from BN8PR03CA0019.namprd03.prod.outlook.com (2603:10b6:408:94::32) by IA1PR12MB6436.namprd12.prod.outlook.com (2603:10b6:208:3ac::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6064.36; Wed, 8 Feb 2023 07:36:47 +0000 Received: from BN8NAM11FT010.eop-nam11.prod.protection.outlook.com (2603:10b6:408:94:cafe::9b) by BN8PR03CA0019.outlook.office365.com (2603:10b6:408:94::32) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6086.17 via Frontend Transport; Wed, 8 Feb 2023 07:36:47 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by BN8NAM11FT010.mail.protection.outlook.com (10.13.177.53) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.6086.17 via Frontend Transport; Wed, 8 Feb 2023 07:36:47 +0000 Received: from BLR-5CG1133937.amd.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.34; Wed, 8 Feb 2023 01:36:42 -0600 From: Bharata B Rao To: 
, CC: , , , , , , , , , , , Bharata B Rao Subject: [RFC PATCH 4/5] x86/ibs: Adjust access faults sampling period Date: Wed, 8 Feb 2023 13:05:32 +0530 Message-ID: <20230208073533.715-5-bharata@amd.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230208073533.715-1-bharata@amd.com> References: <20230208073533.715-1-bharata@amd.com> MIME-Version: 1.0 X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BN8NAM11FT010:EE_|IA1PR12MB6436:EE_ X-MS-Office365-Filtering-Correlation-Id: f197b081-34c5-4bd7-5c6c-08db09a73bee X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 5ZY0fx/a8fXolnZRTIy4EFetIVVHpE8cOyBznCmmF25UGJHGGieYsOWYR+olbrwdOHS3HQ4YHlo81TU71tdH4gr650yD455tqTSOj1T0SOfJRt2DD7VEUpGm/Vy/vhAc49VEkBCP4DWWnAR8/mzjAbWhqZ912aeIvaYfn05HRpQyqe6LYI01UXgS6VGzVqthvT8jR6vJ1r9jBF1M7uUySCF4OZ8Ka59ZKAG+lSiZ4LZtLAsMKq5FS4hlu0ZJTZZ/jy6Dnyc6ZNnnz9hz5GMkoh+DCygTJc1etRIyIYgWUn9QZ0hli2ho6QAnHhfkjC7duMlNXz0H7GdN/PRmL6CliSw9v4rL/pUsQaTgU5FVyIY5lLV+7cMc0hNN477ySNG8kyJIwVglaSakk8UDh/XwQMbffNtaKb+2D7EWE0dxtfmdWFpgchcS4cEblG2yUrVxQkbNfHbB9RF/HlxM1Oy3lXXNW9FqoLpLU1mzuwRGRBJRky0d9toI0cZoj/rbV/T0HB86+/O4S0U7hVNV4BvE7xtwfGn33V/mcvaSfiDo15jPug2v3ZODPJLvCikSv2U/CdCPDS70iGhAQ5283OsAjd5MTQGPqJGZYQLnvcPavDwW21OJ5qhTdF6xH5O8Cm1S8XPi2kgjKiy3BSzvFHqJoRKn/bDQW4Va8ChB+0X7c3eN8FDuxM7dISfEcP7fNvEEZ7LXz6CiWucP3NWwlnDeqjRvOhzAqsqYCiu44y6/k8Dwbtfi+MTufl9OH6aPDBAk X-Forefront-Antispam-Report: 
CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230025)(4636009)(346002)(396003)(39860400002)(136003)(376002)(451199018)(46966006)(40470700004)(36840700001)(54906003)(186003)(26005)(82310400005)(86362001)(110136005)(40480700001)(426003)(47076005)(36756003)(7696005)(8676002)(40460700003)(16526019)(2906002)(5660300002)(70586007)(7416002)(36860700001)(8936002)(70206006)(41300700001)(4326008)(478600001)(336012)(2616005)(6666004)(1076003)(82740400003)(356005)(316002)(83380400001)(81166007)(36900700001)(2101003);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 08 Feb 2023 07:36:47.2253 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: f197b081-34c5-4bd7-5c6c-08db09a73bee X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: BN8NAM11FT010.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: IA1PR12MB6436 X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 816C12000A X-Rspam-User: X-Stat-Signature: 8asne7abejms1uzfacq9oeidn4jyszqm X-HE-Tag: 1675841811-637373 X-HE-Meta: 
U2FsdGVkX1+twVLVM1TisRkq/2wqjAZFFzCyAYLPxRVJSPi5ikJG2EaFWyZbmu9C0TU+VjuxOPJDwQ+Yt9q5Xzp0OOEj9lc6Bu+QU7n2MpjfbAyXHEwNvY/IR7yobpC5uaAbSEnLwidBKDc4eUZbKvD3OXM/1fS4E/FN9wPz9qGnnlyikpHP659pTVpB/ZoUpk/G85GW3KAYRM7dtGHSFFv6+xcX798jMMCIIKsOiaVQ+p4uP0JD1b7PwhcfDsJ6Bcm5emxzYo1oaD6k1TtXsCukwEUump6bKs//bCIc81ADgzTGiQO7XX2coci9fDhllG4ITgxKsD9TON0s0g0p/5ZXusHB8fWVYXRm6ATwYPfMeQMxEnLTramy9UMi2nlieaU6npouT8oDA3y3npUojpzIRpCAOElWDt+SiTsSjZ4a7koiKhqQVXaT1/mpmUayYA4tLUg7EtxIgZEIP482ibKo2a9vYN0cgw9Po713nxcQ8uHILXAdxTUUiqP6LxqVVGO7VlDAI801i2h8WEoSLKGXKgerEoAMkRyx3YGR3GAwduHRME6AW0r3gcmAYmM6dQ8CGj9yFatidrdUbAznOKBiQxLTmpmdJTQjzQneY/61gQsAc5AEi6IuJVplSf0Gnc6QWee6cbwg8PplIYUuSbyamRZ/sgGuObAi6kiKSNr3/LgW+20uk3ea+H9dae1Uvopo6PVeYiWhHtC32Mdp1GczFIO1BGRHfgyMoKMD90OXdrWkK8ZpFUCoZmEx6UYRTGA8Hdml/9b5V+PvWQwiqW8IPLdIuJhibPcq0yepl4/uOjWm8ROQ/vrvRHiYMhSSR9194CfG2LJ9QVNvkt6ylMWsGF5GDEaPn9/Lk8mxqj2W6oW9s8RJ1wW9YyP03PNqcU+1H9yxSdkysbRazrCmsilbYgyR58XIIX20o59f06y+549eHtTL1evoBjWHG40TzPCzsnOE5eL9bOnElTH qN97jQ9g KWcoH9WyY+qFio1uyr9xkI28UDGwylYydUo30fE0G4AHCxxajO/s3SxZGl3sfURTMKywM/nKlTlHDEkbfYAxuM6wy5NuNeOR7s5+BN1/aog5NxSKeCOk0VnlTFHgl5igQnO2u2qOHaPXZ5g+byhKdbBlzxjtJ6VBe+aGyQBFvvpaHkazlrEcADWC7SJIFP84tN4UfGBbFoT6JN3nFx8NZyEDajseaRrfEnbSKU5g1Hp8HWLO+V7JyJoO95BXeq3tJK3myewwbEuyUB07TkYPPhTeN1+1h6n9aL6fMDuvD3ZM8E6ZWN/mxjw5yECUYie/37QNgM3q/BkJIk6YZMhDpzsAScmtzE0i+pn8mv/Ln9WacOvvqOoepMNyE1zg+xmhmzUpB/WYtz55wYJt4upj8Cg+K0iB4A9ozoG7dY4WvEglrSgIjwUQxDJvmyg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Adjust the access faults sampling period of a thread to be within the fixed minimum and maximum values. The adjustment logic uses the private/shared and local/remote access faults stats. The algorithm is the same as the one used to adjust the scan period. Unlike hinting faults, the min and max sampling periods aren't adjusted (yet) for access-based sampling.
Signed-off-by: Bharata B Rao --- include/linux/sched.h | 2 + kernel/sched/debug.c | 8 +++ kernel/sched/fair.c | 130 +++++++++++++++++++++++++++++++++++++----- kernel/sched/sched.h | 4 ++ 4 files changed, 130 insertions(+), 14 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 66c532418d38..101c6377abbc 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1257,6 +1257,8 @@ struct task_struct { unsigned int numa_sample_period; int numa_preferred_nid; unsigned long numa_migrate_retry; + unsigned int numa_access_faults; + unsigned int numa_access_faults_window; /* Migration stamp: */ u64 node_stamp; u64 last_task_numa_placement; diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c index 1637b65ba07a..1cf19778a232 100644 --- a/kernel/sched/debug.c +++ b/kernel/sched/debug.c @@ -334,6 +334,14 @@ static __init int sched_init_debug(void) debugfs_create_u32("scan_period_max_ms", 0644, numa, &sysctl_numa_balancing_scan_period_max); debugfs_create_u32("scan_size_mb", 0644, numa, &sysctl_numa_balancing_scan_size); debugfs_create_u32("hot_threshold_ms", 0644, numa, &sysctl_numa_balancing_hot_threshold); + debugfs_create_u32("sample_period_def", 0644, numa, + &sysctl_numa_balancing_sample_period_def); + debugfs_create_u32("sample_period_min", 0644, numa, + &sysctl_numa_balancing_sample_period_min); + debugfs_create_u32("sample_period_max", 0644, numa, + &sysctl_numa_balancing_sample_period_max); + debugfs_create_u32("access_faults_threshold", 0644, numa, + &sysctl_numa_balancing_access_faults_threshold); #endif debugfs_create_file("debug", 0444, debugfs_sched, NULL, &sched_debug_fops); diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 3f617c799821..1b0665b034d0 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1093,6 +1093,11 @@ adjust_numa_imbalance(int imbalance, int dst_running, int imb_numa_nr) #endif /* CONFIG_NUMA */ #ifdef CONFIG_NUMA_BALANCING +unsigned int sysctl_numa_balancing_sample_period_def = 
10000; +unsigned int sysctl_numa_balancing_sample_period_min = 5000; +unsigned int sysctl_numa_balancing_sample_period_max = 20000; +unsigned int sysctl_numa_balancing_access_faults_threshold = 250; + /* * Approximate time to scan a full NUMA task in ms. The task scan period is * calculated based on the tasks virtual memory size and @@ -1572,6 +1577,7 @@ bool should_numa_migrate_memory(struct task_struct *p, struct page * page, struct numa_group *ng = deref_curr_numa_group(p); int dst_nid = cpu_to_node(dst_cpu); int last_cpupid, this_cpupid; + bool early = false; /* * The pages in slow memory node should be migrated according @@ -1611,13 +1617,21 @@ bool should_numa_migrate_memory(struct task_struct *p, struct page * page, !node_is_toptier(src_nid) && !cpupid_valid(last_cpupid)) return false; + if (static_branch_unlikely(&hw_access_hints)) { + if (p->numa_access_faults < sysctl_numa_balancing_access_faults_threshold * 4) + early = true; + } else { + if (p->numa_scan_seq <= 4) + early = true; + } + /* * Allow first faults or private faults to migrate immediately early in * the lifetime of a task. The magic number 4 is based on waiting for * two full passes of the "multi-stage node selection" test that is * executed below. 
*/ - if ((p->numa_preferred_nid == NUMA_NO_NODE || p->numa_scan_seq <= 4) && + if ((p->numa_preferred_nid == NUMA_NO_NODE || early) && (cpupid_pid_unset(last_cpupid) || cpupid_match_pid(p, last_cpupid))) return true; @@ -2305,7 +2319,11 @@ static void numa_migrate_preferred(struct task_struct *p) return; /* Periodically retry migrating the task to the preferred node */ - interval = min(interval, msecs_to_jiffies(p->numa_scan_period) / 16); + if (static_branch_unlikely(&hw_access_hints)) + interval = min(interval, msecs_to_jiffies(p->numa_sample_period) / 16); + else + interval = min(interval, msecs_to_jiffies(p->numa_scan_period) / 16); + p->numa_migrate_retry = jiffies + interval; /* Success if task is already running on preferred CPU */ @@ -2430,6 +2448,77 @@ static void update_task_scan_period(struct task_struct *p, memset(p->numa_faults_locality, 0, sizeof(p->numa_faults_locality)); } +static void update_task_sample_period(struct task_struct *p, + unsigned long shared, unsigned long private) +{ + unsigned int period_slot; + int lr_ratio, ps_ratio; + int diff; + + unsigned long remote = p->numa_faults_locality[0]; + unsigned long local = p->numa_faults_locality[1]; + + /* + * If there were no access faults then either the task is + * completely idle or all activity is in areas that are not of interest + * to automatic NUMA balancing. Related to that, if there were failed + * migrations then it implies we are migrating too quickly or the local + * node is overloaded. In either case, sample less often. + */ + if (local + shared == 0 || p->numa_faults_locality[2]) { + p->numa_sample_period = min(sysctl_numa_balancing_sample_period_max, + p->numa_sample_period << 1); + return; + } + + /* + * Prepare to scale the sample period relative to the current period. 
+ * == NUMA_PERIOD_THRESHOLD sample period stays the same + * < NUMA_PERIOD_THRESHOLD sample period decreases + * >= NUMA_PERIOD_THRESHOLD sample period increases + */ + period_slot = DIV_ROUND_UP(p->numa_sample_period, NUMA_PERIOD_SLOTS); + lr_ratio = (local * NUMA_PERIOD_SLOTS) / (local + remote); + ps_ratio = (private * NUMA_PERIOD_SLOTS) / (private + shared); + + if (ps_ratio >= NUMA_PERIOD_THRESHOLD) { + /* + * Most memory accesses are local. There is no need to + * do fast access sampling, since memory is already local. + */ + int slot = ps_ratio - NUMA_PERIOD_THRESHOLD; + + if (!slot) + slot = 1; + diff = slot * period_slot; + } else if (lr_ratio >= NUMA_PERIOD_THRESHOLD) { + /* + * Most memory accesses are shared with other tasks. + * There is no point in continuing fast access sampling, + * since other tasks may just move the memory elsewhere. + */ + int slot = lr_ratio - NUMA_PERIOD_THRESHOLD; + + if (!slot) + slot = 1; + diff = slot * period_slot; + } else { + /* + * Private memory faults exceed (SLOTS-THRESHOLD)/SLOTS, + * yet they are not on the local NUMA node. Speed up + * access sampling to get the memory moved over. + */ + int ratio = max(lr_ratio, ps_ratio); + + diff = -(NUMA_PERIOD_THRESHOLD - ratio) * period_slot; + } + + p->numa_sample_period = clamp(p->numa_sample_period + diff, + sysctl_numa_balancing_sample_period_min, + sysctl_numa_balancing_sample_period_max); + memset(p->numa_faults_locality, 0, sizeof(p->numa_faults_locality)); +} + /* * Get the fraction of time the task has been running since the last * NUMA placement cycle. The scheduler keeps similar statistics, but @@ -2560,16 +2649,24 @@ static void task_numa_placement(struct task_struct *p) spinlock_t *group_lock = NULL; struct numa_group *ng; - /* - * The p->mm->numa_scan_seq field gets updated without - * exclusive access. 
Use READ_ONCE() here to ensure - * that the field is read in a single access: - */ - seq = READ_ONCE(p->mm->numa_scan_seq); - if (p->numa_scan_seq == seq) - return; - p->numa_scan_seq = seq; - p->numa_scan_period_max = task_scan_max(p); + if (static_branch_unlikely(&hw_access_hints)) { + p->numa_access_faults_window++; + p->numa_access_faults++; + if (p->numa_access_faults_window < sysctl_numa_balancing_access_faults_threshold) + return; + p->numa_access_faults_window = 0; + } else { + /* + * The p->mm->numa_scan_seq field gets updated without + * exclusive access. Use READ_ONCE() here to ensure + * that the field is read in a single access: + */ + seq = READ_ONCE(p->mm->numa_scan_seq); + if (p->numa_scan_seq == seq) + return; + p->numa_scan_seq = seq; + p->numa_scan_period_max = task_scan_max(p); + } total_faults = p->numa_faults_locality[0] + p->numa_faults_locality[1]; @@ -2672,7 +2769,10 @@ static void task_numa_placement(struct task_struct *p) sched_setnuma(p, max_nid); } - update_task_scan_period(p, fault_types[0], fault_types[1]); + if (static_branch_unlikely(&hw_access_hints)) + update_task_sample_period(p, fault_types[0], fault_types[1]); + else + update_task_scan_period(p, fault_types[0], fault_types[1]); } static inline int get_numa_group(struct numa_group *grp) @@ -3094,7 +3194,9 @@ void init_numa_balancing(unsigned long clone_flags, struct task_struct *p) p->node_stamp = 0; p->numa_scan_seq = mm ? 
mm->numa_scan_seq : 0; p->numa_scan_period = sysctl_numa_balancing_scan_delay; - p->numa_sample_period = 0; + p->numa_sample_period = sysctl_numa_balancing_sample_period_def; + p->numa_access_faults = 0; + p->numa_access_faults_window = 0; p->numa_migrate_retry = 0; /* Protect against double add, see task_tick_numa and task_numa_work */ p->numa_work.next = &p->numa_work; diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 953d16c802d6..0367dc727cc4 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2473,6 +2473,10 @@ extern unsigned int sysctl_numa_balancing_scan_period_min; extern unsigned int sysctl_numa_balancing_scan_period_max; extern unsigned int sysctl_numa_balancing_scan_size; extern unsigned int sysctl_numa_balancing_hot_threshold; +extern unsigned int sysctl_numa_balancing_sample_period_def; +extern unsigned int sysctl_numa_balancing_sample_period_min; +extern unsigned int sysctl_numa_balancing_sample_period_max; +extern unsigned int sysctl_numa_balancing_access_faults_threshold; #endif #ifdef CONFIG_SCHED_HRTICK From patchwork Wed Feb 8 07:35:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bharata B Rao X-Patchwork-Id: 13132539 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4D6A1C636D3 for ; Wed, 8 Feb 2023 07:36:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E50B96B007E; Wed, 8 Feb 2023 02:36:56 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E00206B0080; Wed, 8 Feb 2023 02:36:56 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C7ADA6B0081; Wed, 8 Feb 2023 02:36:56 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com 
[216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id B544A6B007E for ; Wed, 8 Feb 2023 02:36:56 -0500 (EST) From: Bharata B Rao To: , CC: , , , , , , , , , , , Bharata B Rao Subject: [RFC PATCH 5/5] x86/ibs: Delay the collection of HW-provided access info Date: Wed, 8 Feb 2023 13:05:33 +0530 Message-ID: <20230208073533.715-6-bharata@amd.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230208073533.715-1-bharata@amd.com> References: <20230208073533.715-1-bharata@amd.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter,
spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Allow an initial delay before enabling the collection of IBS provided access info. Signed-off-by: Bharata B Rao --- arch/x86/mm/ibs.c | 18 ++++++++++++++++++ include/linux/mm.h | 2 ++ include/linux/mm_types.h | 3 +++ kernel/sched/debug.c | 2 ++ kernel/sched/fair.c | 3 +++ 5 files changed, 28 insertions(+) diff --git a/arch/x86/mm/ibs.c b/arch/x86/mm/ibs.c index a479029e9262..dfe5246954c0 100644 --- a/arch/x86/mm/ibs.c +++ b/arch/x86/mm/ibs.c @@ -16,6 +16,21 @@ struct ibs_access_work { u64 laddr, paddr; }; +static bool delay_hw_access_profiling(struct mm_struct *mm) +{ + unsigned long delay, now = jiffies; + + if (!mm->numa_hw_access_delay) + mm->numa_hw_access_delay = now + + msecs_to_jiffies(sysctl_numa_balancing_access_faults_delay); + + delay = mm->numa_hw_access_delay; + if (time_before(now, delay)) + return true; + + return false; +} + void hw_access_sched_in(struct task_struct *prev, struct task_struct *curr) { u64 config = 0; @@ -28,6 +43,9 @@ void hw_access_sched_in(struct task_struct *prev, struct task_struct *curr) if (!curr->mm) goto out; + if (delay_hw_access_profiling(curr->mm)) + goto out; + if (curr->numa_sample_period) period = curr->numa_sample_period; else diff --git a/include/linux/mm.h b/include/linux/mm.h index 8f857163ac89..118705a296ef 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1397,6 +1397,8 @@ static inline int folio_nid(const struct folio *folio) } #ifdef CONFIG_NUMA_BALANCING +extern unsigned int sysctl_numa_balancing_access_faults_delay; + /* page access time bits needs to hold at least 4 seconds */ #define PAGE_ACCESS_TIME_MIN_BITS 12 #if LAST_CPUPID_SHIFT < PAGE_ACCESS_TIME_MIN_BITS diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 9757067c3053..8a2fb8bf2d62 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -750,6 +750,9 @@ struct mm_struct { /* 
numa_scan_seq prevents two threads remapping PTEs. */ int numa_scan_seq; + + /* HW-provided access info is collected after this initial delay */ + unsigned long numa_hw_access_delay; #endif /* * An operation with batched TLB flushing is going on. Anything diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c index 1cf19778a232..5c76a7594358 100644 --- a/kernel/sched/debug.c +++ b/kernel/sched/debug.c @@ -342,6 +342,8 @@ static __init int sched_init_debug(void) &sysctl_numa_balancing_sample_period_max); debugfs_create_u32("access_faults_threshold", 0644, numa, &sysctl_numa_balancing_access_faults_threshold); + debugfs_create_u32("access_faults_delay", 0644, numa, + &sysctl_numa_balancing_access_faults_delay); #endif debugfs_create_file("debug", 0444, debugfs_sched, NULL, &sched_debug_fops); diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 1b0665b034d0..2e2b1e706a24 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1097,6 +1097,7 @@ unsigned int sysctl_numa_balancing_sample_period_def = 10000; unsigned int sysctl_numa_balancing_sample_period_min = 5000; unsigned int sysctl_numa_balancing_sample_period_max = 20000; unsigned int sysctl_numa_balancing_access_faults_threshold = 250; +unsigned int sysctl_numa_balancing_access_faults_delay = 1000; /* * Approximate time to scan a full NUMA task in ms. The task scan period is @@ -3189,6 +3190,8 @@ void init_numa_balancing(unsigned long clone_flags, struct task_struct *p) if (mm_users == 1) { mm->numa_next_scan = jiffies + msecs_to_jiffies(sysctl_numa_balancing_scan_delay); mm->numa_scan_seq = 0; + mm->numa_hw_access_delay = jiffies + + msecs_to_jiffies(sysctl_numa_balancing_access_faults_delay); } } p->node_stamp = 0;