From patchwork Tue Dec 10 09:28:05 2024
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13901075
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Hugh Dickins, "Huang, Ying", Yosry Ahmed,
    Roman Gushchin, Shakeel Butt, Johannes Weiner, Barry Song,
    Michal Hocko, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 3/3] mm, swap_cgroup: remove global swap cgroup lock
Date: Tue, 10 Dec 2024 17:28:05 +0800
Message-ID: <20241210092805.87281-4-ryncsn@gmail.com>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20241210092805.87281-1-ryncsn@gmail.com>
References: <20241210092805.87281-1-ryncsn@gmail.com>
Reply-To: Kairui Song
From: Kairui Song

commit e9e58a4ec3b1 ("memcg: avoid use cmpxchg in swap
cgroup maintainance") replaced the cmpxchg/xchg with a global irq spinlock
because some architectures don't support a 2-byte cmpxchg/xchg. Clearly
this won't scale well. And as commented in swap_cgroup.c, this lock is not
needed for map synchronization.

Emulating a 2-byte xchg with an atomic cmpxchg isn't hard, so implement it
to get rid of this lock. Two helpers are introduced for doing so, and they
can easily be dropped once a generic 2-byte xchg is supported.

Testing using a 64G brd device as swap, building the kernel with make -j96
in a 1.5G memory cgroup using 4k folios, showed the below improvement
(10 test runs):

Before this series:
Sys time: 10809.46 (stdev 80.831491)
Real time: 171.41 (stdev 1.239894)

After this commit:
Sys time: 9621.26 (stdev 34.620000), -10.42%
Real time: 160.00 (stdev 0.497814), -6.57%

With 64k folios and a 2G memcg:

Before this series:
Sys time: 8231.99 (stdev 30.030994)
Real time: 143.57 (stdev 0.577394)

After this commit:
Sys time: 7403.47 (stdev 6.270000), -10.06%
Real time: 135.18 (stdev 0.605000), -5.84%

Sequential swapout of 8G 64k zero folios with madvise (24 test runs):
Before this series: 5461409.12 us (stdev 183957.827084)
After this commit:  5420447.26 us (stdev 196419.240317)

Sequential swapin of 8G 4k zero folios (24 test runs):
Before this series: 19736958.916667 us (stdev 189027.246676)
After this commit:  19662182.629630 us (stdev 172717.640614)

Performance is better, or at least not worse, for all tests above.
Signed-off-by: Kairui Song
Reviewed-by: Roman Gushchin
---
 mm/swap_cgroup.c | 73 +++++++++++++++++++++++++++++-------------------
 1 file changed, 45 insertions(+), 28 deletions(-)

diff --git a/mm/swap_cgroup.c b/mm/swap_cgroup.c
index 1770b076f6b7..a0a8547dc85d 100644
--- a/mm/swap_cgroup.c
+++ b/mm/swap_cgroup.c
@@ -7,19 +7,20 @@

 static DEFINE_MUTEX(swap_cgroup_mutex);

+/* Pack two cgroup id (short) of two entries in one swap_cgroup (atomic_t) */
+#define ID_PER_SC (sizeof(atomic_t) / sizeof(unsigned short))
+#define ID_SHIFT (BITS_PER_TYPE(unsigned short))
+#define ID_MASK (BIT(ID_SHIFT) - 1)
 struct swap_cgroup {
-        unsigned short id;
+        atomic_t ids;
 };

 struct swap_cgroup_ctrl {
         struct swap_cgroup *map;
-        spinlock_t lock;
 };

 static struct swap_cgroup_ctrl swap_cgroup_ctrl[MAX_SWAPFILES];

-#define SC_PER_PAGE (PAGE_SIZE/sizeof(struct swap_cgroup))
-
 /*
  * SwapCgroup implements "lookup" and "exchange" operations.
  * In typical usage, this swap_cgroup is accessed via memcg's charge/uncharge
@@ -30,19 +31,32 @@ static struct swap_cgroup_ctrl swap_cgroup_ctrl[MAX_SWAPFILES];
  * SwapCache(and its swp_entry) is under lock.
  * - When called via swap_free(), there is no user of this entry and no race.
  * Then, we don't need lock around "exchange".
- *
- * TODO: we can push these buffers out to HIGHMEM.
  */
-static struct swap_cgroup *lookup_swap_cgroup(swp_entry_t ent,
-                                        struct swap_cgroup_ctrl **ctrlp)
+static unsigned short __swap_cgroup_id_lookup(struct swap_cgroup *map,
+                                              pgoff_t offset)
 {
-        pgoff_t offset = swp_offset(ent);
-        struct swap_cgroup_ctrl *ctrl;
+        unsigned int shift = (offset & 1) ? 0 : ID_SHIFT;
+        unsigned int old_ids = atomic_read(&map[offset / ID_PER_SC].ids);

-        ctrl = &swap_cgroup_ctrl[swp_type(ent)];
-        if (ctrlp)
-                *ctrlp = ctrl;
-        return &ctrl->map[offset];
+        return (old_ids & (ID_MASK << shift)) >> shift;
+}
+
+static unsigned short __swap_cgroup_id_xchg(struct swap_cgroup *map,
+                                            pgoff_t offset,
+                                            unsigned short new_id)
+{
+        unsigned short old_id;
+        unsigned int shift = (offset & 1) ? 0 : ID_SHIFT;
+        struct swap_cgroup *sc = &map[offset / ID_PER_SC];
+        unsigned int new_ids, old_ids = atomic_read(&sc->ids);
+
+        do {
+                old_id = (old_ids & (ID_MASK << shift)) >> shift;
+                new_ids = (old_ids & ~(ID_MASK << shift));
+                new_ids |= ((unsigned int)new_id) << shift;
+        } while (!atomic_try_cmpxchg(&sc->ids, &old_ids, new_ids));
+
+        return old_id;
 }

 /**
@@ -58,21 +72,19 @@ unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id,
                                   unsigned int nr_ents)
 {
         struct swap_cgroup_ctrl *ctrl;
-        struct swap_cgroup *sc;
-        unsigned short old;
-        unsigned long flags;
         pgoff_t offset = swp_offset(ent);
         pgoff_t end = offset + nr_ents;
+        unsigned short old, iter;
+        struct swap_cgroup *map;

-        sc = lookup_swap_cgroup(ent, &ctrl);
+        ctrl = &swap_cgroup_ctrl[swp_type(ent)];
+        map = ctrl->map;

-        spin_lock_irqsave(&ctrl->lock, flags);
-        old = sc->id;
-        for (; offset < end; offset++, sc++) {
-                VM_BUG_ON(sc->id != old);
-                sc->id = id;
-        }
-        spin_unlock_irqrestore(&ctrl->lock, flags);
+        old = __swap_cgroup_id_lookup(map, offset);
+        do {
+                iter = __swap_cgroup_id_xchg(map, offset, id);
+                VM_BUG_ON(iter != old);
+        } while (++offset != end);

         return old;
 }
@@ -85,9 +97,13 @@ unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id,
  */
 unsigned short lookup_swap_cgroup_id(swp_entry_t ent)
 {
+        struct swap_cgroup_ctrl *ctrl;
+
         if (mem_cgroup_disabled())
                 return 0;
-        return lookup_swap_cgroup(ent, NULL)->id;
+
+        ctrl = &swap_cgroup_ctrl[swp_type(ent)];
+        return __swap_cgroup_id_lookup(ctrl->map, swp_offset(ent));
 }

 int swap_cgroup_swapon(int type, unsigned long max_pages)
@@ -98,14 +114,15 @@ int swap_cgroup_swapon(int type, unsigned long max_pages)
         if (mem_cgroup_disabled())
                 return 0;

-        map = vcalloc(max_pages, sizeof(struct swap_cgroup));
+        BUILD_BUG_ON(!ID_PER_SC);
+        map = vcalloc(DIV_ROUND_UP(max_pages, ID_PER_SC),
+                      sizeof(struct swap_cgroup));
         if (!map)
                 goto nomem;

         ctrl = &swap_cgroup_ctrl[type];
         mutex_lock(&swap_cgroup_mutex);
         ctrl->map = map;
-        spin_lock_init(&ctrl->lock);
         mutex_unlock(&swap_cgroup_mutex);

         return 0;