From patchwork Wed Dec 18 11:46:32 2024
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13913540
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Hugh Dickins, "Huang, Ying", Yosry Ahmed,
 Roman Gushchin, Shakeel Butt, Johannes Weiner, Barry Song, Michal Hocko,
 linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 3/4] mm/swap_cgroup: remove global swap cgroup lock
Date: Wed, 18 Dec 2024 19:46:32 +0800
Message-ID: <20241218114633.85196-4-ryncsn@gmail.com>
In-Reply-To: <20241218114633.85196-1-ryncsn@gmail.com>
References: <20241218114633.85196-1-ryncsn@gmail.com>
Reply-To: Kairui Song
MIME-Version: 1.0
From: Kairui Song

commit e9e58a4ec3b1 ("memcg: avoid use cmpxchg in swap
cgroup maintainance") replaced the cmpxchg/xchg with a global irq
spinlock because some architectures don't support a 2-byte
cmpxchg/xchg. Clearly this won't scale well. And, as commented in
swap_cgroup.c, this lock is not needed for map synchronization.

Emulating a 2-byte xchg with an atomic cmpxchg isn't hard, so implement
it to get rid of this lock. Two helpers are introduced for doing so,
and they can easily be dropped once a generic 2-byte xchg is supported.

Testing using a 64G brd device, building the kernel with make -j96 in a
1.5G memory cgroup using 4k folios showed the improvement below (6 test
runs):

Before this series:
Sys time: 10782.29 (stdev 42.353886)
Real time: 171.49 (stdev 0.595541)

After this commit:
Sys time: 9617.23 (stdev 37.764062), -10.81%
Real time: 159.65 (stdev 0.587388), -6.90%

With 64k folios and a 2G memcg:

Before this series:
Sys time: 8176.94 (stdev 26.414712)
Real time: 141.98 (stdev 0.797382)

After this commit:
Sys time: 7358.98 (stdev 54.927593), -10.00%
Real time: 134.07 (stdev 0.757463), -5.57%

Sequential swapout of 8G of 64k zero folios with madvise (24 test runs):

Before this series: 5461409.12 us (stdev 183957.827084)
After this commit: 5420447.26 us (stdev 196419.240317)

Sequential swapin of 8G of 4k zero folios (24 test runs):

Before this series: 19736958.916667 us (stdev 189027.246676)
After this commit: 19662182.629630 us (stdev 172717.640614)

Performance is better, or at least not worse, for all tests above.
Signed-off-by: Kairui Song
Reviewed-by: Roman Gushchin
---
 mm/swap_cgroup.c | 77 ++++++++++++++++++++++++++++++------------------
 1 file changed, 49 insertions(+), 28 deletions(-)

diff --git a/mm/swap_cgroup.c b/mm/swap_cgroup.c
index 1770b076f6b7..cf0445cb35ed 100644
--- a/mm/swap_cgroup.c
+++ b/mm/swap_cgroup.c
@@ -7,19 +7,20 @@
 
 static DEFINE_MUTEX(swap_cgroup_mutex);
 
+/* Pack two cgroup id (short) of two entries in one swap_cgroup (atomic_t) */
+#define ID_PER_SC (sizeof(struct swap_cgroup) / sizeof(unsigned short))
+#define ID_SHIFT (BITS_PER_TYPE(unsigned short))
+#define ID_MASK (BIT(ID_SHIFT) - 1)
 struct swap_cgroup {
-	unsigned short		id;
+	atomic_t ids;
 };
 
 struct swap_cgroup_ctrl {
 	struct swap_cgroup *map;
-	spinlock_t	lock;
 };
 
 static struct swap_cgroup_ctrl swap_cgroup_ctrl[MAX_SWAPFILES];
 
-#define SC_PER_PAGE	(PAGE_SIZE/sizeof(struct swap_cgroup))
-
 /*
  * SwapCgroup implements "lookup" and "exchange" operations.
  * In typical usage, this swap_cgroup is accessed via memcg's charge/uncharge
@@ -30,19 +31,35 @@ static struct swap_cgroup_ctrl swap_cgroup_ctrl[MAX_SWAPFILES];
  * SwapCache(and its swp_entry) is under lock.
  * - When called via swap_free(), there is no user of this entry and no race.
  * Then, we don't need lock around "exchange".
- *
- * TODO: we can push these buffers out to HIGHMEM.
  */
-static struct swap_cgroup *lookup_swap_cgroup(swp_entry_t ent,
-					struct swap_cgroup_ctrl **ctrlp)
+static unsigned short __swap_cgroup_id_lookup(struct swap_cgroup *map,
+					      pgoff_t offset)
 {
-	pgoff_t offset = swp_offset(ent);
-	struct swap_cgroup_ctrl *ctrl;
+	unsigned int shift = (offset % ID_PER_SC) * ID_SHIFT;
+	unsigned int old_ids = atomic_read(&map[offset / ID_PER_SC].ids);
 
-	ctrl = &swap_cgroup_ctrl[swp_type(ent)];
-	if (ctrlp)
-		*ctrlp = ctrl;
-	return &ctrl->map[offset];
+	BUILD_BUG_ON(!is_power_of_2(ID_PER_SC));
+	BUILD_BUG_ON(sizeof(struct swap_cgroup) != sizeof(atomic_t));
+
+	return (old_ids >> shift) & ID_MASK;
+}
+
+static unsigned short __swap_cgroup_id_xchg(struct swap_cgroup *map,
+					    pgoff_t offset,
+					    unsigned short new_id)
+{
+	unsigned short old_id;
+	struct swap_cgroup *sc = &map[offset / ID_PER_SC];
+	unsigned int shift = (offset % ID_PER_SC) * ID_SHIFT;
+	unsigned int new_ids, old_ids = atomic_read(&sc->ids);
+
+	do {
+		old_id = (old_ids >> shift) & ID_MASK;
+		new_ids = (old_ids & ~(ID_MASK << shift));
+		new_ids |= ((unsigned int)new_id) << shift;
+	} while (!atomic_try_cmpxchg(&sc->ids, &old_ids, new_ids));
+
+	return old_id;
 }
 
 /**
@@ -58,21 +75,19 @@ unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id,
 			      unsigned int nr_ents)
 {
 	struct swap_cgroup_ctrl *ctrl;
-	struct swap_cgroup *sc;
-	unsigned short old;
-	unsigned long flags;
 	pgoff_t offset = swp_offset(ent);
 	pgoff_t end = offset + nr_ents;
+	unsigned short old, iter;
+	struct swap_cgroup *map;
 
-	sc = lookup_swap_cgroup(ent, &ctrl);
+	ctrl = &swap_cgroup_ctrl[swp_type(ent)];
+	map = ctrl->map;
 
-	spin_lock_irqsave(&ctrl->lock, flags);
-	old = sc->id;
-	for (; offset < end; offset++, sc++) {
-		VM_BUG_ON(sc->id != old);
-		sc->id = id;
-	}
-	spin_unlock_irqrestore(&ctrl->lock, flags);
+	old = __swap_cgroup_id_lookup(map, offset);
+	do {
+		iter = __swap_cgroup_id_xchg(map, offset, id);
+		VM_BUG_ON(iter != old);
+	} while (++offset != end);
 
 	return old;
 }
@@ -85,9 +100,13 @@ unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id,
  */
 unsigned short lookup_swap_cgroup_id(swp_entry_t ent)
 {
+	struct swap_cgroup_ctrl *ctrl;
+
 	if (mem_cgroup_disabled())
 		return 0;
-	return lookup_swap_cgroup(ent, NULL)->id;
+
+	ctrl = &swap_cgroup_ctrl[swp_type(ent)];
+	return __swap_cgroup_id_lookup(ctrl->map, swp_offset(ent));
 }
 
 int swap_cgroup_swapon(int type, unsigned long max_pages)
@@ -98,14 +117,16 @@ int swap_cgroup_swapon(int type, unsigned long max_pages)
 	if (mem_cgroup_disabled())
 		return 0;
 
-	map = vcalloc(max_pages, sizeof(struct swap_cgroup));
+	BUILD_BUG_ON(sizeof(unsigned short) * ID_PER_SC !=
+		     sizeof(struct swap_cgroup));
+	map = vcalloc(DIV_ROUND_UP(max_pages, ID_PER_SC),
+		      sizeof(struct swap_cgroup));
 	if (!map)
 		goto nomem;
 
 	ctrl = &swap_cgroup_ctrl[type];
 	mutex_lock(&swap_cgroup_mutex);
 	ctrl->map = map;
-	spin_lock_init(&ctrl->lock);
 	mutex_unlock(&swap_cgroup_mutex);
 
 	return 0;