From patchwork Wed Sep 13 07:38:44 2023
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13382561
Date: Wed, 13 Sep 2023 07:38:44 +0000
In-Reply-To: <20230913073846.1528938-1-yosryahmed@google.com>
References: <20230913073846.1528938-1-yosryahmed@google.com>
Message-ID: <20230913073846.1528938-2-yosryahmed@google.com>
Subject: [PATCH 1/3] mm: memcg: change flush_next_time to flush_last_time
From: Yosry Ahmed
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
 Muchun Song, Ivan Babrou, Tejun Heo, Michal Koutný, Waiman Long,
 kernel-team@cloudflare.com, Wei Xu, Greg Thelen, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Yosry Ahmed

flush_next_time is an inaccurate name.
It's not the next time that periodic flushing will happen, it's rather
the next time that ratelimited flushing can happen if the periodic
flusher is late.

Simplify its semantics by just storing the timestamp of the last flush
instead, flush_last_time. Move the 2*FLUSH_TIME addition to
mem_cgroup_flush_stats_ratelimited(), and add a comment explaining it.
This way, all the ratelimiting semantics live in one place.

No functional change intended.

Signed-off-by: Yosry Ahmed
---
 mm/memcontrol.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b29b850cf399..35a9c013d755 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -590,7 +590,7 @@ static DECLARE_DEFERRABLE_WORK(stats_flush_dwork, flush_memcg_stats_dwork);
 static DEFINE_PER_CPU(unsigned int, stats_updates);
 static atomic_t stats_flush_ongoing = ATOMIC_INIT(0);
 static atomic_t stats_flush_threshold = ATOMIC_INIT(0);
-static u64 flush_next_time;
+static u64 flush_last_time;
 
 #define FLUSH_TIME (2UL*HZ)
 
@@ -650,7 +650,7 @@ static void do_flush_stats(void)
 	    atomic_xchg(&stats_flush_ongoing, 1))
 		return;
 
-	WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME);
+	WRITE_ONCE(flush_last_time, jiffies_64);
 
 	cgroup_rstat_flush(root_mem_cgroup->css.cgroup);
 
@@ -666,7 +666,8 @@ void mem_cgroup_flush_stats(void)
 
 void mem_cgroup_flush_stats_ratelimited(void)
 {
-	if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
+	/* Only flush if the periodic flusher is one full cycle late */
+	if (time_after64(jiffies_64, READ_ONCE(flush_last_time) + 2*FLUSH_TIME))
 		mem_cgroup_flush_stats();
 }

From patchwork Wed Sep 13 07:38:45 2023
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13382560
Date: Wed, 13 Sep 2023 07:38:45 +0000
In-Reply-To: <20230913073846.1528938-1-yosryahmed@google.com>
References: <20230913073846.1528938-1-yosryahmed@google.com>
Message-ID: <20230913073846.1528938-3-yosryahmed@google.com>
Subject: [PATCH 2/3] mm: memcg: rename stats_flush_threshold to stats_updates_order
From: Yosry Ahmed
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
 Muchun Song, Ivan Babrou, Tejun Heo, Michal Koutný, Waiman Long,
 kernel-team@cloudflare.com, Wei Xu, Greg Thelen, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Yosry Ahmed

stats_flush_threshold is a misnomer. It is not actually a threshold, but
rather a number that represents the amount of updates that we have.
It is counted in multiples of MEMCG_CHARGE_BATCH. When this value
reaches num_online_cpus(), we flush the stats. Hence, num_online_cpus()
is the actual threshold, and stats_flush_threshold is really an order of
magnitude of the stats updates.

Rename stats_flush_threshold to stats_updates_order, and define a
STATS_FLUSH_THRESHOLD constant that resolves to num_online_cpus().

No functional change intended.

Signed-off-by: Yosry Ahmed
---
 mm/memcontrol.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 35a9c013d755..d729870505f1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -589,10 +589,12 @@ static void flush_memcg_stats_dwork(struct work_struct *w);
 static DECLARE_DEFERRABLE_WORK(stats_flush_dwork, flush_memcg_stats_dwork);
 static DEFINE_PER_CPU(unsigned int, stats_updates);
 static atomic_t stats_flush_ongoing = ATOMIC_INIT(0);
-static atomic_t stats_flush_threshold = ATOMIC_INIT(0);
+/* stats_updates_order is in multiples of MEMCG_CHARGE_BATCH */
+static atomic_t stats_updates_order = ATOMIC_INIT(0);
 static u64 flush_last_time;
 
 #define FLUSH_TIME (2UL*HZ)
+#define STATS_FLUSH_THRESHOLD num_online_cpus()
 
 /*
  * Accessors to ensure that preemption is disabled on PREEMPT_RT because it can
@@ -628,13 +630,11 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 	x = __this_cpu_add_return(stats_updates, abs(val));
 	if (x > MEMCG_CHARGE_BATCH) {
 		/*
-		 * If stats_flush_threshold exceeds the threshold
-		 * (>num_online_cpus()), cgroup stats update will be triggered
-		 * in __mem_cgroup_flush_stats(). Increasing this var further
-		 * is redundant and simply adds overhead in atomic update.
+		 * Incrementing stats_updates_order beyond the threshold is
+		 * redundant. Avoid the overhead of the atomic update.
 		 */
-		if (atomic_read(&stats_flush_threshold) <= num_online_cpus())
-			atomic_add(x / MEMCG_CHARGE_BATCH, &stats_flush_threshold);
+		if (atomic_read(&stats_updates_order) <= STATS_FLUSH_THRESHOLD)
+			atomic_add(x / MEMCG_CHARGE_BATCH, &stats_updates_order);
 		__this_cpu_write(stats_updates, 0);
 	}
 }
@@ -654,13 +654,13 @@ static void do_flush_stats(void)
 
 	cgroup_rstat_flush(root_mem_cgroup->css.cgroup);
 
-	atomic_set(&stats_flush_threshold, 0);
+	atomic_set(&stats_updates_order, 0);
 	atomic_set(&stats_flush_ongoing, 0);
 }
 
 void mem_cgroup_flush_stats(void)
 {
-	if (atomic_read(&stats_flush_threshold) > num_online_cpus())
+	if (atomic_read(&stats_updates_order) > STATS_FLUSH_THRESHOLD)
 		do_flush_stats();
 }
 
@@ -674,8 +674,8 @@ void mem_cgroup_flush_stats_ratelimited(void)
 static void flush_memcg_stats_dwork(struct work_struct *w)
 {
 	/*
-	 * Always flush here so that flushing in latency-sensitive paths is
-	 * as cheap as possible.
+	 * Deliberately ignore stats_updates_order here so that flushing in
+	 * latency-sensitive paths is as cheap as possible.
	 */
 	do_flush_stats();
 	queue_delayed_work(system_unbound_wq, &stats_flush_dwork, FLUSH_TIME);

From patchwork Wed Sep 13 07:38:46 2023
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13382554
Date: Wed, 13 Sep 2023 07:38:46 +0000
In-Reply-To: <20230913073846.1528938-1-yosryahmed@google.com>
References: <20230913073846.1528938-1-yosryahmed@google.com>
Message-ID: <20230913073846.1528938-4-yosryahmed@google.com>
Subject: [PATCH 3/3] mm: memcg: optimize stats flushing for latency and accuracy
From: Yosry Ahmed
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
 Muchun Song, Ivan Babrou, Tejun Heo, Michal Koutný, Waiman Long,
 kernel-team@cloudflare.com, Wei Xu, Greg Thelen, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Yosry Ahmed

Stats flushing
for memcg currently follows these rules:

- Always flush the entire memcg hierarchy (i.e. flush the root).
- Only one flusher is allowed at a time. If someone else tries to flush
  concurrently, they skip and return immediately.
- A periodic flusher flushes all the stats every 2 seconds.

This approach is followed because all flushes are serialized by a global
rstat spinlock. On the memcg side, flushing is invoked from userspace
reads as well as in-kernel flushers (e.g. reclaim, refault, etc). The
approach aims to avoid serializing all flushers on the global lock,
which can cause a significant performance hit under high concurrency.

This approach has the following problems:

- Occasionally, a userspace read of the stats of a non-root cgroup is
  too expensive, as it has to flush the entire hierarchy [1].
- Sometimes stats accuracy is compromised if there is an ongoing flush
  and we skip and return before the subtree of interest is actually
  flushed. This is more visible when reading stats from userspace, but
  can also affect in-kernel flushers.

This patch aims to solve both problems by reworking how flushing works:

- Without contention, there is no need to flush the entire tree. In this
  case, only flush the subtree of interest. This avoids the latency of a
  full root flush when it is unnecessary.

- With contention, fall back to a coalesced (aka unified) flush of the
  entire hierarchy: a root flush. In this case, instead of returning
  immediately if a root flush is ongoing, wait for it to finish
  *without* attempting to acquire the lock or flush. This is done using
  a completion. Compared to competing directly on the underlying lock,
  this approach makes concurrent flushing a synchronization point
  instead of a serialization point. Once a root flush finishes, *all*
  waiters can wake up and continue at once.

- Finally, with very high contention, bound the number of waiters to the
  number of online cpus. This keeps the flush latency bounded at the
  tail (very high concurrency). We fall back to sacrificing stats
  freshness only in such cases, in favor of performance.

This was tested in two ways on a machine with 384 cpus:

- A synthetic test with 5000 concurrent workers doing allocations and
  reclaim, as well as 1000 readers for memory.stat (a variation of [2]).
  No significant regressions were noticed in the total runtime. Note
  that if concurrent flushers compete directly on the spinlock instead
  of waiting for a completion, this test shows 2x-3x slowdowns. Even
  though subsequent flushers would have nothing to flush, the
  serialization and lock contention alone are a major problem. Using a
  completion for synchronization instead seems to overcome this problem.

- A synthetic stress test for concurrently reading memcg stats provided
  by Wei Xu [3].

  With 10k threads reading the stats every 100ms:
  - 98.8% of reads take <100us.
  - 1.09% of reads take 100us to 1ms.
  - 0.11% of reads take 1ms to 10ms.
  - Almost no reads take more than 10ms.

  With 10k threads reading the stats every 10ms:
  - 82.3% of reads take <100us.
  - 4.2% of reads take 100us to 1ms.
  - 4.7% of reads take 1ms to 10ms.
  - 8.8% of reads take 10ms to 100ms.
  - Almost no reads take more than 100ms.
[1] https://lore.kernel.org/lkml/CABWYdi0c6__rh-K7dcM_pkf9BJdTRtAU08M43KO9ME4-dsgfoQ@mail.gmail.com/
[2] https://lore.kernel.org/lkml/CAJD7tka13M-zVZTyQJYL1iUAYvuQ1fcHbCjcOBZcz6POYTV-4g@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CAAPL-u9D2b=iF5Lf_cRnKxUfkiEe0AMDTu6yhrUAzX0b6a6rDg@mail.gmail.com/

[weixugc@google.com: suggested the fallback logic and bounding the number
of waiters]

Signed-off-by: Yosry Ahmed
---
 include/linux/memcontrol.h |   4 +-
 mm/memcontrol.c            | 100 ++++++++++++++++++++++++++++---------
 mm/vmscan.c                |   2 +-
 mm/workingset.c            |   8 ++-
 4 files changed, 85 insertions(+), 29 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 11810a2cfd2d..4453cd3fc4b8 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1034,7 +1034,7 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
 	return x;
 }
 
-void mem_cgroup_flush_stats(void);
+void mem_cgroup_flush_stats(struct mem_cgroup *memcg);
 void mem_cgroup_flush_stats_ratelimited(void);
 
 void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
@@ -1519,7 +1519,7 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
 	return node_page_state(lruvec_pgdat(lruvec), idx);
 }
 
-static inline void mem_cgroup_flush_stats(void)
+static inline void mem_cgroup_flush_stats(struct mem_cgroup *memcg)
 {
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d729870505f1..edff41e4b4e7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -588,7 +588,6 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
 static void flush_memcg_stats_dwork(struct work_struct *w);
 static DECLARE_DEFERRABLE_WORK(stats_flush_dwork, flush_memcg_stats_dwork);
 static DEFINE_PER_CPU(unsigned int, stats_updates);
-static atomic_t stats_flush_ongoing = ATOMIC_INIT(0);
 /* stats_updates_order is in multiples of MEMCG_CHARGE_BATCH */
 static atomic_t stats_updates_order = ATOMIC_INIT(0);
 static u64 flush_last_time;
@@ -639,36 +638,87 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 	}
 }
 
-static void do_flush_stats(void)
+/*
+ * do_flush_stats - flush the statistics of a memory cgroup and its tree
+ * @memcg: the memory cgroup to flush
+ * @wait: wait for an ongoing root flush to complete before returning
+ *
+ * All flushes are serialized by the underlying rstat global lock. If there is
+ * no contention, we try to only flush the subtree of the passed @memcg to
+ * minimize the work. Otherwise, we coalesce multiple flushing requests into a
+ * single flush of the root memcg. When there is an ongoing root flush, we wait
+ * for its completion (unless otherwise requested), to get fresh stats. If the
+ * number of waiters exceeds the number of cpus, just skip the flush to bound
+ * the flush latency at the tail with very high concurrency.
+ *
+ * This is a trade-off between stats accuracy and flush latency.
+ */
+static void do_flush_stats(struct mem_cgroup *memcg, bool wait)
 {
+	static DECLARE_COMPLETION(root_flush_done);
+	static DEFINE_SPINLOCK(root_flusher_lock);
+	static DEFINE_MUTEX(subtree_flush_mutex);
+	static atomic_t waiters = ATOMIC_INIT(0);
+	static bool root_flush_ongoing;
+	bool root_flusher = false;
+
+	/* Ongoing root flush, just wait for it (unless otherwise requested) */
+	if (READ_ONCE(root_flush_ongoing))
+		goto root_flush_or_wait;
+
 	/*
-	 * We always flush the entire tree, so concurrent flushers can just
-	 * skip. This avoids a thundering herd problem on the rstat global lock
-	 * from memcg flushers (e.g. reclaim, refault, etc).
+	 * Opportunistically try to only flush the requested subtree. Otherwise
+	 * fallback to a coalesced flush below.
 	 */
-	if (atomic_read(&stats_flush_ongoing) ||
-	    atomic_xchg(&stats_flush_ongoing, 1))
+	if (!mem_cgroup_is_root(memcg) && mutex_trylock(&subtree_flush_mutex)) {
+		cgroup_rstat_flush(memcg->css.cgroup);
+		mutex_unlock(&subtree_flush_mutex);
 		return;
+	}
 
-	WRITE_ONCE(flush_last_time, jiffies_64);
-
-	cgroup_rstat_flush(root_mem_cgroup->css.cgroup);
+	/* A coalesced root flush is in order. Are we the designated flusher? */
+	spin_lock(&root_flusher_lock);
+	if (!READ_ONCE(root_flush_ongoing)) {
+		reinit_completion(&root_flush_done);
+		/*
+		 * We must reset the completion before setting
+		 * root_flush_ongoing. Otherwise, waiters may call
+		 * wait_for_completion() and immediately return before we flush.
+		 */
+		smp_wmb();
+		WRITE_ONCE(root_flush_ongoing, true);
+		root_flusher = true;
+	}
+	spin_unlock(&root_flusher_lock);
 
-	atomic_set(&stats_updates_order, 0);
-	atomic_set(&stats_flush_ongoing, 0);
+root_flush_or_wait:
+	if (root_flusher) {
+		cgroup_rstat_flush(root_mem_cgroup->css.cgroup);
+		complete_all(&root_flush_done);
+		atomic_set(&stats_updates_order, 0);
+		WRITE_ONCE(flush_last_time, jiffies_64);
+		WRITE_ONCE(root_flush_ongoing, false);
+	} else if (wait && atomic_add_unless(&waiters, 1, num_online_cpus())) {
+		smp_rmb(); /* see smp_wmb() above */
+		wait_for_completion_interruptible(&root_flush_done);
+		atomic_dec(&waiters);
+	}
 }
 
-void mem_cgroup_flush_stats(void)
+void mem_cgroup_flush_stats(struct mem_cgroup *memcg)
 {
+	if (!memcg)
+		memcg = root_mem_cgroup;
+
 	if (atomic_read(&stats_updates_order) > STATS_FLUSH_THRESHOLD)
-		do_flush_stats();
+		do_flush_stats(memcg, true);
 }
 
 void mem_cgroup_flush_stats_ratelimited(void)
 {
 	/* Only flush if the periodic flusher is one full cycle late */
 	if (time_after64(jiffies_64, READ_ONCE(flush_last_time) + 2*FLUSH_TIME))
-		mem_cgroup_flush_stats();
+		mem_cgroup_flush_stats(root_mem_cgroup);
 }
 
 static void flush_memcg_stats_dwork(struct work_struct *w)
@@ -677,7 +727,7 @@ static void flush_memcg_stats_dwork(struct work_struct *w)
* Deliberately ignore stats_updates_order here so that flushing in * latency-sensitive paths is as cheap as possible. */ - do_flush_stats(); + do_flush_stats(root_mem_cgroup, false); queue_delayed_work(system_unbound_wq, &stats_flush_dwork, FLUSH_TIME); } @@ -1577,7 +1627,7 @@ static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s) * * Current memory state: */ - mem_cgroup_flush_stats(); + mem_cgroup_flush_stats(memcg); for (i = 0; i < ARRAY_SIZE(memory_stats); i++) { u64 size; @@ -4019,7 +4069,7 @@ static int memcg_numa_stat_show(struct seq_file *m, void *v) int nid; struct mem_cgroup *memcg = mem_cgroup_from_seq(m); - mem_cgroup_flush_stats(); + mem_cgroup_flush_stats(memcg); for (stat = stats; stat < stats + ARRAY_SIZE(stats); stat++) { seq_printf(m, "%s=%lu", stat->name, @@ -4094,7 +4144,7 @@ static void memcg1_stat_format(struct mem_cgroup *memcg, struct seq_buf *s) BUILD_BUG_ON(ARRAY_SIZE(memcg1_stat_names) != ARRAY_SIZE(memcg1_stats)); - mem_cgroup_flush_stats(); + mem_cgroup_flush_stats(memcg); for (i = 0; i < ARRAY_SIZE(memcg1_stats); i++) { unsigned long nr; @@ -4596,7 +4646,7 @@ void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages, struct mem_cgroup *memcg = mem_cgroup_from_css(wb->memcg_css); struct mem_cgroup *parent; - mem_cgroup_flush_stats(); + mem_cgroup_flush_stats(memcg); *pdirty = memcg_page_state(memcg, NR_FILE_DIRTY); *pwriteback = memcg_page_state(memcg, NR_WRITEBACK); @@ -6606,7 +6656,7 @@ static int memory_numa_stat_show(struct seq_file *m, void *v) int i; struct mem_cgroup *memcg = mem_cgroup_from_seq(m); - mem_cgroup_flush_stats(); + mem_cgroup_flush_stats(memcg); for (i = 0; i < ARRAY_SIZE(memory_stats); i++) { int nid; @@ -7764,7 +7814,7 @@ bool obj_cgroup_may_zswap(struct obj_cgroup *objcg) break; } - cgroup_rstat_flush(memcg->css.cgroup); + mem_cgroup_flush_stats(memcg); pages = memcg_page_state(memcg, MEMCG_ZSWAP_B) / PAGE_SIZE; if (pages < max) continue; @@ -7829,8 +7879,10 @@ void 
obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size) static u64 zswap_current_read(struct cgroup_subsys_state *css, struct cftype *cft) { - cgroup_rstat_flush(css->cgroup); - return memcg_page_state(mem_cgroup_from_css(css), MEMCG_ZSWAP_B); + struct mem_cgroup *memcg = mem_cgroup_from_css(css); + + mem_cgroup_flush_stats(memcg); + return memcg_page_state(memcg, MEMCG_ZSWAP_B); } static int zswap_max_show(struct seq_file *m, void *v) diff --git a/mm/vmscan.c b/mm/vmscan.c index 6f13394b112e..fc356b9bc003 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2923,7 +2923,7 @@ static void prepare_scan_count(pg_data_t *pgdat, struct scan_control *sc) * Flush the memory cgroup stats, so that we read accurate per-memcg * lruvec stats for heuristics. */ - mem_cgroup_flush_stats(); + mem_cgroup_flush_stats(sc->target_mem_cgroup); /* * Determine the scan balance between anon and file LRUs. diff --git a/mm/workingset.c b/mm/workingset.c index da58a26d0d4d..13cbccf907f1 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -519,7 +519,11 @@ void workingset_refault(struct folio *folio, void *shadow) return; } - /* Flush stats (and potentially sleep) before holding RCU read lock */ + /* + * Flush stats (and potentially sleep) before holding RCU read lock + * XXX: This can be reworked to pass in a memcg, similar to + * mem_cgroup_flush_stats(). + */ mem_cgroup_flush_stats_ratelimited(); rcu_read_lock(); @@ -664,7 +668,7 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker, struct lruvec *lruvec; int i; - mem_cgroup_flush_stats(); + mem_cgroup_flush_stats(sc->memcg); lruvec = mem_cgroup_lruvec(sc->memcg, NODE_DATA(sc->nid)); for (pages = 0, i = 0; i < NR_LRU_LISTS; i++) pages += lruvec_page_state_local(lruvec,