From patchwork Thu Mar 23 04:00:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yosry Ahmed X-Patchwork-Id: 13184912 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B5D50C6FD1C for ; Thu, 23 Mar 2023 04:00:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229491AbjCWEAy (ORCPT ); Thu, 23 Mar 2023 00:00:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37498 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229485AbjCWEAw (ORCPT ); Thu, 23 Mar 2023 00:00:52 -0400 Received: from mail-pj1-x1049.google.com (mail-pj1-x1049.google.com [IPv6:2607:f8b0:4864:20::1049]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EDCA51F91E for ; Wed, 22 Mar 2023 21:00:43 -0700 (PDT) Received: by mail-pj1-x1049.google.com with SMTP id e8-20020a17090a118800b0023d35ae431eso6639482pja.8 for ; Wed, 22 Mar 2023 21:00:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1679544043; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=zlRHMwQCwPxTz7URsIPFYEiZ1y/NkKG6JVNDPEffcgo=; b=q0IiHI38SIToezuWJ5YCNMq/43EaxUo3i4AMEPHcC9bZCfWe3qfaYbxabY9EbcCoqm LUESQ/1v1D2aalE9SO+twkVNP5hnW+UNiSPbsUMxGnfaQG+8mDUTyeXBv18fXpGW3SWD GCVDR2eWTWBw72p7FFHYiZQYGqUWBUdEyjEquw53iNIuEWGhIQhSBNfbqBw3sfwQmtJy zq0COAvO1AYz/HqEKQJev5lhSkkjfsk/GYKhQDhBPMJgp6hmFHeDaNzuC/4om1icb3Uk IS34ZPkCnh/Dxdz7tz/xvLp+bFG8mdt7D5IhxCaGQj3LQws3RSkaD2yIkaXnXsqGxGWA GdAQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679544043; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=zlRHMwQCwPxTz7URsIPFYEiZ1y/NkKG6JVNDPEffcgo=; b=cyhaZ3ZyeiZrahPyDyEef2EgqM5DbTi8EgkL9EWJ/PZM1WC9e0zvdfrrDK8pqNBT3n g4QoOErqh9L20CYnvWZysr9FAQk82h9ovYYVfXf9MVnwG5FmleAVBaONC23Y3kRW8tUN 5ZtZfrQudx3LPipKyxeLPWbeTKuMSX38vzS/XMfzEJTRjbrfLfCOnN3/dylnJY31CmZd 9xL8bkjP12RbbOtmKLp5AujBopFwxwaF6i6C9p2SGOKraPvE9uI5nFI2Lqb7hrEhnf96 QoK9evMn72bSyWibCY/aVli1NBtAD7nGyKBXlWXiTIBSCNicMnIx4kK8EEP6qzXjqNaI Ozxg== X-Gm-Message-State: AO0yUKUx8+TPpqhL9jwpFsmcaV1DMB8vb0hoMv7n7819hISWZ/Zh7LIE Gsi3IQAgneTFIGyABZ4gvq920aewy23xv31Z X-Google-Smtp-Source: AK7set97jrlAfVvPpHOrM2Vr0RCmWVnsZU6sMMMwHyygC2dGwvgADEuqjxobUXV27YUvQlS1Att+r/ZLiyOif2dS X-Received: from yosry.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:2327]) (user=yosryahmed job=sendgmr) by 2002:a17:902:e745:b0:1a0:4aa3:3a9a with SMTP id p5-20020a170902e74500b001a04aa33a9amr1983132plf.2.1679544043324; Wed, 22 Mar 2023 21:00:43 -0700 (PDT) Date: Thu, 23 Mar 2023 04:00:31 +0000 In-Reply-To: <20230323040037.2389095-1-yosryahmed@google.com> Mime-Version: 1.0 References: <20230323040037.2389095-1-yosryahmed@google.com> X-Mailer: git-send-email 2.40.0.rc1.284.g88254d51c5-goog Message-ID: <20230323040037.2389095-2-yosryahmed@google.com> Subject: [RFC PATCH 1/7] cgroup: rstat: only disable interrupts for the percpu lock From: Yosry Ahmed To: Tejun Heo , Josef Bacik , Jens Axboe , Zefan Li , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , Muchun Song , Andrew Morton Cc: Vasily Averin , cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, Yosry Ahmed Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Currently, when sleeping is not allowed during rstat flushing, we hold the global rstat lock with interrupts disabled throughout the entire flush operation. Flushing in an O(# cgroups * # cpus) operation, and having interrupts disabled throughout is dangerous. For some contexts, we may not want to sleep, but can be interrupted (e.g. while holding a spinlock or RCU read lock). As such, do not disable interrupts throughout rstat flushing, only when holding the percpu lock. This breaks down the O(# cgroups * # cpus) duration with interrupts disabled to a series of O(# cgroups) durations. Furthermore, if a cpu spinning waiting for the global rstat lock, it doesn't need to spin with interrupts disabled anymore. If the caller of rstat flushing needs interrupts to be disabled, it's up to them to decide that, and it should be fine to hold the global rstat lock with interrupts disabled. There is currently a single context that may invoke rstat flushing with interrupts disabled, the mem_cgroup_flush_stats() call in mem_cgroup_usage(), if called from mem_cgroup_threshold(). To make it safe to hold the global rstat lock with interrupts enabled, make sure we only flush from in_task() contexts. The side effect of that we read stale stats in interrupt context, but this should be okay, as flushing in interrupt context is dangerous anyway as it is an expensive operation, so reading stale stats is safer. Signed-off-by: Yosry Ahmed --- kernel/cgroup/rstat.c | 40 +++++++++++++++++++++++++++++----------- 1 file changed, 29 insertions(+), 11 deletions(-) diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c index 831f1f472bb8..af11e28bd055 100644 --- a/kernel/cgroup/rstat.c +++ b/kernel/cgroup/rstat.c @@ -7,6 +7,7 @@ #include #include +/* This lock should only be held from task context */ static DEFINE_SPINLOCK(cgroup_rstat_lock); static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock); @@ -210,14 +211,24 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep) /* if @may_sleep, play nice and yield if necessary */ if (may_sleep && (need_resched() || spin_needbreak(&cgroup_rstat_lock))) { - spin_unlock_irq(&cgroup_rstat_lock); + spin_unlock(&cgroup_rstat_lock); if (!cond_resched()) cpu_relax(); - spin_lock_irq(&cgroup_rstat_lock); + spin_lock(&cgroup_rstat_lock); } } } +static bool should_skip_flush(void) +{ + /* + * We acquire cgroup_rstat_lock without disabling interrupts, so we + * should not try to acquire from interrupt contexts to avoid deadlocks. + * It can be expensive to flush stats in interrupt context anyway. + */ + return !in_task(); +} + /** * cgroup_rstat_flush - flush stats in @cgrp's subtree * @cgrp: target cgroup @@ -229,15 +240,18 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep) * This also gets all cgroups in the subtree including @cgrp off the * ->updated_children lists. * - * This function may block. + * This function is safe to call from contexts that disable interrupts, but + * @may_sleep must be set to false, otherwise the function may block. */ __bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp) { - might_sleep(); + if (should_skip_flush()) + return; - spin_lock_irq(&cgroup_rstat_lock); + might_sleep(); + spin_lock(&cgroup_rstat_lock); cgroup_rstat_flush_locked(cgrp, true); - spin_unlock_irq(&cgroup_rstat_lock); + spin_unlock(&cgroup_rstat_lock); } /** @@ -248,11 +262,12 @@ __bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp) */ void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp) { - unsigned long flags; + if (should_skip_flush()) + return; - spin_lock_irqsave(&cgroup_rstat_lock, flags); + spin_lock(&cgroup_rstat_lock); cgroup_rstat_flush_locked(cgrp, false); - spin_unlock_irqrestore(&cgroup_rstat_lock, flags); + spin_unlock(&cgroup_rstat_lock); } /** @@ -267,8 +282,11 @@ void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp) void cgroup_rstat_flush_hold(struct cgroup *cgrp) __acquires(&cgroup_rstat_lock) { + if (should_skip_flush()) + return; + might_sleep(); - spin_lock_irq(&cgroup_rstat_lock); + spin_lock(&cgroup_rstat_lock); cgroup_rstat_flush_locked(cgrp, true); } @@ -278,7 +296,7 @@ void cgroup_rstat_flush_hold(struct cgroup *cgrp) void cgroup_rstat_flush_release(void) __releases(&cgroup_rstat_lock) { - spin_unlock_irq(&cgroup_rstat_lock); + spin_unlock(&cgroup_rstat_lock); } int cgroup_rstat_init(struct cgroup *cgrp) From patchwork Thu Mar 23 04:00:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yosry Ahmed X-Patchwork-Id: 13184913 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id ED83EC77B60 for ; Thu, 23 Mar 2023 04:00:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229485AbjCWEA4 (ORCPT ); Thu, 23 Mar 2023 00:00:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37636 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229615AbjCWEAx (ORCPT ); Thu, 23 Mar 2023 00:00:53 -0400 Received: from mail-pl1-x649.google.com (mail-pl1-x649.google.com [IPv6:2607:f8b0:4864:20::649]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0677520040 for ; Wed, 22 Mar 2023 21:00:45 -0700 (PDT) Received: by mail-pl1-x649.google.com with SMTP id x4-20020a170902ec8400b001a1a5f6f272so10187079plg.1 for ; Wed, 22 Mar 2023 21:00:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1679544045; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=0V6KepxSWjC0U8YUMhyf5sEsW80RflfKR4tDJnFyjUM=; b=nvFwhwj7qGyMPQaTPrzYFw+T0XNSe66OQzM5b4pjwLTwVBRy9Yg/AbwpYFnRMIM5gG L465fbRhjctsBw/iYduTrtP7LUUTB2pfnwQVaagEQM0GyAtfd40drA654+PpJ27hrtIB BUWcp/c32KQ98SFLCZ9FSHYNM6vaide1p2wRQYwdJvG/bU3/l2g1YAjLINeXoxbcrV2W 3/G5Y/6fhAJDfUsgxXfVWzP78YqVFCMwh/+/7IKP9AVvA+7qfhsyQkLxneMASA+CreO0 VfuwrktKMl3lyTDdGpOXeKO210ebUx1dz4XnfSvMg+6i1Ne/G/0TP1oqkrL/txGwmyNx VJtg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679544045; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=0V6KepxSWjC0U8YUMhyf5sEsW80RflfKR4tDJnFyjUM=; b=AVjl9Plwj8jIH/rsqQScIOJaPaHPTjKrFfjoXYG+eetY5yyQu9JSlnZgt9Onf128CA bp1CDvUHIl23qvpJvxul4ZV+lMdVJ0K1HhdCewTbBb4vCZLAXMVrR05Aev67a0Jfrchu CuCZQeQxrGhBi3SOZ3QdBu2A6a6qJhVHBe7Cn1jPcrL5eTeBdvdhdDt+aEDJid64aTh7 Au7IasLDS1wEHZ/526Fm5Sx5UG6jlzNsa56iomBm6QFLFcnK+kXPqfBS1TpMenW8G2bM ZfK6fMgAQINgwUnlamhocBHQuigosYA4VWWIbG+KlCR54YcsI2IUf0imUtreTqfwCr+F Ek+A== X-Gm-Message-State: AO0yUKXYOSvHhlbJdQ/HgOYzUDdRHb/vcFdercAbDxh1CJSjxeXCklG0 ghOlxFVXx/8wk2Wtaqh0DD4bzDc6aUpqAWlz X-Google-Smtp-Source: AK7set+NjW/4AXEB5d0IG7XiVNOSp7doL59qiTn9uvLqiFnoG/BqWGCoWZHV7SR3pQA/2uxuP1t6d8Miz+ziUaX1 X-Received: from yosry.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:2327]) (user=yosryahmed job=sendgmr) by 2002:a05:6a00:851:b0:628:30d:2d2f with SMTP id q17-20020a056a00085100b00628030d2d2fmr2879444pfk.5.1679544045305; Wed, 22 Mar 2023 21:00:45 -0700 (PDT) Date: Thu, 23 Mar 2023 04:00:32 +0000 In-Reply-To: <20230323040037.2389095-1-yosryahmed@google.com> Mime-Version: 1.0 References: <20230323040037.2389095-1-yosryahmed@google.com> X-Mailer: git-send-email 2.40.0.rc1.284.g88254d51c5-goog Message-ID: <20230323040037.2389095-3-yosryahmed@google.com> Subject: [RFC PATCH 2/7] memcg: do not disable interrupts when holding stats_flush_lock From: Yosry Ahmed To: Tejun Heo , Josef Bacik , Jens Axboe , Zefan Li , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , Muchun Song , Andrew Morton Cc: Vasily Averin , cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, Yosry Ahmed Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org The rstat flushing code was modified so that we do not disable interrupts when we hold the global rstat lock. Do the same for stats_flush_lock on the memcg side to avoid unnecessarily disabling interrupts throughout flushing. Since the code exclusively uses trylock to acquire this lock, it should be fine to hold from interrupt contexts or normal contexts without disabling interrupts as a deadlock cannot occur. For interrupt contexts we will return immediately without flushing anyway. Signed-off-by: Yosry Ahmed --- mm/memcontrol.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 5abffe6f8389..e0e92b38fa51 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -636,15 +636,17 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val) static void __mem_cgroup_flush_stats(void) { - unsigned long flag; - - if (!spin_trylock_irqsave(&stats_flush_lock, flag)) + /* + * This lock can be acquired from interrupt context, but we only acquire + * using trylock so it should be fine as we cannot cause a deadlock. + */ + if (!spin_trylock(&stats_flush_lock)) return; flush_next_time = jiffies_64 + 2*FLUSH_TIME; cgroup_rstat_flush_irqsafe(root_mem_cgroup->css.cgroup); atomic_set(&stats_flush_threshold, 0); - spin_unlock_irqrestore(&stats_flush_lock, flag); + spin_unlock(&stats_flush_lock); } void mem_cgroup_flush_stats(void) From patchwork Thu Mar 23 04:00:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yosry Ahmed X-Patchwork-Id: 13184914 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C464C7619A for ; Thu, 23 Mar 2023 04:01:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229625AbjCWEA6 (ORCPT ); Thu, 23 Mar 2023 00:00:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37604 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229775AbjCWEAx (ORCPT ); Thu, 23 Mar 2023 00:00:53 -0400 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1BB6B2005B for ; Wed, 22 Mar 2023 21:00:48 -0700 (PDT) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-5411f21f849so209041887b3.16 for ; Wed, 22 Mar 2023 21:00:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1679544047; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=736+92Vy7w+whzoBxwgGPc/WG8vZYtmz8qFOhZmUk9w=; b=GwAMe3Sdh/YfT7TszWBGAWb/Omm+6/db9kDSH3rx0U2jTcaU6cdo97INaHv86kxNQA jhXBaZOARcF2wynwqEfwzWZoINYtyHtPiv9iRsKtX2yswvpMBD6qg4pgoYL8XYK3JdrY NUg095+EPmMHIupEKGzGCL9BJH8z0FwgmxWYVXMQd4hSYkBDyIXOQyRVbTCpsq34KRPV CyG3E5BvF3wTZXEl8tQiaUrIBU2ScbP5rypiT3jSEorEK7YISHgjHIMI4g2IycEeQTfl s5/P1y3KHUJjZSE6PJ3e2OlZHSMu+2KD/QDk8kFiQ8Hi9jpOcpijxBB3B0Ib/9iMoeNk 6zZg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679544047; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=736+92Vy7w+whzoBxwgGPc/WG8vZYtmz8qFOhZmUk9w=; b=1RDQ5o4VPfOgpEdU8qSx5ANJIuDn0YscnuWkDVKoXnOY195qlN48MQpP5iZ+8whvTV F4kdL3p/R9tTJJvOeb+RC1iypHvppqVB3meT0nIZWQVg6NWcXTfjf/qxiQnV7RT9UqNK 5WJpigRWwlsFt8Gp+GkAdiRl3kXpBqA1H0EdPdSDbgVHQKiiny/Wbbt16Z9EQTcruu8y 9Ht9VDBWPwN1cUIfAKxFPfv1dmIM5BtpOGj8Cq8Nhs9Aca6zt4VzSAMhfdSvYol0c7p2 fyq800pgBvPgEtnLgXk0iar7OfD5mbb3tBRf5doCnoMYyKoB2E8X8c5VEuBHmP4uBju4 W8yg== X-Gm-Message-State: AAQBX9cDCTT3SJvnDXbNaUfjj/ZrrjmBGim3F/U6ZAH/FC0JGVxZ9E8x 5/qmqet1oxyQ6QOaWxRxgsx0GUgC0XNk6LCm X-Google-Smtp-Source: AKy350YCGOdNWpxPNSJcwpjqKe6Apct4Cbzrlv7LWeidWe3sKdVG3fkxwH2jR1od8/69JvygqRYEaeYcvlRO26C1 X-Received: from yosry.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:2327]) (user=yosryahmed job=sendgmr) by 2002:a81:ad1b:0:b0:541:69bc:8626 with SMTP id l27-20020a81ad1b000000b0054169bc8626mr1137749ywh.10.1679544047097; Wed, 22 Mar 2023 21:00:47 -0700 (PDT) Date: Thu, 23 Mar 2023 04:00:33 +0000 In-Reply-To: <20230323040037.2389095-1-yosryahmed@google.com> Mime-Version: 1.0 References: <20230323040037.2389095-1-yosryahmed@google.com> X-Mailer: git-send-email 2.40.0.rc1.284.g88254d51c5-goog Message-ID: <20230323040037.2389095-4-yosryahmed@google.com> Subject: [RFC PATCH 3/7] cgroup: rstat: remove cgroup_rstat_flush_irqsafe() From: Yosry Ahmed To: Tejun Heo , Josef Bacik , Jens Axboe , Zefan Li , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , Muchun Song , Andrew Morton Cc: Vasily Averin , cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, Yosry Ahmed Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org The naming of cgroup_rstat_flush_irqsafe() can be confusing. It can read like "irqsave", which means that it disables interrupts throughout, but it actually doesn't. It is just "safe" to call from contexts with interrupts disabled. Furthermore, this is only used today by mem_cgroup_flush_stats(), which will be changed by a later patch to optionally allow sleeping. Simplify the code and make it easier to reason about by instead passing in an explicit @may_sleep argument to cgroup_rstat_flush(), which gets passed directly to cgroup_rstat_flush_locked(). Signed-off-by: Yosry Ahmed --- block/blk-cgroup.c | 2 +- include/linux/cgroup.h | 3 +-- kernel/cgroup/cgroup.c | 4 ++-- kernel/cgroup/rstat.c | 24 +++++------------------- mm/memcontrol.c | 6 +++--- 5 files changed, 12 insertions(+), 27 deletions(-) diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index bd50b55bdb61..3fe313ce5e6b 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -1043,7 +1043,7 @@ static int blkcg_print_stat(struct seq_file *sf, void *v) if (!seq_css(sf)->parent) blkcg_fill_root_iostats(); else - cgroup_rstat_flush(blkcg->css.cgroup); + cgroup_rstat_flush(blkcg->css.cgroup, true); rcu_read_lock(); hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) { diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 3410aecffdb4..743c345b6a3f 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -691,8 +691,7 @@ static inline void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen) * cgroup scalable recursive statistics. */ void cgroup_rstat_updated(struct cgroup *cgrp, int cpu); -void cgroup_rstat_flush(struct cgroup *cgrp); -void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp); +void cgroup_rstat_flush(struct cgroup *cgrp, bool may_sleep); void cgroup_rstat_flush_hold(struct cgroup *cgrp); void cgroup_rstat_flush_release(void); diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 935e8121b21e..58df0fc379f6 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -5393,7 +5393,7 @@ static void css_release_work_fn(struct work_struct *work) if (ss) { /* css release path */ if (!list_empty(&css->rstat_css_node)) { - cgroup_rstat_flush(cgrp); + cgroup_rstat_flush(cgrp, true); list_del_rcu(&css->rstat_css_node); } @@ -5406,7 +5406,7 @@ static void css_release_work_fn(struct work_struct *work) /* cgroup release path */ TRACE_CGROUP_PATH(release, cgrp); - cgroup_rstat_flush(cgrp); + cgroup_rstat_flush(cgrp, true); spin_lock_irq(&css_set_lock); for (tcgrp = cgroup_parent(cgrp); tcgrp; diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c index af11e28bd055..fe8690bb1e1c 100644 --- a/kernel/cgroup/rstat.c +++ b/kernel/cgroup/rstat.c @@ -243,30 +243,16 @@ static bool should_skip_flush(void) * This function is safe to call from contexts that disable interrupts, but * @may_sleep must be set to false, otherwise the function may block. */ -__bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp) +__bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp, bool may_sleep) { if (should_skip_flush()) return; - might_sleep(); - spin_lock(&cgroup_rstat_lock); - cgroup_rstat_flush_locked(cgrp, true); - spin_unlock(&cgroup_rstat_lock); -} - -/** - * cgroup_rstat_flush_irqsafe - irqsafe version of cgroup_rstat_flush() - * @cgrp: target cgroup - * - * This function can be called from any context. - */ -void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp) -{ - if (should_skip_flush()) - return; + if (may_sleep) + might_sleep(); spin_lock(&cgroup_rstat_lock); - cgroup_rstat_flush_locked(cgrp, false); + cgroup_rstat_flush_locked(cgrp, may_sleep); spin_unlock(&cgroup_rstat_lock); } @@ -325,7 +311,7 @@ void cgroup_rstat_exit(struct cgroup *cgrp) { int cpu; - cgroup_rstat_flush(cgrp); + cgroup_rstat_flush(cgrp, true); /* sanity check */ for_each_possible_cpu(cpu) { diff --git a/mm/memcontrol.c b/mm/memcontrol.c index e0e92b38fa51..72cd44f88d97 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -644,7 +644,7 @@ static void __mem_cgroup_flush_stats(void) return; flush_next_time = jiffies_64 + 2*FLUSH_TIME; - cgroup_rstat_flush_irqsafe(root_mem_cgroup->css.cgroup); + cgroup_rstat_flush(root_mem_cgroup->css.cgroup, false); atomic_set(&stats_flush_threshold, 0); spin_unlock(&stats_flush_lock); } @@ -7745,7 +7745,7 @@ bool obj_cgroup_may_zswap(struct obj_cgroup *objcg) break; } - cgroup_rstat_flush(memcg->css.cgroup); + cgroup_rstat_flush(memcg->css.cgroup, true); pages = memcg_page_state(memcg, MEMCG_ZSWAP_B) / PAGE_SIZE; if (pages < max) continue; @@ -7810,7 +7810,7 @@ void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size) static u64 zswap_current_read(struct cgroup_subsys_state *css, struct cftype *cft) { - cgroup_rstat_flush(css->cgroup); + cgroup_rstat_flush(css->cgroup, true); return memcg_page_state(mem_cgroup_from_css(css), MEMCG_ZSWAP_B); } From patchwork Thu Mar 23 04:00:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yosry Ahmed X-Patchwork-Id: 13184915 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 89D31C6FD1D for ; Thu, 23 Mar 2023 04:01:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230234AbjCWEBA (ORCPT ); Thu, 23 Mar 2023 00:01:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37496 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229846AbjCWEAx (ORCPT ); Thu, 23 Mar 2023 00:00:53 -0400 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9E0FA20062 for ; Wed, 22 Mar 2023 21:00:49 -0700 (PDT) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-536a4eba107so209617297b3.19 for ; Wed, 22 Mar 2023 21:00:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1679544049; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=xrG0SezuE3D1BzHk1bOIN+tOwhU+wFsyLck18RLDkiM=; b=cgw7YjSYw+yZCrYg4MWNWyL1ocWVlMfhI6U/IH5SNNI/KXZEirBQTnVHw1Jk19ca7x YZMRDXY2GIXRShVJyUBfGcwrm5OQ54Bn3XOEBif3BbHD9ArewU0b2vLyFm38bp8+NRx8 4lwk3DHRAIpY5JhlVhZi+qJFnS5DTPOFtneU65+9wwMhEPN3ScjeohcSLADEGreopxgz hPSmYPd9gVS979yLkwNh9j+9QGCIU5M1yATespEadqsIAym+pG18x2O6VFUlYJCK8ejc v8yJF8c/6m0MiyUkNdlDzwM4KjC9ikuULQ0wGXuneSrTmETcMqRGnUz3Mhll67JgWKef njxw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679544049; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=xrG0SezuE3D1BzHk1bOIN+tOwhU+wFsyLck18RLDkiM=; b=JGhYjCVE2UQ0NQs4cppVPHUjcAE9wefElqgDMMI7jYWr9jFb08LavNAfeKh1GQyVNS 1YJ60tJEVyjYyeFlwF24xYC7kJerxwA+gBHsW2BpjsOFSB5blA7WVAcsGi8kypZx/+4D GulkvnRUqmoQX4tGNHY+bLLw9HJHYfUl7mocjZObbYzYvuSIPHbsVtiEjbTpP0cXRv6Y WjmZTy+FLVgrY8dFgPKxGg61ieaEyBBwxNRCcfeTvBo5/oPps3l7DtjOBZX1NhDs4jAT 2YrTvDkygeASPR9LCycW+N/g48MYlmNassOGgaxNqzmhFrC2Pu+f/4icIGynEHK9OZhp iC+A== X-Gm-Message-State: AAQBX9cdpufzr770R5vmvcnBFMyTKRem77TIrZnw1QFVGa9QN0KZkIa3 eMk6nC29ncv/DB376BMsx5nmcGHgc73YTVo7 X-Google-Smtp-Source: AKy350bTQGWCAJk1730fCXLD+nAem5ueAaNE2q7H+XY1RCF27byWvmJXxhgyQR2PhcI0/Laslob9u/xZD8mfS05D X-Received: from yosry.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:2327]) (user=yosryahmed job=sendgmr) by 2002:a81:b149:0:b0:544:bb1e:f9cf with SMTP id p70-20020a81b149000000b00544bb1ef9cfmr1185027ywh.4.1679544048914; Wed, 22 Mar 2023 21:00:48 -0700 (PDT) Date: Thu, 23 Mar 2023 04:00:34 +0000 In-Reply-To: <20230323040037.2389095-1-yosryahmed@google.com> Mime-Version: 1.0 References: <20230323040037.2389095-1-yosryahmed@google.com> X-Mailer: git-send-email 2.40.0.rc1.284.g88254d51c5-goog Message-ID: <20230323040037.2389095-5-yosryahmed@google.com> Subject: [RFC PATCH 4/7] memcg: sleep during flushing stats in safe contexts From: Yosry Ahmed To: Tejun Heo , Josef Bacik , Jens Axboe , Zefan Li , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , Muchun Song , Andrew Morton Cc: Vasily Averin , cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, Yosry Ahmed Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Currently, all contexts that flush memcg stats do so with sleeping not allowed. Some of these contexts are perfectly safe to sleep in, such as reading cgroup files from userspace or the background periodic flusher. Enable choosing whether sleeping is allowed or not when flushing memcg stats, and allow sleeping in safe contexts to avoid unnecessarily performing a lot of work without sleeping. Signed-off-by: Yosry Ahmed --- include/linux/memcontrol.h | 8 ++++---- mm/memcontrol.c | 35 ++++++++++++++++++++++------------- mm/vmscan.c | 2 +- mm/workingset.c | 3 ++- 4 files changed, 29 insertions(+), 19 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index b6eda2ab205d..0c7b286f2caf 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1036,8 +1036,8 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec, return x; } -void mem_cgroup_flush_stats(void); -void mem_cgroup_flush_stats_delayed(void); +void mem_cgroup_flush_stats(bool may_sleep); +void mem_cgroup_flush_stats_delayed(bool may_sleep); void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx, int val); @@ -1531,11 +1531,11 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec, return node_page_state(lruvec_pgdat(lruvec), idx); } -static inline void mem_cgroup_flush_stats(void) +static inline void mem_cgroup_flush_stats(bool may_sleep) { } -static inline void mem_cgroup_flush_stats_delayed(void) +static inline void mem_cgroup_flush_stats_delayed(bool may_sleep) { } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 72cd44f88d97..39a9c7a978ae 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -634,7 +634,7 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val) } } -static void __mem_cgroup_flush_stats(void) +static void __mem_cgroup_flush_stats(bool may_sleep) { /* * This lock can be acquired from interrupt context, but we only acquire @@ -644,26 +644,26 @@ static void __mem_cgroup_flush_stats(void) return; flush_next_time = jiffies_64 + 2*FLUSH_TIME; - cgroup_rstat_flush(root_mem_cgroup->css.cgroup, false); + cgroup_rstat_flush(root_mem_cgroup->css.cgroup, may_sleep); atomic_set(&stats_flush_threshold, 0); spin_unlock(&stats_flush_lock); } -void mem_cgroup_flush_stats(void) +void mem_cgroup_flush_stats(bool may_sleep) { if (atomic_read(&stats_flush_threshold) > num_online_cpus()) - __mem_cgroup_flush_stats(); + __mem_cgroup_flush_stats(may_sleep); } -void mem_cgroup_flush_stats_delayed(void) +void mem_cgroup_flush_stats_delayed(bool may_sleep) { if (time_after64(jiffies_64, flush_next_time)) - mem_cgroup_flush_stats(); + mem_cgroup_flush_stats(may_sleep); } static void flush_memcg_stats_dwork(struct work_struct *w) { - __mem_cgroup_flush_stats(); + __mem_cgroup_flush_stats(true); queue_delayed_work(system_unbound_wq, &stats_flush_dwork, FLUSH_TIME); } @@ -1570,7 +1570,7 @@ static void memory_stat_format(struct mem_cgroup *memcg, char *buf, int bufsize) * * Current memory state: */ - mem_cgroup_flush_stats(); + mem_cgroup_flush_stats(true); for (i = 0; i < ARRAY_SIZE(memory_stats); i++) { u64 size; @@ -3671,7 +3671,11 @@ static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap) unsigned long val; if (mem_cgroup_is_root(memcg)) { - mem_cgroup_flush_stats(); + /* + * mem_cgroup_threshold() calls here from irqsafe context. + * Don't sleep. + */ + mem_cgroup_flush_stats(false); val = memcg_page_state(memcg, NR_FILE_PAGES) + memcg_page_state(memcg, NR_ANON_MAPPED); if (swap) @@ -4014,7 +4018,7 @@ static int memcg_numa_stat_show(struct seq_file *m, void *v) int nid; struct mem_cgroup *memcg = mem_cgroup_from_seq(m); - mem_cgroup_flush_stats(); + mem_cgroup_flush_stats(true); for (stat = stats; stat < stats + ARRAY_SIZE(stats); stat++) { seq_printf(m, "%s=%lu", stat->name, @@ -4090,7 +4094,7 @@ static int memcg_stat_show(struct seq_file *m, void *v) BUILD_BUG_ON(ARRAY_SIZE(memcg1_stat_names) != ARRAY_SIZE(memcg1_stats)); - mem_cgroup_flush_stats(); + mem_cgroup_flush_stats(true); for (i = 0; i < ARRAY_SIZE(memcg1_stats); i++) { unsigned long nr; @@ -4594,7 +4598,12 @@ void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages, struct mem_cgroup *memcg = mem_cgroup_from_css(wb->memcg_css); struct mem_cgroup *parent; - mem_cgroup_flush_stats(); + /* + * wb_writeback() takes a spinlock and calls + * wb_over_bg_thresh()->mem_cgroup_wb_stats(). + * Do not sleep. + */ + mem_cgroup_flush_stats(false); *pdirty = memcg_page_state(memcg, NR_FILE_DIRTY); *pwriteback = memcg_page_state(memcg, NR_WRITEBACK); @@ -6596,7 +6605,7 @@ static int memory_numa_stat_show(struct seq_file *m, void *v) int i; struct mem_cgroup *memcg = mem_cgroup_from_seq(m); - mem_cgroup_flush_stats(); + mem_cgroup_flush_stats(true); for (i = 0; i < ARRAY_SIZE(memory_stats); i++) { int nid; diff --git a/mm/vmscan.c b/mm/vmscan.c index 9c1c5e8b24b8..59d1830d08ac 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2845,7 +2845,7 @@ static void prepare_scan_count(pg_data_t *pgdat, struct scan_control *sc) * Flush the memory cgroup stats, so that we read accurate per-memcg * lruvec stats for heuristics. */ - mem_cgroup_flush_stats(); + mem_cgroup_flush_stats(false); /* * Determine the scan balance between anon and file LRUs. diff --git a/mm/workingset.c b/mm/workingset.c index 00c6f4d9d9be..042eabbb43f6 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -462,7 +462,8 @@ void workingset_refault(struct folio *folio, void *shadow) mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr); - mem_cgroup_flush_stats_delayed(); + /* Do not sleep with RCU lock held */ + mem_cgroup_flush_stats_delayed(false); /* * Compare the distance to the existing workingset size. We * don't activate pages that couldn't stay resident even if From patchwork Thu Mar 23 04:00:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yosry Ahmed X-Patchwork-Id: 13184917 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9D5E4C6FD1C for ; Thu, 23 Mar 2023 04:01:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230274AbjCWEBD (ORCPT ); Thu, 23 Mar 2023 00:01:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37634 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230072AbjCWEAx (ORCPT ); Thu, 23 Mar 2023 00:00:53 -0400 Received: from mail-pj1-x104a.google.com (mail-pj1-x104a.google.com [IPv6:2607:f8b0:4864:20::104a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 372142055B for ; Wed, 22 Mar 2023 21:00:51 -0700 (PDT) Received: by mail-pj1-x104a.google.com with SMTP id d5-20020a17090a7bc500b0023d3366e005so321315pjl.6 for ; Wed, 22 Mar 2023 21:00:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1679544050; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=9m7S+Ebo8MkLWAncbvKsOW8wVqp1O+0Jr3lM7bEbkLI=; b=VdCCijFiKDJ4yDz8tFMap2tBXNfAakMH2cmowf1HUrZHuzx98c171wgP+Q37bsMmOY dvm3L1704oXKLAgAkOK0GjUnYQoP54+fQujrx3ACvUW9YBMoC0EV4BPolgDqFHWKSqHY vC+M0AbP8Thw306r5f0tU0s0RnClNRrpM7iLZQZCh4urjjm1nPlLVN2cvKbhB0gks3OE JIza4tgsYELq7gTHgeHRM8vhlVtpzgxg6m7gbPXptQ7q4d0Q9K++kyZO3UiZG82ponKY DM+ht1D3A+Eg8Co2dt+ym/gky/eIx2ki3TeVSJ2PxhVMHdlX5/2KaImaACWOeUxKy2jU FXPQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679544050; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=9m7S+Ebo8MkLWAncbvKsOW8wVqp1O+0Jr3lM7bEbkLI=; b=3PtMNMbulCinvMkiKqdAEDoeBhkUHyvKCrphoWi50YspTeYxpWWNYdglsGpYdIx2fw E0EYH+5c/sgjzC4AME1rlutnkUZzSqfpoA4T/J5OSOps9ElRyZ9fiz1U3YeNkFawT+BP dYzeCCY1+S6YwJ8bHbxta4lZGTvdZ+CgAcdlgRaUZTacL113tmkrtOGpqqwDVq3UvTml fvgNC2FtSuLdFXuvr/oac6WE9VytggvyPubyvgK8uypeJwYwT8biDGN9ttgh0X6cghQx b4ixSw2AI9gsgvZw0dKXUc/7pnDLUiqCk9qc0LkAMFHWSCPJYUd8nBQITRT74JF1twGu VqiA== X-Gm-Message-State: AO0yUKWhJ1Nz8k41VgmntYy2pCKoDOLA7fk324F9/JtnoyKw7jVWCRIS LLAY0UK/5b8YyKeXEil1/V0Hr1DWRYsnO6TN X-Google-Smtp-Source: AK7set+m4AnWZnXPXOtgLtJe2rjiQdXqc3xUA0g5EOBUkvh8xUumZjwEW1aaQclKH89BoDnkrewjWDyAajrYgNTk X-Received: from yosry.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:2327]) (user=yosryahmed job=sendgmr) by 2002:a05:6a00:80e6:b0:623:8a88:1bba with SMTP id ei38-20020a056a0080e600b006238a881bbamr2937696pfb.2.1679544050539; Wed, 22 Mar 2023 21:00:50 -0700 (PDT) Date: Thu, 23 Mar 2023 04:00:35 +0000 In-Reply-To: <20230323040037.2389095-1-yosryahmed@google.com> Mime-Version: 1.0 References: <20230323040037.2389095-1-yosryahmed@google.com> X-Mailer: git-send-email 2.40.0.rc1.284.g88254d51c5-goog Message-ID: <20230323040037.2389095-6-yosryahmed@google.com> Subject: [RFC PATCH 5/7] vmscan: memcg: sleep when flushing stats during reclaim From: Yosry Ahmed To: Tejun Heo , Josef Bacik , Jens Axboe , Zefan Li , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , Muchun Song , Andrew Morton Cc: Vasily Averin , cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, Yosry Ahmed Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Memory reclaim should be a sleepable context. Allow sleeping when flushing memcg stats to avoid unnecessarily performing a lot of work without sleeping. This can slow down reclaim code if flushing stats is taking too long, but there is already multiple cond_resched()'s in reclaim code. Signed-off-by: Yosry Ahmed --- mm/vmscan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 59d1830d08ac..bae35cfb33c8 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2845,7 +2845,7 @@ static void prepare_scan_count(pg_data_t *pgdat, struct scan_control *sc) * Flush the memory cgroup stats, so that we read accurate per-memcg * lruvec stats for heuristics. */ - mem_cgroup_flush_stats(false); + mem_cgroup_flush_stats(true); /* * Determine the scan balance between anon and file LRUs. From patchwork Thu Mar 23 04:00:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yosry Ahmed X-Patchwork-Id: 13184916 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4E539C761AF for ; Thu, 23 Mar 2023 04:01:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229846AbjCWEBB (ORCPT ); Thu, 23 Mar 2023 00:01:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37672 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229379AbjCWEAy (ORCPT ); Thu, 23 Mar 2023 00:00:54 -0400 Received: from mail-pj1-x1049.google.com (mail-pj1-x1049.google.com [IPv6:2607:f8b0:4864:20::1049]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C0B541F5EF for ; Wed, 22 Mar 2023 21:00:52 -0700 (PDT) Received: by mail-pj1-x1049.google.com with SMTP id d5-20020a17090ac24500b0023cb04ec86fso321554pjx.7 for ; Wed, 22 Mar 2023 21:00:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1679544052; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=IoeZYZryJtZ5YKLINZigwUk+eCMrWPKZYyl5pNjfs0s=; b=CYUJafqLeSTzFoXixoXxJfg+d8a1R4JvCUerDh64ifDjpdaxbdbWqowybN/gvJ19LK Og473fSzkkpOeMA58DC64BMRm6M0MCGDM/VGSfHLkAlVcw8rDdEYpMTrVbzG7kCaO9yo XEy99SSbQbOpczsL5Zy04pgal2f5dv+k8luu/GDpUWQB1RZl9qYQhWpR8p3p3vD98PGc ce78QmYoavhlvFbHk6YMCQ65CM/lnXEYztj/xUq2pBOa3UnU3uBgApeimhahme8eLkyC CQZkL9MM7bPQMzx+6ozclKMKwr8T8qPF+ZR71Vewcxu4v3esM5hXIWD8hj0gOLxV+qOY L9ww== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679544052; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=IoeZYZryJtZ5YKLINZigwUk+eCMrWPKZYyl5pNjfs0s=; b=gMXXKzKZs7NbxXMlIRe1g0uXkCoYtjr85x/9ZlrerK3tNtbLIHSxNn+/kUZ7rSzMdx Cc9gzIDbCGpXgAAWphmTG5/PGV2zP8nLbwCg7EDeRehEoenTqGeWlNqY2TsyOXvT6d1W PC5ZPnM4uOwaUwK8kT4MlByH0Kjl7oEch2EcjpBCnP+UdC5W0OgA2m264/xtcT0DQVUq buDUXjxJTBAzG7Bj17LHD23r6Pja4W13nqgfU+6QllpLOCXVKzoTsmgzfVHlGVNgnrwF /y0anL8E562YHYlHI3EpIZe4EL2tnNfsTGru7YmwLsJcSHvT51+Blz9WE+OREW6nY+24 Ywjw== X-Gm-Message-State: AO0yUKWTfe3fDQDVqJeO4v8g3LQ69DxojvCx8BqwT2pIy+2z1KDHrBD0 Ob4C3qwotkKxqlJ0CZBzDVliae1rWTk2E/my X-Google-Smtp-Source: AK7set9aF1O2O+BZVk6rqDRrbbtYYQ/yqA23fBe+9ZzPXGEhcCIhooyW8p0Z2epSuqN1jjR7FDL5kfmsz1ZKE44X X-Received: from yosry.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:2327]) (user=yosryahmed job=sendgmr) by 2002:a17:90a:ce11:b0:23f:d473:dd44 with SMTP id f17-20020a17090ace1100b0023fd473dd44mr1881490pju.3.1679544052257; Wed, 22 Mar 2023 21:00:52 -0700 (PDT) Date: Thu, 23 Mar 2023 04:00:36 +0000 In-Reply-To: <20230323040037.2389095-1-yosryahmed@google.com> Mime-Version: 1.0 References: <20230323040037.2389095-1-yosryahmed@google.com> X-Mailer: git-send-email 2.40.0.rc1.284.g88254d51c5-goog Message-ID: <20230323040037.2389095-7-yosryahmed@google.com> Subject: [RFC PATCH 6/7] workingset: memcg: sleep when flushing stats in workingset_refault() From: Yosry Ahmed To: Tejun Heo , Josef Bacik , Jens Axboe , Zefan Li , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , Muchun Song , Andrew Morton Cc: Vasily Averin , cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, Yosry Ahmed Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org In workingset_refault(), we call mem_cgroup_flush_stats_delayed() to flush stats within an RCU read section and with sleeping disallowed. Move the call to mem_cgroup_flush_stats_delayed() above the RCU read section and allow sleeping to avoid unnecessarily performing a lot of work without sleeping. Signed-off-by: Yosry Ahmed --- A lot of code paths call into workingset_refault(), so I am not generally sure at all whether it's okay to sleep in all contexts or not. Feedback here would be very helpful. --- mm/workingset.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/mm/workingset.c b/mm/workingset.c index 042eabbb43f6..410bc6684ea7 100644 --- a/mm/workingset.c +++ b/mm/workingset.c @@ -406,6 +406,8 @@ void workingset_refault(struct folio *folio, void *shadow) unpack_shadow(shadow, &memcgid, &pgdat, &eviction, &workingset); eviction <<= bucket_order; + /* Flush stats (and potentially sleep) before holding RCU read lock */ + mem_cgroup_flush_stats_delayed(true); rcu_read_lock(); /* * Look up the memcg associated with the stored ID. It might @@ -461,9 +463,6 @@ void workingset_refault(struct folio *folio, void *shadow) lruvec = mem_cgroup_lruvec(memcg, pgdat); mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr); - - /* Do not sleep with RCU lock held */ - mem_cgroup_flush_stats_delayed(false); /* * Compare the distance to the existing workingset size. We * don't activate pages that couldn't stay resident even if From patchwork Thu Mar 23 04:00:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yosry Ahmed X-Patchwork-Id: 13184918 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EC397C76196 for ; Thu, 23 Mar 2023 04:01:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230331AbjCWEBG (ORCPT ); Thu, 23 Mar 2023 00:01:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37764 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229563AbjCWEA5 (ORCPT ); Thu, 23 Mar 2023 00:00:57 -0400 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ABB561F919 for ; Wed, 22 Mar 2023 21:00:54 -0700 (PDT) Received: by mail-yb1-xb4a.google.com with SMTP id 204-20020a250fd5000000b00b6d6655dc35so10076289ybp.6 for ; Wed, 22 Mar 2023 21:00:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1679544054; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=GbeROioJTSRzpGZIx65CHc5YHmhlz1VRWOdmKxovUco=; b=Hsck6MLI0adIPASRaTY3d/eUkBNtZFq5fHTozHNB8yOtl2ZkHkJQQ4GNrxQh4iSoWz JkY9CbdFxu+ujh+Pyx1oKlS53Vj0EANz1HJ/mNRwoFFcZsbYiHuLCMd4Lx5axIZ0gB5l yaad/tHsVvn1tHU57blkHy/+dS8g9HHclDAV/QT3pdal8fCNCqCcJD9lMmCv5J074fQc TEWaFK69tR0mz3hWUAlm/mUgTwQep/HE638zK7huhLNEr0Yv6vPjLeQWwk4UQCFbwI31 KRefecpL1tiBaS/Y5Rnpmz8ntCNmT0BogLXtNGk9m7i1zmm8BH4noVNiIwwFJbgEqIWE ZH9g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679544054; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=GbeROioJTSRzpGZIx65CHc5YHmhlz1VRWOdmKxovUco=; b=qvIdspQU3g7YJv+iFa/9T2/7yifbglw7cYvllIoSnOTdHJwA3JDIVmbvUX4mKdh5Oe ZYAqHg+EYOiFmcuMDL2mJ+8UDaqTPKeLptEW57xwHCBhrNKygTG7YnG1UR287BSnlZFz v3PoNf6NUugZznfhkA/VeoZHJPb9CHGJu2w1yTakFf3lnI2hXrKA0LGfzOJfBe/qiYK4 xAWGHwEKHBwpBUfR7emcvy22vmaIr1rQO3jjWNwQ4BlB5MEXeFA5SM9svaAC+j9CnFFy XoUvnjzrvJwqXzcVoIyP3owFWOwbpnM3vOKmKzw2cIYj5HAXOHVu1sQZlQccJxPrHptS q5vw== X-Gm-Message-State: AAQBX9fhBtrDzo45odZLRDrIkk6xzFQlFyNoe8/PyCspZ8CT7SWMGwV8 JOS4Ahe3aTskTX03NVfGaUxea+9LLINNOCHB X-Google-Smtp-Source: AKy350aHC0fJxZVNDyOM4vrsmksnjt2c5UEEiO2IoOQY1+2DycjUjKQKdK6RunG6Yd33ZztPbzyijCwEZgPQmgPG X-Received: from yosry.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:2327]) (user=yosryahmed job=sendgmr) by 2002:a05:6902:1549:b0:b73:4a25:fd36 with SMTP id r9-20020a056902154900b00b734a25fd36mr1286158ybu.2.1679544053971; Wed, 22 Mar 2023 21:00:53 -0700 (PDT) Date: Thu, 23 Mar 2023 04:00:37 +0000 In-Reply-To: <20230323040037.2389095-1-yosryahmed@google.com> Mime-Version: 1.0 References: <20230323040037.2389095-1-yosryahmed@google.com> X-Mailer: git-send-email 2.40.0.rc1.284.g88254d51c5-goog Message-ID: <20230323040037.2389095-8-yosryahmed@google.com> Subject: [RFC PATCH 7/7] memcg: do not modify rstat tree for zero updates From: Yosry Ahmed To: Tejun Heo , Josef Bacik , Jens Axboe , Zefan Li , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , Muchun Song , Andrew Morton Cc: Vasily Averin , cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, Yosry Ahmed Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org In some situations, we may end up calling memcg_rstat_updated() with a value of 0, which means the stat was not actually updated. An example is if we fail to reclaim any pages in shrink_folio_list(). Do not add the cgroup to the rstat updated tree in this case, to avoid unnecessarily flushing it. Signed-off-by: Yosry Ahmed --- mm/memcontrol.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 39a9c7a978ae..7afd29399409 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -618,6 +618,9 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val) { unsigned int x; + if (!val) + return; + cgroup_rstat_updated(memcg->css.cgroup, smp_processor_id()); x = __this_cpu_add_return(stats_updates, abs(val));