From patchwork Wed Aug 30 17:53:31 2023
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13370361
Date: Wed, 30 Aug 2023 17:53:31 +0000
Message-ID: <20230830175335.1536008-1-yosryahmed@google.com>
Subject: [PATCH v3 0/4] memcg: non-unified flushing for userspace stats
From: Yosry Ahmed
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
 Muchun Song, Ivan Babrou, Tejun Heo, Michal Koutný, Waiman Long,
 linux-mm@kvack.org, cgroups@vger.kernel.org,
 linux-kernel@vger.kernel.org, Yosry Ahmed

Most memcg flushing contexts use "unified" flushing, where only one
flusher is allowed at a time (others skip), and all flushers need to
flush the entire tree. This works well with high concurrency, which
mostly comes from in-kernel flushers (e.g. reclaim, refault, etc.). For
userspace reads, unified flushing leads to non-deterministic stats
staleness and reading cost. This series clarifies and documents the
differences between unified and non-unified flushing (patches 1 & 2),
then opts userspace reads out of unified flushing (patch 3).

This patch series is a follow-up to the discussion in [1]. That patch
proposed that userspace reads wait for ongoing unified flushers to
complete before returning. There were concerns about the latency that
this introduces to userspace reads, especially with ongoing reports of
expensive stat reads even with unified flushing. Hence, this series
takes a different approach, opting userspace reads out of unified
flushing completely. The cost of userspace reads is now deterministic,
and depends on the size of the subtree being read. This should fix both
the *sometimes* expensive reads (due to flushing the entire tree) and
the occasional staleness (due to skipping flushing).

I attempted to remove unified flushing completely, but noticed that
in-kernel flushers with high concurrency (e.g. hundreds of concurrent
reclaimers) still rely on it. This sort of concurrency is not expected
from userspace reads. More details about testing and some numbers are
in the last patch's changelog.

v2 -> v3:
- Renamed stats_flush_ongoing to stats_unified_flush_ongoing in patch 1
  for more clarity.
- Added a mutex for flushes by userspace readers, to guard the internal
  global rstat lock from being hogged by userspace readers, as suggested
  by Tejun Heo and Waiman Long.
- Fixed a bug in v2, where patch 4 also opted mem_cgroup_wb_stats() out
  of unified flushing by mistake. mem_cgroup_wb_stats() is not for
  userspace reading.

v2: https://lore.kernel.org/lkml/20230828233319.340712-1-yosryahmed@google.com/

Yosry Ahmed (4):
  mm: memcg: properly name and document unified stats flushing
  mm: memcg: add a helper for non-unified stats flushing
  mm: memcg: let non-unified root stats flushes help unified flushes
  mm: memcg: use non-unified stats flushing for userspace reads

 include/linux/memcontrol.h |   8 +--
 mm/memcontrol.c            | 106 +++++++++++++++++++++++++++----------
 mm/vmscan.c                |   2 +-
 mm/workingset.c            |   4 +-
 4 files changed, 85 insertions(+), 35 deletions(-)
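
For readers less familiar with the two flushing modes contrasted in this
cover letter, here is a rough userspace model of the difference. It is
only an illustrative sketch, not the code in mm/memcontrol.c. Every name
in it (pending, flushed, flush_range, unified_flush, non_unified_flush,
unified_flush_ongoing, NR_GROUPS) is hypothetical, and it assumes the
cgroup tree is flattened into an array so that a subtree is a contiguous
index range. The mutex that v3 adds for userspace readers is left out to
keep the sketch short.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are the kernel's. */
#define NR_GROUPS 4

static long pending[NR_GROUPS];  /* per-group updates not yet folded in */
static long flushed[NR_GROUPS];  /* per-group totals visible to readers */
static atomic_bool unified_flush_ongoing; /* zero-initialized: false */

/* Fold pending updates for groups first..last into the visible totals. */
static int flush_range(int first, int last)
{
    int visited = 0;

    assert(0 <= first && first <= last && last < NR_GROUPS);
    for (int i = first; i <= last; i++) {
        flushed[i] += pending[i];
        pending[i] = 0;
        visited++;
    }
    return visited;
}

/*
 * Unified flush: at most one flusher at a time, and it always covers the
 * whole tree. Returns the number of groups visited, or 0 if the caller
 * skipped because another flush was already marked as ongoing.
 */
static int unified_flush(void)
{
    if (atomic_exchange(&unified_flush_ongoing, true))
        return 0; /* skipped: the caller may read stale stats */

    int visited = flush_range(0, NR_GROUPS - 1);

    atomic_store(&unified_flush_ongoing, false);
    return visited;
}

/*
 * Non-unified flush: never skips, and only covers the requested subtree,
 * so the cost depends on the subtree size rather than on other flushers.
 */
static int non_unified_flush(int first, int last)
{
    return flush_range(first, last);
}
```

In this model, if unified_flush_ongoing is already set when
unified_flush() is called, the call returns 0 and leaves pending[]
untouched, which corresponds to the staleness case described above. A
call such as non_unified_flush(2, 3) always does its work and visits
exactly the two groups in that range, which corresponds to the
deterministic cost that this series aims for in userspace reads.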