From patchwork Mon Feb 26 00:57:39 2024
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13571111
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, damon@lists.linux.dev, sj@kernel.org, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Barry Song, Minchan Kim, Michal Hocko, Johannes Weiner
Subject: [PATCH v2] mm: madvise: pageout: ignore references rather than clearing young
Date: Mon, 26 Feb 2024 13:57:39 +1300
Message-Id: <20240226005739.24350-1-21cnbao@gmail.com>
X-Mailer: git-send-email 2.34.1
MIME-Version: 1.0

From: Barry Song <21cnbao@gmail.com>

While doing MADV_PAGEOUT, the current code clears the PTE young bit so
that vmscan will not see the young flag and the reclamation of the
madvised folios can go ahead. We can achieve the same result by simply
ignoring references instead, which lets us drop the TLB flush in madvise
and the rmap overhead in vmscan.

Regarding the side effect: with the original code, if a parallel thread
accesses the madvised memory while another thread is doing madvise, the
folios get a chance to be re-activated by vmscan (though the window is
quite small, since the PTEs are checked immediately after their young
bits are cleared). With this patch they will be reclaimed regardless.
But doing PAGEOUT and accessing the same memory at the same time is
rather nonsensical, DoS-like behaviour, so we probably do not need to
care; arguably, ignoring a new access during this small window is even
the better choice.

For DAMON's DAMOS_PAGEOUT, which is based on physical address regions,
the behaviour is kept as is, since a physical address might be mapped by
multiple processes, while MADV_PAGEOUT on virtual addresses is much more
aggressive about reclamation. To leave paddr's DAMOS_PAGEOUT untouched,
we simply pass ignore_references as false in reclaim_pages().

The microbenchmark below shows a 6% reduction in MADV_PAGEOUT latency:

#include <sys/mman.h>

#define PGSIZE 4096

int main(void)
{
	int i;
#define SIZE 512*1024*1024
	volatile long *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	for (i = 0; i < SIZE / sizeof(long); i += PGSIZE / sizeof(long))
		p[i] = 0x11;

	madvise((void *)p, SIZE, MADV_PAGEOUT);

	return 0;
}

        w/o patch                       w/ patch
root@10:~# time ./a.out         root@10:~# time ./a.out
real    0m49.634s               real    0m46.334s
user    0m0.637s                user    0m0.648s
sys     0m47.434s               sys     0m44.265s

Cc: SeongJae Park
Cc: Minchan Kim
Cc: Michal Hocko
Cc: Johannes Weiner
Signed-off-by: Barry Song <21cnbao@gmail.com>
Acked-by: Minchan Kim
---
-v2:
 Add a description of DAMON's paddr pageout to the commit message
 (thanks to SeongJae for clarifying lots of things about DAMON).

 mm/damon/paddr.c |  2 +-
 mm/internal.h    |  2 +-
 mm/madvise.c     |  8 ++++----
 mm/vmscan.c      | 12 +++++++-----
 4 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index 081e2a325778..5e6dc312072c 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -249,7 +249,7 @@ static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
 put_folio:
 		folio_put(folio);
 	}
-	applied = reclaim_pages(&folio_list);
+	applied = reclaim_pages(&folio_list, false);
 	cond_resched();
 	return applied * PAGE_SIZE;
 }
diff --git a/mm/internal.h b/mm/internal.h
index 93e229112045..36c11ea41f47 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -868,7 +868,7 @@ extern unsigned long __must_check vm_mmap_pgoff(struct file *, unsigned long,
 	unsigned long, unsigned long);
 
 extern void set_pageblock_order(void);
-unsigned long reclaim_pages(struct list_head *folio_list);
+unsigned long reclaim_pages(struct list_head *folio_list, bool ignore_references);
 unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 					    struct list_head *folio_list);
 /* The ALLOC_WMARK bits are used as an index to zone->watermark */
diff --git a/mm/madvise.c b/mm/madvise.c
index abde3edb04f0..44a498c94158 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -386,7 +386,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 			return 0;
 		}
 
-		if (pmd_young(orig_pmd)) {
+		if (!pageout && pmd_young(orig_pmd)) {
 			pmdp_invalidate(vma, addr, pmd);
 			orig_pmd = pmd_mkold(orig_pmd);
 
@@ -410,7 +410,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 huge_unlock:
 		spin_unlock(ptl);
 		if (pageout)
-			reclaim_pages(&folio_list);
+			reclaim_pages(&folio_list, true);
 		return 0;
 	}
 
@@ -490,7 +490,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 
 		VM_BUG_ON_FOLIO(folio_test_large(folio), folio);
 
-		if (pte_young(ptent)) {
+		if (!pageout && pte_young(ptent)) {
 			ptent = ptep_get_and_clear_full(mm, addr, pte,
 							tlb->fullmm);
 			ptent = pte_mkold(ptent);
@@ -524,7 +524,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		pte_unmap_unlock(start_pte, ptl);
 	}
 	if (pageout)
-		reclaim_pages(&folio_list);
+		reclaim_pages(&folio_list, true);
 	cond_resched();
 
 	return 0;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 402c290fbf5a..ba2f37f46a73 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2102,7 +2102,8 @@ static void shrink_active_list(unsigned long nr_to_scan,
 }
 
 static unsigned int reclaim_folio_list(struct list_head *folio_list,
-				       struct pglist_data *pgdat)
+				       struct pglist_data *pgdat,
+				       bool ignore_references)
 {
 	struct reclaim_stat dummy_stat;
 	unsigned int nr_reclaimed;
@@ -2115,7 +2116,7 @@ static unsigned int reclaim_folio_list(struct list_head *folio_list,
 		.no_demotion = 1,
 	};
 
-	nr_reclaimed = shrink_folio_list(folio_list, pgdat, &sc, &dummy_stat, false);
+	nr_reclaimed = shrink_folio_list(folio_list, pgdat, &sc, &dummy_stat, ignore_references);
 	while (!list_empty(folio_list)) {
 		folio = lru_to_folio(folio_list);
 		list_del(&folio->lru);
@@ -2125,7 +2126,7 @@ static unsigned int reclaim_folio_list(struct list_head *folio_list,
 	return nr_reclaimed;
 }
 
-unsigned long reclaim_pages(struct list_head *folio_list)
+unsigned long reclaim_pages(struct list_head *folio_list, bool ignore_references)
 {
 	int nid;
 	unsigned int nr_reclaimed = 0;
@@ -2147,11 +2148,12 @@ unsigned long reclaim_pages(struct list_head *folio_list)
 			continue;
 		}
 
-		nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid));
+		nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid),
+						   ignore_references);
 		nid = folio_nid(lru_to_folio(folio_list));
 	} while (!list_empty(folio_list));
 
-	nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid));
+	nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid), ignore_references);
 
 	memalloc_noreclaim_restore(noreclaim_flag);
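
[Not part of the patch: below is a minimal, hedged user-space sketch of
the "parallel access" scenario discussed in the commit message. One
thread keeps touching an anonymous mapping while the main thread issues
MADV_PAGEOUT; with this patch the references from the toucher thread are
ignored and the folios are reclaimed anyway. The thread structure, the
names "toucher"/"region" and the sizes are illustrative assumptions,
not something prescribed by the patch.]

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>

#define SIZE (512UL * 1024 * 1024)
#define PGSIZE 4096UL

static volatile char *region;
static volatile int stop;

/* Keep re-referencing the pages while MADV_PAGEOUT runs. */
static void *toucher(void *arg)
{
	size_t i;

	(void)arg;
	while (!stop)
		for (i = 0; i < SIZE && !stop; i += PGSIZE)
			region[i]++;
	return NULL;
}

int main(void)
{
	pthread_t t;
	size_t i;

	region = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (region == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Populate one byte per page so there is something to reclaim. */
	for (i = 0; i < SIZE; i += PGSIZE)
		region[i] = 0x11;

	pthread_create(&t, NULL, toucher, NULL);

	/* Ask the kernel to reclaim the range despite the parallel accesses. */
	madvise((void *)region, SIZE, MADV_PAGEOUT);

	stop = 1;
	pthread_join(t, NULL);
	munmap((void *)region, SIZE);
	return 0;
}

(Build with -lpthread; whether the touched pages end up reclaimed or
re-activated can be observed via /proc/self/smaps or vmstat.)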