From patchwork Sat Nov 20 20:12:30 2021
From: Shakeel Butt
Date: Sat, 20 Nov 2021 12:12:30 -0800
Message-Id: <20211120201230.920082-1-shakeelb@google.com>
Subject: [PATCH] mm: split thp synchronously on MADV_DONTNEED
To: David Hildenbrand, Kirill A.
Shutemov, Yang Shi, Zi Yan, Matthew Wilcox
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Shakeel Butt

Many applications do sophisticated management of their heap memory for
better performance at low cost. We have many such applications running
in production, including caching and data storage services. These
applications keep their hot data on THPs for better performance and
release their cold data through MADV_DONTNEED to keep the memory cost
low.

The kernel defers the split and release of THPs until there is memory
pressure. This complicates the memory management of these applications,
which then need to look into the low-level kernel handling of THPs to
gauge their headroom for expansion. In addition, these applications are
latency sensitive and would prefer not to face memory reclaim, given
its non-deterministic nature.

This patch frees such applications from worrying about the kernel's
low-level handling of THPs by splitting THPs synchronously on
MADV_DONTNEED.
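For reference, the userspace pattern this targets looks roughly like
the sketch below (sizes, names, and the missing error handling are
illustrative only, not taken from any particular service). Note that
MADV_DONTNEED only leads to a split when the range covers part of a
THP; ranges covering whole THPs simply free them.

#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>

#define REGION	(64UL << 20)	/* illustrative 64M heap arena */

int main(void)
{
	size_t cold;
	char *heap = mmap(NULL, REGION, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (heap == MAP_FAILED)
		return 1;

	/* Hot path: ask for THP backing and fault the arena in. */
	madvise(heap, REGION, MADV_HUGEPAGE);
	memset(heap, 1, REGION);

	/*
	 * Cold path: release everything past an offset that cuts
	 * through a 2M THP; the straddled THP can only be reclaimed
	 * by splitting it (with this patch, synchronously, inside
	 * this madvise() call).
	 */
	cold = (REGION / 2) + 4096;
	madvise(heap + cold, REGION - cold, MADV_DONTNEED);

	return 0;
}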
Signed-off-by: Shakeel Butt
---
 include/linux/mmzone.h   |  5 +++++
 include/linux/sched.h    |  4 ++++
 include/linux/sched/mm.h | 11 +++++++++++
 kernel/fork.c            |  3 +++
 mm/huge_memory.c         | 50 ++++++++++++++++++++++++++++++++++++++++
 mm/madvise.c             |  8 ++++++++
 6 files changed, 81 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 58e744b78c2c..7fa0035128b9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -795,6 +795,11 @@ struct deferred_split {
 	struct list_head split_queue;
 	unsigned long split_queue_len;
 };
+void split_local_deferred_list(struct list_head *defer_list);
+#else
+static inline void split_local_deferred_list(struct list_head *defer_list)
+{
+}
 #endif
 
 /*
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9d27fd0ce5df..a984bb6509d9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1412,6 +1412,10 @@ struct task_struct {
 	struct mem_cgroup *active_memcg;
 #endif
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	struct list_head *deferred_split_list;
+#endif
+
 #ifdef CONFIG_BLK_CGROUP
 	struct request_queue *throttle_queue;
 #endif
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index fdf742e15e27..9b438c7e811e 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -374,6 +374,17 @@ set_active_memcg(struct mem_cgroup *memcg)
 }
 #endif
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static inline void set_local_deferred_list(struct list_head *list)
+{
+	current->deferred_split_list = list;
+}
+#else
+static inline void set_local_deferred_list(struct list_head *list)
+{
+}
+#endif
+
 #ifdef CONFIG_MEMBARRIER
 enum {
 	MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY	= (1U << 0),
diff --git a/kernel/fork.c b/kernel/fork.c
index 01af6129aa38..8197b8ed4b7a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1019,6 +1019,9 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 
 #ifdef CONFIG_MEMCG
 	tsk->active_memcg = NULL;
+#endif
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	tsk->deferred_split_list = NULL;
 #endif
 
 	return tsk;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e5483347291c..2f73eeecb857 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2754,6 +2754,7 @@ void free_transhuge_page(struct page *page)
 void deferred_split_huge_page(struct page *page)
 {
 	struct deferred_split *ds_queue = get_deferred_split_queue(page);
+	struct list_head *list = current->deferred_split_list;
 #ifdef CONFIG_MEMCG
 	struct mem_cgroup *memcg = page_memcg(compound_head(page));
 #endif
@@ -2774,7 +2775,14 @@ void deferred_split_huge_page(struct page *page)
 	if (PageSwapCache(page))
 		return;
 
+	if (list && list_empty(page_deferred_list(page))) {
+		/* corresponding put in split_local_deferred_list. */
+		get_page(page);
+		list_add(page_deferred_list(page), list);
+	}
+
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
+
 	if (list_empty(page_deferred_list(page))) {
 		count_vm_event(THP_DEFERRED_SPLIT_PAGE);
 		list_add_tail(page_deferred_list(page), &ds_queue->split_queue);
@@ -2801,6 +2809,48 @@ static unsigned long deferred_split_count(struct shrinker *shrink,
 	return READ_ONCE(ds_queue->split_queue_len);
 }
 
+void split_local_deferred_list(struct list_head *defer_list)
+{
+	struct list_head *pos, *next;
+	struct page *page;
+
+	/* First iteration for split. */
+	list_for_each_safe(pos, next, defer_list) {
+		page = list_entry((void *)pos, struct page, deferred_list);
+		page = compound_head(page);
+
+		if (!trylock_page(page))
+			continue;
+
+		if (split_huge_page(page)) {
+			unlock_page(page);
+			continue;
+		}
+		/* split_huge_page() removes page from list on success */
+		unlock_page(page);
+
+		/* corresponding get in deferred_split_huge_page. */
+		put_page(page);
+	}
+
+	/* Second iteration to putback failed pages. */
+	list_for_each_safe(pos, next, defer_list) {
+		struct deferred_split *ds_queue;
+		unsigned long flags;
+
+		page = list_entry((void *)pos, struct page, deferred_list);
+		page = compound_head(page);
+		ds_queue = get_deferred_split_queue(page);
+
+		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
+		list_move(page_deferred_list(page), &ds_queue->split_queue);
+		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
+
+		/* corresponding get in deferred_split_huge_page. */
+		put_page(page);
+	}
+}
+
 static unsigned long deferred_split_scan(struct shrinker *shrink,
 					 struct shrink_control *sc)
 {
diff --git a/mm/madvise.c b/mm/madvise.c
index 8c927202bbe6..15614115e359 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -762,7 +762,15 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
 
 static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
 					unsigned long start, unsigned long end)
 {
+	LIST_HEAD(list);
+
+	set_local_deferred_list(&list);
+
 	zap_page_range(vma, start, end - start);
+
+	set_local_deferred_list(NULL);
+	split_local_deferred_list(&list);
+
 	return 0;
 }
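
A rough way to observe the behavior change from userspace is to sample
the THP counters in /proc/vmstat around the madvise() call: without
this patch, a partial MADV_DONTNEED of a THP bumps
thp_deferred_split_page and the actual split waits for the shrinker;
with it, thp_split_page should advance during the madvise() call
itself. The helper below is an illustrative sketch, not part of this
patch:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a counter from /proc/vmstat (e.g. "thp_split_page"), or -1. */
static long vmstat_read(const char *name)
{
	char line[256];
	size_t len = strlen(name);
	long val = -1;
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, name, len) && line[len] == ' ') {
			val = atol(line + len + 1);
			break;
		}
	}
	fclose(f);
	return val;
}

int main(void)
{
	/* Sample around the partial MADV_DONTNEED from the earlier sketch. */
	long before = vmstat_read("thp_split_page");
	/* ... mmap + MADV_HUGEPAGE + partial MADV_DONTNEED here ... */
	long after = vmstat_read("thp_split_page");

	printf("thp_split_page delta: %ld\n", after - before);
	return 0;
}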