From patchwork Fri Mar 27 17:05:55 2020
X-Patchwork-Submitter: "Kirill A. Shutemov" <kirill@shutemov.name>
X-Patchwork-Id: 11462711
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: akpm@linux-foundation.org, Andrea Arcangeli
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov"
Subject: [PATCH 1/7] khugepaged: Add self test
Date: Fri, 27 Mar 2020 20:05:55 +0300
Message-Id: <20200327170601.18563-2-kirill.shutemov@linux.intel.com>
In-Reply-To: <20200327170601.18563-1-kirill.shutemov@linux.intel.com>
References: <20200327170601.18563-1-kirill.shutemov@linux.intel.com>

The test checks if khugepaged is able to recover huge pages where we expect it to do so. It only covers anon-THP for now.

Currently the test shows a few failures. They are going to be addressed by the following patches.

Signed-off-by: Kirill A.
Shutemov <kirill@shutemov.name>
---
 tools/testing/selftests/vm/Makefile     |   1 +
 tools/testing/selftests/vm/khugepaged.c | 841 ++++++++++++++++++++++++
 2 files changed, 842 insertions(+)
 create mode 100644 tools/testing/selftests/vm/khugepaged.c

diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 7f9a8a8c31da..981d0dc21f9e 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -18,6 +18,7 @@ TEST_GEN_FILES += on-fault-limit
 TEST_GEN_FILES += thuge-gen
 TEST_GEN_FILES += transhuge-stress
 TEST_GEN_FILES += userfaultfd
+TEST_GEN_FILES += khugepaged
 
 ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 sparc64 x86_64))
 TEST_GEN_FILES += va_128TBswitch
diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c
new file mode 100644
index 000000000000..193bde6a1534
--- /dev/null
+++ b/tools/testing/selftests/vm/khugepaged.c
@@ -0,0 +1,841 @@
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <limits.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <sys/mman.h>
+#include <sys/wait.h>
+
+#ifndef MADV_PAGEOUT
+#define MADV_PAGEOUT 21
+#endif
+
+#define BASE_ADDR ((void *)(1UL << 30))
+static unsigned long hpage_pmd_size;
+static unsigned long page_size;
+static int hpage_pmd_nr;
+
+#define THP_SYSFS "/sys/kernel/mm/transparent_hugepage/"
+
+enum thp_enabled {
+	THP_ALWAYS,
+	THP_MADVISE,
+	THP_NEVER,
+};
+
+static const char *thp_enabled_strings[] = {
+	"always",
+	"madvise",
+	"never",
+	NULL
+};
+
+enum thp_defrag {
+	THP_DEFRAG_ALWAYS,
+	THP_DEFRAG_DEFER,
+	THP_DEFRAG_DEFER_MADVISE,
+	THP_DEFRAG_MADVISE,
+	THP_DEFRAG_NEVER,
+};
+
+static const char *thp_defrag_strings[] = {
+	"always",
+	"defer",
+	"defer+madvise",
+	"madvise",
+	"never",
+	NULL
+};
+
+enum shmem_enabled {
+	SHMEM_ALWAYS,
+	SHMEM_WITHIN_SIZE,
+	SHMEM_ADVISE,
+	SHMEM_NEVER,
+	SHMEM_DENY,
+	SHMEM_FORCE,
+};
+
+static const char *shmem_enabled_strings[] = {
+	"always",
+	"within_size",
+	"advise",
+	"never",
+	"deny",
+	"force",
+	NULL
+};
+
+struct khugepaged_settings {
+	bool defrag;
+	unsigned int alloc_sleep_millisecs;
+	unsigned int scan_sleep_millisecs;
+	unsigned int max_ptes_none;
+	unsigned int max_ptes_swap;
+	unsigned long pages_to_scan;
+};
+
+struct settings {
+	enum thp_enabled thp_enabled;
+	enum thp_defrag thp_defrag;
+	enum shmem_enabled shmem_enabled;
+	bool debug_cow;
+	bool use_zero_page;
+	struct khugepaged_settings khugepaged;
+};
+
+static struct settings default_settings = {
+	.thp_enabled = THP_MADVISE,
+	.thp_defrag = THP_DEFRAG_ALWAYS,
+	.shmem_enabled = SHMEM_NEVER,
+	.debug_cow = 0,
+	.use_zero_page = 0,
+	.khugepaged = {
+		.defrag = 1,
+		.alloc_sleep_millisecs = 10,
+		.scan_sleep_millisecs = 10,
+	},
+};
+
+static struct settings saved_settings;
+static bool skip_settings_restore;
+
+static int exit_status;
+
+static void success(const char *msg)
+{
+	printf(" \e[32m%s\e[0m\n", msg);
+}
+
+static void fail(const char *msg)
+{
+	printf(" \e[31m%s\e[0m\n", msg);
+	exit_status++;
+}
+
+static int read_file(const char *path, char *buf, size_t buflen)
+{
+	int fd;
+	ssize_t numread;
+
+	fd = open(path, O_RDONLY);
+	if (fd == -1)
+		return 0;
+
+	numread = read(fd, buf, buflen - 1);
+	if (numread < 1) {
+		close(fd);
+		return 0;
+	}
+
+	buf[numread] = '\0';
+	close(fd);
+
+	return (unsigned int) numread;
+}
+
+static int write_file(const char *path, const char *buf, size_t buflen)
+{
+	int fd;
+	ssize_t numwritten;
+
+	fd = open(path, O_WRONLY);
+	if (fd == -1)
+		return 0;
+
+	numwritten = write(fd, buf, buflen - 1);
+	close(fd);
+	if (numwritten < 1)
+		return 0;
+
+	return (unsigned int) numwritten;
+}
+
+static int read_string(const char *name, const char *strings[])
+{
+	char path[PATH_MAX];
+	char buf[256];
+	char *c;
+	int ret;
+
+	ret = snprintf(path, PATH_MAX, THP_SYSFS "%s", name);
+	if (ret >= PATH_MAX) {
+		printf("%s: Pathname is too long\n", __func__);
+		exit(EXIT_FAILURE);
+	}
+
+	if (!read_file(path, buf, sizeof(buf))) {
+		perror(path);
+		exit(EXIT_FAILURE);
+	}
+
+	c = strchr(buf, '[');
+	if (!c) {
+		printf("%s: Parse failure\n", __func__);
+		exit(EXIT_FAILURE);
+	}
+
+	c++;
+	memmove(buf, c, sizeof(buf) - (c - buf));
+
+	c = strchr(buf, ']');
+	if (!c) {
+		printf("%s: Parse failure\n", __func__);
+		exit(EXIT_FAILURE);
+	}
+	*c = '\0';
+
+	ret = 0;
+	while (strings[ret]) {
+		if (!strcmp(strings[ret], buf))
+			return ret;
+		ret++;
+	}
+
+	printf("Failed to parse %s\n", name);
+	exit(EXIT_FAILURE);
+}
+
+static void write_string(const char *name, const char *val)
+{
+	char path[PATH_MAX];
+	int ret;
+
+	ret = snprintf(path, PATH_MAX, THP_SYSFS "%s", name);
+	if (ret >= PATH_MAX) {
+		printf("%s: Pathname is too long\n", __func__);
+		exit(EXIT_FAILURE);
+	}
+
+	if (!write_file(path, val, strlen(val) + 1)) {
+		perror(path);
+		exit(EXIT_FAILURE);
+	}
+}
+
+static unsigned long read_num(const char *name)
+{
+	char path[PATH_MAX];
+	char buf[21];
+	int ret;
+
+	ret = snprintf(path, PATH_MAX, THP_SYSFS "%s", name);
+	if (ret >= PATH_MAX) {
+		printf("%s: Pathname is too long\n", __func__);
+		exit(EXIT_FAILURE);
+	}
+
+	ret = read_file(path, buf, sizeof(buf));
+	if (!ret) {
+		perror("read_file(read_num)");
+		exit(EXIT_FAILURE);
+	}
+
+	return strtoul(buf, NULL, 10);
+}
+
+static void write_num(const char *name, unsigned long num)
+{
+	char path[PATH_MAX];
+	char buf[21];
+	int ret;
+
+	ret = snprintf(path, PATH_MAX, THP_SYSFS "%s", name);
+	if (ret >= PATH_MAX) {
+		printf("%s: Pathname is too long\n", __func__);
+		exit(EXIT_FAILURE);
+	}
+
+	sprintf(buf, "%ld", num);
+	if (!write_file(path, buf, strlen(buf) + 1)) {
+		perror(path);
+		exit(EXIT_FAILURE);
+	}
+}
+
+static void write_settings(struct settings *settings)
+{
+	struct khugepaged_settings *khugepaged = &settings->khugepaged;
+
+	write_string("enabled", thp_enabled_strings[settings->thp_enabled]);
+	write_string("defrag", thp_defrag_strings[settings->thp_defrag]);
+	write_string("shmem_enabled",
+		     shmem_enabled_strings[settings->shmem_enabled]);
+	write_num("debug_cow", settings->debug_cow);
+	write_num("use_zero_page", settings->use_zero_page);
+
+	write_num("khugepaged/defrag", khugepaged->defrag);
+	write_num("khugepaged/alloc_sleep_millisecs",
+		  khugepaged->alloc_sleep_millisecs);
+	write_num("khugepaged/scan_sleep_millisecs",
+		  khugepaged->scan_sleep_millisecs);
+	write_num("khugepaged/max_ptes_none", khugepaged->max_ptes_none);
+	write_num("khugepaged/max_ptes_swap", khugepaged->max_ptes_swap);
+	write_num("khugepaged/pages_to_scan", khugepaged->pages_to_scan);
+}
+
+static void restore_settings(int sig)
+{
+	if (skip_settings_restore)
+		goto out;
+
+	printf("Restore THP and khugepaged settings...");
+	write_settings(&saved_settings);
+	success("OK");
+	if (sig)
+		exit(EXIT_FAILURE);
+out:
+	exit(exit_status);
+}
+
+static void save_settings(void)
+{
+	printf("Save THP and khugepaged settings...");
+	saved_settings = (struct settings) {
+		.thp_enabled = read_string("enabled", thp_enabled_strings),
+		.thp_defrag = read_string("defrag", thp_defrag_strings),
+		.shmem_enabled =
+			read_string("shmem_enabled", shmem_enabled_strings),
+		.debug_cow = read_num("debug_cow"),
+		.use_zero_page = read_num("use_zero_page"),
+	};
+	saved_settings.khugepaged = (struct khugepaged_settings) {
+		.defrag = read_num("khugepaged/defrag"),
+		.alloc_sleep_millisecs =
+			read_num("khugepaged/alloc_sleep_millisecs"),
+		.scan_sleep_millisecs =
+			read_num("khugepaged/scan_sleep_millisecs"),
+		.max_ptes_none = read_num("khugepaged/max_ptes_none"),
+		.max_ptes_swap = read_num("khugepaged/max_ptes_swap"),
+		.pages_to_scan = read_num("khugepaged/pages_to_scan"),
+	};
+	success("OK");
+
+	signal(SIGTERM, restore_settings);
+	signal(SIGINT, restore_settings);
+	signal(SIGHUP, restore_settings);
+	signal(SIGQUIT, restore_settings);
+}
+
+static void adjust_settings(void)
+{
+	printf("Adjust settings...");
+	write_settings(&default_settings);
+	success("OK");
+}
+
+#define CHECK_HUGE_FMT "sed -ne " \
+	"'/^%lx/,/^AnonHugePages/{/^AnonHugePages:\\s*%ld kB/ q1}' " \
+	"/proc/%d/smaps"
+
+static bool check_huge(void *p)
+{
+	char *cmd;
+	int ret;
+
+	ret = asprintf(&cmd, CHECK_HUGE_FMT,
+		       (unsigned long)p, hpage_pmd_size >> 10, getpid());
+	if (ret < 0) {
+		perror("asprintf(CHECK_HUGE_FMT)");
+		exit(EXIT_FAILURE);
+	}
+
+	ret = system(cmd);
+	free(cmd);
+	if (ret < 0 || !WIFEXITED(ret)) {
+		perror("system(check_huge)");
+		exit(EXIT_FAILURE);
+	}
+
+	return WEXITSTATUS(ret);
+}
+
+#define CHECK_SWAP_FMT "sed -ne " \
+	"'/^%lx/,/^Swap:/{/^Swap:\\s*%ld kB/ q1}' " \
+	"/proc/%d/smaps"
+
+static bool check_swap(void *p, unsigned long size)
+{
+	char *cmd;
+	int ret;
+
+	ret = asprintf(&cmd, CHECK_SWAP_FMT,
+		       (unsigned long)p, size >> 10, getpid());
+	if (ret < 0) {
+		perror("asprintf(CHECK_SWAP_FMT)");
+		exit(EXIT_FAILURE);
+	}
+
+	ret = system(cmd);
+	free(cmd);
+	if (ret < 0 || !WIFEXITED(ret)) {
+		perror("system(check_swap)");
+		exit(EXIT_FAILURE);
+	}
+
+	return WEXITSTATUS(ret);
+}
+
+static void *alloc_mapping(void)
+{
+	void *p;
+
+	p = mmap(BASE_ADDR, hpage_pmd_size, PROT_READ | PROT_WRITE,
+		 MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	if (p != BASE_ADDR) {
+		printf("Failed to allocate VMA at %p\n", BASE_ADDR);
+		exit(EXIT_FAILURE);
+	}
+
+	return p;
+}
+
+static void fill_memory(int *p, unsigned long start, unsigned long end)
+{
+	int i;
+
+	for (i = start / page_size; i < end / page_size; i++)
+		p[i * page_size / sizeof(*p)] = i + 0xdead0000;
+}
+
+static void validate_memory(int *p, unsigned long start, unsigned long end)
+{
+	int i;
+
+	for (i = start / page_size; i < end / page_size; i++) {
+		if (p[i * page_size / sizeof(*p)] != i + 0xdead0000) {
+			printf("Page %d is corrupted: %#x\n",
+			       i, p[i * page_size / sizeof(*p)]);
+			exit(EXIT_FAILURE);
+		}
+	}
+}
+
+#define TICK 500000
+static bool wait_for_scan(const char *msg, char *p)
+{
+	int full_scans;
+	int timeout = 6; /* 3 seconds */
+
+	/* Sanity check */
+	if (check_huge(p)) {
+		printf("Unexpected huge page\n");
+		exit(EXIT_FAILURE);
+	}
+
+	madvise(p, hpage_pmd_size, MADV_HUGEPAGE);
+
+	/* Wait until the second full scan completes */
+	full_scans = read_num("khugepaged/full_scans") + 2;
+
+	printf("%s...", msg);
+	while (timeout--) {
+		if (check_huge(p))
+			break;
+		if (read_num("khugepaged/full_scans") >= full_scans)
+			break;
+		printf(".");
+		usleep(TICK);
+	}
+
+	madvise(p, hpage_pmd_size, MADV_NOHUGEPAGE);
+
+	return !timeout;
+}
+
+static void alloc_at_fault(void)
+{
+	struct settings settings = default_settings;
+	char *p;
+
+	settings.thp_enabled = THP_ALWAYS;
+	write_settings(&settings);
+
+	p = alloc_mapping();
+	*p = 1;
+	printf("Allocate huge page on fault...");
+	if (check_huge(p))
+		success("OK");
+	else
+		fail("Fail");
+
+	write_settings(&default_settings);
+
+	madvise(p, page_size, MADV_DONTNEED);
+	printf("Split huge PMD on MADV_DONTNEED...");
+	if (!check_huge(p))
+		success("OK");
+	else
+		fail("Fail");
+	munmap(p, hpage_pmd_size);
+}
+
+static void collapse_full(void)
+{
+	void *p;
+
+	p = alloc_mapping();
+	fill_memory(p, 0, hpage_pmd_size);
+	if (wait_for_scan("Collapse fully populated PTE table", p))
+		fail("Timeout");
+	else if (check_huge(p))
+		success("OK");
+	else
+		fail("Fail");
+	validate_memory(p, 0, hpage_pmd_size);
+	munmap(p, hpage_pmd_size);
+}
+
+static void collapse_empty(void)
+{
+	void *p;
+
+	p = alloc_mapping();
+	if (wait_for_scan("Do not collapse empty PTE table", p))
+		fail("Timeout");
+	else if (check_huge(p))
+		fail("Fail");
+	else
+		success("OK");
+	munmap(p, hpage_pmd_size);
+}
+
+static void collapse_single_pte_entry(void)
+{
+	void *p;
+
+	p = alloc_mapping();
+	fill_memory(p, 0, page_size);
+	if (wait_for_scan("Collapse PTE table with single PTE entry present", p))
+		fail("Timeout");
+	else if (check_huge(p))
+		success("OK");
+	else
+		fail("Fail");
+	validate_memory(p, 0, page_size);
+	munmap(p, hpage_pmd_size);
+}
+
+static void collapse_max_ptes_none(void)
+{
+	int max_ptes_none = hpage_pmd_nr / 2;
+	struct settings settings = default_settings;
+	void *p;
+
+	settings.khugepaged.max_ptes_none = max_ptes_none;
+	write_settings(&settings);
+
+	p = alloc_mapping();
+
+	fill_memory(p, 0, (hpage_pmd_nr - max_ptes_none - 1) * page_size);
+	if (wait_for_scan("Do not collapse with max_ptes_none exceeded", p))
+		fail("Timeout");
+	else if (check_huge(p))
+		fail("Fail");
+	else
+		success("OK");
+	validate_memory(p, 0, (hpage_pmd_nr - max_ptes_none - 1) * page_size);
+
+	fill_memory(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size);
+	if (wait_for_scan("Collapse with max_ptes_none PTEs empty", p))
+		fail("Timeout");
+	else if (check_huge(p))
+		success("OK");
+	else
+		fail("Fail");
+	validate_memory(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size);
+
+	munmap(p, hpage_pmd_size);
+	write_settings(&default_settings);
+}
+
+static void collapse_swapin_single_pte(void)
+{
+	void *p;
+
+	p = alloc_mapping();
+	fill_memory(p, 0, hpage_pmd_size);
+
+	printf("Swapout one page...");
+	if (madvise(p, page_size, MADV_PAGEOUT)) {
+		perror("madvise(MADV_PAGEOUT)");
+		exit(EXIT_FAILURE);
+	}
+	if (check_swap(p, page_size)) {
+		success("OK");
+	} else {
+		fail("Fail");
+		goto out;
+	}
+
+	if (wait_for_scan("Collapse with swapping in single PTE entry", p))
+		fail("Timeout");
+	else if (check_huge(p))
+		success("OK");
+	else
+		fail("Fail");
+	validate_memory(p, 0, hpage_pmd_size);
+out:
+	munmap(p, hpage_pmd_size);
+}
+
+static void collapse_max_ptes_swap(void)
+{
+	int max_ptes_swap = read_num("khugepaged/max_ptes_swap");
+	void *p;
+
+	p = alloc_mapping();
+
+	fill_memory(p, 0, hpage_pmd_size);
+	printf("Swapout %d of %d pages...", max_ptes_swap + 1, hpage_pmd_nr);
+	if (madvise(p, (max_ptes_swap + 1) * page_size, MADV_PAGEOUT)) {
+		perror("madvise(MADV_PAGEOUT)");
+		exit(EXIT_FAILURE);
+	}
+	if (check_swap(p, (max_ptes_swap + 1) * page_size)) {
+		success("OK");
+	} else {
+		fail("Fail");
+		goto out;
+	}
+
+	if (wait_for_scan("Do not collapse with max_ptes_swap exceeded", p))
+		fail("Timeout");
+	else if (check_huge(p))
+		fail("Fail");
+	else
+		success("OK");
+	validate_memory(p, 0, hpage_pmd_size);
+
+	fill_memory(p, 0, hpage_pmd_size);
+	printf("Swapout %d of %d pages...", max_ptes_swap, hpage_pmd_nr);
+	if (madvise(p, max_ptes_swap * page_size, MADV_PAGEOUT)) {
+		perror("madvise(MADV_PAGEOUT)");
+		exit(EXIT_FAILURE);
+	}
+	if (check_swap(p, max_ptes_swap * page_size)) {
+		success("OK");
+	} else {
+		fail("Fail");
+		goto out;
+	}
+
+	if (wait_for_scan("Collapse with max_ptes_swap pages swapped out", p))
+		fail("Timeout");
+	else if (check_huge(p))
+		success("OK");
+	else
+		fail("Fail");
+	validate_memory(p, 0, hpage_pmd_size);
+out:
+	munmap(p, hpage_pmd_size);
+}
+
+static void collapse_single_pte_entry_compound(void)
+{
+	void *p;
+
+	p = alloc_mapping();
+
+	printf("Allocate huge page...");
+	madvise(p, hpage_pmd_size, MADV_HUGEPAGE);
+	fill_memory(p, 0, hpage_pmd_size);
+	if (check_huge(p))
+		success("OK");
+	else
+		fail("Fail");
+	madvise(p, hpage_pmd_size, MADV_NOHUGEPAGE);
+
+	printf("Split huge page leaving single PTE mapping compound page...");
+	madvise(p + page_size, hpage_pmd_size - page_size, MADV_DONTNEED);
+	if (!check_huge(p))
+		success("OK");
+	else
+		fail("Fail");
+
+	if (wait_for_scan("Collapse PTE table with single PTE mapping compound page", p))
+		fail("Timeout");
+	else if (check_huge(p))
+		success("OK");
+	else
+		fail("Fail");
+	validate_memory(p, 0, page_size);
+	munmap(p, hpage_pmd_size);
+}
+
+static void collapse_full_of_compound(void)
+{
+	void *p;
+
+	p = alloc_mapping();
+
+	printf("Allocate huge page...");
+	madvise(p, hpage_pmd_size, MADV_HUGEPAGE);
+	fill_memory(p, 0, hpage_pmd_size);
+	if (check_huge(p))
+		success("OK");
+	else
+		fail("Fail");
+
+	printf("Split huge page leaving single PTE page table full of compound pages...");
+	madvise(p, page_size, MADV_NOHUGEPAGE);
+	madvise(p, hpage_pmd_size, MADV_NOHUGEPAGE);
+	if (!check_huge(p))
+		success("OK");
+	else
+		fail("Fail");
+
+	if (wait_for_scan("Collapse PTE table full of compound pages", p))
+		fail("Timeout");
+	else if (check_huge(p))
+		success("OK");
+	else
+		fail("Fail");
+	validate_memory(p, 0, hpage_pmd_size);
+	munmap(p, hpage_pmd_size);
+}
+
+static void collapse_fork(void)
+{
+	int wstatus;
+	void *p;
+
+	p = alloc_mapping();
+
+	printf("Allocate small page...");
+	fill_memory(p, 0, page_size);
+	if (!check_huge(p))
+		success("OK");
+	else
+		fail("Fail");
+
+	printf("Share small page over fork()...");
+	if (!fork()) {
+		/* Do not touch settings on child exit */
+		skip_settings_restore = true;
+		exit_status = 0;
+
+		if (!check_huge(p))
+			success("OK");
+		else
+			fail("Fail");
+
+		fill_memory(p, page_size, 2 * page_size);
+
+		if (wait_for_scan("Collapse PTE table with single page shared with parent process", p))
+			fail("Timeout");
+		else if (check_huge(p))
+			success("OK");
+		else
+			fail("Fail");
+
+		validate_memory(p, 0, page_size);
+		munmap(p, hpage_pmd_size);
+		exit(exit_status);
+	}
+
+	wait(&wstatus);
+	exit_status += WEXITSTATUS(wstatus);
+
+	printf("Check if parent still has small page...");
+	if (!check_huge(p))
+		success("OK");
+	else
+		fail("Fail");
+	validate_memory(p, 0, page_size);
+	munmap(p, hpage_pmd_size);
+}
+
+static void collapse_fork_compound(void)
+{
+	int wstatus;
+	void *p;
+
+	p = alloc_mapping();
+
+	printf("Allocate huge page...");
+	madvise(p, hpage_pmd_size, MADV_HUGEPAGE);
+	fill_memory(p, 0, hpage_pmd_size);
+	if (check_huge(p))
+		success("OK");
+	else
+		fail("Fail");
+
+	printf("Share huge page over fork()...");
+	if (!fork()) {
+		/* Do not touch settings on child exit */
+		skip_settings_restore = true;
+		exit_status = 0;
+
+		if (check_huge(p))
+			success("OK");
+		else
+			fail("Fail");
+
+		printf("Split huge page PMD in child process...");
+		madvise(p, page_size, MADV_NOHUGEPAGE);
+		madvise(p, hpage_pmd_size, MADV_NOHUGEPAGE);
+		if (!check_huge(p))
+			success("OK");
+		else
+			fail("Fail");
+		fill_memory(p, 0, page_size);
+
+		if (wait_for_scan("Collapse PTE table full of compound pages in child", p))
+			fail("Timeout");
+		else if (check_huge(p))
+			success("OK");
+		else
+			fail("Fail");
+
+		validate_memory(p, 0, hpage_pmd_size);
+		munmap(p, hpage_pmd_size);
+		exit(exit_status);
+	}
+
+	wait(&wstatus);
+	exit_status += WEXITSTATUS(wstatus);
+
+	printf("Check if parent still has huge page...");
+	if (check_huge(p))
+		success("OK");
+	else
+		fail("Fail");
+	validate_memory(p, 0, hpage_pmd_size);
+	munmap(p, hpage_pmd_size);
+}
+
+int main(void)
+{
+	setbuf(stdout, NULL);
+
+	page_size = getpagesize();
+	hpage_pmd_size = read_num("hpage_pmd_size");
+	hpage_pmd_nr = hpage_pmd_size / page_size;
+
+	default_settings.khugepaged.max_ptes_none = hpage_pmd_nr - 1;
+	default_settings.khugepaged.max_ptes_swap = hpage_pmd_nr / 8;
+	default_settings.khugepaged.pages_to_scan = hpage_pmd_nr * 8;
+
+	save_settings();
+	adjust_settings();
+
+	alloc_at_fault();
+	collapse_full();
+	collapse_empty();
+	collapse_single_pte_entry();
+	collapse_max_ptes_none();
+	collapse_swapin_single_pte();
+	collapse_max_ptes_swap();
+	collapse_single_pte_entry_compound();
+	collapse_full_of_compound();
+	collapse_fork();
+	collapse_fork_compound();
+
+	restore_settings(0);
+}

From patchwork Fri Mar 27 17:05:56 2020
X-Patchwork-Submitter: "Kirill A. Shutemov" <kirill@shutemov.name>
X-Patchwork-Id: 11462705
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: akpm@linux-foundation.org, Andrea Arcangeli
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov"
Subject: [PATCH 2/7] khugepaged: Do not stop collapse if less than half PTEs are referenced
Date: Fri, 27 Mar 2020 20:05:56 +0300
Message-Id: <20200327170601.18563-3-kirill.shutemov@linux.intel.com>
In-Reply-To: <20200327170601.18563-1-kirill.shutemov@linux.intel.com>
References: <20200327170601.18563-1-kirill.shutemov@linux.intel.com>

__collapse_huge_page_swapin() checks the number of referenced PTEs to decide if the memory range is hot enough to justify swap-in. The problem is that it stops the collapse altogether if there are not enough referenced pages, instead of only skipping the swap-in.
Signed-off-by: Kirill A. Shutemov
Fixes: 0db501f7a34c ("mm, thp: convert from optimistic swapin collapsing to conservative")
Reviewed-by: Zi Yan
---
 mm/khugepaged.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 99bab7e4d05b..14d7afc90786 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -905,7 +905,8 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 	/* we only decide to swapin, if there is enough young ptes */
 	if (referenced < HPAGE_PMD_NR/2) {
 		trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
-		return false;
+		/* Do not block collapse, only skip swapping in */
+		return true;
 	}
 	vmf.pte = pte_offset_map(pmd, address);
 	for (; vmf.address < address + HPAGE_PMD_NR*PAGE_SIZE;

From patchwork Fri Mar 27 17:05:57 2020
X-Patchwork-Submitter: "Kirill A. Shutemov"
X-Patchwork-Id: 11462707
From: "Kirill A. Shutemov"
To: akpm@linux-foundation.org, Andrea Arcangeli
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov"
Subject: [PATCH 3/7] khugepaged: Drain LRU add pagevec to get rid of extra pins
Date: Fri, 27 Mar 2020 20:05:57 +0300
Message-Id: <20200327170601.18563-4-kirill.shutemov@linux.intel.com>
In-Reply-To: <20200327170601.18563-1-kirill.shutemov@linux.intel.com>

__collapse_huge_page_isolate() may fail due to an extra pin held by the LRU add pagevec. It's pretty common in the swapin case: we swap pages in just to fail due to the extra pin.

Signed-off-by: Kirill A. Shutemov
Reviewed-by: Zi Yan
Reviewed-by: Yang Shi
---
 mm/khugepaged.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 14d7afc90786..39e0994abeb8 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -585,11 +585,19 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		 * The page must only be referenced by the scanned process
 		 * and page swap cache.
 		 */
+		if (page_count(page) != 1 + PageSwapCache(page)) {
+			/*
+			 * Drain pagevec and retry just in case we can get rid
+			 * of the extra pin, like in swapin case.
+			 */
+			lru_add_drain();
+		}
 		if (page_count(page) != 1 + PageSwapCache(page)) {
 			unlock_page(page);
 			result = SCAN_PAGE_COUNT;
 			goto out;
 		}
+
 		if (pte_write(pteval)) {
 			writable = true;
 		} else {

From patchwork Fri Mar 27 17:05:58 2020
From: "Kirill A. Shutemov"
To: akpm@linux-foundation.org, Andrea Arcangeli
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov"
Subject: [PATCH 4/7] khugepaged: Allow to collapse a page shared across fork
Date: Fri, 27 Mar 2020 20:05:58 +0300
Message-Id: <20200327170601.18563-5-kirill.shutemov@linux.intel.com>
In-Reply-To: <20200327170601.18563-1-kirill.shutemov@linux.intel.com>

The page can be included in the collapse as long as it doesn't have extra pins (from GUP or otherwise).

Signed-off-by: Kirill A. Shutemov
---
 mm/khugepaged.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 39e0994abeb8..b47edfe57f7b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -581,18 +581,26 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		}

 		/*
-		 * cannot use mapcount: can't collapse if there's a gup pin.
-		 * The page must only be referenced by the scanned process
-		 * and page swap cache.
+		 * Check if the page has any GUP (or other external) pins.
+		 *
+		 * The page table that maps the page has been already unlinked
+		 * from the page table tree and this process cannot get an
+		 * additional pin on the page.
+		 *
+		 * New pins can come later if the page is shared across fork,
+		 * but not for this process. It is fine. The other process
+		 * cannot write to the page, only trigger CoW.
 		 */
-		if (page_count(page) != 1 + PageSwapCache(page)) {
+		if (total_mapcount(page) + PageSwapCache(page) !=
+				page_count(page)) {
 			/*
 			 * Drain pagevec and retry just in case we can get rid
 			 * of the extra pin, like in swapin case.
 			 */
 			lru_add_drain();
 		}
-		if (page_count(page) != 1 + PageSwapCache(page)) {
+		if (total_mapcount(page) + PageSwapCache(page) !=
+				page_count(page)) {
 			unlock_page(page);
 			result = SCAN_PAGE_COUNT;
 			goto out;
@@ -680,7 +688,6 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
 		} else {
 			src_page = pte_page(pteval);
 			copy_user_highpage(page, src_page, address, vma);
-			VM_BUG_ON_PAGE(page_mapcount(src_page) != 1, src_page);
 			release_pte_page(src_page);
 			/*
 			 * ptl mostly unnecessary, but preempt has to
@@ -1209,12 +1216,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 			goto out_unmap;
 		}

-		/*
-		 * cannot use mapcount: can't collapse if there's a gup pin.
-		 * The page must only be referenced by the scanned process
-		 * and page swap cache.
-		 */
-		if (page_count(page) != 1 + PageSwapCache(page)) {
+		/* Check if the page has any GUP (or other external) pins */
+		if (total_mapcount(page) + PageSwapCache(page) !=
+				page_count(page)) {
 			result = SCAN_PAGE_COUNT;
 			goto out_unmap;
 		}

From patchwork Fri Mar 27 17:05:59 2020
From: "Kirill A. Shutemov"
To: akpm@linux-foundation.org, Andrea Arcangeli
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov"
Subject: [PATCH 5/7] khugepaged: Allow to collapse PTE-mapped compound pages
Date: Fri, 27 Mar 2020 20:05:59 +0300
Message-Id: <20200327170601.18563-6-kirill.shutemov@linux.intel.com>
In-Reply-To: <20200327170601.18563-1-kirill.shutemov@linux.intel.com>

We can collapse PTE-mapped compound pages. We only need to avoid handling them more than once: lock/unlock the page only once if it is present in the PMD range multiple times, as it is handled at the compound level. The same goes for LRU isolation and putback.

Signed-off-by: Kirill A. Shutemov
---
 mm/khugepaged.c | 41 +++++++++++++++++++++++++++++++----------
 1 file changed, 31 insertions(+), 10 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b47edfe57f7b..c8c2c463095c 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -515,6 +515,17 @@ void __khugepaged_exit(struct mm_struct *mm)

 static void release_pte_page(struct page *page)
 {
+	/*
+	 * We need to unlock and put compound page on LRU only once.
+	 * The rest of the pages have to be locked and not on LRU here.
+	 */
+	VM_BUG_ON_PAGE(!PageCompound(page) &&
+			(!PageLocked(page) && PageLRU(page)), page);
+
+	if (!PageLocked(page))
+		return;
+
+	page = compound_head(page);
 	dec_node_page_state(page,
 			NR_ISOLATED_ANON + page_is_file_cache(page));
 	unlock_page(page);
 	putback_lru_page(page);
@@ -537,6 +548,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 	pte_t *_pte;
 	int none_or_zero = 0, result = 0, referenced = 0;
 	bool writable = false;
+	LIST_HEAD(compound_pagelist);

 	for (_pte = pte; _pte < pte+HPAGE_PMD_NR;
 	     _pte++, address += PAGE_SIZE) {
@@ -561,13 +573,23 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 			goto out;
 		}

-		/* TODO: teach khugepaged to collapse THP mapped with pte */
+		VM_BUG_ON_PAGE(!PageAnon(page), page);
+
 		if (PageCompound(page)) {
-			result = SCAN_PAGE_COMPOUND;
-			goto out;
-		}
+			struct page *p;
+			page = compound_head(page);

-		VM_BUG_ON_PAGE(!PageAnon(page), page);
+			/*
+			 * Check if we have dealt with the compound page
+			 * already
+			 */
+			list_for_each_entry(p, &compound_pagelist, lru) {
+				if (page == p)
+					break;
+			}
+			if (page == p)
+				continue;
+		}

 		/*
 		 * We can do it before isolate_lru_page because the
@@ -640,6 +662,9 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		    page_is_young(page) || PageReferenced(page) ||
 		    mmu_notifier_test_young(vma->vm_mm, address))
 			referenced++;
+
+		if (PageCompound(page))
+			list_add_tail(&page->lru, &compound_pagelist);
 	}
 	if (likely(writable)) {
 		if (likely(referenced)) {
@@ -1185,11 +1210,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 			goto out_unmap;
 		}

-		/* TODO: teach khugepaged to collapse THP mapped with pte */
-		if (PageCompound(page)) {
-			result = SCAN_PAGE_COMPOUND;
-			goto out_unmap;
-		}
+		page = compound_head(page);

 		/*
 		 * Record which node the original page is from and save this

From patchwork Fri Mar 27 17:06:00 2020
63042) id 7BF716B0073; Fri, 27 Mar 2020 13:06:08 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0249.hostedemail.com [216.40.44.249]) by kanga.kvack.org (Postfix) with ESMTP id 45F7E6B0070 for ; Fri, 27 Mar 2020 13:06:08 -0400 (EDT) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 095358248047 for ; Fri, 27 Mar 2020 17:06:08 +0000 (UTC) X-FDA: 76641770016.16.brain42_6fbb1bbbdae2a X-Spam-Summary: 2,0,0,2cbb786eabac7456,d41d8cd98f00b204,kirill@shutemov.name,,RULES_HIT:1:2:41:69:355:379:541:960:966:968:973:988:989:1260:1311:1314:1345:1359:1437:1515:1605:1730:1747:1777:1792:2196:2198:2199:2200:2380:2393:2559:2562:2731:2901:3138:3139:3140:3141:3142:3865:3866:3867:3868:3870:3871:3872:4052:4250:4321:4385:4605:5007:6119:6261:6653:7903:8660:8957:9592:10004:11026:11232:11473:11658:11914:12043:12296:12297:12438:12517:12519:12555:12683:12895:12986:13148:13161:13184:13229:13230:13894:14096:14394:21080:21444:21451:21627:21740:21990:30003:30012:30054:30062:30070,0,RBL:209.85.208.194:@shutemov.name:.lbl8.mailshell.net-62.8.0.100 66.201.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fn,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:28,LUA_SUMMARY:none X-HE-Tag: brain42_6fbb1bbbdae2a X-Filterd-Recvd-Size: 12771 Received: from mail-lj1-f194.google.com (mail-lj1-f194.google.com [209.85.208.194]) by imf30.hostedemail.com (Postfix) with ESMTP for ; Fri, 27 Mar 2020 17:06:07 +0000 (UTC) Received: by mail-lj1-f194.google.com with SMTP id k21so10995500ljh.2 for ; Fri, 27 Mar 2020 10:06:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shutemov-name.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=lGq5mNtWk1yCOGUlZF7eNnLsBNsX4wbf+5kA2+B3Tzo=; 
b=u+C71ie/Y/RnMh4ZZLXO0tWqoqDpvrhIH9hQMpb/q9or3BR2I8SXJ7ZwY1ZgohZl7a qgPgVjm5A52nXZ4shBttJnpaCpoM3b+KGoHyUKKjnFtffk1qk5M2bVrbJBG9eDRej44S 8KT91S3hGbDldf0/MF1H7IkrILfL63wtBQbzl/zpM8YgwDjXS70f9/cVFnOqrZncWb0x DilVI6mmktHdUxDetzXq42Z1+JYjpj14n0bN1F2pJYrMaiP+5YE+Tg4fuWaTxbKGjkNX u9h9eRNlZ2ZGYMdhYccmb3afLy9tYjfVShnbgDjpCLX4uHLGIAHZFoRPWVHNBRMyZi+S y6rA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=lGq5mNtWk1yCOGUlZF7eNnLsBNsX4wbf+5kA2+B3Tzo=; b=nhPDKYEzoncB3/kHopYGZ264nhj1pv0KQuV3er1ocYu7kGJQHA5GGiEAZ9pKMSlNh7 UmOY5wkGwDwR6Oxl9OvwDuM1A2xp5jxdWDszb/pgdxM1qvWpFJw2u0sjy/NPqaCregw0 /nyQqZ7wDPKVGa/Eb+RZA9n+vdnkYu8rCChhAF+mHU6uGFL3Kk6StKWC/jECIHqM/SIA lvq8yklgY6lE6DycxnciKRoMVPv3bEZlSz/J4daL2OoySNYngZbTIJPIL/S4kb/Wzont zk8qG1E60bridTEN1mzOxNqOqXlbFBKrWd20kCuzuYYg3mlA8/Wb5LHm6lW3fbp0/bGi o0Cw== X-Gm-Message-State: ANhLgQ3U44bAQQ++mk1MyzYxjwJY84kprwYWbUUYqU7ZNeIY5WqrF1YB dXQVxto83tosyBGcLU++eoRWJQfh57A= X-Google-Smtp-Source: APiQypL6ZLqxNMBHqT8Dws7pnqIWj7EmywZXM/I+NTdJ6bkGPga3w2odbs1WKqvAUcrKiAko34XjEQ== X-Received: by 2002:a05:651c:1061:: with SMTP id y1mr8388958ljm.285.1585328765641; Fri, 27 Mar 2020 10:06:05 -0700 (PDT) Received: from box.localdomain ([86.57.175.117]) by smtp.gmail.com with ESMTPSA id r141sm3261695lff.58.2020.03.27.10.06.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 27 Mar 2020 10:06:05 -0700 (PDT) From: "Kirill A. Shutemov" X-Google-Original-From: "Kirill A. Shutemov" Received: by box.localdomain (Postfix, from userid 1000) id 7D8A8100D2B; Fri, 27 Mar 2020 20:06:07 +0300 (+03) To: akpm@linux-foundation.org, Andrea Arcangeli Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. 
Shutemov" Subject: [PATCH 6/7] thp: Change CoW semantics for anon-THP Date: Fri, 27 Mar 2020 20:06:00 +0300 Message-Id: <20200327170601.18563-7-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.26.0 In-Reply-To: <20200327170601.18563-1-kirill.shutemov@linux.intel.com> References: <20200327170601.18563-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Currently we have different copy-on-write semantics for anon- and file-THP. For anon-THP we try to allocate huge page on the write fault, but on file-THP we split PMD and allocate 4k page. Arguably, file-THP semantics is more desirable: we don't necessary want to unshare full PMD range from the parent on the first access. This is the primary reason THP is unusable for some workloads, like Redis. The original THP refcounting didn't allow to have PTE-mapped compound pages, so we had no options, but to allocate huge page on CoW (with fallback to 512 4k pages). The current refcounting doesn't have such limitations and we can cut a lot of complex code out of fault path. khugepaged is now able to recover THP from such ranges if the configuration allows. Signed-off-by: Kirill A. 
Shutemov --- mm/huge_memory.c | 247 +++++------------------------------------------ 1 file changed, 24 insertions(+), 223 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index ef6a6bcb291f..15b7a9c86b7c 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1206,262 +1206,63 @@ void huge_pmd_set_accessed(struct vm_fault *vmf, pmd_t orig_pmd) spin_unlock(vmf->ptl); } -static vm_fault_t do_huge_pmd_wp_page_fallback(struct vm_fault *vmf, - pmd_t orig_pmd, struct page *page) -{ - struct vm_area_struct *vma = vmf->vma; - unsigned long haddr = vmf->address & HPAGE_PMD_MASK; - struct mem_cgroup *memcg; - pgtable_t pgtable; - pmd_t _pmd; - int i; - vm_fault_t ret = 0; - struct page **pages; - struct mmu_notifier_range range; - - pages = kmalloc_array(HPAGE_PMD_NR, sizeof(struct page *), - GFP_KERNEL); - if (unlikely(!pages)) { - ret |= VM_FAULT_OOM; - goto out; - } - - for (i = 0; i < HPAGE_PMD_NR; i++) { - pages[i] = alloc_page_vma_node(GFP_HIGHUSER_MOVABLE, vma, - vmf->address, page_to_nid(page)); - if (unlikely(!pages[i] || - mem_cgroup_try_charge_delay(pages[i], vma->vm_mm, - GFP_KERNEL, &memcg, false))) { - if (pages[i]) - put_page(pages[i]); - while (--i >= 0) { - memcg = (void *)page_private(pages[i]); - set_page_private(pages[i], 0); - mem_cgroup_cancel_charge(pages[i], memcg, - false); - put_page(pages[i]); - } - kfree(pages); - ret |= VM_FAULT_OOM; - goto out; - } - set_page_private(pages[i], (unsigned long)memcg); - } - - for (i = 0; i < HPAGE_PMD_NR; i++) { - copy_user_highpage(pages[i], page + i, - haddr + PAGE_SIZE * i, vma); - __SetPageUptodate(pages[i]); - cond_resched(); - } - - mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm, - haddr, haddr + HPAGE_PMD_SIZE); - mmu_notifier_invalidate_range_start(&range); - - vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd); - if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) - goto out_free_pages; - VM_BUG_ON_PAGE(!PageHead(page), page); - - /* - * Leave pmd empty until pte is filled note we 
must notify here as - * concurrent CPU thread might write to new page before the call to - * mmu_notifier_invalidate_range_end() happens which can lead to a - * device seeing memory write in different order than CPU. - * - * See Documentation/vm/mmu_notifier.rst - */ - pmdp_huge_clear_flush_notify(vma, haddr, vmf->pmd); - - pgtable = pgtable_trans_huge_withdraw(vma->vm_mm, vmf->pmd); - pmd_populate(vma->vm_mm, &_pmd, pgtable); - - for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) { - pte_t entry; - entry = mk_pte(pages[i], vma->vm_page_prot); - entry = maybe_mkwrite(pte_mkdirty(entry), vma); - memcg = (void *)page_private(pages[i]); - set_page_private(pages[i], 0); - page_add_new_anon_rmap(pages[i], vmf->vma, haddr, false); - mem_cgroup_commit_charge(pages[i], memcg, false, false); - lru_cache_add_active_or_unevictable(pages[i], vma); - vmf->pte = pte_offset_map(&_pmd, haddr); - VM_BUG_ON(!pte_none(*vmf->pte)); - set_pte_at(vma->vm_mm, haddr, vmf->pte, entry); - pte_unmap(vmf->pte); - } - kfree(pages); - - smp_wmb(); /* make pte visible before pmd */ - pmd_populate(vma->vm_mm, vmf->pmd, pgtable); - page_remove_rmap(page, true); - spin_unlock(vmf->ptl); - - /* - * No need to double call mmu_notifier->invalidate_range() callback as - * the above pmdp_huge_clear_flush_notify() did already call it. 
- */ - mmu_notifier_invalidate_range_only_end(&range); - - ret |= VM_FAULT_WRITE; - put_page(page); - -out: - return ret; - -out_free_pages: - spin_unlock(vmf->ptl); - mmu_notifier_invalidate_range_end(&range); - for (i = 0; i < HPAGE_PMD_NR; i++) { - memcg = (void *)page_private(pages[i]); - set_page_private(pages[i], 0); - mem_cgroup_cancel_charge(pages[i], memcg, false); - put_page(pages[i]); - } - kfree(pages); - goto out; -} - vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd) { struct vm_area_struct *vma = vmf->vma; - struct page *page = NULL, *new_page; - struct mem_cgroup *memcg; + struct page *page; unsigned long haddr = vmf->address & HPAGE_PMD_MASK; - struct mmu_notifier_range range; - gfp_t huge_gfp; /* for allocation and charge */ - vm_fault_t ret = 0; vmf->ptl = pmd_lockptr(vma->vm_mm, vmf->pmd); VM_BUG_ON_VMA(!vma->anon_vma, vma); + if (is_huge_zero_pmd(orig_pmd)) - goto alloc; + goto fallback; + spin_lock(vmf->ptl); - if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) - goto out_unlock; + + if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) { + spin_unlock(vmf->ptl); + return 0; + } page = pmd_page(orig_pmd); VM_BUG_ON_PAGE(!PageCompound(page) || !PageHead(page), page); - /* - * We can only reuse the page if nobody else maps the huge page or it's - * part. - */ + + /* Lock page for reuse_swap_page() */ if (!trylock_page(page)) { get_page(page); spin_unlock(vmf->ptl); lock_page(page); spin_lock(vmf->ptl); if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) { + spin_unlock(vmf->ptl); unlock_page(page); put_page(page); - goto out_unlock; + return 0; } put_page(page); } + + /* + * We can only reuse the page if nobody else maps the huge page or it's + * part. 
+	 */
 	if (reuse_swap_page(page, NULL)) {
 		pmd_t entry;
 		entry = pmd_mkyoung(orig_pmd);
 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
 		if (pmdp_set_access_flags(vma, haddr, vmf->pmd, entry, 1))
 			update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
-		ret |= VM_FAULT_WRITE;
 		unlock_page(page);
-		goto out_unlock;
-	}
-	unlock_page(page);
-	get_page(page);
-	spin_unlock(vmf->ptl);
-alloc:
-	if (__transparent_hugepage_enabled(vma) &&
-	    !transparent_hugepage_debug_cow()) {
-		huge_gfp = alloc_hugepage_direct_gfpmask(vma);
-		new_page = alloc_hugepage_vma(huge_gfp, vma, haddr, HPAGE_PMD_ORDER);
-	} else
-		new_page = NULL;
-
-	if (likely(new_page)) {
-		prep_transhuge_page(new_page);
-	} else {
-		if (!page) {
-			split_huge_pmd(vma, vmf->pmd, vmf->address);
-			ret |= VM_FAULT_FALLBACK;
-		} else {
-			ret = do_huge_pmd_wp_page_fallback(vmf, orig_pmd, page);
-			if (ret & VM_FAULT_OOM) {
-				split_huge_pmd(vma, vmf->pmd, vmf->address);
-				ret |= VM_FAULT_FALLBACK;
-			}
-			put_page(page);
-		}
-		count_vm_event(THP_FAULT_FALLBACK);
-		goto out;
-	}
-
-	if (unlikely(mem_cgroup_try_charge_delay(new_page, vma->vm_mm,
-					huge_gfp, &memcg, true))) {
-		put_page(new_page);
-		split_huge_pmd(vma, vmf->pmd, vmf->address);
-		if (page)
-			put_page(page);
-		ret |= VM_FAULT_FALLBACK;
-		count_vm_event(THP_FAULT_FALLBACK);
-		goto out;
-	}
-
-	count_vm_event(THP_FAULT_ALLOC);
-	count_memcg_events(memcg, THP_FAULT_ALLOC, 1);
-
-	if (!page)
-		clear_huge_page(new_page, vmf->address, HPAGE_PMD_NR);
-	else
-		copy_user_huge_page(new_page, page, vmf->address,
-				    vma, HPAGE_PMD_NR);
-	__SetPageUptodate(new_page);
-
-	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
-				haddr, haddr + HPAGE_PMD_SIZE);
-	mmu_notifier_invalidate_range_start(&range);
-
-	spin_lock(vmf->ptl);
-	if (page)
-		put_page(page);
-	if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) {
 		spin_unlock(vmf->ptl);
-		mem_cgroup_cancel_charge(new_page, memcg, true);
-		put_page(new_page);
-		goto out_mn;
-	} else {
-		pmd_t entry;
-		entry = mk_huge_pmd(new_page, vma->vm_page_prot);
-		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
-		pmdp_huge_clear_flush_notify(vma, haddr, vmf->pmd);
-		page_add_new_anon_rmap(new_page, vma, haddr, true);
-		mem_cgroup_commit_charge(new_page, memcg, false, true);
-		lru_cache_add_active_or_unevictable(new_page, vma);
-		set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
-		update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
-		if (!page) {
-			add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
-		} else {
-			VM_BUG_ON_PAGE(!PageHead(page), page);
-			page_remove_rmap(page, true);
-			put_page(page);
-		}
-		ret |= VM_FAULT_WRITE;
+		return VM_FAULT_WRITE;
 	}
+
+	unlock_page(page);
 	spin_unlock(vmf->ptl);
-out_mn:
-	/*
-	 * No need to double call mmu_notifier->invalidate_range() callback as
-	 * the above pmdp_huge_clear_flush_notify() did already call it.
-	 */
-	mmu_notifier_invalidate_range_only_end(&range);
-out:
-	return ret;
-out_unlock:
-	spin_unlock(vmf->ptl);
-	return ret;
+fallback:
+	__split_huge_pmd(vma, vmf->pmd, vmf->address, false, NULL);
+	return VM_FAULT_FALLBACK;
 }
 
 /*

From patchwork Fri Mar 27 17:06:01 2020
X-Patchwork-Submitter: "Kirill A.
Shutemov"
X-Patchwork-Id: 11462715
From: "Kirill A. Shutemov"
To: akpm@linux-foundation.org, Andrea Arcangeli
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov"
Subject: [PATCH 7/7] khugepaged: Introduce 'max_ptes_shared' tunable
Date: Fri, 27 Mar 2020 20:06:01 +0300
Message-Id: <20200327170601.18563-8-kirill.shutemov@linux.intel.com>
In-Reply-To: <20200327170601.18563-1-kirill.shutemov@linux.intel.com>
References: <20200327170601.18563-1-kirill.shutemov@linux.intel.com>

``max_ptes_shared`` specifies how many pages can be shared across multiple
processes. Exceeding the number would block the collapse::

	/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared

A higher value may increase memory footprint for some workloads.
By default, at least half of the pages must not be shared.

Signed-off-by: Kirill A. Shutemov
---
 Documentation/admin-guide/mm/transhuge.rst |  7 ++
 mm/khugepaged.c                            | 52 ++++++++++++--
 tools/testing/selftests/vm/khugepaged.c    | 83 ++++++++++++++++++++++
 3 files changed, 138 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index bd5714547cee..d16e4f2bb70f 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -220,6 +220,13 @@ memory. A lower value can prevent THPs from being collapsed, resulting fewer
 pages being collapsed into THPs, and lower memory access performance.
 
+``max_ptes_shared`` specifies how many pages can be shared across multiple
+processes. Exceeding the number would block the collapse::
+
+	/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared
+
+A higher value may increase memory footprint for some workloads.
+
 Boot parameter
 ==============

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index c8c2c463095c..8e728a602491 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -28,6 +28,8 @@ enum scan_result {
 	SCAN_SUCCEED,
 	SCAN_PMD_NULL,
 	SCAN_EXCEED_NONE_PTE,
+	SCAN_EXCEED_SWAP_PTE,
+	SCAN_EXCEED_SHARED_PTE,
 	SCAN_PTE_NON_PRESENT,
 	SCAN_PAGE_RO,
 	SCAN_LACK_REFERENCED_PAGE,
@@ -46,7 +48,6 @@ enum scan_result {
 	SCAN_DEL_PAGE_LRU,
 	SCAN_ALLOC_HUGE_PAGE_FAIL,
 	SCAN_CGROUP_CHARGE_FAIL,
-	SCAN_EXCEED_SWAP_PTE,
 	SCAN_TRUNCATED,
 	SCAN_PAGE_HAS_PRIVATE,
 };
@@ -71,6 +72,7 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
  */
 static unsigned int khugepaged_max_ptes_none __read_mostly;
 static unsigned int khugepaged_max_ptes_swap __read_mostly;
+static unsigned int khugepaged_max_ptes_shared __read_mostly;
 
 #define MM_SLOTS_HASH_BITS 10
 static __read_mostly DEFINE_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
@@ -290,15 +292,43 @@ static struct kobj_attribute khugepaged_max_ptes_swap_attr =
 	__ATTR(max_ptes_swap, 0644,
 	       khugepaged_max_ptes_swap_show,
 	       khugepaged_max_ptes_swap_store);
 
+static ssize_t khugepaged_max_ptes_shared_show(struct kobject *kobj,
+					       struct kobj_attribute *attr,
+					       char *buf)
+{
+	return sprintf(buf, "%u\n", khugepaged_max_ptes_shared);
+}
+
+static ssize_t khugepaged_max_ptes_shared_store(struct kobject *kobj,
+					       struct kobj_attribute *attr,
+					       const char *buf, size_t count)
+{
+	int err;
+	unsigned long max_ptes_shared;
+
+	err = kstrtoul(buf, 10, &max_ptes_shared);
+	if (err || max_ptes_shared > HPAGE_PMD_NR-1)
+		return -EINVAL;
+
+	khugepaged_max_ptes_shared = max_ptes_shared;
+
+	return count;
+}
+
+static struct kobj_attribute khugepaged_max_ptes_shared_attr =
+	__ATTR(max_ptes_shared, 0644, khugepaged_max_ptes_shared_show,
+	       khugepaged_max_ptes_shared_store);
+
 static struct attribute *khugepaged_attr[] = {
 	&khugepaged_defrag_attr.attr,
 	&khugepaged_max_ptes_none_attr.attr,
+	&khugepaged_max_ptes_swap_attr.attr,
+	&khugepaged_max_ptes_shared_attr.attr,
 	&pages_to_scan_attr.attr,
 	&pages_collapsed_attr.attr,
 	&full_scans_attr.attr,
 	&scan_sleep_millisecs_attr.attr,
 	&alloc_sleep_millisecs_attr.attr,
-	&khugepaged_max_ptes_swap_attr.attr,
 	NULL,
 };
 
@@ -360,6 +390,7 @@ int __init khugepaged_init(void)
 	khugepaged_pages_to_scan = HPAGE_PMD_NR * 8;
 	khugepaged_max_ptes_none = HPAGE_PMD_NR - 1;
 	khugepaged_max_ptes_swap = HPAGE_PMD_NR / 8;
+	khugepaged_max_ptes_shared = HPAGE_PMD_NR / 2;
 
 	return 0;
 }
@@ -546,7 +577,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 {
 	struct page *page = NULL;
 	pte_t *_pte;
-	int none_or_zero = 0, result = 0, referenced = 0;
+	int none_or_zero = 0, shared = 0, result = 0, referenced = 0;
 	bool writable = false;
 	LIST_HEAD(compound_pagelist);
@@ -575,6 +606,12 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 
 		VM_BUG_ON_PAGE(!PageAnon(page), page);
 
+		if (page_mapcount(page) > 1 &&
+				++shared > khugepaged_max_ptes_shared) {
+			result = SCAN_EXCEED_SHARED_PTE;
+			goto out;
+		}
+
 		if (PageCompound(page)) {
 			struct page *p;
 			page = compound_head(page);
 
@@ -1160,7 +1197,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 {
 	pmd_t *pmd;
 	pte_t *pte, *_pte;
-	int ret = 0, none_or_zero = 0, result = 0, referenced = 0;
+	int ret = 0, result = 0, referenced = 0;
+	int none_or_zero = 0, shared = 0;
 	struct page *page = NULL;
 	unsigned long _address;
 	spinlock_t *ptl;
@@ -1210,6 +1248,12 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 			goto out_unmap;
 		}
 
+		if (page_mapcount(page) > 1 &&
+				++shared > khugepaged_max_ptes_shared) {
+			result = SCAN_EXCEED_SHARED_PTE;
+			goto out_unmap;
+		}
+
 		page = compound_head(page);
 
 		/*
diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c
index 193bde6a1534..3a98d5b2d6d8 100644
--- a/tools/testing/selftests/vm/khugepaged.c
+++ b/tools/testing/selftests/vm/khugepaged.c
@@ -77,6 +77,7 @@ struct khugepaged_settings {
 	unsigned int scan_sleep_millisecs;
 	unsigned int max_ptes_none;
 	unsigned int max_ptes_swap;
+	unsigned int max_ptes_shared;
 	unsigned long pages_to_scan;
 };
 
@@ -276,6 +277,7 @@ static void write_settings(struct settings *settings)
 			khugepaged->scan_sleep_millisecs);
 	write_num("khugepaged/max_ptes_none", khugepaged->max_ptes_none);
 	write_num("khugepaged/max_ptes_swap", khugepaged->max_ptes_swap);
+	write_num("khugepaged/max_ptes_shared", khugepaged->max_ptes_shared);
 	write_num("khugepaged/pages_to_scan", khugepaged->pages_to_scan);
 }
 
@@ -312,6 +314,7 @@ static void save_settings(void)
 			read_num("khugepaged/scan_sleep_millisecs"),
 		.max_ptes_none = read_num("khugepaged/max_ptes_none"),
 		.max_ptes_swap = read_num("khugepaged/max_ptes_swap"),
+		.max_ptes_shared = read_num("khugepaged/max_ptes_shared"),
 		.pages_to_scan = read_num("khugepaged/pages_to_scan"),
 	};
 	success("OK");
 
@@ -786,12 +789,90 @@ static void collapse_fork_compound(void)
 		fail("Fail");
 
 		fill_memory(p, 0, page_size);
+		write_num("khugepaged/max_ptes_shared", hpage_pmd_nr - 1);
 		if (wait_for_scan("Collapse PTE table full of compound pages in child", p))
 			fail("Timeout");
 		else if (check_huge(p))
 			success("OK");
 		else
 			fail("Fail");
+		write_num("khugepaged/max_ptes_shared",
+				default_settings.khugepaged.max_ptes_shared);
+
+		validate_memory(p, 0, hpage_pmd_size);
+		munmap(p, hpage_pmd_size);
+		exit(exit_status);
+	}
+
+	wait(&wstatus);
+	exit_status += WEXITSTATUS(wstatus);
+
+	printf("Check if parent still has huge page...");
+	if (check_huge(p))
+		success("OK");
+	else
+		fail("Fail");
+	validate_memory(p, 0, hpage_pmd_size);
+	munmap(p, hpage_pmd_size);
+}
+
+static void collapse_max_ptes_shared()
+{
+	int max_ptes_shared = read_num("khugepaged/max_ptes_shared");
+	int wstatus;
+	void *p;
+
+	p = alloc_mapping();
+
+	printf("Allocate huge page...");
+	madvise(p, hpage_pmd_size, MADV_HUGEPAGE);
+	fill_memory(p, 0, hpage_pmd_size);
+	if (check_huge(p))
+		success("OK");
+	else
+		fail("Fail");
+
+	printf("Share huge page over fork()...");
+	if (!fork()) {
+		/* Do not touch settings on child exit */
+		skip_settings_restore = true;
+		exit_status = 0;
+
+		if (check_huge(p))
+			success("OK");
+		else
+			fail("Fail");
+
+		printf("Trigger CoW in %d of %d...",
+				hpage_pmd_nr - max_ptes_shared - 1, hpage_pmd_nr);
+		fill_memory(p, 0, (hpage_pmd_nr - max_ptes_shared - 1) * page_size);
+		if (!check_huge(p))
+			success("OK");
+		else
+			fail("Fail");
+
+		if (wait_for_scan("Do not collapse with max_ptes_shared exceeded", p))
+			fail("Timeout");
+		else if (!check_huge(p))
+			success("OK");
+		else
+			fail("Fail");
+
+		printf("Trigger CoW in %d of %d...",
+				hpage_pmd_nr - max_ptes_shared, hpage_pmd_nr);
+		fill_memory(p, 0, (hpage_pmd_nr - max_ptes_shared) * page_size);
+		if (!check_huge(p))
+			success("OK");
+		else
+			fail("Fail");
+
+		if (wait_for_scan("Collapse with max_ptes_shared PTEs shared", p))
+			fail("Timeout");
+		else if (check_huge(p))
+			success("OK");
+		else
+			fail("Fail");
 
 		validate_memory(p, 0, hpage_pmd_size);
 		munmap(p, hpage_pmd_size);
@@ -820,6 +901,7 @@ int main(void)
 	default_settings.khugepaged.max_ptes_none = hpage_pmd_nr - 1;
 	default_settings.khugepaged.max_ptes_swap = hpage_pmd_nr / 8;
+	default_settings.khugepaged.max_ptes_shared = hpage_pmd_nr / 2;
 	default_settings.khugepaged.pages_to_scan = hpage_pmd_nr * 8;
 
 	save_settings();
@@ -836,6 +918,7 @@ int main(void)
 	collapse_full_of_compound();
 	collapse_fork();
 	collapse_fork_compound();
+	collapse_max_ptes_shared();
 
 	restore_settings(0);
}