From patchwork Wed Jan 4 14:49:05 2023
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13088669
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-kselftest@vger.kernel.org, David Hildenbrand, Andrew Morton, Shuah Khan, Hugh Dickins, Peter Xu, Vlastimil Babka, Nadav Amit, Zach O'Keefe, Andrea Arcangeli
Subject: [PATCH mm-unstable v1] selftests/vm: cow: Add COW tests for collapsing of PTE-mapped anon THP
Date: Wed, 4 Jan 2023 15:49:05 +0100
Message-Id: <20230104144905.460075-1-david@redhat.com>

Currently, anonymous PTE-mapped THPs cannot be collapsed in-place:
collapsing (e.g., via MADV_COLLAPSE) implies allocating a fresh THP and
mapping that new THP via a PMD: as it's a fresh anon THP, it will get the
exclusive flag set on the head page and everybody is happy.

However, if the kernel were ever to support in-place collapse of anonymous
THPs (replacing a page table mapping each sub-page of a THP via PTEs with a
single PMD mapping the complete THP), exclusivity information stored for
each sub-page would have to be collapsed accordingly:

(1) All PTEs map !exclusive anon sub-pages: the in-place collapsed THP
    must not have the exclusive flag set on the head page mapped by
    the PMD. This is the easiest case to handle ("simply don't set any
    exclusive flags").

(2) All PTEs map exclusive anon sub-pages: when collapsing, we have to
    clear the exclusive flag from all tail pages and only leave the
    exclusive flag set for the head page. Otherwise, fork() after
    collapse would not clear the exclusive flags from the tail pages
    and we'd be in trouble once PTE-mapping the shared THP when writing
    to shared tail pages that still have the exclusive flag set. This
    would effectively revert what the PTE-mapping code does when
    propagating the exclusive flag to all sub-pages.

(3) PTEs map a mixture of exclusive and !exclusive anon sub-pages (can
    happen e.g., due to MADV_DONTFORK before fork()). We must not
    collapse the THP in-place, otherwise bad things may happen:
    the exclusive flags of sub-pages would get ignored and the
    exclusive flag of the head page would get used instead.

Now that we have MADV_COLLAPSE in place to trigger collapsing a THP,
let's add some test cases that would bail out early, if we'd
voluntarily/accidentally unlock in-place collapse for anon THPs and
forget about taking proper care of exclusive flags.
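
The three cases above can be modeled in plain userspace C. The following
is only a sketch under stated assumptions: the enum, the function name
collapse_exclusive() and the bool array standing in for the per-sub-page
exclusive flags are all hypothetical and do not correspond to any kernel
data structure or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of an in-place collapse; not a kernel type. */
enum collapse_outcome {
	COLLAPSE_HEAD_SHARED,    /* case (1): no exclusive flag on the head page */
	COLLAPSE_HEAD_EXCLUSIVE, /* case (2): flag kept on the head page only */
	COLLAPSE_REFUSE,         /* case (3): mixed state, do not collapse */
};

/*
 * Model of the per-sub-page exclusive flags: excl[i] is true if the i-th
 * PTE maps an exclusive anon sub-page. On COLLAPSE_HEAD_EXCLUSIVE, the
 * flags of all tail pages are cleared, mirroring case (2). In the other
 * two outcomes the array is left untouched.
 */
static enum collapse_outcome collapse_exclusive(bool *excl, size_t nr)
{
	size_t i, nr_excl = 0;

	assert(nr > 0);
	for (i = 0; i < nr; i++)
		nr_excl += excl[i];

	if (nr_excl == 0)
		return COLLAPSE_HEAD_SHARED;
	if (nr_excl != nr)
		return COLLAPSE_REFUSE;

	/* All sub-pages exclusive: only the head page keeps the flag. */
	for (i = 1; i < nr; i++)
		excl[i] = false;
	return COLLAPSE_HEAD_EXCLUSIVE;
}
```

The mixed case is checked before any flag is modified, so a refused
collapse leaves the per-sub-page state exactly as it was.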

Running the test on a kernel with MADV_COLLAPSE support:

  # [INFO] Anonymous THP tests
  # [RUN] Basic COW after fork() when collapsing before fork()
  ok 169 No leak from parent into child
  # [RUN] Basic COW after fork() when collapsing after fork() (fully shared)
  ok 170 # SKIP MADV_COLLAPSE failed: Invalid argument
  # [RUN] Basic COW after fork() when collapsing after fork() (lower shared)
  ok 171 No leak from parent into child
  # [RUN] Basic COW after fork() when collapsing after fork() (upper shared)
  ok 172 No leak from parent into child

For now, MADV_COLLAPSE always seems to fail if all PTEs map shared
sub-pages.

Cc: Andrew Morton
Cc: Shuah Khan
Cc: Hugh Dickins
Cc: Peter Xu
Cc: Vlastimil Babka
Cc: Nadav Amit
Cc: Zach O'Keefe
Cc: Andrea Arcangeli
Signed-off-by: David Hildenbrand
---

A patch from Hugh made me explore the wonderful world of in-place collapse
of THP, and I was briefly concerned that it would apply to anon THP as
well. After thinking about it a bit, I decided to add test cases, to better
be safe than sorry in any case, and to document how PG_anon_exclusive is to
be handled in that case.

---
 tools/testing/selftests/vm/cow.c | 228 +++++++++++++++++++++++++++++++
 1 file changed, 228 insertions(+)

diff --git a/tools/testing/selftests/vm/cow.c b/tools/testing/selftests/vm/cow.c
index 26f6ea3079e2..16216d893d96 100644
--- a/tools/testing/selftests/vm/cow.c
+++ b/tools/testing/selftests/vm/cow.c
@@ -30,6 +30,10 @@
 #include "../kselftest.h"
 #include "vm_util.h"
 
+#ifndef MADV_COLLAPSE
+#define MADV_COLLAPSE 25
+#endif
+
 static size_t pagesize;
 static int pagemap_fd;
 static size_t thpsize;
@@ -1178,6 +1182,228 @@ static int tests_per_anon_test_case(void)
 	return tests;
 }
 
+enum anon_thp_collapse_test {
+	ANON_THP_COLLAPSE_UNSHARED,
+	ANON_THP_COLLAPSE_FULLY_SHARED,
+	ANON_THP_COLLAPSE_LOWER_SHARED,
+	ANON_THP_COLLAPSE_UPPER_SHARED,
+};
+
+static void do_test_anon_thp_collapse(char *mem, size_t size,
+				      enum anon_thp_collapse_test test)
+{
+	struct comm_pipes comm_pipes;
+	char buf;
+	int ret;
+
+	ret = setup_comm_pipes(&comm_pipes);
+	if (ret) {
+		ksft_test_result_fail("pipe() failed\n");
+		return;
+	}
+
+	/*
+	 * Trigger PTE-mapping the THP by temporarily mapping a single subpage
+	 * R/O, such that we can try collapsing it later.
+	 */
+	ret = mprotect(mem + pagesize, pagesize, PROT_READ);
+	if (ret) {
+		ksft_test_result_fail("mprotect() failed\n");
+		goto close_comm_pipes;
+	}
+	ret = mprotect(mem + pagesize, pagesize, PROT_READ | PROT_WRITE);
+	if (ret) {
+		ksft_test_result_fail("mprotect() failed\n");
+		goto close_comm_pipes;
+	}
+
+	switch (test) {
+	case ANON_THP_COLLAPSE_UNSHARED:
+		/* Collapse before actually COW-sharing the page. */
+		ret = madvise(mem, size, MADV_COLLAPSE);
+		if (ret) {
+			ksft_test_result_skip("MADV_COLLAPSE failed: %s\n",
+					      strerror(errno));
+			goto close_comm_pipes;
+		}
+		break;
+	case ANON_THP_COLLAPSE_FULLY_SHARED:
+		/* COW-share the full PTE-mapped THP. */
+		break;
+	case ANON_THP_COLLAPSE_LOWER_SHARED:
+		/* Don't COW-share the upper part of the THP. */
+		ret = madvise(mem + size / 2, size / 2, MADV_DONTFORK);
+		if (ret) {
+			ksft_test_result_fail("MADV_DONTFORK failed\n");
+			goto close_comm_pipes;
+		}
+		break;
+	case ANON_THP_COLLAPSE_UPPER_SHARED:
+		/* Don't COW-share the lower part of the THP. */
+		ret = madvise(mem, size / 2, MADV_DONTFORK);
+		if (ret) {
+			ksft_test_result_fail("MADV_DONTFORK failed\n");
+			goto close_comm_pipes;
+		}
+		break;
+	default:
+		assert(false);
+	}
+
+	ret = fork();
+	if (ret < 0) {
+		ksft_test_result_fail("fork() failed\n");
+		goto close_comm_pipes;
+	} else if (!ret) {
+		switch (test) {
+		case ANON_THP_COLLAPSE_UNSHARED:
+		case ANON_THP_COLLAPSE_FULLY_SHARED:
+			exit(child_memcmp_fn(mem, size, &comm_pipes));
+			break;
+		case ANON_THP_COLLAPSE_LOWER_SHARED:
+			exit(child_memcmp_fn(mem, size / 2, &comm_pipes));
+			break;
+		case ANON_THP_COLLAPSE_UPPER_SHARED:
+			exit(child_memcmp_fn(mem + size / 2, size / 2,
+					     &comm_pipes));
+			break;
+		default:
+			assert(false);
+		}
+	}
+
+	while (read(comm_pipes.child_ready[0], &buf, 1) != 1)
+		;
+
+	switch (test) {
+	case ANON_THP_COLLAPSE_UNSHARED:
+		break;
+	case ANON_THP_COLLAPSE_UPPER_SHARED:
+	case ANON_THP_COLLAPSE_LOWER_SHARED:
+		/*
+		 * Revert MADV_DONTFORK such that we merge the VMAs and are
+		 * able to actually collapse.
+		 */
+		ret = madvise(mem, size, MADV_DOFORK);
+		if (ret) {
+			ksft_test_result_fail("MADV_DOFORK failed\n");
+			write(comm_pipes.parent_ready[1], "0", 1);
+			wait(&ret);
+			goto close_comm_pipes;
+		}
+		/* FALLTHROUGH */
+	case ANON_THP_COLLAPSE_FULLY_SHARED:
+		/* Collapse before anyone modified the COW-shared page. */
+		ret = madvise(mem, size, MADV_COLLAPSE);
+		if (ret) {
+			ksft_test_result_skip("MADV_COLLAPSE failed: %s\n",
+					      strerror(errno));
+			write(comm_pipes.parent_ready[1], "0", 1);
+			wait(&ret);
+			goto close_comm_pipes;
+		}
+		break;
+	default:
+		assert(false);
+	}
+
+	/* Modify the page. */
+	memset(mem, 0xff, size);
+	write(comm_pipes.parent_ready[1], "0", 1);
+
+	wait(&ret);
+	if (WIFEXITED(ret))
+		ret = WEXITSTATUS(ret);
+	else
+		ret = -EINVAL;
+
+	ksft_test_result(!ret, "No leak from parent into child\n");
+close_comm_pipes:
+	close_comm_pipes(&comm_pipes);
+}
+
+static void test_anon_thp_collapse_unshared(char *mem, size_t size)
+{
+	do_test_anon_thp_collapse(mem, size, ANON_THP_COLLAPSE_UNSHARED);
+}
+
+static void test_anon_thp_collapse_fully_shared(char *mem, size_t size)
+{
+	do_test_anon_thp_collapse(mem, size, ANON_THP_COLLAPSE_FULLY_SHARED);
+}
+
+static void test_anon_thp_collapse_lower_shared(char *mem, size_t size)
+{
+	do_test_anon_thp_collapse(mem, size, ANON_THP_COLLAPSE_LOWER_SHARED);
+}
+
+static void test_anon_thp_collapse_upper_shared(char *mem, size_t size)
+{
+	do_test_anon_thp_collapse(mem, size, ANON_THP_COLLAPSE_UPPER_SHARED);
+}
+
+/*
+ * Test cases that are specific to anonymous THP: pages in private mappings
+ * that may get shared via COW during fork().
+ */
+static const struct test_case anon_thp_test_cases[] = {
+	/*
+	 * Basic COW test for fork() without any GUP when collapsing a THP
+	 * before fork().
+	 *
+	 * Re-mapping a PTE-mapped anon THP using a single PMD ("in-place
+	 * collapse") might easily get COW handling wrong when not collapsing
+	 * exclusivity information properly.
+	 */
+	{
+		"Basic COW after fork() when collapsing before fork()",
+		test_anon_thp_collapse_unshared,
+	},
+	/* Basic COW test, but collapse after COW-sharing a full THP. */
+	{
+		"Basic COW after fork() when collapsing after fork() (fully shared)",
+		test_anon_thp_collapse_fully_shared,
+	},
+	/*
+	 * Basic COW test, but collapse after COW-sharing the lower half of a
+	 * THP.
+	 */
+	{
+		"Basic COW after fork() when collapsing after fork() (lower shared)",
+		test_anon_thp_collapse_lower_shared,
+	},
+	/*
+	 * Basic COW test, but collapse after COW-sharing the upper half of a
+	 * THP.
+	 */
+	{
+		"Basic COW after fork() when collapsing after fork() (upper shared)",
+		test_anon_thp_collapse_upper_shared,
+	},
+};
+
+static void run_anon_thp_test_cases(void)
+{
+	int i;
+
+	if (!thpsize)
+		return;
+
+	ksft_print_msg("[INFO] Anonymous THP tests\n");
+
+	for (i = 0; i < ARRAY_SIZE(anon_thp_test_cases); i++) {
+		struct test_case const *test_case = &anon_thp_test_cases[i];
+
+		ksft_print_msg("[RUN] %s\n", test_case->desc);
+		do_run_with_thp(test_case->fn, THP_RUN_PMD);
+	}
+}
+
+static int tests_per_anon_thp_test_case(void)
+{
+	return thpsize ? 1 : 0;
+}
+
 typedef void (*non_anon_test_fn)(char *mem, const char *smem, size_t size);
 
 static void test_cow(char *mem, const char *smem, size_t size)
@@ -1518,6 +1744,7 @@ int main(int argc, char **argv)
 
 	ksft_print_header();
 	ksft_set_plan(ARRAY_SIZE(anon_test_cases) * tests_per_anon_test_case() +
+		      ARRAY_SIZE(anon_thp_test_cases) * tests_per_anon_thp_test_case() +
 		      ARRAY_SIZE(non_anon_test_cases) * tests_per_non_anon_test_case());
 
 	gup_fd = open("/sys/kernel/debug/gup_test", O_RDWR);
@@ -1526,6 +1753,7 @@ int main(int argc, char **argv)
 		ksft_exit_fail_msg("opening pagemap failed\n");
 
 	run_anon_test_cases();
+	run_anon_thp_test_cases();
 	run_non_anon_test_cases();
 
 	err = ksft_get_fail_cnt();
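
The PTE-mapping step at the top of do_test_anon_thp_collapse() can be
tried in isolation. The following is a minimal sketch, assuming a Linux
system; toggle_subpage_protection() is a hypothetical helper name that is
not part of cow.c. It only performs the two mprotect() calls and does not
verify that the range actually ends up PTE-mapped, which depends on the
kernel and on whether the range was backed by a THP in the first place.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of cow.c): temporarily map the second page
 * of the range starting at mem R/O, then restore R/W. The caller must own
 * at least two pages of readable and writable memory starting at mem.
 * Returns 0 on success, -1 if either mprotect() call fails.
 */
static int toggle_subpage_protection(char *mem, size_t pagesize)
{
	assert(pagesize > 0);

	if (mprotect(mem + pagesize, pagesize, PROT_READ))
		return -1;
	if (mprotect(mem + pagesize, pagesize, PROT_READ | PROT_WRITE))
		return -1;
	return 0;
}
```

After the helper returns 0, the range is writable again, so a later
madvise(MADV_COLLAPSE) attempt operates on a mapping with uniform
protection, as in the test above.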