From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: "Kirill A . Shutemov", Jerome Glisse, Mike Kravetz, Matthew Wilcox,
    Andrew Morton, Axel Rasmussen, Hugh Dickins, peterx@redhat.com,
    Nadav Amit, Andrea Arcangeli, Mike Rapoport
Subject: [PATCH 00/23] userfaultfd-wp: Support shmem and hugetlbfs
Date: Mon, 22 Mar 2021 20:48:49 -0400
Message-Id: <20210323004912.35132-1-peterx@redhat.com>

This patchset is based on tag v5.12-rc3-mmots-2021-03-17-22-26.
To run the selftest, the following two patches that fix a minor mode page
leak need to be applied first:

  https://lore.kernel.org/lkml/20210322175132.36659-1-peterx@redhat.com/
  https://lore.kernel.org/lkml/20210322204836.1650221-1-axelrasmussen@google.com/

Since I didn't get any NACK on the previous RFC series for months, I decided
to drop the RFC tag starting from this version, so this is v1 of uffd-wp
support on shmem & hugetlb.

The whole series can also be found online [1].

The major comment I'd like to get is on the new idea of the swap special pte.
That comes from suggestions from both Hugh and Andrea, and I appreciate those
discussions a lot.

In short, the so-called "swap special pte" in this patchset is a new type of
pte that didn't exist in the past; it is first used in this series for
file-backed memory.  It is used to persist information even if the ptes got
dropped while the page cache still exists.  For example, when splitting a
file-backed huge pmd, we could simply drop the pmd entry and wait until
another fault comes.  That was okay in the past since all information in the
pte can be restored from the page cache when the next page fault triggers.
However in this case, uffd-wp is per-pte information which cannot be kept in
the page cache, so that information still needs to be maintained somehow in
the pgtable entry, even if the pgtable entry is going to be dropped.  Here,
instead of replacing it with a none entry, we use the "swap special pte".
Then when the next page fault triggers, we can observe orig_pte to retrieve
this information.

I'm copy-pasting part of the commit message from the patch "mm/swap:
Introduce the idea of special swap ptes", which tries to explain this pte
from another angle:

    We used to have special swap entries, like migration entries, hw-poison
    entries, device private entries, etc.

    Those "special swap entries" must be swap entries in the first place,
    and their types are decided by swp_type(entry).

    This patch introduces another idea called "special swap ptes".

    It's very easy to confuse them with "special swap entries", but a
    special swap pte should never contain a swap entry at all.  It means
    it's illegal to call pte_to_swp_entry() upon a special swap pte.

    Make the uffd-wp special pte the first special swap pte.

    Before this patch, is_swap_pte()==true means one of the below:

       (a.1) The pte has a normal swap entry (non_swap_entry()==false).
             For example, when an anonymous page got swapped out.

       (a.2) The pte has a special swap entry (non_swap_entry()==true).
             For example, a migration entry, a hw-poison entry, etc.

    After this patch, is_swap_pte()==true means one of the below, where
    case (b) is added:

     (a) The pte contains a swap entry.

       (a.1) The pte has a normal swap entry (non_swap_entry()==false).
             For example, when an anonymous page got swapped out.

       (a.2) The pte has a special swap entry (non_swap_entry()==true).
             For example, a migration entry, a hw-poison entry, etc.

     (b) The pte does not contain a swap entry at all (so it cannot be
         passed into pte_to_swp_entry()).  For example, uffd-wp special
         swap pte.

Hugetlbfs needs a similar thing because it's also file-backed.  I directly
reused the same special pte there, though the shmem and hugetlb changes to
support this new pte are different since they don't share much of the code
path.

Patch layout
============

Part (1): Shmem support, this is where the special swap pte is introduced.
Some zap rework is needed within the process:

  shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  mm: Clear vmf->pte after pte_unmap_same() returns
  mm/userfaultfd: Introduce special pte for unmapped file-backed mem
  mm/swap: Introduce the idea of special swap ptes
  shmem/userfaultfd: Handle uffd-wp special pte in page fault handler
  mm: Drop first_index/last_index in zap_details
  mm: Introduce zap_details.zap_flags
  mm: Introduce ZAP_FLAG_SKIP_SWAP
  mm: Pass zap_flags into unmap_mapping_pages()
  shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed
  shmem/userfaultfd: Allow wr-protect none pte for file-backed mem
  shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps
  shmem/userfaultfd: Handle the left-overed special swap ptes
  shmem/userfaultfd: Pass over uffd-wp special swap pte when fork()

Part (2): Hugetlb support.  We need to disable huge pmd sharing for uffd-wp
because the two are not compatible, just like with uffd minor mode.  The rest
are the changes required to teach hugetlbfs to understand the special swap
pte that is introduced with the uffd-wp change:

  hugetlb/userfaultfd: Hook page faults for uffd write protection
  hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT
  hugetlb: Pass vma into huge_pte_alloc()
  hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled
  mm/hugetlb: Introduce huge version of special swap pte helpers
  mm/hugetlb: Move flush_hugetlb_tlb_range() into hugetlb.h
  hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp
  hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler
  hugetlb/userfaultfd: Allow wr-protect none ptes
  hugetlb/userfaultfd: Only drop uffd-wp special pte if required

Part (3): Enable both features in code and test:

  userfaultfd: Enable write protection for shmem & hugetlbfs
  userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs

Tests
=====

I've tested it using both the userfaultfd kselftest program and umapsort [2], which
should be even stricter.  Page swapping in/out was also tested during
umapsort.

If anyone would like to try umapsort, an extremely hacked version of the
umap library [3] is needed, because by default umap only supports anonymous
memory.  So to test it we need to build [3] then [2].

Any comments would be greatly welcomed.

Thanks,

[1] https://github.com/xzpeter/linux/tree/uffd-wp-shmem-hugetlbfs
[2] https://github.com/LLNL/umap-apps
[3] https://github.com/xzpeter/umap/tree/peter-shmem-hugetlbfs

Peter Xu (23):
  shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  mm: Clear vmf->pte after pte_unmap_same() returns
  mm/userfaultfd: Introduce special pte for unmapped file-backed mem
  mm/swap: Introduce the idea of special swap ptes
  shmem/userfaultfd: Handle uffd-wp special pte in page fault handler
  mm: Drop first_index/last_index in zap_details
  mm: Introduce zap_details.zap_flags
  mm: Introduce ZAP_FLAG_SKIP_SWAP
  mm: Pass zap_flags into unmap_mapping_pages()
  shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed
  shmem/userfaultfd: Allow wr-protect none pte for file-backed mem
  shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps
  shmem/userfaultfd: Handle the left-overed special swap ptes
  shmem/userfaultfd: Pass over uffd-wp special swap pte when fork()
  hugetlb/userfaultfd: Hook page faults for uffd write protection
  hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
  hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT
  mm/hugetlb: Introduce huge version of special swap pte helpers
  hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler
  hugetlb/userfaultfd: Allow wr-protect none ptes
  hugetlb/userfaultfd: Only drop uffd-wp special pte if required
  mm/userfaultfd: Enable write protection for shmem & hugetlbfs
  userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs

 arch/arm64/kernel/mte.c                  |   2 +-
 arch/x86/include/asm/pgtable.h           |  28 +++
 fs/dax.c                                 |  10 +-
 fs/hugetlbfs/inode.c                     |  15 +-
 fs/proc/task_mmu.c                       |  14 +-
 fs/userfaultfd.c                         |  38 ++--
 include/asm-generic/hugetlb.h            |  10 +
 include/asm-generic/pgtable_uffd.h       |   3 +
 include/linux/hugetlb.h                  |  25 ++-
 include/linux/mm.h                       |  50 ++++-
 include/linux/mm_inline.h                |  43 ++++
 include/linux/shmem_fs.h                 |   5 +-
 include/linux/swapops.h                  |  39 +++-
 include/linux/userfaultfd_k.h            |  46 +++++
 include/uapi/linux/userfaultfd.h         |   7 +-
 mm/gup.c                                 |   2 +-
 mm/hmm.c                                 |   2 +-
 mm/hugetlb.c                             | 167 ++++++++++++---
 mm/khugepaged.c                          |  14 +-
 mm/madvise.c                             |   4 +-
 mm/memcontrol.c                          |   2 +-
 mm/memory.c                              | 249 ++++++++++++++++++-----
 mm/migrate.c                             |   4 +-
 mm/mincore.c                             |   2 +-
 mm/mprotect.c                            |  63 +++++-
 mm/mremap.c                              |   2 +-
 mm/page_vma_mapped.c                     |   6 +-
 mm/rmap.c                                |   8 +
 mm/shmem.c                               |  31 ++-
 mm/swapfile.c                            |   2 +-
 mm/truncate.c                            |  17 +-
 mm/userfaultfd.c                         |  37 ++--
 tools/testing/selftests/vm/userfaultfd.c |   9 +-
 33 files changed, 776 insertions(+), 180 deletions(-)
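
As a toy illustration of the is_swap_pte() cases (a.1), (a.2) and (b) listed
in the cover letter above, the classification could be modelled in user space
roughly as below.  This is only a sketch under stated assumptions, not code
from the series: the bit layout, the constants and every toy_* name are made
up for the example, and real pte encodings are architecture specific.  The
only property carried over is that a swap pte is one that is neither none nor
present, and that case (b) must be recognised before any attempt to decode a
swap entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy user-space model only: the layout and every toy_* name are made up. */
typedef uint64_t toy_pte_t;

#define TOY_PTE_PRESENT        (1ULL << 0)  /* pte maps a page */
#define TOY_PTE_SPECIAL_UFFDWP (1ULL << 1)  /* case (b): marker, no swap entry */
#define TOY_SWP_TYPE_SHIFT     2
#define TOY_SWP_TYPE_BITS      5
#define TOY_SWP_TYPE_MASK      ((1ULL << TOY_SWP_TYPE_BITS) - 1)
#define TOY_SWP_OFFSET_SHIFT   (TOY_SWP_TYPE_SHIFT + TOY_SWP_TYPE_BITS)
#define TOY_MAX_SWAPFILES      28  /* hypothetical: types >= this are special */
#define TOY_SWP_MIGRATION      28  /* hypothetical special swap entry type */

enum toy_pte_kind {
	TOY_KIND_NONE,
	TOY_KIND_PRESENT,
	TOY_KIND_SWAP_NORMAL,        /* (a.1) normal swap entry */
	TOY_KIND_SWAP_SPECIAL_ENTRY, /* (a.2) special swap entry */
	TOY_KIND_SPECIAL_PTE,        /* (b) special swap pte, no entry inside */
};

static bool toy_pte_none(toy_pte_t pte)
{
	return pte == 0;
}

static bool toy_pte_present(toy_pte_t pte)
{
	return (pte & TOY_PTE_PRESENT) != 0;
}

/* Same meaning as is_swap_pte(): neither none nor present. */
static bool toy_is_swap_pte(toy_pte_t pte)
{
	return !toy_pte_none(pte) && !toy_pte_present(pte);
}

static bool toy_is_swap_special_pte(toy_pte_t pte)
{
	return pte == TOY_PTE_SPECIAL_UFFDWP;
}

/* Build a swap pte; offset 0 is rejected so the toy value is never "none". */
static toy_pte_t toy_mk_swap_pte(unsigned int type, uint64_t offset)
{
	assert(type <= TOY_SWP_TYPE_MASK && offset != 0);
	return ((uint64_t)type << TOY_SWP_TYPE_SHIFT) |
	       (offset << TOY_SWP_OFFSET_SHIFT);
}

/* Only legal for (a.1)/(a.2): a special swap pte carries no swap entry. */
static unsigned int toy_swp_type(toy_pte_t pte)
{
	assert(toy_is_swap_pte(pte) && !toy_is_swap_special_pte(pte));
	return (unsigned int)((pte >> TOY_SWP_TYPE_SHIFT) & TOY_SWP_TYPE_MASK);
}

static enum toy_pte_kind toy_classify(toy_pte_t pte)
{
	if (toy_pte_none(pte))
		return TOY_KIND_NONE;
	if (toy_pte_present(pte))
		return TOY_KIND_PRESENT;
	/* Check (b) first so the pte is never decoded as a swap entry. */
	if (toy_is_swap_special_pte(pte))
		return TOY_KIND_SPECIAL_PTE;
	if (toy_swp_type(pte) >= TOY_MAX_SWAPFILES)
		return TOY_KIND_SWAP_SPECIAL_ENTRY;
	return TOY_KIND_SWAP_NORMAL;
}
```

With this encoding, toy_classify(TOY_PTE_SPECIAL_UFFDWP) yields
TOY_KIND_SPECIAL_PTE while toy_is_swap_pte() is still true for the same
value, which mirrors why existing is_swap_pte() callers have to learn about
case (b) before they decode a swap entry.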