@@ -2975,6 +2975,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
#define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */
#define FOLL_PIN 0x40000 /* pages must be released via unpin_user_page */
#define FOLL_FAST_ONLY 0x80000 /* gup_fast: prevent fall-back to slow gup */
+#define FOLL_NOUNSHARE 0x100000 /* don't trigger unsharing on shared anon pages */
/*
* FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
@@ -3029,6 +3030,12 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
* releasing pages: get_user_pages*() pages must be released via put_page(),
* while pin_user_pages*() pages must be released via unpin_user_page().
*
+ * FOLL_NOUNSHARE should be set when unsharing must not be triggered when
+ * taking a read-only reference on a shared anonymous page, because the caller
+ * guarantees that user space cannot use that reference to read page content
+ * once the page has been unmapped. FOLL_NOUNSHARE is implicitly set for the
+ * follow_page() API.
+ *
* Please see Documentation/core-api/pin_user_pages.rst for more information.
*/
@@ -3043,6 +3050,9 @@ static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
return 0;
}
+extern bool gup_must_unshare(unsigned int flags, struct page *page,
+ bool is_head);
+
typedef int (*pte_fn_t)(pte_t *pte, unsigned long addr, void *data);
extern int apply_to_page_range(struct mm_struct *mm, unsigned long address,
unsigned long size, pte_fn_t fn, void *data);
@@ -29,6 +29,53 @@ struct follow_page_context {
unsigned int page_mask;
};
+/*
+ * Indicates whether GUP has to trigger unsharing via FAULT_FLAG_UNSHARE for a
+ * page that is mapped write-protected in the page table, such that the GUP
+ * pin will remain consistent with the pages mapped into the page tables of
+ * the MM.
+ *
+ * This handling is required to guarantee that a child process that triggered
+ * a read-only GUP before unmapping the page of interest cannot observe
+ * modifications of shared anonymous pages with COW semantics in the parent
+ * after fork().
+ *
+ * TODO: although the security issue described above no longer applies in any
+ * case, the full consistency between the pinned pages and the pages mapped
+ * into the page tables of the MM applies to short-term pins only. For
+ * FOLL_LONGTERM, FOLL_WRITE|FOLL_FORCE is required for now, which can be
+ * inefficient and can still result in some consistency issues. Extend this
+ * mechanism to also provide full consistency for FOLL_LONGTERM, avoiding
+ * FOLL_WRITE|FOLL_FORCE.
+ *
+ * This function is safe to call from IRQ context.
+ */
+bool gup_must_unshare(unsigned int flags, struct page *page, bool is_head)
+{
+ /* Unsharing is only relevant for read access without FOLL_NOUNSHARE. */
+ if (flags & (FOLL_WRITE | FOLL_NOUNSHARE))
+ return false;
+ /*
+ * We only care when the reference count of the page is about to be
+ * increased. In particular, GUP users that rely on mmu notifiers
+ * instead don't have to trigger unsharing.
+ */
+ if (!(flags & (FOLL_GET|FOLL_PIN)))
+ return false;
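+ /* Only anonymous pages with COW semantics may require unsharing. */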
+ if (!PageAnon(page))
+ return false;
+ if (PageKsm(page))
+ return false;
+ if (PageHuge(page))
+ /* TODO: handle hugetlb as well. */
+ return false;
+ if (is_head) {
+ VM_BUG_ON(!PageTransHuge(page));
+ return page_trans_huge_mapcount(page, NULL) > 1;
+ }
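+ /*
+ * A mapcount larger than one means the anonymous page is still mapped
+ * more than once (shared due to fork()) and has to be unshared before
+ * taking an additional R/O reference on it.
+ */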
+ return page_mapcount(page) > 1;
+}
+
static void hpage_pincount_add(struct page *page, int refs)
{
VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
@@ -543,6 +590,14 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
}
}
+ /*
+ * Unsharing is required: return -EMLINK so that the caller triggers
+ * FAULT_FLAG_UNSHARE and retries until the page becomes exclusive.
+ */
+ if (!pte_write(pte) && gup_must_unshare(flags, page, false)) {
+ page = ERR_PTR(-EMLINK);
+ goto out;
+ }
/* try_grab_page() does nothing unless FOLL_GET or FOLL_PIN is set. */
if (unlikely(!try_grab_page(page, flags))) {
page = ERR_PTR(-ENOMEM);
@@ -790,6 +845,11 @@ static struct page *follow_p4d_mask(struct vm_area_struct *vma,
* When getting pages from ZONE_DEVICE memory, the @ctx->pgmap caches
* the device's dev_pagemap metadata to avoid repeating expensive lookups.
*
+ * When getting an anonymous page and the caller has to trigger unsharing
+ * of a shared anonymous page first, -EMLINK is returned. The caller should
+ * then trigger a fault with FAULT_FLAG_UNSHARE set and retry. With
+ * FOLL_NOUNSHARE set, unsharing is never required and -EMLINK is never
+ * returned.
+ *
* On output, the @ctx->page_mask is set according to the size of the page.
*
* Return: the mapped (struct page *), %NULL if no mapping exists, or
@@ -845,6 +905,12 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
if (vma_is_secretmem(vma))
return NULL;
+ /*
+ * Don't require unsharing in case we stumble over a read-only mapped,
+ * shared anonymous page: this is an internal API only and callers don't
+ * actually use it for exposing page content to user space.
+ */
+ foll_flags |= FOLL_NOUNSHARE;
page = follow_page_mask(vma, address, foll_flags, &ctx);
if (ctx.pgmap)
put_dev_pagemap(ctx.pgmap);
@@ -910,7 +976,8 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
* is, *@locked will be set to 0 and -EBUSY returned.
*/
static int faultin_page(struct vm_area_struct *vma,
- unsigned long address, unsigned int *flags, int *locked)
+ unsigned long address, unsigned int *flags, bool unshare,
+ int *locked)
{
unsigned int fault_flags = 0;
vm_fault_t ret;
@@ -935,6 +1002,12 @@ static int faultin_page(struct vm_area_struct *vma,
*/
fault_flags |= FAULT_FLAG_TRIED;
}
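+ /*
+ * The caller found a R/O-mapped, shared anonymous page and wants it
+ * unshared via a dedicated FAULT_FLAG_UNSHARE fault instead of a
+ * write fault.
+ */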
+ if (unshare) {
+ VM_BUG_ON(*flags & FOLL_NOUNSHARE);
+ fault_flags |= FAULT_FLAG_UNSHARE;
+ /* FAULT_FLAG_WRITE and FAULT_FLAG_UNSHARE are incompatible */
+ VM_BUG_ON(fault_flags & FAULT_FLAG_WRITE);
+ }
ret = handle_mm_fault(vma, address, fault_flags, NULL);
if (ret & VM_FAULT_ERROR) {
@@ -1156,8 +1229,9 @@ static long __get_user_pages(struct mm_struct *mm,
cond_resched();
page = follow_page_mask(vma, start, foll_flags, &ctx);
- if (!page) {
- ret = faultin_page(vma, start, &foll_flags, locked);
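+ /*
+ * -EMLINK means unsharing is required: let faultin_page()
+ * trigger FAULT_FLAG_UNSHARE before retrying the lookup.
+ */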
+ if (!page || PTR_ERR(page) == -EMLINK) {
+ ret = faultin_page(vma, start, &foll_flags,
+ PTR_ERR(page) == -EMLINK, locked);
switch (ret) {
case 0:
goto retry;
@@ -2311,6 +2385,11 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
goto pte_unmap;
}
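+ /*
+ * Unsharing cannot be performed in the fast path: drop the
+ * reference and let slow GUP trigger FAULT_FLAG_UNSHARE.
+ */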
+ if (!pte_write(pte) && gup_must_unshare(flags, page, false)) {
+ put_compound_head(head, 1, flags);
+ goto pte_unmap;
+ }
+
VM_BUG_ON_PAGE(compound_head(page) != head, page);
/*
@@ -2554,6 +2633,11 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
return 0;
}
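+ /*
+ * Unsharing cannot be performed in the fast path: drop the
+ * references and let slow GUP trigger FAULT_FLAG_UNSHARE.
+ */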
+ if (!pmd_write(orig) && gup_must_unshare(flags, head, true)) {
+ put_compound_head(head, refs, flags);
+ return 0;
+ }
+
*nr += refs;
SetPageReferenced(head);
return 1;
@@ -1375,6 +1375,13 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
page = pmd_page(*pmd);
VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
+ /*
+ * Unsharing is required: return -EMLINK so that the caller triggers
+ * FAULT_FLAG_UNSHARE and retries until the page becomes exclusive.
+ */
+ if (!pmd_write(*pmd) && gup_must_unshare(flags, page, true))
+ return ERR_PTR(-EMLINK);
+
if (!try_grab_page(page, flags))
return ERR_PTR(-ENOMEM);