[0/5] 4.14 backports of fixes for "CoW after fork() issue"

Message ID	20210401181741.168763-1-surenb@google.com (mailing list archive)
Headers	show Return-Path: <SRS0=d9cH=I6=kvack.org=owner-linux-mm@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4A7D060724 Date: Thu, 1 Apr 2021 11:17:36 -0700 Message-Id: <20210401181741.168763-1-surenb@google.com> Mime-Version: 1.0 Subject: [PATCH 0/5] 4.14 backports of fixes for "CoW after fork() issue" From: Suren Baghdasaryan <surenb@google.com> To: stable@vger.kernel.org Cc: gregkh@linuxfoundation.org, jannh@google.com, ktkhai@virtuozzo.com, torvalds@linux-foundation.org, shli@fb.com, namit@vmware.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@android.com, Suren Baghdasaryan <surenb@google.com> Content-Type: text/plain; charset="UTF-8" Received-SPF: none (flex--surenb.bounces.google.com>: No applicable sender policy available) receiver=imf23; identity=mailfrom; envelope-from="<3Sw5mYAYKCAo241oxlqyyqvo.mywvsx47-wwu5kmu.y1q@flex--surenb.bounces.google.com>"; helo=mail-qk1-f201.google.com; client-ip=209.85.222.201 Sender: owner-linux-mm@kvack.org Precedence: bulk
Series	4.14 backports of fixes for "CoW after fork() issue" \| expand [0/5] 4.14 backports of fixes for "CoW after fork() issue" [1/5] mm: reuse only-pte-mapped KSM page in do_wp_page() [2/5] mm: do_wp_page() simplification [3/5] mm: fix misplaced unlock_page in do_wp_page() [4/5] userfaultfd: wp: add helper for writeprotect check [5/5] mm/userfaultfd: fix memory corruption due to writeprotect

Suren Baghdasaryan April 1, 2021, 6:17 p.m. UTC

We received a report that the copy-on-write issue repored by Jann Horn in
https://bugs.chromium.org/p/project-zero/issues/detail?id=2045 is still
reproducible on 4.14 and 4.19 kernels (the first issue with the reproducer
coded in vmsplice.c). I confirmed this and also that the issue was not
reproducible with 5.10 kernel. I tracked the fix to the following patch
introduced in 5.9 which changes the do_wp_page() logic:

09854ba94c6a 'mm: do_wp_page() simplification'

I backported this patch (#2 in the series) along with 2 prerequisite patches
(#1 and #4) that keep the backports clean and two followup fixes to the main
patch (#3 and #5). I had to skip the following fix:

feb889fb40fa 'mm: don't put pinned pages into the swap cache'

because it uses page_maybe_dma_pinned() which does not exists in earlier
kernels. Because pin_user_pages() does not exist there as well, I *think*
we can safely skip this fix on older kernels, but I would appreciate if
someone could confirm that claim.

The patchset cleanly applies over: stable linux-4.14.y, tag: v4.14.228

Note: 4.14 and 4.19 backports are very similar, so while I backported
only to these two versions I think backports for other versions can be
done easily.

Kirill Tkhai (1):
  mm: reuse only-pte-mapped KSM page in do_wp_page()

Linus Torvalds (2):
  mm: do_wp_page() simplification
  mm: fix misplaced unlock_page in do_wp_page()

Nadav Amit (1):
  mm/userfaultfd: fix memory corruption due to writeprotect

Shaohua Li (1):
  userfaultfd: wp: add helper for writeprotect check

 include/linux/ksm.h           |  7 ++++
 include/linux/userfaultfd_k.h | 10 ++++++
 mm/ksm.c                      | 30 ++++++++++++++++--
 mm/memory.c                   | 60 ++++++++++++++++-------------------
 4 files changed, 73 insertions(+), 34 deletions(-)

Linus Torvalds April 1, 2021, 6:59 p.m. UTC | #1

On Thu, Apr 1, 2021 at 11:17 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> We received a report that the copy-on-write issue repored by Jann Horn in
> https://bugs.chromium.org/p/project-zero/issues/detail?id=2045 is still
> reproducible on 4.14 and 4.19 kernels (the first issue with the reproducer
> coded in vmsplice.c).

Gaah.

> I confirmed this and also that the issue was not
> reproducible with 5.10 kernel. I tracked the fix to the following patch
> introduced in 5.9 which changes the do_wp_page() logic:
>
> 09854ba94c6a 'mm: do_wp_page() simplification'

The problem here is that there's a _lot_ more patches than the few you
found that fixed various other cases (THP etc).

> I backported this patch (#2 in the series) along with 2 prerequisite patches
> (#1 and #4) that keep the backports clean and two followup fixes to the main
> patch (#3 and #5). I had to skip the following fix:
>
> feb889fb40fa 'mm: don't put pinned pages into the swap cache'
>
> because it uses page_maybe_dma_pinned() which does not exists in earlier
> kernels. Because pin_user_pages() does not exist there as well, I *think*
> we can safely skip this fix on older kernels, but I would appreciate if
> someone could confirm that claim.

Hmm. I think this means that swap activity can now break the
connection to a GUP page (the whole pre-pinning model), but it
probably isn't a new problem for 4.9/4.19.

I suspect the test there should be something like

        /* Single mapper, more references than us and the map? */
        if (page_mapcount(page) == 1 && page_count(page) > 2)
                goto keep_locked;

in the pre-pinning days.

But I really think that there are a number of other commits you're
missing too, because we had a whole series for THP fixes for the same
exact issue.

Added Peter Xu to the cc, because he probably tracked those issues
better than I did.

So NAK on this for now, I think this limited patch-set likely
introduces more problems than it fixes.

        Linus

Suren Baghdasaryan April 1, 2021, 7:43 p.m. UTC | #2

On Thu, Apr 1, 2021 at 11:59 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Thu, Apr 1, 2021 at 11:17 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > We received a report that the copy-on-write issue repored by Jann Horn in
> > https://bugs.chromium.org/p/project-zero/issues/detail?id=2045 is still
> > reproducible on 4.14 and 4.19 kernels (the first issue with the reproducer
> > coded in vmsplice.c).
>
> Gaah.
>
> > I confirmed this and also that the issue was not
> > reproducible with 5.10 kernel. I tracked the fix to the following patch
> > introduced in 5.9 which changes the do_wp_page() logic:
> >
> > 09854ba94c6a 'mm: do_wp_page() simplification'
>
> The problem here is that there's a _lot_ more patches than the few you
> found that fixed various other cases (THP etc).
>
> > I backported this patch (#2 in the series) along with 2 prerequisite patches
> > (#1 and #4) that keep the backports clean and two followup fixes to the main
> > patch (#3 and #5). I had to skip the following fix:
> >
> > feb889fb40fa 'mm: don't put pinned pages into the swap cache'
> >
> > because it uses page_maybe_dma_pinned() which does not exists in earlier
> > kernels. Because pin_user_pages() does not exist there as well, I *think*
> > we can safely skip this fix on older kernels, but I would appreciate if
> > someone could confirm that claim.
>
> Hmm. I think this means that swap activity can now break the
> connection to a GUP page (the whole pre-pinning model), but it
> probably isn't a new problem for 4.9/4.19.
>
> I suspect the test there should be something like
>
>         /* Single mapper, more references than us and the map? */
>         if (page_mapcount(page) == 1 && page_count(page) > 2)
>                 goto keep_locked;
>
> in the pre-pinning days.
>
> But I really think that there are a number of other commits you're
> missing too, because we had a whole series for THP fixes for the same
> exact issue.
>
> Added Peter Xu to the cc, because he probably tracked those issues
> better than I did.
>
> So NAK on this for now, I think this limited patch-set likely
> introduces more problems than it fixes.

Thanks for confirming my worries. I'll be happy to add additional
backports if Peter can point me to them.
Thanks,
Suren.

>
>         Linus

Peter Xu April 1, 2021, 11:47 p.m. UTC | #3

Hi, Suren,

On Thu, Apr 01, 2021 at 12:43:51PM -0700, Suren Baghdasaryan wrote:
> On Thu, Apr 1, 2021 at 11:59 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Thu, Apr 1, 2021 at 11:17 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > We received a report that the copy-on-write issue repored by Jann Horn in
> > > https://bugs.chromium.org/p/project-zero/issues/detail?id=2045 is still
> > > reproducible on 4.14 and 4.19 kernels (the first issue with the reproducer
> > > coded in vmsplice.c).
> >
> > Gaah.
> >
> > > I confirmed this and also that the issue was not
> > > reproducible with 5.10 kernel. I tracked the fix to the following patch
> > > introduced in 5.9 which changes the do_wp_page() logic:
> > >
> > > 09854ba94c6a 'mm: do_wp_page() simplification'
> >
> > The problem here is that there's a _lot_ more patches than the few you
> > found that fixed various other cases (THP etc).
> >
> > > I backported this patch (#2 in the series) along with 2 prerequisite patches
> > > (#1 and #4) that keep the backports clean and two followup fixes to the main
> > > patch (#3 and #5). I had to skip the following fix:
> > >
> > > feb889fb40fa 'mm: don't put pinned pages into the swap cache'
> > >
> > > because it uses page_maybe_dma_pinned() which does not exists in earlier
> > > kernels. Because pin_user_pages() does not exist there as well, I *think*
> > > we can safely skip this fix on older kernels, but I would appreciate if
> > > someone could confirm that claim.
> >
> > Hmm. I think this means that swap activity can now break the
> > connection to a GUP page (the whole pre-pinning model), but it
> > probably isn't a new problem for 4.9/4.19.
> >
> > I suspect the test there should be something like
> >
> >         /* Single mapper, more references than us and the map? */
> >         if (page_mapcount(page) == 1 && page_count(page) > 2)
> >                 goto keep_locked;
> >
> > in the pre-pinning days.
> >
> > But I really think that there are a number of other commits you're
> > missing too, because we had a whole series for THP fixes for the same
> > exact issue.
> >
> > Added Peter Xu to the cc, because he probably tracked those issues
> > better than I did.
> >
> > So NAK on this for now, I think this limited patch-set likely
> > introduces more problems than it fixes.
> 
> Thanks for confirming my worries. I'll be happy to add additional
> backports if Peter can point me to them.

If for a full-alignment with current upstream, I can at least think of below
series:

Early cow for general pages:
https://lore.kernel.org/lkml/20200925222600.6832-1-peterx@redhat.com/

A race fix for copy_page and gup-fast:
https://lore.kernel.org/linux-mm/0-v4-908497cf359a+4782-gup_fork_jgg@nvidia.com/

Early cow for hugetlbfs (which is very recently):
https://lore.kernel.org/lkml/20210217233547.93892-1-peterx@redhat.com/

But I believe they'll bring a number of dependencies too like the page pinned
work; so seems not easy.

Btw, AFAICT you don't need patch 4/5 in this series for 4.14/4.19, since
those're only for uffd-wp and it doesn't exist until 5.7.

Thanks,

Suren Baghdasaryan April 2, 2021, 12:12 a.m. UTC | #4

On Thu, Apr 1, 2021 at 4:47 PM Peter Xu <peterx@redhat.com> wrote:
>
> Hi, Suren,
>
> On Thu, Apr 01, 2021 at 12:43:51PM -0700, Suren Baghdasaryan wrote:
> > On Thu, Apr 1, 2021 at 11:59 AM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > On Thu, Apr 1, 2021 at 11:17 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > >
> > > > We received a report that the copy-on-write issue repored by Jann Horn in
> > > > https://bugs.chromium.org/p/project-zero/issues/detail?id=2045 is still
> > > > reproducible on 4.14 and 4.19 kernels (the first issue with the reproducer
> > > > coded in vmsplice.c).
> > >
> > > Gaah.
> > >
> > > > I confirmed this and also that the issue was not
> > > > reproducible with 5.10 kernel. I tracked the fix to the following patch
> > > > introduced in 5.9 which changes the do_wp_page() logic:
> > > >
> > > > 09854ba94c6a 'mm: do_wp_page() simplification'
> > >
> > > The problem here is that there's a _lot_ more patches than the few you
> > > found that fixed various other cases (THP etc).
> > >
> > > > I backported this patch (#2 in the series) along with 2 prerequisite patches
> > > > (#1 and #4) that keep the backports clean and two followup fixes to the main
> > > > patch (#3 and #5). I had to skip the following fix:
> > > >
> > > > feb889fb40fa 'mm: don't put pinned pages into the swap cache'
> > > >
> > > > because it uses page_maybe_dma_pinned() which does not exists in earlier
> > > > kernels. Because pin_user_pages() does not exist there as well, I *think*
> > > > we can safely skip this fix on older kernels, but I would appreciate if
> > > > someone could confirm that claim.
> > >
> > > Hmm. I think this means that swap activity can now break the
> > > connection to a GUP page (the whole pre-pinning model), but it
> > > probably isn't a new problem for 4.9/4.19.
> > >
> > > I suspect the test there should be something like
> > >
> > >         /* Single mapper, more references than us and the map? */
> > >         if (page_mapcount(page) == 1 && page_count(page) > 2)
> > >                 goto keep_locked;
> > >
> > > in the pre-pinning days.
> > >
> > > But I really think that there are a number of other commits you're
> > > missing too, because we had a whole series for THP fixes for the same
> > > exact issue.
> > >
> > > Added Peter Xu to the cc, because he probably tracked those issues
> > > better than I did.
> > >
> > > So NAK on this for now, I think this limited patch-set likely
> > > introduces more problems than it fixes.
> >
> > Thanks for confirming my worries. I'll be happy to add additional
> > backports if Peter can point me to them.
>
> If for a full-alignment with current upstream, I can at least think of below
> series:
>
> Early cow for general pages:
> https://lore.kernel.org/lkml/20200925222600.6832-1-peterx@redhat.com/
>
> A race fix for copy_page and gup-fast:
> https://lore.kernel.org/linux-mm/0-v4-908497cf359a+4782-gup_fork_jgg@nvidia.com/
>
> Early cow for hugetlbfs (which is very recently):
> https://lore.kernel.org/lkml/20210217233547.93892-1-peterx@redhat.com/
>
> But I believe they'll bring a number of dependencies too like the page pinned
> work; so seems not easy.

Thanks Peter. Let me try backporting these and I'll see if it's doable.

>
> Btw, AFAICT you don't need patch 4/5 in this series for 4.14/4.19, since
> those're only for uffd-wp and it doesn't exist until 5.7.

Got it. Will drop it from the next series.
Thanks,
Suren.

>
> Thanks,
>
> --
> Peter Xu
>

Vlastimil Babka April 7, 2021, 1:21 p.m. UTC | #5

On 4/1/21 8:59 PM, Linus Torvalds wrote:
> On Thu, Apr 1, 2021 at 11:17 AM Suren Baghdasaryan <surenb@google.com> wrote:

Thanks Suren for bringing this up!

>> We received a report that the copy-on-write issue repored by Jann Horn in
>> https://bugs.chromium.org/p/project-zero/issues/detail?id=2045 is still
>> reproducible on 4.14 and 4.19 kernels (the first issue with the reproducer
>> coded in vmsplice.c).

Note that even upstream AFAIK still has the issue unfixed when Jann's reproducer
is converted to use THPs, as Andrea has shown in
https://lore.kernel.org/linux-mm/X%2FjgLGPgPb+Xms1t@redhat.com/

> Gaah.
> 
>> I confirmed this and also that the issue was not
>> reproducible with 5.10 kernel. I tracked the fix to the following patch
>> introduced in 5.9 which changes the do_wp_page() logic:
>>
>> 09854ba94c6a 'mm: do_wp_page() simplification'
> 
> The problem here is that there's a _lot_ more patches than the few you
> found that fixed various other cases (THP etc).
> 
>> I backported this patch (#2 in the series) along with 2 prerequisite patches
>> (#1 and #4) that keep the backports clean and two followup fixes to the main
>> patch (#3 and #5). I had to skip the following fix:
>>
>> feb889fb40fa 'mm: don't put pinned pages into the swap cache'
>>
>> because it uses page_maybe_dma_pinned() which does not exists in earlier
>> kernels. Because pin_user_pages() does not exist there as well, I *think*
>> we can safely skip this fix on older kernels, but I would appreciate if
>> someone could confirm that claim.
> 
> Hmm. I think this means that swap activity can now break the
> connection to a GUP page (the whole pre-pinning model), but it
> probably isn't a new problem for 4.9/4.19.
> 
> I suspect the test there should be something like
> 
>         /* Single mapper, more references than us and the map? */
>         if (page_mapcount(page) == 1 && page_count(page) > 2)
>                 goto keep_locked;
> 
> in the pre-pinning days.
> 
> But I really think that there are a number of other commits you're
> missing too, because we had a whole series for THP fixes for the same
> exact issue.
> 
> Added Peter Xu to the cc, because he probably tracked those issues
> better than I did.

Let me shamelessly plug these links for illustrating what kind of minefield we
would be going into backporting this. Also for references what not to miss, and
what may still become broken afterwards:

https://lwn.net/Articles/849638/
https://lwn.net/Articles/849876/

> So NAK on this for now, I think this limited patch-set likely
> introduces more problems than it fixes.

I personally think there are only two options safe enough for stable backports
(so that not more harm is caused than actually prevented).

1) Ignore the issue (outside of Android at least). The security model of zygote
is unusual. Where else a parent of fork() doesn't trust the child, which is the
same binary?

BTW, I think the CVE description is very misleading:
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-29374

"does not properly consider the semantics of read operations and therefore can
grant unintended write access" - the bug was never about an unintended write
access, but about an info leak from parent to child, no?

2) For backports go with the original approach of 17839856fd58 ("gup: document
and work around "COW can break either way" issue"), thus break COW during the
GUP. But only for vmplice() so that nothing else gets broken. I think 5.4 stable
(another LTS) actually backported only 17839856fd58 out of everything else, so
it should have even the THP case covered, but its userfaultfd() is now probably
broken...

>         Linus
>

Peter Xu April 7, 2021, 2:30 p.m. UTC | #6

On Wed, Apr 07, 2021 at 03:21:55PM +0200, Vlastimil Babka wrote:
> 2) For backports go with the original approach of 17839856fd58 ("gup: document
> and work around "COW can break either way" issue"), thus break COW during the
> GUP. But only for vmplice() so that nothing else gets broken. I think 5.4 stable
> (another LTS) actually backported only 17839856fd58 out of everything else, so
> it should have even the THP case covered, but its userfaultfd() is now probably
> broken...

Since you mentioned this approach - AFAIU userfaultfd was only broken because
with that approach the kernel pretends some read accesses as writes, while
userfaultfd needs that accurate resolution.  Adding something like
FOLL_BREAK_COW [1] upon 17839856fd58 should keep both the vmsplice issue fixed
but also uffd working since that'll keep the read/write operation separate.

Meanwhile, I know Andrea was actively working on a complete solution [2] that's
a few steps further.  E.g., FOLL_BREAK_COW is done with FOLL_UNSHARE [3], speed
up in COW path [4] with similar idea of what we do right now with latest
upstream in 09854ba94c6aad7, allow write-protect with pinned pages (which is
right now forbidden), and something more.  However that's definitely a huge
branch, even discussing upstream (or maybe stopped discussing for quite some
days already?).

Neither of above are within upstream, so I don't really know whether these
information could be anything useful, just raise it up.  If Android could drop
userfaultfd, then I think solution 2) above is indeed the most efficient.  Note
that I think only uffd-wp was affected by 17839856fd58 but not the "missing
mode", so if Android is only using missing mode it still looks fine to only
have 17839856fd58.  It's just that I remembered there's another report besides
uffd-wp on 17839856fd58, but I can't remember the details of the other report.

Thanks,

[1] https://lkml.org/lkml/2020/8/10/439
[2] https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/log/?h=mapcount_deshare
[3] https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?h=mapcount_deshare&id=7c3a31caa34ac6ac4a4ec0559b1307b5edfc0821
[4] https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?h=mapcount_deshare&id=599aa62474f51a470408b28fd4365320a5357aca

Linus Torvalds April 7, 2021, 4:07 p.m. UTC | #7

On Wed, Apr 7, 2021 at 6:22 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> 1) Ignore the issue (outside of Android at least). The security model of zygote
> is unusual. Where else a parent of fork() doesn't trust the child, which is the
> same binary?

Agreed. I think this is basically an android-only issue (with
_possibly_ some impact on crazy "pin-and-fork" loads), and doesn't
necessarily merit a backport at all.

If Android people insist on using very old kernels, knowing that they
do things that are questionable with those old kernels, at some point
it's just _their_ problem.

                 Linus

Suren Baghdasaryan April 7, 2021, 4:33 p.m. UTC | #8

On Wed, Apr 7, 2021 at 9:07 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, Apr 7, 2021 at 6:22 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > 1) Ignore the issue (outside of Android at least). The security model of zygote
> > is unusual. Where else a parent of fork() doesn't trust the child, which is the
> > same binary?
>
> Agreed. I think this is basically an android-only issue (with
> _possibly_ some impact on crazy "pin-and-fork" loads), and doesn't
> necessarily merit a backport at all.
>
> If Android people insist on using very old kernels, knowing that they
> do things that are questionable with those old kernels, at some point
> it's just _their_ problem.

We don't really insist on using old kernels but rather we are stuck
with them for some time.
Trying my hand at backporting the patchsets Peter mentioned proved
this to be far from easy with many dependencies. Let me look into
Vlastimil's suggestion to backport only 17839856fd58 and it sounds
like 5.4 already followed that path. Thanks for all the information!
Suren.

>
>                  Linus
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>

Linus Torvalds April 7, 2021, 5:04 p.m. UTC | #9

On Wed, Apr 7, 2021 at 9:33 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> Trying my hand at backporting the patchsets Peter mentioned proved
> this to be far from easy with many dependencies. Let me look into
> Vlastimil's suggestion to backport only 17839856fd58 and it sounds
> like 5.4 already followed that path.

Well, in many ways 17839856fd58 was the "simple and obvious" fix, and
I do think it's easily backportable.

But it *did* cause problems too. Those problems may not be issues on
those old kernels, though.

In particular, commit 17839856fd58 caused uffd-wp to stop working
right, and it caused some issues with debugging (I forget the exact
details, but I think it was strace accessing PROT_NONE or write-only
pages or something like that, and COW failed).

But yes, in many ways that commit is a much simpler and more
straightforward one (which is why I tried it once - we ended up with
the much more subtle and far-reaching fixes after the UFFD issues
crept up).

The issues that 17839856fd58 caused may be entire non-events in old
kernels. In fact, the uffd writeprotect API was added fairly recently
(see commit 63b2d4174c4a that made it into v5.7), so the uffd-wp issue
that was triggered probably cannot happen in the old kernels.

The strace issue might not be relevant either, but I forget what the
details were. Mikilas should know.

See

  https://lore.kernel.org/lkml/alpine.LRH.2.02.2009031328040.6929@file01.intranet.prod.int.rdu2.redhat.com/

for Mikulas report. I never looked into it in detail, because by then
the uffd-wp issue had already come up, so it was juat another nail in
the coffin for that simpler approach.

Mikulas, do you remember?

            Linus

Mikulas Patocka April 7, 2021, 6:47 p.m. UTC | #10

On Wed, 7 Apr 2021, Linus Torvalds wrote:

> On Wed, Apr 7, 2021 at 9:33 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > Trying my hand at backporting the patchsets Peter mentioned proved
> > this to be far from easy with many dependencies. Let me look into
> > Vlastimil's suggestion to backport only 17839856fd58 and it sounds
> > like 5.4 already followed that path.
> 
> Well, in many ways 17839856fd58 was the "simple and obvious" fix, and
> I do think it's easily backportable.
> 
> But it *did* cause problems too. Those problems may not be issues on
> those old kernels, though.
> 
> In particular, commit 17839856fd58 caused uffd-wp to stop working
> right, and it caused some issues with debugging (I forget the exact
> details, but I think it was strace accessing PROT_NONE or write-only
> pages or something like that, and COW failed).
> 
> But yes, in many ways that commit is a much simpler and more
> straightforward one (which is why I tried it once - we ended up with
> the much more subtle and far-reaching fixes after the UFFD issues
> crept up).
> 
> The issues that 17839856fd58 caused may be entire non-events in old
> kernels. In fact, the uffd writeprotect API was added fairly recently
> (see commit 63b2d4174c4a that made it into v5.7), so the uffd-wp issue
> that was triggered probably cannot happen in the old kernels.
> 
> The strace issue might not be relevant either, but I forget what the
> details were. Mikilas should know.
> 
> See
> 
>   https://lore.kernel.org/lkml/alpine.LRH.2.02.2009031328040.6929@file01.intranet.prod.int.rdu2.redhat.com/
> 
> for Mikulas report. I never looked into it in detail, because by then
> the uffd-wp issue had already come up, so it was juat another nail in
> the coffin for that simpler approach.
> 
> Mikulas, do you remember?
> 
>             Linus

Hi

I think that we never found a root cause for this bug. I was testing if 
the whole system can run from persistent memory and found out that strace 
didn't work. I bisected it, reported it and when I received Peter Xu's 
patches (which fixed it), I stopped bothering about it.

So, we fixed it, but we don't know why.

Peter Xu's patchset that fixed it is here: 
https://lore.kernel.org/lkml/20200821234958.7896-1-peterx@redhat.com/

Mikulas

Linus Torvalds April 7, 2021, 7:22 p.m. UTC | #11

On Wed, Apr 7, 2021 at 11:47 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> So, we fixed it, but we don't know why.
>
> Peter Xu's patchset that fixed it is here:
> https://lore.kernel.org/lkml/20200821234958.7896-1-peterx@redhat.com/

Yeah, that's the part that ends up being really painful to backport
(with all the subsequent fixes too), so the 4.14 people would prefer
to avoid it.

But I think that if it's a "requires dax pmem and ptrace on top", it
may simply be a non-issue for those users. Although who knows - maybe
that ends up being a real issue on Android..

            Linus

Suren Baghdasaryan April 7, 2021, 9:53 p.m. UTC | #12

On Wed, Apr 7, 2021 at 12:23 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, Apr 7, 2021 at 11:47 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> >
> > So, we fixed it, but we don't know why.
> >
> > Peter Xu's patchset that fixed it is here:
> > https://lore.kernel.org/lkml/20200821234958.7896-1-peterx@redhat.com/
>
> Yeah, that's the part that ends up being really painful to backport
> (with all the subsequent fixes too), so the 4.14 people would prefer
> to avoid it.
>
> But I think that if it's a "requires dax pmem and ptrace on top", it
> may simply be a non-issue for those users. Although who knows - maybe
> that ends up being a real issue on Android..

A lot to digest, so I need to do some reading now. Thanks everyone!

>
>             Linus

Suren Baghdasaryan April 21, 2021, 8:01 p.m. UTC | #13

On Wed, Apr 7, 2021 at 2:53 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Apr 7, 2021 at 12:23 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Wed, Apr 7, 2021 at 11:47 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> > >
> > > So, we fixed it, but we don't know why.
> > >
> > > Peter Xu's patchset that fixed it is here:
> > > https://lore.kernel.org/lkml/20200821234958.7896-1-peterx@redhat.com/
> >
> > Yeah, that's the part that ends up being really painful to backport
> > (with all the subsequent fixes too), so the 4.14 people would prefer
> > to avoid it.
> >
> > But I think that if it's a "requires dax pmem and ptrace on top", it
> > may simply be a non-issue for those users. Although who knows - maybe
> > that ends up being a real issue on Android..
>
> A lot to digest, so I need to do some reading now. Thanks everyone!

After a delay due to vacation I prepared backports of 17839856fd58
("gup: document and work around "COW can break either way" issue") for
4.14 and 4.19 kernels. As Linus pointed out, uffd-wp was introduced
later in 5.7, so is not an issue for 4.x kernels. The issue with THPs
is still unresolved, so with or without this patch it's still there
(Android is not affected by this since we do not use THPs with older
kernels).
Andrea pointed out that there are other issues and to properly fix
them his COR approach is needed. However it has not been accepted yet,
so I can't really backport it. I'll be happy to do that though if it
is accepted in the future.

Peter, you mentioned https://lkml.org/lkml/2020/8/10/439 patch to
distinguish real writes vs enforced COW read requests, however I also
see that you had a later version of this patch here:
https://lore.kernel.org/patchwork/patch/1286506/. Which one should I
backport? Or is it not needed in the absence of uffd-wp support in the
earlier kernels?
Thanks,
Suren.

>
> >
> >             Linus

Peter Xu April 21, 2021, 9:05 p.m. UTC | #14

Hi, Suren,

On Wed, Apr 21, 2021 at 01:01:34PM -0700, Suren Baghdasaryan wrote:
> Peter, you mentioned https://lkml.org/lkml/2020/8/10/439 patch to
> distinguish real writes vs enforced COW read requests, however I also
> see that you had a later version of this patch here:
> https://lore.kernel.org/patchwork/patch/1286506/. Which one should I
> backport? Or is it not needed in the absence of uffd-wp support in the
> earlier kernels?

Sorry I have no ability to evaluate the rest... but according to Linus's
previous reply, my understanding is that it is not needed, not to mention it's
not upstreamed too.

Thanks,

Suren Baghdasaryan April 21, 2021, 9:17 p.m. UTC | #15

On Wed, Apr 21, 2021 at 2:05 PM Peter Xu <peterx@redhat.com> wrote:
>
> Hi, Suren,
>
> On Wed, Apr 21, 2021 at 01:01:34PM -0700, Suren Baghdasaryan wrote:
> > Peter, you mentioned https://lkml.org/lkml/2020/8/10/439 patch to
> > distinguish real writes vs enforced COW read requests, however I also
> > see that you had a later version of this patch here:
> > https://lore.kernel.org/patchwork/patch/1286506/. Which one should I
> > backport? Or is it not needed in the absence of uffd-wp support in the
> > earlier kernels?
>
> Sorry I have no ability to evaluate the rest... but according to Linus's
> previous reply, my understanding is that it is not needed, not to mention it's
> not upstreamed too.

Thanks! Then I'll send the backports for 17839856fd58 alone and if
more backports are needed I'll post followup patches.
Cheers,
Suren.

>
> Thanks,
>
> --
> Peter Xu
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>

Vlastimil Babka April 21, 2021, 10:59 p.m. UTC | #16

On 4/21/21 10:01 PM, Suren Baghdasaryan wrote:
> On Wed, Apr 7, 2021 at 2:53 PM Suren Baghdasaryan <surenb@google.com> wrote:
>>
>> On Wed, Apr 7, 2021 at 12:23 PM Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>> >
>> > On Wed, Apr 7, 2021 at 11:47 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
>> > >
>> > > So, we fixed it, but we don't know why.
>> > >
>> > > Peter Xu's patchset that fixed it is here:
>> > > https://lore.kernel.org/lkml/20200821234958.7896-1-peterx@redhat.com/
>> >
>> > Yeah, that's the part that ends up being really painful to backport
>> > (with all the subsequent fixes too), so the 4.14 people would prefer
>> > to avoid it.
>> >
>> > But I think that if it's a "requires dax pmem and ptrace on top", it
>> > may simply be a non-issue for those users. Although who knows - maybe
>> > that ends up being a real issue on Android..
>>
>> A lot to digest, so I need to do some reading now. Thanks everyone!
> 
> After a delay due to vacation I prepared backports of 17839856fd58
> ("gup: document and work around "COW can break either way" issue") for
> 4.14 and 4.19 kernels. As Linus pointed out, uffd-wp was introduced
> later in 5.7, so is not an issue for 4.x kernels. The issue with THPs
> is still unresolved, so with or without this patch it's still there
> (Android is not affected by this since we do not use THPs with older
> kernels).

Which THP issue do you mean here? The race that was part of the same Project
zero report and was solved by a different patch adding some locking? Or the
vmsplice info leak but applied to THP's? Because if it's the latter then I
believe 17839856fd58 did solve that too. It was the later switch of approach to
rely just on page_count() that left THP side unfixed.

> Andrea pointed out that there are other issues and to properly fix
> them his COR approach is needed. However it has not been accepted yet,
> so I can't really backport it. I'll be happy to do that though if it
> is accepted in the future.
> 
> Peter, you mentioned https://lkml.org/lkml/2020/8/10/439 patch to
> distinguish real writes vs enforced COW read requests, however I also
> see that you had a later version of this patch here:
> https://lore.kernel.org/patchwork/patch/1286506/. Which one should I
> backport? Or is it not needed in the absence of uffd-wp support in the
> earlier kernels?
> Thanks,
> Suren.
> 
>>
>> >
>> >             Linus
>

Suren Baghdasaryan April 21, 2021, 11:01 p.m. UTC | #17

On Wed, Apr 21, 2021 at 2:17 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Apr 21, 2021 at 2:05 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > Hi, Suren,
> >
> > On Wed, Apr 21, 2021 at 01:01:34PM -0700, Suren Baghdasaryan wrote:
> > > Peter, you mentioned https://lkml.org/lkml/2020/8/10/439 patch to
> > > distinguish real writes vs enforced COW read requests, however I also
> > > see that you had a later version of this patch here:
> > > https://lore.kernel.org/patchwork/patch/1286506/. Which one should I
> > > backport? Or is it not needed in the absence of uffd-wp support in the
> > > earlier kernels?
> >
> > Sorry I have no ability to evaluate the rest... but according to Linus's
> > previous reply, my understanding is that it is not needed, not to mention it's
> > not upstreamed too.
>
> Thanks! Then I'll send the backports for 17839856fd58 alone and if
> more backports are needed I'll post followup patches.

Posted 4.19 backport: https://lore.kernel.org/patchwork/patch/1416747
and 4.14 backport: https://lore.kernel.org/patchwork/patch/1416748.
They are identical but Greg asked me to submit separate patches due to
an additional minor merge conflict in 4.14.

> Cheers,
> Suren.
>
> >
> > Thanks,
> >
> > --
> > Peter Xu
> >
> > --
> > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
> >

Suren Baghdasaryan April 21, 2021, 11:05 p.m. UTC | #18

On Wed, Apr 21, 2021 at 3:59 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 4/21/21 10:01 PM, Suren Baghdasaryan wrote:
> > On Wed, Apr 7, 2021 at 2:53 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >>
> >> On Wed, Apr 7, 2021 at 12:23 PM Linus Torvalds
> >> <torvalds@linux-foundation.org> wrote:
> >> >
> >> > On Wed, Apr 7, 2021 at 11:47 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> >> > >
> >> > > So, we fixed it, but we don't know why.
> >> > >
> >> > > Peter Xu's patchset that fixed it is here:
> >> > > https://lore.kernel.org/lkml/20200821234958.7896-1-peterx@redhat.com/
> >> >
> >> > Yeah, that's the part that ends up being really painful to backport
> >> > (with all the subsequent fixes too), so the 4.14 people would prefer
> >> > to avoid it.
> >> >
> >> > But I think that if it's a "requires dax pmem and ptrace on top", it
> >> > may simply be a non-issue for those users. Although who knows - maybe
> >> > that ends up being a real issue on Android..
> >>
> >> A lot to digest, so I need to do some reading now. Thanks everyone!
> >
> > After a delay due to vacation I prepared backports of 17839856fd58
> > ("gup: document and work around "COW can break either way" issue") for
> > 4.14 and 4.19 kernels. As Linus pointed out, uffd-wp was introduced
> > later in 5.7, so is not an issue for 4.x kernels. The issue with THPs
> > is still unresolved, so with or without this patch it's still there
> > (Android is not affected by this since we do not use THPs with older
> > kernels).
>
> Which THP issue do you mean here? The race that was part of the same Project
> zero report and was solved by a different patch adding some locking? Or the
> vmsplice info leak but applied to THP's? Because if it's the latter then I
> believe 17839856fd58 did solve that too. It was the later switch of approach to
> rely just on page_count() that left THP side unfixed.

I meant the "vmsplice info leak applied to THP's" but now I realize
that 17839856fd58 does not use elevated reference count, so indeed
that should not be a problem. Thanks for the note!

>
> > Andrea pointed out that there are other issues and to properly fix
> > them his COR approach is needed. However it has not been accepted yet,
> > so I can't really backport it. I'll be happy to do that though if it
> > is accepted in the future.
> >
> > Peter, you mentioned https://lkml.org/lkml/2020/8/10/439 patch to
> > distinguish real writes vs enforced COW read requests, however I also
> > see that you had a later version of this patch here:
> > https://lore.kernel.org/patchwork/patch/1286506/. Which one should I
> > backport? Or is it not needed in the absence of uffd-wp support in the
> > earlier kernels?
> > Thanks,
> > Suren.
> >
> >>
> >> >
> >> >             Linus
> >
>

[0/5] 4.14 backports of fixes for "CoW after fork() issue"

Message

Comments