diff mbox series

[RFC] mm: Fetch the dirty bit before we reset the pte

Message ID 20201008092627.399131-1-aneesh.kumar@linux.ibm.com (mailing list archive)
State New, archived
Headers show
Series [RFC] mm: Fetch the dirty bit before we reset the pte | expand

Commit Message

Aneesh Kumar K.V Oct. 8, 2020, 9:26 a.m. UTC
In copy_present_page, after we mark the pte non-writable, we should
check for previous dirty bit updates and make sure we don't lose the dirty
bit on reset.

Also, avoid marking the pte write-protected again if copy_present_page
already marked it write-protected.

Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/memory.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Linus Torvalds Oct. 8, 2020, 5:02 p.m. UTC | #1
On Thu, Oct 8, 2020 at 2:27 AM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> In copy_present_page, after we mark the pte non-writable, we should
> check for previous dirty bit updates and make sure we don't lose the dirty
> bit on reset.

No, we'll just remove that entirely.

Do you have a test-case that shows a problem? I have a patch that I
was going to delay until 5.10 because I didn't think it mattered in
practice..

The second part of this patch would be to add a sequence count
protection to fast-GUP pinning, so that GUP and fork() couldn't race,
but I haven't written that part.

Here's the first patch anyway. If you actually have a test-case where
this matters, I guess I need to apply it now..

                   Linus
Linus Torvalds Oct. 8, 2020, 5:06 p.m. UTC | #2
[ Just adding Leon to the participants ]

This patch (not attached again, Leon has seen it before) has been
tested for the last couple of weeks for the rdma case, so I have no
problems applying it now, just to keep everybody in the loop.

             Linus

On Thu, Oct 8, 2020 at 10:02 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Thu, Oct 8, 2020 at 2:27 AM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
> >
> > In copy_present_page, after we mark the pte non-writable, we should
> > check for previous dirty bit updates and make sure we don't lose the dirty
> > bit on reset.
>
> No, we'll just remove that entirely.
>
> Do you have a test-case that shows a problem? I have a patch that I
> was going to delay until 5.10 because I didn't think it mattered in
> practice..
>
> The second part of this patch would be to add a sequence count
> protection to fast-GUP pinning, so that GUP and fork() couldn't race,
> but I haven't written that part.
>
> Here's the first patch anyway. If you actually have a test-case where
> this matters, I guess I need to apply it now..
>
>                    Linus
Linus Torvalds Oct. 8, 2020, 5:30 p.m. UTC | #3
On Thu, Oct 8, 2020 at 10:02 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Here's the first patch anyway. If you actually have a test-case where
> this matters, I guess I need to apply it now..

Actually, I removed the "__page_mapcount()" part of that patch, to
keep it minimal and _only_ do remove the wrprotect trick.

We can do the __page_mapcount() optimization and the mm sequence count
for 5.10 (although so far nobody has actually written the seqcount
patch - I think it would be a trivial few-liner, but I guess it won't
make 5.10 at this point).

So here's what I ended up with.

                      Linus
Aneesh Kumar K.V Oct. 8, 2020, 5:32 p.m. UTC | #4
On 10/8/20 10:32 PM, Linus Torvalds wrote:
> On Thu, Oct 8, 2020 at 2:27 AM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> In copy_present_page, after we mark the pte non-writable, we should
>> check for previous dirty bit updates and make sure we don't lose the dirty
>> bit on reset.
> 
> No, we'll just remove that entirely.
> 
> Do you have a test-case that shows a problem? I have a patch that I
> was going to delay until 5.10 because I didn't think it mattered in
> practice..
> 

Unfortunately, I don't have a test case. That was observed by code 
inspection while I was fixing syzkaller report.

> The second part of this patch would be to add a sequence count
> protection to fast-GUP pinning, so that GUP and fork() couldn't race,
> but I haven't written that part.
> 
> Here's the first patch anyway. If you actually have a test-case where
> this matters, I guess I need to apply it now..
> 
>                     Linus
> 


-aneesh
diff mbox series

Patch

diff --git a/mm/memory.c b/mm/memory.c
index bfe202ef6244..f57b1f04d50a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -848,6 +848,9 @@  copy_present_page(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	if (likely(!page_maybe_dma_pinned(page)))
 		return 1;
 
+	if (pte_dirty(*src_pte))
+		pte = pte_mkdirty(pte);
+
 	/*
 	 * Uhhuh. It looks like the page might be a pinned page,
 	 * and we actually need to copy it. Now we can set the
@@ -904,6 +907,11 @@  copy_present_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		if (retval <= 0)
 			return retval;
 
+		/*
+		 * Fetch the src pte value again, copy_present_page
+		 * could modify it.
+		 */
+		pte = *src_pte;
 		get_page(page);
 		page_dup_rmap(page, false);
 		rss[mm_counter(page)]++;