Message ID | 20201008092627.399131-1-aneesh.kumar@linux.ibm.com (mailing list archive)
---|---
State | New, archived
Series | [RFC] mm: Fetch the dirty bit before we reset the pte
On Thu, Oct 8, 2020 at 2:27 AM Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> wrote:
>
> In copy_present_page, after we mark the pte non-writable, we should
> check for previous dirty bit updates and make sure we don't lose the dirty
> bit on reset.

No, we'll just remove that entirely.

Do you have a test-case that shows a problem? I have a patch that I
was going to delay until 5.10 because I didn't think it mattered in
practice..

The second part of this patch would be to add a sequence count
protection to fast-GUP pinning, so that GUP and fork() couldn't race,
but I haven't written that part.

Here's the first patch anyway. If you actually have a test-case where
this matters, I guess I need to apply it now..

            Linus
[ Just adding Leon to the participants ]

This patch (not attached again, Leon has seen it before) has been
tested for the last couple of weeks for the rdma case, so I have no
problems applying it now, just to keep everybody in the loop.

            Linus

On Thu, Oct 8, 2020 at 10:02 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Thu, Oct 8, 2020 at 2:27 AM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
> >
> > In copy_present_page, after we mark the pte non-writable, we should
> > check for previous dirty bit updates and make sure we don't lose the dirty
> > bit on reset.
>
> No, we'll just remove that entirely.
>
> Do you have a test-case that shows a problem? I have a patch that I
> was going to delay until 5.10 because I didn't think it mattered in
> practice..
>
> The second part of this patch would be to add a sequence count
> protection to fast-GUP pinning, so that GUP and fork() couldn't race,
> but I haven't written that part.
>
> Here's the first patch anyway. If you actually have a test-case where
> this matters, I guess I need to apply it now..
>
>             Linus
On Thu, Oct 8, 2020 at 10:02 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> Here's the first patch anyway. If you actually have a test-case where
> this matters, I guess I need to apply it now..

Actually, I removed the "__page_mapcount()" part of that patch, to
keep it minimal and _only_ remove the wrprotect trick.

We can do the __page_mapcount() optimization and the mm sequence
count for 5.10 (although so far nobody has actually written the
seqcount patch - I think it would be a trivial few-liner, but I guess
it won't make 5.10 at this point).

So here's what I ended up with.

            Linus
On 10/8/20 10:32 PM, Linus Torvalds wrote:
> On Thu, Oct 8, 2020 at 2:27 AM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> In copy_present_page, after we mark the pte non-writable, we should
>> check for previous dirty bit updates and make sure we don't lose the dirty
>> bit on reset.
>
> No, we'll just remove that entirely.
>
> Do you have a test-case that shows a problem? I have a patch that I
> was going to delay until 5.10 because I didn't think it mattered in
> practice..
>

Unfortunately, I don't have a test case. That was observed by code
inspection while I was fixing a syzkaller report.

> The second part of this patch would be to add a sequence count
> protection to fast-GUP pinning, so that GUP and fork() couldn't race,
> but I haven't written that part.
>
> Here's the first patch anyway. If you actually have a test-case where
> this matters, I guess I need to apply it now..
>
> Linus
>

-aneesh
diff --git a/mm/memory.c b/mm/memory.c
index bfe202ef6244..f57b1f04d50a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -848,6 +848,9 @@ copy_present_page(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	if (likely(!page_maybe_dma_pinned(page)))
 		return 1;
 
+	if (pte_dirty(*src_pte))
+		pte = pte_mkdirty(pte);
+
 	/*
 	 * Uhhuh. It looks like the page might be a pinned page,
 	 * and we actually need to copy it. Now we can set the
@@ -904,6 +907,11 @@ copy_present_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	if (retval <= 0)
 		return retval;
 
+	/*
+	 * Fetch the src pte value again, copy_present_page
+	 * could modify it.
+	 */
+	pte = *src_pte;
 	get_page(page);
 	page_dup_rmap(page, false);
 	rss[mm_counter(page)]++;
In copy_present_page, after we mark the pte non-writable, we should
check for previous dirty bit updates and make sure we don't lose the
dirty bit on reset.

Also, avoid marking the pte write-protected again if copy_present_page
already marked it write-protected.

Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/memory.c | 8 ++++++++
 1 file changed, 8 insertions(+)