diff mbox series

[3/3] mm/mmu_notifier: contextual information for event triggering invalidation

Message ID 20181203201817.10759-4-jglisse@redhat.com (mailing list archive)
State Superseded
Headers show
Series mmu notifier contextual informations | expand

Commit Message

Jerome Glisse Dec. 3, 2018, 8:18 p.m. UTC
From: Jérôme Glisse <jglisse@redhat.com>

CPU page table update can happens for many reasons, not only as a result
of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also
as a result of kernel activities (memory compression, reclaim, migration,
...).

Users of mmu notifier API track changes to the CPU page table and take
specific action for them. While current API only provide range of virtual
address affected by the change, not why the changes is happening.

This patchset adds event information so that users of mmu notifier can
differentiate among broad category:
    - UNMAP: munmap() or mremap()
    - CLEAR: page table is cleared (migration, compaction, reclaim, ...)
    - PROTECTION_VMA: change in access protections for the range
    - PROTECTION_PAGE: change in access protections for page in the range
    - SOFT_DIRTY: soft dirtyness tracking

Being able to identify munmap() and mremap() from other reasons why the
page table is cleared is important to allow user of mmu notifier to
update their own internal tracking structure accordingly (on munmap or
mremap it is not longer needed to track range of virtual address as it
becomes invalid).

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: kvm@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
---
 fs/dax.c                     |  1 +
 fs/proc/task_mmu.c           |  1 +
 include/linux/mmu_notifier.h | 33 +++++++++++++++++++++++++++++++++
 kernel/events/uprobes.c      |  1 +
 mm/huge_memory.c             |  4 ++++
 mm/hugetlb.c                 |  4 ++++
 mm/khugepaged.c              |  1 +
 mm/ksm.c                     |  2 ++
 mm/madvise.c                 |  1 +
 mm/memory.c                  |  5 +++++
 mm/migrate.c                 |  2 ++
 mm/mprotect.c                |  1 +
 mm/mremap.c                  |  1 +
 mm/oom_kill.c                |  1 +
 mm/rmap.c                    |  2 ++
 15 files changed, 60 insertions(+)

Comments

Mike Rapoport Dec. 4, 2018, 8:17 a.m. UTC | #1
On Mon, Dec 03, 2018 at 03:18:17PM -0500, jglisse@redhat.com wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
> 
> CPU page table update can happens for many reasons, not only as a result
> of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also
> as a result of kernel activities (memory compression, reclaim, migration,
> ...).
> 
> Users of mmu notifier API track changes to the CPU page table and take
> specific action for them. While current API only provide range of virtual
> address affected by the change, not why the changes is happening.
> 
> This patchset adds event information so that users of mmu notifier can
> differentiate among broad category:
>     - UNMAP: munmap() or mremap()
>     - CLEAR: page table is cleared (migration, compaction, reclaim, ...)
>     - PROTECTION_VMA: change in access protections for the range
>     - PROTECTION_PAGE: change in access protections for page in the range
>     - SOFT_DIRTY: soft dirtyness tracking
> 
> Being able to identify munmap() and mremap() from other reasons why the
> page table is cleared is important to allow user of mmu notifier to
> update their own internal tracking structure accordingly (on munmap or
> mremap it is not longer needed to track range of virtual address as it
> becomes invalid).
> 
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Matthew Wilcox <mawilcox@microsoft.com>
> Cc: Ross Zwisler <zwisler@kernel.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Felix Kuehling <felix.kuehling@amd.com>
> Cc: Ralph Campbell <rcampbell@nvidia.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: kvm@vger.kernel.org
> Cc: linux-rdma@vger.kernel.org
> Cc: linux-fsdevel@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> ---
>  fs/dax.c                     |  1 +
>  fs/proc/task_mmu.c           |  1 +
>  include/linux/mmu_notifier.h | 33 +++++++++++++++++++++++++++++++++
>  kernel/events/uprobes.c      |  1 +
>  mm/huge_memory.c             |  4 ++++
>  mm/hugetlb.c                 |  4 ++++
>  mm/khugepaged.c              |  1 +
>  mm/ksm.c                     |  2 ++
>  mm/madvise.c                 |  1 +
>  mm/memory.c                  |  5 +++++
>  mm/migrate.c                 |  2 ++
>  mm/mprotect.c                |  1 +
>  mm/mremap.c                  |  1 +
>  mm/oom_kill.c                |  1 +
>  mm/rmap.c                    |  2 ++
>  15 files changed, 60 insertions(+)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index e22508ee19ec..83092c5ac5f0 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -761,6 +761,7 @@ static void dax_entry_mkclean(struct address_space *mapping, pgoff_t index,
>  		struct mmu_notifier_range range;
>  		unsigned long address;
> 
> +		range.event = MMU_NOTIFY_PROTECTION_PAGE;
>  		range.mm = vma->vm_mm;
> 
>  		cond_resched();
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 53d625925669..4abb1668eeb3 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1144,6 +1144,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
>  			range.start = 0;
>  			range.end = -1UL;
>  			range.mm = mm;
> +			range.event = MMU_NOTIFY_SOFT_DIRTY;
>  			mmu_notifier_invalidate_range_start(&range);
>  		}
>  		walk_page_range(0, mm->highest_vm_end, &clear_refs_walk);
> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> index cbeece8e47d4..3077d487be8b 100644
> --- a/include/linux/mmu_notifier.h
> +++ b/include/linux/mmu_notifier.h
> @@ -25,10 +25,43 @@ struct mmu_notifier_mm {
>  	spinlock_t lock;
>  };
> 
> +/*
> + * What event is triggering the invalidation:

Can you please make it kernel-doc comment?

> + *
> + * MMU_NOTIFY_UNMAP
> + *    either munmap() that unmap the range or a mremap() that move the range
> + *
> + * MMU_NOTIFY_CLEAR
> + *    clear page table entry (many reasons for this like madvise() or replacing
> + *    a page by another one, ...).
> + *
> + * MMU_NOTIFY_PROTECTION_VMA
> + *    update is due to protection change for the range ie using the vma access
> + *    permission (vm_page_prot) to update the whole range is enough no need to
> + *    inspect changes to the CPU page table (mprotect() syscall)
> + *
> + * MMU_NOTIFY_PROTECTION_PAGE
> + *    update is due to change in read/write flag for pages in the range so to
> + *    mirror those changes the user must inspect the CPU page table (from the
> + *    end callback).
> + *
> + *
> + * MMU_NOTIFY_SOFT_DIRTY
> + *    soft dirty accounting (still same page and same access flags)
> + */
> +enum mmu_notifier_event {
> +	MMU_NOTIFY_UNMAP = 0,
> +	MMU_NOTIFY_CLEAR,
> +	MMU_NOTIFY_PROTECTION_VMA,
> +	MMU_NOTIFY_PROTECTION_PAGE,
> +	MMU_NOTIFY_SOFT_DIRTY,
> +};
> +
>  struct mmu_notifier_range {
>  	struct mm_struct *mm;
>  	unsigned long start;
>  	unsigned long end;
> +	enum mmu_notifier_event event;
>  	bool blockable;
>  };
> 
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index aa7996ca361e..b6ef3be1c24e 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -174,6 +174,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
>  	struct mmu_notifier_range range;
>  	struct mem_cgroup *memcg;
> 
> +	range.event = MMU_NOTIFY_CLEAR;
>  	range.start = addr;
>  	range.end = addr + PAGE_SIZE;
>  	range.mm = mm;
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 1a7a059dbf7d..4919be71ffd0 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1182,6 +1182,7 @@ static vm_fault_t do_huge_pmd_wp_page_fallback(struct vm_fault *vmf,
>  		cond_resched();
>  	}
> 
> +	range.event = MMU_NOTIFY_CLEAR;
>  	range.start = haddr;
>  	range.end = range.start + HPAGE_PMD_SIZE;
>  	range.mm = vma->vm_mm;
> @@ -1347,6 +1348,7 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
>  				    vma, HPAGE_PMD_NR);
>  	__SetPageUptodate(new_page);
> 
> +	range.event = MMU_NOTIFY_CLEAR;
>  	range.start = haddr;
>  	range.end = range.start + HPAGE_PMD_SIZE;
>  	range.mm = vma->vm_mm;
> @@ -2029,6 +2031,7 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
>  	struct mm_struct *mm = vma->vm_mm;
>  	struct mmu_notifier_range range;
> 
> +	range.event = MMU_NOTIFY_CLEAR;
>  	range.start = address & HPAGE_PUD_MASK;
>  	range.end = range.start + HPAGE_PUD_SIZE;
>  	range.mm = mm;
> @@ -2248,6 +2251,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>  	struct mm_struct *mm = vma->vm_mm;
>  	struct mmu_notifier_range range;
> 
> +	range.event = MMU_NOTIFY_CLEAR;
>  	range.start = address & HPAGE_PMD_MASK;
>  	range.end = range.start + HPAGE_PMD_SIZE;
>  	range.mm = mm;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4bfbdab44d51..9ffe34173834 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3244,6 +3244,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> 
>  	cow = (vma->vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
> 
> +	range.event = MMU_NOTIFY_CLEAR;
>  	range.start = vma->vm_start;
>  	range.end = vma->vm_end;
>  	range.mm = src;
> @@ -3344,6 +3345,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  	unsigned long sz = huge_page_size(h);
>  	struct mmu_notifier_range range;
> 
> +	range.event = MMU_NOTIFY_CLEAR;
>  	range.start = start;
>  	range.end = end;
>  	range.mm = mm;
> @@ -3629,6 +3631,7 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>  	__SetPageUptodate(new_page);
>  	set_page_huge_active(new_page);
> 
> +	range.event = MMU_NOTIFY_CLEAR;
>  	range.start = haddr;
>  	range.end = range.start + huge_page_size(h);
>  	range.mm = mm;
> @@ -4346,6 +4349,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
>  	bool shared_pmd = false;
>  	struct mmu_notifier_range range;
> 
> +	range.event = MMU_NOTIFY_PROTECTION_VMA;
>  	range.start = start;
>  	range.end = end;
>  	range.mm = mm;
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index e9fe0c9a9f56..c5c78ba30b38 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1016,6 +1016,7 @@ static void collapse_huge_page(struct mm_struct *mm,
>  	pte = pte_offset_map(pmd, address);
>  	pte_ptl = pte_lockptr(mm, pmd);
> 
> +	range.event = MMU_NOTIFY_CLEAR;
>  	range.start = address;
>  	range.end = range.start + HPAGE_PMD_SIZE;
>  	range.mm = mm;
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 262694d0cd4c..f8fbb92ca1bd 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1050,6 +1050,7 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
> 
>  	BUG_ON(PageTransCompound(page));
> 
> +	range.event = MMU_NOTIFY_CLEAR;
>  	range.start = pvmw.address;
>  	range.end = range.start + PAGE_SIZE;
>  	range.mm = mm;
> @@ -1139,6 +1140,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
>  	if (!pmd)
>  		goto out;
> 
> +	range.event = MMU_NOTIFY_CLEAR;
>  	range.start = addr;
>  	range.end = addr + PAGE_SIZE;
>  	range.mm = mm;
> diff --git a/mm/madvise.c b/mm/madvise.c
> index f20dd80ca21b..c415985d6a04 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -466,6 +466,7 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
>  	if (!vma_is_anonymous(vma))
>  		return -EINVAL;
> 
> +	range.event = MMU_NOTIFY_CLEAR;
>  	range.start = max(vma->vm_start, start_addr);
>  	if (range.start >= vma->vm_end)
>  		return -EINVAL;
> diff --git a/mm/memory.c b/mm/memory.c
> index 36e0b83949fc..4ad63002d770 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1007,6 +1007,7 @@ int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  	 * is_cow_mapping() returns true.
>  	 */
>  	is_cow = is_cow_mapping(vma->vm_flags);
> +	range.event = MMU_NOTIFY_PROTECTION_PAGE;
>  	range.start = addr;
>  	range.end = end;
>  	range.mm = src_mm;
> @@ -1334,6 +1335,7 @@ void unmap_vmas(struct mmu_gather *tlb,
>  {
>  	struct mmu_notifier_range range;
> 
> +	range.event = MMU_NOTIFY_UNMAP;
>  	range.start = start_addr;
>  	range.end = end_addr;
>  	range.mm = vma->vm_mm;
> @@ -1358,6 +1360,7 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
>  	struct mmu_notifier_range range;
>  	struct mmu_gather tlb;
> 
> +	range.event = MMU_NOTIFY_CLEAR;
>  	range.start = start;
>  	range.end = range.start + size;
>  	range.mm = vma->vm_mm;
> @@ -1387,6 +1390,7 @@ static void zap_page_range_single(struct vm_area_struct *vma, unsigned long addr
>  	struct mmu_notifier_range range;
>  	struct mmu_gather tlb;
> 
> +	range.event = MMU_NOTIFY_CLEAR;
>  	range.start = address;
>  	range.end = range.start + size;
>  	range.mm = vma->vm_mm;
> @@ -2260,6 +2264,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>  	struct mem_cgroup *memcg;
>  	struct mmu_notifier_range range;
> 
> +	range.event = MMU_NOTIFY_CLEAR;
>  	range.start = vmf->address & PAGE_MASK;
>  	range.end = range.start + PAGE_SIZE;
>  	range.mm = mm;
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 4896dd9d8b28..a2caaabfc5a1 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2306,6 +2306,7 @@ static void migrate_vma_collect(struct migrate_vma *migrate)
>  	struct mmu_notifier_range range;
>  	struct mm_walk mm_walk;
> 
> +	range.event = MMU_NOTIFY_CLEAR;
>  	range.start = migrate->start;
>  	range.end = migrate->end;
>  	range.mm = mm_walk.mm;
> @@ -2726,6 +2727,7 @@ static void migrate_vma_pages(struct migrate_vma *migrate)
>  			if (!notified) {
>  				notified = true;
> 
> +				range.event = MMU_NOTIFY_CLEAR;
>  				range.start = addr;
>  				range.end = migrate->end;
>  				range.mm = mm;
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index f466adf31e12..6d41321b2f3e 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -186,6 +186,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
> 
>  		/* invoke the mmu notifier if the pmd is populated */
>  		if (!range.start) {
> +			range.event = MMU_NOTIFY_PROTECTION_VMA;
>  			range.start = addr;
>  			range.end = end;
>  			range.mm = mm;
> diff --git a/mm/mremap.c b/mm/mremap.c
> index db060acb4a8c..856a5e6bb226 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -203,6 +203,7 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>  	old_end = old_addr + len;
>  	flush_cache_range(vma, old_addr, old_end);
> 
> +	range.event = MMU_NOTIFY_UNMAP;
>  	range.start = old_addr;
>  	range.end = old_end;
>  	range.mm = vma->vm_mm;
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index b29ab2624e95..f4bde1c34714 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -519,6 +519,7 @@ bool __oom_reap_task_mm(struct mm_struct *mm)
>  			struct mmu_notifier_range range;
>  			struct mmu_gather tlb;
> 
> +			range.event = MMU_NOTIFY_CLEAR;
>  			range.start = vma->vm_start;
>  			range.end = vma->vm_end;
>  			range.mm = mm;
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 09c5d9e5c766..b1afbbcc236a 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -896,6 +896,7 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
>  	 * We have to assume the worse case ie pmd for invalidation. Note that
>  	 * the page can not be free from this function.
>  	 */
> +	range.event = MMU_NOTIFY_PROTECTION_PAGE;
>  	range.mm = vma->vm_mm;
>  	range.start = address;
>  	range.end = min(vma->vm_end, range.start +
> @@ -1372,6 +1373,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  	 * Note that the page can not be free in this function as call of
>  	 * try_to_unmap() must hold a reference on the page.
>  	 */
> +	range.event = MMU_NOTIFY_CLEAR;
>  	range.mm = vma->vm_mm;
>  	range.start = vma->vm_start;
>  	range.end = min(vma->vm_end, range.start +
> -- 
> 2.17.2
>
Jerome Glisse Dec. 4, 2018, 2:48 p.m. UTC | #2
On Tue, Dec 04, 2018 at 10:17:48AM +0200, Mike Rapoport wrote:
> On Mon, Dec 03, 2018 at 03:18:17PM -0500, jglisse@redhat.com wrote:
> > From: Jérôme Glisse <jglisse@redhat.com>

[...]

> > diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> > index cbeece8e47d4..3077d487be8b 100644
> > --- a/include/linux/mmu_notifier.h
> > +++ b/include/linux/mmu_notifier.h
> > @@ -25,10 +25,43 @@ struct mmu_notifier_mm {
> >  	spinlock_t lock;
> >  };
> > 
> > +/*
> > + * What event is triggering the invalidation:
> 
> Can you please make it kernel-doc comment?

Sorry should have done that in the first place, Andrew i will post a v2
with that and fixing my one stupid bug.



> > + *
> > + * MMU_NOTIFY_UNMAP
> > + *    either munmap() that unmap the range or a mremap() that move the range
> > + *
> > + * MMU_NOTIFY_CLEAR
> > + *    clear page table entry (many reasons for this like madvise() or replacing
> > + *    a page by another one, ...).
> > + *
> > + * MMU_NOTIFY_PROTECTION_VMA
> > + *    update is due to protection change for the range ie using the vma access
> > + *    permission (vm_page_prot) to update the whole range is enough no need to
> > + *    inspect changes to the CPU page table (mprotect() syscall)
> > + *
> > + * MMU_NOTIFY_PROTECTION_PAGE
> > + *    update is due to change in read/write flag for pages in the range so to
> > + *    mirror those changes the user must inspect the CPU page table (from the
> > + *    end callback).
> > + *
> > + *
> > + * MMU_NOTIFY_SOFT_DIRTY
> > + *    soft dirty accounting (still same page and same access flags)
> > + */
> > +enum mmu_notifier_event {
> > +	MMU_NOTIFY_UNMAP = 0,
> > +	MMU_NOTIFY_CLEAR,
> > +	MMU_NOTIFY_PROTECTION_VMA,
> > +	MMU_NOTIFY_PROTECTION_PAGE,
> > +	MMU_NOTIFY_SOFT_DIRTY,
> > +};
Andrew Morton Dec. 4, 2018, 11:21 p.m. UTC | #3
On Mon,  3 Dec 2018 15:18:17 -0500 jglisse@redhat.com wrote:

> CPU page table update can happens for many reasons, not only as a result
> of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also
> as a result of kernel activities (memory compression, reclaim, migration,
> ...).
> 
> Users of mmu notifier API track changes to the CPU page table and take
> specific action for them. While current API only provide range of virtual
> address affected by the change, not why the changes is happening.
> 
> This patchset adds event information so that users of mmu notifier can
> differentiate among broad category:
>     - UNMAP: munmap() or mremap()
>     - CLEAR: page table is cleared (migration, compaction, reclaim, ...)
>     - PROTECTION_VMA: change in access protections for the range
>     - PROTECTION_PAGE: change in access protections for page in the range
>     - SOFT_DIRTY: soft dirtyness tracking
> 
> Being able to identify munmap() and mremap() from other reasons why the
> page table is cleared is important to allow user of mmu notifier to
> update their own internal tracking structure accordingly (on munmap or
> mremap it is not longer needed to track range of virtual address as it
> becomes invalid).
> 
> ...
>
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -519,6 +519,7 @@ bool __oom_reap_task_mm(struct mm_struct *mm)
>  			struct mmu_notifier_range range;
>  			struct mmu_gather tlb;
>  
> +			range.event = MMU_NOTIFY_CLEAR;
>  			range.start = vma->vm_start;
>  			range.end = vma->vm_end;
>  			range.mm = mm;

mmu_notifier_range and MMU_NOTIFY_CLEAR aren't defined if
CONFIG_MMU_NOTIFIER=n.

I'll try a temporary bodge:

+++ a/include/linux/mmu_notifier.h
@@ -10,8 +10,6 @@
 struct mmu_notifier;
 struct mmu_notifier_ops;
 
-#ifdef CONFIG_MMU_NOTIFIER
-
 /*
  * The mmu notifier_mm structure is allocated and installed in
  * mm->mmu_notifier_mm inside the mm_take_all_locks() protected
@@ -32,6 +30,8 @@ struct mmu_notifier_range {
 	bool blockable;
 };
 
+#ifdef CONFIG_MMU_NOTIFIER
+
 struct mmu_notifier_ops {
 	/*
 	 * Called either by mmu_notifier_unregister or when the mm is


But this new code should vanish altogether if CONFIG_MMU_NOTIFIER=n,
please.  Or at least, we shouldn't be unnecessarily initializing .mm
and .event.  Please take a look at debloating this code.
kernel test robot Dec. 6, 2018, 8:53 p.m. UTC | #4
Hi Jérôme,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.20-rc5]
[cannot apply to next-20181206]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/jglisse-redhat-com/mmu-notifier-contextual-informations/20181207-031930
config: i386-randconfig-x007-201848 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   kernel/events/uprobes.c: In function '__replace_page':
   kernel/events/uprobes.c:174:28: error: storage size of 'range' isn't known
     struct mmu_notifier_range range;
                               ^~~~~
>> kernel/events/uprobes.c:177:16: error: 'MMU_NOTIFY_CLEAR' undeclared (first use in this function); did you mean 'VM_ARCH_CLEAR'?
     range.event = MMU_NOTIFY_CLEAR;
                   ^~~~~~~~~~~~~~~~
                   VM_ARCH_CLEAR
   kernel/events/uprobes.c:177:16: note: each undeclared identifier is reported only once for each function it appears in
   kernel/events/uprobes.c:174:28: warning: unused variable 'range' [-Wunused-variable]
     struct mmu_notifier_range range;
                               ^~~~~

vim +177 kernel/events/uprobes.c

   152	
   153	/**
   154	 * __replace_page - replace page in vma by new page.
   155	 * based on replace_page in mm/ksm.c
   156	 *
   157	 * @vma:      vma that holds the pte pointing to page
   158	 * @addr:     address the old @page is mapped at
   159	 * @page:     the cowed page we are replacing by kpage
   160	 * @kpage:    the modified page we replace page by
   161	 *
   162	 * Returns 0 on success, -EFAULT on failure.
   163	 */
   164	static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
   165					struct page *old_page, struct page *new_page)
   166	{
   167		struct mm_struct *mm = vma->vm_mm;
   168		struct page_vma_mapped_walk pvmw = {
   169			.page = old_page,
   170			.vma = vma,
   171			.address = addr,
   172		};
   173		int err;
 > 174		struct mmu_notifier_range range;
   175		struct mem_cgroup *memcg;
   176	
 > 177		range.event = MMU_NOTIFY_CLEAR;
   178		range.start = addr;
   179		range.end = addr + PAGE_SIZE;
   180		range.mm = mm;
   181	
   182		VM_BUG_ON_PAGE(PageTransHuge(old_page), old_page);
   183	
   184		err = mem_cgroup_try_charge(new_page, vma->vm_mm, GFP_KERNEL, &memcg,
   185				false);
   186		if (err)
   187			return err;
   188	
   189		/* For try_to_free_swap() and munlock_vma_page() below */
   190		lock_page(old_page);
   191	
   192		mmu_notifier_invalidate_range_start(&range);
   193		err = -EAGAIN;
   194		if (!page_vma_mapped_walk(&pvmw)) {
   195			mem_cgroup_cancel_charge(new_page, memcg, false);
   196			goto unlock;
   197		}
   198		VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
   199	
   200		get_page(new_page);
   201		page_add_new_anon_rmap(new_page, vma, addr, false);
   202		mem_cgroup_commit_charge(new_page, memcg, false, false);
   203		lru_cache_add_active_or_unevictable(new_page, vma);
   204	
   205		if (!PageAnon(old_page)) {
   206			dec_mm_counter(mm, mm_counter_file(old_page));
   207			inc_mm_counter(mm, MM_ANONPAGES);
   208		}
   209	
   210		flush_cache_page(vma, addr, pte_pfn(*pvmw.pte));
   211		ptep_clear_flush_notify(vma, addr, pvmw.pte);
   212		set_pte_at_notify(mm, addr, pvmw.pte,
   213				mk_pte(new_page, vma->vm_page_prot));
   214	
   215		page_remove_rmap(old_page, false);
   216		if (!page_mapped(old_page))
   217			try_to_free_swap(old_page);
   218		page_vma_mapped_walk_done(&pvmw);
   219	
   220		if (vma->vm_flags & VM_LOCKED)
   221			munlock_vma_page(old_page);
   222		put_page(old_page);
   223	
   224		err = 0;
   225	 unlock:
   226		mmu_notifier_invalidate_range_end(&range);
   227		unlock_page(old_page);
   228		return err;
   229	}
   230	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
kernel test robot Dec. 6, 2018, 9:19 p.m. UTC | #5
Hi Jérôme,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.20-rc5]
[cannot apply to next-20181206]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/jglisse-redhat-com/mmu-notifier-contextual-informations/20181207-031930
config: x86_64-randconfig-x017-201848 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   fs///proc/task_mmu.c: In function 'clear_refs_write':
   fs///proc/task_mmu.c:1099:29: error: storage size of 'range' isn't known
      struct mmu_notifier_range range;
                                ^~~~~
>> fs///proc/task_mmu.c:1147:18: error: 'MMU_NOTIFY_SOFT_DIRTY' undeclared (first use in this function); did you mean 'CLEAR_REFS_SOFT_DIRTY'?
       range.event = MMU_NOTIFY_SOFT_DIRTY;
                     ^~~~~~~~~~~~~~~~~~~~~
                     CLEAR_REFS_SOFT_DIRTY
   fs///proc/task_mmu.c:1147:18: note: each undeclared identifier is reported only once for each function it appears in
   fs///proc/task_mmu.c:1099:29: warning: unused variable 'range' [-Wunused-variable]
      struct mmu_notifier_range range;
                                ^~~~~

vim +1147 fs///proc/task_mmu.c

  1069	
  1070	static ssize_t clear_refs_write(struct file *file, const char __user *buf,
  1071					size_t count, loff_t *ppos)
  1072	{
  1073		struct task_struct *task;
  1074		char buffer[PROC_NUMBUF];
  1075		struct mm_struct *mm;
  1076		struct vm_area_struct *vma;
  1077		enum clear_refs_types type;
  1078		struct mmu_gather tlb;
  1079		int itype;
  1080		int rv;
  1081	
  1082		memset(buffer, 0, sizeof(buffer));
  1083		if (count > sizeof(buffer) - 1)
  1084			count = sizeof(buffer) - 1;
  1085		if (copy_from_user(buffer, buf, count))
  1086			return -EFAULT;
  1087		rv = kstrtoint(strstrip(buffer), 10, &itype);
  1088		if (rv < 0)
  1089			return rv;
  1090		type = (enum clear_refs_types)itype;
  1091		if (type < CLEAR_REFS_ALL || type >= CLEAR_REFS_LAST)
  1092			return -EINVAL;
  1093	
  1094		task = get_proc_task(file_inode(file));
  1095		if (!task)
  1096			return -ESRCH;
  1097		mm = get_task_mm(task);
  1098		if (mm) {
> 1099			struct mmu_notifier_range range;
  1100			struct clear_refs_private cp = {
  1101				.type = type,
  1102			};
  1103			struct mm_walk clear_refs_walk = {
  1104				.pmd_entry = clear_refs_pte_range,
  1105				.test_walk = clear_refs_test_walk,
  1106				.mm = mm,
  1107				.private = &cp,
  1108			};
  1109	
  1110			if (type == CLEAR_REFS_MM_HIWATER_RSS) {
  1111				if (down_write_killable(&mm->mmap_sem)) {
  1112					count = -EINTR;
  1113					goto out_mm;
  1114				}
  1115	
  1116				/*
  1117				 * Writing 5 to /proc/pid/clear_refs resets the peak
  1118				 * resident set size to this mm's current rss value.
  1119				 */
  1120				reset_mm_hiwater_rss(mm);
  1121				up_write(&mm->mmap_sem);
  1122				goto out_mm;
  1123			}
  1124	
  1125			down_read(&mm->mmap_sem);
  1126			tlb_gather_mmu(&tlb, mm, 0, -1);
  1127			if (type == CLEAR_REFS_SOFT_DIRTY) {
  1128				for (vma = mm->mmap; vma; vma = vma->vm_next) {
  1129					if (!(vma->vm_flags & VM_SOFTDIRTY))
  1130						continue;
  1131					up_read(&mm->mmap_sem);
  1132					if (down_write_killable(&mm->mmap_sem)) {
  1133						count = -EINTR;
  1134						goto out_mm;
  1135					}
  1136					for (vma = mm->mmap; vma; vma = vma->vm_next) {
  1137						vma->vm_flags &= ~VM_SOFTDIRTY;
  1138						vma_set_page_prot(vma);
  1139					}
  1140					downgrade_write(&mm->mmap_sem);
  1141					break;
  1142				}
  1143	
  1144				range.start = 0;
  1145				range.end = -1UL;
  1146				range.mm = mm;
> 1147				range.event = MMU_NOTIFY_SOFT_DIRTY;
  1148				mmu_notifier_invalidate_range_start(&range);
  1149			}
  1150			walk_page_range(0, mm->highest_vm_end, &clear_refs_walk);
  1151			if (type == CLEAR_REFS_SOFT_DIRTY)
  1152				mmu_notifier_invalidate_range_end(&range);
  1153			tlb_finish_mmu(&tlb, 0, -1);
  1154			up_read(&mm->mmap_sem);
  1155	out_mm:
  1156			mmput(mm);
  1157		}
  1158		put_task_struct(task);
  1159	
  1160		return count;
  1161	}
  1162	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Jerome Glisse Dec. 6, 2018, 9:51 p.m. UTC | #6
Should be all fixed in v2 i built with and without mmu notifier and
did not had any issue in v2.

On Fri, Dec 07, 2018 at 05:19:21AM +0800, kbuild test robot wrote:
> Hi Jérôme,
> 
> I love your patch! Yet something to improve:
> 
> [auto build test ERROR on linus/master]
> [also build test ERROR on v4.20-rc5]
> [cannot apply to next-20181206]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> 
> url:    https://github.com/0day-ci/linux/commits/jglisse-redhat-com/mmu-notifier-contextual-informations/20181207-031930
> config: x86_64-randconfig-x017-201848 (attached as .config)
> compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=x86_64 
> 
> All errors (new ones prefixed by >>):
> 
>    fs///proc/task_mmu.c: In function 'clear_refs_write':
>    fs///proc/task_mmu.c:1099:29: error: storage size of 'range' isn't known
>       struct mmu_notifier_range range;
>                                 ^~~~~
> >> fs///proc/task_mmu.c:1147:18: error: 'MMU_NOTIFY_SOFT_DIRTY' undeclared (first use in this function); did you mean 'CLEAR_REFS_SOFT_DIRTY'?
>        range.event = MMU_NOTIFY_SOFT_DIRTY;
>                      ^~~~~~~~~~~~~~~~~~~~~
>                      CLEAR_REFS_SOFT_DIRTY
>    fs///proc/task_mmu.c:1147:18: note: each undeclared identifier is reported only once for each function it appears in
>    fs///proc/task_mmu.c:1099:29: warning: unused variable 'range' [-Wunused-variable]
>       struct mmu_notifier_range range;
>                                 ^~~~~
> 
> vim +1147 fs///proc/task_mmu.c
> 
>   1069	
>   1070	static ssize_t clear_refs_write(struct file *file, const char __user *buf,
>   1071					size_t count, loff_t *ppos)
>   1072	{
>   1073		struct task_struct *task;
>   1074		char buffer[PROC_NUMBUF];
>   1075		struct mm_struct *mm;
>   1076		struct vm_area_struct *vma;
>   1077		enum clear_refs_types type;
>   1078		struct mmu_gather tlb;
>   1079		int itype;
>   1080		int rv;
>   1081	
>   1082		memset(buffer, 0, sizeof(buffer));
>   1083		if (count > sizeof(buffer) - 1)
>   1084			count = sizeof(buffer) - 1;
>   1085		if (copy_from_user(buffer, buf, count))
>   1086			return -EFAULT;
>   1087		rv = kstrtoint(strstrip(buffer), 10, &itype);
>   1088		if (rv < 0)
>   1089			return rv;
>   1090		type = (enum clear_refs_types)itype;
>   1091		if (type < CLEAR_REFS_ALL || type >= CLEAR_REFS_LAST)
>   1092			return -EINVAL;
>   1093	
>   1094		task = get_proc_task(file_inode(file));
>   1095		if (!task)
>   1096			return -ESRCH;
>   1097		mm = get_task_mm(task);
>   1098		if (mm) {
> > 1099			struct mmu_notifier_range range;
>   1100			struct clear_refs_private cp = {
>   1101				.type = type,
>   1102			};
>   1103			struct mm_walk clear_refs_walk = {
>   1104				.pmd_entry = clear_refs_pte_range,
>   1105				.test_walk = clear_refs_test_walk,
>   1106				.mm = mm,
>   1107				.private = &cp,
>   1108			};
>   1109	
>   1110			if (type == CLEAR_REFS_MM_HIWATER_RSS) {
>   1111				if (down_write_killable(&mm->mmap_sem)) {
>   1112					count = -EINTR;
>   1113					goto out_mm;
>   1114				}
>   1115	
>   1116				/*
>   1117				 * Writing 5 to /proc/pid/clear_refs resets the peak
>   1118				 * resident set size to this mm's current rss value.
>   1119				 */
>   1120				reset_mm_hiwater_rss(mm);
>   1121				up_write(&mm->mmap_sem);
>   1122				goto out_mm;
>   1123			}
>   1124	
>   1125			down_read(&mm->mmap_sem);
>   1126			tlb_gather_mmu(&tlb, mm, 0, -1);
>   1127			if (type == CLEAR_REFS_SOFT_DIRTY) {
>   1128				for (vma = mm->mmap; vma; vma = vma->vm_next) {
>   1129					if (!(vma->vm_flags & VM_SOFTDIRTY))
>   1130						continue;
>   1131					up_read(&mm->mmap_sem);
>   1132					if (down_write_killable(&mm->mmap_sem)) {
>   1133						count = -EINTR;
>   1134						goto out_mm;
>   1135					}
>   1136					for (vma = mm->mmap; vma; vma = vma->vm_next) {
>   1137						vma->vm_flags &= ~VM_SOFTDIRTY;
>   1138						vma_set_page_prot(vma);
>   1139					}
>   1140					downgrade_write(&mm->mmap_sem);
>   1141					break;
>   1142				}
>   1143	
>   1144				range.start = 0;
>   1145				range.end = -1UL;
>   1146				range.mm = mm;
> > 1147				range.event = MMU_NOTIFY_SOFT_DIRTY;
>   1148				mmu_notifier_invalidate_range_start(&range);
>   1149			}
>   1150			walk_page_range(0, mm->highest_vm_end, &clear_refs_walk);
>   1151			if (type == CLEAR_REFS_SOFT_DIRTY)
>   1152				mmu_notifier_invalidate_range_end(&range);
>   1153			tlb_finish_mmu(&tlb, 0, -1);
>   1154			up_read(&mm->mmap_sem);
>   1155	out_mm:
>   1156			mmput(mm);
>   1157		}
>   1158		put_task_struct(task);
>   1159	
>   1160		return count;
>   1161	}
>   1162	
> 
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
diff mbox series

Patch

diff --git a/fs/dax.c b/fs/dax.c
index e22508ee19ec..83092c5ac5f0 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -761,6 +761,7 @@  static void dax_entry_mkclean(struct address_space *mapping, pgoff_t index,
 		struct mmu_notifier_range range;
 		unsigned long address;
 
+		range.event = MMU_NOTIFY_PROTECTION_PAGE;
 		range.mm = vma->vm_mm;
 
 		cond_resched();
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 53d625925669..4abb1668eeb3 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1144,6 +1144,7 @@  static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 			range.start = 0;
 			range.end = -1UL;
 			range.mm = mm;
+			range.event = MMU_NOTIFY_SOFT_DIRTY;
 			mmu_notifier_invalidate_range_start(&range);
 		}
 		walk_page_range(0, mm->highest_vm_end, &clear_refs_walk);
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index cbeece8e47d4..3077d487be8b 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -25,10 +25,43 @@  struct mmu_notifier_mm {
 	spinlock_t lock;
 };
 
+/*
+ * What event is triggering the invalidation:
+ *
+ * MMU_NOTIFY_UNMAP
+ *    either munmap() that unmap the range or a mremap() that move the range
+ *
+ * MMU_NOTIFY_CLEAR
+ *    clear page table entry (many reasons for this like madvise() or replacing
+ *    a page by another one, ...).
+ *
+ * MMU_NOTIFY_PROTECTION_VMA
+ *    update is due to protection change for the range ie using the vma access
+ *    permission (vm_page_prot) to update the whole range is enough no need to
+ *    inspect changes to the CPU page table (mprotect() syscall)
+ *
+ * MMU_NOTIFY_PROTECTION_PAGE
+ *    update is due to change in read/write flag for pages in the range so to
+ *    mirror those changes the user must inspect the CPU page table (from the
+ *    end callback).
+ *
+ *
+ * MMU_NOTIFY_SOFT_DIRTY
+ *    soft dirty accounting (still same page and same access flags)
+ */
+enum mmu_notifier_event {
+	MMU_NOTIFY_UNMAP = 0,
+	MMU_NOTIFY_CLEAR,
+	MMU_NOTIFY_PROTECTION_VMA,
+	MMU_NOTIFY_PROTECTION_PAGE,
+	MMU_NOTIFY_SOFT_DIRTY,
+};
+
 struct mmu_notifier_range {
 	struct mm_struct *mm;
 	unsigned long start;
 	unsigned long end;
+	enum mmu_notifier_event event;
 	bool blockable;
 };
 
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index aa7996ca361e..b6ef3be1c24e 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -174,6 +174,7 @@  static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	struct mmu_notifier_range range;
 	struct mem_cgroup *memcg;
 
+	range.event = MMU_NOTIFY_CLEAR;
 	range.start = addr;
 	range.end = addr + PAGE_SIZE;
 	range.mm = mm;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1a7a059dbf7d..4919be71ffd0 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1182,6 +1182,7 @@  static vm_fault_t do_huge_pmd_wp_page_fallback(struct vm_fault *vmf,
 		cond_resched();
 	}
 
+	range.event = MMU_NOTIFY_CLEAR;
 	range.start = haddr;
 	range.end = range.start + HPAGE_PMD_SIZE;
 	range.mm = vma->vm_mm;
@@ -1347,6 +1348,7 @@  vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
 				    vma, HPAGE_PMD_NR);
 	__SetPageUptodate(new_page);
 
+	range.event = MMU_NOTIFY_CLEAR;
 	range.start = haddr;
 	range.end = range.start + HPAGE_PMD_SIZE;
 	range.mm = vma->vm_mm;
@@ -2029,6 +2031,7 @@  void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
 	struct mm_struct *mm = vma->vm_mm;
 	struct mmu_notifier_range range;
 
+	range.event = MMU_NOTIFY_CLEAR;
 	range.start = address & HPAGE_PUD_MASK;
 	range.end = range.start + HPAGE_PUD_SIZE;
 	range.mm = mm;
@@ -2248,6 +2251,7 @@  void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	struct mm_struct *mm = vma->vm_mm;
 	struct mmu_notifier_range range;
 
+	range.event = MMU_NOTIFY_CLEAR;
 	range.start = address & HPAGE_PMD_MASK;
 	range.end = range.start + HPAGE_PMD_SIZE;
 	range.mm = mm;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4bfbdab44d51..9ffe34173834 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3244,6 +3244,7 @@  int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 
 	cow = (vma->vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
 
+	range.event = MMU_NOTIFY_CLEAR;
 	range.start = vma->vm_start;
 	range.end = vma->vm_end;
 	range.mm = src;
@@ -3344,6 +3345,7 @@  void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	unsigned long sz = huge_page_size(h);
 	struct mmu_notifier_range range;
 
+	range.event = MMU_NOTIFY_CLEAR;
 	range.start = start;
 	range.end = end;
 	range.mm = mm;
@@ -3629,6 +3631,7 @@  static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 	__SetPageUptodate(new_page);
 	set_page_huge_active(new_page);
 
+	range.event = MMU_NOTIFY_CLEAR;
 	range.start = haddr;
 	range.end = range.start + huge_page_size(h);
 	range.mm = mm;
@@ -4346,6 +4349,7 @@  unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	bool shared_pmd = false;
 	struct mmu_notifier_range range;
 
+	range.event = MMU_NOTIFY_PROTECTION_VMA;
 	range.start = start;
 	range.end = end;
 	range.mm = mm;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index e9fe0c9a9f56..c5c78ba30b38 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1016,6 +1016,7 @@  static void collapse_huge_page(struct mm_struct *mm,
 	pte = pte_offset_map(pmd, address);
 	pte_ptl = pte_lockptr(mm, pmd);
 
+	range.event = MMU_NOTIFY_CLEAR;
 	range.start = address;
 	range.end = range.start + HPAGE_PMD_SIZE;
 	range.mm = mm;
diff --git a/mm/ksm.c b/mm/ksm.c
index 262694d0cd4c..f8fbb92ca1bd 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1050,6 +1050,7 @@  static int write_protect_page(struct vm_area_struct *vma, struct page *page,
 
 	BUG_ON(PageTransCompound(page));
 
+	range.event = MMU_NOTIFY_CLEAR;
 	range.start = pvmw.address;
 	range.end = range.start + PAGE_SIZE;
 	range.mm = mm;
@@ -1139,6 +1140,7 @@  static int replace_page(struct vm_area_struct *vma, struct page *page,
 	if (!pmd)
 		goto out;
 
+	range.event = MMU_NOTIFY_CLEAR;
 	range.start = addr;
 	range.end = addr + PAGE_SIZE;
 	range.mm = mm;
diff --git a/mm/madvise.c b/mm/madvise.c
index f20dd80ca21b..c415985d6a04 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -466,6 +466,7 @@  static int madvise_free_single_vma(struct vm_area_struct *vma,
 	if (!vma_is_anonymous(vma))
 		return -EINVAL;
 
+	range.event = MMU_NOTIFY_CLEAR;
 	range.start = max(vma->vm_start, start_addr);
 	if (range.start >= vma->vm_end)
 		return -EINVAL;
diff --git a/mm/memory.c b/mm/memory.c
index 36e0b83949fc..4ad63002d770 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1007,6 +1007,7 @@  int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	 * is_cow_mapping() returns true.
 	 */
 	is_cow = is_cow_mapping(vma->vm_flags);
+	range.event = MMU_NOTIFY_PROTECTION_PAGE;
 	range.start = addr;
 	range.end = end;
 	range.mm = src_mm;
@@ -1334,6 +1335,7 @@  void unmap_vmas(struct mmu_gather *tlb,
 {
 	struct mmu_notifier_range range;
 
+	range.event = MMU_NOTIFY_UNMAP;
 	range.start = start_addr;
 	range.end = end_addr;
 	range.mm = vma->vm_mm;
@@ -1358,6 +1360,7 @@  void zap_page_range(struct vm_area_struct *vma, unsigned long start,
 	struct mmu_notifier_range range;
 	struct mmu_gather tlb;
 
+	range.event = MMU_NOTIFY_CLEAR;
 	range.start = start;
 	range.end = range.start + size;
 	range.mm = vma->vm_mm;
@@ -1387,6 +1390,7 @@  static void zap_page_range_single(struct vm_area_struct *vma, unsigned long addr
 	struct mmu_notifier_range range;
 	struct mmu_gather tlb;
 
+	range.event = MMU_NOTIFY_CLEAR;
 	range.start = address;
 	range.end = range.start + size;
 	range.mm = vma->vm_mm;
@@ -2260,6 +2264,7 @@  static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 	struct mem_cgroup *memcg;
 	struct mmu_notifier_range range;
 
+	range.event = MMU_NOTIFY_CLEAR;
 	range.start = vmf->address & PAGE_MASK;
 	range.end = range.start + PAGE_SIZE;
 	range.mm = mm;
diff --git a/mm/migrate.c b/mm/migrate.c
index 4896dd9d8b28..a2caaabfc5a1 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2306,6 +2306,7 @@  static void migrate_vma_collect(struct migrate_vma *migrate)
 	struct mmu_notifier_range range;
 	struct mm_walk mm_walk;
 
+	range.event = MMU_NOTIFY_CLEAR;
 	range.start = migrate->start;
 	range.end = migrate->end;
 	range.mm = mm_walk.mm;
@@ -2726,6 +2727,7 @@  static void migrate_vma_pages(struct migrate_vma *migrate)
 			if (!notified) {
 				notified = true;
 
+				range.event = MMU_NOTIFY_CLEAR;
 				range.start = addr;
 				range.end = migrate->end;
 				range.mm = mm;
diff --git a/mm/mprotect.c b/mm/mprotect.c
index f466adf31e12..6d41321b2f3e 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -186,6 +186,7 @@  static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 
 		/* invoke the mmu notifier if the pmd is populated */
 		if (!range.start) {
+			range.event = MMU_NOTIFY_PROTECTION_VMA;
 			range.start = addr;
 			range.end = end;
 			range.mm = mm;
diff --git a/mm/mremap.c b/mm/mremap.c
index db060acb4a8c..856a5e6bb226 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -203,6 +203,7 @@  unsigned long move_page_tables(struct vm_area_struct *vma,
 	old_end = old_addr + len;
 	flush_cache_range(vma, old_addr, old_end);
 
+	range.event = MMU_NOTIFY_UNMAP;
 	range.start = old_addr;
 	range.end = old_end;
 	range.mm = vma->vm_mm;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index b29ab2624e95..f4bde1c34714 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -519,6 +519,7 @@  bool __oom_reap_task_mm(struct mm_struct *mm)
 			struct mmu_notifier_range range;
 			struct mmu_gather tlb;
 
+			range.event = MMU_NOTIFY_CLEAR;
 			range.start = vma->vm_start;
 			range.end = vma->vm_end;
 			range.mm = mm;
diff --git a/mm/rmap.c b/mm/rmap.c
index 09c5d9e5c766..b1afbbcc236a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -896,6 +896,7 @@  static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
 	 * We have to assume the worse case ie pmd for invalidation. Note that
 	 * the page can not be free from this function.
 	 */
+	range.event = MMU_NOTIFY_PROTECTION_PAGE;
 	range.mm = vma->vm_mm;
 	range.start = address;
 	range.end = min(vma->vm_end, range.start +
@@ -1372,6 +1373,7 @@  static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	 * Note that the page can not be free in this function as call of
 	 * try_to_unmap() must hold a reference on the page.
 	 */
+	range.event = MMU_NOTIFY_CLEAR;
 	range.mm = vma->vm_mm;
 	range.start = vma->vm_start;
 	range.end = min(vma->vm_end, range.start +