Message ID | 20200821085011.28878-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Series | [1/4] mm: Export flush_vm_area() to sync the PTEs upon construction |
On Fri, Aug 21, 2020 at 09:50:08AM +0100, Chris Wilson wrote:
> The alloc_vm_area() is another method for drivers to
> vmap/map_kernel_range that uses apply_to_page_range() rather than the
> direct vmalloc walkers. This is missing the page table modification
> tracking, and the ability to synchronize the PTE updates afterwards.
> Provide flush_vm_area() for the users of alloc_vm_area() that assumes
> the worst and ensures that the page directories are correctly flushed
> upon construction.
>
> The impact is most pronounced on x86_32 due to the delayed set_pmd().
>
> Reported-by: Pavel Machek <pavel@ucw.cz>
> References: 2ba3e6947aed ("mm/vmalloc: track which page-table levels were modified")
> References: 86cf69f1d893 ("x86/mm/32: implement arch_sync_kernel_mappings()")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Joerg Roedel <jroedel@suse.de>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: Pavel Machek <pavel@ucw.cz>
> Cc: David Vrabel <david.vrabel@citrix.com>
> Cc: <stable@vger.kernel.org> # v5.8+
> ---
>  include/linux/vmalloc.h |  1 +
>  mm/vmalloc.c            | 16 ++++++++++++++++
>  2 files changed, 17 insertions(+)
>
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 0221f852a7e1..a253b27df0ac 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -204,6 +204,7 @@ static inline void set_vm_flush_reset_perms(void *addr)
>
>  /* Allocate/destroy a 'vmalloc' VM area. */
>  extern struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes);
> +extern void flush_vm_area(struct vm_struct *area);
>  extern void free_vm_area(struct vm_struct *area);
>
>  /* for /dev/kmem */
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index b482d240f9a2..c41934486031 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3078,6 +3078,22 @@ struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes)
>  }
>  EXPORT_SYMBOL_GPL(alloc_vm_area);
>
> +void flush_vm_area(struct vm_struct *area)
> +{
> +	unsigned long addr = (unsigned long)area->addr;
> +
> +	/* apply_to_page_range() doesn't track the damage, assume the worst */
> +	if (ARCH_PAGE_TABLE_SYNC_MASK & (PGTBL_PTE_MODIFIED |
> +					 PGTBL_PMD_MODIFIED |
> +					 PGTBL_PUD_MODIFIED |
> +					 PGTBL_P4D_MODIFIED |
> +					 PGTBL_PGD_MODIFIED))
> +		arch_sync_kernel_mappings(addr, addr + area->size);

This should happen in __apply_to_page_range() directly and look like
this:

	if (ARCH_PAGE_TABLE_SYNC_MASK && create)
		arch_sync_kernel_mappings(addr, addr + size);

Or even better, track whether something had to be allocated in the
__apply_to_page_range() path and check for:

	if (ARCH_PAGE_TABLE_SYNC_MASK & mask)

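The second suggestion above (record which levels had to be allocated during the walk, then test that record against the architecture mask) can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: every `toy_` and `TOY_` name is hypothetical, the two-level table and the bit values are invented for illustration, and the sync hook is a stub that merely counts its calls.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's "level modified" bits. */
#define TOY_PTE_MODIFIED (1u << 0)
#define TOY_PMD_MODIFIED (1u << 1)

/* Assumption: this toy architecture must sync when a PMD entry changes. */
#define TOY_ARCH_SYNC_MASK TOY_PMD_MODIFIED

#define TOY_PTRS_PER_PMD 4
#define TOY_PTRS_PER_PTE 8

struct toy_pmd {
	unsigned long *pte_table[TOY_PTRS_PER_PMD]; /* NULL until allocated */
};

static int toy_sync_calls; /* number of times the fake sync hook ran */

static void toy_arch_sync(unsigned long start, unsigned long end)
{
	(void)start;
	(void)end;
	toy_sync_calls++;
}

/*
 * Walk page indices [start, end). With create != 0, missing PTE tables
 * are allocated and the allocation is recorded in a local mask.
 * Returns 0 on success, -1 if a table is missing or cannot be allocated.
 */
static int toy_apply_to_range(struct toy_pmd *pmd, unsigned long start,
			      unsigned long end, int create)
{
	unsigned int mask = 0;
	unsigned long i;
	int err = 0;

	assert(start < end && end <= TOY_PTRS_PER_PMD * TOY_PTRS_PER_PTE);

	for (i = start; i < end; i++) {
		unsigned long slot = i / TOY_PTRS_PER_PTE;

		if (pmd->pte_table[slot])
			continue;
		if (!create) {
			err = -1;
			break;
		}
		pmd->pte_table[slot] = calloc(TOY_PTRS_PER_PTE,
					      sizeof(unsigned long));
		if (!pmd->pte_table[slot]) {
			err = -1;
			break;
		}
		mask |= TOY_PMD_MODIFIED; /* a directory entry was populated */
	}

	/* Sync only if a level this architecture cares about was touched. */
	if (TOY_ARCH_SYNC_MASK & mask)
		toy_arch_sync(start, end);

	return err;
}
```

In this model, a first call with `create` set populates the missing tables and triggers exactly one sync, while a repeated call over the same range finds everything in place and skips the sync. That is the saving over unconditionally assuming the worst.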
Quoting Joerg Roedel (2020-08-21 10:51:29)
> On Fri, Aug 21, 2020 at 09:50:08AM +0100, Chris Wilson wrote:
> > The alloc_vm_area() is another method for drivers to
> > vmap/map_kernel_range that uses apply_to_page_range() rather than the
> > direct vmalloc walkers. This is missing the page table modification
> > tracking, and the ability to synchronize the PTE updates afterwards.
> > Provide flush_vm_area() for the users of alloc_vm_area() that assumes
> > the worst and ensures that the page directories are correctly flushed
> > upon construction.
> >
> > The impact is most pronounced on x86_32 due to the delayed set_pmd().
> >
> > Reported-by: Pavel Machek <pavel@ucw.cz>
> > References: 2ba3e6947aed ("mm/vmalloc: track which page-table levels were modified")
> > References: 86cf69f1d893 ("x86/mm/32: implement arch_sync_kernel_mappings()")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Joerg Roedel <jroedel@suse.de>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > Cc: Dave Airlie <airlied@redhat.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > Cc: Pavel Machek <pavel@ucw.cz>
> > Cc: David Vrabel <david.vrabel@citrix.com>
> > Cc: <stable@vger.kernel.org> # v5.8+
> > ---
> >  include/linux/vmalloc.h |  1 +
> >  mm/vmalloc.c            | 16 ++++++++++++++++
> >  2 files changed, 17 insertions(+)
> >
> > diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> > index 0221f852a7e1..a253b27df0ac 100644
> > --- a/include/linux/vmalloc.h
> > +++ b/include/linux/vmalloc.h
> > @@ -204,6 +204,7 @@ static inline void set_vm_flush_reset_perms(void *addr)
> >
> >  /* Allocate/destroy a 'vmalloc' VM area. */
> >  extern struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes);
> > +extern void flush_vm_area(struct vm_struct *area);
> >  extern void free_vm_area(struct vm_struct *area);
> >
> >  /* for /dev/kmem */
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index b482d240f9a2..c41934486031 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3078,6 +3078,22 @@ struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes)
> >  }
> >  EXPORT_SYMBOL_GPL(alloc_vm_area);
> >
> > +void flush_vm_area(struct vm_struct *area)
> > +{
> > +	unsigned long addr = (unsigned long)area->addr;
> > +
> > +	/* apply_to_page_range() doesn't track the damage, assume the worst */
> > +	if (ARCH_PAGE_TABLE_SYNC_MASK & (PGTBL_PTE_MODIFIED |
> > +					 PGTBL_PMD_MODIFIED |
> > +					 PGTBL_PUD_MODIFIED |
> > +					 PGTBL_P4D_MODIFIED |
> > +					 PGTBL_PGD_MODIFIED))
> > +		arch_sync_kernel_mappings(addr, addr + area->size);
>
> This should happen in __apply_to_page_range() directly and look like
> this:

Ok. I thought it had to be after assigning the *ptep. If we apply the
sync first, do not have to worry about PGTBL_PTE_MODIFIED from the
*ptep?
-Chris

On Fri, Aug 21, 2020 at 10:54:22AM +0100, Chris Wilson wrote:
> Ok. I thought it had to be after assigning the *ptep. If we apply the
> sync first, do not have to worry about PGTBL_PTE_MODIFIED from the
> *ptep?

Hmm, if I understand the code correctly, you are re-implementing some
generic ioremap/vmalloc mapping logic in the i915 driver. I don't know
the reason, but if it is valid you need to manually call
arch_sync_kernel_mappings() from your driver like this to be correct:

	if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PTE_MODIFIED)
		arch_sync_kernel_mappings();

In practice this is a no-op, because nobody sets PGTBL_PTE_MODIFIED in
ARCH_PAGE_TABLE_SYNC_MASK, so the above code would be optimized away.

But what you really care about is the tracking in apply_to_page_range(),
as that allocates the !pte levels of your page-table, which needs
synchronization on x86-32.

Btw, what are the reasons you can't use generic vmalloc/ioremap
interfaces to map the range?

Regards,

	Joerg

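The point that the PTE-level check is a no-op follows from the mask being a compile-time constant: if the PTE bit is not in it, the test is constant-false and the compiler can discard the call. A minimal userspace sketch of that reasoning follows. All `toy_` and `TOY_` names and the bit values are hypothetical, and the mask is assumed to contain only a PMD bit, loosely modelled on the x86-32 case described above.

```c
/* Hypothetical "level modified" bits; the values are illustrative only. */
#define TOY_PTE_MODIFIED (1u << 0)
#define TOY_PMD_MODIFIED (1u << 1)

/* Assumption: the toy architecture only asks for syncs on PMD changes. */
#define TOY_ARCH_SYNC_MASK TOY_PMD_MODIFIED

/*
 * The PTE-only test is decidable at compile time. Because it is false
 * here, an "if (TOY_ARCH_SYNC_MASK & TOY_PTE_MODIFIED)" branch is dead
 * code that the optimizer may drop entirely.
 */
_Static_assert((TOY_ARCH_SYNC_MASK & TOY_PTE_MODIFIED) == 0,
	       "PTE-level changes never require a sync in this model");

/* Returns 1 if any level in 'levels' requires a sync, 0 otherwise. */
static int toy_needs_sync(unsigned int levels)
{
	return (TOY_ARCH_SYNC_MASK & levels) != 0;
}
```

With this mask, `toy_needs_sync(TOY_PTE_MODIFIED)` is 0, while any set of levels that includes `TOY_PMD_MODIFIED` yields 1. This is why only the tracking of the upper levels matters in the model.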
Quoting Joerg Roedel (2020-08-21 11:22:04)
> On Fri, Aug 21, 2020 at 10:54:22AM +0100, Chris Wilson wrote:
> > Ok. I thought it had to be after assigning the *ptep. If we apply the
> > sync first, do not have to worry about PGTBL_PTE_MODIFIED from the
> > *ptep?
>
> Hmm, if I understand the code correctly, you are re-implementing some
> generic ioremap/vmalloc mapping logic in the i915 driver. I don't know
> the reason, but if it is valid you need to manually call
> arch_sync_kernel_mappings() from your driver like this to be correct:
>
> 	if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PTE_MODIFIED)
> 		arch_sync_kernel_mappings();
>
> In practice this is a no-op, because nobody sets PGTBL_PTE_MODIFIED in
> ARCH_PAGE_TABLE_SYNC_MASK, so the above code would be optimized away.
>
> But what you really care about is the tracking in apply_to_page_range(),
> as that allocates the !pte levels of your page-table, which needs
> synchronization on x86-32.
>
> Btw, what are the reasons you can't use generic vmalloc/ioremap
> interfaces to map the range?

ioremap_prot and ioremap_page_range assume a contiguous IO address. So
we needed to allocate the vmalloc area [and would then need to iterate
over the discontiguous iomem chunks with ioremap_page_range], and since
alloc_vm_area returned the ptep, it looked clearer to then assign those
according to whether we wanted ioremapping or a plain page. So we ended
up with one call to the core to return us a vm_struct and a pte array
that worked for either backing store.
-Chris

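The approach described in this reply (one virtually contiguous area whose PTE slots are assigned one by one from discontiguous backing chunks, with the entry type chosen per backing store) can be modelled in userspace as follows. This is a sketch under stated assumptions and not the i915 code: the `toy_` names, the PTE encoding, and the attribute bits are all hypothetical.

```c
#include <stddef.h>

/* A physically contiguous run of pages; a mapping is built from several. */
struct toy_chunk {
	unsigned long first_pfn;
	size_t npages;
};

#define TOY_PFN_SHIFT   12
#define TOY_PTE_PRESENT 0x1UL
#define TOY_PTE_IO      0x2UL /* hypothetical "I/O memory" attribute bit */

/* Encode one toy PTE for either an I/O page or a plain page. */
static unsigned long toy_mk_pte(unsigned long pfn, int is_io)
{
	return (pfn << TOY_PFN_SHIFT) | TOY_PTE_PRESENT |
	       (is_io ? TOY_PTE_IO : 0UL);
}

/*
 * Fill consecutive slots of 'ptes' from the chunks, in order, so that
 * discontiguous backing pages appear contiguous in the virtual area.
 * Returns the number of slots written, or 0 (writing nothing) if the
 * chunks do not fit into 'max' slots.
 */
static size_t toy_fill_ptes(unsigned long *ptes, size_t max,
			    const struct toy_chunk *chunks, size_t nchunks,
			    int is_io)
{
	size_t total = 0, n = 0, c, i;

	for (c = 0; c < nchunks; c++)
		total += chunks[c].npages;
	if (total > max)
		return 0;

	for (c = 0; c < nchunks; c++)
		for (i = 0; i < chunks[c].npages; i++)
			ptes[n++] = toy_mk_pte(chunks[c].first_pfn + i, is_io);

	return n;
}
```

The same fill loop serves both backing stores; only the `is_io` flag changes the encoded entry. That is the convenience the reply attributes to getting a PTE array back from a single core call.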
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 0221f852a7e1..a253b27df0ac 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -204,6 +204,7 @@ static inline void set_vm_flush_reset_perms(void *addr)
 
 /* Allocate/destroy a 'vmalloc' VM area. */
 extern struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes);
+extern void flush_vm_area(struct vm_struct *area);
 extern void free_vm_area(struct vm_struct *area);
 
 /* for /dev/kmem */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index b482d240f9a2..c41934486031 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3078,6 +3078,22 @@ struct vm_struct *alloc_vm_area(size_t size, pte_t **ptes)
 }
 EXPORT_SYMBOL_GPL(alloc_vm_area);
 
+void flush_vm_area(struct vm_struct *area)
+{
+	unsigned long addr = (unsigned long)area->addr;
+
+	/* apply_to_page_range() doesn't track the damage, assume the worst */
+	if (ARCH_PAGE_TABLE_SYNC_MASK & (PGTBL_PTE_MODIFIED |
+					 PGTBL_PMD_MODIFIED |
+					 PGTBL_PUD_MODIFIED |
+					 PGTBL_P4D_MODIFIED |
+					 PGTBL_PGD_MODIFIED))
+		arch_sync_kernel_mappings(addr, addr + area->size);
+
+	flush_cache_vmap(addr, area->size);
+}
+EXPORT_SYMBOL_GPL(flush_vm_area);
+
 void free_vm_area(struct vm_struct *area)
 {
 	struct vm_struct *ret;

The alloc_vm_area() is another method for drivers to
vmap/map_kernel_range that uses apply_to_page_range() rather than the
direct vmalloc walkers. This is missing the page table modification
tracking, and the ability to synchronize the PTE updates afterwards.
Provide flush_vm_area() for the users of alloc_vm_area() that assumes
the worst and ensures that the page directories are correctly flushed
upon construction.

The impact is most pronounced on x86_32 due to the delayed set_pmd().

Reported-by: Pavel Machek <pavel@ucw.cz>
References: 2ba3e6947aed ("mm/vmalloc: track which page-table levels were modified")
References: 86cf69f1d893 ("x86/mm/32: implement arch_sync_kernel_mappings()")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: <stable@vger.kernel.org> # v5.8+
---
 include/linux/vmalloc.h |  1 +
 mm/vmalloc.c            | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)