Message ID | 20200714070220.3500839-13-ira.weiny@intel.com (mailing list archive)
---|---
State | Superseded
Series | PKS: Add Protection Keys Supervisor (PKS) support
On Tue, Jul 14, 2020 at 12:02:17AM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
>
> Device managed pages may have additional protections. These protections
> need to be removed prior to valid use by kernel users.
>
> Check for special treatment of device managed pages in kmap and take
> action if needed. We use kmap as an interface for generic kernel code
> because under normal circumstances it would be a bug for general kernel
> code to not use kmap prior to accessing kernel memory. Therefore, this
> should allow any valid kernel users to seamlessly use these pages
> without issues.
>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
>  include/linux/highmem.h | 32 +++++++++++++++++++++++++++++++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index d6e82e3de027..7f809d8d5a94 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -8,6 +8,7 @@
>  #include <linux/mm.h>
>  #include <linux/uaccess.h>
>  #include <linux/hardirq.h>
> +#include <linux/memremap.h>
>
>  #include <asm/cacheflush.h>
>
> @@ -31,6 +32,20 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
>
>  #include <asm/kmap_types.h>
>
> +static inline void enable_access(struct page *page)
> +{
> +	if (!page_is_access_protected(page))
> +		return;
> +	dev_access_enable();
> +}
> +
> +static inline void disable_access(struct page *page)
> +{
> +	if (!page_is_access_protected(page))
> +		return;
> +	dev_access_disable();
> +}

So, if I followed along correctly, you're proposing to do a WRMSR per
k{,un}map{_atomic}(), sounds like excellent performance all-round :-(
On Tue, Jul 14, 2020 at 10:44:51AM +0200, Peter Zijlstra wrote:
> On Tue, Jul 14, 2020 at 12:02:17AM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> >
> > Device managed pages may have additional protections. These protections
> > need to be removed prior to valid use by kernel users.
> >
> > Check for special treatment of device managed pages in kmap and take
> > action if needed. We use kmap as an interface for generic kernel code
> > because under normal circumstances it would be a bug for general kernel
> > code to not use kmap prior to accessing kernel memory. Therefore, this
> > should allow any valid kernel users to seamlessly use these pages
> > without issues.
> >
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > ---
> >  include/linux/highmem.h | 32 +++++++++++++++++++++++++++++++-
> >  1 file changed, 31 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> > index d6e82e3de027..7f809d8d5a94 100644
> > --- a/include/linux/highmem.h
> > +++ b/include/linux/highmem.h
> > @@ -8,6 +8,7 @@
> >  #include <linux/mm.h>
> >  #include <linux/uaccess.h>
> >  #include <linux/hardirq.h>
> > +#include <linux/memremap.h>
> >
> >  #include <asm/cacheflush.h>
> >
> > @@ -31,6 +32,20 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
> >
> >  #include <asm/kmap_types.h>
> >
> > +static inline void enable_access(struct page *page)
> > +{
> > +	if (!page_is_access_protected(page))
> > +		return;
> > +	dev_access_enable();
> > +}
> > +
> > +static inline void disable_access(struct page *page)
> > +{
> > +	if (!page_is_access_protected(page))
> > +		return;
> > +	dev_access_disable();
> > +}
>
> So, if I followed along correctly, you're proposing to do a WRMSR per
> k{,un}map{_atomic}(), sounds like excellent performance all-round :-(

Only to pages which have this additional protection, ie not DRAM.

User mappings of this memory are not affected (would be covered by User PKeys if
desired).  User mappings to persistent memory are the primary use case and the
performant path.

Ira
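The "not DRAM" distinction above relies entirely on the per-page check being cheap for ordinary memory. Below is a minimal sketch of what such a gate could look like; the PGMAP_PROT_ENABLED flag name and the exact helper layout are assumptions for illustration, not taken from the series.

static inline bool page_is_access_protected(struct page *page)
{
	/* Ordinary DRAM pages take this early exit and never reach the MSR write. */
	if (!is_zone_device_page(page))
		return false;

	/* PGMAP_PROT_ENABLED is a hypothetical flag name used only in this sketch. */
	return page->pgmap->flags & PGMAP_PROT_ENABLED;
}

With a check shaped like this, the WRMSR cost Peter is concerned about is confined to ZONE_DEVICE pages that have opted into the protection.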
On Tue, Jul 14, 2020 at 12:06:16PM -0700, Ira Weiny wrote:
> On Tue, Jul 14, 2020 at 10:44:51AM +0200, Peter Zijlstra wrote:
> > So, if I followed along correctly, you're proposing to do a WRMSR per
> > k{,un}map{_atomic}(), sounds like excellent performance all-round :-(
>
> Only to pages which have this additional protection, ie not DRAM.
>
> User mappings of this memory are not affected (would be covered by User PKeys if
> desired).  User mappings to persistent memory are the primary use case and the
> performant path.

Because performance to non-volatile memory doesn't matter? I think Dave
has a better answer here ...
On 7/14/20 12:29 PM, Peter Zijlstra wrote:
> On Tue, Jul 14, 2020 at 12:06:16PM -0700, Ira Weiny wrote:
>> On Tue, Jul 14, 2020 at 10:44:51AM +0200, Peter Zijlstra wrote:
>>> So, if I followed along correctly, you're proposing to do a WRMSR per
>>> k{,un}map{_atomic}(), sounds like excellent performance all-round :-(
>> Only to pages which have this additional protection, ie not DRAM.
>>
>> User mappings of this memory are not affected (would be covered by User PKeys if
>> desired).  User mappings to persistent memory are the primary use case and the
>> performant path.
> Because performance to non-volatile memory doesn't matter? I think Dave
> has a better answer here ...

So, these WRMSRs are less evil than normal.  They're architecturally
non-serializing instructions, just like the others in the SDM WRMSR
documentation:

	Note that WRMSR to the IA32_TSC_DEADLINE MSR (MSR index 6E0H)
	and the X2APIC MSRs (MSR indices 802H to 83FH) are not
	serializing.

This section of the SDM needs to be updated for the PKRS.  Also note
that the PKRS WRMSR is similar in its ordering properties to WRPKRU:

	WRPKRU will never execute speculatively. Memory accesses
	affected by PKRU register will not execute (even speculatively)
	until all prior executions of WRPKRU have completed execution
	and updated the PKRU register.

Which means we don't have to do silliness like LFENCE before WRMSR to
get ordering *back*.  This is another tidbit that needs to get added to
the SDM.  It should probably also get captured in the changelog.

But, either way, this *will* make accessing PMEM more expensive from the
kernel.  No escaping that.  But, we've also got customers saying they
won't deploy PMEM until we mitigate this stray write issue.  Those folks
are quite willing to pay the increased in-kernel cost for increased
protection from stray kernel writes.  Intel is also quite motivated
because we really like increasing the number of PMEM deployments.  :)

Ira, can you make sure this all gets pulled into the changelogs somewhere?
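To make the ordering point concrete, here is a hedged sketch of a PKRS update helper; the MSR_IA32_PKRS name, the per-cpu cache, and the function name are assumptions for illustration rather than code from the series. The key property from the SDM text quoted above is that no LFENCE is needed before the WRMSR.

static DEFINE_PER_CPU(u32, pkrs_cache);

static inline void write_pkrs(u32 new_pkrs)
{
	u32 *pkrs = get_cpu_ptr(&pkrs_cache);

	if (*pkrs != new_pkrs) {
		*pkrs = new_pkrs;
		/*
		 * Non-serializing WRMSR, but later memory accesses covered
		 * by PKRS will not execute (even speculatively) until the
		 * write completes, so no fence is required here.
		 */
		wrmsrl(MSR_IA32_PKRS, new_pkrs);
	}
	put_cpu_ptr(&pkrs_cache);
}

Caching the last written value per CPU also avoids redundant WRMSRs when the same protection state is requested back to back.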
On Tue, Jul 14, 2020 at 12:42:11PM -0700, Dave Hansen wrote:
> On 7/14/20 12:29 PM, Peter Zijlstra wrote:
> > On Tue, Jul 14, 2020 at 12:06:16PM -0700, Ira Weiny wrote:
> >> On Tue, Jul 14, 2020 at 10:44:51AM +0200, Peter Zijlstra wrote:
> >>> So, if I followed along correctly, you're proposing to do a WRMSR per
> >>> k{,un}map{_atomic}(), sounds like excellent performance all-round :-(
> >> Only to pages which have this additional protection, ie not DRAM.
> >>
> >> User mappings of this memory are not affected (would be covered by User PKeys if
> >> desired).  User mappings to persistent memory are the primary use case and the
> >> performant path.
> > Because performance to non-volatile memory doesn't matter? I think Dave
> > has a better answer here ...
>
> So, these WRMSRs are less evil than normal.  They're architecturally
> non-serializing instructions,

Excellent, that should make these a fair bit faster than regular MSRs.

> But, either way, this *will* make accessing PMEM more expensive from the
> kernel.  No escaping that.

There's no free lunch, it's just that regular MSRs are fairly horrible.
On Tue, Jul 14, 2020 at 12:42:11PM -0700, Dave Hansen wrote:
> On 7/14/20 12:29 PM, Peter Zijlstra wrote:
> > On Tue, Jul 14, 2020 at 12:06:16PM -0700, Ira Weiny wrote:
> >> On Tue, Jul 14, 2020 at 10:44:51AM +0200, Peter Zijlstra wrote:
> >>> So, if I followed along correctly, you're proposing to do a WRMSR per
> >>> k{,un}map{_atomic}(), sounds like excellent performance all-round :-(
> >> Only to pages which have this additional protection, ie not DRAM.
> >>
> >> User mappings of this memory are not affected (would be covered by User PKeys if
> >> desired).  User mappings to persistent memory are the primary use case and the
> >> performant path.
> > Because performance to non-volatile memory doesn't matter? I think Dave
> > has a better answer here ...
>
> So, these WRMSRs are less evil than normal.  They're architecturally
> non-serializing instructions, just like the others in the SDM WRMSR
> documentation:
>
> 	Note that WRMSR to the IA32_TSC_DEADLINE MSR (MSR index 6E0H)
> 	and the X2APIC MSRs (MSR indices 802H to 83FH) are not
> 	serializing.
>
> This section of the SDM needs to be updated for the PKRS.  Also note
> that the PKRS WRMSR is similar in its ordering properties to WRPKRU:
>
> 	WRPKRU will never execute speculatively. Memory accesses
> 	affected by PKRU register will not execute (even speculatively)
> 	until all prior executions of WRPKRU have completed execution
> 	and updated the PKRU register.
>
> Which means we don't have to do silliness like LFENCE before WRMSR to
> get ordering *back*.  This is another tidbit that needs to get added to
> the SDM.  It should probably also get captured in the changelog.
>
> But, either way, this *will* make accessing PMEM more expensive from the
> kernel.  No escaping that.  But, we've also got customers saying they
> won't deploy PMEM until we mitigate this stray write issue.  Those folks
> are quite willing to pay the increased in-kernel cost for increased
> protection from stray kernel writes.  Intel is also quite motivated
> because we really like increasing the number of PMEM deployments.  :)
>
> Ira, can you make sure this all gets pulled into the changelogs somewhere?

Yes of course.  Thanks for writing that up.

Ira
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index d6e82e3de027..7f809d8d5a94 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -8,6 +8,7 @@
 #include <linux/mm.h>
 #include <linux/uaccess.h>
 #include <linux/hardirq.h>
+#include <linux/memremap.h>
 
 #include <asm/cacheflush.h>
 
@@ -31,6 +32,20 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
 
 #include <asm/kmap_types.h>
 
+static inline void enable_access(struct page *page)
+{
+	if (!page_is_access_protected(page))
+		return;
+	dev_access_enable();
+}
+
+static inline void disable_access(struct page *page)
+{
+	if (!page_is_access_protected(page))
+		return;
+	dev_access_disable();
+}
+
 #ifdef CONFIG_HIGHMEM
 extern void *kmap_atomic_high_prot(struct page *page, pgprot_t prot);
 extern void kunmap_atomic_high(void *kvaddr);
@@ -55,6 +70,11 @@ static inline void *kmap(struct page *page)
 	else
 		addr = kmap_high(page);
 	kmap_flush_tlb((unsigned long)addr);
+	/*
+	 * Even non-highmem pages may have additional access protections which
+	 * need to be checked and potentially enabled.
+	 */
+	enable_access(page);
 	return addr;
 }
 
@@ -63,6 +83,11 @@ void kunmap_high(struct page *page);
 static inline void kunmap(struct page *page)
 {
 	might_sleep();
+	/*
+	 * Even non-highmem pages may have additional access protections which
+	 * need to be checked and potentially disabled.
+	 */
+	disable_access(page);
 	if (!PageHighMem(page))
 		return;
 	kunmap_high(page);
@@ -85,6 +110,7 @@ static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot)
 {
 	preempt_disable();
 	pagefault_disable();
+	enable_access(page);
 	if (!PageHighMem(page))
 		return page_address(page);
 	return kmap_atomic_high_prot(page, prot);
@@ -137,6 +163,7 @@ static inline unsigned long totalhigh_pages(void) { return 0UL; }
 static inline void *kmap(struct page *page)
 {
 	might_sleep();
+	enable_access(page);
 	return page_address(page);
 }
 
@@ -146,6 +173,7 @@ static inline void kunmap_high(struct page *page)
 
 static inline void kunmap(struct page *page)
 {
+	disable_access(page);
 #ifdef ARCH_HAS_FLUSH_ON_KUNMAP
 	kunmap_flush_on_unmap(page_address(page));
 #endif
@@ -155,6 +183,7 @@ static inline void *kmap_atomic(struct page *page)
 {
 	preempt_disable();
 	pagefault_disable();
+	enable_access(page);
 	return page_address(page);
 }
 #define kmap_atomic_prot(page, prot)	kmap_atomic(page)
@@ -216,7 +245,8 @@ static inline void kmap_atomic_idx_pop(void)
 #define kunmap_atomic(addr)                                     \
 do {                                                            \
 	BUILD_BUG_ON(__same_type((addr), struct page *));       \
-	kunmap_atomic_high(addr);                               \
+	disable_access(kmap_to_page(addr));                     \
+	kunmap_atomic_high(addr);                               \
 	pagefault_enable();                                     \
 	preempt_enable();                                       \
 } while (0)
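For context on how the interface is meant to be consumed: generic kernel code keeps using the existing kmap API unchanged, and the enable/disable hooks above fire only for protected device pages. An illustrative caller (not from the series; the function name and arguments are made up for this example):

static void copy_to_dev_page(struct page *page, const void *src, size_t len)
{
	void *addr = kmap(page);	/* lifts the protection if the page needs it */

	memcpy(addr, src, len);
	kunmap(page);			/* restores the protection */
}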