@@ -4,25 +4,30 @@
Memory Protection Keys
======================
-Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
-which is found on Intel's Skylake (and later) "Scalable Processor"
-Server CPUs. It will be available in future non-server Intel parts
-and future AMD processors.
+Memory Protection Keys provide a mechanism for enforcing page-based
+protections, but without requiring modification of the page tables
+when an application changes protection domains.
-For anyone wishing to test or use this feature, it is available in
-Amazon's EC2 C5 instances and is known to work there using an Ubuntu
-17.04 image.
+PKeys Userspace (PKU) is a feature which is found on Intel's Skylake "Scalable
+Processor" Server CPUs and later. And it will be available in future
+non-server Intel parts and future AMD processors.
-Memory Protection Keys provides a mechanism for enforcing page-based
-protections, but without requiring modification of the page tables
-when an application changes protection domains. It works by
-dedicating 4 previously ignored bits in each page table entry to a
-"protection key", giving 16 possible keys.
+Protection Keys for Supervisor pages (PKS) is available in the SDM since May
+2020.
+
+pkeys work by dedicating 4 previously Reserved bits in each page table entry to
+a "protection key", giving 16 possible keys. User and Supervisor pages are
+treated separately.
+
+Protections for each page are controlled with per-CPU registers for each type
+of page User and Supervisor. Each of these 32-bit register stores two separate
+bits (Access Disable and Write Disable) for each key.
-There is also a new user-accessible register (PKRU) with two separate
-bits (Access Disable and Write Disable) for each key. Being a CPU
-register, PKRU is inherently thread-local, potentially giving each
-thread a different set of protections from every other thread.
+For Userspace the register is user-accessible (rdpkru/wrpkru). For
+Supervisor, the register (MSR_IA32_PKRS) is accessible only to the kernel.
+
+Being a CPU register, pkeys are inherently thread-local, potentially giving
+each thread an independent set of protections from every other thread.
There are two new instructions (RDPKRU/WRPKRU) for reading and writing
to the new register. The feature is only available in 64-bit mode,
@@ -30,8 +35,11 @@ even though there is theoretically space in the PAE PTEs. These
permissions are enforced on data access only and have no effect on
instruction fetches.
-Syscalls
-========
+For kernel space rdmsr/wrmsr are used to access the kernel MSRs.
+
+
+Syscalls for user space keys
+============================
There are 3 system calls which directly interact with pkeys::
@@ -98,3 +106,80 @@ with a read()::
The kernel will send a SIGSEGV in both cases, but si_code will be set
to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
the plain mprotect() permissions are violated.
+
+
+Kernel API for PKS support
+==========================
+
+Similar to user space pkeys, supervisor pkeys allow additional protections to
+be defined for a supervisor mappings. Unlike user space pkeys, violations of
+these protections result in a a kernel oops.
+
+Supervisor Memory Protection Keys (PKS) is a feature which is found on Intel's
+Sapphire Rapids (and later) "Scalable Processor" Server CPUs. It will also be
+available in future non-server Intel parts.
+
+Also qemu has some support as well: https://www.qemu.org/2021/04/30/qemu-6-0-0/
+
+Kernel users intending to use PKS support should depend on
+ARCH_HAS_SUPERVISOR_PKEYS, and add their config to ARCH_ENABLE_SUPERVISOR_PKEYS
+to turn on this support within the core.
+
+Users reserve a key value by adding an entry to the enum pks_pkey_consumers and
+defining the initial protections in the consumer_defaults[] array.
+
+For example to configure a key for 'MY_FEATURE' with a default of Write
+Disabled.
+
+::
+
+ enum pks_pkey_consumers
+ {
+ PKS_KEY_DEFAULT,
+ PKS_KEY_MY_FEATURE,
+ PKS_KEY_NR_CONSUMERS
+ }
+
+ ...
+ consumer_defaults[PKS_KEY_DEFAULT] = 0;
+ consumer_defaults[PKS_KEY_MY_FEATURE] = PKR_DISABLE_WRITE;
+ ...
+
+The following interface is used to manipulate the 'protection domain' defined
+by a pkey within the kernel. Setting a pkey value in a supervisor PTE adds
+this additional protection to the page.
+
+::
+
+ #define PAGE_KERNEL_PKEY(pkey)
+ #define _PAGE_KEY(pkey)
+ bool pks_enabled(void);
+ void pks_mk_noaccess(int pkey);
+ void pks_mk_readonly(int pkey);
+ void pks_mk_readwrite(int pkey);
+
+pks_enabled() allows users to know if PKS is configured and available on the
+current running system.
+
+Kernel users must set the pkey in the page table entries for the mappings they
+want to protect. This can be done with PAGE_KERNEL_PKEY() or _PAGE_KEY().
+
+The pks_mk*() family of calls allow indinvidual threads to change the
+protections for the domain identified by the pkey parameter. 3 states are
+available: pks_mk_noaccess(), pks_mk_readonly(), and pks_mk_readwrite() which
+set the access to none, read, and read/write respectively.
+
+The interface sets (Access Disabled (AD=1)) for all keys not in use.
+
+It should be noted that the underlying WRMSR(MSR_IA32_PKRS) is not serializing
+but still maintains ordering properties similar to WRPKRU. Thus it is safe to
+immediately use a mapping when the pks_mk*() functions return.
+
+Older versions of the SDM on PKRS may be wrong with regard to this
+serialization. The text should be the same as that of WRPKRU. From the WRPKRU
+text:
+
+ WRPKRU will never execute transiently. Memory accesses
+ affected by PKRU register will not execute (even transiently)
+ until all prior executions of WRPKRU have completed execution
+ and updated the PKRU register.
@@ -71,6 +71,12 @@
_PAGE_PKEY_BIT2 | \
_PAGE_PKEY_BIT3)
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+#define _PAGE_PKEY(pkey) (_AT(pteval_t, pkey) << _PAGE_BIT_PKEY_BIT0)
+#else
+#define _PAGE_PKEY(pkey) (_AT(pteval_t, 0))
+#endif
+
#if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
#define _PAGE_KNL_ERRATUM_MASK (_PAGE_DIRTY | _PAGE_ACCESSED)
#else
@@ -226,6 +232,12 @@ enum page_cache_mode {
#define PAGE_KERNEL_IO __pgprot_mask(__PAGE_KERNEL_IO)
#define PAGE_KERNEL_IO_NOCACHE __pgprot_mask(__PAGE_KERNEL_IO_NOCACHE)
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+#define PAGE_KERNEL_PKEY(pkey) __pgprot_mask(__PAGE_KERNEL | _PAGE_PKEY(pkey))
+#else
+#define PAGE_KERNEL_PKEY(pkey) PAGE_KERNEL
+#endif
+
#endif /* __ASSEMBLY__ */
/* xwr */
@@ -3,6 +3,9 @@
* Intel Memory Protection Keys management
* Copyright (c) 2015, Intel Corporation.
*/
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/pkeys: " fmt
+
#include <linux/debugfs.h> /* debugfs_create_u32() */
#include <linux/mm_types.h> /* mm_struct, vma, etc... */
#include <linux/pkeys.h> /* PKEY_* */
@@ -10,6 +13,7 @@
#include <asm/cpufeature.h> /* boot_cpu_has, ... */
#include <asm/mmu_context.h> /* vma_pkey() */
+#include <asm/pks.h>
int __execute_only_pkey(struct mm_struct *mm)
{
@@ -301,4 +305,66 @@ void pks_init_task(struct task_struct *task)
task->thread.saved_pkrs = pkrs_init_value;
}
+bool pks_enabled(void)
+{
+ return cpu_feature_enabled(X86_FEATURE_PKS);
+}
+
+/*
+ * Do not call this directly, see pks_mk*() below.
+ *
+ * @pkey: Key for the domain to change
+ * @protection: protection bits to be used
+ *
+ * Protection utilizes the same protection bits specified for User pkeys
+ * PKEY_DISABLE_ACCESS
+ * PKEY_DISABLE_WRITE
+ *
+ */
+static inline void pks_update_protection(int pkey, unsigned long protection)
+{
+ current->thread.saved_pkrs = update_pkey_val(current->thread.saved_pkrs,
+ pkey, protection);
+ pkrs_write_current();
+}
+
+/**
+ * pks_mk_noaccess() - Disable all access to the domain
+ * @pkey the pkey for which the access should change.
+ *
+ * Disable all access to the domain specified by pkey. This is not a global
+ * update and only affects the current running thread.
+ */
+void pks_mk_noaccess(int pkey)
+{
+ pks_update_protection(pkey, PKEY_DISABLE_ACCESS);
+}
+EXPORT_SYMBOL_GPL(pks_mk_noaccess);
+
+/**
+ * pks_mk_readonly() - Make the domain Read only
+ * @pkey the pkey for which the access should change.
+ *
+ * Allow read access to the domain specified by pkey. This is not a global
+ * update and only affects the current running thread.
+ */
+void pks_mk_readonly(int pkey)
+{
+ pks_update_protection(pkey, PKEY_DISABLE_WRITE);
+}
+EXPORT_SYMBOL_GPL(pks_mk_readonly);
+
+/**
+ * pks_mk_readwrite() - Make the domain Read/Write
+ * @pkey the pkey for which the access should change.
+ *
+ * Allow all access, read and write, to the domain specified by pkey. This is
+ * not a global update and only affects the current running thread.
+ */
+void pks_mk_readwrite(int pkey)
+{
+ pks_update_protection(pkey, 0);
+}
+EXPORT_SYMBOL_GPL(pks_mk_readwrite);
+
#endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
@@ -1526,6 +1526,10 @@ static inline bool arch_has_pfn_modify_check(void)
# define PAGE_KERNEL_EXEC PAGE_KERNEL
#endif
+#ifndef PAGE_KERNEL_PKEY
+#define PAGE_KERNEL_PKEY(pkey) PAGE_KERNEL
+#endif
+
/*
* Page Table Modification bits for pgtbl_mod_mask.
*
@@ -56,11 +56,25 @@ extern u32 pkrs_init_value;
void pkrs_save_irq(struct pt_regs *regs);
void pkrs_restore_irq(struct pt_regs *regs);
+bool pks_enabled(void);
+void pks_mk_noaccess(int pkey);
+void pks_mk_readonly(int pkey);
+void pks_mk_readwrite(int pkey);
+
#else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
static inline void pkrs_save_irq(struct pt_regs *regs) { }
static inline void pkrs_restore_irq(struct pt_regs *regs) { }
+static inline bool pks_enabled(void)
+{
+ return false;
+}
+
+static inline void pks_mk_noaccess(int pkey) {}
+static inline void pks_mk_readonly(int pkey) {}
+static inline void pks_mk_readwrite(int pkey) {}
+
#endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
#endif /* _LINUX_PKEYS_H */