Message ID | 20191104200141.5385-1-jarkko.sakkinen@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | [for,v24,1/3] x86/sgx: Use GFP_KERNEL for allocations |

On Mon, Nov 04, 2019 at 10:01:39PM +0200, Jarkko Sakkinen wrote:
> The reasoning is the same as in
>
> http://git.infradead.org/users/jjs/linux-tpmdd.git/commit/abd55954f91a3aacc1d260d2411cf776ec4d5fd2
>
> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/ioctl.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> index 5b28a9c0cb68..d53aee5a64c1 100644
> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> @@ -259,7 +259,7 @@ static long sgx_ioc_enclave_create(struct sgx_encl *encl, void __user *arg)
>  	if (copy_from_user(&ecreate, arg, sizeof(ecreate)))
>  		return -EFAULT;
>
> -	secs_page = alloc_page(GFP_HIGHUSER);
> +	secs_page = alloc_page(GFP_KERNEL);
>  	if (!secs_page)
>  		return -ENOMEM;
>
> @@ -674,7 +674,7 @@ static long sgx_ioc_enclave_init(struct sgx_encl *encl, void __user *arg)
>  	if (copy_from_user(&einit, arg, sizeof(einit)))
>  		return -EFAULT;
>
> -	initp_page = alloc_page(GFP_HIGHUSER);
> +	initp_page = alloc_page(GFP_KERNEL);

Would it make sense to use GFP_KERNEL_ACCOUNT?  The accounting would be
weird for the case where userspace is using a builder process, but even in
that case it's not flat out wrong to account per-enclave memory allocations.

>  	if (!initp_page)
>  		return -ENOMEM;
>
> --
> 2.20.1
>

On Mon, Nov 04, 2019 at 12:46:02PM -0800, Sean Christopherson wrote:
> On Mon, Nov 04, 2019 at 10:01:39PM +0200, Jarkko Sakkinen wrote:
> > @@ -674,7 +674,7 @@ static long sgx_ioc_enclave_init(struct sgx_encl *encl, void __user *arg)
> >  	if (copy_from_user(&einit, arg, sizeof(einit)))
> >  		return -EFAULT;
> >
> > -	initp_page = alloc_page(GFP_HIGHUSER);
> > +	initp_page = alloc_page(GFP_KERNEL);
>
> Would it make sense to use GFP_KERNEL_ACCOUNT?  The accounting would be
> weird for the case where userspace is using a builder process, but even in
> that case it's not flat out wrong to account per-enclave memory allocations.

I did not find a single call site that would use that for allocating
memory for function-internal data.

/Jarkko

On Tue, Nov 05, 2019 at 12:26:58AM +0200, Jarkko Sakkinen wrote:
> On Mon, Nov 04, 2019 at 12:46:02PM -0800, Sean Christopherson wrote:
> > Would it make sense to use GFP_KERNEL_ACCOUNT?  The accounting would be
> > weird for the case where userspace is using a builder process, but even in
> > that case it's not flat out wrong to account per-enclave memory allocations.
>
> I did not find a single call site that would use that for allocating
> memory for function-internal data.

Actually, the fact that the allocations are transient is an even better
argument for accounting the memory, as the weirdness I was referring to
doesn't exist for the builder concept.

But looking more closely, Documentation/core-api/memory-allocation.rst
states:

  * Untrusted allocations triggered from userspace should be a subject
    of kmem accounting and must have ``__GFP_ACCOUNT`` bit set. There
    is the handy ``GFP_KERNEL_ACCOUNT`` shortcut for ``GFP_KERNEL``
    allocations that should be accounted.

That means all uses of GFP_KERNEL except in sgx_alloc_epc_section() should
be converted to GFP_KERNEL_ACCOUNT.  As is, depending on fd limits[*], a
single process can easily burn through multiple GBs of memory simply by
opening /dev/sgx/enclave in a loop.

[*] AFAICT, systemd is upping the max number of open files to 1M on my
    systems.  I don't _think_ I changed a setting anywhere?

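To make the fd-limit concern above concrete, here is a small userspace model of what accounting changes. It is a sketch, not kernel code: `toy_cgroup`, `toy_alloc()`, `toy_open_loop()` and the `TOY_*` flag values are all hypothetical stand-ins, and the per-open allocation size is an assumed parameter, since the thread does not say how much memory one open of /dev/sgx/enclave pins. The only thing borrowed from the kernel is the shape of the shortcut: GFP_KERNEL_ACCOUNT is GFP_KERNEL with the __GFP_ACCOUNT bit set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy flag bits; the values are made up for this model. */
#define TOY_GFP_KERNEL          0x1u
#define TOY_GFP_ACCOUNT         0x2u
#define TOY_GFP_KERNEL_ACCOUNT  (TOY_GFP_KERNEL | TOY_GFP_ACCOUNT)

/* Stand-in for a memory cgroup: a byte limit and the bytes charged so far. */
struct toy_cgroup {
	uint64_t limit;
	uint64_t charged;
};

/*
 * Pretend to allocate 'size' bytes on behalf of 'cg'.
 * Accounted allocations are charged to the cgroup and fail once the
 * limit would be exceeded; unaccounted ones always succeed in this model.
 */
static bool toy_alloc(struct toy_cgroup *cg, unsigned int flags, uint64_t size)
{
	if (!(flags & TOY_GFP_ACCOUNT))
		return true;
	if (size > cg->limit - cg->charged)
		return false;
	cg->charged += size;
	return true;
}

/*
 * Model "open the device in a loop": one allocation per open, up to
 * max_fds opens. Returns the total bytes held once the loop stops.
 */
static uint64_t toy_open_loop(struct toy_cgroup *cg, unsigned int flags,
			      uint64_t max_fds, uint64_t bytes_per_open)
{
	uint64_t total = 0;
	uint64_t i;

	for (i = 0; i < max_fds; i++) {
		if (!toy_alloc(cg, flags, bytes_per_open))
			break;
		total += bytes_per_open;
	}
	return total;
}
```

With a 1M fd limit and an assumed 4 KiB per open, the unaccounted loop ends up holding 4 GiB, which is in line with the "multiple GBs" estimate. The accounted loop stops as soon as the cgroup limit is reached, whatever the fd limit is.
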
On Mon, Nov 04, 2019 at 06:17:20PM -0800, Sean Christopherson wrote:
> But looking more closely, Documentation/core-api/memory-allocation.rst
> states:
>
>   * Untrusted allocations triggered from userspace should be a subject
>     of kmem accounting and must have ``__GFP_ACCOUNT`` bit set. There
>     is the handy ``GFP_KERNEL_ACCOUNT`` shortcut for ``GFP_KERNEL``
>     allocations that should be accounted.
>
> That means all uses of GFP_KERNEL except in sgx_alloc_epc_section() should
> be converted to GFP_KERNEL_ACCOUNT.  As is, depending on fd limits[*], a
> single process can easily burn through multiple GBs of memory simply by
> opening /dev/sgx/enclave in a loop.

What does the documentation mean by an untrusted allocation?

__GFP_ACCOUNT and GFP_KERNEL_ACCOUNT are both quite alien flags to me,
as is kmemcg. They are things that I know exist but have never had to
deal with.

Looking at the kernel source code they rarely get used. Many drivers
have process bound data structures but none of the drivers use these
flags. I'm wondering why.

Why is sgx_alloc_epc_section() a use case, given that it is something
that allocates memory for the global EPC database?

> [*] AFAICT, systemd is upping the max number of open files to 1M on my
>     systems.  I don't _think_ I changed a setting anywhere?

/Jarkko

On Wed, Nov 06, 2019 at 11:54:38PM +0200, Jarkko Sakkinen wrote:
> Looking at the kernel source code they rarely get used. Many drivers
> have process bound data structures but none of the drivers use these
> flags. I'm wondering why.

Anyway, the tree is now updated:

diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 5b82670bb79a..d53aee5a64c1 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -259,7 +259,7 @@ static long sgx_ioc_enclave_create(struct sgx_encl *encl, void __user *arg)
 	if (copy_from_user(&ecreate, arg, sizeof(ecreate)))
 		return -EFAULT;

-	secs_page = alloc_page(GFP_HIGHUSER);
+	secs_page = alloc_page(GFP_KERNEL);
 	if (!secs_page)
 		return -ENOMEM;

@@ -427,12 +427,20 @@ static int sgx_encl_add_page(struct sgx_encl *encl,

 	if (addp->flags & SGX_PAGE_MEASURE) {
 		ret = __sgx_encl_extend(encl, epc_page);
-		if (ret)
+
+		/*
+		 * Destroy the enclave if EEXTEND fails, EADD can't be undone.
+		 * Note, destroy() also frees the resources for the added page.
+		 */
+		if (ret) {
 			sgx_encl_destroy(encl);
-		else
-			sgx_mark_page_reclaimable(encl_page->epc_page);
+			goto out_unlock;
+		}
 	}

+	sgx_mark_page_reclaimable(encl_page->epc_page);
+
+out_unlock:
 	mutex_unlock(&encl->lock);
 	up_read(&current->mm->mmap_sem);
 	return ret;
@@ -666,7 +674,7 @@ static long sgx_ioc_enclave_init(struct sgx_encl *encl, void __user *arg)
 	if (copy_from_user(&einit, arg, sizeof(einit)))
 		return -EFAULT;

-	initp_page = alloc_page(GFP_HIGHUSER);
+	initp_page = alloc_page(GFP_KERNEL);
 	if (!initp_page)
 		return -ENOMEM;

Hope that all the updates will be fairly localized :-)

/Jarkko

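The sgx_encl_add_page() hunk in the message above also changes control flow, which is easy to miss next to the GFP changes. Read literally, the old code called sgx_mark_page_reclaimable() only in the else branch inside the SGX_PAGE_MEASURE block, while the new code marks every page that was added without error and skips the marking only when the extend step fails. The following userspace sketch mirrors that flow under stated assumptions: `toy_encl`, `toy_add_page()` and `TOY_PAGE_MEASURE` are hypothetical names, and the `extend_ret` parameter stands in for the return value of __sgx_encl_extend().

```c
#include <stdbool.h>

#define TOY_PAGE_MEASURE 0x1u

/* Records which driver steps would have run; not a real SGX structure. */
struct toy_encl {
	bool destroyed;        /* the sgx_encl_destroy() step ran */
	bool page_reclaimable; /* the sgx_mark_page_reclaimable() step ran */
	bool locked;           /* models holding encl->lock */
};

/* extend_ret plays the role of the __sgx_encl_extend() return value. */
static int toy_add_page(struct toy_encl *encl, unsigned int flags, int extend_ret)
{
	int ret = 0;

	encl->locked = true;

	if (flags & TOY_PAGE_MEASURE) {
		ret = extend_ret;
		/* A failed extend cannot be rolled back, so tear everything down. */
		if (ret) {
			encl->destroyed = true;
			goto out_unlock;
		}
	}

	/* Reached for unmeasured pages and for successfully measured ones. */
	encl->page_reclaimable = true;

out_unlock:
	/* Single exit path: the lock is dropped on success and on failure. */
	encl->locked = false;
	return ret;
}
```

Calling `toy_add_page()` with `TOY_PAGE_MEASURE` and a nonzero `extend_ret` leaves `destroyed` set and `page_reclaimable` clear. Calling it without the flag marks the page reclaimable regardless of `extend_ret`, which is the case the old else branch never reached.
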
On Wed, Nov 06, 2019 at 11:59:42PM +0200, Jarkko Sakkinen wrote:
> Anyway, the tree is now updated:
>
> [...]
>
> Hope that all the updates will be fairly localized :-)

Also removed some patches on top that were pushed by accident (patches
under review).

/Jarkko

diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 5b28a9c0cb68..d53aee5a64c1 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -259,7 +259,7 @@ static long sgx_ioc_enclave_create(struct sgx_encl *encl, void __user *arg)
 	if (copy_from_user(&ecreate, arg, sizeof(ecreate)))
 		return -EFAULT;

-	secs_page = alloc_page(GFP_HIGHUSER);
+	secs_page = alloc_page(GFP_KERNEL);
 	if (!secs_page)
 		return -ENOMEM;

@@ -674,7 +674,7 @@ static long sgx_ioc_enclave_init(struct sgx_encl *encl, void __user *arg)
 	if (copy_from_user(&einit, arg, sizeof(einit)))
 		return -EFAULT;

-	initp_page = alloc_page(GFP_HIGHUSER);
+	initp_page = alloc_page(GFP_KERNEL);
 	if (!initp_page)
 		return -ENOMEM;

The reasoning is the same as in

http://git.infradead.org/users/jjs/linux-tpmdd.git/commit/abd55954f91a3aacc1d260d2411cf776ec4d5fd2

Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 arch/x86/kernel/cpu/sgx/ioctl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)