Message ID | 20190617222438.2080-2-sean.j.christopherson@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | security: x86/sgx: SGX vs. LSM, round 3 | expand |
On 6/17/19 3:24 PM, Sean Christopherson wrote: > +static int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm) > +{ > + struct sgx_encl_mm *encl_mm; > + > + encl_mm = kzalloc(sizeof(*encl_mm), GFP_KERNEL); > + if (!encl_mm) > + return -ENOMEM; > + > + encl_mm->encl = encl; > + encl_mm->mm = mm; > + kref_init(&encl_mm->refcount); > + > + spin_lock(&encl->mm_lock); > + list_add(&encl_mm->list, &encl->mm_list); > + spin_unlock(&encl->mm_lock); > + > + return 0; > +} This probably need commenting and/or changelogging about the mm refcounting or lack thereof.
On Mon, Jun 17, 2019 at 3:24 PM Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > The enclave mm tracking is currently broken: > > - Adding current->mm during ECREATE is wrong as there is no guarantee > that the current process has mmap()'d the enclave, i.e. there may > never be an associated sgx_vma_close() to drop the encl_mm. > > - Adding mm's at sgx_vma_open() is wrong as vm_ops->open is called > only when splitting or duplicating a vma. If userspace performs a > single mmap() on the enclave then SGX will fail to track the mm. > This bug is partially hidden by tracking current->mm at ECREATE. > > Rework the tracking to get/add the mm at mmap(). A side effect of the > bug fix is that sgx_vma_{open,close}() should never encounter a vma with > an associated enclave and no associated encl_mm, i.e. WARN if an encl_mm > cannot be found in either condition. > It would be nifty if you could also kill .vm_close, since then VMAs could be merged properly. Would this be straightforward? --Andy
On Mon, Jun 17, 2019 at 04:42:59PM -0700, Andy Lutomirski wrote: > On Mon, Jun 17, 2019 at 3:24 PM Sean Christopherson > <sean.j.christopherson@intel.com> wrote: > > > > The enclave mm tracking is currently broken: > > > > - Adding current->mm during ECREATE is wrong as there is no guarantee > > that the current process has mmap()'d the enclave, i.e. there may > > never be an associated sgx_vma_close() to drop the encl_mm. > > > > - Adding mm's at sgx_vma_open() is wrong as vm_ops->open is called > > only when splitting or duplicating a vma. If userspace performs a > > single mmap() on the enclave then SGX will fail to track the mm. > > This bug is partially hidden by tracking current->mm at ECREATE. > > > > Rework the tracking to get/add the mm at mmap(). A side effect of the > > bug fix is that sgx_vma_{open,close}() should never encounter a vma with > > an associated enclave and no associated encl_mm, i.e. WARN if an encl_mm > > cannot be found in either condition. > > > > It would be nifty if you could also kill .vm_close, since then VMAs > could be merged properly. Would this be straightforward? Hmm, we probably could move the mm tracking to f_op->{open,release}. The downside to that approach is that EPC reclaim would unnecessarily walk the vmas for processes that have opened the enclave but not mapped any EPC pages. In the grand scheme, that's a minor issue and probably worth the tradeoff of vma merging. On the plus side, in addition to zapping ->close, I think it would allow for a simpler vma walking scheme. Maybe.
On Tue, Jun 18, 2019 at 07:11:47AM -0700, Sean Christopherson wrote: > On Mon, Jun 17, 2019 at 04:42:59PM -0700, Andy Lutomirski wrote: > > On Mon, Jun 17, 2019 at 3:24 PM Sean Christopherson > > <sean.j.christopherson@intel.com> wrote: > > > > > > The enclave mm tracking is currently broken: > > > > > > - Adding current->mm during ECREATE is wrong as there is no guarantee > > > that the current process has mmap()'d the enclave, i.e. there may > > > never be an associated sgx_vma_close() to drop the encl_mm. > > > > > > - Adding mm's at sgx_vma_open() is wrong as vm_ops->open is called > > > only when splitting or duplicating a vma. If userspace performs a > > > single mmap() on the enclave then SGX will fail to track the mm. > > > This bug is partially hidden by tracking current->mm at ECREATE. > > > > > > Rework the tracking to get/add the mm at mmap(). A side effect of the > > > bug fix is that sgx_vma_{open,close}() should never encounter a vma with > > > an associated enclave and no associated encl_mm, i.e. WARN if an encl_mm > > > cannot be found in either condition. > > > > > > > It would be nifty if you could also kill .vm_close, since then VMAs > > could be merged properly. Would this be straightforward? > > Hmm, we probably could move the mm tracking to f_op->{open,release}. The Scratch that thought, pre-coffee brain was thinking there was a magical file op that was called whenever a new reference to the file was acquired. The only idea I can think of to avoid .vm_close() would be to not refcount the mm at all, and instead register a mmu notifier at .mmap() and use mmu_notifier.release() to remove the mm. The downside is that the mm tracking would be sticky, i.e. once a process mmap()'d an enclave it would forever be tracked by the enclave. Practically speaking, in the worst case this would add a small amount of overhead to reclaiming from an enclave that was temporarily mapped by multiple processes. So it might be a viable option? > downside to that approach is that EPC reclaim would unnecessarily walk the > vmas for processes that have opened the enclave but not mapped any EPC > pages. In the grand scheme, that's a minor issue and probably worth the > tradeoff of vma merging. > > On the plus side, in addition to zapping ->close, I think it would allow > for a simpler vma walking scheme. Maybe.
On Mon, 2019-06-17 at 15:24 -0700, Sean Christopherson wrote: > The enclave mm tracking is currently broken: > > - Adding current->mm during ECREATE is wrong as there is no guarantee > that the current process has mmap()'d the enclave, i.e. there may > never be an associated sgx_vma_close() to drop the encl_mm. > > - Adding mm's at sgx_vma_open() is wrong as vm_ops->open is called > only when splitting or duplicating a vma. If userspace performs a > single mmap() on the enclave then SGX will fail to track the mm. > This bug is partially hidden by tracking current->mm at ECREATE. > > Rework the tracking to get/add the mm at mmap(). A side effect of the > bug fix is that sgx_vma_{open,close}() should never encounter a vma with > an associated enclave and no associated encl_mm, i.e. WARN if an encl_mm > cannot be found in either condition. > > Change the WARN() on a non-empty mm_list to a WARN_ONCE(). The warning > will fire over and over (and over) if the mm tracking is broken, which > hampers debug/triage far more than it helps. > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> > --- > arch/x86/kernel/cpu/sgx/driver/ioctl.c | 14 ---------- > arch/x86/kernel/cpu/sgx/driver/main.c | 26 ++++++++++++++++++ > arch/x86/kernel/cpu/sgx/encl.c | 38 +++++--------------------- > arch/x86/kernel/cpu/sgx/encl.h | 4 +-- > 4 files changed, 35 insertions(+), 47 deletions(-) BTW, sgx_encl_mm is a bit confusing name (which I made) to begin with. It clues as it was a some kind of 1:1 association with the process, which it isn't. Could you update the patch and rename it as sgx_encl_mapping? After that I'm happy to merge. /Jarkko
On Wed, 2019-06-19 at 15:56 +0300, Jarkko Sakkinen wrote: > On Mon, 2019-06-17 at 15:24 -0700, Sean Christopherson wrote: > > The enclave mm tracking is currently broken: > > > > - Adding current->mm during ECREATE is wrong as there is no guarantee > > that the current process has mmap()'d the enclave, i.e. there may > > never be an associated sgx_vma_close() to drop the encl_mm. > > > > - Adding mm's at sgx_vma_open() is wrong as vm_ops->open is called > > only when splitting or duplicating a vma. If userspace performs a > > single mmap() on the enclave then SGX will fail to track the mm. > > This bug is partially hidden by tracking current->mm at ECREATE. > > > > Rework the tracking to get/add the mm at mmap(). A side effect of the > > bug fix is that sgx_vma_{open,close}() should never encounter a vma with > > an associated enclave and no associated encl_mm, i.e. WARN if an encl_mm > > cannot be found in either condition. > > > > Change the WARN() on a non-empty mm_list to a WARN_ONCE(). The warning > > will fire over and over (and over) if the mm tracking is broken, which > > hampers debug/triage far more than it helps. > > > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> > > --- > > arch/x86/kernel/cpu/sgx/driver/ioctl.c | 14 ---------- > > arch/x86/kernel/cpu/sgx/driver/main.c | 26 ++++++++++++++++++ > > arch/x86/kernel/cpu/sgx/encl.c | 38 +++++--------------------- > > arch/x86/kernel/cpu/sgx/encl.h | 4 +-- > > 4 files changed, 35 insertions(+), 47 deletions(-) > > BTW, sgx_encl_mm is a bit confusing name (which I made) to begin with. > It clues as it was a some kind of 1:1 association with the process, > which it isn't. > > Could you update the patch and rename it as sgx_encl_mapping? After that > I'm happy to merge. And send it as a separate patch. Can merge it today/tomorrow after I've finished my reaper change. /Jarkko
On Wed, Jun 19, 2019 at 04:00:14PM +0300, Jarkko Sakkinen wrote: > On Wed, 2019-06-19 at 15:56 +0300, Jarkko Sakkinen wrote: > > On Mon, 2019-06-17 at 15:24 -0700, Sean Christopherson wrote: > > > The enclave mm tracking is currently broken: > > > > > > - Adding current->mm during ECREATE is wrong as there is no guarantee > > > that the current process has mmap()'d the enclave, i.e. there may > > > never be an associated sgx_vma_close() to drop the encl_mm. > > > > > > - Adding mm's at sgx_vma_open() is wrong as vm_ops->open is called > > > only when splitting or duplicating a vma. If userspace performs a > > > single mmap() on the enclave then SGX will fail to track the mm. > > > This bug is partially hidden by tracking current->mm at ECREATE. > > > > > > Rework the tracking to get/add the mm at mmap(). A side effect of the > > > bug fix is that sgx_vma_{open,close}() should never encounter a vma with > > > an associated enclave and no associated encl_mm, i.e. WARN if an encl_mm > > > cannot be found in either condition. > > > > > > Change the WARN() on a non-empty mm_list to a WARN_ONCE(). The warning > > > will fire over and over (and over) if the mm tracking is broken, which > > > hampers debug/triage far more than it helps. > > > > > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> > > > --- > > > arch/x86/kernel/cpu/sgx/driver/ioctl.c | 14 ---------- > > > arch/x86/kernel/cpu/sgx/driver/main.c | 26 ++++++++++++++++++ > > > arch/x86/kernel/cpu/sgx/encl.c | 38 +++++--------------------- > > > arch/x86/kernel/cpu/sgx/encl.h | 4 +-- > > > 4 files changed, 35 insertions(+), 47 deletions(-) > > > > BTW, sgx_encl_mm is a bit confusing name (which I made) to begin with. > > It clues as it was a some kind of 1:1 association with the process, > > which it isn't. > > > > Could you update the patch and rename it as sgx_encl_mapping? After that > > I'm happy to merge. > > And send it as a separate patch. Can merge it today/tomorrow after I've > finished my reaper change. Still working/testing reaper stuff but this one is now applied. Lets not do any renames for the moment because they only slow down. That was a bad suggestion from my part. The reason I squashed this change is that it is a pure bug fix. If there is a bug, lets try to just fix that with absolutely minimal intrusion. Only after that consider a "better way". /Jarkko
diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c b/arch/x86/kernel/cpu/sgx/driver/ioctl.c index d17c60dca114..3552d642b26f 100644 --- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c +++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c @@ -276,7 +276,6 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs) { unsigned long encl_size = secs->size + PAGE_SIZE; struct sgx_epc_page *secs_epc; - struct sgx_encl_mm *encl_mm; unsigned long ssaframesize; struct sgx_pageinfo pginfo; struct sgx_secinfo secinfo; @@ -311,12 +310,6 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs) INIT_WORK(&encl->work, sgx_add_page_worker); - encl_mm = sgx_encl_mm_add(encl, current->mm); - if (IS_ERR(encl_mm)) { - ret = PTR_ERR(encl_mm); - goto err_out; - } - secs_epc = sgx_alloc_page(&encl->secs, true); if (IS_ERR(secs_epc)) { ret = PTR_ERR(secs_epc); @@ -369,13 +362,6 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs) encl->backing = NULL; } - if (!list_empty(&encl->mm_list)) { - encl_mm = list_first_entry(&encl->mm_list, struct sgx_encl_mm, - list); - list_del(&encl_mm->list); - kfree(encl_mm); - } - mutex_unlock(&encl->lock); return ret; } diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c index 87735ce8b5ba..03eca4bccee1 100644 --- a/arch/x86/kernel/cpu/sgx/driver/main.c +++ b/arch/x86/kernel/cpu/sgx/driver/main.c @@ -60,9 +60,35 @@ static long sgx_compat_ioctl(struct file *filep, unsigned int cmd, } #endif +static int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm) +{ + struct sgx_encl_mm *encl_mm; + + encl_mm = kzalloc(sizeof(*encl_mm), GFP_KERNEL); + if (!encl_mm) + return -ENOMEM; + + encl_mm->encl = encl; + encl_mm->mm = mm; + kref_init(&encl_mm->refcount); + + spin_lock(&encl->mm_lock); + list_add(&encl_mm->list, &encl->mm_list); + spin_unlock(&encl->mm_lock); + + return 0; +} + static int sgx_mmap(struct file *file, struct vm_area_struct *vma) { struct sgx_encl *encl = file->private_data; + int ret; + + if (!sgx_encl_get_mm(encl, vma->vm_mm)) { + ret = sgx_encl_mm_add(encl, vma->vm_mm); + if (ret) + return ret; + } vma->vm_ops = &sgx_vm_ops; vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO; diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index 6b190eccd02e..6e9f3a41a40d 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -132,26 +132,6 @@ static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl, return entry; } -struct sgx_encl_mm *sgx_encl_mm_add(struct sgx_encl *encl, - struct mm_struct *mm) -{ - struct sgx_encl_mm *encl_mm; - - encl_mm = kzalloc(sizeof(*encl_mm), GFP_KERNEL); - if (!encl_mm) - return ERR_PTR(-ENOMEM); - - encl_mm->encl = encl; - encl_mm->mm = mm; - kref_init(&encl_mm->refcount); - - spin_lock(&encl->mm_lock); - list_add(&encl_mm->list, &encl->mm_list); - spin_unlock(&encl->mm_lock); - - return encl_mm; -} - void sgx_encl_mm_release(struct kref *ref) { struct sgx_encl_mm *encl_mm = @@ -164,8 +144,8 @@ void sgx_encl_mm_release(struct kref *ref) kfree(encl_mm); } -static struct sgx_encl_mm *sgx_encl_get_mm(struct sgx_encl *encl, - struct mm_struct *mm) +struct sgx_encl_mm *sgx_encl_get_mm(struct sgx_encl *encl, + struct mm_struct *mm) { struct sgx_encl_mm *encl_mm = NULL; struct sgx_encl_mm *prev_mm = NULL; @@ -190,10 +170,10 @@ static struct sgx_encl_mm *sgx_encl_get_mm(struct sgx_encl *encl, return NULL; } + static void sgx_vma_open(struct vm_area_struct *vma) { struct sgx_encl *encl = vma->vm_private_data; - struct sgx_encl_mm *encl_mm; if (!encl) return; @@ -201,12 +181,8 @@ static void sgx_vma_open(struct vm_area_struct *vma) if (encl->flags & SGX_ENCL_DEAD) goto error; - encl_mm = sgx_encl_get_mm(encl, vma->vm_mm); - if (!encl_mm) { - encl_mm = sgx_encl_mm_add(encl, vma->vm_mm); - if (IS_ERR(encl_mm)) - goto error; - } + if (WARN_ON_ONCE(!sgx_encl_get_mm(encl, vma->vm_mm))) + goto error; kref_get(&encl->refcount); return; @@ -224,7 +200,7 @@ static void sgx_vma_close(struct vm_area_struct *vma) return; encl_mm = sgx_encl_get_mm(encl, vma->vm_mm); - if (encl_mm) { + if (!WARN_ON_ONCE(!encl_mm)) { kref_put(&encl_mm->refcount, sgx_encl_mm_release); /* Release kref for the VMA. */ @@ -468,7 +444,7 @@ void sgx_encl_release(struct kref *ref) if (encl->backing) fput(encl->backing); - WARN(!list_empty(&encl->mm_list), "sgx: mm_list non-empty"); + WARN_ONCE(!list_empty(&encl->mm_list), "sgx: mm_list non-empty"); kfree(encl); } diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h index c557f0374d74..7b339334d875 100644 --- a/arch/x86/kernel/cpu/sgx/encl.h +++ b/arch/x86/kernel/cpu/sgx/encl.h @@ -120,9 +120,9 @@ pgoff_t sgx_encl_get_index(struct sgx_encl *encl, struct sgx_encl_page *page); struct page *sgx_encl_get_backing_page(struct sgx_encl *encl, pgoff_t index); struct sgx_encl_mm *sgx_encl_next_mm(struct sgx_encl *encl, struct sgx_encl_mm *encl_mm, int *iter); -struct sgx_encl_mm *sgx_encl_mm_add(struct sgx_encl *encl, - struct mm_struct *mm); void sgx_encl_mm_release(struct kref *ref); +struct sgx_encl_mm *sgx_encl_get_mm(struct sgx_encl *encl, + struct mm_struct *mm); int sgx_encl_test_and_clear_young(struct mm_struct *mm, struct sgx_encl_page *page); struct sgx_encl_page *sgx_encl_reserve_page(struct sgx_encl *encl,
The enclave mm tracking is currently broken: - Adding current->mm during ECREATE is wrong as there is no guarantee that the current process has mmap()'d the enclave, i.e. there may never be an associated sgx_vma_close() to drop the encl_mm. - Adding mm's at sgx_vma_open() is wrong as vm_ops->open is called only when splitting or duplicating a vma. If userspace performs a single mmap() on the enclave then SGX will fail to track the mm. This bug is partially hidden by tracking current->mm at ECREATE. Rework the tracking to get/add the mm at mmap(). A side effect of the bug fix is that sgx_vma_{open,close}() should never encounter a vma with an associated enclave and no associated encl_mm, i.e. WARN if an encl_mm cannot be found in either condition. Change the WARN() on a non-empty mm_list to a WARN_ONCE(). The warning will fire over and over (and over) if the mm tracking is broken, which hampers debug/triage far more than it helps. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> --- arch/x86/kernel/cpu/sgx/driver/ioctl.c | 14 ---------- arch/x86/kernel/cpu/sgx/driver/main.c | 26 ++++++++++++++++++ arch/x86/kernel/cpu/sgx/encl.c | 38 +++++--------------------- arch/x86/kernel/cpu/sgx/encl.h | 4 +-- 4 files changed, 35 insertions(+), 47 deletions(-)