| Message ID | 20201005141119.5395-1-jarkko.sakkinen@linux.intel.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | [v3] x86/sgx: Fix sgx_encl_may_map locking |
On Mon, Oct 05, 2020 at 05:11:19PM +0300, Jarkko Sakkinen wrote:
> Fix the issue further discussed in:

No, this is still utter crap. Just use the version I sent.

> 1. https://lore.kernel.org/linux-sgx/op.0rwbv916wjvjmi@mqcpg7oapc828.gar.corp.intel.com/
> 2. https://lore.kernel.org/linux-sgx/20201003195440.GD20115@casper.infradead.org/
>
> Reported-by: Haitao Huang <haitao.huang@linux.intel.com>
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Cc: Sean Christopherson <sean.j.christopherson@intel.com>
> Cc: Jethro Beekman <jethro@fortanix.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> ---
> v3:
> * Added the missing unlock pointed out by Matthew.
> * Tested with the correct patch (last time had v1 applied)
> * I don't know what happened to v2 changelog, checked from patchwork
>   and it wasn't there. Hope this is not scraped.
>
>  arch/x86/kernel/cpu/sgx/encl.c | 29 +++++++++++++++++++++++++----
>  1 file changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 4c6407cd857a..e91e521b03a8 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -307,6 +307,8 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
>  	unsigned long idx_start = PFN_DOWN(start);
>  	unsigned long idx_end = PFN_DOWN(end - 1);
>  	struct sgx_encl_page *page;
> +	unsigned long count = 0;
> +	int ret = 0;
>
>  	XA_STATE(xas, &encl->page_array, idx_start);
>
> @@ -317,11 +319,30 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
>  	if (current->personality & READ_IMPLIES_EXEC)
>  		return -EACCES;
>
> -	xas_for_each(&xas, page, idx_end)
> -		if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
> -			return -EACCES;
> +	/*
> +	 * No need to hold encl->lock:
> +	 * 1. None of the page->* get written.
> +	 * 2. page->vm_max_prot_bits is set in sgx_encl_page_alloc(). This
> +	 *    is before calling xa_insert(). After that it is never modified.
> +	 */
> +	xas_lock(&xas);
> +	xas_for_each(&xas, page, idx_end) {
> +		if (++count % XA_CHECK_SCHED)
> +			continue;
>
> -	return 0;
> +		if (!page || (~page->vm_max_prot_bits & vm_prot_bits)) {
> +			ret = -EACCES;
> +			break;
> +		}
> +
> +		xas_pause(&xas);
> +		xas_unlock(&xas);
> +		cond_resched();
> +		xas_lock(&xas);
> +	}
> +	xas_unlock(&xas);
> +
> +	return ret;
>  }
>
>  static int sgx_vma_mprotect(struct vm_area_struct *vma,
> --
> 2.25.1
On 10/5/20 7:11 AM, Jarkko Sakkinen wrote:
> +	unsigned long count = 0;
...
> +	xas_lock(&xas);
> +	xas_for_each(&xas, page, idx_end) {
> +		if (++count % XA_CHECK_SCHED)
> +			continue;

Let's slow down and think through the loop, please.

First time through the loop, count=0, XA_CHECK_SCHED=4096, it will do a
++count, and now count=1.  (count % XA_CHECK_SCHED) == 1.  It will
continue.  It skips the page->vm_max_prot_bits checks.

Next time through the loop, count=1, XA_CHECK_SCHED=4096, it will do a
++count, and now count=2.  (count % XA_CHECK_SCHED) == 2.  It will
continue.  It skips the page->vm_max_prot_bits checks.

...

It will do this until it hits count=4095 where it will actually fall
into the rest of the loop, doing the page->vm_max_prot_bits checks.

So, in the end the loop only does what it's supposed to be doing 1/4096
times.  Not great.  Don't we have tests that will notice breakage like
this?

The XA_CHECK_SCHED needs to be stuck near the *end* of the loop, just
before the lock dropping and resched stuff.
On Mon, Oct 05, 2020 at 05:11:19PM +0300, Jarkko Sakkinen wrote:
> @@ -317,11 +319,30 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
>  	if (current->personality & READ_IMPLIES_EXEC)
>  		return -EACCES;
>
> -	xas_for_each(&xas, page, idx_end)
> -		if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
> -			return -EACCES;
> +	/*
> +	 * No need to hold encl->lock:
> +	 * 1. None of the page->* get written.
> +	 * 2. page->vm_max_prot_bits is set in sgx_encl_page_alloc(). This
> +	 *    is before calling xa_insert(). After that it is never modified.
> +	 */

You forgot to cover racing with insertion, e.g. below is the snippet from my
original patch[*], which did the lookup without protection from encl->lock.

+	/*
+	 * No need to take encl->lock, vm_prot_bits is set prior to
+	 * insertion and never changes, and racing with adding pages is
+	 * a userspace bug.
+	 */
+	rcu_read_lock();
+	page = radix_tree_lookup(&encl->page_tree, idx);
+	rcu_read_unlock();

[*] https://patchwork.kernel.org/patch/11005431/
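Side note: the radix_tree_lookup() snippet above predates the driver's move to the XArray API. With the XArray-based page_array used in the v3 patch, the rough equivalent of a single unlocked lookup would be xa_load(), which takes the RCU read lock internally. The fragment below is only an illustrative sketch (idx is a hypothetical page index), not code from any patch in this thread.

	/*
	 * Sketch only: single-entry lookup in the XArray-based
	 * encl->page_array. xa_load() handles rcu_read_lock()
	 * internally, so no explicit RCU locking is needed for
	 * the lookup itself.
	 */
	page = xa_load(&encl->page_array, idx);
	if (!page)
		return -EACCES;	/* no enclave page at this index */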
On Mon, Oct 05, 2020 at 03:12:54PM +0100, Matthew Wilcox wrote:
> On Mon, Oct 05, 2020 at 05:11:19PM +0300, Jarkko Sakkinen wrote:
> > Fix the issue further discussed in:
>
> No, this is still utter crap. Just use the version I sent.

OK, just a quick recap to fully understand it.

Here's your response from the original thread:

https://lore.kernel.org/linux-sgx/20201005013053.GJ20115@casper.infradead.org/

And here's your snippet from v2:

https://lore.kernel.org/linux-sgx/20201005111139.GK20115@casper.infradead.org/

This is what confused me.

Comparing the (v3) patch with your version, the differences I spot are:

A. 'count' is checked first.
B. Both this and the snippet from the original thread (i.e.
   page-writeback.c) use the same loop construct.

Right, and the snippet from page-writeback.c is not compatible with this
because, looking at the documentation, an xas_find() continued with an
xas_next_entry() sequence will jump through the existing entries and
skip the holes, whereas xas_next() does not?

As for part A, I'm not sure how the order there matters semantically.

/Jarkko
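For reference, the page-writeback.c loop being discussed here (tag_pages_for_writeback(), in kernels around the time of this thread) is shaped roughly as in the sketch below; the detail that matters for point A is that the actual work, xas_set_mark(), runs before the ++count/XA_CHECK_SCHED test, so the continue only skips the resched logic, never the work. This is an approximate recollection, not an exact copy of the upstream function.

	unsigned int tagged = 0;
	XA_STATE(xas, &mapping->i_pages, start);
	void *page;

	xas_lock_irq(&xas);
	xas_for_each_marked(&xas, page, end, PAGECACHE_TAG_DIRTY) {
		/* The real work happens first, on every iteration. */
		xas_set_mark(&xas, PAGECACHE_TAG_TOWRITE);

		/* The periodic resched check sits at the tail of the loop. */
		if (++tagged % XA_CHECK_SCHED)
			continue;

		xas_pause(&xas);
		xas_unlock_irq(&xas);
		cond_resched();
		xas_lock_irq(&xas);
	}
	xas_unlock_irq(&xas);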
On Mon, Oct 05, 2020 at 08:25:39PM +0300, Jarkko Sakkinen wrote:
> On Mon, Oct 05, 2020 at 03:12:54PM +0100, Matthew Wilcox wrote:
> > On Mon, Oct 05, 2020 at 05:11:19PM +0300, Jarkko Sakkinen wrote:
> > > Fix the issue further discussed in:
> >
> > No, this is still utter crap. Just use the version I sent.
>
> OK, just a quick recap to fully understand it.
>
> Here's your response from the original thread:
>
> https://lore.kernel.org/linux-sgx/20201005013053.GJ20115@casper.infradead.org/
>
> And here's your snippet from v2:
>
> https://lore.kernel.org/linux-sgx/20201005111139.GK20115@casper.infradead.org/
>
> This is what confused me.
>
> Comparing the (v3) patch with your version, the differences I spot are:
>
> A. 'count' is checked first.
> B. Both this and the snippet from the original thread (i.e.
>    page-writeback.c) use the same loop construct.
>
> Right, and the snippet from page-writeback.c is not compatible with this
> because, looking at the documentation, an xas_find() continued with an
> xas_next_entry() sequence will jump through the existing entries and
> skip the holes, whereas xas_next() does not?
>
> As for part A, I'm not sure how the order there matters semantically.

Right, how blind I was with B. Read Dave's response.

/Jarkko
On Mon, Oct 05, 2020 at 07:28:38AM -0700, Dave Hansen wrote:
> On 10/5/20 7:11 AM, Jarkko Sakkinen wrote:
> > +	unsigned long count = 0;
> ...
> > +	xas_lock(&xas);
> > +	xas_for_each(&xas, page, idx_end) {
> > +		if (++count % XA_CHECK_SCHED)
> > +			continue;
>
> Let's slow down and think through the loop, please.
>
> First time through the loop, count=0, XA_CHECK_SCHED=4096, it will do a
> ++count, and now count=1.  (count % XA_CHECK_SCHED) == 1.  It will
> continue.  It skips the page->vm_max_prot_bits checks.
>
> Next time through the loop, count=1, XA_CHECK_SCHED=4096, it will do a
> ++count, and now count=2.  (count % XA_CHECK_SCHED) == 2.  It will
> continue.  It skips the page->vm_max_prot_bits checks.
>
> ...
>
> It will do this until it hits count=4095 where it will actually fall
> into the rest of the loop, doing the page->vm_max_prot_bits checks.

Uh oh, how stupid of me.

> So, in the end the loop only does what it's supposed to be doing 1/4096
> times.  Not great.  Don't we have tests that will notice breakage like
> this?

It would not be hard to have two counts and a WARN_ON() at the end to
compare them for a mismatch.

> The XA_CHECK_SCHED needs to be stuck near the *end* of the loop, just
> before the lock dropping and resched stuff.

Referring to Matthew's suggestion:

https://lore.kernel.org/linux-sgx/20201005111139.GK20115@casper.infradead.org/

I think the order there is just right. It takes the page, does the
processing, and at the tail does the check before rescheduling. The only
thing I'd do for clarity is change its tail like this:

		/* Move to the next iteration. */
		count++;

		/* Surpassed the modulus, reschedule. */
		if (!(count % XA_CHECK_SCHED)) {
			xas_pause(&xas);
			xas_unlock(&xas);
			cond_resched();
			xas_lock(&xas);
		}
	}
	xas_unlock(&xas);

I think this is just a slightly more readable way to express the same
thing. Matthew, can you agree with this small twist?

/Jarkko
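Putting Dave's ordering point and the tail above together, a hypothetical next spin of the loop in sgx_encl_may_map() could look like the sketch below. Identifiers are taken from the v3 patch in this thread; this is a sketch of the proposed shape, not the version that was eventually applied.

	xas_lock(&xas);
	xas_for_each(&xas, page, idx_end) {
		/* The permission check runs for every entry returned. */
		if (!page || (~page->vm_max_prot_bits & vm_prot_bits)) {
			ret = -EACCES;
			break;
		}

		/* Move to the next iteration. */
		count++;

		/* Every XA_CHECK_SCHED entries, drop the lock and reschedule. */
		if (!(count % XA_CHECK_SCHED)) {
			xas_pause(&xas);
			xas_unlock(&xas);
			cond_resched();
			xas_lock(&xas);
		}
	}
	xas_unlock(&xas);

	return ret;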
On Mon, Oct 05, 2020 at 08:55:19AM -0700, Sean Christopherson wrote:
> On Mon, Oct 05, 2020 at 05:11:19PM +0300, Jarkko Sakkinen wrote:
> > @@ -317,11 +319,30 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
> >  	if (current->personality & READ_IMPLIES_EXEC)
> >  		return -EACCES;
> >
> > -	xas_for_each(&xas, page, idx_end)
> > -		if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
> > -			return -EACCES;
> > +	/*
> > +	 * No need to hold encl->lock:
> > +	 * 1. None of the page->* get written.
> > +	 * 2. page->vm_max_prot_bits is set in sgx_encl_page_alloc(). This
> > +	 *    is before calling xa_insert(). After that it is never modified.
> > +	 */
>
> You forgot to cover racing with insertion, e.g. below is the snippet from my
> original patch[*], which did the lookup without protection from encl->lock.
>
> +	/*
> +	 * No need to take encl->lock, vm_prot_bits is set prior to
> +	 * insertion and never changes, and racing with adding pages is
> +	 * a userspace bug.
> +	 */
> +	rcu_read_lock();
> +	page = radix_tree_lookup(&encl->page_tree, idx);
> +	rcu_read_unlock();
>
> [*] https://patchwork.kernel.org/patch/11005431/

I'm not sure why that was merged as it was, but it probably was not
because of that snippet. It had encl->lock before, so it was by all
practical means covered then. I would have replaced encl->lock with that
if I had ever received a patch with just that change, i.e. that
particular snippet got lost in the noise. That's why it was not broken
before v36.

/Jarkko
On Mon, Oct 05, 2020 at 08:42:07PM +0300, Jarkko Sakkinen wrote:
> On Mon, Oct 05, 2020 at 08:55:19AM -0700, Sean Christopherson wrote:
> > On Mon, Oct 05, 2020 at 05:11:19PM +0300, Jarkko Sakkinen wrote:
> > > @@ -317,11 +319,30 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
> > >  	if (current->personality & READ_IMPLIES_EXEC)
> > >  		return -EACCES;
> > >
> > > -	xas_for_each(&xas, page, idx_end)
> > > -		if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
> > > -			return -EACCES;
> > > +	/*
> > > +	 * No need to hold encl->lock:
> > > +	 * 1. None of the page->* get written.
> > > +	 * 2. page->vm_max_prot_bits is set in sgx_encl_page_alloc(). This
> > > +	 *    is before calling xa_insert(). After that it is never modified.
> > > +	 */
> >
> > You forgot to cover racing with insertion, e.g. below is the snippet from my
> > original patch[*], which did the lookup without protection from encl->lock.
> >
> > +	/*
> > +	 * No need to take encl->lock, vm_prot_bits is set prior to
> > +	 * insertion and never changes, and racing with adding pages is
> > +	 * a userspace bug.
> > +	 */
> > +	rcu_read_lock();
> > +	page = radix_tree_lookup(&encl->page_tree, idx);
> > +	rcu_read_unlock();
> >
> > [*] https://patchwork.kernel.org/patch/11005431/
>
> I'm not sure why that was merged as it was, but it probably was not
> because of that snippet. It had encl->lock before, so it was by all
> practical means covered then. I would have replaced encl->lock with that
> if I had ever received a patch with just that change, i.e. that
> particular snippet got lost in the noise. That's why it was not broken
> before v36.

Or even if that patch was squashed (I can't recall that far back),
encl->lock was not left in place intentionally. I have no idea why
encl->lock was not replaced, but the code was not broken.

/Jarkko
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 4c6407cd857a..e91e521b03a8 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -307,6 +307,8 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
 	unsigned long idx_start = PFN_DOWN(start);
 	unsigned long idx_end = PFN_DOWN(end - 1);
 	struct sgx_encl_page *page;
+	unsigned long count = 0;
+	int ret = 0;
 
 	XA_STATE(xas, &encl->page_array, idx_start);
 
@@ -317,11 +319,30 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
 	if (current->personality & READ_IMPLIES_EXEC)
 		return -EACCES;
 
-	xas_for_each(&xas, page, idx_end)
-		if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
-			return -EACCES;
+	/*
+	 * No need to hold encl->lock:
+	 * 1. None of the page->* get written.
+	 * 2. page->vm_max_prot_bits is set in sgx_encl_page_alloc(). This
+	 *    is before calling xa_insert(). After that it is never modified.
+	 */
+	xas_lock(&xas);
+	xas_for_each(&xas, page, idx_end) {
+		if (++count % XA_CHECK_SCHED)
+			continue;
 
-	return 0;
+		if (!page || (~page->vm_max_prot_bits & vm_prot_bits)) {
+			ret = -EACCES;
+			break;
+		}
+
+		xas_pause(&xas);
+		xas_unlock(&xas);
+		cond_resched();
+		xas_lock(&xas);
+	}
+	xas_unlock(&xas);
+
+	return ret;
 }
 
 static int sgx_vma_mprotect(struct vm_area_struct *vma,
Fix the issue further discussed in:

1. https://lore.kernel.org/linux-sgx/op.0rwbv916wjvjmi@mqcpg7oapc828.gar.corp.intel.com/
2. https://lore.kernel.org/linux-sgx/20201003195440.GD20115@casper.infradead.org/

Reported-by: Haitao Huang <haitao.huang@linux.intel.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Jethro Beekman <jethro@fortanix.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
v3:
* Added the missing unlock pointed out by Matthew.
* Tested with the correct patch (last time had v1 applied)
* I don't know what happened to v2 changelog, checked from patchwork
  and it wasn't there. Hope this is not scraped.

 arch/x86/kernel/cpu/sgx/encl.c | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)