| Message ID | 20201005141119.5395-1-jarkko.sakkinen@linux.intel.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | [v3] x86/sgx: Fix sgx_encl_may_map locking |
On Mon, Oct 05, 2020 at 05:11:19PM +0300, Jarkko Sakkinen wrote:
> Fix the issue further discussed in:

No, this is still utter crap. Just use the version I sent.

> 1. https://lore.kernel.org/linux-sgx/op.0rwbv916wjvjmi@mqcpg7oapc828.gar.corp.intel.com/
> 2. https://lore.kernel.org/linux-sgx/20201003195440.GD20115@casper.infradead.org/
>
> Reported-by: Haitao Huang <haitao.huang@linux.intel.com>
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Cc: Sean Christopherson <sean.j.christopherson@intel.com>
> Cc: Jethro Beekman <jethro@fortanix.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> ---
> v3:
> * Added the missing unlock pointed out by Matthew.
> * Tested with the correct patch (last time had v1 applied)
> * I don't know what happened to v2 changelog, checked from patchwork
>   and it wasn't there. Hope this is not scraped.
>
>  arch/x86/kernel/cpu/sgx/encl.c | 29 +++++++++++++++++++++++++----
>  1 file changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 4c6407cd857a..e91e521b03a8 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -307,6 +307,8 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
>  	unsigned long idx_start = PFN_DOWN(start);
>  	unsigned long idx_end = PFN_DOWN(end - 1);
>  	struct sgx_encl_page *page;
> +	unsigned long count = 0;
> +	int ret = 0;
>
>  	XA_STATE(xas, &encl->page_array, idx_start);
>
> @@ -317,11 +319,30 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
>  	if (current->personality & READ_IMPLIES_EXEC)
>  		return -EACCES;
>
> -	xas_for_each(&xas, page, idx_end)
> -		if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
> -			return -EACCES;
> +	/*
> +	 * No need to hold encl->lock:
> +	 * 1. None of the page->* get written.
> +	 * 2. page->vm_max_prot_bits is set in sgx_encl_page_alloc(). This
> +	 *    is before calling xa_insert(). After that it is never modified.
> +	 */
> +	xas_lock(&xas);
> +	xas_for_each(&xas, page, idx_end) {
> +		if (++count % XA_CHECK_SCHED)
> +			continue;
>
> -	return 0;
> +		if (!page || (~page->vm_max_prot_bits & vm_prot_bits)) {
> +			ret = -EACCES;
> +			break;
> +		}
> +
> +		xas_pause(&xas);
> +		xas_unlock(&xas);
> +		cond_resched();
> +		xas_lock(&xas);
> +	}
> +	xas_unlock(&xas);
> +
> +	return ret;
>  }
>
>  static int sgx_vma_mprotect(struct vm_area_struct *vma,
> --
> 2.25.1
On 10/5/20 7:11 AM, Jarkko Sakkinen wrote:
> +	unsigned long count = 0;
...
> +	xas_lock(&xas);
> +	xas_for_each(&xas, page, idx_end) {
> +		if (++count % XA_CHECK_SCHED)
> +			continue;

Let's slow down and think through the loop, please.

First time through the loop, count=0, XA_CHECK_SCHED=4096, it will do a
++count, and now count=1.  (count % XA_CHECK_SCHED) == 1.  It will
continue.  It skips the page->vm_max_prot_bits checks.

Next time through the loop, count=1, XA_CHECK_SCHED=4096, it will do a
++count, and now count=2.  (count % XA_CHECK_SCHED) == 2.  It will
continue.  It skips the page->vm_max_prot_bits checks.

...

It will do this until it hits count=4095 where it will actually fall
into the rest of the loop, doing the page->vm_max_prot_bits checks.

So, in the end the loop only does what it's supposed to be doing 1/4096
times.  Not great.  Don't we have tests that will notice breakage like
this?

The XA_CHECK_SCHED needs to be stuck near the *end* of the loop, just
before the lock dropping and resched stuff.
On Mon, Oct 05, 2020 at 05:11:19PM +0300, Jarkko Sakkinen wrote:
> @@ -317,11 +319,30 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
>  	if (current->personality & READ_IMPLIES_EXEC)
>  		return -EACCES;
>
> -	xas_for_each(&xas, page, idx_end)
> -		if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
> -			return -EACCES;
> +	/*
> +	 * No need to hold encl->lock:
> +	 * 1. None of the page->* get written.
> +	 * 2. page->vm_max_prot_bits is set in sgx_encl_page_alloc(). This
> +	 *    is before calling xa_insert(). After that it is never modified.
> +	 */

You forgot to cover racing with insertion, e.g. below is the snippet from my
original patch[*], which did the lookup without protection from encl->lock.

+	/*
+	 * No need to take encl->lock, vm_prot_bits is set prior to
+	 * insertion and never changes, and racing with adding pages is
+	 * a userspace bug.
+	 */
+	rcu_read_lock();
+	page = radix_tree_lookup(&encl->page_tree, idx);
+	rcu_read_unlock();

[*] https://patchwork.kernel.org/patch/11005431/
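Side note: the radix_tree_lookup() snippet above predates the driver's move to the XArray API. With the XArray-based page_array used in the v3 patch, the rough equivalent of a single unlocked lookup would be xa_load(), which takes the RCU read lock internally. The fragment below is only an illustrative sketch (idx is a hypothetical page index), not code from any patch in this thread.

	/*
	 * Sketch only: single-entry lookup in the XArray-based
	 * encl->page_array. xa_load() handles rcu_read_lock()
	 * internally, so no explicit RCU locking is needed for
	 * the lookup itself.
	 */
	page = xa_load(&encl->page_array, idx);
	if (!page)
		return -EACCES;	/* no enclave page at this index */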
On Mon, Oct 05, 2020 at 03:12:54PM +0100, Matthew Wilcox wrote:
> On Mon, Oct 05, 2020 at 05:11:19PM +0300, Jarkko Sakkinen wrote:
> > Fix the issue further discussed in:
>
> No, this is still utter crap. Just use the version I sent.

OK, just a quick recap to fully understand it.

Here's your response from the original thread:

https://lore.kernel.org/linux-sgx/20201005013053.GJ20115@casper.infradead.org/

And here's your snippet from v2:

https://lore.kernel.org/linux-sgx/20201005111139.GK20115@casper.infradead.org/

This is what confused me.

Comparing the (v3) patch with your version, the differences I spot are:

A. 'count' is checked first.
B. Both this and the snippet from the original thread (i.e.
   page-writeback.c) use the same loop construct.

Right, and the snippet from page-writeback.c is not compatible with this
because, looking at the documentation, an xas_find() continued with an
xas_next_entry() sequence will jump through the existing entries and
skip the holes, whereas xas_next() does not?

As for part A, I'm not sure how the order there matters semantically.

/Jarkko
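For reference, the page-writeback.c loop being discussed here (tag_pages_for_writeback(), in kernels around the time of this thread) is shaped roughly as in the sketch below; the detail that matters for point A is that the actual work, xas_set_mark(), runs before the ++count/XA_CHECK_SCHED test, so the continue only skips the resched logic, never the work. This is an approximate recollection, not an exact copy of the upstream function.

	unsigned int tagged = 0;
	XA_STATE(xas, &mapping->i_pages, start);
	void *page;

	xas_lock_irq(&xas);
	xas_for_each_marked(&xas, page, end, PAGECACHE_TAG_DIRTY) {
		/* The real work happens first, on every iteration. */
		xas_set_mark(&xas, PAGECACHE_TAG_TOWRITE);

		/* The periodic resched check sits at the tail of the loop. */
		if (++tagged % XA_CHECK_SCHED)
			continue;

		xas_pause(&xas);
		xas_unlock_irq(&xas);
		cond_resched();
		xas_lock_irq(&xas);
	}
	xas_unlock_irq(&xas);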
On Mon, Oct 05, 2020 at 08:25:39PM +0300, Jarkko Sakkinen wrote:
> On Mon, Oct 05, 2020 at 03:12:54PM +0100, Matthew Wilcox wrote:
> > On Mon, Oct 05, 2020 at 05:11:19PM +0300, Jarkko Sakkinen wrote:
> > > Fix the issue further discussed in:
> >
> > No, this is still utter crap. Just use the version I sent.
>
> OK, just a quick recap to fully understand it.
>
> Here's your response from the original thread:
>
> https://lore.kernel.org/linux-sgx/20201005013053.GJ20115@casper.infradead.org/
>
> And here's your snippet from v2:
>
> https://lore.kernel.org/linux-sgx/20201005111139.GK20115@casper.infradead.org/
>
> This is what confused me.
>
> Comparing the (v3) patch with your version, the differences I spot are:
>
> A. 'count' is checked first.
> B. Both this and the snippet from the original thread (i.e.
>    page-writeback.c) use the same loop construct.
>
> Right, and the snippet from page-writeback.c is not compatible with this
> because, looking at the documentation, an xas_find() continued with an
> xas_next_entry() sequence will jump through the existing entries and
> skip the holes, whereas xas_next() does not?
>
> As for part A, I'm not sure how the order there matters semantically.

Right, how blind I was with B. Read Dave's response.

/Jarkko
On Mon, Oct 05, 2020 at 07:28:38AM -0700, Dave Hansen wrote:
> On 10/5/20 7:11 AM, Jarkko Sakkinen wrote:
> > +	unsigned long count = 0;
> ...
> > +	xas_lock(&xas);
> > +	xas_for_each(&xas, page, idx_end) {
> > +		if (++count % XA_CHECK_SCHED)
> > +			continue;
>
> Let's slow down and think through the loop, please.
>
> First time through the loop, count=0, XA_CHECK_SCHED=4096, it will do a
> ++count, and now count=1.  (count % XA_CHECK_SCHED) == 1.  It will
> continue.  It skips the page->vm_max_prot_bits checks.
>
> Next time through the loop, count=1, XA_CHECK_SCHED=4096, it will do a
> ++count, and now count=2.  (count % XA_CHECK_SCHED) == 2.  It will
> continue.  It skips the page->vm_max_prot_bits checks.
>
> ...
>
> It will do this until it hits count=4095 where it will actually fall
> into the rest of the loop, doing the page->vm_max_prot_bits checks.

Uh oh, how stupid of me.

> So, in the end the loop only does what it's supposed to be doing 1/4096
> times.  Not great.  Don't we have tests that will notice breakage like
> this?

It would not be hard to have two counts and a WARN_ON() at the end to
compare them for a mismatch.

> The XA_CHECK_SCHED needs to be stuck near the *end* of the loop, just
> before the lock dropping and resched stuff.

Referring to Matthew's suggestion:

https://lore.kernel.org/linux-sgx/20201005111139.GK20115@casper.infradead.org/

I think the order there is just right. It takes the page, does the
processing, and at the tail does the check before rescheduling. The only
thing I'd do for clarity is change its tail like this:

		/* Move to the next iteration. */
		count++;

		/* Surpassed the modulus, reschedule. */
		if (!(count % XA_CHECK_SCHED)) {
			xas_pause(&xas);
			xas_unlock(&xas);
			cond_resched();
			xas_lock(&xas);
		}
	}
	xas_unlock(&xas);

I think this is just a slightly more readable way to express the same
thing. Matthew, can you agree with this small twist?

/Jarkko
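Putting Dave's ordering point and the tail above together, a hypothetical next spin of the loop in sgx_encl_may_map() could look like the sketch below. Identifiers are taken from the v3 patch in this thread; this is a sketch of the proposed shape, not the version that was eventually applied.

	xas_lock(&xas);
	xas_for_each(&xas, page, idx_end) {
		/* The permission check runs for every entry returned. */
		if (!page || (~page->vm_max_prot_bits & vm_prot_bits)) {
			ret = -EACCES;
			break;
		}

		/* Move to the next iteration. */
		count++;

		/* Every XA_CHECK_SCHED entries, drop the lock and reschedule. */
		if (!(count % XA_CHECK_SCHED)) {
			xas_pause(&xas);
			xas_unlock(&xas);
			cond_resched();
			xas_lock(&xas);
		}
	}
	xas_unlock(&xas);

	return ret;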
On Mon, Oct 05, 2020 at 08:55:19AM -0700, Sean Christopherson wrote:
> On Mon, Oct 05, 2020 at 05:11:19PM +0300, Jarkko Sakkinen wrote:
> > @@ -317,11 +319,30 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
> >  	if (current->personality & READ_IMPLIES_EXEC)
> >  		return -EACCES;
> >
> > -	xas_for_each(&xas, page, idx_end)
> > -		if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
> > -			return -EACCES;
> > +	/*
> > +	 * No need to hold encl->lock:
> > +	 * 1. None of the page->* get written.
> > +	 * 2. page->vm_max_prot_bits is set in sgx_encl_page_alloc(). This
> > +	 *    is before calling xa_insert(). After that it is never modified.
> > +	 */
>
> You forgot to cover racing with insertion, e.g. below is the snippet from my
> original patch[*], which did the lookup without protection from encl->lock.
>
> +	/*
> +	 * No need to take encl->lock, vm_prot_bits is set prior to
> +	 * insertion and never changes, and racing with adding pages is
> +	 * a userspace bug.
> +	 */
> +	rcu_read_lock();
> +	page = radix_tree_lookup(&encl->page_tree, idx);
> +	rcu_read_unlock();
>
> [*] https://patchwork.kernel.org/patch/11005431/

I'm not sure why that was merged as it was, but it probably was not
because of that snippet. It had encl->lock before, so it was by all
practical means covered then. I would have replaced encl->lock with that
if I had ever received a patch with just that change, i.e. that
particular snippet got lost in the noise. That's why it was not broken
before v36.

/Jarkko
On Mon, Oct 05, 2020 at 08:42:07PM +0300, Jarkko Sakkinen wrote:
> On Mon, Oct 05, 2020 at 08:55:19AM -0700, Sean Christopherson wrote:
> > On Mon, Oct 05, 2020 at 05:11:19PM +0300, Jarkko Sakkinen wrote:
> > > @@ -317,11 +319,30 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
> > >  	if (current->personality & READ_IMPLIES_EXEC)
> > >  		return -EACCES;
> > >
> > > -	xas_for_each(&xas, page, idx_end)
> > > -		if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
> > > -			return -EACCES;
> > > +	/*
> > > +	 * No need to hold encl->lock:
> > > +	 * 1. None of the page->* get written.
> > > +	 * 2. page->vm_max_prot_bits is set in sgx_encl_page_alloc(). This
> > > +	 *    is before calling xa_insert(). After that it is never modified.
> > > +	 */
> >
> > You forgot to cover racing with insertion, e.g. below is the snippet from my
> > original patch[*], which did the lookup without protection from encl->lock.
> >
> > +	/*
> > +	 * No need to take encl->lock, vm_prot_bits is set prior to
> > +	 * insertion and never changes, and racing with adding pages is
> > +	 * a userspace bug.
> > +	 */
> > +	rcu_read_lock();
> > +	page = radix_tree_lookup(&encl->page_tree, idx);
> > +	rcu_read_unlock();
> >
> > [*] https://patchwork.kernel.org/patch/11005431/
>
> I'm not sure why that was merged as it was, but it probably was not
> because of that snippet. It had encl->lock before, so it was by all
> practical means covered then. I would have replaced encl->lock with that
> if I had ever received a patch with just that change, i.e. that
> particular snippet got lost in the noise. That's why it was not broken
> before v36.

Or even if that patch was squashed (I can't recall that far back),
encl->lock was not left in place intentionally. I have no idea why
encl->lock was not replaced, but the code was not broken.

/Jarkko
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 4c6407cd857a..e91e521b03a8 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -307,6 +307,8 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
 	unsigned long idx_start = PFN_DOWN(start);
 	unsigned long idx_end = PFN_DOWN(end - 1);
 	struct sgx_encl_page *page;
+	unsigned long count = 0;
+	int ret = 0;
 
 	XA_STATE(xas, &encl->page_array, idx_start);
 
@@ -317,11 +319,30 @@ int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
 	if (current->personality & READ_IMPLIES_EXEC)
 		return -EACCES;
 
-	xas_for_each(&xas, page, idx_end)
-		if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
-			return -EACCES;
+	/*
+	 * No need to hold encl->lock:
+	 * 1. None of the page->* get written.
+	 * 2. page->vm_max_prot_bits is set in sgx_encl_page_alloc(). This
+	 *    is before calling xa_insert(). After that it is never modified.
+	 */
+	xas_lock(&xas);
+	xas_for_each(&xas, page, idx_end) {
+		if (++count % XA_CHECK_SCHED)
+			continue;
 
-	return 0;
+		if (!page || (~page->vm_max_prot_bits & vm_prot_bits)) {
+			ret = -EACCES;
+			break;
+		}
+
+		xas_pause(&xas);
+		xas_unlock(&xas);
+		cond_resched();
+		xas_lock(&xas);
+	}
+	xas_unlock(&xas);
+
+	return ret;
 }
 
 static int sgx_vma_mprotect(struct vm_area_struct *vma,
Fix the issue further discussed in:

1. https://lore.kernel.org/linux-sgx/op.0rwbv916wjvjmi@mqcpg7oapc828.gar.corp.intel.com/
2. https://lore.kernel.org/linux-sgx/20201003195440.GD20115@casper.infradead.org/

Reported-by: Haitao Huang <haitao.huang@linux.intel.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Jethro Beekman <jethro@fortanix.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
v3:
* Added the missing unlock pointed out by Matthew.
* Tested with the correct patch (last time had v1 applied)
* I don't know what happened to v2 changelog, checked from patchwork
  and it wasn't there. Hope this is not scraped.

 arch/x86/kernel/cpu/sgx/encl.c | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)