Message ID | 20210205182840.2260-2-jarkko@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Series | [1/2] MAINTAINERS: Add Dave Hansen as reviewer for INTEL SGX |
On 2/5/21 10:28 AM, Jarkko Sakkinen wrote:
> This has been shown in tests:
>
> [ +0.000008] WARNING: CPU: 3 PID: 7620 at kernel/rcu/srcutree.c:374 cleanup_srcu_struct+0xed/0x100
>
> There are two functions that drain encl->mm_list:
>
> - sgx_release() (i.e. VFS release) removes the remaining mm_list entries.
> - sgx_mmu_notifier_release() removes mm_list entry for the registered
>   process, if it still exists.

Jarkko, I like your approach. This actually has the potential to be a
lot more understandable than the fix we settled on before.

But I think the explanation needs some tweaking, and I think I can take
it a step further to make it even more straightforward. The issue here
isn't *really* mm_list, it's this:

	encl_mm->encl = encl;

That literally establishes an encl_mm to encl reference and needs a
reference count. That reference remains until 'encl_mm' is freed. I
don't think mm_list needs to even be taken into account.

The most straightforward way to fix this is to take a refcount at
"encl_mm->encl = encl" and release it at kfree(encl_mm). That makes a
*lot* of logical sense to me, and it's also trivial to audit.

Totally untested patch attached (adapted directly from yours).
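
The rule proposed above (take a reference where "encl_mm->encl = encl" is assigned, drop it where encl_mm is freed) can be sketched in plain userspace C. This is not the attached patch and not kernel code: a bare integer stands in for the kref, and the names encl_create(), encl_get(), encl_put(), encl_mm_alloc(), encl_mm_free() and the 'released' test hook are hypothetical, chosen only for illustration. The model is single-threaded and ignores SRCU and locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace stand-in for struct sgx_encl (hypothetical, simplified). */
struct encl {
	int refcount;    /* models encl->refcount, a kref in the kernel */
	bool *released;  /* test hook: set when the last reference is dropped */
};

/* Userspace stand-in for struct sgx_encl_mm (hypothetical, simplified). */
struct encl_mm {
	struct encl *encl;
};

struct encl *encl_create(bool *released)
{
	struct encl *encl = malloc(sizeof(*encl));

	if (!encl)
		return NULL;
	encl->refcount = 1;  /* the reference owned by the open file */
	encl->released = released;
	*released = false;
	return encl;
}

void encl_get(struct encl *encl)
{
	assert(encl->refcount > 0);
	encl->refcount++;
}

void encl_put(struct encl *encl)
{
	assert(encl->refcount > 0);
	if (--encl->refcount == 0) {
		*encl->released = true;  /* models sgx_encl_release() running */
		free(encl);
	}
}

struct encl_mm *encl_mm_alloc(struct encl *encl)
{
	struct encl_mm *encl_mm = malloc(sizeof(*encl_mm));

	if (!encl_mm)
		return NULL;
	encl_get(encl);        /* the pointer below is a reference, so count it */
	encl_mm->encl = encl;
	return encl_mm;
}

void encl_mm_free(struct encl_mm *encl_mm)
{
	struct encl *encl = encl_mm->encl;

	free(encl_mm);
	encl_put(encl);        /* the pointer is gone, so drop its reference */
}
```

With this pairing, the order in which the file reference and the encl_mm go away no longer matters: the enclave is released only after both are gone, which is what makes the scheme easy to audit.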

On Fri, Feb 05, 2021 at 11:36:57AM -0800, Dave Hansen wrote:
> On 2/5/21 10:28 AM, Jarkko Sakkinen wrote:
> > This has been shown in tests:
> >
> > [ +0.000008] WARNING: CPU: 3 PID: 7620 at kernel/rcu/srcutree.c:374 cleanup_srcu_struct+0xed/0x100
> >
> > There are two functions that drain encl->mm_list:
> >
> > - sgx_release() (i.e. VFS release) removes the remaining mm_list entries.
> > - sgx_mmu_notifier_release() removes mm_list entry for the registered
> >   process, if it still exists.
>
> Jarkko, I like your approach. This actually has the potential to be a
> lot more understandable than the fix we settled on before.

Yeah, it's more like by-the-book use of refcount, each process gets
a reference. This way things should be always serialized correctly.

> But I think the explanation needs some tweaking, and I think I can take
> it a step further to make it even more straightforward. The issue here
> isn't *really* mm_list, it's this:
>
> 	encl_mm->encl = encl;

Agreed.

This was also in center of thinking when I did this new patch.

> That literally establishes an encl_mm to encl reference and needs a
> reference count. That reference remains until 'encl_mm' is freed. I
> don't think mm_list needs to even be taken into account.
>
> The most straightforward way to fix this is to take a refcount at
> "encl_mm->encl = encl" and release it at kfree(encl_mm). That makes a
> *lot* of logical sense to me, and it's also trivial to audit.
>
> Totally untested patch attached (adapted directly from yours).

I tested this version, and it also seems to work. Boris, can you
pick this refined version from Dave's attachment or do you prefer
that I do a re-send?

/Jarkko

On Sun, Feb 07, 2021 at 11:29:49PM +0200, Jarkko Sakkinen wrote:
> On Fri, Feb 05, 2021 at 11:36:57AM -0800, Dave Hansen wrote:
> > On 2/5/21 10:28 AM, Jarkko Sakkinen wrote:
> > > This has been shown in tests:
> > >
> > > [ +0.000008] WARNING: CPU: 3 PID: 7620 at kernel/rcu/srcutree.c:374 cleanup_srcu_struct+0xed/0x100
> > >
> > > There are two functions that drain encl->mm_list:
> > >
> > > - sgx_release() (i.e. VFS release) removes the remaining mm_list entries.
> > > - sgx_mmu_notifier_release() removes mm_list entry for the registered
> > >   process, if it still exists.
> >
> > Jarkko, I like your approach. This actually has the potential to be a
> > lot more understandable than the fix we settled on before.
>
> Yeah, it's more like by-the-book use of refcount, each process gets
> a reference. This way things should be always serialized correctly.
>
> > But I think the explanation needs some tweaking, and I think I can take
> > it a step further to make it even more straightforward. The issue here
> > isn't *really* mm_list, it's this:
> >
> > 	encl_mm->encl = encl;
>
> Agreed.
>
> This was also in center of thinking when I did this new patch.
>
> > That literally establishes an encl_mm to encl reference and needs a
> > reference count. That reference remains until 'encl_mm' is freed. I
> > don't think mm_list needs to even be taken into account.
> >
> > The most straightforward way to fix this is to take a refcount at
> > "encl_mm->encl = encl" and release it at kfree(encl_mm). That makes a
> > *lot* of logical sense to me, and it's also trivial to audit.
> >
> > Totally untested patch attached (adapted directly from yours).
>
> I tested this version, and it also seems to work. Boris, can you
> pick this refined version from Dave's attachment or do you prefer
> that I do a re-send?

Nevermind. I'll send a proper patch (just noticed that the attachment
did have short summary).

/Jarkko

diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index f2eac41bb4ff..8d8fcc91c0d6 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -72,6 +72,12 @@ static int sgx_release(struct inode *inode, struct file *file)
 		synchronize_srcu(&encl->srcu);
 		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
 		kfree(encl_mm);
+
+		/*
+		 * Release the mm_list reference, as sgx_mmu_notifier_release()
+		 * will only do this only, when it grabs encl_mm.
+		 */
+		kref_put(&encl->refcount, sgx_encl_release);
 	}
 
 	kref_put(&encl->refcount, sgx_encl_release);
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index ee50a5010277..c1d9c86c0265 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -474,6 +474,7 @@ static void sgx_mmu_notifier_release(struct mmu_notifier *mn,
 	if (tmp == encl_mm) {
 		synchronize_srcu(&encl_mm->encl->srcu);
 		mmu_notifier_put(mn);
+		kref_put(&encl_mm->encl->refcount, sgx_encl_release);
 	}
 }
 
@@ -545,6 +546,13 @@ int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
 	}
 
 	spin_lock(&encl->mm_lock);
+
+	/*
+	 * Take a reference to guarantee that the enclave is not destroyed,
+	 * while sgx_mmu_notifier_release() is active.
+	 */
+	kref_get(&encl->refcount);
+
 	list_add_rcu(&encl_mm->list, &encl->mm_list);
 	/* Pairs with smp_rmb() in sgx_reclaimer_block(). */
 	smp_wmb();

This has been shown in tests:

[ +0.000008] WARNING: CPU: 3 PID: 7620 at kernel/rcu/srcutree.c:374 cleanup_srcu_struct+0xed/0x100

There are two functions that drain encl->mm_list:

- sgx_release() (i.e. VFS release) removes the remaining mm_list entries.
- sgx_mmu_notifier_release() removes mm_list entry for the registered
  process, if it still exists.

If encl->refcount is taken only for VFS, this can lead to
sgx_encl_release() being executed before sgx_mmu_notifier_release()
completes, which is exactly what happens in the above klog entry. Each
process also needs its own enclave reference.

In order to fix the race condition, increase encl->refcount when an
entry to encl->mm_list is added for a process. Release this reference
when the mm_list entry is cleaned up, either in
sgx_mmu_notifier_release() or sgx_release().

Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Reported-by: Haitao Huang <haitao.huang@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
v7:
- No changes from v6. Resend of
  https://patchwork.kernel.org/project/intel-sgx/patch/20210204143845.39697-1-jarkko@kernel.org/
v6:
- Maintain refcount for each encl->mm_list entry.
v5:
- To make sure that the instance does not get deleted use kref_get()
  kref_put(). This also removes the need for additional
  synchronize_srcu().
v4:
- Rewrite the commit message.
- Just change the call order. *_expedited() is out of scope for this
  bug fix.
v3: Fine-tuned tags, and added missing change log for v2.
v2: Switch to synchronize_srcu_expedited().

 arch/x86/kernel/cpu/sgx/driver.c | 6 ++++++
 arch/x86/kernel/cpu/sgx/encl.c   | 8 ++++++++
 2 files changed, 14 insertions(+)
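
The per-entry accounting described in the commit message can be modelled in a few lines of userspace C: one reference for the file, one for each mm_list entry, and whichever of the two cleanup paths removes an entry also drops that entry's reference. This is a rough, single-threaded sketch under stated assumptions, not the kernel code. The fixed-size array stands in for the RCU list, there is no mm_lock or SRCU, and every name (struct encl_model, model_mm_add(), model_notifier_release(), model_file_release(), and so on) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MMS 4  /* arbitrary bound for the model */

/* One slot per process; on_list models membership in encl->mm_list. */
struct mm_entry {
	bool on_list;
};

/* Hypothetical stand-in for struct sgx_encl. */
struct encl_model {
	int refcount;   /* models encl->refcount */
	bool released;  /* set when the count reaches zero */
	struct mm_entry mms[MAX_MMS];
};

void model_init(struct encl_model *e)
{
	e->refcount = 1;  /* the VFS (open file) reference */
	e->released = false;
	for (size_t i = 0; i < MAX_MMS; i++)
		e->mms[i].on_list = false;
}

void model_put(struct encl_model *e)
{
	assert(e->refcount > 0);
	if (--e->refcount == 0)
		e->released = true;  /* models sgx_encl_release() */
}

/* Models sgx_encl_mm_add(): a new list entry takes its own reference. */
void model_mm_add(struct encl_model *e, size_t idx)
{
	assert(idx < MAX_MMS && !e->mms[idx].on_list);
	e->refcount++;
	e->mms[idx].on_list = true;
}

/* Remove the entry if still present; true only for the remover. */
static bool model_claim(struct encl_model *e, size_t idx)
{
	if (!e->mms[idx].on_list)
		return false;
	e->mms[idx].on_list = false;
	return true;
}

/* Models sgx_mmu_notifier_release(): drops the ref only if it removed the entry. */
void model_notifier_release(struct encl_model *e, size_t idx)
{
	if (model_claim(e, idx))
		model_put(e);
}

/* Models sgx_release(): drains remaining entries, then drops the file ref. */
void model_file_release(struct encl_model *e)
{
	for (size_t i = 0; i < MAX_MMS; i++)
		if (model_claim(e, i))
			model_put(e);
	model_put(e);
}
```

Because only the path that actually removes an entry drops that entry's reference, a second notifier call for the same entry changes nothing, and the enclave is released only after the file reference and every entry reference are gone.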