Message ID: 20190711161625.15786-2-sean.j.christopherson@intel.com (mailing list archive)
State:      New, archived
Series:     x86/sgx: Use SRCU and mmu_notifier
On Thu, Jul 11, 2019 at 09:16:24AM -0700, Sean Christopherson wrote:
> Reclaiming enclaves faces a bit of a conundrum when it comes to lock
> ordering.  The reclaim flows need to take mmap_sem for read, e.g. to age
> and zap PTEs on arbitrary mm_structs.  But reclaim must first walk the
> enclave's list of mm_structs, which could be modified asynchronously to
> reclaim.  Because modifying the list of mm_structs is done in reaction
> to vma changes, i.e. with mmap_sem held exclusively, taking the enclave's
> mm_lock to protect the list walk in reclaim would lead to deadlocks due
> to conflicting lock ordering.  To avoid this, SGX currently uses a
> custom walker that drops mm_lock and restarts the walk as needed.

+1

> Use SRCU to protect reclaim instead of using a custom walker to avoid
> the aforementioned lock issues.  Using SRCU improves readability in the
> reclaimer by eliminating the need to juggle mm_lock during reclaim since
> it can take mmap_sem underneath srcu_read_lock().  And since reclaim
> doesn't drop its SRCU read lock, there is no need to grab a reference to
> encl_mm.

Speaking about "lock issue" would mean to me an actual regression. I do
agree that SRCU is the right step forward.

> Not taking a reference to encl_mm is not just an optimization, it's also
> functionally necessary and a major motivation for moving to SRCU.  Putting
> the reference can invoke sgx_encl_mm_release(), which calls
> synchronize_srcu() and will deadlock if done while holding the SRCU read
> lock.  Not taking a reference paves the way for additional refcounting
> improvements that would be extremely difficult to implement when using
> the custom walker due to cyclical dependencies on the refcount.

I'm not sure I get this. The existing code does not have a call to
synchronize_srcu().

> Speaking of sgx_encl_mm_release(), the whole purpose of using SRCU is
> that sgx_encl_mm_release() is blocked (if called on another cpu) by
> synchronize_srcu(), which in turn prevents mmdrop() from freeing the
> mm_struct while reclaim is in the SRCU critical section.  Ultimately,
> reclaim just needs to ensure mm_struct isn't freed so that it can call
> mmget_not_zero() to prevent the page tables from being dropped while it
> accesses PTEs, i.e. it doesn't matter if the encl_mm is dying, reclaim
> just needs to make sure it's not fully dead.

+1

> To avoid calling synchronize_srcu() while holding srcu_read_lock(), use
> mmput_async() in the reclaimer, e.g. __mmput() closes all VMAs, thus
> triggering sgx_encl_mm_release() and synchronize_srcu().  Alternatively
> sgx_encl_mm_release() could always call synchronize_srcu() in a worker
> thread (see below), but doing __mmput() in a worker thread is desirable
> from an SGX performance perspective, i.e. doesn't stall the reclaimer
> CPU to release the mm.

+1

> And finally, the last deadlock scenario is if sgx_encl_mm_release() is
> invoked on an in-use mm_struct, e.g. via munmap().
>
>   CPU0                          CPU1
>   munmap()
>   down_write(&mmap_sem)
>                                 srcu_read_lock()
>   synchronize_srcu()
>                                 down_read(&mmap_sem) <- deadlock
>
> Avoid deadlock in this scenario by synchronizing SRCU via a worker
> thread.  SRCU ensures only the liveness of the mm_struct itself, which
> is guaranteed by an mmgrab() prior to scheduling the work.  The
> reclaimer is responsible for checking mm_users and the VMAs to ensure
> it doesn't touch stale PTEs, i.e. delaying synchronization does not
> affect the reclaimer's responsibilities.  The delay does add one new
> wrinkle in that sgx_encl_mm_add() and sgx_vma_open() can see a dying
> encl_mm.  Previously this was prevented by virtue of sgx_vma_close()
> being mutually exclusive (the caller must hold down_write(&mmap_sem)).
> Handle such a case by using kref_get_unless_zero().
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/Kconfig                      |   1 +
>  arch/x86/kernel/cpu/sgx/driver/main.c |  34 ++----
>  arch/x86/kernel/cpu/sgx/encl.c        | 165 ++++++++++++++------------
>  arch/x86/kernel/cpu/sgx/encl.h        |   9 +-
>  arch/x86/kernel/cpu/sgx/reclaim.c     |  71 ++++-------
>  5 files changed, 124 insertions(+), 156 deletions(-)

[...]

> -static void sgx_vma_open(struct vm_area_struct *vma)
> +int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
>  {
> -	struct sgx_encl *encl = vma->vm_private_data;
> +	struct sgx_encl_mm *encl_mm;
>
> -	if (!encl)
> -		return;
> +	lockdep_assert_held_exclusive(&mm->mmap_sem);

Just a question: what does it check (12:10AM too tired to check,
apologies)?

Anyway, no blocking issues. Thank you.

Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>

/Jarkko
On Fri, Jul 12, 2019 at 12:13:07AM +0300, Jarkko Sakkinen wrote:
> On Thu, Jul 11, 2019 at 09:16:24AM -0700, Sean Christopherson wrote:
> > Not taking a reference to encl_mm is not just an optimization, it's also
> > functionally necessary and a major motivation for moving to SRCU.  Putting
> > the reference can invoke sgx_encl_mm_release(), which calls
> > synchronize_srcu() and will deadlock if done while holding the SRCU read
> > lock.  Not taking a reference paves the way for additional refcounting
> > improvements that would be extremely difficult to implement when using
> > the custom walker due to cyclical dependencies on the refcount.
>
> I'm not sure I get this. The existing code does not have a call to
> synchronize_srcu().

Does this read any better?

  Not taking a reference to encl_mm is not just an optimization, it's also
  functionally necessary and a major motivation for moving to SRCU.  From a
  functional perspective, putting the encl_mm reference can invoke
  sgx_encl_mm_release(), which now calls synchronize_srcu() and thus will
  deadlock if triggered while holding the SRCU read lock.  In terms of
  motivation, not taking a reference paves the way for additional refcounting
  improvements that would be extremely difficult to implement when using
  the custom walker due to cyclical dependencies on the encl_mm's refcount.

> > -	if (!encl)
> > -		return;
> > +	lockdep_assert_held_exclusive(&mm->mmap_sem);
>
> Just a question: what does it check (12:10AM too tired to check,
> apologies)?

Asserts that the caller has done down_write(&mmap_sem), i.e. guarantees
that sgx_encl_mm_add() cannot be called in parallel on the same mm_struct.
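To illustrate the answer, a minimal sketch of how the assertion behaves
with lockdep enabled; the calling contexts below are hypothetical and not
part of the patch, only sgx_encl_mm_add() and the assertion come from it:

	/* Writer side: assertion is satisfied. */
	down_write(&mm->mmap_sem);
	sgx_encl_mm_add(encl, mm);	/* lockdep_assert_held_exclusive() passes */
	up_write(&mm->mmap_sem);

	/* Reader side: the lock is held, but only shared. */
	down_read(&mm->mmap_sem);
	sgx_encl_mm_add(encl, mm);	/* WARNs: mmap_sem not held for write */
	up_read(&mm->mmap_sem);

	/* Not holding the lock at all also WARNs (and the list add would race). */
	sgx_encl_mm_add(encl, mm);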
On Thu, Jul 11, 2019 at 02:25:49PM -0700, Sean Christopherson wrote:
> On Fri, Jul 12, 2019 at 12:13:07AM +0300, Jarkko Sakkinen wrote:
> > On Thu, Jul 11, 2019 at 09:16:24AM -0700, Sean Christopherson wrote:
> > > Not taking a reference to encl_mm is not just an optimization, it's also
> > > functionally necessary and a major motivation for moving to SRCU.  Putting
> > > the reference can invoke sgx_encl_mm_release(), which calls
> > > synchronize_srcu() and will deadlock if done while holding the SRCU read
> > > lock.  Not taking a reference paves the way for additional refcounting
> > > improvements that would be extremely difficult to implement when using
> > > the custom walker due to cyclical dependencies on the refcount.
> >
> > I'm not sure I get this. The existing code does not have a call to
> > synchronize_srcu().
>
> Does this read any better?
>
>   Not taking a reference to encl_mm is not just an optimization, it's also
>   functionally necessary and a major motivation for moving to SRCU.  From a
>   functional perspective, putting the encl_mm reference can invoke
>   sgx_encl_mm_release(), which now calls synchronize_srcu() and thus will
>   deadlock if triggered while holding the SRCU read lock.  In terms of
>   motivation, not taking a reference paves the way for additional refcounting
>   improvements that would be extremely difficult to implement when using
>   the custom walker due to cyclical dependencies on the encl_mm's refcount.

No need to change the commit message. Was just a question. I got the
code change :-) Thanks anyway for elaborating.

> > > -	if (!encl)
> > > -		return;
> > > +	lockdep_assert_held_exclusive(&mm->mmap_sem);
> >
> > Just a question: what does it check (12:10AM too tired to check,
> > apologies)?
>
> Asserts that the caller has done down_write(&mmap_sem), i.e. guarantees
> that sgx_encl_mm_add() cannot be called in parallel on the same mm_struct.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a0fd17c32521..17558cf48a8a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1918,6 +1918,7 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
 config INTEL_SGX
 	bool "Intel SGX core functionality"
 	depends on X86_64 && CPU_SUP_INTEL
+	select SRCU
 	---help---
 	  Intel(R) SGX is a set of CPU instructions that can be used by
 	  applications to set aside private regions of code and data, referred
diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c
index c7fc32e26105..27076754f7d8 100644
--- a/arch/x86/kernel/cpu/sgx/driver/main.c
+++ b/arch/x86/kernel/cpu/sgx/driver/main.c
@@ -25,6 +25,7 @@ u32 sgx_xsave_size_tbl[64];
 static int sgx_open(struct inode *inode, struct file *file)
 {
 	struct sgx_encl *encl;
+	int ret;
 
 	encl = kzalloc(sizeof(*encl), GFP_KERNEL);
 	if (!encl)
@@ -38,6 +39,12 @@ static int sgx_open(struct inode *inode, struct file *file)
 	INIT_LIST_HEAD(&encl->mm_list);
 	spin_lock_init(&encl->mm_lock);
 
+	ret = init_srcu_struct(&encl->srcu);
+	if (ret) {
+		kfree(encl);
+		return ret;
+	}
+
 	file->private_data = encl;
 
 	return 0;
@@ -65,25 +72,6 @@ static long sgx_compat_ioctl(struct file *filep, unsigned int cmd,
 }
 #endif
 
-static int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
-{
-	struct sgx_encl_mm *encl_mm;
-
-	encl_mm = kzalloc(sizeof(*encl_mm), GFP_KERNEL);
-	if (!encl_mm)
-		return -ENOMEM;
-
-	encl_mm->encl = encl;
-	encl_mm->mm = mm;
-	kref_init(&encl_mm->refcount);
-
-	spin_lock(&encl->mm_lock);
-	list_add(&encl_mm->list, &encl->mm_list);
-	spin_unlock(&encl->mm_lock);
-
-	return 0;
-}
-
 /**
  * sgx_calc_vma_prot() - Calculate VMA prot bits
  * @encl:	an enclave
@@ -129,11 +117,9 @@ static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
 	if (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC) & ~vm_prot_bits)
 		return -EACCES;
 
-	if (!sgx_encl_get_mm(encl, vma->vm_mm)) {
-		ret = sgx_encl_mm_add(encl, vma->vm_mm);
-		if (ret)
-			return ret;
-	}
+	ret = sgx_encl_mm_add(encl, vma->vm_mm);
+	if (ret)
+		return ret;
 
 	if (!(vm_prot_bits & VM_READ))
 		vma->vm_flags &= ~VM_MAYREAD;
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 853ea8ef3ada..64ae7d705eb1 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -132,62 +132,116 @@ static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
 	return entry;
 }
 
-void sgx_encl_mm_release(struct kref *ref)
+static void sgx_encl_mm_release_deferred(struct work_struct *work)
+{
+	struct sgx_encl_mm *encl_mm =
+		container_of(work, struct sgx_encl_mm, release_work);
+
+	mmdrop(encl_mm->mm);
+	synchronize_srcu(&encl_mm->encl->srcu);
+	kfree(encl_mm);
+}
+
+static void sgx_encl_mm_release(struct kref *ref)
 {
 	struct sgx_encl_mm *encl_mm =
 		container_of(ref, struct sgx_encl_mm, refcount);
 
 	spin_lock(&encl_mm->encl->mm_lock);
-	list_del(&encl_mm->list);
+	list_del_rcu(&encl_mm->list);
 	spin_unlock(&encl_mm->encl->mm_lock);
 
-	kfree(encl_mm);
+	/*
+	 * If the mm has users, then do SRCU synchronization in a worker thread
+	 * to avoid a deadlock with the reclaimer.  The caller holds mmap_sem
+	 * for write, while the reclaimer will acquire mmap_sem for read if it
+	 * is reclaiming from this enclave.  Invoking synchronize_srcu() here
+	 * will hang waiting for the reclaimer to drop its RCU read lock, while
+	 * the reclaimer will get stuck waiting to acquire mmap_sem.  The async
+	 * shenanigans can be avoided if there are no mm users as the reclaimer
+	 * will not acquire mmap_sem in that case.
+	 */
+	if (atomic_read(&encl_mm->mm->mm_users)) {
+		mmgrab(encl_mm->mm);
+		INIT_WORK(&encl_mm->release_work, sgx_encl_mm_release_deferred);
+		schedule_work(&encl_mm->release_work);
+	} else {
+		synchronize_srcu(&encl_mm->encl->srcu);
+		kfree(encl_mm);
+	}
 }
 
-struct sgx_encl_mm *sgx_encl_get_mm(struct sgx_encl *encl,
-				    struct mm_struct *mm)
+static struct sgx_encl_mm *sgx_encl_find_mm(struct sgx_encl *encl,
+					    struct mm_struct *mm)
 {
 	struct sgx_encl_mm *encl_mm = NULL;
-	struct sgx_encl_mm *prev_mm = NULL;
-	int iter;
+	struct sgx_encl_mm *tmp;
+	int idx;
 
-	while (true) {
-		encl_mm = sgx_encl_next_mm(encl, prev_mm, &iter);
-		if (prev_mm)
-			kref_put(&prev_mm->refcount, sgx_encl_mm_release);
-		prev_mm = encl_mm;
+	idx = srcu_read_lock(&encl->srcu);
 
-		if (iter == SGX_ENCL_MM_ITER_DONE)
+	list_for_each_entry_rcu(tmp, &encl->mm_list, list) {
+		if (tmp->mm == mm) {
+			encl_mm = tmp;
 			break;
-
-		if (iter == SGX_ENCL_MM_ITER_RESTART)
-			continue;
-
-		if (mm == encl_mm->mm)
-			return encl_mm;
+		}
 	}
 
-	return NULL;
+	srcu_read_unlock(&encl->srcu, idx);
+
+	return encl_mm;
 }
 
-
-static void sgx_vma_open(struct vm_area_struct *vma)
+int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
 {
-	struct sgx_encl *encl = vma->vm_private_data;
+	struct sgx_encl_mm *encl_mm;
 
-	if (!encl)
-		return;
+	lockdep_assert_held_exclusive(&mm->mmap_sem);
 
 	if (encl->flags & SGX_ENCL_DEAD)
-		goto error;
+		return -EINVAL;
 
-	if (WARN_ON_ONCE(!sgx_encl_get_mm(encl, vma->vm_mm)))
-		goto error;
+	/*
+	 * A dying encl_mm can be observed when synchronize_srcu() is called
+	 * asynchronously via sgx_encl_mm_release(), e.g. if mmap() closely
+	 * follows munmap().
+	 */
+	encl_mm = sgx_encl_find_mm(encl, mm);
+	if (encl_mm && kref_get_unless_zero(&encl_mm->refcount))
+		return 0;
 
-	return;
+	encl_mm = kzalloc(sizeof(*encl_mm), GFP_KERNEL);
+	if (!encl_mm)
+		return -ENOMEM;
 
-error:
-	vma->vm_private_data = NULL;
+	encl_mm->encl = encl;
+	encl_mm->mm = mm;
+	kref_init(&encl_mm->refcount);
+
+	spin_lock(&encl->mm_lock);
+	list_add_rcu(&encl_mm->list, &encl->mm_list);
+	spin_unlock(&encl->mm_lock);
+
+	/*
+	 * Note, in addition to ensuring reclaim sees all encl_mms that have a
+	 * valid mapping, this synchronize_srcu() also ensures that at most one
+	 * matching encl_mm is encountered by the refcounting flows, i.e. a live
+	 * encl_mm isn't hiding behind a dying encl_mm.
+	 */
+	synchronize_srcu(&encl->srcu);
+
+	return 0;
+}
+
+static void sgx_vma_open(struct vm_area_struct *vma)
+{
+	struct sgx_encl *encl = vma->vm_private_data;
+
+	if (!encl)
+		return;
+
+	if (sgx_encl_mm_add(encl, vma->vm_mm))
+		vma->vm_private_data = NULL;
 }
 
 static void sgx_vma_close(struct vm_area_struct *vma)
@@ -198,13 +252,8 @@ static void sgx_vma_close(struct vm_area_struct *vma)
 	if (!encl)
 		return;
 
-	encl_mm = sgx_encl_get_mm(encl, vma->vm_mm);
-	if (!WARN_ON_ONCE(!encl_mm)) {
-		kref_put(&encl_mm->refcount, sgx_encl_mm_release);
-
-		/* Release kref for the VMA. */
-		kref_put(&encl_mm->refcount, sgx_encl_mm_release);
-	}
+	encl_mm = sgx_encl_find_mm(encl, vma->vm_mm);
+	kref_put(&encl_mm->refcount, sgx_encl_mm_release);
 }
 
 static unsigned int sgx_vma_fault(struct vm_fault *vmf)
@@ -476,46 +525,6 @@ struct page *sgx_encl_get_backing_page(struct sgx_encl *encl, pgoff_t index)
 	return shmem_read_mapping_page_gfp(mapping, index, gfpmask);
 }
 
-/**
- * sgx_encl_next_mm() - Iterate to the next mm
- * @encl:	an enclave
- * @mm:		an mm list entry
- * @iter:	iterator status
- *
- * Return: the enclave mm or NULL
- */
-struct sgx_encl_mm *sgx_encl_next_mm(struct sgx_encl *encl,
-				     struct sgx_encl_mm *encl_mm, int *iter)
-{
-	struct list_head *entry;
-
-	WARN(!encl, "%s: encl is NULL", __func__);
-	WARN(!iter, "%s: iter is NULL", __func__);
-
-	spin_lock(&encl->mm_lock);
-
-	entry = encl_mm ? encl_mm->list.next : encl->mm_list.next;
-	WARN(!entry, "%s: entry is NULL", __func__);
-
-	if (entry == &encl->mm_list) {
-		spin_unlock(&encl->mm_lock);
-		*iter = SGX_ENCL_MM_ITER_DONE;
-		return NULL;
-	}
-
-	encl_mm = list_entry(entry, struct sgx_encl_mm, list);
-
-	if (!kref_get_unless_zero(&encl_mm->refcount)) {
-		spin_unlock(&encl->mm_lock);
-		*iter = SGX_ENCL_MM_ITER_RESTART;
-		return NULL;
-	}
-
-	spin_unlock(&encl->mm_lock);
-	*iter = SGX_ENCL_MM_ITER_NEXT;
-	return encl_mm;
-}
-
 static int sgx_encl_test_and_clear_young_cb(pte_t *ptep, pgtable_t token,
 					    unsigned long addr, void *data)
 {
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index 04f9ae7db68c..be0f7c08c37b 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -12,6 +12,7 @@
 #include <linux/mutex.h>
 #include <linux/notifier.h>
 #include <linux/radix-tree.h>
+#include <linux/srcu.h>
 #include <linux/workqueue.h>
 
 /**
@@ -59,6 +60,7 @@ struct sgx_encl_mm {
 	struct mm_struct *mm;
 	struct kref refcount;
 	struct list_head list;
+	struct work_struct release_work;
 };
 
 struct sgx_encl {
@@ -72,6 +74,7 @@ struct sgx_encl {
 	spinlock_t mm_lock;
 	struct file *backing;
 	struct kref refcount;
+	struct srcu_struct srcu;
 	unsigned long base;
 	unsigned long size;
 	unsigned long ssaframesize;
@@ -117,11 +120,7 @@ void sgx_encl_destroy(struct sgx_encl *encl);
 void sgx_encl_release(struct kref *ref);
 pgoff_t sgx_encl_get_index(struct sgx_encl *encl, struct sgx_encl_page *page);
 struct page *sgx_encl_get_backing_page(struct sgx_encl *encl, pgoff_t index);
-struct sgx_encl_mm *sgx_encl_next_mm(struct sgx_encl *encl,
-				     struct sgx_encl_mm *encl_mm, int *iter);
-void sgx_encl_mm_release(struct kref *ref);
-struct sgx_encl_mm *sgx_encl_get_mm(struct sgx_encl *encl,
-				    struct mm_struct *mm);
+int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
 int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 				  struct sgx_encl_page *page);
 struct sgx_encl_page *sgx_encl_reserve_page(struct sgx_encl *encl,
diff --git a/arch/x86/kernel/cpu/sgx/reclaim.c b/arch/x86/kernel/cpu/sgx/reclaim.c
index f192ade93245..e9427220415b 100644
--- a/arch/x86/kernel/cpu/sgx/reclaim.c
+++ b/arch/x86/kernel/cpu/sgx/reclaim.c
@@ -140,23 +140,13 @@ static bool sgx_reclaimer_evict(struct sgx_epc_page *epc_page)
 {
 	struct sgx_encl_page *page = epc_page->owner;
 	struct sgx_encl *encl = page->encl;
-	struct sgx_encl_mm *encl_mm = NULL;
-	struct sgx_encl_mm *prev_mm = NULL;
+	struct sgx_encl_mm *encl_mm;
 	bool ret = true;
-	int iter;
+	int idx;
 
-	while (true) {
-		encl_mm = sgx_encl_next_mm(encl, prev_mm, &iter);
-		if (prev_mm)
-			kref_put(&prev_mm->refcount, sgx_encl_mm_release);
-		prev_mm = encl_mm;
-
-		if (iter == SGX_ENCL_MM_ITER_DONE)
-			break;
-
-		if (iter == SGX_ENCL_MM_ITER_RESTART)
-			continue;
+	idx = srcu_read_lock(&encl->srcu);
 
+	list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
 		if (!mmget_not_zero(encl_mm->mm))
 			continue;
 
@@ -164,14 +154,14 @@ static bool sgx_reclaimer_evict(struct sgx_epc_page *epc_page)
 		ret = !sgx_encl_test_and_clear_young(encl_mm->mm, page);
 		up_read(&encl_mm->mm->mmap_sem);
 
-		mmput(encl_mm->mm);
+		mmput_async(encl_mm->mm);
 
-		if (!ret || (encl->flags & SGX_ENCL_DEAD)) {
-			kref_put(&encl_mm->refcount, sgx_encl_mm_release);
+		if (!ret || (encl->flags & SGX_ENCL_DEAD))
 			break;
-		}
 	}
 
+	srcu_read_unlock(&encl->srcu, idx);
+
 	/*
 	 * Do not reclaim this page if it has been recently accessed by any
 	 * mm_struct *and* if the enclave is still alive.  No need to take
@@ -195,24 +185,13 @@ static void sgx_reclaimer_block(struct sgx_epc_page *epc_page)
 	struct sgx_encl_page *page = epc_page->owner;
 	unsigned long addr = SGX_ENCL_PAGE_ADDR(page);
 	struct sgx_encl *encl = page->encl;
-	struct sgx_encl_mm *encl_mm = NULL;
-	struct sgx_encl_mm *prev_mm = NULL;
+	struct sgx_encl_mm *encl_mm;
 	struct vm_area_struct *vma;
-	int iter;
-	int ret;
+	int idx, ret;
 
-	while (true) {
-		encl_mm = sgx_encl_next_mm(encl, prev_mm, &iter);
-		if (prev_mm)
-			kref_put(&prev_mm->refcount, sgx_encl_mm_release);
-		prev_mm = encl_mm;
-
-		if (iter == SGX_ENCL_MM_ITER_DONE)
-			break;
-
-		if (iter == SGX_ENCL_MM_ITER_RESTART)
-			continue;
+	idx = srcu_read_lock(&encl->srcu);
 
+	list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
 		if (!mmget_not_zero(encl_mm->mm))
 			continue;
 
@@ -224,9 +203,11 @@ static void sgx_reclaimer_block(struct sgx_epc_page *epc_page)
 
 		up_read(&encl_mm->mm->mmap_sem);
 
-		mmput(encl_mm->mm);
+		mmput_async(encl_mm->mm);
 	}
 
+	srcu_read_unlock(&encl->srcu, idx);
+
 	mutex_lock(&encl->lock);
 
 	if (!(encl->flags & SGX_ENCL_DEAD)) {
@@ -289,32 +270,24 @@ static void sgx_ipi_cb(void *info)
 static const cpumask_t *sgx_encl_ewb_cpumask(struct sgx_encl *encl)
 {
 	cpumask_t *cpumask = &encl->cpumask;
-	struct sgx_encl_mm *encl_mm = NULL;
-	struct sgx_encl_mm *prev_mm = NULL;
-	int iter;
+	struct sgx_encl_mm *encl_mm;
+	int idx;
 
 	cpumask_clear(cpumask);
 
-	while (true) {
-		encl_mm = sgx_encl_next_mm(encl, prev_mm, &iter);
-		if (prev_mm)
-			kref_put(&prev_mm->refcount, sgx_encl_mm_release);
-		prev_mm = encl_mm;
-
-		if (iter == SGX_ENCL_MM_ITER_DONE)
-			break;
-
-		if (iter == SGX_ENCL_MM_ITER_RESTART)
-			continue;
+	idx = srcu_read_lock(&encl->srcu);
 
+	list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
 		if (!mmget_not_zero(encl_mm->mm))
 			continue;
 
 		cpumask_or(cpumask, cpumask, mm_cpumask(encl_mm->mm));
 
-		mmput(encl_mm->mm);
+		mmput_async(encl_mm->mm);
 	}
 
+	srcu_read_unlock(&encl->srcu, idx);
+
 	return cpumask;
 }
Reclaiming enclaves faces a bit of a conundrum when it comes to lock
ordering.  The reclaim flows need to take mmap_sem for read, e.g. to age
and zap PTEs on arbitrary mm_structs.  But reclaim must first walk the
enclave's list of mm_structs, which could be modified asynchronously to
reclaim.  Because modifying the list of mm_structs is done in reaction
to vma changes, i.e. with mmap_sem held exclusively, taking the enclave's
mm_lock to protect the list walk in reclaim would lead to deadlocks due
to conflicting lock ordering.  To avoid this, SGX currently uses a
custom walker that drops mm_lock and restarts the walk as needed.

Use SRCU to protect reclaim instead of using a custom walker to avoid
the aforementioned lock issues.  Using SRCU improves readability in the
reclaimer by eliminating the need to juggle mm_lock during reclaim since
it can take mmap_sem underneath srcu_read_lock().  And since reclaim
doesn't drop its SRCU read lock, there is no need to grab a reference to
encl_mm.

Not taking a reference to encl_mm is not just an optimization, it's also
functionally necessary and a major motivation for moving to SRCU.  Putting
the reference can invoke sgx_encl_mm_release(), which calls
synchronize_srcu() and will deadlock if done while holding the SRCU read
lock.  Not taking a reference paves the way for additional refcounting
improvements that would be extremely difficult to implement when using
the custom walker due to cyclical dependencies on the refcount.

Speaking of sgx_encl_mm_release(), the whole purpose of using SRCU is
that sgx_encl_mm_release() is blocked (if called on another cpu) by
synchronize_srcu(), which in turn prevents mmdrop() from freeing the
mm_struct while reclaim is in the SRCU critical section.  Ultimately,
reclaim just needs to ensure mm_struct isn't freed so that it can call
mmget_not_zero() to prevent the page tables from being dropped while it
accesses PTEs, i.e. it doesn't matter if the encl_mm is dying, reclaim
just needs to make sure it's not fully dead.

To avoid calling synchronize_srcu() while holding srcu_read_lock(), use
mmput_async() in the reclaimer, e.g. __mmput() closes all VMAs, thus
triggering sgx_encl_mm_release() and synchronize_srcu().  Alternatively
sgx_encl_mm_release() could always call synchronize_srcu() in a worker
thread (see below), but doing __mmput() in a worker thread is desirable
from an SGX performance perspective, i.e. doesn't stall the reclaimer
CPU to release the mm.

And finally, the last deadlock scenario is if sgx_encl_mm_release() is
invoked on an in-use mm_struct, e.g. via munmap().

  CPU0                          CPU1
  munmap()
  down_write(&mmap_sem)
                                srcu_read_lock()
  synchronize_srcu()
                                down_read(&mmap_sem) <- deadlock

Avoid deadlock in this scenario by synchronizing SRCU via a worker
thread.  SRCU ensures only the liveness of the mm_struct itself, which
is guaranteed by an mmgrab() prior to scheduling the work.  The
reclaimer is responsible for checking mm_users and the VMAs to ensure
it doesn't touch stale PTEs, i.e. delaying synchronization does not
affect the reclaimer's responsibilities.  The delay does add one new
wrinkle in that sgx_encl_mm_add() and sgx_vma_open() can see a dying
encl_mm.  Previously this was prevented by virtue of sgx_vma_close()
being mutually exclusive (the caller must hold down_write(&mmap_sem)).
Handle such a case by using kref_get_unless_zero().
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/Kconfig                      |   1 +
 arch/x86/kernel/cpu/sgx/driver/main.c |  34 ++----
 arch/x86/kernel/cpu/sgx/encl.c        | 165 ++++++++++++++------------
 arch/x86/kernel/cpu/sgx/encl.h        |   9 +-
 arch/x86/kernel/cpu/sgx/reclaim.c     |  71 ++++-------
 5 files changed, 124 insertions(+), 156 deletions(-)
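Taken together, the reclaimer-side pattern the commit message describes is
short enough to restate; the sketch below is distilled from
sgx_reclaimer_block() in the patch above (the vma lookup and ETRACK/EBLOCK
specifics are elided, no new code is introduced):

	struct sgx_encl_mm *encl_mm;
	int idx;

	idx = srcu_read_lock(&encl->srcu);	/* pins every encl_mm on the list */

	list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
		if (!mmget_not_zero(encl_mm->mm))
			continue;		/* mm is dying, no PTEs to touch */

		down_read(&encl_mm->mm->mmap_sem);
		/* ... age or zap PTEs for this mm ... */
		up_read(&encl_mm->mm->mmap_sem);

		/*
		 * A plain mmput() here could drop the last mm_users reference
		 * and run __mmput() -> sgx_encl_mm_release() ->
		 * synchronize_srcu() inside this SRCU read-side section,
		 * hence mmput_async().
		 */
		mmput_async(encl_mm->mm);
	}

	srcu_read_unlock(&encl->srcu, idx);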