Message ID | 20210616004458.2192889-1-seanjc@google.com |
---|---|
State | New, archived |
Series | x86/sgx: Suppress WARN on inability to sanitize EPC if ksgxd is stopped |
On Tue, Jun 15, 2021 at 05:44:58PM -0700, Sean Christopherson wrote:
> Don't WARN on having unsanitized EPC pages if ksgxd is stopped early,
> e.g. if sgx_init() realizes there will be no downstream consumers of EPC.

[...]

I applied this patch on 5.13-rc6, and it no longer triggers the WARN_ON()
on my NUC:

```
[    0.669411] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    0.669412] software IO TLB: mapped [mem 0x0000000017cb9000-0x000000001bcb9000] (64MB)
[    0.672788] platform rtc_cmos: registered platform RTC device (no PNP device found)
[    0.672805] sgx: EPC section 0x30200000-0x35f7ffff
[    0.674239] Initialise system trusted keyrings
[    0.674254] Key type blacklist registered
```

Regards,
Du Cheng
On Tue, Jun 15, 2021 at 05:44:58PM -0700, Sean Christopherson wrote:
> Don't WARN on having unsanitized EPC pages if ksgxd is stopped early,
> e.g. if sgx_init() realizes there will be no downstream consumers of EPC.

[...]

> @@ -425,7 +425,7 @@ static int ksgxd(void *p)
>  	__sgx_sanitize_pages(&sgx_dirty_page_list);
>  
>  	/* sanity check: */
> -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> +	WARN_ON(!list_empty(&sgx_dirty_page_list) && !kthread_should_stop());
>  
>  	while (!kthread_should_stop()) {
>  		if (try_to_freeze())
> --

Hmm, this looks weird. Why aren't we starting ksgxd only after
*everything* has initialized successfully? I.e., after both kvm and
native drivers' init functions have succeeded?

Then you won't have to do this kthread_should_stop() thing after the fact.

Btw, you have the same thing in the while loop's termination condition
two lines down, which, if I have to look at it later, would make me
scratch my head as to what TH is going on here.

Thx.
On 6/17/21 9:45 AM, Borislav Petkov wrote:
> Hmm, this looks weird. Why aren't we starting ksgxd only after
> *everything* has initialized successfully? I.e., after both kvm and
> native drivers' init functions have succeeded?

ksgxd has two roles. I think that's why it looks weird.

The obvious role is its use as the kswapd equivalent for SGX. But it is
also used to speed up SGX initialization: it "sanitizes" the EPC
asynchronously because that can take quite a while. That's why it gets
launched early. If it gets interrupted, that's when this warning can
trigger.

I think you're suggesting that we just defer starting ksgxd until we
*know* it won't be interrupted, basically moving
sgx_page_reclaimer_init() down below sgx_drv_init() and sgx_vepc_init().

While I can see why it's best to get it going as early as possible, I
don't see much going on in those init functions that would justify
needing to fork off ksgxd earlier.

Am I missing anything?
On Tue, Jun 15, 2021 at 05:44:58PM -0700, Sean Christopherson wrote:
> Don't WARN on having unsanitized EPC pages if ksgxd is stopped early,
> e.g. if sgx_init() realizes there will be no downstream consumers of EPC.

[...]

> Reported-by: Du Cheng <ducheng2@gmail.com>
> Fixes: e7e0545299d8 ("x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---

Thank you.

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

[...]

/Jarkko
On Wed, Jun 16, 2021 at 02:03:09PM +0800, Du Cheng wrote:
> On Tue, Jun 15, 2021 at 05:44:58PM -0700, Sean Christopherson wrote:
> > Don't WARN on having unsanitized EPC pages if ksgxd is stopped early,
> > e.g. if sgx_init() realizes there will be no downstream consumers of EPC.

[...]

> I applied this patch on 5.13-rc6, and it no longer triggers the WARN_ON()
> on my NUC:

[...]

> Regards,
> Du Cheng

Can you thus give a tested-by for this?

/Jarkko
On Wed, Jun 23, 2021 at 04:32:19PM +0300, Jarkko Sakkinen wrote:
> On Wed, Jun 16, 2021 at 02:03:09PM +0800, Du Cheng wrote:
> > On Tue, Jun 15, 2021 at 05:44:58PM -0700, Sean Christopherson wrote:
> > > Don't WARN on having unsanitized EPC pages if ksgxd is stopped early,

[...]

> Can you thus give a tested-by for this?
>
> /Jarkko

Certainly.

Tested-by: Du Cheng <ducheng2@gmail.com>
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index ad904747419e..fbad2b9625a5 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -425,7 +425,7 @@ static int ksgxd(void *p)
 	__sgx_sanitize_pages(&sgx_dirty_page_list);
 
 	/* sanity check: */
-	WARN_ON(!list_empty(&sgx_dirty_page_list));
+	WARN_ON(!list_empty(&sgx_dirty_page_list) && !kthread_should_stop());
 
 	while (!kthread_should_stop()) {
 		if (try_to_freeze())
Don't WARN on having unsanitized EPC pages if ksgxd is stopped early,
e.g. if sgx_init() realizes there will be no downstream consumers of EPC.
If ksgxd is stopped early, EPC pages may be left on the dirty list, but
that's ok because ksgxd is only stopped if SGX initialization failed or
if the kernel is going down. In either case, the EPC won't be used.

This bug was exposed by the addition of KVM support, but has existed and
was hittable since the original sanitization code was added. Prior to
adding KVM support, if Launch Control was not fully enabled, e.g. when
running on older hardware, sgx_init() bailed immediately before spawning
ksgxd because X86_FEATURE_SGX was cleared if X86_FEATURE_SGX_LC was
unsupported.

With KVM support, sgx_drv_init() handles the X86_FEATURE_SGX_LC check
manually, so now there's an easy-to-hit case where sgx_init() will spawn
ksgxd and _then_ fail to initialize, which results in sgx_init() stopping
ksgxd before it finishes sanitizing the EPC.

Prior to KVM support, the bug was much harder to hit because it basically
required char device registration to fail.

Reported-by: Du Cheng <ducheng2@gmail.com>
Fixes: e7e0545299d8 ("x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---

Lightly tested due to lack of hardware. I hacked the flow to verify that
stopping early will leave work pending, and that rechecking should_stop()
suppresses the resulting WARN.

 arch/x86/kernel/cpu/sgx/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)