Message ID | 20210311020142.125722-1-kai.huang@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | None | expand |
On Thu, Mar 11, 2021, Kai Huang wrote: > From: Jarkko Sakkinen <jarkko@kernel.org> > > EREMOVE takes a page and removes any association between that page and > an enclave. It must be run on a page before it can be added into > another enclave. Currently, EREMOVE is run as part of pages being freed > into the SGX page allocator. It is not expected to fail. > > KVM does not track how guest pages are used, which means that SGX > virtualization use of EREMOVE might fail. > > Break out the EREMOVE call from the SGX page allocator. This will allow > the SGX virtualization code to use the allocator directly. (SGX/KVM > will also introduce a more permissive EREMOVE helper). > > Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be > more specific that it is used to free EPC page assigned to one enclave. > Print an error message when EREMOVE fails to explicitly call out EPC > page is leaked, and requires machine reboot to get leaked pages back. > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> > Co-developed-by: Kai Huang <kai.huang@intel.com> > Acked-by: Jarkko Sakkinen <jarkko@kernel.org> > Signed-off-by: Kai Huang <kai.huang@intel.com> > --- > v2->v3: > > - Fixed bug during copy/paste which results in SECS page and va pages are not > correctly freed in sgx_encl_release() (sorry for the mistake). > - Added Jarkko's Acked-by. That Acked-by should either be dropped or moved above Co-developed-by to make checkpatch happy. Reviewed-by: Sean Christopherson <seanjc@google.com>
On Fri, Mar 12, 2021 at 01:21:54PM -0800, Sean Christopherson wrote: > On Thu, Mar 11, 2021, Kai Huang wrote: > > From: Jarkko Sakkinen <jarkko@kernel.org> > > > > EREMOVE takes a page and removes any association between that page and > > an enclave. It must be run on a page before it can be added into > > another enclave. Currently, EREMOVE is run as part of pages being freed > > into the SGX page allocator. It is not expected to fail. > > > > KVM does not track how guest pages are used, which means that SGX > > virtualization use of EREMOVE might fail. > > > > Break out the EREMOVE call from the SGX page allocator. This will allow > > the SGX virtualization code to use the allocator directly. (SGX/KVM > > will also introduce a more permissive EREMOVE helper). > > > > Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be > > more specific that it is used to free EPC page assigned to one enclave. > > Print an error message when EREMOVE fails to explicitly call out EPC > > page is leaked, and requires machine reboot to get leaked pages back. > > > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> > > Co-developed-by: Kai Huang <kai.huang@intel.com> > > Acked-by: Jarkko Sakkinen <jarkko@kernel.org> > > Signed-off-by: Kai Huang <kai.huang@intel.com> > > --- > > v2->v3: > > > > - Fixed bug during copy/paste which results in SECS page and va pages are not > > correctly freed in sgx_encl_release() (sorry for the mistake). > > - Added Jarkko's Acked-by. > > That Acked-by should either be dropped or moved above Co-developed-by to make > checkpatch happy. > > Reviewed-by: Sean Christopherson <seanjc@google.com> Oops, my bad. Yup, ack should be removed. /Jarkko
On Sat, 13 Mar 2021 12:45:53 +0200 Jarkko Sakkinen wrote: > On Fri, Mar 12, 2021 at 01:21:54PM -0800, Sean Christopherson wrote: > > On Thu, Mar 11, 2021, Kai Huang wrote: > > > From: Jarkko Sakkinen <jarkko@kernel.org> > > > > > > EREMOVE takes a page and removes any association between that page and > > > an enclave. It must be run on a page before it can be added into > > > another enclave. Currently, EREMOVE is run as part of pages being freed > > > into the SGX page allocator. It is not expected to fail. > > > > > > KVM does not track how guest pages are used, which means that SGX > > > virtualization use of EREMOVE might fail. > > > > > > Break out the EREMOVE call from the SGX page allocator. This will allow > > > the SGX virtualization code to use the allocator directly. (SGX/KVM > > > will also introduce a more permissive EREMOVE helper). > > > > > > Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be > > > more specific that it is used to free EPC page assigned to one enclave. > > > Print an error message when EREMOVE fails to explicitly call out EPC > > > page is leaked, and requires machine reboot to get leaked pages back. > > > > > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> > > > Co-developed-by: Kai Huang <kai.huang@intel.com> > > > Acked-by: Jarkko Sakkinen <jarkko@kernel.org> > > > Signed-off-by: Kai Huang <kai.huang@intel.com> > > > --- > > > v2->v3: > > > > > > - Fixed bug during copy/paste which results in SECS page and va pages are not > > > correctly freed in sgx_encl_release() (sorry for the mistake). > > > - Added Jarkko's Acked-by. > > > > That Acked-by should either be dropped or moved above Co-developed-by to make > > checkpatch happy. > > > > Reviewed-by: Sean Christopherson <seanjc@google.com> > > Oops, my bad. Yup, ack should be removed. > > /Jarkko Hi Jarkko, Your reply of your concern of this patch to the cover-letter https://lore.kernel.org/lkml/YEkJXu262YDa8ZaK@kernel.org/ reminds me to do more sanity check of whether removing EREMOVE in sgx_free_epc_page() will impact other code path or not, and I think sgx_encl_release() is not the only place should be changed: - sgx_encl_shrink() needs to call sgx_encl_free_epc_page(), since when this is called, the VA page can be already valid -- there are other failures can trigger sgx_encl_shrink(). - sgx_encl_add_page() should call sgx_encl_free_epc_page() in "err_out_free:" label, since the EPC page can be already valid when error happened, i.e. when EEXTEND fails. Other places should be OK per my check, but I'd prefer to just replacing all sgx_free_epc_page() call sites in driver with sgx_encl_free_epc_page(), with one exception: sgx_alloc_va_page(), which calls sgx_free_epc_page() when EPA fails, in which case EREMOVE is not required for sure. Your idea, please? Btw, introducing a driver wrapper of sgx_free_epc_page() does make sense to me, because virtualization has a counterpart in sgx/virt.c too.
On Mon, Mar 15, 2021 at 08:12:36PM +1300, Kai Huang wrote: > On Sat, 13 Mar 2021 12:45:53 +0200 Jarkko Sakkinen wrote: > > On Fri, Mar 12, 2021 at 01:21:54PM -0800, Sean Christopherson wrote: > > > On Thu, Mar 11, 2021, Kai Huang wrote: > > > > From: Jarkko Sakkinen <jarkko@kernel.org> > > > > > > > > EREMOVE takes a page and removes any association between that page and > > > > an enclave. It must be run on a page before it can be added into > > > > another enclave. Currently, EREMOVE is run as part of pages being freed > > > > into the SGX page allocator. It is not expected to fail. > > > > > > > > KVM does not track how guest pages are used, which means that SGX > > > > virtualization use of EREMOVE might fail. > > > > > > > > Break out the EREMOVE call from the SGX page allocator. This will allow > > > > the SGX virtualization code to use the allocator directly. (SGX/KVM > > > > will also introduce a more permissive EREMOVE helper). > > > > > > > > Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be > > > > more specific that it is used to free EPC page assigned to one enclave. > > > > Print an error message when EREMOVE fails to explicitly call out EPC > > > > page is leaked, and requires machine reboot to get leaked pages back. > > > > > > > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> > > > > Co-developed-by: Kai Huang <kai.huang@intel.com> > > > > Acked-by: Jarkko Sakkinen <jarkko@kernel.org> > > > > Signed-off-by: Kai Huang <kai.huang@intel.com> > > > > --- > > > > v2->v3: > > > > > > > > - Fixed bug during copy/paste which results in SECS page and va pages are not > > > > correctly freed in sgx_encl_release() (sorry for the mistake). > > > > - Added Jarkko's Acked-by. > > > > > > That Acked-by should either be dropped or moved above Co-developed-by to make > > > checkpatch happy. > > > > > > Reviewed-by: Sean Christopherson <seanjc@google.com> > > > > Oops, my bad. Yup, ack should be removed. > > > > /Jarkko > > Hi Jarkko, > > Your reply of your concern of this patch to the cover-letter > > https://lore.kernel.org/lkml/YEkJXu262YDa8ZaK@kernel.org/ > > reminds me to do more sanity check of whether removing EREMOVE in > sgx_free_epc_page() will impact other code path or not, and I think > sgx_encl_release() is not the only place should be changed: > > - sgx_encl_shrink() needs to call sgx_encl_free_epc_page(), since when this is > called, the VA page can be already valid -- there are other failures can > trigger sgx_encl_shrink(). You right about this, good catch. Shrink needs to always do EREMOVE as grow has done EPA, which changes EPC page state. > - sgx_encl_add_page() should call sgx_encl_free_epc_page() in "err_out_free:" > label, since the EPC page can be already valid when error happened, i.e. when > EEXTEND fails. Yes, correct, good work! > Other places should be OK per my check, but I'd prefer to just replacing all > sgx_free_epc_page() call sites in driver with sgx_encl_free_epc_page(), with > one exception: sgx_alloc_va_page(), which calls sgx_free_epc_page() when EPA > fails, in which case EREMOVE is not required for sure. I would not unless they require it. > Your idea, please? > > Btw, introducing a driver wrapper of sgx_free_epc_page() does make sense to me, > because virtualization has a counterpart in sgx/virt.c too. It does make sense to use sgx_free_epc_page() everywhere where it's the right thing to call and here's why. If there is some unrelated regression that causes EPC page not get uninitialized when it actually should, doing extra EREMOVE could mask those bugs. I.e. it can postpone a failure, which can make a bug harder to backtrace. Jarkko
On Mon, Mar 15, 2021 at 03:18:16PM +0200, Jarkko Sakkinen wrote: > On Mon, Mar 15, 2021 at 08:12:36PM +1300, Kai Huang wrote: > > On Sat, 13 Mar 2021 12:45:53 +0200 Jarkko Sakkinen wrote: > > > On Fri, Mar 12, 2021 at 01:21:54PM -0800, Sean Christopherson wrote: > > > > On Thu, Mar 11, 2021, Kai Huang wrote: > > > > > From: Jarkko Sakkinen <jarkko@kernel.org> > > > > > > > > > > EREMOVE takes a page and removes any association between that page and > > > > > an enclave. It must be run on a page before it can be added into > > > > > another enclave. Currently, EREMOVE is run as part of pages being freed > > > > > into the SGX page allocator. It is not expected to fail. > > > > > > > > > > KVM does not track how guest pages are used, which means that SGX > > > > > virtualization use of EREMOVE might fail. > > > > > > > > > > Break out the EREMOVE call from the SGX page allocator. This will allow > > > > > the SGX virtualization code to use the allocator directly. (SGX/KVM > > > > > will also introduce a more permissive EREMOVE helper). > > > > > > > > > > Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be > > > > > more specific that it is used to free EPC page assigned to one enclave. > > > > > Print an error message when EREMOVE fails to explicitly call out EPC > > > > > page is leaked, and requires machine reboot to get leaked pages back. > > > > > > > > > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> > > > > > Co-developed-by: Kai Huang <kai.huang@intel.com> > > > > > Acked-by: Jarkko Sakkinen <jarkko@kernel.org> > > > > > Signed-off-by: Kai Huang <kai.huang@intel.com> > > > > > --- > > > > > v2->v3: > > > > > > > > > > - Fixed bug during copy/paste which results in SECS page and va pages are not > > > > > correctly freed in sgx_encl_release() (sorry for the mistake). > > > > > - Added Jarkko's Acked-by. > > > > > > > > That Acked-by should either be dropped or moved above Co-developed-by to make > > > > checkpatch happy. > > > > > > > > Reviewed-by: Sean Christopherson <seanjc@google.com> > > > > > > Oops, my bad. Yup, ack should be removed. > > > > > > /Jarkko > > > > Hi Jarkko, > > > > Your reply of your concern of this patch to the cover-letter > > > > https://lore.kernel.org/lkml/YEkJXu262YDa8ZaK@kernel.org/ > > > > reminds me to do more sanity check of whether removing EREMOVE in > > sgx_free_epc_page() will impact other code path or not, and I think > > sgx_encl_release() is not the only place should be changed: > > > > - sgx_encl_shrink() needs to call sgx_encl_free_epc_page(), since when this is > > called, the VA page can be already valid -- there are other failures can > > trigger sgx_encl_shrink(). > > You right about this, good catch. > > Shrink needs to always do EREMOVE as grow has done EPA, which changes > EPC page state. > > > - sgx_encl_add_page() should call sgx_encl_free_epc_page() in "err_out_free:" > > label, since the EPC page can be already valid when error happened, i.e. when > > EEXTEND fails. > > Yes, correct, good work! > > > Other places should be OK per my check, but I'd prefer to just replacing all > > sgx_free_epc_page() call sites in driver with sgx_encl_free_epc_page(), with > > one exception: sgx_alloc_va_page(), which calls sgx_free_epc_page() when EPA > > fails, in which case EREMOVE is not required for sure. > > I would not unless they require it. > > > Your idea, please? > > > > Btw, introducing a driver wrapper of sgx_free_epc_page() does make sense to me, > > because virtualization has a counterpart in sgx/virt.c too. > > It does make sense to use sgx_free_epc_page() everywhere where it's > the right thing to call and here's why. > > If there is some unrelated regression that causes EPC page not get > uninitialized when it actually should, doing extra EREMOVE could mask > those bugs. I.e. it can postpone a failure, which can make a bug harder > to backtrace. > I.e. even though it is true that for correctly working code extra EREMOVE is nil functionality, it could change semantics for buggy code. /Jarkko
On Mon, 15 Mar 2021 15:19:32 +0200 Jarkko Sakkinen wrote: > On Mon, Mar 15, 2021 at 03:18:16PM +0200, Jarkko Sakkinen wrote: > > On Mon, Mar 15, 2021 at 08:12:36PM +1300, Kai Huang wrote: > > > On Sat, 13 Mar 2021 12:45:53 +0200 Jarkko Sakkinen wrote: > > > > On Fri, Mar 12, 2021 at 01:21:54PM -0800, Sean Christopherson wrote: > > > > > On Thu, Mar 11, 2021, Kai Huang wrote: > > > > > > From: Jarkko Sakkinen <jarkko@kernel.org> > > > > > > > > > > > > EREMOVE takes a page and removes any association between that page and > > > > > > an enclave. It must be run on a page before it can be added into > > > > > > another enclave. Currently, EREMOVE is run as part of pages being freed > > > > > > into the SGX page allocator. It is not expected to fail. > > > > > > > > > > > > KVM does not track how guest pages are used, which means that SGX > > > > > > virtualization use of EREMOVE might fail. > > > > > > > > > > > > Break out the EREMOVE call from the SGX page allocator. This will allow > > > > > > the SGX virtualization code to use the allocator directly. (SGX/KVM > > > > > > will also introduce a more permissive EREMOVE helper). > > > > > > > > > > > > Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be > > > > > > more specific that it is used to free EPC page assigned to one enclave. > > > > > > Print an error message when EREMOVE fails to explicitly call out EPC > > > > > > page is leaked, and requires machine reboot to get leaked pages back. > > > > > > > > > > > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> > > > > > > Co-developed-by: Kai Huang <kai.huang@intel.com> > > > > > > Acked-by: Jarkko Sakkinen <jarkko@kernel.org> > > > > > > Signed-off-by: Kai Huang <kai.huang@intel.com> > > > > > > --- > > > > > > v2->v3: > > > > > > > > > > > > - Fixed bug during copy/paste which results in SECS page and va pages are not > > > > > > correctly freed in sgx_encl_release() (sorry for the mistake). > > > > > > - Added Jarkko's Acked-by. > > > > > > > > > > That Acked-by should either be dropped or moved above Co-developed-by to make > > > > > checkpatch happy. > > > > > > > > > > Reviewed-by: Sean Christopherson <seanjc@google.com> > > > > > > > > Oops, my bad. Yup, ack should be removed. > > > > > > > > /Jarkko > > > > > > Hi Jarkko, > > > > > > Your reply of your concern of this patch to the cover-letter > > > > > > https://lore.kernel.org/lkml/YEkJXu262YDa8ZaK@kernel.org/ > > > > > > reminds me to do more sanity check of whether removing EREMOVE in > > > sgx_free_epc_page() will impact other code path or not, and I think > > > sgx_encl_release() is not the only place should be changed: > > > > > > - sgx_encl_shrink() needs to call sgx_encl_free_epc_page(), since when this is > > > called, the VA page can be already valid -- there are other failures can > > > trigger sgx_encl_shrink(). > > > > You right about this, good catch. > > > > Shrink needs to always do EREMOVE as grow has done EPA, which changes > > EPC page state. > > > > > - sgx_encl_add_page() should call sgx_encl_free_epc_page() in "err_out_free:" > > > label, since the EPC page can be already valid when error happened, i.e. when > > > EEXTEND fails. > > > > Yes, correct, good work! > > > > > Other places should be OK per my check, but I'd prefer to just replacing all > > > sgx_free_epc_page() call sites in driver with sgx_encl_free_epc_page(), with > > > one exception: sgx_alloc_va_page(), which calls sgx_free_epc_page() when EPA > > > fails, in which case EREMOVE is not required for sure. > > > > I would not unless they require it. > > > > > Your idea, please? > > > > > > Btw, introducing a driver wrapper of sgx_free_epc_page() does make sense to me, > > > because virtualization has a counterpart in sgx/virt.c too. > > > > It does make sense to use sgx_free_epc_page() everywhere where it's > > the right thing to call and here's why. > > > > If there is some unrelated regression that causes EPC page not get > > uninitialized when it actually should, doing extra EREMOVE could mask > > those bugs. I.e. it can postpone a failure, which can make a bug harder > > to backtrace. > > > > I.e. even though it is true that for correctly working code extra EREMOVE > is nil functionality, it could change semantics for buggy code. Thanks for feedback. Sorry I am not sure if I understand you. So if we don't want to bring functionality change, we need to replace sgx_free_epc_page() in all call sites with sgx_encl_free_epc_page(). To me for this patch only, it's better not to bring any functional change, so I intend to replace all (I now consider even leaving sgx_alloc_va_page() out is not good idea in *this* patch). Or do you just want to replace sgx_free_epc_page() with sgx_encl_free_epc_page() in sgx_encl_shrink() and sgx_encl_add_page(), as I pointed above? In this way there will be functional change in this patch, and we need to explicitly explain why leaving others out is OK in commit message. To me I prefer the former.
On Tue, Mar 16, 2021 at 09:29:34AM +1300, Kai Huang wrote: > On Mon, 15 Mar 2021 15:19:32 +0200 Jarkko Sakkinen wrote: > > On Mon, Mar 15, 2021 at 03:18:16PM +0200, Jarkko Sakkinen wrote: > > > On Mon, Mar 15, 2021 at 08:12:36PM +1300, Kai Huang wrote: > > > > On Sat, 13 Mar 2021 12:45:53 +0200 Jarkko Sakkinen wrote: > > > > > On Fri, Mar 12, 2021 at 01:21:54PM -0800, Sean Christopherson wrote: > > > > > > On Thu, Mar 11, 2021, Kai Huang wrote: > > > > > > > From: Jarkko Sakkinen <jarkko@kernel.org> > > > > > > > > > > > > > > EREMOVE takes a page and removes any association between that page and > > > > > > > an enclave. It must be run on a page before it can be added into > > > > > > > another enclave. Currently, EREMOVE is run as part of pages being freed > > > > > > > into the SGX page allocator. It is not expected to fail. > > > > > > > > > > > > > > KVM does not track how guest pages are used, which means that SGX > > > > > > > virtualization use of EREMOVE might fail. > > > > > > > > > > > > > > Break out the EREMOVE call from the SGX page allocator. This will allow > > > > > > > the SGX virtualization code to use the allocator directly. (SGX/KVM > > > > > > > will also introduce a more permissive EREMOVE helper). > > > > > > > > > > > > > > Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be > > > > > > > more specific that it is used to free EPC page assigned to one enclave. > > > > > > > Print an error message when EREMOVE fails to explicitly call out EPC > > > > > > > page is leaked, and requires machine reboot to get leaked pages back. > > > > > > > > > > > > > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> > > > > > > > Co-developed-by: Kai Huang <kai.huang@intel.com> > > > > > > > Acked-by: Jarkko Sakkinen <jarkko@kernel.org> > > > > > > > Signed-off-by: Kai Huang <kai.huang@intel.com> > > > > > > > --- > > > > > > > v2->v3: > > > > > > > > > > > > > > - Fixed bug during copy/paste which results in SECS page and va pages are not > > > > > > > correctly freed in sgx_encl_release() (sorry for the mistake). > > > > > > > - Added Jarkko's Acked-by. > > > > > > > > > > > > That Acked-by should either be dropped or moved above Co-developed-by to make > > > > > > checkpatch happy. > > > > > > > > > > > > Reviewed-by: Sean Christopherson <seanjc@google.com> > > > > > > > > > > Oops, my bad. Yup, ack should be removed. > > > > > > > > > > /Jarkko > > > > > > > > Hi Jarkko, > > > > > > > > Your reply of your concern of this patch to the cover-letter > > > > > > > > https://lore.kernel.org/lkml/YEkJXu262YDa8ZaK@kernel.org/ > > > > > > > > reminds me to do more sanity check of whether removing EREMOVE in > > > > sgx_free_epc_page() will impact other code path or not, and I think > > > > sgx_encl_release() is not the only place should be changed: > > > > > > > > - sgx_encl_shrink() needs to call sgx_encl_free_epc_page(), since when this is > > > > called, the VA page can be already valid -- there are other failures can > > > > trigger sgx_encl_shrink(). > > > > > > You right about this, good catch. > > > > > > Shrink needs to always do EREMOVE as grow has done EPA, which changes > > > EPC page state. > > > > > > > - sgx_encl_add_page() should call sgx_encl_free_epc_page() in "err_out_free:" > > > > label, since the EPC page can be already valid when error happened, i.e. when > > > > EEXTEND fails. > > > > > > Yes, correct, good work! > > > > > > > Other places should be OK per my check, but I'd prefer to just replacing all > > > > sgx_free_epc_page() call sites in driver with sgx_encl_free_epc_page(), with > > > > one exception: sgx_alloc_va_page(), which calls sgx_free_epc_page() when EPA > > > > fails, in which case EREMOVE is not required for sure. > > > > > > I would not unless they require it. > > > > > > > Your idea, please? > > > > > > > > Btw, introducing a driver wrapper of sgx_free_epc_page() does make sense to me, > > > > because virtualization has a counterpart in sgx/virt.c too. > > > > > > It does make sense to use sgx_free_epc_page() everywhere where it's > > > the right thing to call and here's why. > > > > > > If there is some unrelated regression that causes EPC page not get > > > uninitialized when it actually should, doing extra EREMOVE could mask > > > those bugs. I.e. it can postpone a failure, which can make a bug harder > > > to backtrace. > > > > > > > I.e. even though it is true that for correctly working code extra EREMOVE > > is nil functionality, it could change semantics for buggy code. > > Thanks for feedback. Sorry I am not sure if I understand you. So if we don't > want to bring functionality change, we need to replace sgx_free_epc_page() in > all call sites with sgx_encl_free_epc_page(). To me for this patch only, it's > better not to bring any functional change, so I intend to replace all (I now > consider even leaving sgx_alloc_va_page() out is not good idea in *this* > patch). > > Or do you just want to replace sgx_free_epc_page() with > sgx_encl_free_epc_page() in sgx_encl_shrink() and sgx_encl_add_page(), as I > pointed above? In this way there will be functional change in this patch, and > we need to explicitly explain why leaving others out is OK in commit message. > > To me I prefer the former. The original purpose of this patch was exactly to remove EREMOVE sgx_free_epc_page() and call it explicitly where it is required. That's why I introduced sgx_reset_epc_page(). So the latter was actually the goal of this patch at least when I did it. Now this is something completely different. So, I don't consider myself author of this patch in any possible way, because this is not what I intended. To move forward, for the next patch set version, you should change the author field as yourself, and remove all my tags, and I will review it. So you can work out this with former approach if you wish. I.e. my ack/nak/etc. apply to this patch because it's not my code. /Jarkko
On Tue, Mar 16, 2021 at 09:29:34AM +1300, Kai Huang wrote: > On Mon, 15 Mar 2021 15:19:32 +0200 Jarkko Sakkinen wrote: > > On Mon, Mar 15, 2021 at 03:18:16PM +0200, Jarkko Sakkinen wrote: > > > On Mon, Mar 15, 2021 at 08:12:36PM +1300, Kai Huang wrote: > > > > On Sat, 13 Mar 2021 12:45:53 +0200 Jarkko Sakkinen wrote: > > > > > On Fri, Mar 12, 2021 at 01:21:54PM -0800, Sean Christopherson wrote: > > > > > > On Thu, Mar 11, 2021, Kai Huang wrote: > > > > > > > From: Jarkko Sakkinen <jarkko@kernel.org> > > > > > > > > > > > > > > EREMOVE takes a page and removes any association between that page and > > > > > > > an enclave. It must be run on a page before it can be added into > > > > > > > another enclave. Currently, EREMOVE is run as part of pages being freed > > > > > > > into the SGX page allocator. It is not expected to fail. > > > > > > > > > > > > > > KVM does not track how guest pages are used, which means that SGX > > > > > > > virtualization use of EREMOVE might fail. > > > > > > > > > > > > > > Break out the EREMOVE call from the SGX page allocator. This will allow > > > > > > > the SGX virtualization code to use the allocator directly. (SGX/KVM > > > > > > > will also introduce a more permissive EREMOVE helper). > > > > > > > > > > > > > > Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be > > > > > > > more specific that it is used to free EPC page assigned to one enclave. > > > > > > > Print an error message when EREMOVE fails to explicitly call out EPC > > > > > > > page is leaked, and requires machine reboot to get leaked pages back. > > > > > > > > > > > > > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> > > > > > > > Co-developed-by: Kai Huang <kai.huang@intel.com> > > > > > > > Acked-by: Jarkko Sakkinen <jarkko@kernel.org> > > > > > > > Signed-off-by: Kai Huang <kai.huang@intel.com> > > > > > > > --- > > > > > > > v2->v3: > > > > > > > > > > > > > > - Fixed bug during copy/paste which results in SECS page and va pages are not > > > > > > > correctly freed in sgx_encl_release() (sorry for the mistake). > > > > > > > - Added Jarkko's Acked-by. > > > > > > > > > > > > That Acked-by should either be dropped or moved above Co-developed-by to make > > > > > > checkpatch happy. > > > > > > > > > > > > Reviewed-by: Sean Christopherson <seanjc@google.com> > > > > > > > > > > Oops, my bad. Yup, ack should be removed. > > > > > > > > > > /Jarkko > > > > > > > > Hi Jarkko, > > > > > > > > Your reply of your concern of this patch to the cover-letter > > > > > > > > https://lore.kernel.org/lkml/YEkJXu262YDa8ZaK@kernel.org/ > > > > > > > > reminds me to do more sanity check of whether removing EREMOVE in > > > > sgx_free_epc_page() will impact other code path or not, and I think > > > > sgx_encl_release() is not the only place should be changed: > > > > > > > > - sgx_encl_shrink() needs to call sgx_encl_free_epc_page(), since when this is > > > > called, the VA page can be already valid -- there are other failures can > > > > trigger sgx_encl_shrink(). > > > > > > You right about this, good catch. > > > > > > Shrink needs to always do EREMOVE as grow has done EPA, which changes > > > EPC page state. > > > > > > > - sgx_encl_add_page() should call sgx_encl_free_epc_page() in "err_out_free:" > > > > label, since the EPC page can be already valid when error happened, i.e. when > > > > EEXTEND fails. > > > > > > Yes, correct, good work! > > > > > > > Other places should be OK per my check, but I'd prefer to just replacing all > > > > sgx_free_epc_page() call sites in driver with sgx_encl_free_epc_page(), with > > > > one exception: sgx_alloc_va_page(), which calls sgx_free_epc_page() when EPA > > > > fails, in which case EREMOVE is not required for sure. > > > > > > I would not unless they require it. > > > > > > > Your idea, please? > > > > > > > > Btw, introducing a driver wrapper of sgx_free_epc_page() does make sense to me, > > > > because virtualization has a counterpart in sgx/virt.c too. > > > > > > It does make sense to use sgx_free_epc_page() everywhere where it's > > > the right thing to call and here's why. > > > > > > If there is some unrelated regression that causes EPC page not get > > > uninitialized when it actually should, doing extra EREMOVE could mask > > > those bugs. I.e. it can postpone a failure, which can make a bug harder > > > to backtrace. > > > > > > > I.e. even though it is true that for correctly working code extra EREMOVE > > is nil functionality, it could change semantics for buggy code. > > Thanks for feedback. Sorry I am not sure if I understand you. So if we don't > want to bring functionality change, we need to replace sgx_free_epc_page() in > all call sites with sgx_encl_free_epc_page(). To me for this patch only, it's > better not to bring any functional change, so I intend to replace all (I now > consider even leaving sgx_alloc_va_page() out is not good idea in *this* > patch). > > Or do you just want to replace sgx_free_epc_page() with > sgx_encl_free_epc_page() in sgx_encl_shrink() and sgx_encl_add_page(), as I > pointed above? In this way there will be functional change in this patch, and > we need to explicitly explain why leaving others out is OK in commit message. > > To me I prefer the former. But yes, I'm cool with your preference and I do get your argument, I just need to review it, and do not consider it as my patch :-) /Jarkko
On Tue, 16 Mar 2021 00:59:31 +0200 Jarkko Sakkinen wrote: > On Tue, Mar 16, 2021 at 09:29:34AM +1300, Kai Huang wrote: > > On Mon, 15 Mar 2021 15:19:32 +0200 Jarkko Sakkinen wrote: > > > On Mon, Mar 15, 2021 at 03:18:16PM +0200, Jarkko Sakkinen wrote: > > > > On Mon, Mar 15, 2021 at 08:12:36PM +1300, Kai Huang wrote: > > > > > On Sat, 13 Mar 2021 12:45:53 +0200 Jarkko Sakkinen wrote: > > > > > > On Fri, Mar 12, 2021 at 01:21:54PM -0800, Sean Christopherson wrote: > > > > > > > On Thu, Mar 11, 2021, Kai Huang wrote: > > > > > > > > From: Jarkko Sakkinen <jarkko@kernel.org> > > > > > > > > > > > > > > > > EREMOVE takes a page and removes any association between that page and > > > > > > > > an enclave. It must be run on a page before it can be added into > > > > > > > > another enclave. Currently, EREMOVE is run as part of pages being freed > > > > > > > > into the SGX page allocator. It is not expected to fail. > > > > > > > > > > > > > > > > KVM does not track how guest pages are used, which means that SGX > > > > > > > > virtualization use of EREMOVE might fail. > > > > > > > > > > > > > > > > Break out the EREMOVE call from the SGX page allocator. This will allow > > > > > > > > the SGX virtualization code to use the allocator directly. (SGX/KVM > > > > > > > > will also introduce a more permissive EREMOVE helper). > > > > > > > > > > > > > > > > Implement original sgx_free_epc_page() as sgx_encl_free_epc_page() to be > > > > > > > > more specific that it is used to free EPC page assigned to one enclave. > > > > > > > > Print an error message when EREMOVE fails to explicitly call out EPC > > > > > > > > page is leaked, and requires machine reboot to get leaked pages back. > > > > > > > > > > > > > > > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> > > > > > > > > Co-developed-by: Kai Huang <kai.huang@intel.com> > > > > > > > > Acked-by: Jarkko Sakkinen <jarkko@kernel.org> > > > > > > > > Signed-off-by: Kai Huang <kai.huang@intel.com> > > > > > > > > --- > > > > > > > > v2->v3: > > > > > > > > > > > > > > > > - Fixed bug during copy/paste which results in SECS page and va pages are not > > > > > > > > correctly freed in sgx_encl_release() (sorry for the mistake). > > > > > > > > - Added Jarkko's Acked-by. > > > > > > > > > > > > > > That Acked-by should either be dropped or moved above Co-developed-by to make > > > > > > > checkpatch happy. > > > > > > > > > > > > > > Reviewed-by: Sean Christopherson <seanjc@google.com> > > > > > > > > > > > > Oops, my bad. Yup, ack should be removed. > > > > > > > > > > > > /Jarkko > > > > > > > > > > Hi Jarkko, > > > > > > > > > > Your reply of your concern of this patch to the cover-letter > > > > > > > > > > https://lore.kernel.org/lkml/YEkJXu262YDa8ZaK@kernel.org/ > > > > > > > > > > reminds me to do more sanity check of whether removing EREMOVE in > > > > > sgx_free_epc_page() will impact other code path or not, and I think > > > > > sgx_encl_release() is not the only place should be changed: > > > > > > > > > > - sgx_encl_shrink() needs to call sgx_encl_free_epc_page(), since when this is > > > > > called, the VA page can be already valid -- there are other failures can > > > > > trigger sgx_encl_shrink(). > > > > > > > > You right about this, good catch. > > > > > > > > Shrink needs to always do EREMOVE as grow has done EPA, which changes > > > > EPC page state. > > > > > > > > > - sgx_encl_add_page() should call sgx_encl_free_epc_page() in "err_out_free:" > > > > > label, since the EPC page can be already valid when error happened, i.e. when > > > > > EEXTEND fails. > > > > > > > > Yes, correct, good work! > > > > > > > > > Other places should be OK per my check, but I'd prefer to just replacing all > > > > > sgx_free_epc_page() call sites in driver with sgx_encl_free_epc_page(), with > > > > > one exception: sgx_alloc_va_page(), which calls sgx_free_epc_page() when EPA > > > > > fails, in which case EREMOVE is not required for sure. > > > > > > > > I would not unless they require it. > > > > > > > > > Your idea, please? > > > > > > > > > > Btw, introducing a driver wrapper of sgx_free_epc_page() does make sense to me, > > > > > because virtualization has a counterpart in sgx/virt.c too. > > > > > > > > It does make sense to use sgx_free_epc_page() everywhere where it's > > > > the right thing to call and here's why. > > > > > > > > If there is some unrelated regression that causes EPC page not get > > > > uninitialized when it actually should, doing extra EREMOVE could mask > > > > those bugs. I.e. it can postpone a failure, which can make a bug harder > > > > to backtrace. > > > > > > > > > > I.e. even though it is true that for correctly working code extra EREMOVE > > > is nil functionality, it could change semantics for buggy code. > > > > Thanks for feedback. Sorry I am not sure if I understand you. So if we don't > > want to bring functionality change, we need to replace sgx_free_epc_page() in > > all call sites with sgx_encl_free_epc_page(). To me for this patch only, it's > > better not to bring any functional change, so I intend to replace all (I now > > consider even leaving sgx_alloc_va_page() out is not good idea in *this* > > patch). > > > > Or do you just want to replace sgx_free_epc_page() with > > sgx_encl_free_epc_page() in sgx_encl_shrink() and sgx_encl_add_page(), as I > > pointed above? In this way there will be functional change in this patch, and > > we need to explicitly explain why leaving others out is OK in commit message. > > > > To me I prefer the former. > > The original purpose of this patch was exactly to remove EREMOVE > sgx_free_epc_page() and call it explicitly where it is required. That's > why I introduced sgx_reset_epc_page(). So the latter was actually the goal > of this patch at least when I did it. Now this is something completely > different. > > So, I don't consider myself author of this patch in any possible way, > because this is not what I intended. > > To move forward, for the next patch set version, you should change the > author field as yourself, and remove all my tags, and I will review it. > So you can work out this with former approach if you wish. > > I.e. my ack/nak/etc. apply to this patch because it's not my code. OK. Thanks.
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index 7449ef33f081..c0b80c0853d8 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -381,6 +381,27 @@ const struct vm_operations_struct sgx_vm_ops = { .access = sgx_vma_access, }; +static void sgx_encl_free_epc_page(struct sgx_epc_page *epc_page) +{ + int ret; + + WARN_ON_ONCE(epc_page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED); + + /* + * Give a message to remind EPC page is leaked when EREMOVE fails, + * and requires machine reboot to get leaked pages back. This can + * be improved in future by adding stats of leaked pages, etc. + */ +#define EREMOVE_ERROR_MESSAGE \ + "EREMOVE returned %d (0x%x). EPC page leaked. Reboot required to retrieve leaked pages." + ret = __eremove(sgx_get_epc_virt_addr(epc_page)); + if (WARN_ONCE(ret, EREMOVE_ERROR_MESSAGE, ret, ret)) + return; +#undef EREMOVE_ERROR_MESSAGE + + sgx_free_epc_page(epc_page); +} + /** * sgx_encl_release - Destroy an enclave instance * @kref: address of a kref inside &sgx_encl @@ -404,7 +425,7 @@ void sgx_encl_release(struct kref *ref) if (sgx_unmark_page_reclaimable(entry->epc_page)) continue; - sgx_free_epc_page(entry->epc_page); + sgx_encl_free_epc_page(entry->epc_page); encl->secs_child_cnt--; entry->epc_page = NULL; } @@ -415,7 +436,7 @@ void sgx_encl_release(struct kref *ref) xa_destroy(&encl->page_array); if (!encl->secs_child_cnt && encl->secs.epc_page) { - sgx_free_epc_page(encl->secs.epc_page); + sgx_encl_free_epc_page(encl->secs.epc_page); encl->secs.epc_page = NULL; } @@ -423,7 +444,7 @@ void sgx_encl_release(struct kref *ref) va_page = list_first_entry(&encl->va_pages, struct sgx_va_page, list); list_del(&va_page->list); - sgx_free_epc_page(va_page->epc_page); + sgx_encl_free_epc_page(va_page->epc_page); kfree(va_page); } diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 8df81a3ed945..44fe91a5bfb3 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -598,18 +598,14 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) * sgx_free_epc_page() - Free an EPC page * @page: an EPC page * - * Call EREMOVE for an EPC page and insert it back to the list of free pages. + * Put the EPC page back to the list of free pages. It's the caller's + * responsibility to make sure that the page is in uninitialized state. In other + * words, do EREMOVE, EWB or whatever operation is necessary before calling + * this function. */ void sgx_free_epc_page(struct sgx_epc_page *page) { struct sgx_epc_section *section = &sgx_epc_sections[page->section]; - int ret; - - WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED); - - ret = __eremove(sgx_get_epc_virt_addr(page)); - if (WARN_ONCE(ret, "EREMOVE returned %d (0x%x)", ret, ret)) - return; spin_lock(§ion->lock); list_add_tail(&page->list, §ion->page_list);