Message ID | 20240829023800.1671210-1-aaron.lu@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86/sgx: Fix deadloop in __sgx_alloc_epc_page() | expand |
On Thu, 2024-08-29 at 10:38 +0800, Aaron Lu wrote: > When current node doesn't have a EPC section configured by firmware and > all other EPC sections memory are used up, CPU can stuck inside the > while loop in __sgx_alloc_epc_page() forever and soft lockup will happen. > Note how nid_of_current will never equal to nid in that while loop because > nid_of_current is not set in sgx_numa_mask. > > Also worth mentioning is that it's perfectly fine for firmware to not > seup an EPC section on a node. Setting an EPC section on each node can seup -> set up. > be good for performance but that's not a requirement functionality wise. > > Fixes: 901ddbb9ecf5 ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()") > Reported-by: Zhimin Luo <zhimin.luo@intel.com> > Tested-by: Zhimin Luo <zhimin.luo@intel.com> > Signed-off-by: Aaron Lu <aaron.lu@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com>
Actually run spell check this time ... On Thu, 2024-08-29 at 10:38 +0800, Aaron Lu wrote: > When current node doesn't have a EPC section configured by firmware and "current node" -> "the current node" "a EPC section" -> "an EPC section" > all other EPC sections memory are used up, CPU can stuck inside the "EPC sections memory" -> "EPC sections" "can stuck" -> "can get stuck" > while loop in __sgx_alloc_epc_page() forever and soft lockup will happen. > Note how nid_of_current will never equal to nid in that while loop because > nid_of_current is not set in sgx_numa_mask. > > Also worth mentioning is that it's perfectly fine for firmware to not > seup an EPC section on a node. Setting an EPC section on each node can > be good for performance but that's not a requirement functionality wise. > [...]
On Thu, Aug 29, 2024 at 03:56:39PM +0800, Huang, Kai wrote: > Actually run spell check this time ... > > On Thu, 2024-08-29 at 10:38 +0800, Aaron Lu wrote: > > When current node doesn't have a EPC section configured by firmware and > > "current node" -> "the current node" > > "a EPC section" -> "an EPC section" > > > all other EPC sections memory are used up, CPU can stuck inside the > > "EPC sections memory" -> "EPC sections" > > "can stuck" -> "can get stuck" > > > while loop in __sgx_alloc_epc_page() forever and soft lockup will happen. > > Note how nid_of_current will never equal to nid in that while loop because > > nid_of_current is not set in sgx_numa_mask. > > > > Also worth mentioning is that it's perfectly fine for firmware to not > > seup an EPC section on a node. Setting an EPC section on each node can > > be good for performance but that's not a requirement functionality wise. > > > > [...] Thank you Kai for your detailed review, will reword the changelog according to your suggestions when sending the next version.
Generally, I think it's a bad idea to refer to function names in subjects. This, for instance would be much more informative: x86/sgx: Fix deadlock in SGX NUMA node search On 8/28/24 19:38, Aaron Lu wrote: > When current node doesn't have a EPC section configured by firmware and > all other EPC sections memory are used up, CPU can stuck inside the > while loop in __sgx_alloc_epc_page() forever and soft lockup will happen. > Note how nid_of_current will never equal to nid in that while loop because > nid_of_current is not set in sgx_numa_mask. > > Also worth mentioning is that it's perfectly fine for firmware to not > seup an EPC section on a node. Setting an EPC section on each node can > be good for performance but that's not a requirement functionality wise. The changelog is a little rough, but I think Kai gave some good suggestions. The other thing you can do is dump the text in chatgpt (or whatever) and have it fix your grammar. It actually does a pretty decent job. Also, you didn't say _how_ you fixed this. That needs to be in here. Something along the lines of: Rework the loop to start and end on *a* node that has SGX memory. This avoids the deadlock looking for the current SGX- lacking node to show up in the loop when it never will. The code looks fine, so feel free to add: Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Also, I do think we should probably add some kind of sanity warning to the SGX code in another patch. If a node on an SGX system has CPUs and memory, it's very likely it will also have some EPC. It can be something soft like a pr_info(), but I think it would be nice to have.
On Thu Aug 29, 2024 at 5:38 AM EEST, Aaron Lu wrote: > When current node doesn't have a EPC section configured by firmware and > all other EPC sections memory are used up, CPU can stuck inside the > while loop in __sgx_alloc_epc_page() forever and soft lockup will happen. > Note how nid_of_current will never equal to nid in that while loop because ~~~~ Oh *that* while loop ;-) Please be more specific. > nid_of_current is not set in sgx_numa_mask. > > Also worth mentioning is that it's perfectly fine for firmware to not > seup an EPC section on a node. Setting an EPC section on each node can > be good for performance but that's not a requirement functionality wise. This lacks any description of what is done to __sgx_alloc_epc_page(). > > Fixes: 901ddbb9ecf5 ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()") > Reported-by: Zhimin Luo <zhimin.luo@intel.com> > Tested-by: Zhimin Luo <zhimin.luo@intel.com> > Signed-off-by: Aaron Lu <aaron.lu@intel.com> BR, Jarkko
On Thu, Aug 29, 2024 at 08:17:53AM -0700, Dave Hansen wrote: > Generally, I think it's a bad idea to refer to function names in > subjects. This, for instance would be much more informative: > > x86/sgx: Fix deadlock in SGX NUMA node search Indeed, will use this as subject, thanks. > On 8/28/24 19:38, Aaron Lu wrote: > > When current node doesn't have a EPC section configured by firmware and > > all other EPC sections memory are used up, CPU can stuck inside the > > while loop in __sgx_alloc_epc_page() forever and soft lockup will happen. > > Note how nid_of_current will never equal to nid in that while loop because > > nid_of_current is not set in sgx_numa_mask. > > > > Also worth mentioning is that it's perfectly fine for firmware to not > > seup an EPC section on a node. Setting an EPC section on each node can > > be good for performance but that's not a requirement functionality wise. > > The changelog is a little rough, but I think Kai gave some good > suggestions. The other thing you can do is dump the text in chatgpt (or > whatever) and have it fix your grammar. It actually does a pretty > decent job. Thanks for the suggestion. > > Also, you didn't say _how_ you fixed this. That needs to be in here. > Something along the lines of: > > Rework the loop to start and end on *a* node that has SGX > memory. This avoids the deadlock looking for the current SGX- > lacking node to show up in the loop when it never will. Will add this to the changelog, thanks for the write-up. > > The code looks fine, so feel free to add: > > Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Thanks. > > Also, I do think we should probably add some kind of sanity warning to > the SGX code in another patch. If a node on an SGX system has CPUs and > memory, it's very likely it will also have some EPC. It can be > something soft like a pr_info(), but I think it would be nice to have. I think there are systems with valid reason to not setup an EPC section per node, e.g. a 8 sockets system with SNC=2, there would be a total of 16 nodes and it's not possible to have one EPC section per node because the upper limit of EPC sections is 8. I'm not sure a warning is appropriate here, what do you think?
On Thu, Aug 29, 2024 at 07:44:13PM +0300, Jarkko Sakkinen wrote: > On Thu Aug 29, 2024 at 5:38 AM EEST, Aaron Lu wrote: > > When current node doesn't have a EPC section configured by firmware and > > all other EPC sections memory are used up, CPU can stuck inside the > > while loop in __sgx_alloc_epc_page() forever and soft lockup will happen. > > Note how nid_of_current will never equal to nid in that while loop because > ~~~~ > > Oh *that* while loop ;-) Please be more specific. What about: Note how nid_of_current will never be equal to nid in the while loop that searches an available EPC page from remote nodes because nid_of_current is not set in sgx_numa_mask. > > nid_of_current is not set in sgx_numa_mask. > > > > Also worth mentioning is that it's perfectly fine for firmware to not > > seup an EPC section on a node. Setting an EPC section on each node can > > be good for performance but that's not a requirement functionality wise. > > This lacks any description of what is done to __sgx_alloc_epc_page(). Will add what Dave suggested on how the problem is fixed to the changelog. > > > > Fixes: 901ddbb9ecf5 ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()") > > Reported-by: Zhimin Luo <zhimin.luo@intel.com> > > Tested-by: Zhimin Luo <zhimin.luo@intel.com> > > Signed-off-by: Aaron Lu <aaron.lu@intel.com> Thanks, Aaron
On 8/29/24 23:02, Aaron Lu wrote: >> Also, I do think we should probably add some kind of sanity warning to >> the SGX code in another patch. If a node on an SGX system has CPUs and >> memory, it's very likely it will also have some EPC. It can be >> something soft like a pr_info(), but I think it would be nice to have. > I think there are systems with valid reason to not setup an EPC section > per node, e.g. a 8 sockets system with SNC=2, there would be a total of > 16 nodes and it's not possible to have one EPC section per node because > the upper limit of EPC sections is 8. I'm not sure a warning is > appropriate here, what do you think? While possible, those systems are pretty rare. I don't think a softly-worded pr_info() will scare anyone too much.
On Fri, Aug 30, 2024 at 07:03:33AM -0700, Dave Hansen wrote: > On 8/29/24 23:02, Aaron Lu wrote: > >> Also, I do think we should probably add some kind of sanity warning to > >> the SGX code in another patch. If a node on an SGX system has CPUs and > >> memory, it's very likely it will also have some EPC. It can be > >> something soft like a pr_info(), but I think it would be nice to have. > > I think there are systems with valid reason to not setup an EPC section > > per node, e.g. a 8 sockets system with SNC=2, there would be a total of > > 16 nodes and it's not possible to have one EPC section per node because > > the upper limit of EPC sections is 8. I'm not sure a warning is > > appropriate here, what do you think? > > While possible, those systems are pretty rare. I don't think a > softly-worded pr_info() will scare anyone too much. Understood. Maybe something like below? From e49a78f27956b3d62c5ef962320e63dc3eeb897c Mon Sep 17 00:00:00 2001 From: Aaron Lu <aaron.lu@intel.com> Date: Mon, 2 Sep 2024 11:46:07 +0800 Subject: [PATCH] x86/sgx: Log information when a node lacks an EPC section For optimized performance, firmware typically distributes EPC sections evenly across different NUMA nodes. However, there are scenarios where a node may have both CPUs and memory but no EPC section configured. For example, in an 8-socket system with a Sub-Numa-Cluster=2 setup, there are a total of 16 nodes. Given that the maximum number of supported EPC sections is 8, it is simply not feasible to assign one EPC section to each node. This configuration is not incorrect - SGX will still operate correctly; it is just not optimized from a NUMA standpoint. For this reason, log a message when a node with both CPUs and memory lacks an EPC section. This will provide users with a hint as to why they might be experiencing less-than-ideal performance when running SGX enclaves. Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Aaron Lu <aaron.lu@intel.com> --- arch/x86/kernel/cpu/sgx/main.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 694fcf7a5e3a..3a79105455f1 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -848,6 +848,13 @@ static bool __init sgx_page_cache_init(void) return false; } + for_each_online_node(nid) { + if (!node_isset(nid, sgx_numa_mask) && + node_state(nid, N_MEMORY) && node_state(nid, N_CPU)) + pr_info("node%d has both CPUs and memory but doesn't have an EPC section\n", + nid); + } + return true; }
On Fri Aug 30, 2024 at 9:14 AM EEST, Aaron Lu wrote: > On Thu, Aug 29, 2024 at 07:44:13PM +0300, Jarkko Sakkinen wrote: > > On Thu Aug 29, 2024 at 5:38 AM EEST, Aaron Lu wrote: > > > When current node doesn't have a EPC section configured by firmware and > > > all other EPC sections memory are used up, CPU can stuck inside the > > > while loop in __sgx_alloc_epc_page() forever and soft lockup will happen. > > > Note how nid_of_current will never equal to nid in that while loop because > > ~~~~ > > > > Oh *that* while loop ;-) Please be more specific. > > What about: > Note how nid_of_current will never be equal to nid in the while loop that > searches an available EPC page from remote nodes because nid_of_current is > not set in sgx_numa_mask. That would work I think! > > > > nid_of_current is not set in sgx_numa_mask. > > > > > > Also worth mentioning is that it's perfectly fine for firmware to not > > > seup an EPC section on a node. Setting an EPC section on each node can > > > be good for performance but that's not a requirement functionality wise. > > > > This lacks any description of what is done to __sgx_alloc_epc_page(). > > Will add what Dave suggested on how the problem is fixed to the changelog. Great. I think the code change is correct reflecting these additions. I'll look the next version as a whole but with high probability I can ack that as long as the commit message has these updates. > > > > > > > Fixes: 901ddbb9ecf5 ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()") > > > Reported-by: Zhimin Luo <zhimin.luo@intel.com> > > > Tested-by: Zhimin Luo <zhimin.luo@intel.com> > > > Signed-off-by: Aaron Lu <aaron.lu@intel.com> > > Thanks, > Aaron BR, Jarkko
On Tue, Sep 03, 2024 at 07:05:40PM +0300, Jarkko Sakkinen wrote: > On Fri Aug 30, 2024 at 9:14 AM EEST, Aaron Lu wrote: > > On Thu, Aug 29, 2024 at 07:44:13PM +0300, Jarkko Sakkinen wrote: > > > On Thu Aug 29, 2024 at 5:38 AM EEST, Aaron Lu wrote: > > > > When current node doesn't have a EPC section configured by firmware and > > > > all other EPC sections memory are used up, CPU can stuck inside the > > > > while loop in __sgx_alloc_epc_page() forever and soft lockup will happen. > > > > Note how nid_of_current will never equal to nid in that while loop because > > > ~~~~ > > > > > > Oh *that* while loop ;-) Please be more specific. > > > > What about: > > Note how nid_of_current will never be equal to nid in the while loop that > > searches an available EPC page from remote nodes because nid_of_current is > > not set in sgx_numa_mask. > > That would work I think! While rewriting the changelog, I find it more natural to explain this "while loop" when I first mentioned it, i.e. When the current node doesn't have an EPC section configured by firmware and all other EPC sections are used up, CPU can get stuck inside the while loop that looks for an available EPC page from remote nodes indefinitely, leading to a soft lockup. Note how nid_of_current will never be equal to nid in that while loop because nid_of_current is not set in sgx_numa_mask. I hope this looks fine to you. > > > > > > nid_of_current is not set in sgx_numa_mask. > > > > > > > > Also worth mentioning is that it's perfectly fine for firmware to not > > > > seup an EPC section on a node. Setting an EPC section on each node can > > > > be good for performance but that's not a requirement functionality wise. > > > > > > This lacks any description of what is done to __sgx_alloc_epc_page(). > > > > Will add what Dave suggested on how the problem is fixed to the changelog. > > Great. I think the code change is correct reflecting these additions. > I'll look the next version as a whole but with high probability I can > ack that as long as the commit message has these updates. Thanks. > > > > > > > > > > Fixes: 901ddbb9ecf5 ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()") > > > > Reported-by: Zhimin Luo <zhimin.luo@intel.com> > > > > Tested-by: Zhimin Luo <zhimin.luo@intel.com> > > > > Signed-off-by: Aaron Lu <aaron.lu@intel.com> > > > > Thanks, > > Aaron > > BR, Jarkko
On Wed Sep 4, 2024 at 4:39 AM EEST, Aaron Lu wrote: > On Tue, Sep 03, 2024 at 07:05:40PM +0300, Jarkko Sakkinen wrote: > > On Fri Aug 30, 2024 at 9:14 AM EEST, Aaron Lu wrote: > > > On Thu, Aug 29, 2024 at 07:44:13PM +0300, Jarkko Sakkinen wrote: > > > > On Thu Aug 29, 2024 at 5:38 AM EEST, Aaron Lu wrote: > > > > > When current node doesn't have a EPC section configured by firmware and > > > > > all other EPC sections memory are used up, CPU can stuck inside the > > > > > while loop in __sgx_alloc_epc_page() forever and soft lockup will happen. > > > > > Note how nid_of_current will never equal to nid in that while loop because > > > > ~~~~ > > > > > > > > Oh *that* while loop ;-) Please be more specific. > > > > > > What about: > > > Note how nid_of_current will never be equal to nid in the while loop that > > > searches an available EPC page from remote nodes because nid_of_current is > > > not set in sgx_numa_mask. > > > > That would work I think! > > While rewriting the changelog, I find it more natural to explain this > "while loop" when I first mentioned it, i.e. > > When the current node doesn't have an EPC section configured by firmware > and all other EPC sections are used up, CPU can get stuck inside the > while loop that looks for an available EPC page from remote nodes > indefinitely, leading to a soft lockup. Note how nid_of_current will > never be equal to nid in that while loop because nid_of_current is not > set in sgx_numa_mask. > > I hope this looks fine to you. Yes, it is. I just want the storyline to the commit message as a reminder why we did this, that's all. It helps a lot later on when having to look into commit history. BR, Jarkko
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 1a000acd933a..694fcf7a5e3a 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -475,24 +475,25 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void) { struct sgx_epc_page *page; int nid_of_current = numa_node_id(); - int nid = nid_of_current; + int nid_start, nid; - if (node_isset(nid_of_current, sgx_numa_mask)) { - page = __sgx_alloc_epc_page_from_node(nid_of_current); - if (page) - return page; - } - - /* Fall back to the non-local NUMA nodes: */ - while (true) { - nid = next_node_in(nid, sgx_numa_mask); - if (nid == nid_of_current) - break; + /* + * Try local node first. If it doesn't have an EPC section, + * fall back to the non-local NUMA nodes. + */ + if (node_isset(nid_of_current, sgx_numa_mask)) + nid_start = nid_of_current; + else + nid_start = next_node_in(nid_of_current, sgx_numa_mask); + nid = nid_start; + do { page = __sgx_alloc_epc_page_from_node(nid); if (page) return page; - } + + nid = next_node_in(nid, sgx_numa_mask); + } while (nid != nid_start); return ERR_PTR(-ENOMEM); }