Message ID | BYAPR12MB32059CFEB36A5FC7D859B69FD5F72@BYAPR12MB3205.namprd12.prod.outlook.com (mailing list archive)
State | New
Series | mm: optimize memblock_add_range() for improved performance
Hi Stephen,

On Wed, Feb 05, 2025 at 05:55:50AM +0000, Stephen Eta Zhou wrote:
> Hi Mike Rapoport, Andrew Morton
>
> I have recently been researching the mm subsystem of the Linux kernel,
> and I came across the memblock_add_range function, which piqued my
> interest. I found the implementation approach quite interesting, so I
> analyzed it and identified some areas for optimization. Starting with
> this part of the code:
>
>     if (type->cnt * 2 + 1 <= type->max)
>         insert = true;
>
> The idea here is good, but it has a certain flaw. The condition is rather
> restrictive, and it cannot be satisfied initially. Moreover, it only holds
> when there is room for 2 * cnt + 1 regions. If there is enough memory for
> the insertion itself, but not for 2 * cnt + 1 regions, the insertion
> operation still needs to be performed twice.

The code in memblock_add_range() is very fragile, and many attempts to
remove the second pass that looked correct at first glance failed for some
corner case.

Unfortunately, it's impossible to capture all possible memory
configurations and reservations in the memblock test suite, so even if it
passes, there is a chance the kernel will fail to boot on actual HW.

> - Before the patch:
>   - Average: 1.22%
>   - Max: 1.63%, Min: 0.93%
>
> - After the patch:
>   - Average: 0.69%
>   - Max: 0.94%, Min: 0.50%

These numbers do not represent what's actually interesting: the boot time
speedup.

--
Sincerely yours,
Mike.
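To make the worst case behind that check concrete: a new range that overlaps all N existing regions can be split into at most N + 1 uncovered pieces, so after inserting them the array holds N + (N + 1) = 2N + 1 entries. The following is a minimal userspace sketch of the counting pass, not code from the thread; the region values are made up for illustration:

#include <stdio.h>

/* Hypothetical example: three existing regions, and a new range
 * [0, 70) that overlaps all of them. */
struct region { unsigned long base, end; };

int main(void)
{
	struct region rgn[] = { { 10, 20 }, { 30, 40 }, { 50, 60 } };
	unsigned long base = 0, end = 70;
	unsigned int i, nr_new = 0;

	/* Counting pass, mirroring the first pass of memblock_add_range(). */
	for (i = 0; i < sizeof(rgn) / sizeof(rgn[0]); i++) {
		if (rgn[i].base >= end)
			break;
		if (rgn[i].end <= base)
			continue;
		if (rgn[i].base > base)
			nr_new++;	/* uncovered piece below this region */
		base = rgn[i].end < end ? rgn[i].end : end;
	}
	if (base < end)
		nr_new++;		/* uncovered piece above the last region */

	/* Prints 4, i.e. N + 1 pieces for N = 3 overlapped regions, so the
	 * array may grow from 3 to 3 + 4 = 2 * 3 + 1 entries. */
	printf("pieces to insert: %u\n", nr_new);
	return 0;
}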
Hi Mike,

Thank you for your feedback and insights. I fully understand your concerns
regarding the fragility of the code in `memblock_add_range()` and the
challenges in handling edge cases. I also acknowledge your point that while
the CPU cycle reductions are measurable, they may not fully capture the most
important factor: the boot time speedup.

Regarding the performance metrics, I want to clarify that the main goal of
my optimization was to improve boot time during the early stages of kernel
initialization. While reducing the CPU cycles spent in `memblock_add_range()`
is a positive outcome, the real benefit is in reducing kernel initialization
time, particularly before the buddy system takes over. I understand that the
CPU cycle reduction alone doesn't answer the critical question of boot time
speedup, and I will work on measuring this more directly.

To address the concern about real hardware and corner cases, I plan to
conduct further testing on actual hardware with different memory
configurations to ensure the robustness of the patch. This will help ensure
the stability and performance benefits are consistent across various setups.
I also plan to increase the testing coverage for edge cases and include more
robust fallback mechanisms to address the potential fragility you mentioned.
I want to make sure the changes handle all possible scenarios gracefully.

In addition, I will focus on measuring boot time more directly using tools
like `bootchart` and share the results, comparing boot times before and
after the optimization to demonstrate the actual impact on startup
performance.

Given the concerns raised, I would greatly appreciate your guidance on
whether you think it's worthwhile for me to continue with this approach.
Should I proceed with further refinements and testing, or would you
recommend a different direction for optimization? Your input will be
invaluable in ensuring this patch meets both performance and stability
goals.

Thank you again for your careful review, and I look forward to your
thoughts.

Best regards,
Stephen
Apologies for the multiple submissions

Hi Mike,

I apologize for the multiple submissions of my previous emails.
Unfortunately, due to formatting issues, the message was repeated
unintentionally. I sincerely apologize for any inconvenience caused by
this.

Please consider this email as the main one. If you have already seen the
earlier submissions, kindly disregard them.

Thank you for your understanding, and I appreciate your patience.

Best regards,
Stephen
Hi Stephen,

Please don't top post on Linux kernel mailing list.

On Fri, Feb 07, 2025 at 04:03:31PM +0000, Stephen Eta Zhou wrote:
> Hi Mike,
>
> Given the concerns raised, I would greatly appreciate your guidance on
> whether you think it's worthwhile for me to continue with this approach.
> Should I proceed with further refinements and testing, or would you
> recommend a different direction for optimization? Your input will be
> invaluable in ensuring this patch meets both performance and stability
> goals.

I doubt any optimization for memblock_add_range() will really improve boot
time or even produce a measurable speedup.

With all the fragility and potential of breakage, I don't think it's
worthwhile to attempt optimizations there.

--
Sincerely yours,
Mike.
Hi Mike,

> I doubt any optimization for memblock_add_range() will really improve boot
> time or even produce a measurable speedup.
>
> With all the fragility and potential of breakage, I don't think it's
> worthwhile to attempt optimizations there.

Thank you for your insights. I appreciate your guidance on this matter, and
after considering your advice, I have decided to stop pursuing further
optimizations for the memblock_add_range() function. I'll follow your
recommendation and focus on other areas that might offer more measurable
improvements.

Best regards,
Stephen
Hi Mike,

> I doubt any optimization for memblock_add_range() will really improve boot
> time or even produce a measurable speedup.
>
> With all the fragility and potential of breakage, I don't think it's
> worthwhile to attempt optimizations there.

I appreciate your previous feedback, and, as you suggested, I initially
agreed that optimizing memblock_add_range() might not have a significant
impact on boot time. However, after conducting some benchmarks, I found
that the optimized version runs approximately 5 milliseconds faster than
the current implementation. I'm unsure whether a 5 ms difference is
meaningful in the early boot stage, which is why I wanted to reach out and
discuss it with you. If this level of improvement is considered beneficial,
I am willing to extend my testing to various physical devices to ensure
that the optimization works correctly across different architectures.

Also, I'd like to share that this is my first time modifying the Linux
kernel, and I feel a great sense of accomplishment from this experience. I
really appreciate the opportunity to learn from this process and from the
insights shared by experienced developers like you. This could be a
long-term effort, and I'd love to hear your thoughts on whether it's worth
pursuing further.

Looking forward to your insights!

Best regards,
Stephen
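A figure like the 5 ms above could plausibly come from a harness along these lines; this is a hypothetical userspace sketch with a stand-in workload, not the benchmark actually used in the thread:

#include <stdio.h>
#include <time.h>

/* Stand-in workload; a real harness would invoke the insertion path
 * under test (e.g. via the memblock test suite) here instead. */
static unsigned long workload(void)
{
	volatile unsigned long acc = 0;
	unsigned long i;

	for (i = 0; i < 10000000UL; i++)
		acc += i;
	return acc;
}

int main(void)
{
	struct timespec t0, t1;
	double ms;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	workload();
	clock_gettime(CLOCK_MONOTONIC, &t1);

	ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
	     (t1.tv_nsec - t0.tv_nsec) / 1e6;
	printf("elapsed: %.3f ms\n", ms);
	return 0;
}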
diff --git a/mm/memblock.c b/mm/memblock.c
index 95af35fd1389..75c76b39a364 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -585,16 +585,16 @@ static int __init_memblock memblock_add_range(struct memblock_type *type,
 				       phys_addr_t base, phys_addr_t size,
 				       int nid, enum memblock_flags flags)
 {
-	bool insert = false;
 	phys_addr_t obase = base;
 	phys_addr_t end = base + memblock_cap_size(base, &size);
-	int idx, nr_new, start_rgn = -1, end_rgn;
+	phys_addr_t rbase, rend;
+	int idx, nr_new, start_rgn, end_rgn;
 	struct memblock_region *rgn;
 
 	if (!size)
 		return 0;
 
-	/* special case for empty array */
+	/* Special case for empty array */
 	if (type->regions[0].size == 0) {
 		WARN_ON(type->cnt != 0 || type->total_size);
 		type->regions[0].base = base;
@@ -606,80 +606,114 @@ static int __init_memblock memblock_add_range(struct memblock_type *type,
 		return 0;
 	}
 
+	/* Delayed assignment, which is not necessary when the array is empty. */
+	start_rgn = -1;
 	/*
-	 * The worst case is when new range overlaps all existing regions,
-	 * then we'll need type->cnt + 1 empty regions in @type. So if
-	 * type->cnt * 2 + 1 is less than or equal to type->max, we know
-	 * that there is enough empty regions in @type, and we can insert
-	 * regions directly.
+	 * Originally, `end_rgn` didn't need to be assigned a value,
+	 * but due to the use of nested conditional expressions,
+	 * the compiler reports a warning that `end_rgn` is uninitialized.
+	 * Therefore, it has been given an initial value here
+	 * to eliminate the warning.
 	 */
-	if (type->cnt * 2 + 1 <= type->max)
-		insert = true;
+	end_rgn = -1;
 
 repeat:
 	/*
-	 * The following is executed twice. Once with %false @insert and
-	 * then with %true. The first counts the number of regions needed
-	 * to accommodate the new area. The second actually inserts them.
+	 * It is assumed that insertion is always possible under normal circumstances.
+	 * If memory is insufficient during insertion, the operation will record the need,
+	 * allocate memory, and then re-execute the insertion for the remaining portion.
 	 */
 	base = obase;
 	nr_new = 0;
 
 	for_each_memblock_type(idx, type, rgn) {
-		phys_addr_t rbase = rgn->base;
-		phys_addr_t rend = rbase + rgn->size;
+		rbase = rgn->base;
+		rend = rbase + rgn->size;
 
 		if (rbase >= end)
 			break;
 		if (rend <= base)
 			continue;
+
 		/*
-		 * @rgn overlaps. If it separates the lower part of new
-		 * area, insert that portion.
+		 * @rgn overlaps. If it separates the lower part of new area, insert that portion.
 		 */
 		if (rbase > base) {
 #ifdef CONFIG_NUMA
 			WARN_ON(nid != memblock_get_region_node(rgn));
 #endif
 			WARN_ON(flags != rgn->flags);
-			nr_new++;
-			if (insert) {
+			/*
+			 * If memory is insufficient, the space required will be recorded.
+			 * If memory is sufficient, the insertion will proceed.
+			 */
+			if (type->cnt >= type->max) {
+				/*
+				 * Record obase as the address where the
+				 * overlapping part has not been resolved,
+				 * so that when repeat restarts,
+				 * redundant operations of resolving the
+				 * overlapping addresses are avoided.
+				 */
+				if (nr_new == 0)
+					obase = base;
+				nr_new++;
+			} else {
 				if (start_rgn == -1)
 					start_rgn = idx;
 				end_rgn = idx + 1;
-				memblock_insert_region(type, idx++, base,
-						       rbase - base, nid,
-						       flags);
+				memblock_insert_region(type, idx++, base, rbase - base, nid, flags);
 			}
 		}
-		/* area below @rend is dealt with, forget about it */
+		/* Area below @rend is dealt with, forget about it */
 		base = min(rend, end);
 	}
 
-	/* insert the remaining portion */
+	/* Insert the remaining portion */
 	if (base < end) {
-		nr_new++;
-		if (insert) {
+		/*
+		 * Similarly, after handling the overlapping part,
+		 * it is still possible that memory is
+		 * insufficient. In that case, the space will be recorded once again.
+		 */
+		if (type->cnt >= type->max) {
+			/*
+			 * The address of obase needs to be recorded here as well. The purpose is to
+			 * handle the situation where,
+			 * after resolving the overlap, there is still a remaining space to
+			 * insert but memory is insufficient (i.e.,
+			 * no memory shortage occurred while resolving the overlap).
+			 * This means that space for
+			 * N (overlapping parts) + 1 (non-overlapping part) is required.
+			 * If obase is not recorded, after memory expansion,
+			 * base might revert to the original address to be
+			 * inserted (which could be overlapping).
+			 * This could lead to for_each_memblock_type attempting
+			 * to resolve the overlap again, causing multiple unnecessary iterations,
+			 * even if it's just a simple check.
+			 */
+			if (nr_new == 0)
+				obase = base;
+			nr_new++;
+		} else {
 			if (start_rgn == -1)
 				start_rgn = idx;
 			end_rgn = idx + 1;
-			memblock_insert_region(type, idx, base, end - base,
-					       nid, flags);
+			memblock_insert_region(type, idx, base, end - base, nid, flags);
 		}
 	}
 
-	if (!nr_new)
-		return 0;
-
 	/*
-	 * If this was the first round, resize array and repeat for actual
-	 * insertions; otherwise, merge and return.
+	 * Finally, check if memory insufficiency occurred during insertion.
+	 * If so, the memory will be expanded to an appropriate size,
+	 * and the remaining portion will be inserted again.
+	 * If not, it means memory is sufficient, and the regions will be merged directly.
 	 */
-	if (!insert) {
-		while (type->cnt + nr_new > type->max)
+	if (nr_new > 0) {
+		while (type->cnt + nr_new > type->max) {
 			if (memblock_double_array(type, obase, size) < 0)
 				return -ENOMEM;
-		insert = true;
+		}
 		goto repeat;
 	} else {
 		memblock_merge_regions(type, start_rgn, end_rgn);
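The control flow of the patched function can be modeled in userspace roughly as follows. This is an illustrative sketch only: the rtype/add_range/insert_region names are hypothetical, region merging is omitted, and realloc stands in for memblock_double_array(). The idea it demonstrates is the patch's: insert optimistically, record the unresolved resume point in obase when the array fills up, grow the array, and repeat from there.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for memblock_type/memblock_region. */
struct region { unsigned long base, end; };
struct rtype {
	struct region *regions;
	int cnt, max;
};

static void insert_region(struct rtype *t, int idx,
			  unsigned long base, unsigned long end)
{
	memmove(&t->regions[idx + 1], &t->regions[idx],
		(t->cnt - idx) * sizeof(*t->regions));
	t->regions[idx].base = base;
	t->regions[idx].end = end;
	t->cnt++;
}

static int add_range(struct rtype *t, unsigned long base, unsigned long end)
{
	unsigned long obase = base;
	int idx, nr_new;

repeat:
	base = obase;
	nr_new = 0;
	for (idx = 0; idx < t->cnt; idx++) {
		unsigned long rbase = t->regions[idx].base;
		unsigned long rend = t->regions[idx].end;

		if (rbase >= end)
			break;
		if (rend <= base)
			continue;
		if (rbase > base) {
			if (t->cnt >= t->max) {
				/* Array is full: remember where the
				 * unresolved part starts and count it. */
				if (nr_new == 0)
					obase = base;
				nr_new++;
			} else {
				insert_region(t, idx++, base, rbase);
			}
		}
		base = rend < end ? rend : end;
	}
	if (base < end) {
		if (t->cnt >= t->max) {
			if (nr_new == 0)
				obase = base;
			nr_new++;
		} else {
			insert_region(t, idx, base, end);
		}
	}

	if (nr_new > 0) {
		/* Ran out of slots: grow the array, then finish the
		 * remaining pieces starting from obase. */
		while (t->cnt + nr_new > t->max) {
			struct region *r;

			t->max *= 2;
			r = realloc(t->regions, t->max * sizeof(*r));
			if (!r)
				return -1;
			t->regions = r;
		}
		goto repeat;
	}
	return 0;
}

int main(void)
{
	struct rtype t = { .cnt = 0, .max = 2 };
	int i;

	t.regions = calloc(t.max, sizeof(*t.regions));
	insert_region(&t, 0, 10, 20);
	insert_region(&t, 1, 30, 40);
	add_range(&t, 0, 50);	/* overlaps both regions, forces a grow */
	for (i = 0; i < t.cnt; i++)
		printf("[%lu, %lu)\n", t.regions[i].base, t.regions[i].end);
	free(t.regions);
	return 0;
}

Running this with max = 2 and a range overlapping both seed regions exercises the grow-and-repeat path and prints the five resulting regions, matching the N + 1 = 3 pieces counted on the first pass plus the two originals.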