Message ID | 20211222130820.1754-3-thunder.leizhen@huawei.com |
---|---|
State | New, archived |
Series | support reserving crashkernel above 4G on arm64 kdump |
On Wed, Dec 22, 2021 at 09:08:05PM +0800, Zhen Lei wrote:
> From: Chen Zhou <chenzhou10@huawei.com>
>
> We will make the functions reserve_crashkernel() as generic, the
> xen_pv_domain() check in reserve_crashkernel() is relevant only to
> x86,

Why is that so? Is Xen-PV x86-only?

> the same as insert_resource() in reserve_crashkernel[_low]().

Why?

Looking at

  0212f9159694 ("x86: Add Crash kernel low reservation")

it *surprisingly* explains why that resources thing is being added:

  We need to add another range in /proc/iomem like "Crash kernel low",
  so kexec-tools could find that info and append to kdump kernel
  command line.

Then,

  157752d84f5d ("kexec: use Crash kernel for Crash kernel low")

renamed it because, as it states, kexec-tools was taught to handle
multiple resources of the same name.

So why does kexec-tools on arm *not* need those iomem resources? How
does it parse the ranges there? Questions over questions...

So last time I told you to sit down and take your time with this cleanup.
From reading this here, it doesn't look like it. Rather, it looks like
hastily done in a hurry and hurrying stuff doesn't help you one bit - it
actually makes it worse.

Your commit messages need to explain *why* a change is being done and
why is that ok. This one doesn't.

> @@ -1120,7 +1109,17 @@ void __init setup_arch(char **cmdline_p)
>  	 * Reserve memory for crash kernel after SRAT is parsed so that it
>  	 * won't consume hotpluggable memory.
>  	 */
> -	reserve_crashkernel();
> +#ifdef CONFIG_KEXEC_CORE
> +	if (xen_pv_domain())
> +		pr_info("Ignoring crashkernel for a Xen PV domain\n");

This is wrong - the check is currently being done inside
reserve_crashkernel(), *after* it has parsed a crashkernel= cmdline
correctly - and not before.

Your change would print on Xen PV, regardless of whether it has received
crashkernel= on the cmdline or not.

This is exactly why I say that making those functions generic and shared
might not be such a good idea, after all, because then you'd have to
sprinkle around arch-specific stuff.

One of the ways how to address this particular case here would be:

1. Add a x86-specific wrapper around parse_crashkernel() which does
   all the parsing. When that wrapper finishes, you should have parsed
   everything that has crashkernel= on the cmdline.

2. At the end of that wrapper, you do arch-specific checks and setup
   like the xen_pv_domain() one.

3. Now, you do reserve_crashkernel(), if those checks pass.

The question is, whether the flow on arm64 can do the same. Probably but
it needs careful auditing.
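Borislav's three steps can be sketched as a small user-space model that makes the ordering visible. Everything here is hypothetical: parse_crashkernel_args() stands in for the suggested x86 wrapper around parse_crashkernel() (and only understands a plain size with an optional K/M/G suffix), the running_on_xen_pv flag stands in for xen_pv_domain(), and do_reserve() stands in for the actual memblock reservation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parse result; only a plain size is modelled here. */
struct crash_args {
	unsigned long long size;	/* bytes, 0 = not requested */
};

static int ignored_msgs;	/* times the Xen PV message was printed */
static int reservations;	/* times memory would have been reserved */

/* Step 1: parse everything crashkernel=-related before any arch check. */
static bool parse_crashkernel_args(const char *cmdline, struct crash_args *args)
{
	const char *p = strstr(cmdline, "crashkernel=");
	char *suffix;

	args->size = 0;
	if (!p)
		return false;
	args->size = strtoull(p + strlen("crashkernel="), &suffix, 0);
	switch (*suffix) {
	case 'G':
		args->size <<= 10;
		/* fall through */
	case 'M':
		args->size <<= 10;
		/* fall through */
	case 'K':
		args->size <<= 10;
		break;
	default:
		break;
	}
	return args->size != 0;
}

static void do_reserve(const struct crash_args *args)
{
	assert(args->size != 0);	/* only called after a successful parse */
	(void)args;
	reservations++;
}

/* Steps 2 and 3: arch-specific checks, then the reservation itself. */
static void reserve_crashkernel_model(const char *cmdline, bool running_on_xen_pv)
{
	struct crash_args args;

	if (!parse_crashkernel_args(cmdline, &args))
		return;			/* no crashkernel=: nothing to report */

	if (running_on_xen_pv) {	/* stands in for xen_pv_domain() */
		printf("Ignoring crashkernel for a Xen PV domain\n");
		ignored_msgs++;
		return;
	}

	do_reserve(&args);
}
```

With this ordering, a Xen PV guest booted without crashkernel= prints nothing, and the message only appears once a crashkernel= option has actually been parsed, which is the behaviour the original reserve_crashkernel() had.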
On 2021/12/24 1:26, Borislav Petkov wrote:
> On Wed, Dec 22, 2021 at 09:08:05PM +0800, Zhen Lei wrote:
>> From: Chen Zhou <chenzhou10@huawei.com>
>>
>> We will make the functions reserve_crashkernel() as generic, the
>> xen_pv_domain() check in reserve_crashkernel() is relevant only to
>> x86, the same as insert_resource() in reserve_crashkernel[_low]().
>
> [...]
>
> So why does kexec-tools on arm *not* need those iomem resources? How
> does it parse the ranges there? Questions over questions...

https://lkml.org/lkml/2019/4/4/1758

Chen Zhou explained this before; see below. I'll analyze why x86 and arm64
need to process the iomem resources at different times.

< This very reminds what x86 does. Any chance some of the code can be reused
< rather than duplicated?
As i said in the comment, i transport reserve_crashkernel_low() from x86_64. There are minor
differences. In arm64, we don't need to do insert_resource(), we do request_resource()
in request_standard_resources() later.

> So last time I told you to sit down and take your time with this cleanup.
> From reading this here, it doesn't look like it. Rather, it looks like
> hastily done in a hurry and hurrying stuff doesn't help you one bit - it
> actually makes it worse.
>
> Your commit messages need to explain *why* a change is being done and
> why is that ok. This one doesn't.

OK, I'll do this in the follow-up patches.

>> @@ -1120,7 +1109,17 @@ void __init setup_arch(char **cmdline_p)
>>  	 * Reserve memory for crash kernel after SRAT is parsed so that it
>>  	 * won't consume hotpluggable memory.
>>  	 */
>> -	reserve_crashkernel();
>> +#ifdef CONFIG_KEXEC_CORE
>> +	if (xen_pv_domain())
>> +		pr_info("Ignoring crashkernel for a Xen PV domain\n");
>
> This is wrong - the check is currently being done inside
> reserve_crashkernel(), *after* it has parsed a crashkernel= cmdline
> correctly - and not before.
>
> Your change would print on Xen PV, regardless of whether it has received
> crashkernel= on the cmdline or not.

Yes, you're right. The code logic does change, although the message itself
doesn't seem likely to cause any misunderstanding.

> This is exactly why I say that making those functions generic and shared
> might not be such a good idea, after all, because then you'd have to
> sprinkle around arch-specific stuff.

Yes, I've been thinking about that too. Perhaps these functions are not
suitable for full code sharing, but it looks like some of the code can be
shared: for example, parse_crashkernel_in_order(), which I extracted based
on your suggestion (it could also be named parse_crashkernel_high_low()),
or reserve_crashkernel_low().

There are two ways to reserve memory above 4G:
1. Use crashkernel=X,high, with or without crashkernel=X,low.
2. Use crashkernel=X[@offset], but try low memory first. If that fails,
   try high memory, and then reserve at least 256M of low memory.

I plan to implement only 2 in the next version so that there are fewer
changes, and then implement 1 after 2 has been applied.
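The fallback policy of option 2 can be modelled as a small allocation routine. This is a hypothetical sketch, not the kernel code: struct mem_model, take() and plan_crashkernel() are made-up names, memory is reduced to two counters of free bytes (below and above 4G), and the extra low reservation uses exactly 256M, whereas the text only says "at least 256M".

```c
#include <assert.h>
#include <stdbool.h>

#define SZ_256M (256ULL << 20)

/* Hypothetical memory model: free bytes below and above the 4G boundary. */
struct mem_model {
	unsigned long long free_low;
	unsigned long long free_high;
};

struct crash_plan {
	bool ok;			/* was a reservation made at all? */
	bool high;			/* main region placed above 4G */
	unsigned long long low_size;	/* extra region below 4G, 0 if none */
};

/* Carve @size bytes out of @pool if there is room. */
static bool take(unsigned long long *pool, unsigned long long size)
{
	if (*pool < size)
		return false;
	*pool -= size;
	return true;
}

/* Option 2 (crashkernel=X): below 4G first, else above 4G plus 256M low. */
static struct crash_plan plan_crashkernel(struct mem_model *m, unsigned long long size)
{
	struct crash_plan p = { false, false, 0 };

	if (take(&m->free_low, size)) {
		p.ok = true;			/* fits below 4G: done */
		return p;
	}
	if (!take(&m->free_high, size))
		return p;			/* no room anywhere */
	if (!take(&m->free_low, SZ_256M)) {
		m->free_high += size;		/* roll back the high region */
		return p;
	}
	p.ok = true;
	p.high = true;
	p.low_size = SZ_256M;
	return p;
}
```

Option 1 (crashkernel=X,high with an optional crashkernel=X,low) would skip the first take() and size the low region from the ,low value instead; it is left out here, matching the plan to implement option 2 first.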
On 2021/12/24 14:36, Leizhen (ThunderTown) wrote:
> On 2021/12/24 1:26, Borislav Petkov wrote:
>> So why does kexec-tools on arm *not* need those iomem resources? How
>> does it parse the ranges there? Questions over questions...

It's a good question, and worth figuring out. I'm going to dig into it.
I admire your rigorous style and sharp eye.

>>> +#ifdef CONFIG_KEXEC_CORE
>>> +	if (xen_pv_domain())
>>> +		pr_info("Ignoring crashkernel for a Xen PV domain\n");

Right, these two lines of code do not need to be moved: xen_pv_domain() is
a macro that can safely be used in generic code.

> I plan to implement only 2 in the next version so that there are fewer
> changes, and then implement 1 after 2 has been applied.

I tried it yesterday and it didn't work: I still have to deal with the
problem of adjusting the insert_resource() calls.

How about I split out some cleanup patches first and aim to get them merged
in v5.17? That way we can focus on the core changes in the next version, and
I can also avoid some repetitive rebasing.
On 2021/12/25 9:53, Leizhen (ThunderTown) wrote:
>>> This is exactly why I say that making those functions generic and shared
>>> might not be such a good idea, after all, because then you'd have to
>>> sprinkle around arch-specific stuff.

Hi Borislav and all:
Merry Christmas!

I have a new idea now. It gets around all of these arguments and minimizes
the changes to x86 (and also to arm64).

Previously, Chen Zhou and I tried to share the entire reserve_crashkernel()
function, which led to the following problems:
1. reserve_crashkernel() is also defined on other architectures, so we had to
   add the build option ARCH_WANT_RESERVE_CRASH_KERNEL to avoid conflicts.
2. The xen_pv_domain() check had to be moved out of reserve_crashkernel().
3. insert_resource() had to be moved out of reserve_crashkernel().
Others:
4. start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base, crash_base + crash_size);
   Change SZ_1M to CRASH_ALIGN, or leave it unchanged? The current conclusion
   is to leave it unchanged, but I think adding a new macro CRASH_FIXED_ALIGN
   is also an option: 2M alignment allows the page tables to use block
   mappings on most architectures.
5. if (crash_base >= (1ULL << 32) && reserve_crashkernel_low())
   Change (1ULL << 32) to CRASH_ADDR_LOW_MAX, or leave it unchanged? I
   reanalyzed it, and it doesn't need to be changed.

So, for 1-3, why not add a new function reserve_crashkernel_mem() and rename
reserve_crashkernel_low() to reserve_crashkernel_mem_low()?

On x86:
static void __init reserve_crashkernel(void)
{
	// Parse all "crashkernel=" options in priority order until a valid
	// combination is found, or return on failure.

	if (xen_pv_domain()) {
		pr_info("Ignoring crashkernel for a Xen PV domain\n");
		return;
	}

	// Call reserve_crashkernel_mem() to reserve the crashkernel memory; it
	// calls reserve_crashkernel_mem_low() if needed.

	if (crashk_low_res.end)
		insert_resource(&iomem_resource, &crashk_low_res);
	insert_resource(&iomem_resource, &crashk_res);
}

On arm64:
static void __init reserve_crashkernel(void)
{
	// Parse all "crashkernel=" options in priority order until a valid
	// combination is found, or return on failure.

	// Call reserve_crashkernel_mem() to reserve the crashkernel memory; it
	// calls reserve_crashkernel_mem_low() if needed.
}

With this approach:
1. reserve_crashkernel() is still static, so there is no need to add
   ARCH_WANT_RESERVE_CRASH_KERNEL.
2. The xen_pv_domain() check is not affected in any way.
   Hi Borislav:
   As you mentioned, this check may also be needed on arm64, but it may be
   better not to add it until the problem actually shows up on arm64.
3. insert_resource() is not moved out of reserve_crashkernel() on x86.
   Hi Borislav:
   I haven't yet figured out why request_resource() can't be replaced with
   insert_resource() on arm64, but I have a hunch that kexec-tools may be
   involved. The cost of the change on arm64 is definitely higher than on
   x86, and other architectures that want to use reserve_crashkernel_mem()
   may face the same problem. So it's probably better if
   reserve_crashkernel_mem() does not call insert_resource() itself.

I guess you have a long Christmas holiday, so I'm going to send the next
version without waiting for your response.
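The proposed split can be illustrated with a small user-space model: the shared reserve_crashkernel_mem() only reserves memory, while each architecture's static reserve_crashkernel() keeps its own checks and iomem handling. Only the names reserve_crashkernel_mem(), crashk_res and crashk_low_res come from the thread; struct res, insert_resource_model(), the x86_/arm64_ wrapper names and the fixed low base address are hypothetical, the crashkernel= parsing step is omitted, and reserve_crashkernel_mem_low() is folded inline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for struct resource; end == 0 means "nothing reserved". */
struct res {
	const char *name;
	unsigned long long start, end;	/* inclusive */
};

static struct res crashk_res = { "Crash kernel", 0, 0 };
static struct res crashk_low_res = { "Crash kernel", 0, 0 };
static int iomem_inserts;	/* counts insert_resource() stand-in calls */

static void insert_resource_model(struct res *r)
{
	assert(r->end > r->start);	/* only publish real reservations */
	iomem_inserts++;
}

/* Shared part: reserves memory only, never touches the iomem tree. */
static bool reserve_crashkernel_mem(unsigned long long base, unsigned long long size,
				    unsigned long long low_size)
{
	if (!size)
		return false;
	crashk_res.start = base;
	crashk_res.end = base + size - 1;
	if (low_size) {			/* stands in for reserve_crashkernel_mem_low() */
		crashk_low_res.start = 0x10000000ULL;	/* arbitrary placeholder */
		crashk_low_res.end = crashk_low_res.start + low_size - 1;
	}
	return true;
}

/* x86: parsing (omitted), then the Xen PV check, then publish in iomem. */
static void x86_reserve_crashkernel(bool xen_pv, unsigned long long base,
				    unsigned long long size, unsigned long long low_size)
{
	if (xen_pv) {
		printf("Ignoring crashkernel for a Xen PV domain\n");
		return;
	}
	if (!reserve_crashkernel_mem(base, size, low_size))
		return;
	if (crashk_low_res.end)
		insert_resource_model(&crashk_low_res);
	insert_resource_model(&crashk_res);
}

/* arm64: no Xen check; request_standard_resources() handles iomem later. */
static void arm64_reserve_crashkernel(unsigned long long base, unsigned long long size,
				      unsigned long long low_size)
{
	reserve_crashkernel_mem(base, size, low_size);
}
```

Keeping insert_resource() out of the shared helper is what lets the arm64 wrapper stay this thin while x86 keeps its current /proc/iomem behaviour.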
On 2021/12/25 9:53, Leizhen (ThunderTown) wrote:
> On 2021/12/24 14:36, Leizhen (ThunderTown) wrote:
>> On 2021/12/24 1:26, Borislav Petkov wrote:
>>> So why does kexec-tools on arm *not* need those iomem resources? How
>>> does it parse the ranges there? Questions over questions...

Hi Borislav:
The reason why insert_resource() cannot be used in reserve_crashkernel[_low]()
on arm64 is now clear. The parent resource node of crashk[_low]_res is added
later, by request_resource() in request_standard_resources(). If
crashk[_low]_res had already been inserted into iomem_resource by then, that
request_resource() call would conflict with it. For insert_resource() to be
usable in reserve_crashkernel[_low](), all request_resource() calls in
request_standard_resources() would have to be changed to insert_resource().

I found commit e25e6e7593ca ("kdump, x86: Process multiple Crash kernel in
/proc/iomem") in kexec-tools. I'm trying to port it to arm64, or to make it
generic.

Thanks.
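The conflict described above can be reproduced with a toy model of the two kernel helpers. request_res() and insert_res() below are simplified, hypothetical user-space stand-ins for request_resource() and insert_resource() (no locking, no resource flags, and -1 instead of a kernel error code); the semantics modelled are that the request variant only links a node directly under the given parent, while the insert variant may descend into a containing node or adopt existing nodes that the new one covers.

```c
#include <assert.h>
#include <stddef.h>

/* Toy resource node; children are kept sorted by start and never overlap. */
struct res {
	const char *name;
	unsigned long long start, end;		/* inclusive */
	struct res *parent, *sibling, *child;
};

static int overlaps(const struct res *a, const struct res *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/* request_resource()-like: link @new directly under @parent or return the clash. */
static struct res *request_res(struct res *parent, struct res *new)
{
	struct res **p;

	for (p = &parent->child; *p; p = &(*p)->sibling) {
		if (overlaps(*p, new))
			return *p;
		if ((*p)->start > new->end)
			break;
	}
	new->parent = parent;
	new->sibling = *p;
	new->child = NULL;
	*p = new;
	return NULL;
}

/* insert_resource()-like: descend into a container, or adopt covered siblings. */
static int insert_res(struct res *parent, struct res *new)
{
	struct res **p, *first, *last, *r;

	for (;;) {
		struct res *c = request_res(parent, new);

		if (!c)
			return 0;
		if (c->start <= new->start && c->end >= new->end) {
			parent = c;	/* @new fits inside @c: retry one level down */
			continue;
		}
		break;			/* @new must cover the clashing node(s) */
	}

	p = &parent->child;
	while ((*p)->end < new->start)
		p = &(*p)->sibling;
	first = *p;
	for (last = first; last->sibling && last->sibling->start <= new->end;
	     last = last->sibling)
		;
	for (r = first; ; r = r->sibling) {
		if (r->start < new->start || r->end > new->end)
			return -1;	/* partial overlap: refuse */
		if (r == last)
			break;
	}

	new->parent = parent;		/* splice first..last under @new */
	new->child = first;
	new->sibling = last->sibling;
	last->sibling = NULL;
	for (r = first; r; r = r->sibling)
		r->parent = new;
	*p = new;
	return 0;
}
```

In this model, requesting a RAM region first and then the crash region inside it succeeds, which matches the current arm64 order. If the crash region is instead inserted at the top level first, a later request_res() of the enclosing RAM region returns the crash region as a conflict, whereas insert_res() succeeds and makes the crash region a child of the RAM region.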
On 2022/1/7 16:13, Leizhen (ThunderTown) wrote:
> I found commit e25e6e7593ca ("kdump, x86: Process multiple Crash kernel in
> /proc/iomem") in kexec-tools. I'm trying to port it to arm64, or to make it
> generic.

Chen Zhou has done this before. But the "Crash kernel (low)" name can in fact
be eliminated: Chen Zhou only used it to tell whether a crashkernel memory
range is the crash kernel load range or not. We can use
get_crash_kernel_load_range() to get and check the load range instead.
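If several ranges in /proc/iomem are all named "Crash kernel", user space has to collect every one of them and then decide which is the load range, rather than relying on a distinct "Crash kernel (low)" name. The sketch below is not kexec-tools code: collect_crash_ranges() and pick_load_range() are hypothetical, and the "highest-addressed range is the load range" rule is an assumption made only to keep the example short; the thread suggests using get_crash_kernel_load_range() for that decision instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CRASH_RANGES 4

struct range {
	unsigned long long start, end;	/* inclusive */
};

/* Collect every "Crash kernel" line from text in /proc/iomem format. */
static int collect_crash_ranges(const char *iomem, struct range *out, int max)
{
	const char *line = iomem;
	int n = 0;

	while (*line && n < max) {
		const char *nl = strchr(line, '\n');
		size_t len = nl ? (size_t)(nl - line) : strlen(line);
		unsigned long long s, e;
		char buf[128], name[64];

		if (len >= sizeof(buf))
			len = sizeof(buf) - 1;
		memcpy(buf, line, len);		/* parse one line at a time */
		buf[len] = '\0';

		/* Lines look like "  90000000-9fffffff : Crash kernel". */
		if (sscanf(buf, " %llx-%llx : %63[^\n]", &s, &e, name) == 3 &&
		    strcmp(name, "Crash kernel") == 0) {
			out[n].start = s;
			out[n].end = e;
			n++;
		}
		if (!nl)
			break;
		line = nl + 1;
	}
	return n;
}

/* Hypothetical rule for this sketch only: the load range is the highest one. */
static int pick_load_range(const struct range *r, int n)
{
	int best = 0;

	if (n <= 0)
		return -1;
	for (int i = 1; i < n; i++)
		if (r[i].start > r[best].start)
			best = i;
	return best;
}
```

Every other collected range would then be treated as an additional (for example, low) crash region, which is the information the separate resource name used to carry.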
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index ae8f63661363e25..acf2f2eedfe3415 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -434,7 +434,6 @@ static int __init reserve_crashkernel_low(void)
 
 	crashk_low_res.start = low_base;
 	crashk_low_res.end = low_base + low_size - 1;
-	insert_resource(&iomem_resource, &crashk_low_res);
 #endif
 	return 0;
 }
@@ -458,11 +457,6 @@ static void __init reserve_crashkernel(void)
 		high = true;
 	}
 
-	if (xen_pv_domain()) {
-		pr_info("Ignoring crashkernel for a Xen PV domain\n");
-		return;
-	}
-
 	/* 0 means: find the address automatically */
 	if (!crash_base) {
 		/*
@@ -508,11 +502,6 @@ static void __init reserve_crashkernel(void)
 
 	crashk_res.start = crash_base;
 	crashk_res.end = crash_base + crash_size - 1;
-	insert_resource(&iomem_resource, &crashk_res);
-}
-#else
-static void __init reserve_crashkernel(void)
-{
 }
 #endif
 
@@ -1120,7 +1109,17 @@ void __init setup_arch(char **cmdline_p)
 	 * Reserve memory for crash kernel after SRAT is parsed so that it
 	 * won't consume hotpluggable memory.
 	 */
-	reserve_crashkernel();
+#ifdef CONFIG_KEXEC_CORE
+	if (xen_pv_domain())
+		pr_info("Ignoring crashkernel for a Xen PV domain\n");
+	else {
+		reserve_crashkernel();
+		if (crashk_res.end > crashk_res.start)
+			insert_resource(&iomem_resource, &crashk_res);
+		if (crashk_low_res.end > crashk_low_res.start)
+			insert_resource(&iomem_resource, &crashk_low_res);
+	}
+#endif
 
 	memblock_find_dma_reserve();