Message ID | 20211210065533.2023-4-thunder.leizhen@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | support reserving crashkernel above 4G on arm64 kdump | expand |
On 12/10/21 12:55 AM, Zhen Lei wrote: > From: Chen Zhou <chenzhou10@huawei.com> > > To make the functions reserve_crashkernel() as generic, > replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX. > > Signed-off-by: Chen Zhou <chenzhou10@huawei.com> > Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> > Tested-by: John Donnelly <John.p.donnelly@oracle.com> > Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com> > Acked-by: Baoquan He <bhe@redhat.com> Acked-by: John Donnelly <john.p.donnelly@oracle.com> > --- > arch/x86/kernel/setup.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c > index 6424ee4f23da2cf..bb2a0973b98059e 100644 > --- a/arch/x86/kernel/setup.c > +++ b/arch/x86/kernel/setup.c > @@ -489,8 +489,9 @@ static void __init reserve_crashkernel(void) > if (!crash_base) { > /* > * Set CRASH_ADDR_LOW_MAX upper bound for crash memory, > - * crashkernel=x,high reserves memory over 4G, also allocates > - * 256M extra low memory for DMA buffers and swiotlb. > + * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX, > + * also allocates 256M extra low memory for DMA buffers > + * and swiotlb. > * But the extra memory is not required for all machines. > * So try low memory first and fall back to high memory > * unless "crashkernel=size[KMG],high" is specified. > @@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void) > } > } > > - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) { > + if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) { > memblock_phys_free(crash_base, crash_size); > return; > }
On 12/10/21 at 02:55pm, Zhen Lei wrote: > From: Chen Zhou <chenzhou10@huawei.com> > > To make the functions reserve_crashkernel() as generic, > replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX. > > Signed-off-by: Chen Zhou <chenzhou10@huawei.com> > Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> If you made change to this patch, please remove the old Acked-by. If you didn't contribute change, Signed-off-by should be taken off. Compared this with the version I acked, only see memblock_free() -> memblock_phys_free() update which should be done from the rebase. So ack this one again, and please also consider adding Co-developed-by. > Tested-by: John Donnelly <John.p.donnelly@oracle.com> > Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com> > Acked-by: Baoquan He <bhe@redhat.com> > --- > arch/x86/kernel/setup.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c > index 6424ee4f23da2cf..bb2a0973b98059e 100644 > --- a/arch/x86/kernel/setup.c > +++ b/arch/x86/kernel/setup.c > @@ -489,8 +489,9 @@ static void __init reserve_crashkernel(void) > if (!crash_base) { > /* > * Set CRASH_ADDR_LOW_MAX upper bound for crash memory, > - * crashkernel=x,high reserves memory over 4G, also allocates > - * 256M extra low memory for DMA buffers and swiotlb. > + * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX, > + * also allocates 256M extra low memory for DMA buffers > + * and swiotlb. > * But the extra memory is not required for all machines. > * So try low memory first and fall back to high memory > * unless "crashkernel=size[KMG],high" is specified. > @@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void) > } > } > > - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) { > + if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) { > memblock_phys_free(crash_base, crash_size); > return; > } > -- > 2.25.1 >
On Tue, Dec 14, 2021 at 04:54:40PM +0800, Baoquan He wrote:
> If you didn't contribute change, Signed-off-by should be taken off.
The second SOB is correct here. I'll let you figure it out what it
means.
Hint: Documentation/process/submitting-patches.rst
On 12/14/21 at 10:38am, Borislav Petkov wrote: > On Tue, Dec 14, 2021 at 04:54:40PM +0800, Baoquan He wrote: > > If you didn't contribute change, Signed-off-by should be taken off. > > The second SOB is correct here. I'll let you figure it out what it > means. > > Hint: Documentation/process/submitting-patches.rst Ah, OK, I see the new paragraph from you added in below commit. commit 9bf19b78a203b6ed20ed7b5d7222f5ef7a49aed4 Author: Borislav Petkov <bp@alien8.de> Date: Thu Dec 17 19:37:56 2020 +0100 Documentation/submitting-patches: Document the SoB chain Hi Lei, I take back the wrong comment on Signed-off-by tag since you post the patch, please ignore them.
On Tue, Dec 14, 2021 at 05:56:57PM +0800, Baoquan He wrote:
> Ah, OK, I see the new paragraph from you added in below commit.
That is supposed to make it absolutely clear and explicit. There are
other hints as to what a subsequent SOB means in that document already.
Just the next section says:
"The Signed-off-by: tag indicates that the signer was involved in the
development of the patch, or that he/she was in the patch's delivery
path."
It wouldn't hurt if people would look at that doc from time to time - we
end up referring to it on a daily basis.
On 12/14/21 at 03:24pm, Borislav Petkov wrote: > On Tue, Dec 14, 2021 at 05:56:57PM +0800, Baoquan He wrote: > > Ah, OK, I see the new paragraph from you added in below commit. > > That is supposed to make it absolutely clear and explicit. There are > other hints as to what a subsequent SOB means in that document already. > > Just the next section says: > > "The Signed-off-by: tag indicates that the signer was involved in the > development of the patch, or that he/she was in the patch's delivery > path." > > It wouldn't hurt if people would look at that doc from time to time - we > end up referring to it on a daily basis. Thanks, I need read this completely, and often check if anything new is added.
On Fri, Dec 10, 2021 at 02:55:26PM +0800, Zhen Lei wrote: > @@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void) > } > } > > - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) { > + if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) { > memblock_phys_free(crash_base, crash_size); > return; > } That's not a equivalent transformation on X86_32.
On 12/15/21 at 02:28pm, Borislav Petkov wrote: > On Fri, Dec 10, 2021 at 02:55:26PM +0800, Zhen Lei wrote: > > @@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void) > > } > > } > > > > - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) { > > + if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) { > > memblock_phys_free(crash_base, crash_size); > > return; > > } > > That's not a equivalent transformation on X86_32. reserve_crashkernel_low() always return 0 on x86_32, so the not equivalent transformation for x86_32 doesn't matter, I think.
On 2021/12/16 9:10, Baoquan He wrote: > On 12/15/21 at 02:28pm, Borislav Petkov wrote: >> On Fri, Dec 10, 2021 at 02:55:26PM +0800, Zhen Lei wrote: >>> @@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void) >>> } >>> } >>> >>> - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) { >>> + if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) { >>> memblock_phys_free(crash_base, crash_size); >>> return; >>> } >> >> That's not a equivalent transformation on X86_32. The original value (1ULL << 32) is inaccurate, and it enlarged the CRASH_ADDR_LOW upper limit. This is because when the memory is allocated from the low end, the address cannot exceed CRASH_ADDR_LOW_MAX, see "if (!high)" branch. If the memory is allocated from the high end, 'crash_base' is greater than or equal to (1ULL << 32), and naturally, it is greater than CRASH_ADDR_LOW_MAX. I think I should update the description, thanks. if (!high) crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, CRASH_ALIGN, CRASH_ADDR_LOW_MAX); if (!crash_base) crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, CRASH_ALIGN, CRASH_ADDR_HIGH_MAX); > > reserve_crashkernel_low() always return 0 on x86_32, so the not equivalent > transformation for x86_32 doesn't matter, I think. > > . >
On Thu, Dec 16, 2021 at 09:10:40AM +0800, Baoquan He wrote: > reserve_crashkernel_low() always return 0 on x86_32, so the not equivalent > transformation for x86_32 doesn't matter, I think. That is, of course, very obvious... not! Why is that function parsing crashkernel=XM, and crashkernel=X,high, then it attempts some low memory reservation too? Why isn't crashkernel=Y,low parsed there too? I guess this alludes to why: crashkernel=size[KMG],low [KNL, X86-64] range under 4G. When crashkernel=X,high is passed, kernel could allocate physical memory region above 4G, that cause second kernel crash on system that require some amount of low memory, e.g. swiotlb requires at least 64M+32K low memory, also enough extra low memory is needed to make sure DMA buffers for 32-bit devices won't run out. So, before this is going anywhere, I'd like to see this function documented properly. I see Documentation/admin-guide/kdump/kdump.rst explains the crashkernel= options too so you can refer to it in the comments so that when someone looks at that code, someone can follow why it is doing what it is doing. Then, as a future work, all that parsing of crashkernel= cmdline options should be concentrated at the beginning and once it is clear what the user requests, the reservations should be done. As it is, reserve_crashkernel() is pretty unwieldy and hard to read. Thx.
On Thu, Dec 16, 2021 at 10:46:12AM +0800, Leizhen (ThunderTown) wrote: > The original value (1ULL << 32) is inaccurate I keep asking *why*? > and it enlarged the CRASH_ADDR_LOW upper limit. $ git grep -E "CRASH_ADDR_LOW\W" $ I have no clue what you mean here. > This is because when the memory is allocated from the low end, the > address cannot exceed CRASH_ADDR_LOW_MAX, see "if (!high)" branch. > If > the memory is allocated from the high end, 'crash_base' is greater than or > equal to (1ULL << 32), and naturally, it is greater than CRASH_ADDR_LOW_MAX. > > I think I should update the description, thanks. I think you should explain why is (1ULL << 32) wrong. It came from: eb6db83d1059 ("x86/setup: Do not reserve crashkernel high memory if low reservation failed") which simply frees the high memory portion when the low reservation fails. And the test for that is, is crash base > 4G. So that makes perfect sense to me. So your change is a NOP on 64-bit and it is a NOP on 32-bit by virtue of the _low() variant always returning 0 on 32-bit.
On 2021/12/16 19:07, Borislav Petkov wrote: > On Thu, Dec 16, 2021 at 10:46:12AM +0800, Leizhen (ThunderTown) wrote: >> The original value (1ULL << 32) is inaccurate > > I keep asking *why*? > >> and it enlarged the CRASH_ADDR_LOW upper limit. > > $ git grep -E "CRASH_ADDR_LOW\W" > $ > > I have no clue what you mean here. #ifdef CONFIG_X86_32 # define CRASH_ADDR_LOW_MAX SZ_512M # define CRASH_ADDR_HIGH_MAX SZ_512M #endif if (!high) (1) crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, CRASH_ALIGN, CRASH_ADDR_LOW_MAX); if (!crash_base) (2) crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, CRASH_ALIGN, CRASH_ADDR_HIGH_MAX); - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) +(3) if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) If the memory of 'crash_base' is successfully allocated at (1), because the last parameter CRASH_ADDR_LOW_MAX is the upper bound, so we can sure that "crash_base < CRASH_ADDR_LOW_MAX". So that, reserve_crashkernel_low() will not be invoked at (3). That's why I said (1ULL << 32) is inaccurate and enlarge the CRASH_ADDR_LOW upper limit. If the memory of 'crash_base' is successfully allocated at (2), you see, CRASH_ADDR_HIGH_MAX = CRASH_ADDR_LOW_MAX = SZ_512M, the same as (1). In fact, "crashkernel=high," may not be recommended on X86_32. Is it possible that (CRASH_ADDR_HIGH_MAX >= 4G) and (CRASH_ADDR_LOW_MAX < 4G)? In this case, the memory allocated at (2) maybe over 4G. But why shouldn't CRASH_ADDR_LOW_MAX be equal to 4G at this point? > >> This is because when the memory is allocated from the low end, the >> address cannot exceed CRASH_ADDR_LOW_MAX, see "if (!high)" branch. > >> If >> the memory is allocated from the high end, 'crash_base' is greater than or >> equal to (1ULL << 32), and naturally, it is greater than CRASH_ADDR_LOW_MAX. >> >> I think I should update the description, thanks. > > I think you should explain why is (1ULL << 32) wrong. > > It came from: > > eb6db83d1059 ("x86/setup: Do not reserve crashkernel high memory if low reservation failed") > > which simply frees the high memory portion when the low reservation > fails. And the test for that is, is crash base > 4G. So that makes > perfect sense to me. > > So your change is a NOP on 64-bit and it is a NOP on 32-bit by virtue of > the _low() variant always returning 0 on 32-bit. >
On 2021/12/16 20:08, Leizhen (ThunderTown) wrote: > > > On 2021/12/16 19:07, Borislav Petkov wrote: >> On Thu, Dec 16, 2021 at 10:46:12AM +0800, Leizhen (ThunderTown) wrote: >>> The original value (1ULL << 32) is inaccurate >> >> I keep asking *why*? >> >>> and it enlarged the CRASH_ADDR_LOW upper limit. >> >> $ git grep -E "CRASH_ADDR_LOW\W" >> $ >> >> I have no clue what you mean here. > > #ifdef CONFIG_X86_32 > # define CRASH_ADDR_LOW_MAX SZ_512M > # define CRASH_ADDR_HIGH_MAX SZ_512M > #endif > > if (!high) > (1) crash_base = memblock_phys_alloc_range(crash_size, > CRASH_ALIGN, CRASH_ALIGN, > CRASH_ADDR_LOW_MAX); > if (!crash_base) > (2) crash_base = memblock_phys_alloc_range(crash_size, > CRASH_ALIGN, CRASH_ALIGN, > CRASH_ADDR_HIGH_MAX); > > - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) > +(3) if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) > > If the memory of 'crash_base' is successfully allocated at (1), because the last > parameter CRASH_ADDR_LOW_MAX is the upper bound, so we can sure that > "crash_base < CRASH_ADDR_LOW_MAX". So that, reserve_crashkernel_low() will not be > invoked at (3). That's why I said (1ULL << 32) is inaccurate and enlarge the CRASH_ADDR_LOW > upper limit. > > If the memory of 'crash_base' is successfully allocated at (2), you see, > CRASH_ADDR_HIGH_MAX = CRASH_ADDR_LOW_MAX = SZ_512M, the same as (1). In fact, > "crashkernel=high," may not be recommended on X86_32. > > Is it possible that (CRASH_ADDR_HIGH_MAX >= 4G) and (CRASH_ADDR_LOW_MAX < 4G)? > In this case, the memory allocated at (2) maybe over 4G. But why shouldn't > CRASH_ADDR_LOW_MAX be equal to 4G at this point? We divide two memory areas: low memory area and high memory area. The doc told us: at least 256MB memory should be reserved at low memory area. So that if "crash_base >= CRASH_ADDR_LOW_MAX" is true at (3), that means we have not reserved any memory at low memory area, so we should call reserve_crashkernel_low(). The low memory area is not equivalent to <=4G, I think. So replace (1ULL << 32) with CRASH_ADDR_LOW_MAX is logically correct. > > >> >>> This is because when the memory is allocated from the low end, the >>> address cannot exceed CRASH_ADDR_LOW_MAX, see "if (!high)" branch. >> >>> If >>> the memory is allocated from the high end, 'crash_base' is greater than or >>> equal to (1ULL << 32), and naturally, it is greater than CRASH_ADDR_LOW_MAX. >>> >>> I think I should update the description, thanks. >> >> I think you should explain why is (1ULL << 32) wrong. >> >> It came from: >> >> eb6db83d1059 ("x86/setup: Do not reserve crashkernel high memory if low reservation failed") >> >> which simply frees the high memory portion when the low reservation >> fails. And the test for that is, is crash base > 4G. So that makes >> perfect sense to me. >> >> So your change is a NOP on 64-bit and it is a NOP on 32-bit by virtue of >> the _low() variant always returning 0 on 32-bit. >>
On 12/16/21 at 11:55am, Borislav Petkov wrote: > On Thu, Dec 16, 2021 at 09:10:40AM +0800, Baoquan He wrote: > > reserve_crashkernel_low() always return 0 on x86_32, so the not equivalent > > transformation for x86_32 doesn't matter, I think. > > That is, of course, very obvious... not! > > Why is that function parsing crashkernel=XM, and crashkernel=X,high, > then it attempts some low memory reservation too? Why isn't > crashkernel=Y,low parsed there too? > > I guess this alludes to why: > > crashkernel=size[KMG],low > [KNL, X86-64] range under 4G. When crashkernel=X,high > is passed, kernel could allocate physical memory region > above 4G, that cause second kernel crash on system > that require some amount of low memory, e.g. swiotlb > requires at least 64M+32K low memory, also enough extra > low memory is needed to make sure DMA buffers for 32-bit > devices won't run out. > > So, before this is going anywhere, I'd like to see this function > documented properly. I see Documentation/admin-guide/kdump/kdump.rst > explains the crashkernel= options too so you can refer to it in the > comments so that when someone looks at that code, someone can follow why > it is doing what it is doing. > > Then, as a future work, all that parsing of crashkernel= cmdline options > should be concentrated at the beginning and once it is clear what the > user requests, the reservations should be done. Totally agree we should refactor code to make reserve_crashkernel() clearer on logic and readibility. In this patchset, we can rewrite the kernel-doc of reserve_crashkernel() to add more words to explain. As for the code refactoring, it can be done in another patchset. > > As it is, reserve_crashkernel() is pretty unwieldy and hard to read.
On Thu, Dec 16, 2021 at 08:08:30PM +0800, Leizhen (ThunderTown) wrote: > If the memory of 'crash_base' is successfully allocated at (1), because the last > parameter CRASH_ADDR_LOW_MAX is the upper bound, so we can sure that > "crash_base < CRASH_ADDR_LOW_MAX". So that, reserve_crashkernel_low() will not be > invoked at (3). That's why I said (1ULL << 32) is inaccurate and enlarge the CRASH_ADDR_LOW > upper limit. No, this is actually wrong - that check *must* be 4G. See: eb6db83d1059 ("x86/setup: Do not reserve crashkernel high memory if low reservation failed") It is even documented: crashkernel=size[KMG],low [KNL, X86-64] range under 4G. When crashkernel=X,high is passed, kernel could allocate physical memory region above 4G, that cause second kernel crash on system that require some amount of low memory, e.g. swiotlb requires at least 64M+32K low memory, also enough extra low memory is needed to make sure DMA buffers for 32-bit devices won't run out. so you need to do a low allocation for DMA *when* the reserved memory is above 4G. *NOT* above 512M. But that works due to the obscure situation, as Baoquan stated, that reserve_crashkernel_low() returns 0 on 32-bit. So all this is telling us is that that function needs serious cleanup.
On Thu, Dec 16, 2021 at 10:11:15PM +0800, Baoquan He wrote:
> As for the code refactoring, it can be done in another patchset.
Well, I said "future work" before having seen where this patchset is
going. So no, in that case, the usual kernel development process is:
cleanups/fixes first - new features later.
You can see for yourself that piling more crap on what is an already
unreadable mess is only going to make it worse. So, in order to avoid
the maintenance nightmare, we clean up, streamline, document and then
add the new feature and all is shiny.
Thx.
On 2021/12/16 22:48, Borislav Petkov wrote: > On Thu, Dec 16, 2021 at 08:08:30PM +0800, Leizhen (ThunderTown) wrote: >> If the memory of 'crash_base' is successfully allocated at (1), because the last >> parameter CRASH_ADDR_LOW_MAX is the upper bound, so we can sure that >> "crash_base < CRASH_ADDR_LOW_MAX". So that, reserve_crashkernel_low() will not be >> invoked at (3). That's why I said (1ULL << 32) is inaccurate and enlarge the CRASH_ADDR_LOW >> upper limit. > > No, this is actually wrong - that check *must* be 4G. See: > > eb6db83d1059 ("x86/setup: Do not reserve crashkernel high memory if low reservation failed") > > It is even documented: > > crashkernel=size[KMG],low > [KNL, X86-64] range under 4G. When crashkernel=X,high [KNL, X86-64], This doc is for X86-64, not for X86-32 > is passed, kernel could allocate physical memory region > above 4G, that cause second kernel crash on system > that require some amount of low memory, e.g. swiotlb > requires at least 64M+32K low memory, also enough extra > low memory is needed to make sure DMA buffers for 32-bit > devices won't run out. vi arch/x86/kernel/setup.c +398 /* * Keep the crash kernel below this limit. * * Earlier 32-bits kernels would limit the kernel to the low 512 MB range * due to mapping restrictions. If there is no such restriction, we can make CRASH_ADDR_LOW_MAX equal to (1ULL << 32) minus 1 on X86_32. > > so you need to do a low allocation for DMA *when* the reserved memory is > above 4G. *NOT* above 512M. But that works due to the obscure situation, > as Baoquan stated, that reserve_crashkernel_low() returns 0 on 32-bit. > > So all this is telling us is that that function needs serious cleanup. >
On Fri, Dec 17, 2021 at 10:51:04AM +0800, Leizhen (ThunderTown) wrote: > [KNL, X86-64], This doc is for X86-64, not for X86-32 reserve_crashkernel() runs on both. > If there is no such restriction, we can make CRASH_ADDR_LOW_MAX equal > to (1ULL << 32) minus 1 on X86_32. Again, the 4G limit check is relevant only for 64-bit kernels - not 32-bit ones.
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 6424ee4f23da2cf..bb2a0973b98059e 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -489,8 +489,9 @@ static void __init reserve_crashkernel(void) if (!crash_base) { /* * Set CRASH_ADDR_LOW_MAX upper bound for crash memory, - * crashkernel=x,high reserves memory over 4G, also allocates - * 256M extra low memory for DMA buffers and swiotlb. + * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX, + * also allocates 256M extra low memory for DMA buffers + * and swiotlb. * But the extra memory is not required for all machines. * So try low memory first and fall back to high memory * unless "crashkernel=size[KMG],high" is specified. @@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void) } } - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) { + if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) { memblock_phys_free(crash_base, crash_size); return; }