Message ID | 20220414115720.1887-6-thunder.leizhen@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | support reserving crashkernel above 4G on arm64 kdump | expand |
On Thu, Apr 14, 2022 at 07:57:16PM +0800, Zhen Lei wrote: > /* > * reserve_crashkernel() - reserves memory for crash kernel > * > * This function reserves memory area given in "crashkernel=" kernel command > * line parameter. The memory reserved is used by dump capture kernel when > * primary kernel is crashing. > + * > + * NOTE: Reservation of crashkernel,low is special since its existence > + * is not independent, need rely on the existence of crashkernel,high. > + * Here, four cases of crashkernel low memory reservation are summarized: > + * 1) crashkernel=Y,low is specified explicitly, the size of crashkernel low > + * memory takes Y; > + * 2) crashkernel=,low is not given, while crashkernel=,high is specified, > + * take the default crashkernel low memory size; > + * 3) crashkernel=X is specified, while fallback to get a memory region > + * in high memory, take the default crashkernel low memory size; > + * 4) crashkernel='invalid value',low is specified, failed the whole > + * crashkernel reservation and bail out. Following the x86 behaviour made sense when we were tried to get that code generic. Now that we moved the logic under arch/arm64, we can diverge a bit. I lost track of the original (v1/v2) proposal but I wonder whether we still need the fallback to high for crashkernel=Y. Maybe simpler, no fallbacks: crashkernel=Y - keep the current behaviour, ignore high,low crashkernel=Y,high - allocate above ZONE_DMA crashkernel=Y,low - allocate within ZONE_DMA From your proposal, the difference is that the Y,high option won't have any default ZONE_DMA fallback, one would have to explicitly pass the Y,low option if needed. Just a thought, maybe it makes the code simpler. But I'm open to discussion if there are good arguments for the proposed (x86-like) behaviour. One argument could be for crashkernel=Y to fall back to high if distros don't want to bother with high/low settings. Another thing I may have asked in the past, what happens if we run a new kernel with these patches with old kexec user tools. I suspect the crashkernel=Y with the fallback to high will confuse the tools. BTW, please separate the NO_BLOCK_MAPPINGS optimisations from the crashkernel above 4G. Let's get the crashkernel reservations sorted first, it's been around for too long. Thanks.
On 2022/4/27 2:02, Catalin Marinas wrote: > On Thu, Apr 14, 2022 at 07:57:16PM +0800, Zhen Lei wrote: >> /* >> * reserve_crashkernel() - reserves memory for crash kernel >> * >> * This function reserves memory area given in "crashkernel=" kernel command >> * line parameter. The memory reserved is used by dump capture kernel when >> * primary kernel is crashing. >> + * >> + * NOTE: Reservation of crashkernel,low is special since its existence >> + * is not independent, need rely on the existence of crashkernel,high. >> + * Here, four cases of crashkernel low memory reservation are summarized: >> + * 1) crashkernel=Y,low is specified explicitly, the size of crashkernel low >> + * memory takes Y; >> + * 2) crashkernel=,low is not given, while crashkernel=,high is specified, >> + * take the default crashkernel low memory size; >> + * 3) crashkernel=X is specified, while fallback to get a memory region >> + * in high memory, take the default crashkernel low memory size; >> + * 4) crashkernel='invalid value',low is specified, failed the whole >> + * crashkernel reservation and bail out. > > Following the x86 behaviour made sense when we were tried to get that > code generic. Now that we moved the logic under arch/arm64, we can > diverge a bit. I lost track of the original (v1/v2) proposal but I > wonder whether we still need the fallback to high for crashkernel=Y. I don't think anyone has raised this demand yet! If it weren't for the fact that crashkernel=X appeared earlier, it would probably have been enough for a combination of crashkernel=X,high and crashkernel=Y,low. In fact, I also tend not to support "fallback to high for crashkernel=Y". I took over this from Chen Zhou. In the absence of any objection, I had to inherit. Now that you've brought it up, I'm happy to delete it. Supporting this feature complicates the code logic a lot. The point is, it's not fully backwards compatible yet. For example, someone may want crashkernel=3G to report failure, but the the new support make it work. > Maybe simpler, no fallbacks: > > crashkernel=Y - keep the current behaviour, ignore high,low > crashkernel=Y,high - allocate above ZONE_DMA > crashkernel=Y,low - allocate within ZONE_DMA > >>From your proposal, the difference is that the Y,high option won't > have any default ZONE_DMA fallback, one would have to explicitly pass > the Y,low option if needed. I agree with you. Now we don't need code generic, so there is no need to carry the historical burden of other ARCHs. arm64 does not need to delve into that empirical value(the default size of crash low memory). > > Just a thought, maybe it makes the code simpler. But I'm open to > discussion if there are good arguments for the proposed (x86-like) > behaviour. One argument could be for crashkernel=Y to fall back to high > if distros don't want to bother with high/low settings. I think distros should take precedence over "crashkernel=Y,high". After all, ZONE_DMA memory is more valuable than high memory. > > Another thing I may have asked in the past, what happens if we run a new > kernel with these patches with old kexec user tools. I suspect the > crashkernel=Y with the fallback to high will confuse the tools. If crashkernel=Y can reserve the memory in Zone_DMA successfully, the old kexec works well. But if crashkernel=Y fallback to high memory, the second kernel will boot failed, because the old kexec can only use dtb to pass the high memory range to the second kernel. In comparison, if no fallback, we can see crash memory reservation failure in the first kernel, so we have a chance to adjust Y. Currently, the new kexec tool will pick the last memory range(sorted by address in ascending order) to store Image,dtb,initrd. > > BTW, please separate the NO_BLOCK_MAPPINGS optimisations from the > crashkernel above 4G. Let's get the crashkernel reservations sorted > first, it's been around for too long. OK, thank you. That's a good suggestion. > > Thanks. >
On Wed, Apr 27, 2022 at 02:54:52PM +0800, Leizhen (ThunderTown) wrote: > On 2022/4/27 2:02, Catalin Marinas wrote: > > On Thu, Apr 14, 2022 at 07:57:16PM +0800, Zhen Lei wrote: > >> /* > >> * reserve_crashkernel() - reserves memory for crash kernel > >> * > >> * This function reserves memory area given in "crashkernel=" kernel command > >> * line parameter. The memory reserved is used by dump capture kernel when > >> * primary kernel is crashing. > >> + * > >> + * NOTE: Reservation of crashkernel,low is special since its existence > >> + * is not independent, need rely on the existence of crashkernel,high. > >> + * Here, four cases of crashkernel low memory reservation are summarized: > >> + * 1) crashkernel=Y,low is specified explicitly, the size of crashkernel low > >> + * memory takes Y; > >> + * 2) crashkernel=,low is not given, while crashkernel=,high is specified, > >> + * take the default crashkernel low memory size; > >> + * 3) crashkernel=X is specified, while fallback to get a memory region > >> + * in high memory, take the default crashkernel low memory size; > >> + * 4) crashkernel='invalid value',low is specified, failed the whole > >> + * crashkernel reservation and bail out. > > > > Following the x86 behaviour made sense when we were tried to get that > > code generic. Now that we moved the logic under arch/arm64, we can > > diverge a bit. I lost track of the original (v1/v2) proposal but I > > wonder whether we still need the fallback to high for crashkernel=Y. > > I don't think anyone has raised this demand yet! If it weren't for the > fact that crashkernel=X appeared earlier, it would probably have been > enough for a combination of crashkernel=X,high and crashkernel=Y,low. > > In fact, I also tend not to support "fallback to high for crashkernel=Y". > I took over this from Chen Zhou. In the absence of any objection, I had > to inherit. Now that you've brought it up, I'm happy to delete it. > Supporting this feature complicates the code logic a lot. The point is, > it's not fully backwards compatible yet. For example, someone may want > crashkernel=3G to report failure, but the the new support make it work. BTW, prior to v20, this patch had this line: crashk_low_res.name = "Crash kernel (low)"; I can't find it anymore. Do the kexec tools need to distinguish between low and high or they can cope with multiple "Crash kernel" entries? > > Maybe simpler, no fallbacks: > > > > crashkernel=Y - keep the current behaviour, ignore high,low > > crashkernel=Y,high - allocate above ZONE_DMA > > crashkernel=Y,low - allocate within ZONE_DMA > > > > From your proposal, the difference is that the Y,high option won't > > have any default ZONE_DMA fallback, one would have to explicitly pass > > the Y,low option if needed. > > I agree with you. Now we don't need code generic, so there is no need to > carry the historical burden of other ARCHs. arm64 does not need to delve > into that empirical value(the default size of crash low memory). > > > Just a thought, maybe it makes the code simpler. But I'm open to > > discussion if there are good arguments for the proposed (x86-like) > > behaviour. One argument could be for crashkernel=Y to fall back to high > > if distros don't want to bother with high/low settings. > > I think distros should take precedence over "crashkernel=Y,high". After all, > ZONE_DMA memory is more valuable than high memory. My point is whether an admin configuring the kernel command line needs to know the layout of ZONE_DMA etc. to figure out how much to pass in high and low. The fallbacks in this case have some value but they also complicate the code logic. The 4GB limit does not always make sense either for some platforms (RPi4 has a ZONE_DMA limit of 1GB). I think one could always pass a default command line like: crashkernel=1G,high crashkernel=128M,low without much knowledge of the SoC memory layout. Another option is to only introduce crashkernel=Y,low and, when that is passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a 'high' option at all: crashkernel=1G - all within ZONE_DMA crashkernel=1G crashkernel=128M,low - 128M in ZONE_DMA 1G above ZONE_DMA If ZONE_DMA is not present or it extends to the whole RAM, we can ignore the 'low' option.
On 2022/4/27 20:32, Catalin Marinas wrote: > On Wed, Apr 27, 2022 at 02:54:52PM +0800, Leizhen (ThunderTown) wrote: >> On 2022/4/27 2:02, Catalin Marinas wrote: >>> On Thu, Apr 14, 2022 at 07:57:16PM +0800, Zhen Lei wrote: >>>> /* >>>> * reserve_crashkernel() - reserves memory for crash kernel >>>> * >>>> * This function reserves memory area given in "crashkernel=" kernel command >>>> * line parameter. The memory reserved is used by dump capture kernel when >>>> * primary kernel is crashing. >>>> + * >>>> + * NOTE: Reservation of crashkernel,low is special since its existence >>>> + * is not independent, need rely on the existence of crashkernel,high. >>>> + * Here, four cases of crashkernel low memory reservation are summarized: >>>> + * 1) crashkernel=Y,low is specified explicitly, the size of crashkernel low >>>> + * memory takes Y; >>>> + * 2) crashkernel=,low is not given, while crashkernel=,high is specified, >>>> + * take the default crashkernel low memory size; >>>> + * 3) crashkernel=X is specified, while fallback to get a memory region >>>> + * in high memory, take the default crashkernel low memory size; >>>> + * 4) crashkernel='invalid value',low is specified, failed the whole >>>> + * crashkernel reservation and bail out. >>> >>> Following the x86 behaviour made sense when we were tried to get that >>> code generic. Now that we moved the logic under arch/arm64, we can >>> diverge a bit. I lost track of the original (v1/v2) proposal but I >>> wonder whether we still need the fallback to high for crashkernel=Y. >> >> I don't think anyone has raised this demand yet! If it weren't for the >> fact that crashkernel=X appeared earlier, it would probably have been >> enough for a combination of crashkernel=X,high and crashkernel=Y,low. >> >> In fact, I also tend not to support "fallback to high for crashkernel=Y". >> I took over this from Chen Zhou. In the absence of any objection, I had >> to inherit. Now that you've brought it up, I'm happy to delete it. >> Supporting this feature complicates the code logic a lot. The point is, >> it's not fully backwards compatible yet. For example, someone may want >> crashkernel=3G to report failure, but the the new support make it work. > > BTW, prior to v20, this patch had this line: > > crashk_low_res.name = "Crash kernel (low)"; > > I can't find it anymore. Do the kexec tools need to distinguish between > low and high or they can cope with multiple "Crash kernel" entries? Yes, I've updated the kexec tool patch based on Borislav Petkov's comments to keep it the same as x86. And this patch has been merged into kexec v2.0.24. b5a34a2 arm64: support more than one crash kernel regions The kexec tool will first sorted all "Crash kernel" memory range in ascending order, then choose the last one (the memory range with the highest address) to store the second kernel's Image,dtb,initrd. > >>> Maybe simpler, no fallbacks: >>> >>> crashkernel=Y - keep the current behaviour, ignore high,low >>> crashkernel=Y,high - allocate above ZONE_DMA >>> crashkernel=Y,low - allocate within ZONE_DMA >>> >>> From your proposal, the difference is that the Y,high option won't >>> have any default ZONE_DMA fallback, one would have to explicitly pass >>> the Y,low option if needed. >> >> I agree with you. Now we don't need code generic, so there is no need to >> carry the historical burden of other ARCHs. arm64 does not need to delve >> into that empirical value(the default size of crash low memory). >> >>> Just a thought, maybe it makes the code simpler. But I'm open to >>> discussion if there are good arguments for the proposed (x86-like) >>> behaviour. One argument could be for crashkernel=Y to fall back to high >>> if distros don't want to bother with high/low settings. >> >> I think distros should take precedence over "crashkernel=Y,high". After all, >> ZONE_DMA memory is more valuable than high memory. > > My point is whether an admin configuring the kernel command line needs > to know the layout of ZONE_DMA etc. to figure out how much to pass in No need. > high and low. The fallbacks in this case have some value but they also > complicate the code logic. The 4GB limit does not always make sense > either for some platforms (RPi4 has a ZONE_DMA limit of 1GB). > > I think one could always pass a default command line like: > > crashkernel=1G,high crashkernel=128M,low > > without much knowledge of the SoC memory layout. Yes, that's what the end result is. The user specify crashkernel=128M,low and the implementation ensure the 128M low memory is allocated from DMA zone. We use arm64_dma_phys_limit as the upper limit for crash low memory. +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit + unsigned long long crash_max = CRASH_ADDR_LOW_MAX; + crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, crash_base, crash_max); > > Another option is to only introduce crashkernel=Y,low and, when that is > passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a > 'high' option at all: > > crashkernel=1G - all within ZONE_DMA > crashkernel=1G crashkernel=128M,low - 128M in ZONE_DMA > 1G above ZONE_DMA > > If ZONE_DMA is not present or it extends to the whole RAM, we can ignore > the 'low' option. I think although the code is hard to make generic, the interface is better to be relatively uniform. A user might have to maintain both x86 and arm64, and so on. It's not a good thing that the difference is too big. A well had already been dug by the forefathers, and it seemed unnecessary to dig another well beside it if there was no particular advantage. >
On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote: > On 2022/4/27 20:32, Catalin Marinas wrote: > > I think one could always pass a default command line like: > > > > crashkernel=1G,high crashkernel=128M,low > > > > without much knowledge of the SoC memory layout. > > Yes, that's what the end result is. The user specify crashkernel=128M,low > and the implementation ensure the 128M low memory is allocated from DMA zone. > We use arm64_dma_phys_limit as the upper limit for crash low memory. > > +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit > + unsigned long long crash_max = CRASH_ADDR_LOW_MAX; > + crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, > crash_base, crash_max); > > > Another option is to only introduce crashkernel=Y,low and, when that is > > passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a > > 'high' option at all: > > > > crashkernel=1G - all within ZONE_DMA > > crashkernel=1G crashkernel=128M,low - 128M in ZONE_DMA > > 1G above ZONE_DMA > > > > If ZONE_DMA is not present or it extends to the whole RAM, we can ignore > > the 'low' option. > > I think although the code is hard to make generic, the interface is better to > be relatively uniform. A user might have to maintain both x86 and arm64, and > so on. It's not a good thing that the difference is too big. There will be some difference as the 4G limit doesn't always hold for arm64 (though it's true in most cases). Anyway, we can probably simplify things a bit while following the documented behaviour: crashkernel=Y - current behaviour within ZONE_DMA crashkernel=Y,high - allocate from above ZONE_DMA crashkernel=Y,low - allocate within ZONE_DMA There is no fallback from crashkernel=Y. The question is whether we still want a default low allocation if crashkernel=Y,low is missing but 'high' is present. If we add this, I think we'd be consistent with kernel-parameters.txt for the 'low' description. A default 'low' is probably not that bad but I'm tempted to always mandate both 'high' and 'low'.
On 2022/4/28 0:04, Catalin Marinas wrote: > On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote: >> On 2022/4/27 20:32, Catalin Marinas wrote: >>> I think one could always pass a default command line like: >>> >>> crashkernel=1G,high crashkernel=128M,low >>> >>> without much knowledge of the SoC memory layout. >> >> Yes, that's what the end result is. The user specify crashkernel=128M,low >> and the implementation ensure the 128M low memory is allocated from DMA zone. >> We use arm64_dma_phys_limit as the upper limit for crash low memory. >> >> +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit >> + unsigned long long crash_max = CRASH_ADDR_LOW_MAX; >> + crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, >> crash_base, crash_max); >> >>> Another option is to only introduce crashkernel=Y,low and, when that is >>> passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a >>> 'high' option at all: >>> >>> crashkernel=1G - all within ZONE_DMA >>> crashkernel=1G crashkernel=128M,low - 128M in ZONE_DMA >>> 1G above ZONE_DMA >>> >>> If ZONE_DMA is not present or it extends to the whole RAM, we can ignore >>> the 'low' option. >> >> I think although the code is hard to make generic, the interface is better to >> be relatively uniform. A user might have to maintain both x86 and arm64, and >> so on. It's not a good thing that the difference is too big. > > There will be some difference as the 4G limit doesn't always hold for > arm64 (though it's true in most cases). Anyway, we can probably simplify > things a bit while following the documented behaviour: > > crashkernel=Y - current behaviour within ZONE_DMA > crashkernel=Y,high - allocate from above ZONE_DMA > crashkernel=Y,low - allocate within ZONE_DMA > > There is no fallback from crashkernel=Y. Yes, I followed your guidelines yesterday to modify the code. Now the code flow is much clearer. > > The question is whether we still want a default low allocation if > crashkernel=Y,low is missing but 'high' is present. If we add this, I > think we'd be consistent with kernel-parameters.txt for the 'low' > description. A default 'low' is probably not that bad but I'm tempted to > always mandate both 'high' and 'low'. Yes, I agree with you. Because the situation is complicated, the default value is hard to be accurate. It's better to let the user configure it according to the actual situation, they're also programmers. Whether mandate both 'high' and 'low', or allow only 'high' like x86(but the default value becomes zero). I prefer the latter. The size of 'low' maybe zero, for example, SMMU is enabled on the second kernel. If only high memory is required, only that high memory needs to be configured, seems more reasonable. >
Hi Catalin, Zhen Lei, On 04/27/22 at 05:04pm, Catalin Marinas wrote: > On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote: > > On 2022/4/27 20:32, Catalin Marinas wrote: > > > I think one could always pass a default command line like: > > > > > > crashkernel=1G,high crashkernel=128M,low > > > > > > without much knowledge of the SoC memory layout. > > > > Yes, that's what the end result is. The user specify crashkernel=128M,low > > and the implementation ensure the 128M low memory is allocated from DMA zone. > > We use arm64_dma_phys_limit as the upper limit for crash low memory. > > > > +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit > > + unsigned long long crash_max = CRASH_ADDR_LOW_MAX; > > + crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, > > crash_base, crash_max); > > > > > Another option is to only introduce crashkernel=Y,low and, when that is > > > passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a > > > 'high' option at all: > > > > > > crashkernel=1G - all within ZONE_DMA > > > crashkernel=1G crashkernel=128M,low - 128M in ZONE_DMA > > > 1G above ZONE_DMA > > > > > > If ZONE_DMA is not present or it extends to the whole RAM, we can ignore > > > the 'low' option. > > > > I think although the code is hard to make generic, the interface is better to > > be relatively uniform. A user might have to maintain both x86 and arm64, and > > so on. It's not a good thing that the difference is too big. > > There will be some difference as the 4G limit doesn't always hold for > arm64 (though it's true in most cases). Anyway, we can probably simplify > things a bit while following the documented behaviour: > > crashkernel=Y - current behaviour within ZONE_DMA > crashkernel=Y,high - allocate from above ZONE_DMA > crashkernel=Y,low - allocate within ZONE_DMA > > There is no fallback from crashkernel=Y. > > The question is whether we still want a default low allocation if > crashkernel=Y,low is missing but 'high' is present. If we add this, I > think we'd be consistent with kernel-parameters.txt for the 'low' > description. A default 'low' is probably not that bad but I'm tempted to > always mandate both 'high' and 'low'. Sorry to interrupt. Seems the ,high ,low and fallback are main concerns about this version. And I have the same concerns about them which comes from below points: 1) we may need to take best effort to keep ,high, ,low behaviour consistent on all ARCHes. Otherwise user/admin may be confused when they deploy/configure kdump on different machines of different ARCHes in the same LAB. I think we should try to avoid the confusion. 2) Fallback behaviour is important to our distros. The reason is we will provide default value with crashkernel=xxxM along kernel of distros. In this case, we hope the reservation will succeed by all means. The ,high and ,low is an option if customer likes to take with expertise. After going through arm64 memory init code, I got below summary about arm64_dma_phys_limit which is the first zone's upper limit. I think we can make use of it to facilitate to simplify code. ================================================================================ DMA DMA32 NORMAL 1)Raspberry Pi4 0~1G 3G~4G (above 4G) 2)Normal machine 0~4G 0 (above 4G) 3)Special machine (above 4G)~MAX 4)No DMA|DMA32 (above 4G)~MAX ------------------------------------------- arm64_dma_phys_limit 1)Raspberry Pi4 1G 2)Normal machine 4G 3)Special machine MAX 4)No DMA|DMA32 MAX Note: 3)Special machine means the machine's starting physical address is above 4G. WHile 4)No DMA|DMA32 means kernel w/o CONFIG_ZONE_DMA|DMA32, and has IOMMU hardware supporting. =================================================================================== I made a draft patch based on this patchset, please feel free to check and see if it's OK, or anything missing or wrongly understood. I removed reserve_crashkernel_high() and only keep reserve_crashkernel() and reserve_crashkernel_low() as the v21 did. Thanks Baoquan
On 04/28/22 at 11:40am, Baoquan He wrote: > Hi Catalin, Zhen Lei, > > On 04/27/22 at 05:04pm, Catalin Marinas wrote: > > On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote: > > > On 2022/4/27 20:32, Catalin Marinas wrote: > > > > I think one could always pass a default command line like: > > > > > > > > crashkernel=1G,high crashkernel=128M,low > > > > > > > > without much knowledge of the SoC memory layout. > > > > > > Yes, that's what the end result is. The user specify crashkernel=128M,low > > > and the implementation ensure the 128M low memory is allocated from DMA zone. > > > We use arm64_dma_phys_limit as the upper limit for crash low memory. > > > > > > +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit > > > + unsigned long long crash_max = CRASH_ADDR_LOW_MAX; > > > + crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, > > > crash_base, crash_max); > > > > > > > Another option is to only introduce crashkernel=Y,low and, when that is > > > > passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a > > > > 'high' option at all: > > > > > > > > crashkernel=1G - all within ZONE_DMA > > > > crashkernel=1G crashkernel=128M,low - 128M in ZONE_DMA > > > > 1G above ZONE_DMA > > > > > > > > If ZONE_DMA is not present or it extends to the whole RAM, we can ignore > > > > the 'low' option. > > > > > > I think although the code is hard to make generic, the interface is better to > > > be relatively uniform. A user might have to maintain both x86 and arm64, and > > > so on. It's not a good thing that the difference is too big. > > > > There will be some difference as the 4G limit doesn't always hold for > > arm64 (though it's true in most cases). Anyway, we can probably simplify > > things a bit while following the documented behaviour: > > > > crashkernel=Y - current behaviour within ZONE_DMA > > crashkernel=Y,high - allocate from above ZONE_DMA > > crashkernel=Y,low - allocate within ZONE_DMA > > > > There is no fallback from crashkernel=Y. > > > > The question is whether we still want a default low allocation if > > crashkernel=Y,low is missing but 'high' is present. If we add this, I > > think we'd be consistent with kernel-parameters.txt for the 'low' > > description. A default 'low' is probably not that bad but I'm tempted to > > always mandate both 'high' and 'low'. > > Sorry to interrupt. Seems the ,high ,low and fallback are main concerns > about this version. And I have the same concerns about them which comes > from below points: > 1) we may need to take best effort to keep ,high, ,low behaviour > consistent on all ARCHes. Otherwise user/admin may be confused when they > deploy/configure kdump on different machines of different ARCHes in the > same LAB. I think we should try to avoid the confusion. > 2) Fallback behaviour is important to our distros. The reason is we will > provide default value with crashkernel=xxxM along kernel of distros. In > this case, we hope the reservation will succeed by all means. The ,high > and ,low is an option if customer likes to take with expertise. > > After going through arm64 memory init code, I got below summary about > arm64_dma_phys_limit which is the first zone's upper limit. I think we > can make use of it to facilitate to simplify code. > ================================================================================ > DMA DMA32 NORMAL > 1)Raspberry Pi4 0~1G 3G~4G (above 4G) > 2)Normal machine 0~4G 0 (above 4G) > 3)Special machine (above 4G)~MAX > 4)No DMA|DMA32 (above 4G)~MAX > > ------------------------------------------- > arm64_dma_phys_limit > 1)Raspberry Pi4 1G > 2)Normal machine 4G > 3)Special machine MAX > 4)No DMA|DMA32 MAX > > Note: 3)Special machine means the machine's starting physical address is above 4G. > WHile 4)No DMA|DMA32 means kernel w/o CONFIG_ZONE_DMA|DMA32, and has > IOMMU hardware supporting. > =================================================================================== > > I made a draft patch based on this patchset, please feel free to check and > see if it's OK, or anything missing or wrongly understood. I removed > reserve_crashkernel_high() and only keep reserve_crashkernel() and > reserve_crashkernel_low() as the v21 did. Sorry, forgot attaching the draft patch. By the way, we can also have a simple version with basic ,high, ,low support, no fallback. We can add fallback and other optimization later. This can be plan B. diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 4a8200f29b35..aa1fbea47c46 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -84,11 +84,7 @@ EXPORT_SYMBOL(memstart_addr); * Note: Page-granularity mappings are necessary for crash kernel memory * range for shrinking its size via /sys/kernel/kexec_crash_size interface. */ -#if IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32) phys_addr_t __ro_after_init arm64_dma_phys_limit; -#else -phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1; -#endif bool crash_low_mem_page_map __initdata; static bool crash_high_mem_reserved __initdata; @@ -132,63 +128,6 @@ static int __init reserve_crashkernel_low(unsigned long long low_size) return 0; } -static void __init reserve_crashkernel_high(void) -{ - unsigned long long crash_base, crash_size; - char *cmdline = boot_command_line; - int ret; - - if (!IS_ENABLED(CONFIG_KEXEC_CORE)) - return; - - /* crashkernel=X[@offset] */ - ret = parse_crashkernel(cmdline, memblock_phys_mem_size(), - &crash_size, &crash_base); - if (ret || !crash_size) { - ret = parse_crashkernel_high(cmdline, 0, &crash_size, &crash_base); - if (ret || !crash_size) - return; - } else if (!crash_base) { - crash_low_mem_page_map = true; - } - - crash_size = PAGE_ALIGN(crash_size); - - /* - * For the case crashkernel=X, may fall back to reserve memory above - * 4G, make reservations here in advance. It will be released later if - * the region is successfully reserved under 4G. - */ - if (!crash_base) { - crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, - crash_base, CRASH_ADDR_HIGH_MAX); - if (!crash_base) - return; - - crash_high_mem_reserved = true; - } - - /* Mark the memory range that requires page-level mappings */ - crashk_res.start = crash_base; - crashk_res.end = crash_base + crash_size - 1; -} - -static void __init hand_over_reserved_high_mem(void) -{ - crashk_res_high.start = crashk_res.start; - crashk_res_high.end = crashk_res.end; - - crashk_res.start = 0; - crashk_res.end = 0; -} - -static void __init take_reserved_high_mem(unsigned long long *crash_base, - unsigned long long *crash_size) -{ - *crash_base = crashk_res_high.start; - *crash_size = resource_size(&crashk_res_high); -} - static void __init free_reserved_high_mem(void) { memblock_phys_free(crashk_res_high.start, resource_size(&crashk_res_high)); @@ -225,7 +164,8 @@ static void __init reserve_crashkernel(void) if (!IS_ENABLED(CONFIG_KEXEC_CORE)) return; - hand_over_reserved_high_mem(); + if (crashk_res.end) + return; /* crashkernel=X[@offset] */ ret = parse_crashkernel(cmdline, memblock_phys_mem_size(), @@ -245,11 +185,6 @@ static void __init reserve_crashkernel(void) high = true; crash_max = CRASH_ADDR_HIGH_MAX; - - if (crash_high_mem_reserved) { - take_reserved_high_mem(&crash_base, &crash_size); - goto reserve_low; - } } fixed_base = !!crash_base; @@ -267,12 +202,8 @@ static void __init reserve_crashkernel(void) * to high memory, the minimum required low memory will be * reserved later. */ - if (!fixed_base && (crash_max == CRASH_ADDR_LOW_MAX)) { - if (crash_high_mem_reserved) { - take_reserved_high_mem(&crash_base, &crash_size); - goto reserve_low; - } - + if (!fixed_base && (crash_max == CRASH_ADDR_LOW_MAX + && crash_max <CRASH_ADDR_HIGH_MAX)) { crash_max = CRASH_ADDR_HIGH_MAX; goto retry; } @@ -289,7 +220,7 @@ static void __init reserve_crashkernel(void) * description of "crashkernel=X,high" option, add below 'high' * condition to make sure the crash low memory will be reserved. */ - if ((crash_base >= CRASH_ADDR_LOW_MAX) || high) { + if (crash_base >= CRASH_ADDR_LOW_MAX ) { reserve_low: /* case #3 of crashkernel,low reservation */ if (!high) @@ -299,14 +230,7 @@ static void __init reserve_crashkernel(void) memblock_phys_free(crash_base, crash_size); return; } - } else if (crash_high_mem_reserved) { - /* - * The crash memory is successfully allocated under 4G, and the - * previously reserved high memory is no longer required. - */ - free_reserved_high_mem(); } - pr_info("crashkernel reserved: 0x%016llx - 0x%016llx (%lld MB)\n", crash_base, crash_base + crash_size, crash_size >> 20); @@ -520,10 +444,10 @@ void __init arm64_memblock_init(void) early_init_fdt_scan_reserved_mem(); + arm64_dma_phys_limit = memblock_end_of_DRAM(); + if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) reserve_crashkernel(); - else - reserve_crashkernel_high(); high_memory = __va(memblock_end_of_DRAM() - 1) + 1; }
On 2022/4/28 11:52, Baoquan He wrote: > On 04/28/22 at 11:40am, Baoquan He wrote: >> Hi Catalin, Zhen Lei, >> >> On 04/27/22 at 05:04pm, Catalin Marinas wrote: >>> On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote: >>>> On 2022/4/27 20:32, Catalin Marinas wrote: >>>>> I think one could always pass a default command line like: >>>>> >>>>> crashkernel=1G,high crashkernel=128M,low >>>>> >>>>> without much knowledge of the SoC memory layout. >>>> >>>> Yes, that's what the end result is. The user specify crashkernel=128M,low >>>> and the implementation ensure the 128M low memory is allocated from DMA zone. >>>> We use arm64_dma_phys_limit as the upper limit for crash low memory. >>>> >>>> +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit >>>> + unsigned long long crash_max = CRASH_ADDR_LOW_MAX; >>>> + crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, >>>> crash_base, crash_max); >>>> >>>>> Another option is to only introduce crashkernel=Y,low and, when that is >>>>> passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a >>>>> 'high' option at all: >>>>> >>>>> crashkernel=1G - all within ZONE_DMA >>>>> crashkernel=1G crashkernel=128M,low - 128M in ZONE_DMA >>>>> 1G above ZONE_DMA >>>>> >>>>> If ZONE_DMA is not present or it extends to the whole RAM, we can ignore >>>>> the 'low' option. >>>> >>>> I think although the code is hard to make generic, the interface is better to >>>> be relatively uniform. A user might have to maintain both x86 and arm64, and >>>> so on. It's not a good thing that the difference is too big. >>> >>> There will be some difference as the 4G limit doesn't always hold for >>> arm64 (though it's true in most cases). Anyway, we can probably simplify >>> things a bit while following the documented behaviour: >>> >>> crashkernel=Y - current behaviour within ZONE_DMA >>> crashkernel=Y,high - allocate from above ZONE_DMA >>> crashkernel=Y,low - allocate within ZONE_DMA >>> >>> There is no fallback from crashkernel=Y. >>> >>> The question is whether we still want a default low allocation if >>> crashkernel=Y,low is missing but 'high' is present. If we add this, I >>> think we'd be consistent with kernel-parameters.txt for the 'low' >>> description. A default 'low' is probably not that bad but I'm tempted to >>> always mandate both 'high' and 'low'. >> >> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns >> about this version. And I have the same concerns about them which comes >> from below points: >> 1) we may need to take best effort to keep ,high, ,low behaviour >> consistent on all ARCHes. Otherwise user/admin may be confused when they >> deploy/configure kdump on different machines of different ARCHes in the >> same LAB. I think we should try to avoid the confusion. Yes, but for someone who is configuring crashkernel= for the first time, he needs to read doc to understand how to configure it. The doc can show the recommended default value of 'low' size. After commit 94fb93341822 ("x86/crash: Allocate enough low memory when crashkernel=high"), the default 'low' size doesn't make much sense anymore. The default size of swiotlb_size() is 64M, far less than 256M. And if user specify "swiotlb=", he can also adjust crashkernel=Y,low. + * -swiotlb size: user-specified with swiotlb= or default. - low_size = swiotlb_size_or_default() + (8UL<<20); + low_size = max(swiotlb_size_or_default() + (8UL<<20), 256UL<<20); That means all ARCHs can explicit configure crashkernel=256M,low, instead of omitting it. This may be another way to avoid confusion. It's not hard for programmer-turned-user/admin. However, this requires us to forgo backward compatibility with the default size of 'low'. >> 2) Fallback behaviour is important to our distros. The reason is we will >> provide default value with crashkernel=xxxM along kernel of distros. In >> this case, we hope the reservation will succeed by all means. The ,high >> and ,low is an option if customer likes to take with expertise. OK, I got it. >> >> After going through arm64 memory init code, I got below summary about >> arm64_dma_phys_limit which is the first zone's upper limit. I think we >> can make use of it to facilitate to simplify code. >> ================================================================================ >> DMA DMA32 NORMAL >> 1)Raspberry Pi4 0~1G 3G~4G (above 4G) >> 2)Normal machine 0~4G 0 (above 4G) >> 3)Special machine (above 4G)~MAX >> 4)No DMA|DMA32 (above 4G)~MAX arm64_memblock_init() reserve_crashkernel() --------------- 0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()") paging_init() | map_mem() | unflatten_device_tree or ACPI | ---- //Raspberry Pi4 get dma zone base on dtb or ACPI bootmem_init(); | | zone_sizes_init() | | of_dma_get_max_cpu_address | ----| //Update arm64_dma_phys_limit | ----| reserve_crashkernel() <-------------- //Because we need arm64_dma_phys_limit to be updated above request_standard_resources() >> >> ------------------------------------------- >> arm64_dma_phys_limit >> 1)Raspberry Pi4 1G >> 2)Normal machine 4G >> 3)Special machine MAX >> 4)No DMA|DMA32 MAX >> >> Note: 3)Special machine means the machine's starting physical address is above 4G. >> WHile 4)No DMA|DMA32 means kernel w/o CONFIG_ZONE_DMA|DMA32, and has >> IOMMU hardware supporting. >> =================================================================================== >> >> I made a draft patch based on this patchset, please feel free to check and >> see if it's OK, or anything missing or wrongly understood. I removed >> reserve_crashkernel_high() and only keep reserve_crashkernel() and >> reserve_crashkernel_low() as the v21 did. > > Sorry, forgot attaching the draft patch. > > By the way, we can also have a simple version with basic ,high, ,low > support, no fallback. We can add fallback and other optimization later. > This can be plan B. Yes, That's what Catalin suggested also. Hi, Baoquan He: Without optimization, the whole Patch 3-4 and 6-7 can be dropped. Process after abstraction: if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) { reserve_crashkernel() //block mapping } else { //page mapping reserve_crashkernel() } ------------ Simplified real-world process --------- arm64_memblock_init() if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) reserve_crashkernel() paging_init() map_mem() if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) //block mapping else //page mapping unflatten_device_tree or ACPI bootmem_init(); zone_sizes_init() of_dma_get_max_cpu_address //Update arm64_dma_phys_limit if (IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32)) reserve_crashkernel() > >
On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote: > > > On 2022/4/28 11:52, Baoquan He wrote: > > On 04/28/22 at 11:40am, Baoquan He wrote: > >> Hi Catalin, Zhen Lei, > >> > >> On 04/27/22 at 05:04pm, Catalin Marinas wrote: > >>> On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote: > >>>> On 2022/4/27 20:32, Catalin Marinas wrote: > >>>>> I think one could always pass a default command line like: > >>>>> > >>>>> crashkernel=1G,high crashkernel=128M,low > >>>>> > >>>>> without much knowledge of the SoC memory layout. > >>>> > >>>> Yes, that's what the end result is. The user specify crashkernel=128M,low > >>>> and the implementation ensure the 128M low memory is allocated from DMA zone. > >>>> We use arm64_dma_phys_limit as the upper limit for crash low memory. > >>>> > >>>> +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit > >>>> + unsigned long long crash_max = CRASH_ADDR_LOW_MAX; > >>>> + crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, > >>>> crash_base, crash_max); > >>>> > >>>>> Another option is to only introduce crashkernel=Y,low and, when that is > >>>>> passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a > >>>>> 'high' option at all: > >>>>> > >>>>> crashkernel=1G - all within ZONE_DMA > >>>>> crashkernel=1G crashkernel=128M,low - 128M in ZONE_DMA > >>>>> 1G above ZONE_DMA > >>>>> > >>>>> If ZONE_DMA is not present or it extends to the whole RAM, we can ignore > >>>>> the 'low' option. > >>>> > >>>> I think although the code is hard to make generic, the interface is better to > >>>> be relatively uniform. A user might have to maintain both x86 and arm64, and > >>>> so on. It's not a good thing that the difference is too big. > >>> > >>> There will be some difference as the 4G limit doesn't always hold for > >>> arm64 (though it's true in most cases). Anyway, we can probably simplify > >>> things a bit while following the documented behaviour: > >>> > >>> crashkernel=Y - current behaviour within ZONE_DMA > >>> crashkernel=Y,high - allocate from above ZONE_DMA > >>> crashkernel=Y,low - allocate within ZONE_DMA > >>> > >>> There is no fallback from crashkernel=Y. > >>> > >>> The question is whether we still want a default low allocation if > >>> crashkernel=Y,low is missing but 'high' is present. If we add this, I > >>> think we'd be consistent with kernel-parameters.txt for the 'low' > >>> description. A default 'low' is probably not that bad but I'm tempted to > >>> always mandate both 'high' and 'low'. > >> > >> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns > >> about this version. And I have the same concerns about them which comes > >> from below points: > >> 1) we may need to take best effort to keep ,high, ,low behaviour > >> consistent on all ARCHes. Otherwise user/admin may be confused when they > >> deploy/configure kdump on different machines of different ARCHes in the > >> same LAB. I think we should try to avoid the confusion. > > Yes, but for someone who is configuring crashkernel= for the first time, he > needs to read doc to understand how to configure it. The doc can show the > recommended default value of 'low' size. > > After commit 94fb93341822 ("x86/crash: Allocate enough low memory when crashkernel=high"), > the default 'low' size doesn't make much sense anymore. The default size of swiotlb_size() > is 64M, far less than 256M. And if user specify "swiotlb=", he can also adjust crashkernel=Y,low. > > > + * -swiotlb size: user-specified with swiotlb= or default. > - low_size = swiotlb_size_or_default() + (8UL<<20); > + low_size = max(swiotlb_size_or_default() + (8UL<<20), 256UL<<20); > > That means all ARCHs can explicit configure crashkernel=256M,low, instead of > omitting it. This may be another way to avoid confusion. It's not hard for > programmer-turned-user/admin. However, this requires us to forgo backward > compatibility with the default size of 'low'. We can make ,high and ,low simpler at first as they are alternative. If possible, we can also simplify the ,high ,low implementation on x86_64 if it truly brings better archievement on arm64. > > > >> 2) Fallback behaviour is important to our distros. The reason is we will > >> provide default value with crashkernel=xxxM along kernel of distros. In > >> this case, we hope the reservation will succeed by all means. The ,high > >> and ,low is an option if customer likes to take with expertise. > > OK, I got it. > > >> > >> After going through arm64 memory init code, I got below summary about > >> arm64_dma_phys_limit which is the first zone's upper limit. I think we > >> can make use of it to facilitate to simplify code. > >> ================================================================================ > >> DMA DMA32 NORMAL > >> 1)Raspberry Pi4 0~1G 3G~4G (above 4G) > >> 2)Normal machine 0~4G 0 (above 4G) > >> 3)Special machine (above 4G)~MAX > >> 4)No DMA|DMA32 (above 4G)~MAX > > arm64_memblock_init() > reserve_crashkernel() --------------- 0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()") We don't need different code for this place of reservation as you are doing in this patchset, since arm64_dma_phys_limit is initialized as below. In fact, in arm64_memblock_init(), we have made memblock ready, we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if memblock_start_of_DRAM() is bigger than 4G, we possibly can call reserve_crashkernel() here too. phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1; > paging_init() | > map_mem() | > unflatten_device_tree or ACPI | ---- //Raspberry Pi4 get dma zone base on dtb or ACPI > bootmem_init(); | | > zone_sizes_init() | | > of_dma_get_max_cpu_address | ----| > //Update arm64_dma_phys_limit | ----| > reserve_crashkernel() <-------------- //Because we need arm64_dma_phys_limit to be updated above > request_standard_resources() Yeah, because arm64_dma_phys_limit is decided late in the 1) and 2) case as I summarized, we need defer reserve_crashkernel() to bootmem_init(). But arm64_dma_phys_limit could be 1G or 4G, that's why your optimization about BLOCKING may not be right since you assume the 4G boundary, while forgetting Raspberry Pi4 on which 1G is the boundary of low memory and high memory. So separating out BLOCKING optimization can let us focus on the crashkernel,high support. > > >> > >> ------------------------------------------- > >> arm64_dma_phys_limit > >> 1)Raspberry Pi4 1G > >> 2)Normal machine 4G > >> 3)Special machine MAX > >> 4)No DMA|DMA32 MAX > >> > >> Note: 3)Special machine means the machine's starting physical address is above 4G. > >> WHile 4)No DMA|DMA32 means kernel w/o CONFIG_ZONE_DMA|DMA32, and has > >> IOMMU hardware supporting. > >> =================================================================================== > >> > >> I made a draft patch based on this patchset, please feel free to check and > >> see if it's OK, or anything missing or wrongly understood. I removed > >> reserve_crashkernel_high() and only keep reserve_crashkernel() and > >> reserve_crashkernel_low() as the v21 did. > > > > Sorry, forgot attaching the draft patch. > > > > By the way, we can also have a simple version with basic ,high, ,low > > support, no fallback. We can add fallback and other optimization later. > > This can be plan B. > > Yes, That's what Catalin suggested also. > > Hi, Baoquan He: > Without optimization, the whole Patch 3-4 and 6-7 can be dropped. > > Process after abstraction: > if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) { > reserve_crashkernel() > //block mapping > } else { > //page mapping > reserve_crashkernel() > } > > ------------ Simplified real-world process --------- Yeah, this looks clearer. I would like to see a version with them. > arm64_memblock_init() Before reserve_crashkernel(), we can update arm64_dma_phys_limit as memblock_end_of_DRAM() if CONFIG_ZONE_DMA|DMA32 is not enabled or memblock_start_of_DRAM() is bigger than 4G. Then we go with: if (!arm64_dma_phys_limit) reserve_crashkernel(); Just personal opinion, please check if it's appropriate to handle case 3) which has physical starting memory above 4G here. > if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) > reserve_crashkernel() > paging_init() > map_mem() > if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) > //block mapping > else > //page mapping > unflatten_device_tree or ACPI > bootmem_init(); > zone_sizes_init() > of_dma_get_max_cpu_address > //Update arm64_dma_phys_limit > if (IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32)) > reserve_crashkernel() The rest sounds good with optimization code split out.
On 2022/4/29 11:24, Baoquan He wrote: > On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote: >> >> >> On 2022/4/28 11:52, Baoquan He wrote: >>> On 04/28/22 at 11:40am, Baoquan He wrote: >>>> Hi Catalin, Zhen Lei, >>>> >>>> On 04/27/22 at 05:04pm, Catalin Marinas wrote: >>>>> On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote: >>>>>> On 2022/4/27 20:32, Catalin Marinas wrote: >>>>>>> I think one could always pass a default command line like: >>>>>>> >>>>>>> crashkernel=1G,high crashkernel=128M,low >>>>>>> >>>>>>> without much knowledge of the SoC memory layout. >>>>>> >>>>>> Yes, that's what the end result is. The user specify crashkernel=128M,low >>>>>> and the implementation ensure the 128M low memory is allocated from DMA zone. >>>>>> We use arm64_dma_phys_limit as the upper limit for crash low memory. >>>>>> >>>>>> +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit >>>>>> + unsigned long long crash_max = CRASH_ADDR_LOW_MAX; >>>>>> + crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, >>>>>> crash_base, crash_max); >>>>>> >>>>>>> Another option is to only introduce crashkernel=Y,low and, when that is >>>>>>> passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a >>>>>>> 'high' option at all: >>>>>>> >>>>>>> crashkernel=1G - all within ZONE_DMA >>>>>>> crashkernel=1G crashkernel=128M,low - 128M in ZONE_DMA >>>>>>> 1G above ZONE_DMA >>>>>>> >>>>>>> If ZONE_DMA is not present or it extends to the whole RAM, we can ignore >>>>>>> the 'low' option. >>>>>> >>>>>> I think although the code is hard to make generic, the interface is better to >>>>>> be relatively uniform. A user might have to maintain both x86 and arm64, and >>>>>> so on. It's not a good thing that the difference is too big. >>>>> >>>>> There will be some difference as the 4G limit doesn't always hold for >>>>> arm64 (though it's true in most cases). Anyway, we can probably simplify >>>>> things a bit while following the documented behaviour: >>>>> >>>>> crashkernel=Y - current behaviour within ZONE_DMA >>>>> crashkernel=Y,high - allocate from above ZONE_DMA >>>>> crashkernel=Y,low - allocate within ZONE_DMA >>>>> >>>>> There is no fallback from crashkernel=Y. >>>>> >>>>> The question is whether we still want a default low allocation if >>>>> crashkernel=Y,low is missing but 'high' is present. If we add this, I >>>>> think we'd be consistent with kernel-parameters.txt for the 'low' >>>>> description. A default 'low' is probably not that bad but I'm tempted to >>>>> always mandate both 'high' and 'low'. >>>> >>>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns >>>> about this version. And I have the same concerns about them which comes >>>> from below points: >>>> 1) we may need to take best effort to keep ,high, ,low behaviour >>>> consistent on all ARCHes. Otherwise user/admin may be confused when they >>>> deploy/configure kdump on different machines of different ARCHes in the >>>> same LAB. I think we should try to avoid the confusion. >> >> Yes, but for someone who is configuring crashkernel= for the first time, he >> needs to read doc to understand how to configure it. The doc can show the >> recommended default value of 'low' size. >> >> After commit 94fb93341822 ("x86/crash: Allocate enough low memory when crashkernel=high"), >> the default 'low' size doesn't make much sense anymore. The default size of swiotlb_size() >> is 64M, far less than 256M. And if user specify "swiotlb=", he can also adjust crashkernel=Y,low. >> >> >> + * -swiotlb size: user-specified with swiotlb= or default. >> - low_size = swiotlb_size_or_default() + (8UL<<20); >> + low_size = max(swiotlb_size_or_default() + (8UL<<20), 256UL<<20); >> >> That means all ARCHs can explicit configure crashkernel=256M,low, instead of >> omitting it. This may be another way to avoid confusion. It's not hard for >> programmer-turned-user/admin. However, this requires us to forgo backward >> compatibility with the default size of 'low'. > > We can make ,high and ,low simpler at first as they are alternative. If > possible, we can also simplify the ,high ,low implementation on x86_64 > if it truly brings better archievement on arm64. OK, I plan to remove optimization, fallback and default low size, to follow the suggestion of Catalin first. But there's one minor point of contention. 1) Both "crashkernel=X,high" and "crashkernel=X,low" must be present. 2) Both "crashkernel=X,high" and "crashkernel=X,low" are present. or Allow "crashkernel=X,high" to be present alone. Unlike x86, the default low size is zero. I prefer 2), how about you? > >> >> >>>> 2) Fallback behaviour is important to our distros. The reason is we will >>>> provide default value with crashkernel=xxxM along kernel of distros. In >>>> this case, we hope the reservation will succeed by all means. The ,high >>>> and ,low is an option if customer likes to take with expertise. >> >> OK, I got it. >> >>>> >>>> After going through arm64 memory init code, I got below summary about >>>> arm64_dma_phys_limit which is the first zone's upper limit. I think we >>>> can make use of it to facilitate to simplify code. >>>> ================================================================================ >>>> DMA DMA32 NORMAL >>>> 1)Raspberry Pi4 0~1G 3G~4G (above 4G) >>>> 2)Normal machine 0~4G 0 (above 4G) >>>> 3)Special machine (above 4G)~MAX >>>> 4)No DMA|DMA32 (above 4G)~MAX >> >> arm64_memblock_init() >> reserve_crashkernel() --------------- 0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()") > We don't need different code for this place of reservation as you are > doing in this patchset, since arm64_dma_phys_limit is initialized as > below. In fact, in arm64_memblock_init(), we have made memblock ready, > we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if > memblock_start_of_DRAM() is bigger than 4G, we possibly can call > reserve_crashkernel() here too. Yes. Maybe all the devices in this environment are 64-bit. One way I know of allowing 32-bit devices to access high memory without SMMU is: Set a fixed value for the upper 32 bits. In this case, the DMA zone should be [phys_start, phys_start + 4G). > > phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1; > >> paging_init() | >> map_mem() | >> unflatten_device_tree or ACPI | ---- //Raspberry Pi4 get dma zone base on dtb or ACPI >> bootmem_init(); | | >> zone_sizes_init() | | >> of_dma_get_max_cpu_address | ----| >> //Update arm64_dma_phys_limit | ----| >> reserve_crashkernel() <-------------- //Because we need arm64_dma_phys_limit to be updated above >> request_standard_resources() > > Yeah, because arm64_dma_phys_limit is decided late in the 1) and 2) case > as I summarized, we need defer reserve_crashkernel() to bootmem_init(). But > arm64_dma_phys_limit could be 1G or 4G, that's why your optimization > about BLOCKING may not be right since you assume the 4G boundary, while > forgetting Raspberry Pi4 on which 1G is the boundary of low memory and No, no, my optimization for Raspberry Pi4 is fine. I do page mapping for memory under 4G and block mapping for memory above 4G. But still try to reserve crash low memory from DMA zone. So when 1G is the boundary, it's just not fully optimized, the memory 1-4G are mapped as page. > high memory. So separating out BLOCKING optimization can let us focus on > the crashkernel,high support. > >> >>>> >>>> ------------------------------------------- >>>> arm64_dma_phys_limit >>>> 1)Raspberry Pi4 1G >>>> 2)Normal machine 4G >>>> 3)Special machine MAX >>>> 4)No DMA|DMA32 MAX >>>> >>>> Note: 3)Special machine means the machine's starting physical address is above 4G. >>>> WHile 4)No DMA|DMA32 means kernel w/o CONFIG_ZONE_DMA|DMA32, and has >>>> IOMMU hardware supporting. >>>> =================================================================================== >>>> >>>> I made a draft patch based on this patchset, please feel free to check and >>>> see if it's OK, or anything missing or wrongly understood. I removed >>>> reserve_crashkernel_high() and only keep reserve_crashkernel() and >>>> reserve_crashkernel_low() as the v21 did. >>> >>> Sorry, forgot attaching the draft patch. >>> >>> By the way, we can also have a simple version with basic ,high, ,low >>> support, no fallback. We can add fallback and other optimization later. >>> This can be plan B. >> >> Yes, That's what Catalin suggested also. >> >> Hi, Baoquan He: >> Without optimization, the whole Patch 3-4 and 6-7 can be dropped. >> >> Process after abstraction: >> if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) { >> reserve_crashkernel() >> //block mapping >> } else { >> //page mapping >> reserve_crashkernel() >> } >> >> ------------ Simplified real-world process --------- > Yeah, this looks clearer. I would like to see a version with them. > >> arm64_memblock_init() > Before reserve_crashkernel(), we can update arm64_dma_phys_limit > as memblock_end_of_DRAM() if CONFIG_ZONE_DMA|DMA32 is not enabled or > memblock_start_of_DRAM() is bigger than 4G. > Then we go with: > if (!arm64_dma_phys_limit) > reserve_crashkernel(); > > Just personal opinion, please check if it's appropriate to handle case > 3) which has physical starting memory above 4G here. OK, I'll write it down and consider it in the future optimization. > >> if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) >> reserve_crashkernel() > >> paging_init() >> map_mem() >> if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) >> //block mapping >> else >> //page mapping >> unflatten_device_tree or ACPI >> bootmem_init(); >> zone_sizes_init() >> of_dma_get_max_cpu_address >> //Update arm64_dma_phys_limit >> if (IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32)) >> reserve_crashkernel() > > The rest sounds good with optimization code split out. > > . >
On 2022/4/29 16:02, Leizhen (ThunderTown) wrote: > > > On 2022/4/29 11:24, Baoquan He wrote: >> On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote: >>> >>> >>> On 2022/4/28 11:52, Baoquan He wrote: >>>> On 04/28/22 at 11:40am, Baoquan He wrote: >>>>> Hi Catalin, Zhen Lei, >>>>> >>>>> On 04/27/22 at 05:04pm, Catalin Marinas wrote: >>>>>> On Wed, Apr 27, 2022 at 09:49:20PM +0800, Leizhen (ThunderTown) wrote: >>>>>>> On 2022/4/27 20:32, Catalin Marinas wrote: >>>>>>>> I think one could always pass a default command line like: >>>>>>>> >>>>>>>> crashkernel=1G,high crashkernel=128M,low >>>>>>>> >>>>>>>> without much knowledge of the SoC memory layout. >>>>>>> >>>>>>> Yes, that's what the end result is. The user specify crashkernel=128M,low >>>>>>> and the implementation ensure the 128M low memory is allocated from DMA zone. >>>>>>> We use arm64_dma_phys_limit as the upper limit for crash low memory. >>>>>>> >>>>>>> +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit >>>>>>> + unsigned long long crash_max = CRASH_ADDR_LOW_MAX; >>>>>>> + crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, >>>>>>> crash_base, crash_max); >>>>>>> >>>>>>>> Another option is to only introduce crashkernel=Y,low and, when that is >>>>>>>> passed, crashkernel=Y can go above arm64_dma_phys_limit. We won't need a >>>>>>>> 'high' option at all: >>>>>>>> >>>>>>>> crashkernel=1G - all within ZONE_DMA >>>>>>>> crashkernel=1G crashkernel=128M,low - 128M in ZONE_DMA >>>>>>>> 1G above ZONE_DMA >>>>>>>> >>>>>>>> If ZONE_DMA is not present or it extends to the whole RAM, we can ignore >>>>>>>> the 'low' option. >>>>>>> >>>>>>> I think although the code is hard to make generic, the interface is better to >>>>>>> be relatively uniform. A user might have to maintain both x86 and arm64, and >>>>>>> so on. It's not a good thing that the difference is too big. >>>>>> >>>>>> There will be some difference as the 4G limit doesn't always hold for >>>>>> arm64 (though it's true in most cases). Anyway, we can probably simplify >>>>>> things a bit while following the documented behaviour: >>>>>> >>>>>> crashkernel=Y - current behaviour within ZONE_DMA >>>>>> crashkernel=Y,high - allocate from above ZONE_DMA >>>>>> crashkernel=Y,low - allocate within ZONE_DMA >>>>>> >>>>>> There is no fallback from crashkernel=Y. >>>>>> >>>>>> The question is whether we still want a default low allocation if >>>>>> crashkernel=Y,low is missing but 'high' is present. If we add this, I >>>>>> think we'd be consistent with kernel-parameters.txt for the 'low' >>>>>> description. A default 'low' is probably not that bad but I'm tempted to >>>>>> always mandate both 'high' and 'low'. >>>>> >>>>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns >>>>> about this version. And I have the same concerns about them which comes >>>>> from below points: >>>>> 1) we may need to take best effort to keep ,high, ,low behaviour >>>>> consistent on all ARCHes. Otherwise user/admin may be confused when they >>>>> deploy/configure kdump on different machines of different ARCHes in the >>>>> same LAB. I think we should try to avoid the confusion. >>> >>> Yes, but for someone who is configuring crashkernel= for the first time, he >>> needs to read doc to understand how to configure it. The doc can show the >>> recommended default value of 'low' size. >>> >>> After commit 94fb93341822 ("x86/crash: Allocate enough low memory when crashkernel=high"), >>> the default 'low' size doesn't make much sense anymore. The default size of swiotlb_size() >>> is 64M, far less than 256M. And if user specify "swiotlb=", he can also adjust crashkernel=Y,low. >>> >>> >>> + * -swiotlb size: user-specified with swiotlb= or default. >>> - low_size = swiotlb_size_or_default() + (8UL<<20); >>> + low_size = max(swiotlb_size_or_default() + (8UL<<20), 256UL<<20); >>> >>> That means all ARCHs can explicit configure crashkernel=256M,low, instead of >>> omitting it. This may be another way to avoid confusion. It's not hard for >>> programmer-turned-user/admin. However, this requires us to forgo backward >>> compatibility with the default size of 'low'. >> >> We can make ,high and ,low simpler at first as they are alternative. If >> possible, we can also simplify the ,high ,low implementation on x86_64 >> if it truly brings better archievement on arm64. > > OK, I plan to remove optimization, fallback and default low size, to follow the > suggestion of Catalin first. But there's one minor point of contention. > > 1) Both "crashkernel=X,high" and "crashkernel=X,low" must be present. > 2) Both "crashkernel=X,high" and "crashkernel=X,low" are present. > or > Allow "crashkernel=X,high" to be present alone. Unlike x86, the default low size is zero. > > I prefer 2), how about you? > >> >>> >>> >>>>> 2) Fallback behaviour is important to our distros. The reason is we will >>>>> provide default value with crashkernel=xxxM along kernel of distros. In >>>>> this case, we hope the reservation will succeed by all means. The ,high >>>>> and ,low is an option if customer likes to take with expertise. >>> >>> OK, I got it. >>> >>>>> >>>>> After going through arm64 memory init code, I got below summary about >>>>> arm64_dma_phys_limit which is the first zone's upper limit. I think we >>>>> can make use of it to facilitate to simplify code. >>>>> ================================================================================ >>>>> DMA DMA32 NORMAL >>>>> 1)Raspberry Pi4 0~1G 3G~4G (above 4G) >>>>> 2)Normal machine 0~4G 0 (above 4G) >>>>> 3)Special machine (above 4G)~MAX >>>>> 4)No DMA|DMA32 (above 4G)~MAX >>> >>> arm64_memblock_init() >>> reserve_crashkernel() --------------- 0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()") >> We don't need different code for this place of reservation as you are >> doing in this patchset, since arm64_dma_phys_limit is initialized as >> below. In fact, in arm64_memblock_init(), we have made memblock ready, >> we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if >> memblock_start_of_DRAM() is bigger than 4G, we possibly can call >> reserve_crashkernel() here too. > > Yes. Maybe all the devices in this environment are 64-bit. One way I know of allowing > 32-bit devices to access high memory without SMMU is: Set a fixed value for the upper > 32 bits. In this case, the DMA zone should be [phys_start, phys_start + 4G). I just read the message of commit 791ab8b2e3 ("arm64: Ignore any DMA offsets in the max_zone_phys() calculation") Currently, the kernel assumes that if RAM starts above 32-bit (or zone_bits), there is still a ZONE_DMA/DMA32 at the bottom of the RAM and such constrained devices have a hardwired DMA offset. In practice, we haven't noticed any such hardware so let's assume that we can expand ZONE_DMA32 to the available memory if no RAM below 4GB. Similarly, ZONE_DMA is expanded to the 4GB limit if no RAM addressable by zone_bits. So your suggestion is feasible. > >> >> phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1; >> >>> paging_init() | >>> map_mem() | >>> unflatten_device_tree or ACPI | ---- //Raspberry Pi4 get dma zone base on dtb or ACPI >>> bootmem_init(); | | >>> zone_sizes_init() | | >>> of_dma_get_max_cpu_address | ----| >>> //Update arm64_dma_phys_limit | ----| >>> reserve_crashkernel() <-------------- //Because we need arm64_dma_phys_limit to be updated above >>> request_standard_resources() >> >> Yeah, because arm64_dma_phys_limit is decided late in the 1) and 2) case >> as I summarized, we need defer reserve_crashkernel() to bootmem_init(). But >> arm64_dma_phys_limit could be 1G or 4G, that's why your optimization >> about BLOCKING may not be right since you assume the 4G boundary, while >> forgetting Raspberry Pi4 on which 1G is the boundary of low memory and > > No, no, my optimization for Raspberry Pi4 is fine. I do page mapping for memory > under 4G and block mapping for memory above 4G. But still try to reserve crash > low memory from DMA zone. So when 1G is the boundary, it's just not fully optimized, > the memory 1-4G are mapped as page. > >> high memory. So separating out BLOCKING optimization can let us focus on >> the crashkernel,high support. >> >>> >>>>> >>>>> ------------------------------------------- >>>>> arm64_dma_phys_limit >>>>> 1)Raspberry Pi4 1G >>>>> 2)Normal machine 4G >>>>> 3)Special machine MAX >>>>> 4)No DMA|DMA32 MAX >>>>> >>>>> Note: 3)Special machine means the machine's starting physical address is above 4G. >>>>> WHile 4)No DMA|DMA32 means kernel w/o CONFIG_ZONE_DMA|DMA32, and has >>>>> IOMMU hardware supporting. >>>>> =================================================================================== >>>>> >>>>> I made a draft patch based on this patchset, please feel free to check and >>>>> see if it's OK, or anything missing or wrongly understood. I removed >>>>> reserve_crashkernel_high() and only keep reserve_crashkernel() and >>>>> reserve_crashkernel_low() as the v21 did. >>>> >>>> Sorry, forgot attaching the draft patch. >>>> >>>> By the way, we can also have a simple version with basic ,high, ,low >>>> support, no fallback. We can add fallback and other optimization later. >>>> This can be plan B. >>> >>> Yes, That's what Catalin suggested also. >>> >>> Hi, Baoquan He: >>> Without optimization, the whole Patch 3-4 and 6-7 can be dropped. >>> >>> Process after abstraction: >>> if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) { >>> reserve_crashkernel() >>> //block mapping >>> } else { >>> //page mapping >>> reserve_crashkernel() >>> } >>> >>> ------------ Simplified real-world process --------- >> Yeah, this looks clearer. I would like to see a version with them. >> >>> arm64_memblock_init() >> Before reserve_crashkernel(), we can update arm64_dma_phys_limit >> as memblock_end_of_DRAM() if CONFIG_ZONE_DMA|DMA32 is not enabled or >> memblock_start_of_DRAM() is bigger than 4G. >> Then we go with: >> if (!arm64_dma_phys_limit) >> reserve_crashkernel(); >> >> Just personal opinion, please check if it's appropriate to handle case >> 3) which has physical starting memory above 4G here. > > OK, I'll write it down and consider it in the future optimization. > >> >>> if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) >>> reserve_crashkernel() >> >>> paging_init() >>> map_mem() >>> if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) >>> //block mapping >>> else >>> //page mapping >>> unflatten_device_tree or ACPI >>> bootmem_init(); >>> zone_sizes_init() >>> of_dma_get_max_cpu_address >>> //Update arm64_dma_phys_limit >>> if (IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32)) >>> reserve_crashkernel() >> >> The rest sounds good with optimization code split out. >> >> . >> >
On Fri, Apr 29, 2022 at 04:25:37PM +0800, Leizhen (ThunderTown) wrote: > On 2022/4/29 16:02, Leizhen (ThunderTown) wrote: > > On 2022/4/29 11:24, Baoquan He wrote: > >> On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote: > >>> On 2022/4/28 11:52, Baoquan He wrote: > >>>> On 04/28/22 at 11:40am, Baoquan He wrote: > >>>>> On 04/27/22 at 05:04pm, Catalin Marinas wrote: > >>>>>> There will be some difference as the 4G limit doesn't always hold for > >>>>>> arm64 (though it's true in most cases). Anyway, we can probably simplify > >>>>>> things a bit while following the documented behaviour: > >>>>>> > >>>>>> crashkernel=Y - current behaviour within ZONE_DMA > >>>>>> crashkernel=Y,high - allocate from above ZONE_DMA > >>>>>> crashkernel=Y,low - allocate within ZONE_DMA [...] > >>>>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns > >>>>> about this version. And I have the same concerns about them which comes > >>>>> from below points: > >>>>> 1) we may need to take best effort to keep ,high, ,low behaviour > >>>>> consistent on all ARCHes. Otherwise user/admin may be confused when they > >>>>> deploy/configure kdump on different machines of different ARCHes in the > >>>>> same LAB. I think we should try to avoid the confusion. I guess by all arches you mean just x86 here. Since the code is not generic, all arches do their own stuff. > > OK, I plan to remove optimization, fallback and default low size, to follow the > > suggestion of Catalin first. But there's one minor point of contention. > > > > 1) Both "crashkernel=X,high" and "crashkernel=X,low" must be present. > > 2) Both "crashkernel=X,high" and "crashkernel=X,low" are present. > > or > > Allow "crashkernel=X,high" to be present alone. Unlike x86, the default low size is zero. > > > > I prefer 2), how about you? (2) works for me as well. We keep these simple as "expert" options and allow crashkernel= to fall back to 'high' if not sufficient memory in ZONE_DMA. That would be a slight change from the current behaviour but, as Zhen Lei said, with the old tools it's just moving the error around, the crashkernel wouldn't be available in either case. > >>>>> 2) Fallback behaviour is important to our distros. The reason is we will > >>>>> provide default value with crashkernel=xxxM along kernel of distros. In > >>>>> this case, we hope the reservation will succeed by all means. The ,high > >>>>> and ,low is an option if customer likes to take with expertise. OK, that's good feedback. So, to recap, IIUC you are fine with: crashkernel=Y - allocate within ZONE_DMA with fallback above with a default in ZONE_DMA (like x86, 256M or swiotlb size) crashkernel=Y,high - allocate from above ZONE_DMA crashkernel=Y,low - allocate within ZONE_DMA 'crashkernel' overrides the high and low while the latter two can be passed independently. > >>>>> After going through arm64 memory init code, I got below summary about > >>>>> arm64_dma_phys_limit which is the first zone's upper limit. I think we > >>>>> can make use of it to facilitate to simplify code. > >>>>> ================================================================================ > >>>>> DMA DMA32 NORMAL > >>>>> 1)Raspberry Pi4 0~1G 3G~4G (above 4G) > >>>>> 2)Normal machine 0~4G 0 (above 4G) > >>>>> 3)Special machine (above 4G)~MAX > >>>>> 4)No DMA|DMA32 (above 4G)~MAX > >>> > >>> arm64_memblock_init() > >>> reserve_crashkernel() --------------- 0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()") > >> We don't need different code for this place of reservation as you are > >> doing in this patchset, since arm64_dma_phys_limit is initialized as > >> below. In fact, in arm64_memblock_init(), we have made memblock ready, > >> we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if > >> memblock_start_of_DRAM() is bigger than 4G, we possibly can call > >> reserve_crashkernel() here too. > > > > Yes. Maybe all the devices in this environment are 64-bit. One way I > > know of allowing 32-bit devices to access high memory without SMMU > > is: Set a fixed value for the upper 32 bits. In this case, the DMA > > zone should be [phys_start, phys_start + 4G). We decided that this case doesn't really exists for arm64 platforms (no need for special ZONE_DMA). > I just read the message of commit 791ab8b2e3 ("arm64: Ignore any DMA > offsets in the max_zone_phys() calculation") > > Currently, the kernel assumes that if RAM starts above 32-bit (or > zone_bits), there is still a ZONE_DMA/DMA32 at the bottom of the RAM and > such constrained devices have a hardwired DMA offset. In practice, we > haven't noticed any such hardware so let's assume that we can expand > ZONE_DMA32 to the available memory if no RAM below 4GB. Similarly, > ZONE_DMA is expanded to the 4GB limit if no RAM addressable by > zone_bits. I think the above log is slightly confusing. If the DRAM starts above 4G, ZONE_DMA goes to the end of DRAM. If the DRAM starts below 4G but above the zone_bits for ZONE_DMA as specified in DT/ACPI, it pushes ZONE_DMA to 4G. I don't remember why we did this last part, maybe in case we get incorrect firmware tables, otherwise we could have extended ZONE_DMA to end of DRAM. Zhen Lei, if we agreed on the crashkernel behaviour, could you please post a series that does the above parsing allocation? Ignore the optimisations, we can look at them afterwards. Thanks.
On 2022/5/4 6:00, Catalin Marinas wrote: > On Fri, Apr 29, 2022 at 04:25:37PM +0800, Leizhen (ThunderTown) wrote: >> On 2022/4/29 16:02, Leizhen (ThunderTown) wrote: >>> On 2022/4/29 11:24, Baoquan He wrote: >>>> On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote: >>>>> On 2022/4/28 11:52, Baoquan He wrote: >>>>>> On 04/28/22 at 11:40am, Baoquan He wrote: >>>>>>> On 04/27/22 at 05:04pm, Catalin Marinas wrote: >>>>>>>> There will be some difference as the 4G limit doesn't always hold for >>>>>>>> arm64 (though it's true in most cases). Anyway, we can probably simplify >>>>>>>> things a bit while following the documented behaviour: >>>>>>>> >>>>>>>> crashkernel=Y - current behaviour within ZONE_DMA >>>>>>>> crashkernel=Y,high - allocate from above ZONE_DMA >>>>>>>> crashkernel=Y,low - allocate within ZONE_DMA > [...] >>>>>>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns >>>>>>> about this version. And I have the same concerns about them which comes >>>>>>> from below points: >>>>>>> 1) we may need to take best effort to keep ,high, ,low behaviour >>>>>>> consistent on all ARCHes. Otherwise user/admin may be confused when they >>>>>>> deploy/configure kdump on different machines of different ARCHes in the >>>>>>> same LAB. I think we should try to avoid the confusion. > > I guess by all arches you mean just x86 here. Since the code is not > generic, all arches do their own stuff. > >>> OK, I plan to remove optimization, fallback and default low size, to follow the >>> suggestion of Catalin first. But there's one minor point of contention. >>> >>> 1) Both "crashkernel=X,high" and "crashkernel=X,low" must be present. >>> 2) Both "crashkernel=X,high" and "crashkernel=X,low" are present. >>> or >>> Allow "crashkernel=X,high" to be present alone. Unlike x86, the default low size is zero. >>> >>> I prefer 2), how about you? > > (2) works for me as well. We keep these simple as "expert" options and Okay, so I'll follow 2) to update the code. > allow crashkernel= to fall back to 'high' if not sufficient memory in > ZONE_DMA. That would be a slight change from the current behaviour but, > as Zhen Lei said, with the old tools it's just moving the error around, > the crashkernel wouldn't be available in either case. > >>>>>>> 2) Fallback behaviour is important to our distros. The reason is we will >>>>>>> provide default value with crashkernel=xxxM along kernel of distros. In >>>>>>> this case, we hope the reservation will succeed by all means. The ,high >>>>>>> and ,low is an option if customer likes to take with expertise. > > OK, that's good feedback. > > So, to recap, IIUC you are fine with: > > crashkernel=Y - allocate within ZONE_DMA with fallback > above with a default in ZONE_DMA (like > x86, 256M or swiotlb size) > crashkernel=Y,high - allocate from above ZONE_DMA > crashkernel=Y,low - allocate within ZONE_DMA > > 'crashkernel' overrides the high and low while the latter two can be > passed independently. > >>>>>>> After going through arm64 memory init code, I got below summary about >>>>>>> arm64_dma_phys_limit which is the first zone's upper limit. I think we >>>>>>> can make use of it to facilitate to simplify code. >>>>>>> ================================================================================ >>>>>>> DMA DMA32 NORMAL >>>>>>> 1)Raspberry Pi4 0~1G 3G~4G (above 4G) >>>>>>> 2)Normal machine 0~4G 0 (above 4G) >>>>>>> 3)Special machine (above 4G)~MAX >>>>>>> 4)No DMA|DMA32 (above 4G)~MAX >>>>> >>>>> arm64_memblock_init() >>>>> reserve_crashkernel() --------------- 0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()") >>>> We don't need different code for this place of reservation as you are >>>> doing in this patchset, since arm64_dma_phys_limit is initialized as >>>> below. In fact, in arm64_memblock_init(), we have made memblock ready, >>>> we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if >>>> memblock_start_of_DRAM() is bigger than 4G, we possibly can call >>>> reserve_crashkernel() here too. >>> >>> Yes. Maybe all the devices in this environment are 64-bit. One way I >>> know of allowing 32-bit devices to access high memory without SMMU >>> is: Set a fixed value for the upper 32 bits. In this case, the DMA >>> zone should be [phys_start, phys_start + 4G). > > We decided that this case doesn't really exists for arm64 platforms (no > need for special ZONE_DMA). > >> I just read the message of commit 791ab8b2e3 ("arm64: Ignore any DMA >> offsets in the max_zone_phys() calculation") >> >> Currently, the kernel assumes that if RAM starts above 32-bit (or >> zone_bits), there is still a ZONE_DMA/DMA32 at the bottom of the RAM and >> such constrained devices have a hardwired DMA offset. In practice, we >> haven't noticed any such hardware so let's assume that we can expand >> ZONE_DMA32 to the available memory if no RAM below 4GB. Similarly, >> ZONE_DMA is expanded to the 4GB limit if no RAM addressable by >> zone_bits. > > I think the above log is slightly confusing. If the DRAM starts above > 4G, ZONE_DMA goes to the end of DRAM. If the DRAM starts below 4G but > above the zone_bits for ZONE_DMA as specified in DT/ACPI, it pushes > ZONE_DMA to 4G. I don't remember why we did this last part, maybe in > case we get incorrect firmware tables, otherwise we could have extended > ZONE_DMA to end of DRAM. > > Zhen Lei, if we agreed on the crashkernel behaviour, could you please > post a series that does the above parsing allocation? Ignore the > optimisations, we can look at them afterwards. OK, I've changed the code before the festival, and I'll test it today. > > Thanks. >
On 05/03/22 at 11:00pm, Catalin Marinas wrote: > On Fri, Apr 29, 2022 at 04:25:37PM +0800, Leizhen (ThunderTown) wrote: > > On 2022/4/29 16:02, Leizhen (ThunderTown) wrote: > > > On 2022/4/29 11:24, Baoquan He wrote: > > >> On 04/28/22 at 05:33pm, Leizhen (ThunderTown) wrote: > > >>> On 2022/4/28 11:52, Baoquan He wrote: > > >>>> On 04/28/22 at 11:40am, Baoquan He wrote: > > >>>>> On 04/27/22 at 05:04pm, Catalin Marinas wrote: > > >>>>>> There will be some difference as the 4G limit doesn't always hold for > > >>>>>> arm64 (though it's true in most cases). Anyway, we can probably simplify > > >>>>>> things a bit while following the documented behaviour: > > >>>>>> > > >>>>>> crashkernel=Y - current behaviour within ZONE_DMA > > >>>>>> crashkernel=Y,high - allocate from above ZONE_DMA > > >>>>>> crashkernel=Y,low - allocate within ZONE_DMA > [...] > > >>>>> Sorry to interrupt. Seems the ,high ,low and fallback are main concerns > > >>>>> about this version. And I have the same concerns about them which comes > > >>>>> from below points: > > >>>>> 1) we may need to take best effort to keep ,high, ,low behaviour > > >>>>> consistent on all ARCHes. Otherwise user/admin may be confused when they > > >>>>> deploy/configure kdump on different machines of different ARCHes in the > > >>>>> same LAB. I think we should try to avoid the confusion. > > I guess by all arches you mean just x86 here. Since the code is not > generic, all arches do their own stuff. Right. Since currently only x86 has crashkernel,high|low support. From the distros and customer's point of view, we would like to see the same feature has the same or similar behaviour. This will ease operation and maintaining. E.g on the cloud platform, the base of it could be any ARCH, x86, arm64. The inconsistent behaviour could cause confusion. Certainly, the underlying implementation may be different. Surely, if arm64 has its own manner because of reasons, we can add document to note that. > > > > OK, I plan to remove optimization, fallback and default low size, to follow the > > > suggestion of Catalin first. But there's one minor point of contention. > > > > > > 1) Both "crashkernel=X,high" and "crashkernel=X,low" must be present. > > > 2) Both "crashkernel=X,high" and "crashkernel=X,low" are present. > > > or > > > Allow "crashkernel=X,high" to be present alone. Unlike x86, the default low size is zero. > > > > > > I prefer 2), how about you? > > (2) works for me as well. We keep these simple as "expert" options and > allow crashkernel= to fall back to 'high' if not sufficient memory in > ZONE_DMA. That would be a slight change from the current behaviour but, > as Zhen Lei said, with the old tools it's just moving the error around, > the crashkernel wouldn't be available in either case. > > > >>>>> 2) Fallback behaviour is important to our distros. The reason is we will > > >>>>> provide default value with crashkernel=xxxM along kernel of distros. In > > >>>>> this case, we hope the reservation will succeed by all means. The ,high > > >>>>> and ,low is an option if customer likes to take with expertise. > > OK, that's good feedback. > > So, to recap, IIUC you are fine with: > > crashkernel=Y - allocate within ZONE_DMA with fallback > above with a default in ZONE_DMA (like > x86, 256M or swiotlb size) Ack to this one. > crashkernel=Y,high - allocate from above ZONE_DMA Not exactly. If there's only ZONE_DMA, crashkernel,high will be reserved in ZONE_DMA, and crashkernel,low will be ignored. Other than this, ack. > crashkernel=Y,low - allocate within ZONE_DMA Ack to this one. > > 'crashkernel' overrides the high and low while the latter two can be > passed independently. crashkernel=,high can be passed independently, then a crashkernel=,low is needed implicitly. If people don't want crashkernel=,low explicitly, crashkernel=0,low need be specified. An independent crashkernel=,low makes no sense. Crashkernel=,low should be paird with crashkernel=,high. My personal opinion according to the existed senmantics on x86. Otherwise, the guidance of crashkernel= |,high|,low reservation will be complicated to write. > > > >>>>> After going through arm64 memory init code, I got below summary about > > >>>>> arm64_dma_phys_limit which is the first zone's upper limit. I think we > > >>>>> can make use of it to facilitate to simplify code. > > >>>>> ================================================================================ > > >>>>> DMA DMA32 NORMAL > > >>>>> 1)Raspberry Pi4 0~1G 3G~4G (above 4G) > > >>>>> 2)Normal machine 0~4G 0 (above 4G) > > >>>>> 3)Special machine (above 4G)~MAX > > >>>>> 4)No DMA|DMA32 (above 4G)~MAX > > >>> > > >>> arm64_memblock_init() > > >>> reserve_crashkernel() --------------- 0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()") > > >> We don't need different code for this place of reservation as you are > > >> doing in this patchset, since arm64_dma_phys_limit is initialized as > > >> below. In fact, in arm64_memblock_init(), we have made memblock ready, > > >> we can initialize arm64_dma_phys_limit as memblock_end_of_DRAM(). And if > > >> memblock_start_of_DRAM() is bigger than 4G, we possibly can call > > >> reserve_crashkernel() here too. > > > > > > Yes. Maybe all the devices in this environment are 64-bit. One way I > > > know of allowing 32-bit devices to access high memory without SMMU > > > is: Set a fixed value for the upper 32 bits. In this case, the DMA > > > zone should be [phys_start, phys_start + 4G). > > We decided that this case doesn't really exists for arm64 platforms (no > need for special ZONE_DMA). > > > I just read the message of commit 791ab8b2e3 ("arm64: Ignore any DMA > > offsets in the max_zone_phys() calculation") > > > > Currently, the kernel assumes that if RAM starts above 32-bit (or > > zone_bits), there is still a ZONE_DMA/DMA32 at the bottom of the RAM and > > such constrained devices have a hardwired DMA offset. In practice, we > > haven't noticed any such hardware so let's assume that we can expand > > ZONE_DMA32 to the available memory if no RAM below 4GB. Similarly, > > ZONE_DMA is expanded to the 4GB limit if no RAM addressable by > > zone_bits. > > I think the above log is slightly confusing. If the DRAM starts above > 4G, ZONE_DMA goes to the end of DRAM. If the DRAM starts below 4G but > above the zone_bits for ZONE_DMA as specified in DT/ACPI, it pushes > ZONE_DMA to 4G. I don't remember why we did this last part, maybe in > case we get incorrect firmware tables, otherwise we could have extended > ZONE_DMA to end of DRAM. > > Zhen Lei, if we agreed on the crashkernel behaviour, could you please > post a series that does the above parsing allocation? Ignore the > optimisations, we can look at them afterwards. > > Thanks. > > -- > Catalin >
On Thu, May 05, 2022 at 11:00:19AM +0800, Baoquan He wrote: > On 05/03/22 at 11:00pm, Catalin Marinas wrote: > > So, to recap, IIUC you are fine with: > > > > crashkernel=Y - allocate within ZONE_DMA with fallback > > above with a default in ZONE_DMA (like > > x86, 256M or swiotlb size) > > Ack to this one. > > > > crashkernel=Y,high - allocate from above ZONE_DMA > > Not exactly. If there's only ZONE_DMA, crashkernel,high will > be reserved in ZONE_DMA, and crashkernel,low will be ignored. > Other than this, ack. Yes, that's fine. > > crashkernel=Y,low - allocate within ZONE_DMA > > Ack to this one. > > > > 'crashkernel' overrides the high and low while the latter two can be > > passed independently. > > crashkernel=,high can be passed independently, then a crashkernel=,low > is needed implicitly. If people don't want crashkernel=,low > explicitly, crashkernel=0,low need be specified. I find this complicating the interface. I don't know the background to the x86 implementation but we diverge already on arm64 since we talk about ZONE_DMA rather than 4G limit (though for most platforms these would be the same). I guess we could restate the difference between crashkernel= and crashkernel=,high as the hint to go for allocation above ZONE_DMA first. > An independent crashkernel=,low makes no sense. Crashkernel=,low > should be paird with crashkernel=,high. You could argue that crashkernel=,low gives the current crashkernel= behaviour, i.e. either all within ZONE_DMA or fail to allocate. So it may have some value on its own. > My personal opinion according to the existed senmantics on x86. > Otherwise, the guidance of crashkernel= |,high|,low reservation > will be complicated to write. It's more that I find the current semantics unnecessarily confusing. But even reading the x86_64 text it's not that clear. For example the default low allocation for crashkernel= and crashkernel=,high is only mentioned in the crashkernel=,low description.
On 05/05/22 at 03:20pm, Catalin Marinas wrote: > On Thu, May 05, 2022 at 11:00:19AM +0800, Baoquan He wrote: > > On 05/03/22 at 11:00pm, Catalin Marinas wrote: > > > So, to recap, IIUC you are fine with: > > > > > > crashkernel=Y - allocate within ZONE_DMA with fallback > > > above with a default in ZONE_DMA (like > > > x86, 256M or swiotlb size) > > > > Ack to this one. > > > > > > > crashkernel=Y,high - allocate from above ZONE_DMA > > > > Not exactly. If there's only ZONE_DMA, crashkernel,high will > > be reserved in ZONE_DMA, and crashkernel,low will be ignored. > > Other than this, ack. > > Yes, that's fine. > > > > crashkernel=Y,low - allocate within ZONE_DMA > > > > Ack to this one. > > > > > > 'crashkernel' overrides the high and low while the latter two can be > > > passed independently. > > > > crashkernel=,high can be passed independently, then a crashkernel=,low > > is needed implicitly. If people don't want crashkernel=,low > > explicitly, crashkernel=0,low need be specified. > > I find this complicating the interface. I don't know the background to > the x86 implementation but we diverge already on arm64 since we talk > about ZONE_DMA rather than 4G limit (though for most platforms these > would be the same). > > I guess we could restate the difference between crashkernel= and > crashkernel=,high as the hint to go for allocation above ZONE_DMA first. Yes, rethinking about this, we can make a straightforward and simpler crashkernel=,high|,low on arm64, namely asking for user to clearly specify them. During maintenance of crashkernel= parameter in our distros, we found crashkernel=xM is used mostly since most of systems can be satisfied with 256M or a little more for kdump. While on some big end servers, 1G or more crashkernel memory is needed. In this case, crashkernel=,high is taken. We don't want to reserve so much low memory during system running while just waiting in case rare crash happened. crashkernel=,high is rarely used, so making it simple and not so flexible is not so bad. We can improve it later with justification. > > > An independent crashkernel=,low makes no sense. Crashkernel=,low > > should be paird with crashkernel=,high. > > You could argue that crashkernel=,low gives the current crashkernel= > behaviour, i.e. either all within ZONE_DMA or fail to allocate. So it > may have some value on its own. Yes, crashkernel=,low has the same behaviour as the current crashkernel= if we decide not to add fallback mechanism to it. The purpose of crahskernel=,low is to assist crashkernel=,high to get kdump kernel boot up with satisfing DMA allocation. While allowing independent crashkernel=,low will add it another mission, limiting crashkernel only reserved in low memory. Up to now, we don't see the need for that. > > > My personal opinion according to the existed senmantics on x86. > > Otherwise, the guidance of crashkernel= |,high|,low reservation > > will be complicated to write. > > It's more that I find the current semantics unnecessarily confusing. But > even reading the x86_64 text it's not that clear. For example the > default low allocation for crashkernel= and crashkernel=,high is only > mentioned in the crashkernel=,low description. Yeah, we can improve those document if insufficiency is found. By the way, with my observation, crashkernel= with fallback meet 99% of our needs. If people really need more than 512M memory or more, then please consider crashkernel=,high. Basically on servers, low memory is limited, while high memory is very big. So I agree with you that we can make it step by step, firstly adding basic crashkernel=,high and ,low support. We can add those complicated cases later.
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c index e16b248699d5c3c..19c2d487cb08feb 100644 --- a/arch/arm64/kernel/machine_kexec.c +++ b/arch/arm64/kernel/machine_kexec.c @@ -329,8 +329,13 @@ bool crash_is_nosave(unsigned long pfn) /* in reserved memory? */ addr = __pfn_to_phys(pfn); - if ((addr < crashk_res.start) || (crashk_res.end < addr)) - return false; + if ((addr < crashk_res.start) || (crashk_res.end < addr)) { + if (!crashk_low_res.end) + return false; + + if ((addr < crashk_low_res.start) || (crashk_low_res.end < addr)) + return false; + } if (!kexec_crash_image) return true; diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c index 59c648d51848886..889951291cc0f9c 100644 --- a/arch/arm64/kernel/machine_kexec_file.c +++ b/arch/arm64/kernel/machine_kexec_file.c @@ -65,10 +65,18 @@ static int prepare_elf_headers(void **addr, unsigned long *sz) /* Exclude crashkernel region */ ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end); + if (ret) + goto out; + + if (crashk_low_res.end) { + ret = crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end); + if (ret) + goto out; + } - if (!ret) - ret = crash_prepare_elf64_headers(cmem, true, addr, sz); + ret = crash_prepare_elf64_headers(cmem, true, addr, sz); +out: kfree(cmem); return ret; } diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index f670bca160992b9..99d5539c13de3b1 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -90,43 +90,138 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit; phys_addr_t __ro_after_init arm64_dma_phys_limit = PHYS_MASK + 1; #endif +/* Current arm64 boot protocol requires 2MB alignment */ +#define CRASH_ALIGN SZ_2M + +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit +#define CRASH_ADDR_HIGH_MAX memblock.current_limit + +/* + * This is an empirical value in x86_64 and taken here directly. Please + * refer to the code comment in reserve_crashkernel_low() of x86_64 for more + * details. + */ +#define DEFAULT_CRASH_KERNEL_LOW_SIZE \ + max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20) + +static int __init reserve_crashkernel_low(unsigned long long low_size) +{ + unsigned long long low_base; + + /* passed with crashkernel=0,low ? */ + if (!low_size) + return 0; + + low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX); + if (!low_base) { + pr_err("cannot allocate crashkernel low memory (size:0x%llx).\n", low_size); + return -ENOMEM; + } + + pr_info("crashkernel low memory reserved: 0x%08llx - 0x%08llx (%lld MB)\n", + low_base, low_base + low_size, low_size >> 20); + + crashk_low_res.start = low_base; + crashk_low_res.end = low_base + low_size - 1; + insert_resource(&iomem_resource, &crashk_low_res); + + return 0; +} + /* * reserve_crashkernel() - reserves memory for crash kernel * * This function reserves memory area given in "crashkernel=" kernel command * line parameter. The memory reserved is used by dump capture kernel when * primary kernel is crashing. + * + * NOTE: Reservation of crashkernel,low is special since its existence + * is not independent, need rely on the existence of crashkernel,high. + * Here, four cases of crashkernel low memory reservation are summarized: + * 1) crashkernel=Y,low is specified explicitly, the size of crashkernel low + * memory takes Y; + * 2) crashkernel=,low is not given, while crashkernel=,high is specified, + * take the default crashkernel low memory size; + * 3) crashkernel=X is specified, while fallback to get a memory region + * in high memory, take the default crashkernel low memory size; + * 4) crashkernel='invalid value',low is specified, failed the whole + * crashkernel reservation and bail out. */ static void __init reserve_crashkernel(void) { unsigned long long crash_base, crash_size; - unsigned long long crash_max = arm64_dma_phys_limit; + unsigned long long crash_low_size; + unsigned long long crash_max = CRASH_ADDR_LOW_MAX; + bool fixed_base, high = false; + char *cmdline = boot_command_line; int ret; if (!IS_ENABLED(CONFIG_KEXEC_CORE)) return; - ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), + /* crashkernel=X[@offset] */ + ret = parse_crashkernel(cmdline, memblock_phys_mem_size(), &crash_size, &crash_base); - /* no crashkernel= or invalid value specified */ - if (ret || !crash_size) - return; + if (ret || !crash_size) { + ret = parse_crashkernel_high(cmdline, 0, &crash_size, &crash_base); + if (ret || !crash_size) + return; + + ret = parse_crashkernel_low(cmdline, 0, &crash_low_size, &crash_base); + if (ret == -ENOENT) + /* case #2 of crashkernel,low reservation */ + crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE; + else if (ret) + /* case #4 of crashkernel,low reservation */ + return; + + high = true; + crash_max = CRASH_ADDR_HIGH_MAX; + } + fixed_base = !!crash_base; crash_size = PAGE_ALIGN(crash_size); - /* User specifies base address explicitly. */ - if (crash_base) + if (fixed_base) crash_max = crash_base + crash_size; - /* Current arm64 boot protocol requires 2MB alignment */ - crash_base = memblock_phys_alloc_range(crash_size, SZ_2M, +retry: + crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, crash_base, crash_max); if (!crash_base) { + /* + * Attempt to fully allocate low memory failed, fall back + * to high memory, the minimum required low memory will be + * reserved later. + */ + if (!fixed_base && (crash_max == CRASH_ADDR_LOW_MAX)) { + crash_max = CRASH_ADDR_HIGH_MAX; + goto retry; + } + pr_warn("cannot allocate crashkernel (size:0x%llx)\n", crash_size); return; } + /* + * When both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are disabled, the + * CRASH_ADDR_LOW_MAX equals the upper limit of physical memory, so + * the 'crash_base' of high memory can not exceed it. To follow the + * description of "crashkernel=X,high" option, add below 'high' + * condition to make sure the crash low memory will be reserved. + */ + if ((crash_base >= CRASH_ADDR_LOW_MAX) || high) { + /* case #3 of crashkernel,low reservation */ + if (!high) + crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE; + + if (reserve_crashkernel_low(crash_low_size)) { + memblock_phys_free(crash_base, crash_size); + return; + } + } + pr_info("crashkernel reserved: 0x%016llx - 0x%016llx (%lld MB)\n", crash_base, crash_base + crash_size, crash_size >> 20); @@ -135,6 +230,9 @@ static void __init reserve_crashkernel(void) * map. Inform kmemleak so that it won't try to access it. */ kmemleak_ignore_phys(crash_base); + if (crashk_low_res.end) + kmemleak_ignore_phys(crashk_low_res.start); + crashk_res.start = crash_base; crashk_res.end = crash_base + crash_size - 1; insert_resource(&iomem_resource, &crashk_res);