Message ID | 37099a57-b655-3b3a-56d0-5f7fbd49d7db@gentwo.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v3] ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512 | expand |
Hi Christoph, On Wed, Mar 06, 2024 at 05:45:04PM -0800, Christoph Lameter (Ampere) wrote: > Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere > Computing) are planning to ship systems with 512 CPUs. So that all CPUs on > these systems can be used with defconfig, we'd like to bump NR_CPUS to 512. > Therefore this patch increases the default NR_CPUS from 256 to 512. > > As increasing NR_CPUS will increase the size of cpumasks, there's a fear that > this might have a significant impact on stack usage due to code which places > cpumasks on the stack. To mitigate that concern, we can select > CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with > NR_CPUS=256, we only select this when NR_CPUS > 256. > > CPUMASK_OFFSTACK configures the cpumasks in the kernel to be > dynamically allocated. This was used in the X86 architecture in the > past to enable support for larger CPU configurations up to 8k cpus. > > With that is becomes possible to dynamically size the allocation of > the cpu bitmaps depending on the quantity of processors detected on > bootup. Memory used for cpumasks will increase if the kernel is > run on a machine with more cores. > > Further increases may be needed if ARM processor vendors start > supporting more processors. Given the current inflationary trends > in core counts from multiple processor manufacturers this may occur. > > There are minor regressions for hackbench. The kernel data size > for 512 cpus is smaller with offstack than with onstack. > > Benchmark results using hackbench average over 10 runs of > > hackbench -s 512 -l 2000 -g 15 -f 25 -P > > on Altra 80 Core > > Support for 256 CPUs on stack. Baseline > > 7.8564 sec > > Support for 512 CUs on stack. > > 7.8713 sec + 0.18% > > 512 CPUS offstack > > 7.8916 sec + 0.44% > > Kernel size comparison: > > text data filename Difference to onstack256 baseline > 25755648 9589248 vmlinuz-6.8.0-rc4-onstack256 > 25755648 9607680 vmlinuz-6.8.0-rc4-onstack512 +0.19% > 25755648 9603584 vmlinuz-6.8.0-rc4-offstack512 +0.14% Thanks for this data; I think that's a strong justification that this isn't likely to cause a big problem for us, and so I thing it makes sense to go with this. I have two minor comments below. > Tested-by: Eric Mackay <eric.mackay@oracle.com> > Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> > Signed-off-by: Christoph Lameter (Ampere) <cl@linux.com> > --- > > > Original post: https://www.spinics.net/lists/linux-mm/msg369701.html > V2: https://lkml.org/lkml/2024/2/7/505 > > > V1->V2 > > - Keep quotation marks > - Remove whiltespace damage > - Add tested by > > V2->V3: > - Add test results > - Rework descriptions > > > arch/arm64/Kconfig | 16 +++++++++++++++- > 1 file changed, 15 insertions(+), 1 deletion(-) > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index aa7c1d435139..4e767dede47d 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -1427,7 +1427,21 @@ config SCHED_SMT > config NR_CPUS > int "Maximum number of CPUs (2-4096)" > range 2 4096 > - default "256" > + default "512" > + > +# > +# Determines the placement of cpumasks. > +# > +# With CPUMASK_OFFSTACK the cpumasks are dynamically allocated. > +# Useful for machines with lots of core because it avoids increasing > +# the size of many of the data structures in the kernel. > +# > +# If this is off then the cpumasks have a static sizes and are > +# embedded within data structures. > +# > + config CPUMASK_OFFSTACK > + def_bool y > + depends on NR_CPUS > 256 As before, can we please delete the comment? That's the general semantic of CPUMASK_OFFSTACK, not why we're selecting it. That aside, this config option is defined in lib/Kconfig, so we should select it rather than redefining it. i.e. this should be: select CPUMASK_OFFSTACK if NR_CPUS > 256 Sorry for not spotting that before. With those changes: Acked-by: Mark Rutland <mark.rutland@arm.com> Catalin, are you happy to fix that up when applying? Mark. > > config HOTPLUG_CPU > bool "Support for hot-pluggable CPUs" > -- > 2.39.2 >
On Wed, 06 Mar 2024 17:45:04 -0800, Christoph Lameter (Ampere) wrote: > Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere > Computing) are planning to ship systems with 512 CPUs. So that all CPUs on > these systems can be used with defconfig, we'd like to bump NR_CPUS to 512. > Therefore this patch increases the default NR_CPUS from 256 to 512. > > As increasing NR_CPUS will increase the size of cpumasks, there's a fear that > this might have a significant impact on stack usage due to code which places > cpumasks on the stack. To mitigate that concern, we can select > CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with > NR_CPUS=256, we only select this when NR_CPUS > 256. > > [...] Applied to arm64 (for-next/misc), thanks! I dropped the config entry and comment, replaced it with a select as per Mark's suggestion. [1/1] ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512 https://git.kernel.org/arm64/c/0499a78369ad
Dear All, On 07.03.2024 02:45, Christoph Lameter (Ampere) wrote: > Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere > Computing) are planning to ship systems with 512 CPUs. So that all > CPUs on > these systems can be used with defconfig, we'd like to bump NR_CPUS to > 512. > Therefore this patch increases the default NR_CPUS from 256 to 512. > > As increasing NR_CPUS will increase the size of cpumasks, there's a > fear that > this might have a significant impact on stack usage due to code which > places > cpumasks on the stack. To mitigate that concern, we can select > CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with > NR_CPUS=256, we only select this when NR_CPUS > 256. > > CPUMASK_OFFSTACK configures the cpumasks in the kernel to be > dynamically allocated. This was used in the X86 architecture in the > past to enable support for larger CPU configurations up to 8k cpus. > > With that is becomes possible to dynamically size the allocation of > the cpu bitmaps depending on the quantity of processors detected on > bootup. Memory used for cpumasks will increase if the kernel is > run on a machine with more cores. > > Further increases may be needed if ARM processor vendors start > supporting more processors. Given the current inflationary trends > in core counts from multiple processor manufacturers this may occur. > > There are minor regressions for hackbench. The kernel data size > for 512 cpus is smaller with offstack than with onstack. > > Benchmark results using hackbench average over 10 runs of > > hackbench -s 512 -l 2000 -g 15 -f 25 -P > > on Altra 80 Core > > Support for 256 CPUs on stack. Baseline > > 7.8564 sec > > Support for 512 CUs on stack. > > 7.8713 sec + 0.18% > > 512 CPUS offstack > > 7.8916 sec + 0.44% > > Kernel size comparison: > > text data filename Difference to > onstack256 baseline > 25755648 9589248 vmlinuz-6.8.0-rc4-onstack256 > 25755648 9607680 vmlinuz-6.8.0-rc4-onstack512 +0.19% > 25755648 9603584 vmlinuz-6.8.0-rc4-offstack512 +0.14% > > Tested-by: Eric Mackay <eric.mackay@oracle.com> > Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> > Signed-off-by: Christoph Lameter (Ampere) <cl@linux.com> This patch landed in today's linux-next as commit 0499a78369ad ("ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512"). Unfortunately it triggers the following warning during boot on most of my ARM64-based test boards. Here is an example from Odroid-N2 board: ------------[ cut here ]------------ WARNING: CPU: 4 PID: 63 at drivers/opp/core.c:2554 dev_pm_opp_set_config+0x390/0x710 Modules linked in: dw_hdmi_i2s_audio meson_gxl smsc onboard_usb_hub(+) rtc_pcf8563 panfrost snd_soc_meson_axg_sound_card drm_shmem_helper crct10dif_ce dwmac_generic snd_soc_meson_card_utils gpu_sched snd_soc_meson_g12a_toacodec snd_soc_meson_g12a_tohdmitx rc_odroid snd_soc_meson_codec_glue pwm_meson ao_cec_g12a meson_gxbb_wdt meson_ir snd_soc_meson_axg_frddr snd_soc_meson_axg_toddr snd_soc_meson_axg_tdmin meson_vdec(C) v4l2_mem2mem videobuf2_dma_contig snd_soc_meson_axg_tdmout videobuf2_memops axg_audio videobuf2_v4l2 sclk_div videodev reset_meson_audio_arb snd_soc_meson_axg_fifo clk_phase dwmac_meson8b stmmac_platform videobuf2_common mdio_mux_meson_g12a meson_drm meson_dw_hdmi rtc_meson_vrtc stmmac meson_ddr_pmu_g12 mc dw_hdmi drm_display_helper pcs_xpcs snd_soc_meson_t9015 meson_canvas gpio_fan display_connector snd_soc_meson_axg_tdm_interface snd_soc_simple_amplifier snd_soc_meson_axg_tdm_formatter nvmem_meson_efuse hub 1-1:1.0: USB hub found CPU: 4 PID: 63 Comm: kworker/u12:5 Tainted: G C 6.8.0-rc3+ #14677 Hardware name: Hardkernel ODROID-N2 (DT) Workqueue: events_unbound deferred_probe_work_func pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : dev_pm_opp_set_config+0x390/0x710 lr : dev_pm_opp_set_config+0x5c/0x710 ... Call trace: dev_pm_opp_set_config+0x390/0x710 dt_cpufreq_probe+0x268/0x480 platform_probe+0x68/0xd8 really_probe+0x148/0x2b4 __driver_probe_device+0x78/0x12c driver_probe_device+0xdc/0x164 __device_attach_driver+0xb8/0x138 bus_for_each_drv+0x84/0xe0 __device_attach+0xa8/0x1b0 device_initial_probe+0x14/0x20 bus_probe_device+0xb0/0xb4 deferred_probe_work_func+0x8c/0xc8 process_one_work+0x1ec/0x53c worker_thread+0x298/0x408 kthread+0x124/0x128 ret_from_fork+0x10/0x20 irq event stamp: 317178 hardirqs last enabled at (317177): [<ffff8000801788d4>] ktime_get_coarse_real_ts64+0x10c/0x110 hardirqs last disabled at (317178): [<ffff800081222030>] el1_dbg+0x24/0x8c softirqs last enabled at (315802): [<ffff800080010a60>] __do_softirq+0x4a0/0x4e8 softirqs last disabled at (315793): [<ffff8000800169b0>] ____do_softirq+0x10/0x1c ---[ end trace 0000000000000000 ]--- cpu cpu2: error -EBUSY: failed to set regulators cpufreq-dt: probe of cpufreq-dt failed with error -16 It looks that cpufreq-dt and/or opp drivers needs some adjustments related with this change. > --- > > > Original post: https://www.spinics.net/lists/linux-mm/msg369701.html > V2: https://lkml.org/lkml/2024/2/7/505 > > > V1->V2 > > - Keep quotation marks > - Remove whiltespace damage > - Add tested by > > V2->V3: > - Add test results > - Rework descriptions > > > arch/arm64/Kconfig | 16 +++++++++++++++- > 1 file changed, 15 insertions(+), 1 deletion(-) > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index aa7c1d435139..4e767dede47d 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -1427,7 +1427,21 @@ config SCHED_SMT > config NR_CPUS > int "Maximum number of CPUs (2-4096)" > range 2 4096 > - default "256" > + default "512" > + > +# > +# Determines the placement of cpumasks. > +# > +# With CPUMASK_OFFSTACK the cpumasks are dynamically allocated. > +# Useful for machines with lots of core because it avoids increasing > +# the size of many of the data structures in the kernel. > +# > +# If this is off then the cpumasks have a static sizes and are > +# embedded within data structures. > +# > + config CPUMASK_OFFSTACK > + def_bool y > + depends on NR_CPUS > 256 > > config HOTPLUG_CPU > bool "Support for hot-pluggable CPUs" Best regards
On Fri, Mar 08, 2024 at 03:01:28PM +0100, Marek Szyprowski wrote: > This patch landed in today's linux-next as commit 0499a78369ad ("ARM64: > Dynamically allocate cpumasks and increase supported CPUs to 512"). > Unfortunately it triggers the following warning during boot on most of > my ARM64-based test boards. Here is an example from Odroid-N2 board: > > ------------[ cut here ]------------ > WARNING: CPU: 4 PID: 63 at drivers/opp/core.c:2554 > dev_pm_opp_set_config+0x390/0x710 > Modules linked in: dw_hdmi_i2s_audio meson_gxl smsc onboard_usb_hub(+) > rtc_pcf8563 panfrost snd_soc_meson_axg_sound_card drm_shmem_helper > crct10dif_ce dwmac_generic snd_soc_meson_card_utils gpu_sched > snd_soc_meson_g12a_toacodec snd_soc_meson_g12a_tohdmitx rc_odroid > snd_soc_meson_codec_glue pwm_meson ao_cec_g12a meson_gxbb_wdt meson_ir > snd_soc_meson_axg_frddr snd_soc_meson_axg_toddr snd_soc_meson_axg_tdmin > meson_vdec(C) v4l2_mem2mem videobuf2_dma_contig snd_soc_meson_axg_tdmout > videobuf2_memops axg_audio videobuf2_v4l2 sclk_div videodev > reset_meson_audio_arb snd_soc_meson_axg_fifo clk_phase dwmac_meson8b > stmmac_platform videobuf2_common mdio_mux_meson_g12a meson_drm > meson_dw_hdmi rtc_meson_vrtc stmmac meson_ddr_pmu_g12 mc dw_hdmi > drm_display_helper pcs_xpcs snd_soc_meson_t9015 meson_canvas gpio_fan > display_connector snd_soc_meson_axg_tdm_interface > snd_soc_simple_amplifier snd_soc_meson_axg_tdm_formatter nvmem_meson_efuse > hub 1-1:1.0: USB hub found > CPU: 4 PID: 63 Comm: kworker/u12:5 Tainted: G C 6.8.0-rc3+ #14677 > Hardware name: Hardkernel ODROID-N2 (DT) > Workqueue: events_unbound deferred_probe_work_func > pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) > pc : dev_pm_opp_set_config+0x390/0x710 > lr : dev_pm_opp_set_config+0x5c/0x710 > ... > Call trace: > dev_pm_opp_set_config+0x390/0x710 > dt_cpufreq_probe+0x268/0x480 > platform_probe+0x68/0xd8 > really_probe+0x148/0x2b4 > __driver_probe_device+0x78/0x12c > driver_probe_device+0xdc/0x164 > __device_attach_driver+0xb8/0x138 > bus_for_each_drv+0x84/0xe0 > __device_attach+0xa8/0x1b0 > device_initial_probe+0x14/0x20 > bus_probe_device+0xb0/0xb4 > deferred_probe_work_func+0x8c/0xc8 > process_one_work+0x1ec/0x53c > worker_thread+0x298/0x408 > kthread+0x124/0x128 > ret_from_fork+0x10/0x20 > irq event stamp: 317178 > hardirqs last enabled at (317177): [<ffff8000801788d4>] > ktime_get_coarse_real_ts64+0x10c/0x110 > hardirqs last disabled at (317178): [<ffff800081222030>] el1_dbg+0x24/0x8c > softirqs last enabled at (315802): [<ffff800080010a60>] > __do_softirq+0x4a0/0x4e8 > softirqs last disabled at (315793): [<ffff8000800169b0>] > ____do_softirq+0x10/0x1c > ---[ end trace 0000000000000000 ]--- > cpu cpu2: error -EBUSY: failed to set regulators > cpufreq-dt: probe of cpufreq-dt failed with error -16 > > It looks that cpufreq-dt and/or opp drivers needs some adjustments > related with this change. That's strange. Is this with defconfig? I wonder whether NR_CPUS being larger caused the issue with this specific code. Otherwise CPUMASK_OFFSTACK may not work that well on arm64. > > > > --- > > > > > > Original post: https://www.spinics.net/lists/linux-mm/msg369701.html > > V2: https://lkml.org/lkml/2024/2/7/505 > > > > > > V1->V2 > > > > - Keep quotation marks > > - Remove whiltespace damage > > - Add tested by > > > > V2->V3: > > - Add test results > > - Rework descriptions > > > > > > arch/arm64/Kconfig | 16 +++++++++++++++- > > 1 file changed, 15 insertions(+), 1 deletion(-) > > > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > > index aa7c1d435139..4e767dede47d 100644 > > --- a/arch/arm64/Kconfig > > +++ b/arch/arm64/Kconfig > > @@ -1427,7 +1427,21 @@ config SCHED_SMT > > config NR_CPUS > > int "Maximum number of CPUs (2-4096)" > > range 2 4096 > > - default "256" > > + default "512" > > + > > +# > > +# Determines the placement of cpumasks. > > +# > > +# With CPUMASK_OFFSTACK the cpumasks are dynamically allocated. > > +# Useful for machines with lots of core because it avoids increasing > > +# the size of many of the data structures in the kernel. > > +# > > +# If this is off then the cpumasks have a static sizes and are > > +# embedded within data structures. > > +# > > + config CPUMASK_OFFSTACK > > + def_bool y > > + depends on NR_CPUS > 256 > > > > config HOTPLUG_CPU > > bool "Support for hot-pluggable CPUs" > > Best regards > -- > Marek Szyprowski, PhD > Samsung R&D Institute Poland >
On 08.03.2024 15:51, Catalin Marinas wrote: > On Fri, Mar 08, 2024 at 03:01:28PM +0100, Marek Szyprowski wrote: >> This patch landed in today's linux-next as commit 0499a78369ad ("ARM64: >> Dynamically allocate cpumasks and increase supported CPUs to 512"). >> Unfortunately it triggers the following warning during boot on most of >> my ARM64-based test boards. Here is an example from Odroid-N2 board: >> >> ------------[ cut here ]------------ >> WARNING: CPU: 4 PID: 63 at drivers/opp/core.c:2554 >> dev_pm_opp_set_config+0x390/0x710 >> Modules linked in: dw_hdmi_i2s_audio meson_gxl smsc onboard_usb_hub(+) >> rtc_pcf8563 panfrost snd_soc_meson_axg_sound_card drm_shmem_helper >> crct10dif_ce dwmac_generic snd_soc_meson_card_utils gpu_sched >> snd_soc_meson_g12a_toacodec snd_soc_meson_g12a_tohdmitx rc_odroid >> snd_soc_meson_codec_glue pwm_meson ao_cec_g12a meson_gxbb_wdt meson_ir >> snd_soc_meson_axg_frddr snd_soc_meson_axg_toddr snd_soc_meson_axg_tdmin >> meson_vdec(C) v4l2_mem2mem videobuf2_dma_contig snd_soc_meson_axg_tdmout >> videobuf2_memops axg_audio videobuf2_v4l2 sclk_div videodev >> reset_meson_audio_arb snd_soc_meson_axg_fifo clk_phase dwmac_meson8b >> stmmac_platform videobuf2_common mdio_mux_meson_g12a meson_drm >> meson_dw_hdmi rtc_meson_vrtc stmmac meson_ddr_pmu_g12 mc dw_hdmi >> drm_display_helper pcs_xpcs snd_soc_meson_t9015 meson_canvas gpio_fan >> display_connector snd_soc_meson_axg_tdm_interface >> snd_soc_simple_amplifier snd_soc_meson_axg_tdm_formatter nvmem_meson_efuse >> hub 1-1:1.0: USB hub found >> CPU: 4 PID: 63 Comm: kworker/u12:5 Tainted: G C 6.8.0-rc3+ #14677 >> Hardware name: Hardkernel ODROID-N2 (DT) >> Workqueue: events_unbound deferred_probe_work_func >> pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) >> pc : dev_pm_opp_set_config+0x390/0x710 >> lr : dev_pm_opp_set_config+0x5c/0x710 >> ... >> Call trace: >> dev_pm_opp_set_config+0x390/0x710 >> dt_cpufreq_probe+0x268/0x480 >> platform_probe+0x68/0xd8 >> really_probe+0x148/0x2b4 >> __driver_probe_device+0x78/0x12c >> driver_probe_device+0xdc/0x164 >> __device_attach_driver+0xb8/0x138 >> bus_for_each_drv+0x84/0xe0 >> __device_attach+0xa8/0x1b0 >> device_initial_probe+0x14/0x20 >> bus_probe_device+0xb0/0xb4 >> deferred_probe_work_func+0x8c/0xc8 >> process_one_work+0x1ec/0x53c >> worker_thread+0x298/0x408 >> kthread+0x124/0x128 >> ret_from_fork+0x10/0x20 >> irq event stamp: 317178 >> hardirqs last enabled at (317177): [<ffff8000801788d4>] >> ktime_get_coarse_real_ts64+0x10c/0x110 >> hardirqs last disabled at (317178): [<ffff800081222030>] el1_dbg+0x24/0x8c >> softirqs last enabled at (315802): [<ffff800080010a60>] >> __do_softirq+0x4a0/0x4e8 >> softirqs last disabled at (315793): [<ffff8000800169b0>] >> ____do_softirq+0x10/0x1c >> ---[ end trace 0000000000000000 ]--- >> cpu cpu2: error -EBUSY: failed to set regulators >> cpufreq-dt: probe of cpufreq-dt failed with error -16 >> >> It looks that cpufreq-dt and/or opp drivers needs some adjustments >> related with this change. > That's strange. Is this with defconfig? I wonder whether NR_CPUS being > larger caused the issue with this specific code. Otherwise > CPUMASK_OFFSTACK may not work that well on arm64. I've used defconfig with some debug options enabled and some drivers compiled-in: make ARCH=arm64 defconfig ./scripts/config -e BLK_DEV_RAM --set-val BLK_DEV_RAM_COUNT 4 --set-val BLK_DEV_RAM_SIZE 81920 --set-val CMA_SIZE_MBYTES 96 -e PROVE_LOCKING -e DEBUG_ATOMIC_SLEEP -e STAGING -e I2C_GPIO -e PM_DEBUG -e PM_ADVANCED_DEBUG -e USB_GADGET -e USB_ETH -e CONFIG_DEVFREQ_THERMAL -e CONFIG_BRCMFMAC_PCIE -e CONFIG_NFC -d ARCH_SUNXI -d ARCH_ALPINE -d DRM_NOUVEAU -d ARCH_BCM_IPROC -d ARCH_BERLIN -d ARCH_BRCMSTB -d ARCH_LAYERSCAPE -d ARCH_LG1K -d ARCH_HISI -d ARCH_MEDIATEK -d ARCH_MVEBU -d ARCH_SEATTLE -d ARCH_SYNQUACER -d ARCH_RENESAS -d ARCH_STRATIX10 -d ARCH_TEGRA -d ARCH_SPRD -d ARCH_THUNDER -d ARCH_THUNDER2 -d ARCH_UNIPHIER -d ARCH_XGENE -d ARCH_ZX -d ARCH_ZYNQMP -d HIBERNATION -d CLK_SUNXI -d CONFIG_EFI -d CONFIG_TEE -e FW_CFG_SYSFS Best regards
On Fri, 8 Mar 2024, Marek Szyprowski wrote: >>> >>> It looks that cpufreq-dt and/or opp drivers needs some adjustments >>> related with this change. >> That's strange. Is this with defconfig? I wonder whether NR_CPUS being >> larger caused the issue with this specific code. Otherwise >> CPUMASK_OFFSTACK may not work that well on arm64. cpumask handling must use the accessor functions provided in include/linux/cpumask.h for declaring and accessing cpumasks. It is likely related to the driver opencoding one of the accessors. I.e. you must use alloc_cpumask_var() and not allocate yourself on the stack.
On Fri, Mar 08, 2024 at 09:08:59AM -0800, Christoph Lameter (Ampere) wrote: > On Fri, 8 Mar 2024, Marek Szyprowski wrote: > > > > > > > > > It looks that cpufreq-dt and/or opp drivers needs some adjustments > > > > related with this change. > > > That's strange. Is this with defconfig? I wonder whether NR_CPUS being > > > larger caused the issue with this specific code. Otherwise > > > CPUMASK_OFFSTACK may not work that well on arm64. > > cpumask handling must use the accessor functions provided in > include/linux/cpumask.h for declaring and accessing cpumasks. It is likely > related to the driver opencoding one of the accessors. I took a look at both the OPP code and the cpufreq-dt code and it looks like those are doign the right thing w.r.t. cpumask manipulation (i.e. they only use the cpumask accessors, and use the cpumask_var_*() functions to dynamically allocate/free cpumasks). Maybe I've missed something, but superficially those look right. Marek, can you try reverting this commit and trying defconfig + NR_CPUS=512? That'll have CPUMASK_OFFSTACK=n, and: * If that blows up, we know the problem is independent of CPUMASK_OFFSTACK, and has something to do with large cpumasks (either a driver bug, or elsewhere). * If that doesn't blow up, it suggests the problem is related to CPUMASK_OFFSTACK rather than with large cpumasks specifically. Either way, we probably need to revert this patch for now, as this won't have enough time to soak in linux-next in time for v6.9. Mark.
On 11.03.2024 13:12, Mark Rutland wrote: > On Fri, Mar 08, 2024 at 09:08:59AM -0800, Christoph Lameter (Ampere) wrote: >> On Fri, 8 Mar 2024, Marek Szyprowski wrote: >>>>> It looks that cpufreq-dt and/or opp drivers needs some adjustments >>>>> related with this change. >>>> That's strange. Is this with defconfig? I wonder whether NR_CPUS being >>>> larger caused the issue with this specific code. Otherwise >>>> CPUMASK_OFFSTACK may not work that well on arm64. >> cpumask handling must use the accessor functions provided in >> include/linux/cpumask.h for declaring and accessing cpumasks. It is likely >> related to the driver opencoding one of the accessors. > I took a look at both the OPP code and the cpufreq-dt code and it looks like > those are doign the right thing w.r.t. cpumask manipulation (i.e. they only use > the cpumask accessors, and use the cpumask_var_*() functions to dynamically > allocate/free cpumasks). Maybe I've missed something, but superficially those > look right. > > Marek, can you try reverting this commit and trying defconfig + NR_CPUS=512? Yes, with $subject reverted and CONFIG_NR_CPUS=512 everything works fine, so it must be something else broken. > That'll have CPUMASK_OFFSTACK=n, and: > > * If that blows up, we know the problem is independent of CPUMASK_OFFSTACK, and > has something to do with large cpumasks (either a driver bug, or elsewhere). > > * If that doesn't blow up, it suggests the problem is related to > CPUMASK_OFFSTACK rather than with large cpumasks specifically. > > Either way, we probably need to revert this patch for now, as this won't have > enough time to soak in linux-next in time for v6.9. Best regards
On Mon, Mar 11, 2024 at 03:56:37PM +0100, Marek Szyprowski wrote: > On 11.03.2024 13:12, Mark Rutland wrote: > > On Fri, Mar 08, 2024 at 09:08:59AM -0800, Christoph Lameter (Ampere) wrote: > >> On Fri, 8 Mar 2024, Marek Szyprowski wrote: > >>>>> It looks that cpufreq-dt and/or opp drivers needs some adjustments > >>>>> related with this change. > >>>> That's strange. Is this with defconfig? I wonder whether NR_CPUS being > >>>> larger caused the issue with this specific code. Otherwise > >>>> CPUMASK_OFFSTACK may not work that well on arm64. > >> cpumask handling must use the accessor functions provided in > >> include/linux/cpumask.h for declaring and accessing cpumasks. It is likely > >> related to the driver opencoding one of the accessors. > > I took a look at both the OPP code and the cpufreq-dt code and it looks like > > those are doign the right thing w.r.t. cpumask manipulation (i.e. they only use > > the cpumask accessors, and use the cpumask_var_*() functions to dynamically > > allocate/free cpumasks). Maybe I've missed something, but superficially those > > look right. > > > > Marek, can you try reverting this commit and trying defconfig + NR_CPUS=512? > > Yes, with $subject reverted and CONFIG_NR_CPUS=512 everything works > fine, so it must be something else broken. Thanks for confirming. Would you mind testing the problematic commit with CONFIG_DEBUG_PER_CPU_MAPS enabled? If it doesn't show anything obvious that can be fixed quickly, I'll revert the commit and queue it again after -rc1 for 6.10 (I haven't sent 6.9 the pull request yet).
Hi Catalin, On 11.03.2024 16:22, Catalin Marinas wrote: > On Mon, Mar 11, 2024 at 03:56:37PM +0100, Marek Szyprowski wrote: >> On 11.03.2024 13:12, Mark Rutland wrote: >>> On Fri, Mar 08, 2024 at 09:08:59AM -0800, Christoph Lameter (Ampere) wrote: >>>> On Fri, 8 Mar 2024, Marek Szyprowski wrote: >>>>>>> It looks that cpufreq-dt and/or opp drivers needs some adjustments >>>>>>> related with this change. >>>>>> That's strange. Is this with defconfig? I wonder whether NR_CPUS being >>>>>> larger caused the issue with this specific code. Otherwise >>>>>> CPUMASK_OFFSTACK may not work that well on arm64. >>>> cpumask handling must use the accessor functions provided in >>>> include/linux/cpumask.h for declaring and accessing cpumasks. It is likely >>>> related to the driver opencoding one of the accessors. >>> I took a look at both the OPP code and the cpufreq-dt code and it looks like >>> those are doign the right thing w.r.t. cpumask manipulation (i.e. they only use >>> the cpumask accessors, and use the cpumask_var_*() functions to dynamically >>> allocate/free cpumasks). Maybe I've missed something, but superficially those >>> look right. >>> >>> Marek, can you try reverting this commit and trying defconfig + NR_CPUS=512? >> Yes, with $subject reverted and CONFIG_NR_CPUS=512 everything works >> fine, so it must be something else broken. > Thanks for confirming. Would you mind testing the problematic commit > with CONFIG_DEBUG_PER_CPU_MAPS enabled? If it doesn't show anything > obvious that can be fixed quickly, I'll revert the commit and queue it > again after -rc1 for 6.10 (I haven't sent 6.9 the pull request yet). I've enabled this option, but unfortunately it didn't reveal anything more besides the warning and error I've posted in my initial report. I will try to analyze this issue further, but I won't manage to do this today. Best regards
On Mon, Mar 11, 2024 at 05:51:04PM +0100, Marek Szyprowski wrote: > On 11.03.2024 16:22, Catalin Marinas wrote: > > On Mon, Mar 11, 2024 at 03:56:37PM +0100, Marek Szyprowski wrote: > >> On 11.03.2024 13:12, Mark Rutland wrote: > >>> On Fri, Mar 08, 2024 at 09:08:59AM -0800, Christoph Lameter (Ampere) wrote: > >>>> On Fri, 8 Mar 2024, Marek Szyprowski wrote: > >>>>>>> It looks that cpufreq-dt and/or opp drivers needs some adjustments > >>>>>>> related with this change. > >>>>>> That's strange. Is this with defconfig? I wonder whether NR_CPUS being > >>>>>> larger caused the issue with this specific code. Otherwise > >>>>>> CPUMASK_OFFSTACK may not work that well on arm64. > >>>> cpumask handling must use the accessor functions provided in > >>>> include/linux/cpumask.h for declaring and accessing cpumasks. It is likely > >>>> related to the driver opencoding one of the accessors. > >>> I took a look at both the OPP code and the cpufreq-dt code and it looks like > >>> those are doign the right thing w.r.t. cpumask manipulation (i.e. they only use > >>> the cpumask accessors, and use the cpumask_var_*() functions to dynamically > >>> allocate/free cpumasks). Maybe I've missed something, but superficially those > >>> look right. > >>> > >>> Marek, can you try reverting this commit and trying defconfig + NR_CPUS=512? > >> Yes, with $subject reverted and CONFIG_NR_CPUS=512 everything works > >> fine, so it must be something else broken. > > Thanks for confirming. Would you mind testing the problematic commit > > with CONFIG_DEBUG_PER_CPU_MAPS enabled? If it doesn't show anything > > obvious that can be fixed quickly, I'll revert the commit and queue it > > again after -rc1 for 6.10 (I haven't sent 6.9 the pull request yet). > > I've enabled this option, but unfortunately it didn't reveal anything > more besides the warning and error I've posted in my initial report. I > will try to analyze this issue further, but I won't manage to do this today. No worries, thanks for giving this a try.
On Fri, Mar 08, 2024 at 03:01:28PM +0100, Marek Szyprowski wrote: > On 07.03.2024 02:45, Christoph Lameter (Ampere) wrote: > > Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere > > Computing) are planning to ship systems with 512 CPUs. So that all CPUs on > > these systems can be used with defconfig, we'd like to bump NR_CPUS to 512. > > Therefore this patch increases the default NR_CPUS from 256 to 512. > > > > As increasing NR_CPUS will increase the size of cpumasks, there's a fear that > > this might have a significant impact on stack usage due to code which places > > cpumasks on the stack. To mitigate that concern, we can select > > CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with > > NR_CPUS=256, we only select this when NR_CPUS > 256. > > > > CPUMASK_OFFSTACK configures the cpumasks in the kernel to be > > dynamically allocated. This was used in the X86 architecture in the > > past to enable support for larger CPU configurations up to 8k cpus. [...] > This patch landed in today's linux-next as commit 0499a78369ad ("ARM64: > Dynamically allocate cpumasks and increase supported CPUs to 512"). > Unfortunately it triggers the following warning during boot on most of > my ARM64-based test boards. Here is an example from Odroid-N2 board: I spent a big part of this afternoon going through the code paths but there's nothing obvious that triggered this problem. My suspicion is some memory corruption, algorithmically I can't see anything that could go wrong with CPUMASK_OFFSTACK. Unfortunately I could not reproduce it yet to be able to add some debug info. So I decided to revert this patch. If we get to the bottom of it during the merging window, I can still revive it. Otherwise we'll add it to linux-next post -rc1. Thanks for reporting it and subsequent debugging.
On Mon, 11 Mar 2024, Catalin Marinas wrote: >> This patch landed in today's linux-next as commit 0499a78369ad ("ARM64: >> Dynamically allocate cpumasks and increase supported CPUs to 512"). >> Unfortunately it triggers the following warning during boot on most of >> my ARM64-based test boards. Here is an example from Odroid-N2 board: > > I spent a big part of this afternoon going through the code paths but > there's nothing obvious that triggered this problem. My suspicion is > some memory corruption, algorithmically I can't see anything that could > go wrong with CPUMASK_OFFSTACK. Unfortunately I could not reproduce it > yet to be able to add some debug info. > > So I decided to revert this patch. If we get to the bottom of it during > the merging window, I can still revive it. Otherwise we'll add it to > linux-next post -rc1. I also looked through the opp source and I cannot find even anything that even uses the functionality changed by the OFFSTACK option. This could be an issue in the ARM64 arch code itself where there maybe an assumption elsewhere that a cpumask can always store up to NR_CPU cpus and not only nr_cpu_ids as OFFSTACK does. How can I exercise the opp driver in order to recreate the problem? I assume the opp driver is ARM specific? x86 defaults to OFFSTACK so if there is an issue with OFFSTACK in opp then it should fail with kernel default configuration on that platform.
On Mon, 11 Mar 2024, Christoph Lameter (Ampere) wrote: > This could be an issue in the ARM64 arch code itself where there maybe an > assumption elsewhere that a cpumask can always store up to NR_CPU cpus and > not only nr_cpu_ids as OFFSTACK does. > > How can I exercise the opp driver in order to recreate the problem? > > I assume the opp driver is ARM specific? x86 defaults to OFFSTACK so if there > is an issue with OFFSTACK in opp then it should fail with kernel default > configuration on that platform. I checked the ARM64 arch sources use of NR_CPUS and its all fine. Also verified in my testing logs that CONFIG_PM_OPP was set in all tests. No warnings in the kernel log during those tests. How to reproduce this?
On Tue, Mar 12, 2024 at 10:06:06AM -0700, Christoph Lameter (Ampere) wrote: > On Mon, 11 Mar 2024, Christoph Lameter (Ampere) wrote: > > > This could be an issue in the ARM64 arch code itself where there maybe > > an assumption elsewhere that a cpumask can always store up to NR_CPU > > cpus and not only nr_cpu_ids as OFFSTACK does. > > > > How can I exercise the opp driver in order to recreate the problem? > > > > I assume the opp driver is ARM specific? x86 defaults to OFFSTACK so if > > there is an issue with OFFSTACK in opp then it should fail with kernel > > default configuration on that platform. > > I checked the ARM64 arch sources use of NR_CPUS and its all fine. > > Also verified in my testing logs that CONFIG_PM_OPP was set in all tests. > > No warnings in the kernel log during those tests. > > How to reproduce this? I guess you need a platform with a dts that has an "operating-points-v2" property. I don't have any around. Sudeep was trying to trigger this code path earlier, not sure where he got to.
On Tue, Mar 12, 2024 at 05:55:49PM +0000, Catalin Marinas wrote: > On Tue, Mar 12, 2024 at 10:06:06AM -0700, Christoph Lameter (Ampere) wrote: > > On Mon, 11 Mar 2024, Christoph Lameter (Ampere) wrote: > > > > > This could be an issue in the ARM64 arch code itself where there maybe > > > an assumption elsewhere that a cpumask can always store up to NR_CPU > > > cpus and not only nr_cpu_ids as OFFSTACK does. > > > > > > How can I exercise the opp driver in order to recreate the problem? > > > > > > I assume the opp driver is ARM specific? x86 defaults to OFFSTACK so if > > > there is an issue with OFFSTACK in opp then it should fail with kernel > > > default configuration on that platform. > > > > I checked the ARM64 arch sources use of NR_CPUS and its all fine. > > > > Also verified in my testing logs that CONFIG_PM_OPP was set in all tests. > > > > No warnings in the kernel log during those tests. > > > > How to reproduce this? > > I guess you need a platform with a dts that has an "operating-points-v2" > property. I don't have any around. > > Sudeep was trying to trigger this code path earlier, not sure where he > got to. I did try to trigger this on FVP by adding OPPs + some hacks to add dummy clock provider to successfully probe this driver. I couldn't hit the issue reported
On 13.03.2024 15:35, Sudeep Holla wrote: > On Tue, Mar 12, 2024 at 05:55:49PM +0000, Catalin Marinas wrote: >> On Tue, Mar 12, 2024 at 10:06:06AM -0700, Christoph Lameter (Ampere) wrote: >>> On Mon, 11 Mar 2024, Christoph Lameter (Ampere) wrote: >>> >>>> This could be an issue in the ARM64 arch code itself where there maybe >>>> an assumption elsewhere that a cpumask can always store up to NR_CPU >>>> cpus and not only nr_cpu_ids as OFFSTACK does. >>>> >>>> How can I exercise the opp driver in order to recreate the problem? >>>> >>>> I assume the opp driver is ARM specific? x86 defaults to OFFSTACK so if >>>> there is an issue with OFFSTACK in opp then it should fail with kernel >>>> default configuration on that platform. >>> I checked the ARM64 arch sources use of NR_CPUS and its all fine. >>> >>> Also verified in my testing logs that CONFIG_PM_OPP was set in all tests. >>> >>> No warnings in the kernel log during those tests. >>> >>> How to reproduce this? >> I guess you need a platform with a dts that has an "operating-points-v2" >> property. I don't have any around. >> >> Sudeep was trying to trigger this code path earlier, not sure where he >> got to. > I did try to trigger this on FVP by adding OPPs + some hacks to add dummy > clock provider to successfully probe this driver. I couldn't hit the issue > reported
On Wed, 13 Mar 2024, Marek Szyprowski wrote: >> I did try to trigger this on FVP by adding OPPs + some hacks to add dummy >> clock provider to successfully probe this driver. I couldn't hit the issue >> reported
On Wed, Mar 13, 2024 at 05:22:33PM +0100, Marek Szyprowski wrote: > On 13.03.2024 15:35, Sudeep Holla wrote: > > On Tue, Mar 12, 2024 at 05:55:49PM +0000, Catalin Marinas wrote: > >> On Tue, Mar 12, 2024 at 10:06:06AM -0700, Christoph Lameter (Ampere) wrote: > >>> On Mon, 11 Mar 2024, Christoph Lameter (Ampere) wrote: > >>> > >>>> This could be an issue in the ARM64 arch code itself where there maybe > >>>> an assumption elsewhere that a cpumask can always store up to NR_CPU > >>>> cpus and not only nr_cpu_ids as OFFSTACK does. > >>>> > >>>> How can I exercise the opp driver in order to recreate the problem? > >>>> > >>>> I assume the opp driver is ARM specific? x86 defaults to OFFSTACK so if > >>>> there is an issue with OFFSTACK in opp then it should fail with kernel > >>>> default configuration on that platform. > >>> I checked the ARM64 arch sources use of NR_CPUS and its all fine. > >>> > >>> Also verified in my testing logs that CONFIG_PM_OPP was set in all tests. > >>> > >>> No warnings in the kernel log during those tests. > >>> > >>> How to reproduce this? > >> I guess you need a platform with a dts that has an "operating-points-v2" > >> property. I don't have any around. > >> > >> Sudeep was trying to trigger this code path earlier, not sure where he > >> got to. > > I did try to trigger this on FVP by adding OPPs + some hacks to add dummy > > clock provider to successfully probe this driver. I couldn't hit the issue > > reported
On 13.03.2024 17:39, Christoph Lameter (Ampere) wrote: > On Wed, 13 Mar 2024, Marek Szyprowski wrote: > >>> I did try to trigger this on FVP by adding OPPs + some hacks to add >>> dummy >>> clock provider to successfully probe this driver. I couldn't hit the >>> issue >>> reported
On Wed, Mar 13, 2024 at 05:13:33PM +0000, Russell King wrote: > So, I wonder whether what you're seeing is a latent bug which is > being tickled by the presence of the CPU masks being off-stack > changing the kernel timing. > > I would suggest the printk debug approach may help here to see when > the OPPs are begun to be parsed, when they're created etc and their > timing relationship to being used. Given the suspicion, it's possible > that the mere addition of printk() may "fix" the problem, which again > would be another semi-useful data point. It might be an init order problem. Passing "initcall_debug" on the cmdline might help a bit. It would also be useful in dev_pm_opp_set_config(), in the WARN_ON block, to print opp_table->opp_list.next to get an idea whether it looks like a valid pointer or memory corruption.
Dear All, On 14.03.2024 09:39, Catalin Marinas wrote: > On Wed, Mar 13, 2024 at 05:13:33PM +0000, Russell King wrote: >> So, I wonder whether what you're seeing is a latent bug which is >> being tickled by the presence of the CPU masks being off-stack >> changing the kernel timing. >> >> I would suggest the printk debug approach may help here to see when >> the OPPs are begun to be parsed, when they're created etc and their >> timing relationship to being used. Given the suspicion, it's possible >> that the mere addition of printk() may "fix" the problem, which again >> would be another semi-useful data point. > It might be an init order problem. Passing "initcall_debug" on the > cmdline might help a bit. > > It would also be useful in dev_pm_opp_set_config(), in the WARN_ON > block, to print opp_table->opp_list.next to get an idea whether it looks > like a valid pointer or memory corruption. I've finally found some time to do the step-by-step printk-based debugging of this issue and finally found what's broken! Here is the fix: diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c index 8bd6e5e8f121..2d83bbc65dd0 100644 --- a/drivers/cpufreq/cpufreq-dt.c +++ b/drivers/cpufreq/cpufreq-dt.c @@ -208,7 +208,7 @@ static int dt_cpufreq_early_init(struct device *dev, int cpu) if (!priv) return -ENOMEM; - if (!alloc_cpumask_var(&priv->cpus, GFP_KERNEL)) + if (!zalloc_cpumask_var(&priv->cpus, GFP_KERNEL)) return -ENOMEM; cpumask_set_cpu(cpu, priv->cpus); It is really surprising that this didn't blow up for anyone else so far... This means that the $subject patch is fine. I will send a proper patch fixing this issue in a few minutes. Best regards
On Thu, Mar 14, 2024 at 01:28:40PM +0100, Marek Szyprowski wrote: > Dear All, > > On 14.03.2024 09:39, Catalin Marinas wrote: > > On Wed, Mar 13, 2024 at 05:13:33PM +0000, Russell King wrote: > >> So, I wonder whether what you're seeing is a latent bug which is > >> being tickled by the presence of the CPU masks being off-stack > >> changing the kernel timing. > >> > >> I would suggest the printk debug approach may help here to see when > >> the OPPs are begun to be parsed, when they're created etc and their > >> timing relationship to being used. Given the suspicion, it's possible > >> that the mere addition of printk() may "fix" the problem, which again > >> would be another semi-useful data point. > > It might be an init order problem. Passing "initcall_debug" on the > > cmdline might help a bit. > > > > It would also be useful in dev_pm_opp_set_config(), in the WARN_ON > > block, to print opp_table->opp_list.next to get an idea whether it looks > > like a valid pointer or memory corruption. > > I've finally found some time to do the step-by-step printk-based > debugging of this issue and finally found what's broken! > > Here is the fix: > > diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c > index 8bd6e5e8f121..2d83bbc65dd0 100644 > --- a/drivers/cpufreq/cpufreq-dt.c > +++ b/drivers/cpufreq/cpufreq-dt.c > @@ -208,7 +208,7 @@ static int dt_cpufreq_early_init(struct device *dev, > int cpu) > if (!priv) > return -ENOMEM; > > - if (!alloc_cpumask_var(&priv->cpus, GFP_KERNEL)) > + if (!zalloc_cpumask_var(&priv->cpus, GFP_KERNEL)) > return -ENOMEM; > > cpumask_set_cpu(cpu, priv->cpus); > > > It is really surprising that this didn't blow up for anyone else so > far... This means that the $subject patch is fine. Wow. I guess we've been lucky with that allocation hitting memory containing zeros. Well done at tracking it down!
On Thu, Mar 14, 2024 at 01:28:40PM +0100, Marek Szyprowski wrote: > On 14.03.2024 09:39, Catalin Marinas wrote: > > On Wed, Mar 13, 2024 at 05:13:33PM +0000, Russell King wrote: > >> So, I wonder whether what you're seeing is a latent bug which is > >> being tickled by the presence of the CPU masks being off-stack > >> changing the kernel timing. > >> > >> I would suggest the printk debug approach may help here to see when > >> the OPPs are begun to be parsed, when they're created etc and their > >> timing relationship to being used. Given the suspicion, it's possible > >> that the mere addition of printk() may "fix" the problem, which again > >> would be another semi-useful data point. > > It might be an init order problem. Passing "initcall_debug" on the > > cmdline might help a bit. > > > > It would also be useful in dev_pm_opp_set_config(), in the WARN_ON > > block, to print opp_table->opp_list.next to get an idea whether it looks > > like a valid pointer or memory corruption. > > I've finally found some time to do the step-by-step printk-based > debugging of this issue and finally found what's broken! > > Here is the fix: > > diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c > index 8bd6e5e8f121..2d83bbc65dd0 100644 > --- a/drivers/cpufreq/cpufreq-dt.c > +++ b/drivers/cpufreq/cpufreq-dt.c > @@ -208,7 +208,7 @@ static int dt_cpufreq_early_init(struct device *dev, > int cpu) > if (!priv) > return -ENOMEM; > > - if (!alloc_cpumask_var(&priv->cpus, GFP_KERNEL)) > + if (!zalloc_cpumask_var(&priv->cpus, GFP_KERNEL)) > return -ENOMEM; > > cpumask_set_cpu(cpu, priv->cpus); > > > It is really surprising that this didn't blow up for anyone else so > far... This means that the $subject patch is fine. > > I will send a proper patch fixing this issue in a few minutes. Nice. Many thanks for tracking this down. I'll revert the revert of the CPUMASK_OFFSTACK in the second part of the merging window (I already sent the pull request).
On Thu, 14 Mar 2024, Russell King (Oracle) wrote: >> It is really surprising that this didn't blow up for anyone else so >> far... This means that the $subject patch is fine. > > Wow. I guess we've been lucky with that allocation hitting memory > containing zeros. Well done at tracking it down! It would have blown up with slub_debug because that includes poisoning the contents of all allocations via the slab allocator. Why did that not occur? We should have seen a backtrace with data in registers etc showing poisoning values for an unitialized object. Note that this was indeed triggered by OFFSTACK because (z)alloc_cpumask_var() only generates a kmalloc allocation if that option is set. The config option was never set before my patch was applied on ARM64.
On Thu, Mar 07, 2024 at 07:07:07PM +0000, Catalin Marinas wrote: > On Wed, 06 Mar 2024 17:45:04 -0800, Christoph Lameter (Ampere) wrote: > > Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere > > Computing) are planning to ship systems with 512 CPUs. So that all CPUs on > > these systems can be used with defconfig, we'd like to bump NR_CPUS to 512. > > Therefore this patch increases the default NR_CPUS from 256 to 512. > > > > As increasing NR_CPUS will increase the size of cpumasks, there's a fear that > > this might have a significant impact on stack usage due to code which places > > cpumasks on the stack. To mitigate that concern, we can select > > CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with > > NR_CPUS=256, we only select this when NR_CPUS > 256. > > > > [...] > > Applied to arm64 (for-next/misc), thanks! > > I dropped the config entry and comment, replaced it with a select as per > Mark's suggestion. > > [1/1] ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512 > https://git.kernel.org/arm64/c/0499a78369ad I re-instated this patch in arm64 for-next/core as: https://git.kernel.org/arm64/c/3fbd56f0e7c1
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index aa7c1d435139..4e767dede47d 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -1427,7 +1427,21 @@ config SCHED_SMT config NR_CPUS int "Maximum number of CPUs (2-4096)" range 2 4096 - default "256" + default "512" + +# +# Determines the placement of cpumasks. +# +# With CPUMASK_OFFSTACK the cpumasks are dynamically allocated. +# Useful for machines with lots of core because it avoids increasing +# the size of many of the data structures in the kernel. +# +# If this is off then the cpumasks have a static sizes and are +# embedded within data structures. +# + config CPUMASK_OFFSTACK + def_bool y + depends on NR_CPUS > 256 config HOTPLUG_CPU bool "Support for hot-pluggable CPUs"