Message ID | 155552633539.2015392.2477781120122237934.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive) |
---|---|
Series | mm: Sub-section memory hotplug support |
On Wed, 17 Apr 2019 11:38:55 -0700 Dan Williams <dan.j.williams@intel.com> wrote:

> The memory hotplug section is an arbitrary / convenient unit for memory
> hotplug. 'Section-size' units have bled into the user interface
> ('memblock' sysfs) and can not be changed without breaking existing
> userspace. The section-size constraint, while mostly benign for typical
> memory hotplug, has and continues to wreak havoc with 'device-memory'
> use cases, persistent memory (pmem) in particular. Recall that pmem uses
> devm_memremap_pages(), and subsequently arch_add_memory(), to allocate a
> 'struct page' memmap for pmem. However, it does not use the 'bottom
> half' of memory hotplug, i.e. never marks pmem pages online and never
> exposes the userspace memblock interface for pmem. This leaves an
> opening to redress the section-size constraint.

v6 and we're not showing any review activity. Who would be suitable
people to help out here?
On Wed, Apr 17, 2019 at 3:04 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Wed, 17 Apr 2019 11:38:55 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
>
> > The memory hotplug section is an arbitrary / convenient unit for memory
> > hotplug. 'Section-size' units have bled into the user interface
> > ('memblock' sysfs) and can not be changed without breaking existing
> > userspace. The section-size constraint, while mostly benign for typical
> > memory hotplug, has and continues to wreak havoc with 'device-memory'
> > use cases, persistent memory (pmem) in particular. Recall that pmem uses
> > devm_memremap_pages(), and subsequently arch_add_memory(), to allocate a
> > 'struct page' memmap for pmem. However, it does not use the 'bottom
> > half' of memory hotplug, i.e. never marks pmem pages online and never
> > exposes the userspace memblock interface for pmem. This leaves an
> > opening to redress the section-size constraint.
>
> v6 and we're not showing any review activity. Who would be suitable
> people to help out here?

There was quite a bit of review of the cover letter from Michal and
David, but you're right the details not so much as of yet. I'd like to
call out other people where I can reciprocate with some review of my
own. Oscar's altmap work looks like a good candidate for that.
On Wed, Apr 17, 2019 at 3:59 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Wed, Apr 17, 2019 at 3:04 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Wed, 17 Apr 2019 11:38:55 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > > The memory hotplug section is an arbitrary / convenient unit for memory
> > > hotplug. 'Section-size' units have bled into the user interface
> > > ('memblock' sysfs) and can not be changed without breaking existing
> > > userspace. The section-size constraint, while mostly benign for typical
> > > memory hotplug, has and continues to wreak havoc with 'device-memory'
> > > use cases, persistent memory (pmem) in particular. Recall that pmem uses
> > > devm_memremap_pages(), and subsequently arch_add_memory(), to allocate a
> > > 'struct page' memmap for pmem. However, it does not use the 'bottom
> > > half' of memory hotplug, i.e. never marks pmem pages online and never
> > > exposes the userspace memblock interface for pmem. This leaves an
> > > opening to redress the section-size constraint.
> >
> > v6 and we're not showing any review activity. Who would be suitable
> > people to help out here?
>
> There was quite a bit of review of the cover letter from Michal and
> David, but you're right the details not so much as of yet. I'd like to
> call out other people where I can reciprocate with some review of my
> own. Oscar's altmap work looks like a good candidate for that.

I'm also hoping Jeff can give a tested-by for the customer scenarios
that fall over with the current implementation.
Dan Williams <dan.j.williams@intel.com> writes:

>> On Wed, Apr 17, 2019 at 3:59 PM Dan Williams <dan.j.williams@intel.com> wrote:
>>
>> On Wed, Apr 17, 2019 at 3:04 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>> >
>> > On Wed, 17 Apr 2019 11:38:55 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
>> >
>> > > The memory hotplug section is an arbitrary / convenient unit for memory
>> > > hotplug. 'Section-size' units have bled into the user interface
>> > > ('memblock' sysfs) and can not be changed without breaking existing
>> > > userspace. The section-size constraint, while mostly benign for typical
>> > > memory hotplug, has and continues to wreak havoc with 'device-memory'
>> > > use cases, persistent memory (pmem) in particular. Recall that pmem uses
>> > > devm_memremap_pages(), and subsequently arch_add_memory(), to allocate a
>> > > 'struct page' memmap for pmem. However, it does not use the 'bottom
>> > > half' of memory hotplug, i.e. never marks pmem pages online and never
>> > > exposes the userspace memblock interface for pmem. This leaves an
>> > > opening to redress the section-size constraint.
>> >
>> > v6 and we're not showing any review activity. Who would be suitable
>> > people to help out here?
>>
>> There was quite a bit of review of the cover letter from Michal and
>> David, but you're right the details not so much as of yet. I'd like to
>> call out other people where I can reciprocate with some review of my
>> own. Oscar's altmap work looks like a good candidate for that.
>
> I'm also hoping Jeff can give a tested-by for the customer scenarios
> that fall over with the current implementation.

Sure. I'll also have a look over the patches.

-Jeff
On Thu, Apr 18, 2019 at 5:45 AM Jeff Moyer <jmoyer@redhat.com> wrote:
[..]
> >> > v6 and we're not showing any review activity. Who would be suitable
> >> > people to help out here?
> >>
> >> There was quite a bit of review of the cover letter from Michal and
> >> David, but you're right the details not so much as of yet. I'd like to
> >> call out other people where I can reciprocate with some review of my
> >> own. Oscar's altmap work looks like a good candidate for that.
> >
> > I'm also hoping Jeff can give a tested-by for the customer scenarios
> > that fall over with the current implementation.
>
> Sure. I'll also have a look over the patches.

Andrew, heads up, it looks like there is a memory corruption bug in
these patches, as I've gotten a few reports of "bad page state" at
boot. Please drop until I can track down the failure.
On Wed, 2019-04-17 at 15:59 -0700, Dan Williams wrote:
> On Wed, Apr 17, 2019 at 3:04 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Wed, 17 Apr 2019 11:38:55 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > > The memory hotplug section is an arbitrary / convenient unit for memory
> > > hotplug. 'Section-size' units have bled into the user interface
> > > ('memblock' sysfs) and can not be changed without breaking existing
> > > userspace. The section-size constraint, while mostly benign for typical
> > > memory hotplug, has and continues to wreak havoc with 'device-memory'
> > > use cases, persistent memory (pmem) in particular. Recall that pmem uses
> > > devm_memremap_pages(), and subsequently arch_add_memory(), to allocate a
> > > 'struct page' memmap for pmem. However, it does not use the 'bottom
> > > half' of memory hotplug, i.e. never marks pmem pages online and never
> > > exposes the userspace memblock interface for pmem. This leaves an
> > > opening to redress the section-size constraint.
> >
> > v6 and we're not showing any review activity. Who would be suitable
> > people to help out here?
>
> There was quite a bit of review of the cover letter from Michal and
> David, but you're right the details not so much as of yet. I'd like to
> call out other people where I can reciprocate with some review of my
> own. Oscar's altmap work looks like a good candidate for that.

Thanks Dan for ccing me.
I will take a look at the patches soon.
I am also taking a look at this work now. I will review and test it in
the next couple of days.

Pasha

On Tue, Apr 23, 2019 at 9:17 AM Oscar Salvador <osalvador@suse.de> wrote:
>
> On Wed, 2019-04-17 at 15:59 -0700, Dan Williams wrote:
> > On Wed, Apr 17, 2019 at 3:04 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > On Wed, 17 Apr 2019 11:38:55 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > > The memory hotplug section is an arbitrary / convenient unit for memory
> > > > hotplug. 'Section-size' units have bled into the user interface
> > > > ('memblock' sysfs) and can not be changed without breaking existing
> > > > userspace. The section-size constraint, while mostly benign for typical
> > > > memory hotplug, has and continues to wreak havoc with 'device-memory'
> > > > use cases, persistent memory (pmem) in particular. Recall that pmem uses
> > > > devm_memremap_pages(), and subsequently arch_add_memory(), to allocate a
> > > > 'struct page' memmap for pmem. However, it does not use the 'bottom
> > > > half' of memory hotplug, i.e. never marks pmem pages online and never
> > > > exposes the userspace memblock interface for pmem. This leaves an
> > > > opening to redress the section-size constraint.
> > >
> > > v6 and we're not showing any review activity. Who would be suitable
> > > people to help out here?
> >
> > There was quite a bit of review of the cover letter from Michal and
> > David, but you're right the details not so much as of yet. I'd like to
> > call out other people where I can reciprocate with some review of my
> > own. Oscar's altmap work looks like a good candidate for that.
>
> Thanks Dan for ccing me.
> I will take a look at the patches soon.
>
> --
> Oscar Salvador
> SUSE L3
Hi Dan,

How do you test these patches? Do you have any instructions?

I see for example that check_hotplug_memory_range() still enforces
memory_block_size_bytes() alignment.

Also, after removing check_hotplug_memory_range(), I tried to online
16M aligned DAX memory, and got the following panic:

# echo online > /sys/devices/system/memory/memory7/state
[ 202.193132] WARNING: CPU: 2 PID: 351 at drivers/base/memory.c:207 memory_block_action+0x110/0x178
[ 202.193391] Modules linked in:
[ 202.193698] CPU: 2 PID: 351 Comm: sh Not tainted 5.1.0-rc7_pt_devdax-00038-g865af4385544-dirty #9
[ 202.193909] Hardware name: linux,dummy-virt (DT)
[ 202.194122] pstate: 60000005 (nZCv daif -PAN -UAO)
[ 202.194243] pc : memory_block_action+0x110/0x178
[ 202.194404] lr : memory_block_action+0x90/0x178
[ 202.194506] sp : ffff000016763ca0
[ 202.194592] x29: ffff000016763ca0 x28: ffff80016fd29b80
[ 202.194724] x27: 0000000000000000 x26: 0000000000000000
[ 202.194838] x25: ffff000015546000 x24: 00000000001c0000
[ 202.194949] x23: 0000000000000000 x22: 0000000000040000
[ 202.195058] x21: 00000000001c0000 x20: 0000000000000008
[ 202.195168] x19: 0000000000000007 x18: 0000000000000000
[ 202.195281] x17: 0000000000000000 x16: 0000000000000000
[ 202.195393] x15: 0000000000000000 x14: 0000000000000000
[ 202.195505] x13: 0000000000000000 x12: 0000000000000000
[ 202.195614] x11: 0000000000000000 x10: 0000000000000000
[ 202.195744] x9 : 0000000000000000 x8 : 0000000180000000
[ 202.195858] x7 : 0000000000000018 x6 : ffff000015541930
[ 202.195966] x5 : ffff000015541930 x4 : 0000000000000001
[ 202.196074] x3 : 0000000000000001 x2 : 0000000000000000
[ 202.196185] x1 : 0000000000000070 x0 : 0000000000000000
[ 202.196366] Call trace:
[ 202.196455] memory_block_action+0x110/0x178
[ 202.196589] memory_subsys_online+0x3c/0x80
[ 202.196681] device_online+0x6c/0x90
[ 202.196761] state_store+0x84/0x100
[ 202.196841] dev_attr_store+0x18/0x28
[ 202.196927] sysfs_kf_write+0x40/0x58
[ 202.197010] kernfs_fop_write+0xcc/0x1d8
[ 202.197099] __vfs_write+0x18/0x40
[ 202.197187] vfs_write+0xa4/0x1b0
[ 202.197295] ksys_write+0x64/0xd8
[ 202.197430] __arm64_sys_write+0x18/0x20
[ 202.197521] el0_svc_common.constprop.0+0x7c/0xe8
[ 202.197621] el0_svc_handler+0x28/0x78
[ 202.197706] el0_svc+0x8/0xc
[ 202.197828] ---[ end trace 57719823dda6d21e ]---

Thank you,
Pasha
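For readers following along, the block-size alignment rule that Pasha attributes to check_hotplug_memory_range() can be modelled in user space roughly as below. This is only a sketch, not the kernel's code: the `model_` names, the 4 KiB page size, and passing the block size as a parameter (standing in for memory_block_size_bytes()) are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative user-space model; not the kernel implementation. */
#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/*
 * Returns true when [start, start + size) starts on a memory-block
 * boundary and covers a whole number of memory blocks.
 * block_size is a hypothetical stand-in for memory_block_size_bytes().
 */
static bool model_range_is_block_aligned(uint64_t start, uint64_t size,
                                         uint64_t block_size)
{
    uint64_t block_nr_pages = block_size >> MODEL_PAGE_SHIFT;
    uint64_t nr_pages = size >> MODEL_PAGE_SHIFT;
    uint64_t start_pfn = start >> MODEL_PAGE_SHIFT;

    /* Empty ranges and degenerate block sizes are rejected outright. */
    if (nr_pages == 0 || block_nr_pages == 0)
        return false;

    return (start_pfn % block_nr_pages) == 0 &&
           (nr_pages % block_nr_pages) == 0;
}
```

With a hypothetical 1 GiB block size, a 16 MiB range fails this check whatever its start address, which is consistent with the check having to be removed before such a range could be added at all.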
On Thu, May 2, 2019 at 3:46 PM Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
>
> Hi Dan,
>
> How do you test these patches? Do you have any instructions?

Yes, I briefly mentioned this in the cover letter, but here is the
test I am using:

>
> I see for example that check_hotplug_memory_range() still enforces
> memory_block_size_bytes() alignment.
>
> Also, after removing check_hotplug_memory_range(), I tried to online
> 16M aligned DAX memory, and got the following panic:

Right, this functionality is currently strictly limited to the
devm_memremap_pages() case where there are guarantees that the memory
will never be onlined. This is due to the fact that the section size
is entangled with the memblock api. That said I would have expected
you to trigger the warning in subsection_check() before getting this
far into the hotplug process.

>
> # echo online > /sys/devices/system/memory/memory7/state
> [ 202.193132] WARNING: CPU: 2 PID: 351 at drivers/base/memory.c:207
> memory_block_action+0x110/0x178
> [ 202.193391] Modules linked in:
> [ 202.193698] CPU: 2 PID: 351 Comm: sh Not tainted
> 5.1.0-rc7_pt_devdax-00038-g865af4385544-dirty #9
> [ 202.193909] Hardware name: linux,dummy-virt (DT)
> [ 202.194122] pstate: 60000005 (nZCv daif -PAN -UAO)
> [ 202.194243] pc : memory_block_action+0x110/0x178
> [ 202.194404] lr : memory_block_action+0x90/0x178
> [ 202.194506] sp : ffff000016763ca0
> [ 202.194592] x29: ffff000016763ca0 x28: ffff80016fd29b80
> [ 202.194724] x27: 0000000000000000 x26: 0000000000000000
> [ 202.194838] x25: ffff000015546000 x24: 00000000001c0000
> [ 202.194949] x23: 0000000000000000 x22: 0000000000040000
> [ 202.195058] x21: 00000000001c0000 x20: 0000000000000008
> [ 202.195168] x19: 0000000000000007 x18: 0000000000000000
> [ 202.195281] x17: 0000000000000000 x16: 0000000000000000
> [ 202.195393] x15: 0000000000000000 x14: 0000000000000000
> [ 202.195505] x13: 0000000000000000 x12: 0000000000000000
> [ 202.195614] x11: 0000000000000000 x10: 0000000000000000
> [ 202.195744] x9 : 0000000000000000 x8 : 0000000180000000
> [ 202.195858] x7 : 0000000000000018 x6 : ffff000015541930
> [ 202.195966] x5 : ffff000015541930 x4 : 0000000000000001
> [ 202.196074] x3 : 0000000000000001 x2 : 0000000000000000
> [ 202.196185] x1 : 0000000000000070 x0 : 0000000000000000
> [ 202.196366] Call trace:
> [ 202.196455] memory_block_action+0x110/0x178
> [ 202.196589] memory_subsys_online+0x3c/0x80
> [ 202.196681] device_online+0x6c/0x90
> [ 202.196761] state_store+0x84/0x100
> [ 202.196841] dev_attr_store+0x18/0x28
> [ 202.196927] sysfs_kf_write+0x40/0x58
> [ 202.197010] kernfs_fop_write+0xcc/0x1d8
> [ 202.197099] __vfs_write+0x18/0x40
> [ 202.197187] vfs_write+0xa4/0x1b0
> [ 202.197295] ksys_write+0x64/0xd8
> [ 202.197430] __arm64_sys_write+0x18/0x20
> [ 202.197521] el0_svc_common.constprop.0+0x7c/0xe8
> [ 202.197621] el0_svc_handler+0x28/0x78
> [ 202.197706] el0_svc+0x8/0xc
> [ 202.197828] ---[ end trace 57719823dda6d21e ]---
>
> Thank you,
> Pasha
On Thu, May 2, 2019 at 4:20 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Thu, May 2, 2019 at 3:46 PM Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
> >
> > Hi Dan,
> >
> > How do you test these patches? Do you have any instructions?
>
> Yes, I briefly mentioned this in the cover letter, but here is the
> test I am using:

Sorry, fumble fingered the 'send' button, here is that link:

https://github.com/pmem/ndctl/blob/subsection-pending/test/sub-section.sh
On Thu, May 02, 2019 at 04:20:03PM -0700, Dan Williams wrote:
> On Thu, May 2, 2019 at 3:46 PM Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
> >
> > Hi Dan,
> >
> > How do you test these patches? Do you have any instructions?
>
> Yes, I briefly mentioned this in the cover letter, but here is the
> test I am using:
>
> >
> > I see for example that check_hotplug_memory_range() still enforces
> > memory_block_size_bytes() alignment.
> >
> > Also, after removing check_hotplug_memory_range(), I tried to online
> > 16M aligned DAX memory, and got the following panic:
>
> Right, this functionality is currently strictly limited to the
> devm_memremap_pages() case where there are guarantees that the memory
> will never be onlined. This is due to the fact that the section size
> is entangled with the memblock api. That said I would have expected
> you to trigger the warning in subsection_check() before getting this
> far into the hotplug process.
> >
> > # echo online > /sys/devices/system/memory/memory7/state
> > [ 202.193132] WARNING: CPU: 2 PID: 351 at drivers/base/memory.c:207
> > memory_block_action+0x110/0x178
> > [ 202.193391] Modules linked in:
> > [ 202.193698] CPU: 2 PID: 351 Comm: sh Not tainted
> > 5.1.0-rc7_pt_devdax-00038-g865af4385544-dirty #9
> > [ 202.193909] Hardware name: linux,dummy-virt (DT)
> > [ 202.194122] pstate: 60000005 (nZCv daif -PAN -UAO)
> > [ 202.194243] pc : memory_block_action+0x110/0x178
> > [ 202.194404] lr : memory_block_action+0x90/0x178
> > [ 202.194506] sp : ffff000016763ca0
> > [ 202.194592] x29: ffff000016763ca0 x28: ffff80016fd29b80
> > [ 202.194724] x27: 0000000000000000 x26: 0000000000000000
> > [ 202.194838] x25: ffff000015546000 x24: 00000000001c0000
> > [ 202.194949] x23: 0000000000000000 x22: 0000000000040000
> > [ 202.195058] x21: 00000000001c0000 x20: 0000000000000008
> > [ 202.195168] x19: 0000000000000007 x18: 0000000000000000
> > [ 202.195281] x17: 0000000000000000 x16: 0000000000000000
> > [ 202.195393] x15: 0000000000000000 x14: 0000000000000000
> > [ 202.195505] x13: 0000000000000000 x12: 0000000000000000
> > [ 202.195614] x11: 0000000000000000 x10: 0000000000000000
> > [ 202.195744] x9 : 0000000000000000 x8 : 0000000180000000
> > [ 202.195858] x7 : 0000000000000018 x6 : ffff000015541930
> > [ 202.195966] x5 : ffff000015541930 x4 : 0000000000000001
> > [ 202.196074] x3 : 0000000000000001 x2 : 0000000000000000
> > [ 202.196185] x1 : 0000000000000070 x0 : 0000000000000000
> > [ 202.196366] Call trace:
> > [ 202.196455] memory_block_action+0x110/0x178
> > [ 202.196589] memory_subsys_online+0x3c/0x80
> > [ 202.196681] device_online+0x6c/0x90
> > [ 202.196761] state_store+0x84/0x100
> > [ 202.196841] dev_attr_store+0x18/0x28
> > [ 202.196927] sysfs_kf_write+0x40/0x58
> > [ 202.197010] kernfs_fop_write+0xcc/0x1d8
> > [ 202.197099] __vfs_write+0x18/0x40
> > [ 202.197187] vfs_write+0xa4/0x1b0
> > [ 202.197295] ksys_write+0x64/0xd8
> > [ 202.197430] __arm64_sys_write+0x18/0x20
> > [ 202.197521] el0_svc_common.constprop.0+0x7c/0xe8
> > [ 202.197621] el0_svc_handler+0x28/0x78
> > [ 202.197706] el0_svc+0x8/0xc
> > [ 202.197828] ---[ end trace 57719823dda6d21e ]---

This warning relates to:

    for (; section_nr < section_nr_end; section_nr++) {
            if (WARN_ON_ONCE(!pfn_valid(pfn)))
                    return false;

from pages_correctly_probed().
AFAICS, this is orthogonal to subsection_check().
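To make the loop Oscar quotes easier to reason about, here is a small user-space model of it. It is a sketch under stated assumptions rather than the kernel source: `struct model_memmap`, `model_pfn_valid()` and the 1 GiB section size are hypothetical, and only the pfn_valid() test from the quoted excerpt is modelled. Advancing `pfn` by one section per iteration is likewise an assumption about the part of the loop the excerpt does not show.

```c
#include <stdbool.h>

/* Assumption: 1 GiB sections with 4 KiB pages (0x40000 pages each). */
#define MODEL_PAGES_PER_SECTION 0x40000UL

/* Hypothetical stand-in for the memmap: pfns below the limit are valid. */
struct model_memmap {
    unsigned long valid_pfn_limit;
};

static bool model_pfn_valid(const struct model_memmap *m, unsigned long pfn)
{
    return pfn < m->valid_pfn_limit;
}

/*
 * Models the quoted loop: walk every section of a memory block and
 * check the first pfn of each one. The kernel wraps the failing test
 * in WARN_ON_ONCE(); this model only reports the result.
 */
static bool model_pages_correctly_probed(const struct model_memmap *m,
                                         unsigned long start_pfn,
                                         unsigned long sections_per_block)
{
    unsigned long section_nr = start_pfn / MODEL_PAGES_PER_SECTION;
    unsigned long section_nr_end = section_nr + sections_per_block;
    unsigned long pfn = start_pfn;

    for (; section_nr < section_nr_end; section_nr++) {
        if (!model_pfn_valid(m, pfn))
            return false;
        pfn += MODEL_PAGES_PER_SECTION;
    }
    return true;
}
```

In this model the check looks only at section-granular pfns, so it fails as soon as the first pfn of any section in the block has no valid memmap, independently of any sub-section validation performed earlier when the range was added.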