Message ID: 20210525080119.5455-1-mgorman@techsingularity.net (mailing list archive)
Series: Calculate pcp->high based on zone sizes and active CPUs
Hi Mel,

Feng Tang tossed these on a "Cascade Lake" system with 96 threads and ~512G of persistent memory and 128G of DRAM. The PMEM is in "volatile use" mode and being managed via the buddy just like the normal RAM.

The PMEM zones are big ones:

        present 65011712 = 248 G
        high      134595 = 525 M

The PMEM nodes, of course, don't have any CPUs in them.

With your series, the pcp->high value per-cpu is 69584 pages or about 270MB per CPU. Scaled up by the 96 CPU threads, that's ~26GB of worst-case memory in the pcps per zone, or roughly 10% of the size of the zone.

I did see quite a few pcp->counts above 60,000, so it's definitely possible in practice to see the pcps filled up. This was not observed to cause any actual problems in practice. But, it's still a bit worrisome.

Maybe instead of pretending that cpu-less nodes have one cpu:

        nr_local_cpus = max(1U, cpumask_weight(cpumask_of_node(zone_to_nid(zone))));

we should just consider them to have *all* the CPUs in the system. Perhaps:

        nr_local_cpus = cpumask_weight(cpumask_of_node(zone_to_nid(zone)));
        if (!nr_local_cpus)
                nr_local_cpus = num_online_cpus();

Even if a system has a silly number of CPUs, the 'high' sizes will still hit a floor at 4*batch:

        high = max(high, batch << 2);

Which doesn't seem too bad, especially considering that CPU-less nodes naturally have less contention.
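[Editor's note: the worst-case arithmetic in the report above can be checked with a quick back-of-the-envelope model. This is only a sketch assuming 4 KiB pages; the function name is illustrative, not kernel code, and the input numbers are those quoted by Dave.]

```python
# Sketch of the worst-case pcp memory arithmetic from the report above.
# Assumes 4 KiB pages; 69584 pages/CPU and 96 CPUs are the reported figures.

PAGE_KIB = 4

def pcp_worst_case_mib(high_pages: int, nr_cpus: int) -> float:
    """Worst-case MiB held on the per-cpu lists of one zone."""
    return high_pages * PAGE_KIB * nr_cpus / 1024

per_cpu_mib = 69584 * PAGE_KIB / 1024               # ~272 MiB per CPU
total_gib = pcp_worst_case_mib(69584, 96) / 1024    # ~25.5 GiB across 96 CPUs
print(f"{per_cpu_mib:.0f} MiB per CPU, {total_gib:.1f} GiB worst case")
```

This reproduces the "about 270MB per CPU" and "~26GB" figures quoted in the report.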
On Thu, May 27, 2021 at 12:36:21PM -0700, Dave Hansen wrote:
> Hi Mel,
>
> Feng Tang tossed these on a "Cascade Lake" system with 96 threads and ~512G of persistent memory and 128G of DRAM. The PMEM is in "volatile use" mode and being managed via the buddy just like the normal RAM.
>
> The PMEM zones are big ones:
>
>         present 65011712 = 248 G
>         high      134595 = 525 M
>
> The PMEM nodes, of course, don't have any CPUs in them.
>
> With your series, the pcp->high value per-cpu is 69584 pages or about 270MB per CPU. Scaled up by the 96 CPU threads, that's ~26GB of worst-case memory in the pcps per zone, or roughly 10% of the size of the zone.
>
> I did see quite a few pcp->counts above 60,000, so it's definitely possible in practice to see the pcps filled up. This was not observed to cause any actual problems in practice. But, it's still a bit worrisome.
>

Ok, it does have the potential to trigger early reclaim as pages are stored on remote PCP lists. The problem would be transient because vmstat would drain those pages over time but still, how about this patch on top of the series?

--8<--
mm/page_alloc: Split pcp->high across all online CPUs for cpuless nodes

Dave Hansen reported the following about Feng Tang's tests on a machine with persistent memory onlined as a DRAM-like device.

	Feng Tang tossed these on a "Cascade Lake" system with 96 threads
	and ~512G of persistent memory and 128G of DRAM. The PMEM is in
	"volatile use" mode and being managed via the buddy just like the
	normal RAM.

	The PMEM zones are big ones:

	        present 65011712 = 248 G
	        high      134595 = 525 M

	The PMEM nodes, of course, don't have any CPUs in them.

	With your series, the pcp->high value per-cpu is 69584 pages or
	about 270MB per CPU. Scaled up by the 96 CPU threads, that's ~26GB
	of worst-case memory in the pcps per zone, or roughly 10% of the
	size of the zone.

This should not cause a problem as such although it could trigger reclaim due to pages being stored on per-cpu lists for CPUs remote to a node. It is not possible to treat cpuless nodes exactly the same as normal nodes but the worst-case scenario can be mitigated by splitting pcp->high across all online CPUs for cpuless memory nodes.

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d708aa14f4ef..af566e97a0f8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6687,7 +6687,7 @@ static int zone_highsize(struct zone *zone, int batch, int cpu_online)
 {
 #ifdef CONFIG_MMU
 	int high;
-	int nr_local_cpus;
+	int nr_split_cpus;
 	unsigned long total_pages;
 
 	if (!percpu_pagelist_high_fraction) {
@@ -6710,10 +6710,14 @@ static int zone_highsize(struct zone *zone, int batch, int cpu_online)
 	 * Split the high value across all online CPUs local to the zone. Note
 	 * that early in boot that CPUs may not be online yet and that during
 	 * CPU hotplug that the cpumask is not yet updated when a CPU is being
-	 * onlined.
-	 */
-	nr_local_cpus = max(1U, cpumask_weight(cpumask_of_node(zone_to_nid(zone)))) + cpu_online;
-	high = total_pages / nr_local_cpus;
+	 * onlined. For memory nodes that have no CPUs, split pcp->high across
+	 * all online CPUs to mitigate the risk that reclaim is triggered
+	 * prematurely due to pages stored on pcp lists.
+	 */
+	nr_split_cpus = cpumask_weight(cpumask_of_node(zone_to_nid(zone))) + cpu_online;
+	if (!nr_split_cpus)
+		nr_split_cpus = num_online_cpus();
+	high = total_pages / nr_split_cpus;
 
 	/*
 	 * Ensure high is at least batch*4. The multiple is based on the
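[Editor's note: the before/after behaviour of the patch can be modelled with a few lines of Python. This is a sketch, not kernel code: `zone_high_old`/`zone_high_new` are made-up names, and it assumes the zone's `total_pages` input works out to the 69584 pages per CPU reported above for the cpuless PMEM node.]

```python
# Toy model (not kernel code) of zone_highsize() before and after the patch.
# Inputs: pages to split, CPUs local to the zone's node, online CPUs, batch.

def zone_high_old(total_pages, nr_node_cpus, batch, cpu_online=0):
    # Pre-patch: a cpu-less node is pretended to have one local CPU.
    nr_local_cpus = max(1, nr_node_cpus) + cpu_online
    return max(total_pages // nr_local_cpus, batch * 4)

def zone_high_new(total_pages, nr_node_cpus, nr_online_cpus, batch, cpu_online=0):
    # Post-patch: a cpu-less node splits pcp->high across all online CPUs.
    nr_split_cpus = nr_node_cpus + cpu_online
    if not nr_split_cpus:
        nr_split_cpus = nr_online_cpus
    return max(total_pages // nr_split_cpus, batch * 4)

# PMEM zone from the report: no local CPUs, 96 online CPUs, batch 63.
print(zone_high_old(69584, 0, 63))        # 69584 pages, the reported ~270MB/CPU
print(zone_high_new(69584, 0, 96, 63))    # 724 pages with 96 online CPUs
```

The `max(high, batch * 4)` floor mirrors the `high = max(high, batch << 2)` guard Dave quotes earlier in the thread, so a huge CPU count cannot drive `high` below 4*batch.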
On 28.05.21 10:55, Mel Gorman wrote:
> On Thu, May 27, 2021 at 12:36:21PM -0700, Dave Hansen wrote:
>> Hi Mel,
>>
>> Feng Tang tossed these on a "Cascade Lake" system with 96 threads and ~512G of persistent memory and 128G of DRAM. The PMEM is in "volatile use" mode and being managed via the buddy just like the normal RAM.
>>
>> The PMEM zones are big ones:
>>
>>         present 65011712 = 248 G
>>         high      134595 = 525 M
>>
>> The PMEM nodes, of course, don't have any CPUs in them.
>>
>> With your series, the pcp->high value per-cpu is 69584 pages or about 270MB per CPU. Scaled up by the 96 CPU threads, that's ~26GB of worst-case memory in the pcps per zone, or roughly 10% of the size of the zone.

When I read about having such big amounts of free memory theoretically stuck in PCP lists, I guess we really want to start draining the PCP in alloc_contig_range(), just as we do with memory hotunplug when offlining.
On 28.05.21 11:03, David Hildenbrand wrote:
> On 28.05.21 10:55, Mel Gorman wrote:
>> On Thu, May 27, 2021 at 12:36:21PM -0700, Dave Hansen wrote:
>>> Hi Mel,
>>>
>>> Feng Tang tossed these on a "Cascade Lake" system with 96 threads and ~512G of persistent memory and 128G of DRAM. [...]
>>>
>>> With your series, the pcp->high value per-cpu is 69584 pages or about 270MB per CPU. Scaled up by the 96 CPU threads, that's ~26GB of worst-case memory in the pcps per zone, or roughly 10% of the size of the zone.
>
> When I read about having such big amounts of free memory theoretically stuck in PCP lists, I guess we really want to start draining the PCP in alloc_contig_range(), just as we do with memory hotunplug when offlining.

Correction: we already drain the pcp, we just don't temporarily disable it, so a race as described in offline_pages() could apply:

	"Disable pcplists so that page isolation cannot race with freeing
	in a way that pages from isolated pageblock are left on pcplists."

Guess we'd then want to move the draining before start_isolate_page_range() in alloc_contig_range().
On Fri, May 28, 2021 at 11:08:01AM +0200, David Hildenbrand wrote:
> On 28.05.21 11:03, David Hildenbrand wrote:
> > On 28.05.21 10:55, Mel Gorman wrote:
> > > On Thu, May 27, 2021 at 12:36:21PM -0700, Dave Hansen wrote:
> > > > Feng Tang tossed these on a "Cascade Lake" system with 96 threads and ~512G of persistent memory and 128G of DRAM. [...]
> > > > With your series, the pcp->high value per-cpu is 69584 pages or about 270MB per CPU. Scaled up by the 96 CPU threads, that's ~26GB of worst-case memory in the pcps per zone, or roughly 10% of the size of the zone.
> >
> > When I read about having such big amounts of free memory theoretically stuck in PCP lists, I guess we really want to start draining the PCP in alloc_contig_range(), just as we do with memory hotunplug when offlining.
>
> Correction: we already drain the pcp, we just don't temporarily disable it, so a race as described in offline_pages() could apply:
>
> "Disable pcplists so that page isolation cannot race with freeing
> in a way that pages from isolated pageblock are left on pcplists."
>
> Guess we'd then want to move the draining before start_isolate_page_range() in alloc_contig_range().

Or instead of draining, validate the PFN range in alloc_contig_range is within the same zone and if so, call zone_pcp_disable() before start_isolate_page_range and enable after __alloc_contig_migrate_range.
On 28.05.21 11:49, Mel Gorman wrote:
> On Fri, May 28, 2021 at 11:08:01AM +0200, David Hildenbrand wrote:
>> On 28.05.21 11:03, David Hildenbrand wrote:
>>> On 28.05.21 10:55, Mel Gorman wrote:
>>>> On Thu, May 27, 2021 at 12:36:21PM -0700, Dave Hansen wrote:
>>>>> Feng Tang tossed these on a "Cascade Lake" system with 96 threads and ~512G of persistent memory and 128G of DRAM. [...]
>>>
>>> When I read about having such big amounts of free memory theoretically stuck in PCP lists, I guess we really want to start draining the PCP in alloc_contig_range(), just as we do with memory hotunplug when offlining.
>>
>> Correction: we already drain the pcp, we just don't temporarily disable it, so a race as described in offline_pages() could apply:
>>
>> "Disable pcplists so that page isolation cannot race with freeing
>> in a way that pages from isolated pageblock are left on pcplists."
>>
>> Guess we'd then want to move the draining before start_isolate_page_range() in alloc_contig_range().
>
> Or instead of draining, validate the PFN range in alloc_contig_range is within the same zone and if so, call zone_pcp_disable() before start_isolate_page_range and enable after __alloc_contig_migrate_range.

We require the caller to only pass a range within a single zone, so that should be fine.

The only ugly thing about zone_pcp_disable() is mutex_lock(&pcp_batch_high_lock), which would serialize all alloc_contig_range() calls, and even serialize them with offline_pages().
On Fri, May 28, 2021 at 11:52:53AM +0200, David Hildenbrand wrote:
> > > "Disable pcplists so that page isolation cannot race with freeing
> > > in a way that pages from isolated pageblock are left on pcplists."
> > >
> > > Guess we'd then want to move the draining before start_isolate_page_range() in alloc_contig_range().
> >
> > Or instead of draining, validate the PFN range in alloc_contig_range is within the same zone and if so, call zone_pcp_disable() before start_isolate_page_range and enable after __alloc_contig_migrate_range.
>
> We require the caller to only pass a range within a single zone, so that should be fine.
>
> The only ugly thing about zone_pcp_disable() is mutex_lock(&pcp_batch_high_lock) which would serialize all alloc_contig_range() and even with offline_pages().

True, so it would have to be assessed whether that is bad or not. If racing against offline_pages, memory is potentially being offlined in the target zone, which may cause allocation failure. If racing with other alloc_contig_range calls, the two callers are potentially racing to isolate and allocate the same range. The argument could be made that alloc_contig_range calls should be serialised within one zone to improve the allocation success rate, at the potential cost of allocation latency.
On 28.05.21 12:09, Mel Gorman wrote:
> On Fri, May 28, 2021 at 11:52:53AM +0200, David Hildenbrand wrote:
>>>> "Disable pcplists so that page isolation cannot race with freeing
>>>> in a way that pages from isolated pageblock are left on pcplists."
>>>>
>>>> Guess we'd then want to move the draining before start_isolate_page_range() in alloc_contig_range().
>>>
>>> Or instead of draining, validate the PFN range in alloc_contig_range is within the same zone and if so, call zone_pcp_disable() before start_isolate_page_range and enable after __alloc_contig_migrate_range.
>>
>> We require the caller to only pass a range within a single zone, so that should be fine.
>>
>> The only ugly thing about zone_pcp_disable() is mutex_lock(&pcp_batch_high_lock) which would serialize all alloc_contig_range() and even with offline_pages().
>
> True, so it would have to be assessed whether that is bad or not. If racing against offline_pages, memory is potentially being offlined in the target zone which may cause allocation failure. If racing with other alloc_contig_range calls, the two callers are potentially racing to isolate and allocate the same range. The argument could be made that alloc_contig_range calls should be serialised within one zone to improve the allocation success rate at the potential cost of allocation latency.

We have 3 main users of alloc_contig_range():

1. CMA

   CMA synchronizes allocation within a CMA area via the allocation bitmap, so parallel CMA is perfectly possible and avoids races by design.

2. alloc_contig_pages() / gigantic pages

   Gigantic page allocation could race with virtio-mem; CMA does not apply. Possible, but unlikely to matter in practice -- virtio-mem will retry later again.

3. virtio-mem

   A virtio-mem device only operates on its assigned memory region, so we cannot have alloc_contig_range() calls from different devices racing, even within a single zone. It could only race with gigantic pages, as CMA does not apply.

So serializing would mostly harm parallel CMA (possible and likely) and parallel virtio-mem operation (e.g., unplugging memory of two virtio-mem devices -- unlikely but possible).

Memory offlining racing with CMA is not an issue (impossible). virtio-mem synchronizes with memory offlining via memory notifiers; there is only a tiny race window that usually doesn't matter, as virtio-mem is expected to usually trigger offlining itself, rather than user space doing it randomly. Memory offlining can race with dynamic gigantic page allocation, which is highly unreliable already.

I wonder if we could optimize locking in zone_pcp_disable() instead.
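[Editor's note: the serialization concern can be illustrated with a toy Python threading model. This is purely illustrative; the zone names and the per-zone lock layout are invented, not kernel structures. A single global mutex, like pcp_batch_high_lock today, blocks concurrent callers even when they operate on unrelated zones, while hypothetical per-zone locks would let them proceed in parallel.]

```python
# Toy illustration (not kernel code) of global vs per-zone serialization.
import threading

pcp_batch_high_lock = threading.Lock()   # single global lock, as today
zone_locks = {"dram": threading.Lock(),  # hypothetical per-zone locks
              "pmem": threading.Lock()}

# Caller A is mid-operation in the "dram" zone and holds both kinds of lock:
pcp_batch_high_lock.acquire()
zone_locks["dram"].acquire()

# Caller B wants to work on the unrelated "pmem" zone:
blocked_globally = not pcp_batch_high_lock.acquire(blocking=False)
can_proceed_per_zone = zone_locks["pmem"].acquire(blocking=False)

print(blocked_globally, can_proceed_per_zone)   # True True
```

With the global lock, caller B is blocked even though the two operations touch disjoint zones; a per-zone scheme would only serialize callers racing on the same zone, which is the trade-off discussed above.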
On 5/28/21 10:55 AM, Mel Gorman wrote:
> On Thu, May 27, 2021 at 12:36:21PM -0700, Dave Hansen wrote:
>> Hi Mel,
>>
>> Feng Tang tossed these on a "Cascade Lake" system with 96 threads and ~512G of persistent memory and 128G of DRAM. [...]
>>
>> With your series, the pcp->high value per-cpu is 69584 pages or about 270MB per CPU. Scaled up by the 96 CPU threads, that's ~26GB of worst-case memory in the pcps per zone, or roughly 10% of the size of the zone.
>>
>> I did see quite a few pcp->counts above 60,000, so it's definitely possible in practice to see the pcps filled up. This was not observed to cause any actual problems in practice. But, it's still a bit worrisome.
>
> Ok, it does have the potential to trigger early reclaim as pages are stored on remote PCP lists. The problem would be transient because vmstat would drain those pages over time but still, how about this patch on top of the series?
>
> --8<--
> mm/page_alloc: Split pcp->high across all online CPUs for cpuless nodes
>
> Dave Hansen reported the following about Feng Tang's tests on a machine with persistent memory onlined as a DRAM-like device.
>
> [...]
>
> This should not cause a problem as such although it could trigger reclaim due to pages being stored on per-cpu lists for CPUs remote to a node. It is not possible to treat cpuless nodes exactly the same as normal nodes but the worst-case scenario can be mitigated by splitting pcp->high across all online CPUs for cpuless memory nodes.
>
> Suggested-by: Dave Hansen <dave.hansen@intel.com>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

Maybe we should even consider distinguishing high limits for local-to-cpu zones vs remote ones: for local-to-cpu zones we would divide by the number of local cpus, while for remote-to-cpu zones we would divide by all cpus. Because we can expect cpus to allocate mostly from local zones, leaving more pages on the percpu lists for those zones can be beneficial.

But as the motivation here was to reduce lock contention on freeing, that's less clear. We probably can't expect a cpu to be freeing mostly local pages (in the case of e.g. a large process exiting), because no mechanism works towards that, or does it? In the case of a cpu freeing to a remote zone, the lower high limit could hurt, so whether that works would have to be evaluated in practice. Out of scope here, just an idea to discuss.
On Fri, May 28, 2021 at 02:12:09PM +0200, Vlastimil Babka wrote:
> > mm/page_alloc: Split pcp->high across all online CPUs for cpuless nodes
> >
> > Dave Hansen reported the following about Feng Tang's tests on a machine with persistent memory onlined as a DRAM-like device.
> >
> > [...]
> >
> > This should not cause a problem as such although it could trigger reclaim due to pages being stored on per-cpu lists for CPUs remote to a node. It is not possible to treat cpuless nodes exactly the same as normal nodes but the worst-case scenario can be mitigated by splitting pcp->high across all online CPUs for cpuless memory nodes.
> >
> > Suggested-by: Dave Hansen <dave.hansen@intel.com>
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thanks.

> Maybe we should even consider distinguishing high limits for local-to-cpu zones vs remote, for example for the local-to-cpu zones we would divide by the number of local cpus, for remote-to-cpu zones we would divide by all cpus.
>
> Because we can expect cpus to allocate mostly from local zones, so leaving more pages on percpu for those zones can be beneficial.

I did think about whether the ratios should be different but failed to conclude that it was necessary or useful, so I kept it simple.

> But as the motivation here was to reduce lock contention on freeing, that's less clear. We probably can't expect the cpu to be freeing mostly local pages (in case of e.g. a large process exiting), because no mechanism works towards that, or does it? In case of cpu freeing to remote zone, the lower high limit could hurt.

This is the major issue. Even if an application is NUMA-aware and heavily threaded, a process exiting is potentially freeing remote memory and there is nothing wrong with that. The remote memory will be partially drained when pcp->high is reached and the remaining memory will be cleaned up by vmstat. It is a similar problem if a process truncates a large file with page cache allocated on a remote node. Hence I decided to do nothing fancy with the ratios until a practical problem is identified that could be alleviated by adjusting pcp->high based on whether the CPU is remote or local to the memory.
On 5/28/21 1:55 AM, Mel Gorman wrote:
> -	 * onlined.
> -	 */
> -	nr_local_cpus = max(1U, cpumask_weight(cpumask_of_node(zone_to_nid(zone)))) + cpu_online;
> -	high = total_pages / nr_local_cpus;
> +	 * onlined. For memory nodes that have no CPUs, split pcp->high across
> +	 * all online CPUs to mitigate the risk that reclaim is triggered
> +	 * prematurely due to pages stored on pcp lists.
> +	 */
> +	nr_split_cpus = cpumask_weight(cpumask_of_node(zone_to_nid(zone))) + cpu_online;
> +	if (!nr_split_cpus)
> +		nr_split_cpus = num_online_cpus();
> +	high = total_pages / nr_split_cpus;

Updated version looks fine to me, thanks!

BTW, to do some of this testing, Feng was doing a plain old kernel build. On the one system where this got run, he noted a ~2% regression in build times. Nothing major, but you might want to be on the lookout in case 0day or the other test harnesses find something similar once this series gets to them.

Acked-by: Dave Hansen <dave.hansen@intel.com>
On Fri, May 28, 2021 at 07:39:29AM -0700, Dave Hansen wrote:
> On 5/28/21 1:55 AM, Mel Gorman wrote:
> > -	 * onlined.
> > -	 */
> > -	nr_local_cpus = max(1U, cpumask_weight(cpumask_of_node(zone_to_nid(zone)))) + cpu_online;
> > -	high = total_pages / nr_local_cpus;
> > +	 * onlined. For memory nodes that have no CPUs, split pcp->high across
> > +	 * all online CPUs to mitigate the risk that reclaim is triggered
> > +	 * prematurely due to pages stored on pcp lists.
> > +	 */
> > +	nr_split_cpus = cpumask_weight(cpumask_of_node(zone_to_nid(zone))) + cpu_online;
> > +	if (!nr_split_cpus)
> > +		nr_split_cpus = num_online_cpus();
> > +	high = total_pages / nr_split_cpus;
>
> Updated version looks fine to me, thanks!
>
> BTW, to do some of this testing, Feng was doing a plain old kernel build. On the one system where this got run, he noted a ~2% regression in build times. Nothing major, but you might want to be on the lookout in case 0day or the other test harnesses find something similar once this series gets to them.

What type of system was it? I noticed minor differences for some thread counts on kernel compilations but for CascadeLake at least, it was mostly neutral. Below is an old test result based on a previous revision.

kernbench
                               5.13.0-rc2             5.13.0-rc2
                                  vanilla       mm-pcpburst-v2r3
Amean     elsp-2       469.22 (   0.00%)      470.03 *  -0.17%*
Amean     elsp-4       251.03 (   0.00%)      250.83 (   0.08%)
Amean     elsp-8       131.39 (   0.00%)      130.89 (   0.38%)
Amean     elsp-16       74.37 (   0.00%)       75.11 (  -0.99%)
Amean     elsp-32       42.10 (   0.00%)       42.20 (  -0.24%)
Amean     elsp-64       32.21 (   0.00%)       32.14 (   0.23%)
Amean     elsp-128      31.59 (   0.00%)       31.68 (  -0.27%)
Amean     elsp-160      31.76 (   0.00%)       31.69 (   0.21%)

A Haswell machine showed the worst results for kernbench:

Amean     elsp-2       459.99 (   0.00%)      465.27 *  -1.15%*
Amean     elsp-4       250.76 (   0.00%)      253.17 *  -0.96%*
Amean     elsp-8       141.28 (   0.00%)      141.78 (  -0.36%)
Amean     elsp-16       77.71 (   0.00%)       77.88 (  -0.22%)
Amean     elsp-32       44.09 (   0.00%)       44.40 (  -0.69%)
Amean     elsp-64       33.79 (   0.00%)       33.46 (   0.96%)
Amean     elsp-128      33.14 (   0.00%)       33.26 (  -0.37%)
Amean     elsp-160      33.26 (   0.00%)       33.36 *  -0.30%*

The series with review feedback and dealing with cpuless nodes is queued and should complete over the weekend.

> Acked-by: Dave Hansen <dave.hansen@intel.com>

Thanks!
On 5/28/21 8:18 AM, Mel Gorman wrote:
>> BTW, to do some of this testing, Feng was doing a plain old kernel build. On the one system where this got run, he noted a ~2% regression in build times. Nothing major, but you might want to be on the lookout in case 0day or the other test harnesses find something similar once this series gets to them.
>
> What type of system was it?
>
> I noticed minor differences for some thread counts on kernel compilations but for CascadeLake at least, it was mostly neutral. Below is an old test result based on a previous revision.

It's a Cascade Lake as well. But, I never trust hardware at a hardware company. These could be preproduction CPUs or BIOS or both, or have some bonkers configuration knob flipped.

It's also got a bunch of PMEM plugged and onlined, including the _possibility_ of kernel data structures ending up on PMEM. They *mostly* don't end up there, but it does happen on occasion.

Anyway, I'll see if we can do some more runs with your latest version. It looks like it's been picked up for -mm, so 0day should be pounding on it soon enough.
On Fri, May 28, 2021 at 09:17:41AM -0700, Dave Hansen wrote:
> On 5/28/21 8:18 AM, Mel Gorman wrote:
> >> BTW, to do some of this testing, Feng was doing a plain old kernel build. On the one system where this got run, he noted a ~2% regression in build times. Nothing major, but you might want to be on the lookout in case 0day or the other test harnesses find something similar once this series gets to them.
> >
> > What type of system was it?
> >
> > I noticed minor differences for some thread counts on kernel compilations but for CascadeLake at least, it was mostly neutral. Below is an old test result based on a previous revision.
>
> It's a Cascade Lake as well. But, I never trust hardware at a hardware company. These could be preproduction CPUs or BIOS or both, or have some bonkers configuration knob flipped.
>
> It's also got a bunch of PMEM plugged and onlined, including the _possibility_ of kernel data structures ending up on PMEM. They *mostly* don't end up there, but it does happen on occasion.
>
> Anyway, I'll see if we can do some more runs with your latest version. It looks like it's been picked up for -mm so 0day should be pounding on it soon enough.

Yes, usually 0day has more benchmark tests covering the -mm tree.

As for the kbuild test run for v2, after more runs the previous 2% longer kbuild time turned into a 1% shorter time, which seems to be within the normal deviation range.

Also, I checked Mel's v3 branch which has the fix for cpuless nodes, and the pcp 'high' looks normal on the PMEM node:

  pagesets
    cpu: 0
              count: 67
              high:  724
              batch: 63
  vm stats threshold: 125

Thanks,
Feng