Message ID: 158818022727.24327.14309662489731832234.stgit@Palanthas (mailing list archive)
Series: xen: credit2: limit the number of CPUs per runqueue
So, I felt like providing some additional thoughts about this series, from a release point of view (adding Paul).

Timing is *beyond tight*, so if this series, entirely or partly, has any chance to go in, it would be through some form of exception, which of course comes with some risks, etc.

I did work hard to submit the full series, because I wanted people to be able to see the complete solution. However, I think the series itself can be logically split in two parts.

Basically, if we just consider patches 1 and 4, we will end up, right after boot, with a system that has smaller runqueues. They will most likely be balanced in terms of how many CPUs each one has, so a good setup. This will likely (actual differences seem to depend *quite a bit* on the actual workload) be an improvement for very large systems. This is a path that will get a decent amount of testing in OSSTests from now until the day of the release, I think, because booting with the default CPU configuration and setup is what most (all?) OSSTests jobs do.

If the user starts to create pools, we can get to a point where the different runqueues are unbalanced, i.e., each one has a different number of CPUs in it, with respect to the others. This, however:
* can happen already, as of today, without these patches. Whether these patches make things "better" or "worse", from this point of view, is impossible to tell, because it actually depends on which CPUs the user moves among pools, puts offline, etc.
* means that the scheduler has to deal with unbalanced runqueues anyway; if it doesn't, that is a bug, and, again, it seems to me that these patches don't make things particularly better or worse.

So, again, for patches 1 and 4 alone, it looks to me that we'd get improvements, at least in some cases; the codepath will get some testing; and they do not introduce additional or different issues compared to what we already have right now.
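[A purely illustrative sketch of the idea behind the series, in Python rather than the actual Xen C code: the CPUs of each socket are split into runqueues of at most a fixed number of CPUs, kept as balanced as possible. The cap of 16 and the per-socket split reflect the discussion in this thread; the function names and exact splitting strategy here are my own, hypothetical.]

```python
import math

def split_socket_into_runqueues(cpus, max_cpus_per_runqueue=16):
    """Split the CPUs of one socket into balanced runqueues,
    each holding at most max_cpus_per_runqueue CPUs.
    (Illustrative only; not the actual Xen implementation.)"""
    nr_runqueues = math.ceil(len(cpus) / max_cpus_per_runqueue)
    base, extra = divmod(len(cpus), nr_runqueues)
    runqueues, start = [], 0
    for i in range(nr_runqueues):
        # Spread any remainder one CPU at a time across the first queues.
        size = base + (1 if i < extra else 0)
        runqueues.append(cpus[start:start + size])
        start += size
    return runqueues

# A 2-socket, 96-CPU system: each socket's 48 CPUs become 3 runqueues of 16,
# giving 6 balanced runqueues in total right after boot.
sockets = [list(range(0, 48)), list(range(48, 96))]
runqueues = [rq for s in sockets for rq in split_socket_into_runqueues(s)]
```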
They are also at their second iteration, as the original patch series was comprised of exactly those two patches. So, I think it would be interesting if these two patches could be given a chance, even just to get some reviews... and I would be fine going through the formal process necessary to make that happen myself.

Then, there's the rest: the runqueue rebalancing thing. As said above, I personally would love it if we'd release with it, but I see one rather big issue. In fact, such a mechanism is triggered and stressed by the dynamic creation and manipulation of cpupools (and CPU on/off-lining), and we don't have an OSSTests test for that. Therefore, we are not in the best position for catching issues it may have introduced. I can commit to doing some testing myself, but it's not the same thing as having them in our CI, I know that very well.

So, I'd be interested in hearing what others think about these other patches as well, and I am happy to do my best to make sure that they work fine, if we decide to try to include them too, but I do see this as much more of a risk myself.

So, any thoughts? :-)

Thanks and Regards
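[To make concrete what "rebalancing" has to cope with: a toy, hypothetical sketch (Python, not the Xen implementation) of the kind of invariant a rebalancer maintains after cpupool manipulation leaves runqueues uneven: move CPUs from the largest queue to the smallest until sizes differ by at most one.]

```python
def rebalance(runqueues):
    """Illustrative only: even out runqueue sizes by moving one CPU
    at a time from the largest queue to the smallest, until the
    size difference is at most one CPU."""
    while True:
        big = max(runqueues, key=len)
        small = min(runqueues, key=len)
        if len(big) - len(small) <= 1:
            return runqueues
        small.append(big.pop())

# A CPU was moved to another cpupool, leaving the queues unbalanced (4/2/4).
queues = [[0, 1, 2, 3], [4, 5], [6, 7, 8, 9]]
rebalance(queues)  # sizes end up differing by at most one
```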
On Fri, 2020-05-29 at 13:46 +0200, Dario Faggioli wrote:
> Basically, if we just consider patches 1 and 4 we will end up, right
> after boot, with a system that has smaller runqueues.
>
Actually, to be fully precise, given how I reorganized the series, it's not patches 1 and 4: it's patches 1, 3 and 4. Hopefully, that is not a big deal, but if patch 3 really is a problem, I can re-arrange patch 4 to work without it.

Apart from this, and to add more information: on a system with 96 CPUs in 2 sockets, this is how the runqueues look with these patches:

(XEN) Online Cpus: 0-95
(XEN) Cpupool 0:
(XEN) Cpus: 0-95
(XEN) Scheduling granularity: cpu, 1 CPU per sched-resource
(XEN) Scheduler: SMP Credit Scheduler rev2 (credit2)
(XEN) Active queues: 6
(XEN) default-weight = 256
(XEN) Runqueue 0:
(XEN) ncpus = 16
(XEN) cpus = 0-15
(XEN) max_weight = 256
(XEN) pick_bias = 0
(XEN) instload = 0
(XEN) aveload = 1223 (~0%)
(XEN) idlers: 00000000,00000000,00000000,00000000,00000000,00000000,0000ffff
(XEN) tickled: 00000000,00000000,00000000,00000000,00000000,00000000,00000000
(XEN) fully idle cores: 00000000,00000000,00000000,00000000,00000000,00000000,0000ffff
(XEN) Runqueue 1:
(XEN) ncpus = 16
(XEN) cpus = 16-31
(XEN) max_weight = 256
(XEN) pick_bias = 16
(XEN) instload = 0
(XEN) aveload = 3324 (~1%)
(XEN) idlers: 00000000,00000000,00000000,00000000,00000000,00000000,ffff0000
(XEN) tickled: 00000000,00000000,00000000,00000000,00000000,00000000,00000000
(XEN) fully idle cores: 00000000,00000000,00000000,00000000,00000000,00000000,ffff0000
(XEN) Runqueue 2:
(XEN) ncpus = 16
(XEN) cpus = 32-47
(XEN) max_weight = 256
(XEN) pick_bias = 32
(XEN) instload = 1
(XEN) aveload = 8996 (~3%)
(XEN) idlers: 00000000,00000000,00000000,00000000,00000000,0000feff,00000000
(XEN) tickled: 00000000,00000000,00000000,00000000,00000000,00000000,00000000
(XEN) fully idle cores: 00000000,00000000,00000000,00000000,00000000,0000fcff,00000000
(XEN) Runqueue 3:
(XEN) ncpus = 16
(XEN) cpus = 48-63
(XEN) max_weight = 256
(XEN) pick_bias = 48
(XEN) instload = 0
(XEN) aveload = 2424 (~0%)
(XEN) idlers: 00000000,00000000,00000000,00000000,00000000,ffff0000,00000000
(XEN) tickled: 00000000,00000000,00000000,00000000,00000000,00000000,00000000
(XEN) fully idle cores: 00000000,00000000,00000000,00000000,00000000,ffff0000,00000000
(XEN) Runqueue 4:
(XEN) ncpus = 16
(XEN) cpus = 64-79
(XEN) max_weight = 256
(XEN) pick_bias = 66
(XEN) instload = 0
(XEN) aveload = 1070 (~0%)
(XEN) idlers: 00000000,00000000,00000000,00000000,0000ffff,00000000,00000000
(XEN) tickled: 00000000,00000000,00000000,00000000,00000000,00000000,00000000
(XEN) fully idle cores: 00000000,00000000,00000000,00000000,0000ffff,00000000,00000000
(XEN) Runqueue 5:
(XEN) ncpus = 16
(XEN) cpus = 80-95
(XEN) max_weight = 256
(XEN) pick_bias = 82
(XEN) instload = 0
(XEN) aveload = 425 (~0%)
(XEN) idlers: 00000000,00000000,00000000,00000000,ffff0000,00000000,00000000
(XEN) tickled: 00000000,00000000,00000000,00000000,00000000,00000000,00000000
(XEN) fully idle cores: 00000000,00000000,00000000,00000000,ffff0000,00000000,00000000

Without the patches, there would be just 2 of them (one with CPUs 0-47 and another with CPUs 48-95).
On a system with "just" 16 CPUs, in 2 sockets, they look like this:

(XEN) Online Cpus: 0-15
(XEN) Cpupool 0:
(XEN) Cpus: 0-15
(XEN) Scheduling granularity: cpu, 1 CPU per sched-resource
(XEN) Scheduler: SMP Credit Scheduler rev2 (credit2)
(XEN) Active queues: 2
(XEN) default-weight = 256
(XEN) Runqueue 0:
(XEN) ncpus = 8
(XEN) cpus = 0-7
(XEN) max_weight = 256
(XEN) pick_bias = 0
(XEN) instload = 0
(XEN) aveload = 7077 (~2%)
(XEN) idlers: 00000000,000000ff
(XEN) tickled: 00000000,00000000
(XEN) fully idle cores: 00000000,000000ff
(XEN) Runqueue 1:
(XEN) ncpus = 8
(XEN) cpus = 8-15
(XEN) max_weight = 256
(XEN) pick_bias = 8
(XEN) instload = 1
(XEN) aveload = 11848 (~4%)
(XEN) idlers: 00000000,0000fe00
(XEN) tickled: 00000000,00000000
(XEN) fully idle cores: 00000000,0000fc00

There are still 2 runqueues, because there are 2 sockets, and we still honor the topology (and 8 CPUs in a runqueue is fine, because it is lower than 16).

I'll share the same output on a 256 CPU system, as soon as I finish installing it.

Regards
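[Both layouts follow directly from the topology; a small illustrative Python check, assuming the per-socket split with a cap of 16 CPUs per runqueue as described above (the function name is hypothetical, not Xen code):]

```python
import math

def runqueues_per_system(nr_cpus, nr_sockets, cap=16):
    """Runqueues are per-socket, and each socket's CPUs are further
    split so that no runqueue exceeds `cap` CPUs."""
    per_socket = nr_cpus // nr_sockets
    return nr_sockets * math.ceil(per_socket / cap)

print(runqueues_per_system(96, 2))   # 6 runqueues (of 16 CPUs each)
print(runqueues_per_system(16, 2))   # 2 runqueues (of 8 CPUs each)
```

For the 256-CPU box mentioned above, the resulting count would likewise depend on how many sockets it has.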
> On May 29, 2020, at 12:46 PM, Dario Faggioli <dfaggioli@suse.com> wrote:
>
> So,
>
> I felt like providing some additional thoughts about this series, from
> a release point of view (adding Paul).
>
> Timing is *beyond tight* so if this series, entirely or partly, has any
> chance to go in, it would be through some form of exception, which of
> course comes with some risks, etc.
>
> I did work hard to submit the full series, because I wanted people to
> be able to see the complete solution. However, I think the series
> itself can be logically split in two parts.
>
> Basically, if we just consider patches 1 and 4 we will end up, right
> after boot, with a system that has smaller runqueues. They will most
> likely be balanced in terms of how many CPUs each one has, so a good
> setup. This will likely (actual differences seems to depend *quite a
> bit* on the actual workload) be an improvement for very large systems.

Fundamentally, I feel like the reason we have the feature freeze is exactly to avoid questions like this.

Something very much like patch 4 was posted before the last posting date; patches 1-4 received R-b's before the feature freeze. I think they should probably go in.

The rebalancing patches, I'm inclined to say, should wait until they've had a bit more time to be thought about.

-George