Message ID | 20200327193023.506-1-jeff.kubascik@dornerworks.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | sched/core: Fix bug when moving a domain between cpupools | expand |
Hello, I wanted to follow up on this patch, as I have not seen any responses to it. In my work on the ARINC653 scheduler, I have observed this bug write to memory past the end of a private UNIT structure (and in my case, stomping on a TLSF allocator header) when migrating a domain from an ARINC cpupool to a credit cpupool. This occurs because (a) the private UNIT structure is smaller for the ARINC cpupool and (b) the credit scheduler method csched_aff_cntl does some bit setting/ clearing while the private UNIT pointer still points incorrectly to the ARINC cpupool one. On 3/27/2020 3:30 PM, Jeff Kubascik wrote: > For each UNIT, sched_set_affinity is called before unit->priv is updated > to the new cpupool private UNIT data structure. The issue is > sched_set_affinity will call the adjust_affinity method of the cpupool. > If defined, the new cpupool may use unit->priv (e.g. credit), which at > this point still references the old cpupool private UNIT data structure. > > This change fixes the bug by moving the switch of unit->priv earler in > the function. > > Signed-off-by: Jeff Kubascik <jeff.kubascik@dornerworks.com> > --- > Hello, > > I've been working on updating the arinc653 scheduler to support > multicore for a few months now. In the process of testing, I came across > this obscure bug in the core scheduler code that took me a few weeks to > track down. This bug resulted in the credit scheduler writing past the > end of the arinc653 private UNIT data structure into the TLSF allocator > bhdr structure of the adjacent region. This required some deep diving > into the TLSF allocator code to trace the bug back to this point. > > Sincerely, > Jeff Kubascik > --- > xen/common/sched/core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c > index 7e8e7d2c39..ea572a345a 100644 > --- a/xen/common/sched/core.c > +++ b/xen/common/sched/core.c > @@ -686,6 +686,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c) > unsigned int unit_p = new_p; > > unitdata = unit->priv; > + unit->priv = unit_priv[unit_idx]; > > for_each_sched_unit_vcpu ( unit, v ) > { > @@ -707,7 +708,6 @@ int sched_move_domain(struct domain *d, struct cpupool *c) > */ > spin_unlock_irq(lock); > > - unit->priv = unit_priv[unit_idx]; > if ( !d->is_dying ) > sched_move_irqs(unit); > > Sincerely, Jeff Kubascik
On 27.03.20 20:30, Jeff Kubascik wrote: > For each UNIT, sched_set_affinity is called before unit->priv is updated > to the new cpupool private UNIT data structure. The issue is > sched_set_affinity will call the adjust_affinity method of the cpupool. > If defined, the new cpupool may use unit->priv (e.g. credit), which at > this point still references the old cpupool private UNIT data structure. > > This change fixes the bug by moving the switch of unit->priv earler in > the function. > > Signed-off-by: Jeff Kubascik <jeff.kubascik@dornerworks.com> Reviewed-by: Juergen Gross <jgross@suse.com> Juergen
On Wed, 2020-04-15 at 11:08 +0200, Jürgen Groß wrote: > On 27.03.20 20:30, Jeff Kubascik wrote: > > For each UNIT, sched_set_affinity is called before unit->priv is > > updated > > to the new cpupool private UNIT data structure. The issue is > > sched_set_affinity will call the adjust_affinity method of the > > cpupool. > > If defined, the new cpupool may use unit->priv (e.g. credit), which > > at > > this point still references the old cpupool private UNIT data > > structure. > > > > This change fixes the bug by moving the switch of unit->priv earler > > in > > the function. > > > > Signed-off-by: Jeff Kubascik <jeff.kubascik@dornerworks.com> > > Reviewed-by: Juergen Gross <jgross@suse.com> > Acked-by: Dario Faggioli <dfaggioli@suse.com> Sorry it took a while. And thanks Juergen for also having a look. Regards
Thank you Juergen and Dario! On 4/16/2020 12:09 PM, Dario Faggioli wrote: > On Wed, 2020-04-15 at 11:08 +0200, Jürgen Groß wrote: >> On 27.03.20 20:30, Jeff Kubascik wrote: >>> For each UNIT, sched_set_affinity is called before unit->priv is >>> updated >>> to the new cpupool private UNIT data structure. The issue is >>> sched_set_affinity will call the adjust_affinity method of the >>> cpupool. >>> If defined, the new cpupool may use unit->priv (e.g. credit), which >>> at >>> this point still references the old cpupool private UNIT data >>> structure. >>> >>> This change fixes the bug by moving the switch of unit->priv earler >>> in >>> the function. >>> >>> Signed-off-by: Jeff Kubascik <jeff.kubascik@dornerworks.com> >> >> Reviewed-by: Juergen Gross <jgross@suse.com> >> > Acked-by: Dario Faggioli <dfaggioli@suse.com> > > Sorry it took a while. And thanks Juergen for also having a look. > > Regards > Sincerely, Jeff Kubascik
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c index 7e8e7d2c39..ea572a345a 100644 --- a/xen/common/sched/core.c +++ b/xen/common/sched/core.c @@ -686,6 +686,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c) unsigned int unit_p = new_p; unitdata = unit->priv; + unit->priv = unit_priv[unit_idx]; for_each_sched_unit_vcpu ( unit, v ) { @@ -707,7 +708,6 @@ int sched_move_domain(struct domain *d, struct cpupool *c) */ spin_unlock_irq(lock); - unit->priv = unit_priv[unit_idx]; if ( !d->is_dying ) sched_move_irqs(unit);
For each UNIT, sched_set_affinity is called before unit->priv is updated to the new cpupool private UNIT data structure. The issue is sched_set_affinity will call the adjust_affinity method of the cpupool. If defined, the new cpupool may use unit->priv (e.g. credit), which at this point still references the old cpupool private UNIT data structure. This change fixes the bug by moving the switch of unit->priv earler in the function. Signed-off-by: Jeff Kubascik <jeff.kubascik@dornerworks.com> --- Hello, I've been working on updating the arinc653 scheduler to support multicore for a few months now. In the process of testing, I came across this obscure bug in the core scheduler code that took me a few weeks to track down. This bug resulted in the credit scheduler writing past the end of the arinc653 private UNIT data structure into the TLSF allocator bhdr structure of the adjacent region. This required some deep diving into the TLSF allocator code to trace the bug back to this point. Sincerely, Jeff Kubascik --- xen/common/sched/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)