
[v3,1/6] xen, cpupool: correct error handling when removing cpu from cpupool

Message ID 1457023730-10997-2-git-send-email-jgross@suse.com (mailing list archive)
State New, archived

Commit Message

Jürgen Groß March 3, 2016, 4:48 p.m. UTC
When schedule_cpu_switch(), called from cpupool_unassign_cpu_helper(),
returns an error, the domlist_read_lock isn't released again.

As cpu_disable_scheduler() might have changed the affinity of some
domains, domain_update_node_affinity() must be called for all domains
in the cpupool, even in the error case.

Even though it looks weird, it is okay to leave the to-be-removed cpu
set in cpupool_free_cpus in case of an error returned by
cpu_disable_scheduler(). Add a comment explaining the reason for this.

Cc: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/cpupool.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)
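
For readers following the thread, here is a minimal, compilable C sketch
of the control-flow change the commit message describes. Everything in it
is a hypothetical stand-in, not Xen code: lock_depth models the nesting of
domlist_read_lock, switch_fails() stands in for a failing
schedule_cpu_switch(), and affinity_updates counts what would be
domain_update_node_affinity() calls. The point is only that the post-patch
flow reaches the domain loop and the unlock even when an error is being
returned, while the pre-patch "goto out" bypassed both.

#include <stdio.h>

static int lock_depth;          /* models domlist_read_lock nesting     */
static int affinity_updates;    /* models domain_update_node_affinity() */

static void read_lock(void)    { lock_depth++; }
static void read_unlock(void)  { lock_depth--; }
static int  switch_fails(void) { return -1; }   /* forced error         */

/* Pre-patch shape: an error bypasses both the loop and the unlock. */
static long old_flow(int ndomains)
{
    long ret;

    read_lock();
    ret = switch_fails();
    if ( ret )
        goto out;                           /* leaks the read lock */
    for ( int d = 0; d < ndomains; d++ )
        affinity_updates++;
    read_unlock();
 out:
    return ret;
}

/* Post-patch shape: the loop and the unlock run on every path. */
static long new_flow(int ndomains)
{
    long ret;

    read_lock();
    ret = switch_fails();
    for ( int d = 0; d < ndomains; d++ )
        affinity_updates++;                 /* runs even though ret != 0 */
    read_unlock();
    return ret;
}

int main(void)
{
    old_flow(4);
    printf("old: lock_depth=%d (leaked), updates=%d\n",
           lock_depth, affinity_updates);

    lock_depth = affinity_updates = 0;
    new_flow(4);
    printf("new: lock_depth=%d, updates=%d\n",
           lock_depth, affinity_updates);
    return 0;
}

Running it prints a leaked lock_depth of 1 and zero updates for the old
flow, against a balanced lock and four updates for the new one.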

Comments

Jan Beulich March 4, 2016, 9:42 a.m. UTC | #1
>>> On 03.03.16 at 17:48, <JGross@suse.com> wrote:
>      if ( !ret )
>      {
>          ret = schedule_cpu_switch(cpu, NULL);
>          if ( ret )
> -        {
>              cpumask_clear_cpu(cpu, &cpupool_free_cpus);
> -            goto out;
> +        else
> +        {
> +            cpupool_moving_cpu = -1;
> +            cpupool_put(cpupool_cpu_moving);
> +            cpupool_cpu_moving = NULL;
>          }
> -        cpupool_moving_cpu = -1;
> -        cpupool_put(cpupool_cpu_moving);
> -        cpupool_cpu_moving = NULL;
>      }
>  
>      for_each_domain_in_cpupool(d, c)

So this now also eliminates the bypass of the loop the beginning of
which we can still see here - is that really intended, or wouldn't
this better be made conditional upon !ret?

Jan
Jürgen Groß March 4, 2016, 9:54 a.m. UTC | #2
On 04/03/16 10:42, Jan Beulich wrote:
>>>> On 03.03.16 at 17:48, <JGross@suse.com> wrote:
>>      if ( !ret )
>>      {
>>          ret = schedule_cpu_switch(cpu, NULL);
>>          if ( ret )
>> -        {
>>              cpumask_clear_cpu(cpu, &cpupool_free_cpus);
>> -            goto out;
>> +        else
>> +        {
>> +            cpupool_moving_cpu = -1;
>> +            cpupool_put(cpupool_cpu_moving);
>> +            cpupool_cpu_moving = NULL;
>>          }
>> -        cpupool_moving_cpu = -1;
>> -        cpupool_put(cpupool_cpu_moving);
>> -        cpupool_cpu_moving = NULL;
>>      }
>>  
>>      for_each_domain_in_cpupool(d, c)
> 
> So this now also eliminates the bypass of the loop the beginning of
> which we can still see here - is that really intended, or wouldn't
> this better be made conditional upon !ret?

Did you look at the commit message? It explains exactly this
behaviour.

Juergen
Jan Beulich March 4, 2016, 10:03 a.m. UTC | #3
>>> On 04.03.16 at 10:54, <JGross@suse.com> wrote:
> On 04/03/16 10:42, Jan Beulich wrote:
>>>>> On 03.03.16 at 17:48, <JGross@suse.com> wrote:
>>>      if ( !ret )
>>>      {
>>>          ret = schedule_cpu_switch(cpu, NULL);
>>>          if ( ret )
>>> -        {
>>>              cpumask_clear_cpu(cpu, &cpupool_free_cpus);
>>> -            goto out;
>>> +        else
>>> +        {
>>> +            cpupool_moving_cpu = -1;
>>> +            cpupool_put(cpupool_cpu_moving);
>>> +            cpupool_cpu_moving = NULL;
>>>          }
>>> -        cpupool_moving_cpu = -1;
>>> -        cpupool_put(cpupool_cpu_moving);
>>> -        cpupool_cpu_moving = NULL;
>>>      }
>>>  
>>>      for_each_domain_in_cpupool(d, c)
>> 
>> So this now also eliminates the bypass of the loop the beginning of
>> which we can still see here - is that really intended, or wouldn't
>> this better be made conditional upon !ret?
> 
> Did you look at the commit message? It explains exactly this
> behaviour.

Argh - I had looked, even twice (once before writing the
comment, and once before hitting the send button), but
nevertheless managed to not connect things together.
I'm sorry.

And I think this aspect (and only it) makes this a backport
candidate.

Jan
Dario Faggioli March 8, 2016, 10:46 a.m. UTC | #4
On Thu, 2016-03-03 at 17:48 +0100, Juergen Gross wrote:
> When schedule_cpu_switch(), called from cpupool_unassign_cpu_helper(),
> returns an error, the domlist_read_lock isn't released again.
> 
> As cpu_disable_scheduler() might have changed the affinity of some
> domains, domain_update_node_affinity() must be called for all domains
> in the cpupool, even in the error case.
> 
> Even though it looks weird, it is okay to leave the to-be-removed cpu
> set in cpupool_free_cpus in case of an error returned by
> cpu_disable_scheduler(). Add a comment explaining the reason for this.
> 
> Cc: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Juergen Gross <jgross@suse.com>
>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>

Thanks and Regards,
Dario

Patch

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index 8e7b723..d0189f8 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -312,17 +312,25 @@ static long cpupool_unassign_cpu_helper(void *info)
     rcu_read_lock(&domlist_read_lock);
     ret = cpu_disable_scheduler(cpu);
     cpumask_set_cpu(cpu, &cpupool_free_cpus);
+
+    /*
+     * cpu_disable_scheduler() returning an error doesn't require resetting
+     * cpupool_free_cpus' cpu bit. All error cases should be of temporary
+     * nature and tools will retry the operation. Even if the number of
+     * retries may be limited, the in-between state can easily be repaired
+     * by adding the cpu to the cpupool again.
+     */
     if ( !ret )
     {
         ret = schedule_cpu_switch(cpu, NULL);
         if ( ret )
-        {
             cpumask_clear_cpu(cpu, &cpupool_free_cpus);
-            goto out;
+        else
+        {
+            cpupool_moving_cpu = -1;
+            cpupool_put(cpupool_cpu_moving);
+            cpupool_cpu_moving = NULL;
         }
-        cpupool_moving_cpu = -1;
-        cpupool_put(cpupool_cpu_moving);
-        cpupool_cpu_moving = NULL;
     }
 
     for_each_domain_in_cpupool(d, c)
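
As a footnote to the comment added in the hunk above: the reason leaving
the cpu's bit set in cpupool_free_cpus is acceptable is that the toolstack
retries the operation. Below is a hypothetical simulation of that
interaction; none of the names exist in Xen, and the stub fails exactly
once to model the "temporary nature" the comment refers to.

#include <stdio.h>
#include <stdbool.h>

static bool free_cpu_bit;   /* models the cpu's bit in cpupool_free_cpus */
static int  attempts;

/* Stand-in for cpu_disable_scheduler(): fails on the first call only. */
static int disable_scheduler_stub(void)
{
    return (attempts++ == 0) ? -1 : 0;
}

static int try_remove_cpu(void)
{
    int ret = disable_scheduler_stub();

    free_cpu_bit = true;    /* set unconditionally, as in the patch */
    return ret;             /* bit deliberately left set on error   */
}

int main(void)
{
    /* The toolstack retries until the removal succeeds. */
    for ( int i = 1; i <= 3; i++ )
    {
        int ret = try_remove_cpu();

        printf("attempt %d: ret=%d, free bit=%d\n", i, ret, free_cpu_bit);
        if ( !ret )
            break;
    }
    return 0;
}

Should the retries be exhausted, the comment's escape hatch still applies:
the in-between state can be repaired by adding the cpu to the cpupool again.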