diff mbox series

[v3] x86/cpu: Sync any remaining RCU callbacks before CPU up/down

Message ID 1583336008-10123-1-git-send-email-igor.druzhinin@citrix.com (mailing list archive)
State Superseded
Headers show
Series [v3] x86/cpu: Sync any remaining RCU callbacks before CPU up/down | expand

Commit Message

Igor Druzhinin March 4, 2020, 3:33 p.m. UTC
During CPU down operation RCU callbacks are scheduled to finish
off some actions later as soon as CPU is fully dead (the same applies
to CPU up operation in case error path is taken). If in the same grace
period another CPU up operation is performed on the same CPU, RCU callback
will be called later on a CPU in a potentially wrong (already up again
instead of still being down) state leading to eventual state inconsistency
and/or crash.

In order to avoid it - flush RCU callbacks explicitly before starting the
next CPU up/down operation.

Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---
This got discovered trying to resume PV shim with multiple vCPUs on AMD
machine (where park_offline_cpus == 0). RCU callback responsible for
freeing percpu area on CPU offline got finally called after CPU went
online again as the guest performed regular vCPU offline/online operations
on resume.

Note: this patch requires RCU series v3 from Juergen to be applied -
https://lists.xenproject.org/archives/html/xen-devel/2020-03/msg00200.html

v2: changed rcu_barrier() position, updated description
v3: moved rcu_barrier() to common cpu_up/cpu_down code to cover more cases
---
 xen/arch/x86/acpi/power.c | 1 -
 xen/arch/x86/sysctl.c     | 8 --------
 xen/common/cpu.c          | 2 ++
 3 files changed, 2 insertions(+), 9 deletions(-)

Comments

Jan Beulich March 6, 2020, 9:43 a.m. UTC | #1
On 04.03.2020 16:33, Igor Druzhinin wrote:
> --- a/xen/arch/x86/acpi/power.c
> +++ b/xen/arch/x86/acpi/power.c
> @@ -305,7 +305,6 @@ static int enter_state(u32 state)
>      cpufreq_add_cpu(0);
>  
>   enable_cpu:
> -    rcu_barrier();
>      mtrr_aps_sync_begin();
>      enable_nonboot_cpus();
>      mtrr_aps_sync_end();

I take it you remove the invocation here because of being redundant
with the cpu_up() in enable_nonboot_cpus(). Is this safe / correct
in all cases? For one, it's not obvious to me that
mtrr_aps_sync_begin() couldn't rely on RCU syncing to have happened.
And then enable_nonboot_cpus() may not call cpu_up() at all,
because of the park_offline_cpus-based early loop continuation in
the function.

> --- a/xen/arch/x86/sysctl.c
> +++ b/xen/arch/x86/sysctl.c
> @@ -85,11 +85,7 @@ long cpu_up_helper(void *data)
>      int ret = cpu_up(cpu);
>  
>      if ( ret == -EBUSY )
> -    {
> -        /* On EBUSY, flush RCU work and have one more go. */
> -        rcu_barrier();
>          ret = cpu_up(cpu);
> -    }
>  
>      if ( !ret && !opt_smt &&
>           cpu_data[cpu].compute_unit_id == INVALID_CUID &&
> @@ -110,11 +106,7 @@ long cpu_down_helper(void *data)
>      int cpu = (unsigned long)data;
>      int ret = cpu_down(cpu);
>      if ( ret == -EBUSY )
> -    {
> -        /* On EBUSY, flush RCU work and have one more go. */
> -        rcu_barrier();
>          ret = cpu_down(cpu);
> -    }

In both cases I think the comments would better be retained (in
an adjusted shape).

Jan
Igor Druzhinin March 6, 2020, 1:10 p.m. UTC | #2
On 06/03/2020 09:43, Jan Beulich wrote:
> On 04.03.2020 16:33, Igor Druzhinin wrote:
>> --- a/xen/arch/x86/acpi/power.c
>> +++ b/xen/arch/x86/acpi/power.c
>> @@ -305,7 +305,6 @@ static int enter_state(u32 state)
>>      cpufreq_add_cpu(0);
>>  
>>   enable_cpu:
>> -    rcu_barrier();
>>      mtrr_aps_sync_begin();
>>      enable_nonboot_cpus();
>>      mtrr_aps_sync_end();
> 
> I take it you remove the invocation here because of being redundant
> with the cpu_up() in enable_nonboot_cpus(). Is this safe / correct
> in all cases? For one, it's not obvious to me that
> mtrr_aps_sync_begin() couldn't rely on RCU syncing to have happened.

From the history (9d9af7dca878), rcu_barrier there was introduce for
exactly same reason I put it into cpu_up/down. I'm pretty certain
it's safe as there is no other obvious reason to have it here.

The only function that could affect mtrr_aps_sync_begin() is
mtrr_aps_sync_end() and that one is called only below.

> And then enable_nonboot_cpus() may not call cpu_up() at all,
> because of the park_offline_cpus-based early loop continuation in
> the function.

I can't see how that is related.

>> --- a/xen/arch/x86/sysctl.c
>> +++ b/xen/arch/x86/sysctl.c
>> @@ -85,11 +85,7 @@ long cpu_up_helper(void *data)
>>      int ret = cpu_up(cpu);
>>  
>>      if ( ret == -EBUSY )
>> -    {
>> -        /* On EBUSY, flush RCU work and have one more go. */
>> -        rcu_barrier();
>>          ret = cpu_up(cpu);
>> -    }
>>  
>>      if ( !ret && !opt_smt &&
>>           cpu_data[cpu].compute_unit_id == INVALID_CUID &&
>> @@ -110,11 +106,7 @@ long cpu_down_helper(void *data)
>>      int cpu = (unsigned long)data;
>>      int ret = cpu_down(cpu);
>>      if ( ret == -EBUSY )
>> -    {
>> -        /* On EBUSY, flush RCU work and have one more go. */
>> -        rcu_barrier();
>>          ret = cpu_down(cpu);
>> -    }
> 
> In both cases I think the comments would better be retained (in
> an adjusted shape).

Ok.

Igor
Jan Beulich March 6, 2020, 1:28 p.m. UTC | #3
On 06.03.2020 14:10, Igor Druzhinin wrote:
> On 06/03/2020 09:43, Jan Beulich wrote:
>> On 04.03.2020 16:33, Igor Druzhinin wrote:
>>> --- a/xen/arch/x86/acpi/power.c
>>> +++ b/xen/arch/x86/acpi/power.c
>>> @@ -305,7 +305,6 @@ static int enter_state(u32 state)
>>>      cpufreq_add_cpu(0);
>>>  
>>>   enable_cpu:
>>> -    rcu_barrier();
>>>      mtrr_aps_sync_begin();
>>>      enable_nonboot_cpus();
>>>      mtrr_aps_sync_end();
>>
>> I take it you remove the invocation here because of being redundant
>> with the cpu_up() in enable_nonboot_cpus(). Is this safe / correct
>> in all cases? For one, it's not obvious to me that
>> mtrr_aps_sync_begin() couldn't rely on RCU syncing to have happened.
> 
> From the history (9d9af7dca878), rcu_barrier there was introduce for
> exactly same reason I put it into cpu_up/down.

Oh, I didn't go do archeology here. This is then certainly fine.

> I'm pretty certain
> it's safe as there is no other obvious reason to have it here.
> 
> The only function that could affect mtrr_aps_sync_begin() is
> mtrr_aps_sync_end() and that one is called only below.
> 
>> And then enable_nonboot_cpus() may not call cpu_up() at all,
>> because of the park_offline_cpus-based early loop continuation in
>> the function.
> 
> I can't see how that is related.

It could be without your observation above, as there could have
been other reasons to have the barrier there (and hence it then
could be a bug if the barrier was removed from an effective
code path, rather than just moved).

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index b5df00b..847c273 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -305,7 +305,6 @@  static int enter_state(u32 state)
     cpufreq_add_cpu(0);
 
  enable_cpu:
-    rcu_barrier();
     mtrr_aps_sync_begin();
     enable_nonboot_cpus();
     mtrr_aps_sync_end();
diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
index 59a3840..b4e86a8 100644
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -85,11 +85,7 @@  long cpu_up_helper(void *data)
     int ret = cpu_up(cpu);
 
     if ( ret == -EBUSY )
-    {
-        /* On EBUSY, flush RCU work and have one more go. */
-        rcu_barrier();
         ret = cpu_up(cpu);
-    }
 
     if ( !ret && !opt_smt &&
          cpu_data[cpu].compute_unit_id == INVALID_CUID &&
@@ -110,11 +106,7 @@  long cpu_down_helper(void *data)
     int cpu = (unsigned long)data;
     int ret = cpu_down(cpu);
     if ( ret == -EBUSY )
-    {
-        /* On EBUSY, flush RCU work and have one more go. */
-        rcu_barrier();
         ret = cpu_down(cpu);
-    }
     return ret;
 }
 
diff --git a/xen/common/cpu.c b/xen/common/cpu.c
index 31953f3..1f976db 100644
--- a/xen/common/cpu.c
+++ b/xen/common/cpu.c
@@ -4,6 +4,7 @@ 
 #include <xen/init.h>
 #include <xen/sched.h>
 #include <xen/stop_machine.h>
+#include <xen/rcupdate.h>
 
 unsigned int __read_mostly nr_cpu_ids = NR_CPUS;
 #ifndef nr_cpumask_bits
@@ -53,6 +54,7 @@  void put_cpu_maps(void)
 
 void cpu_hotplug_begin(void)
 {
+    rcu_barrier();
     write_lock(&cpu_add_remove_lock);
 }