Message ID | 20170628001130.GB3721@linux.vnet.ibm.com (mailing list archive)
---|---
State | New, archived
On 6/27/2017 6:11 PM, Paul E. McKenney wrote:
> On Tue, Jun 27, 2017 at 04:32:09PM -0600, Jeffrey Hugo wrote:
>> On 6/22/2017 9:34 PM, Paul E. McKenney wrote:
>>> On Wed, Jun 21, 2017 at 09:18:53AM -0700, Paul E. McKenney wrote:
>>>> No worries, and I am very much looking forward to seeing the results of
>>>> your testing.
>>>
>>> And please see below for an updated patch based on LKML review and
>>> more intensive testing.
>>>
>>
>> I spent some time on this today. It didn't go as I expected. I
>> validated that the issue is reproducible as before on 4.11 and on
>> 4.12 rcs 1 through 4. However, the version of stress-ng that I was
>> using ran into constant errors starting with rc5, making it nearly
>> impossible to make progress toward reproduction. Upgrading stress-ng
>> to tip fixes those errors; however, I've still been unable to repro
>> the issue.
>>
>> It's my unfounded suspicion that something went in between rc4 and
>> rc5 which changed the timing, and didn't actually fix the issue. I
>> will run the test overnight for 5 hours to try to repro.
>>
>> The patch you sent appears to be based on linux-next, and appears to
>> have a number of dependencies which prevent it from cleanly applying
>> on anything current that I'm able to repro on at this time. Do you
>> want to provide a rebased version of the patch which applies to, say,
>> 4.11? I could easily test that and report back.
>
> Here is a very lightly tested backport to v4.11.
>

Works for me. The lockup always reproduced within 2 minutes on stock
4.11. With the change applied, I was able to test for 2 hours in the
same conditions, and for 4 hours with the full system, without
encountering an issue.

Feel free to add:
Tested-by: Jeffrey Hugo <jhugo@codeaurora.org>

I'm going to go back to 4.12-rc5 and see if I can either repro the
issue or identify what changed. Hopefully I can get to linux-next and
double check the original version of the change as well.
On Thu, Jun 29, 2017 at 10:29:12AM -0600, Jeffrey Hugo wrote:
> On 6/27/2017 6:11 PM, Paul E. McKenney wrote:
> [ . . . earlier discussion and the v4.11 backport request trimmed . . . ]
>
> Works for me. The lockup always reproduced within 2 minutes on stock
> 4.11. With the change applied, I was able to test for 2 hours in the
> same conditions, and for 4 hours with the full system, without
> encountering an issue.
>
> Feel free to add:
> Tested-by: Jeffrey Hugo <jhugo@codeaurora.org>

Applied, thank you!

> I'm going to go back to 4.12-rc5 and see if I can either repro the
> issue or identify what changed. Hopefully I can get to linux-next and
> double check the original version of the change as well.

Looking forward to hearing what you find!

							Thanx, Paul
On 6/29/2017 6:18 PM, Paul E. McKenney wrote:
> On Thu, Jun 29, 2017 at 10:29:12AM -0600, Jeffrey Hugo wrote:
> [ . . . ]
>> I'm going to go back to 4.12-rc5 and see if I can either repro the
>> issue or identify what changed. Hopefully I can get to linux-next and
>> double check the original version of the change as well.
>
> Looking forward to hearing what you find!
>
> 							Thanx, Paul

According to git bisect, the following is what "changed":

commit 9d0eb4624601ac978b9e89be4aeadbd51ab2c830
Merge: 5faab9e 9bc1f09
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sun Jun 11 11:07:25 2017 -0700

    Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

    Pull KVM fixes from Paolo Bonzini:
     "Bug fixes (ARM, s390, x86)"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
      KVM: async_pf: avoid async pf injection when in guest mode
      KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation
      arm: KVM: Allow unaligned accesses at HYP
      arm64: KVM: Allow unaligned accesses at EL2
      arm64: KVM: Preserve RES1 bits in SCTLR_EL2
      KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages
      KVM: nVMX: Fix exception injection
      kvm: async_pf: fix rcu_irq_enter() with irqs enabled
      KVM: arm/arm64: vgic-v3: Fix nr_pre_bits bitfield extraction
      KVM: s390: fix ais handling vs cpu model
      KVM: arm/arm64: Fix isues with GICv2 on GICv3 migration

Nothing really stands out to me which would "fix" the issue.
On Sun, Aug 20, 2017 at 01:31:01PM -0600, Jeffrey Hugo wrote:
> On 6/29/2017 6:18 PM, Paul E. McKenney wrote:
> [ . . . ]
> >Looking forward to hearing what you find!
>
> According to git bisect, the following is what "changed":
>
> commit 9d0eb4624601ac978b9e89be4aeadbd51ab2c830
> Merge: 5faab9e 9bc1f09
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date:   Sun Jun 11 11:07:25 2017 -0700
>
>     Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
>
>     Pull KVM fixes from Paolo Bonzini:
>      "Bug fixes (ARM, s390, x86)"
>
>     * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
>       KVM: async_pf: avoid async pf injection when in guest mode
>       KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation
>       arm: KVM: Allow unaligned accesses at HYP
>       arm64: KVM: Allow unaligned accesses at EL2
>       arm64: KVM: Preserve RES1 bits in SCTLR_EL2
>       KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages
>       KVM: nVMX: Fix exception injection
>       kvm: async_pf: fix rcu_irq_enter() with irqs enabled
>       KVM: arm/arm64: vgic-v3: Fix nr_pre_bits bitfield extraction
>       KVM: s390: fix ais handling vs cpu model
>       KVM: arm/arm64: Fix isues with GICv2 on GICv3 migration
>
> Nothing really stands out to me which would "fix" the issue.

My guess would be an undo of the change that provoked the problem
in the first place. Did you try bisecting within the above group
of commits?

Either way, CCing Paolo for his thoughts?

							Thanx, Paul
On 20/08/2017 22:56, Paul E. McKenney wrote:
>>  KVM: async_pf: avoid async pf injection when in guest mode
>>  KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation
>>  arm: KVM: Allow unaligned accesses at HYP
>>  arm64: KVM: Allow unaligned accesses at EL2
>>  arm64: KVM: Preserve RES1 bits in SCTLR_EL2
>>  KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages
>>  KVM: nVMX: Fix exception injection
>>  kvm: async_pf: fix rcu_irq_enter() with irqs enabled
>>  KVM: arm/arm64: vgic-v3: Fix nr_pre_bits bitfield extraction
>>  KVM: s390: fix ais handling vs cpu model
>>  KVM: arm/arm64: Fix isues with GICv2 on GICv3 migration
>>
>> Nothing really stands out to me which would "fix" the issue.
>
> My guess would be an undo of the change that provoked the problem
> in the first place. Did you try bisecting within the above group
> of commits?
>
> Either way, CCing Paolo for his thoughts?

There is "kvm: async_pf: fix rcu_irq_enter() with irqs enabled", but it
would have caused splats, not deadlocks.

If you are using nested virtualization, "KVM: async_pf: avoid async pf
injection when in guest mode" can be a wildcard, but only if you have
memory pressure.

My bet is still on the former changing the timing just a little bit.

Paolo
On 8/22/2017 10:12 AM, Paolo Bonzini wrote:
> On 20/08/2017 22:56, Paul E. McKenney wrote:
> [ . . . list of KVM fixes from the merge trimmed . . . ]
>>> Nothing really stands out to me which would "fix" the issue.
>>
>> My guess would be an undo of the change that provoked the problem
>> in the first place. Did you try bisecting within the above group
>> of commits?
>>
>> Either way, CCing Paolo for his thoughts?
>
> There is "kvm: async_pf: fix rcu_irq_enter() with irqs enabled", but it
> would have caused splats, not deadlocks.
>
> If you are using nested virtualization, "KVM: async_pf: avoid async pf
> injection when in guest mode" can be a wildcard, but only if you have
> memory pressure.
>
> My bet is still on the former changing the timing just a little bit.
>
> Paolo

I'm sorry, I must have done the bisect incorrectly. I attempted to
bisect the KVM changes from the merge, but was seeing that the issue
didn't repro with any of them. I double checked the merge commit, and
found it did not introduce a "fix".

I redid the bisect, and it identified the following change this time.
I double checked that reverting the change reintroduces the deadlock,
and that cherry-picking the change onto 4.12-rc4 (known to exhibit the
issue) causes the issue to disappear. I'm pretty sure (knock on wood)
that the bisect result is actually correct this time.

commit 6460495709aeb651896bc8e5c134b2e4ca7d34a8
Author: James Wang <jnwang@suse.com>
Date:   Thu Jun 8 14:52:51 2017 +0800

    Fix loop device flush before configure v3

    While installing SLES-12 (based on v4.4), I found that the installer
    will stall for 60+ seconds during LVM disk scan. The root cause was
    determined to be the removal of a bound device check in loop_flush()
    by commit b5dd2f6047ca ("block: loop: improve performance via blk-mq").

    Restoring this check, examining ->lo_state as set by loop_set_fd()
    eliminates the bad behavior.

    Test method:
    modprobe loop max_loop=64
    dd if=/dev/zero of=disk bs=512 count=200K
    for((i=0;i<4;i++))do losetup -f disk; done
    mkfs.ext4 -F /dev/loop0
    for((i=0;i<4;i++))do mkdir t$i; mount /dev/loop$i t$i;done
    for f in `ls /dev/loop[0-9]*|sort`; do \
        echo $f; dd if=$f of=/dev/null bs=512 count=1; \
    done

    Test output:   stock          patched
    /dev/loop0     18.1217e-05    8.3842e-05
    /dev/loop1      6.1114e-05    0.000147979
    /dev/loop10     0.414701      0.000116564
    /dev/loop11     0.7474        6.7942e-05
    /dev/loop12     0.747986      8.9082e-05
    /dev/loop13     0.746532      7.4799e-05
    /dev/loop14     0.480041      9.3926e-05
    /dev/loop15     1.26453       7.2522e-05

    Note that from loop10 onward, the device is not mounted, yet the
    stock kernel consumes several orders of magnitude more wall time
    than it does for a mounted device.
    (Thanks for Mike Galbraith <efault@gmx.de>, give a changelog review.)

    Reviewed-by: Hannes Reinecke <hare@suse.com>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Signed-off-by: James Wang <jnwang@suse.com>
    Fixes: b5dd2f6047ca ("block: loop: improve performance via blk-mq")
    Signed-off-by: Jens Axboe <axboe@fb.com>

Considering the original analysis of the issue, it seems plausible that
this change could be fixing it.
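For readers following along, the guard that the commit message describes restoring can be modeled in isolation. The stand-alone C sketch below is illustrative only, not the actual drivers/block/loop.c code: flush_queued_io() is a hypothetical stand-in for the driver's real blk-mq flush path, and only the lo_state names mirror the driver's state values. The point is simply that a device never configured by loop_set_fd() is skipped instead of waiting on the block layer.

```c
#include <stdio.h>

/* Illustrative model of the loop device state machine (not kernel code). */
enum lo_state { Lo_unbound, Lo_bound, Lo_rundown };

struct loop_device {
	enum lo_state lo_state;
};

/* Hypothetical stand-in for the expensive blk-mq flush the driver performs. */
static int flush_queued_io(struct loop_device *lo)
{
	printf("flushing queued I/O\n");
	return 0;
}

/*
 * Model of the restored bound-device check: an unbound device has no
 * backing file and nothing queued, so return immediately rather than
 * waiting on the block layer.
 */
static int loop_flush(struct loop_device *lo)
{
	if (lo->lo_state != Lo_bound)
		return 0;
	return flush_queued_io(lo);
}

int main(void)
{
	struct loop_device unconfigured = { .lo_state = Lo_unbound };
	struct loop_device configured = { .lo_state = Lo_bound };

	loop_flush(&unconfigured);	/* skipped: device never bound */
	loop_flush(&configured);	/* flush proceeds */
	return 0;
}
```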
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index de88b33c0974..183d69438776 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -295,6 +295,7 @@ void rcu_bh_qs(void);
 void rcu_check_callbacks(int user);
 void rcu_report_dead(unsigned int cpu);
 void rcu_cpu_starting(unsigned int cpu);
+void rcutree_migrate_callbacks(int cpu);
 
 #ifndef CONFIG_TINY_RCU
 void rcu_end_inkernel_boot(void);
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 37b223e4fc05..21be6ab54ea2 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -729,6 +729,7 @@ static int takedown_cpu(unsigned int cpu)
 	__cpu_die(cpu);
 
 	tick_cleanup_dead_cpu(cpu);
+	rcutree_migrate_callbacks(cpu);
 	return 0;
 }
 
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 50fee7689e71..63206f81574a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2636,114 +2636,6 @@ rcu_check_quiescent_state(struct rcu_state *rsp, struct rcu_data *rdp)
 }
 
 /*
- * Send the specified CPU's RCU callbacks to the orphanage. The
- * specified CPU must be offline, and the caller must hold the
- * ->orphan_lock.
- */
-static void
-rcu_send_cbs_to_orphanage(int cpu, struct rcu_state *rsp,
-			  struct rcu_node *rnp, struct rcu_data *rdp)
-{
-	/* No-CBs CPUs do not have orphanable callbacks. */
-	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU) || rcu_is_nocb_cpu(rdp->cpu))
-		return;
-
-	/*
-	 * Orphan the callbacks. First adjust the counts. This is safe
-	 * because _rcu_barrier() excludes CPU-hotplug operations, so it
-	 * cannot be running now. Thus no memory barrier is required.
-	 */
-	if (rdp->nxtlist != NULL) {
-		rsp->qlen_lazy += rdp->qlen_lazy;
-		rsp->qlen += rdp->qlen;
-		rdp->n_cbs_orphaned += rdp->qlen;
-		rdp->qlen_lazy = 0;
-		WRITE_ONCE(rdp->qlen, 0);
-	}
-
-	/*
-	 * Next, move those callbacks still needing a grace period to
-	 * the orphanage, where some other CPU will pick them up.
-	 * Some of the callbacks might have gone partway through a grace
-	 * period, but that is too bad. They get to start over because we
-	 * cannot assume that grace periods are synchronized across CPUs.
-	 * We don't bother updating the ->nxttail[] array yet, instead
-	 * we just reset the whole thing later on.
-	 */
-	if (*rdp->nxttail[RCU_DONE_TAIL] != NULL) {
-		*rsp->orphan_nxttail = *rdp->nxttail[RCU_DONE_TAIL];
-		rsp->orphan_nxttail = rdp->nxttail[RCU_NEXT_TAIL];
-		*rdp->nxttail[RCU_DONE_TAIL] = NULL;
-	}
-
-	/*
-	 * Then move the ready-to-invoke callbacks to the orphanage,
-	 * where some other CPU will pick them up. These will not be
-	 * required to pass though another grace period: They are done.
-	 */
-	if (rdp->nxtlist != NULL) {
-		*rsp->orphan_donetail = rdp->nxtlist;
-		rsp->orphan_donetail = rdp->nxttail[RCU_DONE_TAIL];
-	}
-
-	/*
-	 * Finally, initialize the rcu_data structure's list to empty and
-	 * disallow further callbacks on this CPU.
-	 */
-	init_callback_list(rdp);
-	rdp->nxttail[RCU_NEXT_TAIL] = NULL;
-}
-
-/*
- * Adopt the RCU callbacks from the specified rcu_state structure's
- * orphanage. The caller must hold the ->orphan_lock.
- */
-static void rcu_adopt_orphan_cbs(struct rcu_state *rsp, unsigned long flags)
-{
-	int i;
-	struct rcu_data *rdp = raw_cpu_ptr(rsp->rda);
-
-	/* No-CBs CPUs are handled specially. */
-	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU) ||
-	    rcu_nocb_adopt_orphan_cbs(rsp, rdp, flags))
-		return;
-
-	/* Do the accounting first. */
-	rdp->qlen_lazy += rsp->qlen_lazy;
-	rdp->qlen += rsp->qlen;
-	rdp->n_cbs_adopted += rsp->qlen;
-	if (rsp->qlen_lazy != rsp->qlen)
-		rcu_idle_count_callbacks_posted();
-	rsp->qlen_lazy = 0;
-	rsp->qlen = 0;
-
-	/*
-	 * We do not need a memory barrier here because the only way we
-	 * can get here if there is an rcu_barrier() in flight is if
-	 * we are the task doing the rcu_barrier().
-	 */
-
-	/* First adopt the ready-to-invoke callbacks. */
-	if (rsp->orphan_donelist != NULL) {
-		*rsp->orphan_donetail = *rdp->nxttail[RCU_DONE_TAIL];
-		*rdp->nxttail[RCU_DONE_TAIL] = rsp->orphan_donelist;
-		for (i = RCU_NEXT_SIZE - 1; i >= RCU_DONE_TAIL; i--)
-			if (rdp->nxttail[i] == rdp->nxttail[RCU_DONE_TAIL])
-				rdp->nxttail[i] = rsp->orphan_donetail;
-		rsp->orphan_donelist = NULL;
-		rsp->orphan_donetail = &rsp->orphan_donelist;
-	}
-
-	/* And then adopt the callbacks that still need a grace period. */
-	if (rsp->orphan_nxtlist != NULL) {
-		*rdp->nxttail[RCU_NEXT_TAIL] = rsp->orphan_nxtlist;
-		rdp->nxttail[RCU_NEXT_TAIL] = rsp->orphan_nxttail;
-		rsp->orphan_nxtlist = NULL;
-		rsp->orphan_nxttail = &rsp->orphan_nxtlist;
-	}
-}
-
-/*
  * Trace the fact that this CPU is going offline.
  */
 static void rcu_cleanup_dying_cpu(struct rcu_state *rsp)
@@ -2805,14 +2697,12 @@ static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf)
 
 /*
  * The CPU has been completely removed, and some other CPU is reporting
- * this fact from process context. Do the remainder of the cleanup,
- * including orphaning the outgoing CPU's RCU callbacks, and also
- * adopting them. There can only be one CPU hotplug operation at a time,
- * so no other CPU can be attempting to update rcu_cpu_kthread_task.
+ * this fact from process context. Do the remainder of the cleanup.
+ * There can only be one CPU hotplug operation at a time, so no need for
+ * explicit locking.
  */
 static void rcu_cleanup_dead_cpu(int cpu, struct rcu_state *rsp)
 {
-	unsigned long flags;
 	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
 	struct rcu_node *rnp = rdp->mynode;	/* Outgoing CPU's rdp & rnp. */
 
@@ -2821,16 +2711,6 @@ static void rcu_cleanup_dead_cpu(int cpu, struct rcu_state *rsp)
 
 	/* Adjust any no-longer-needed kthreads. */
 	rcu_boost_kthread_setaffinity(rnp, -1);
-
-	/* Orphan the dead CPU's callbacks, and adopt them if appropriate. */
-	raw_spin_lock_irqsave(&rsp->orphan_lock, flags);
-	rcu_send_cbs_to_orphanage(cpu, rsp, rnp, rdp);
-	rcu_adopt_orphan_cbs(rsp, flags);
-	raw_spin_unlock_irqrestore(&rsp->orphan_lock, flags);
-
-	WARN_ONCE(rdp->qlen != 0 || rdp->nxtlist != NULL,
-		  "rcu_cleanup_dead_cpu: Callbacks on offline CPU %d: qlen=%lu, nxtlist=%p\n",
-		  cpu, rdp->qlen, rdp->nxtlist);
 }
 
 /*
@@ -4011,6 +3891,140 @@ void rcu_report_dead(unsigned int cpu)
 	for_each_rcu_flavor(rsp)
 		rcu_cleanup_dying_idle_cpu(cpu, rsp);
 }
+
+/*
+ * Send the specified CPU's RCU callbacks to the orphanage. The
+ * specified CPU must be offline, and the caller must hold the
+ * ->orphan_lock.
+ */
+static void
+rcu_send_cbs_to_orphanage(int cpu, struct rcu_state *rsp,
+			  struct rcu_node *rnp, struct rcu_data *rdp)
+{
+	/* No-CBs CPUs do not have orphanable callbacks. */
+	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU) || rcu_is_nocb_cpu(rdp->cpu))
+		return;
+
+	/*
+	 * Orphan the callbacks. First adjust the counts. This is safe
+	 * because _rcu_barrier() excludes CPU-hotplug operations, so it
+	 * cannot be running now. Thus no memory barrier is required.
+	 */
+	if (rdp->nxtlist != NULL) {
+		rsp->qlen_lazy += rdp->qlen_lazy;
+		rsp->qlen += rdp->qlen;
+		rdp->n_cbs_orphaned += rdp->qlen;
+		rdp->qlen_lazy = 0;
+		WRITE_ONCE(rdp->qlen, 0);
+	}
+
+	/*
+	 * Next, move those callbacks still needing a grace period to
+	 * the orphanage, where some other CPU will pick them up.
+	 * Some of the callbacks might have gone partway through a grace
+	 * period, but that is too bad. They get to start over because we
+	 * cannot assume that grace periods are synchronized across CPUs.
+	 * We don't bother updating the ->nxttail[] array yet, instead
+	 * we just reset the whole thing later on.
+	 */
+	if (*rdp->nxttail[RCU_DONE_TAIL] != NULL) {
+		*rsp->orphan_nxttail = *rdp->nxttail[RCU_DONE_TAIL];
+		rsp->orphan_nxttail = rdp->nxttail[RCU_NEXT_TAIL];
+		*rdp->nxttail[RCU_DONE_TAIL] = NULL;
+	}
+
+	/*
+	 * Then move the ready-to-invoke callbacks to the orphanage,
+	 * where some other CPU will pick them up. These will not be
+	 * required to pass though another grace period: They are done.
+	 */
+	if (rdp->nxtlist != NULL) {
+		*rsp->orphan_donetail = rdp->nxtlist;
+		rsp->orphan_donetail = rdp->nxttail[RCU_DONE_TAIL];
+	}
+
+	/*
+	 * Finally, initialize the rcu_data structure's list to empty and
+	 * disallow further callbacks on this CPU.
+	 */
+	init_callback_list(rdp);
+	rdp->nxttail[RCU_NEXT_TAIL] = NULL;
+}
+
+/*
+ * Adopt the RCU callbacks from the specified rcu_state structure's
+ * orphanage. The caller must hold the ->orphan_lock.
+ */
+static void rcu_adopt_orphan_cbs(struct rcu_state *rsp, unsigned long flags)
+{
+	int i;
+	struct rcu_data *rdp = raw_cpu_ptr(rsp->rda);
+
+	/* No-CBs CPUs are handled specially. */
+	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU) ||
+	    rcu_nocb_adopt_orphan_cbs(rsp, rdp, flags))
+		return;
+
+	/* Do the accounting first. */
+	rdp->qlen_lazy += rsp->qlen_lazy;
+	rdp->qlen += rsp->qlen;
+	rdp->n_cbs_adopted += rsp->qlen;
+	if (rsp->qlen_lazy != rsp->qlen)
+		rcu_idle_count_callbacks_posted();
+	rsp->qlen_lazy = 0;
+	rsp->qlen = 0;
+
+	/*
+	 * We do not need a memory barrier here because the only way we
+	 * can get here if there is an rcu_barrier() in flight is if
+	 * we are the task doing the rcu_barrier().
+	 */
+
+	/* First adopt the ready-to-invoke callbacks. */
+	if (rsp->orphan_donelist != NULL) {
+		*rsp->orphan_donetail = *rdp->nxttail[RCU_DONE_TAIL];
+		*rdp->nxttail[RCU_DONE_TAIL] = rsp->orphan_donelist;
+		for (i = RCU_NEXT_SIZE - 1; i >= RCU_DONE_TAIL; i--)
+			if (rdp->nxttail[i] == rdp->nxttail[RCU_DONE_TAIL])
+				rdp->nxttail[i] = rsp->orphan_donetail;
+		rsp->orphan_donelist = NULL;
+		rsp->orphan_donetail = &rsp->orphan_donelist;
+	}
+
+	/* And then adopt the callbacks that still need a grace period. */
+	if (rsp->orphan_nxtlist != NULL) {
+		*rdp->nxttail[RCU_NEXT_TAIL] = rsp->orphan_nxtlist;
+		rdp->nxttail[RCU_NEXT_TAIL] = rsp->orphan_nxttail;
+		rsp->orphan_nxtlist = NULL;
+		rsp->orphan_nxttail = &rsp->orphan_nxtlist;
+	}
+}
+
+/* Orphan the dead CPU's callbacks, and then adopt them. */
+static void rcu_migrate_callbacks(int cpu, struct rcu_state *rsp)
+{
+	unsigned long flags;
+	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+	struct rcu_node *rnp = rdp->mynode;	/* Outgoing CPU's rdp & rnp. */
+
+	raw_spin_lock_irqsave(&rsp->orphan_lock, flags);
+	rcu_send_cbs_to_orphanage(cpu, rsp, rnp, rdp);
+	rcu_adopt_orphan_cbs(rsp, flags);
+	raw_spin_unlock_irqrestore(&rsp->orphan_lock, flags);
+}
+
+/*
+ * The outgoing CPU has just passed through the dying-idle state,
+ * and we are being invoked from the CPU that was IPIed to continue the
+ * offline operation. We need to migrate the outgoing CPU's callbacks.
+ */
+void rcutree_migrate_callbacks(int cpu)
+{
+	struct rcu_state *rsp;
+
+	for_each_rcu_flavor(rsp)
+		rcu_migrate_callbacks(cpu, rsp);
+}
 #endif
 
 static int rcu_pm_notify(struct notifier_block *self,
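For context, the heart of the patch above is a two-step splice: the dead CPU's callback list is moved onto a shared orphan list (rcu_send_cbs_to_orphanage()) and then adopted by the CPU running the hotplug teardown (rcu_adopt_orphan_cbs()), now driven from takedown_cpu() via rcutree_migrate_callbacks(). The toy userspace model below is illustrative only; all structure and function names here are hypothetical stand-ins, and the kernel's segmented ->nxttail[] lists, no-CBs handling, and ->orphan_lock locking are deliberately omitted.

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy model of an RCU callback list (illustrative only, not kernel code). */
struct cb {
	struct cb *next;
	int id;
};

struct cpu_data {		/* stands in for struct rcu_data */
	struct cb *head;
	struct cb **tail;
};

struct orphanage {		/* stands in for the rcu_state orphan lists */
	struct cb *head;
	struct cb **tail;
};

static void cpu_init(struct cpu_data *cd)
{
	cd->head = NULL;
	cd->tail = &cd->head;
}

static void enqueue(struct cpu_data *cd, int id)
{
	struct cb *cb = malloc(sizeof(*cb));

	cb->next = NULL;
	cb->id = id;
	*cd->tail = cb;
	cd->tail = &cb->next;
}

/* Step 1: splice the dead CPU's callbacks onto the shared orphan list. */
static void send_cbs_to_orphanage(struct cpu_data *dead, struct orphanage *o)
{
	if (!dead->head)
		return;
	*o->tail = dead->head;
	o->tail = dead->tail;
	cpu_init(dead);		/* the dead CPU's list is now empty */
}

/* Step 2: the surviving CPU adopts everything in the orphanage. */
static void adopt_orphan_cbs(struct cpu_data *self, struct orphanage *o)
{
	if (!o->head)
		return;
	*self->tail = o->head;
	self->tail = o->tail;
	o->head = NULL;
	o->tail = &o->head;
}

int main(void)
{
	struct cpu_data dead, survivor;
	struct orphanage o = { .head = NULL, .tail = &o.head };
	struct cb *cb;

	cpu_init(&dead);
	cpu_init(&survivor);
	enqueue(&dead, 1);
	enqueue(&dead, 2);
	enqueue(&survivor, 3);

	/* Analogous to rcutree_migrate_callbacks(dead_cpu). */
	send_cbs_to_orphanage(&dead, &o);
	adopt_orphan_cbs(&survivor, &o);

	for (cb = survivor.head; cb; cb = cb->next)
		printf("callback %d now queued on surviving CPU\n", cb->id);
	return 0;
}
```

Running the sketch shows the dead CPU's callbacks requeued behind the survivor's own callback, mirroring how the real code hands orphaned callbacks to the surviving CPU instead of leaving them stranded.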