Message ID | 20200805181303.7822-2-robert.foley@linaro.org (mailing list archive)
State      | New, archived
Series     | accel/tcg: remove implied BQL from cpu_handle_interrupt/exception path
On 8/5/20 11:12 AM, Robert Foley wrote:
> This change removes the implied BQL from the cpu_handle_interrupt,
> and cpu_handle_exception paths. This BQL acquire is being pushed
> down into the per arch implementation.
>
> Signed-off-by: Robert Foley <robert.foley@linaro.org>
> ---
>  accel/tcg/cpu-exec.c | 19 +++++++++++--------
>  1 file changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 80d0e649b2..8e2bfd97a1 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -517,9 +517,7 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>  #else
>      if (replay_exception()) {
>          CPUClass *cc = CPU_GET_CLASS(cpu);
> -        qemu_mutex_lock_iothread();
>          cc->do_interrupt(cpu);
> -        qemu_mutex_unlock_iothread();
>          cpu->exception_index = -1;

This patch is not bisectable.  The removal of the lock here needs to happen at
the end, or something.

r~
On 05/08/20 21:18, Richard Henderson wrote:
> On 8/5/20 11:12 AM, Robert Foley wrote:
>> This change removes the implied BQL from the cpu_handle_interrupt,
>> and cpu_handle_exception paths. This BQL acquire is being pushed
>> down into the per arch implementation.
>>
>> [...]
>
> This patch is not bisectable.  The removal of the lock here needs to happen at
> the end, or something.

Indeed the series should be structured like this:

1) rename all *_do_interrupt functions to *_do_interrupt_locked

2) add back *_do_interrupt that takes the BQL and calls
*_do_interrupt_locked, point ->do_interrupt to it, remove the BQL from
cpu-exec.c

3) modify the cpu_mutex and BQL critical sections around
->cpu_exec_interrupt, so that the BQL critical section covers just the
call to ->cpu_exec_interrupt.  Document which fields are now covered by
cpu_mutex.

4/5) same as 1/2 for ->cpu_exec_interrupt

Patches 1/2 would be pretty large, but they're trivial to review just by
grepping for "->do_interrupt\s*=", and likewise for 4/5.

Thanks,

Paolo
On Thu, 6 Aug 2020 at 05:22, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 05/08/20 21:18, Richard Henderson wrote:
> > [...]
> >
> > This patch is not bisectable.  The removal of the lock here needs to happen at
> > the end, or something.
>
> Indeed the series should be structured like this:
>
> 1) rename all *_do_interrupt functions to *_do_interrupt_locked
>
> 2) add back *_do_interrupt that takes the BQL and calls
> *_do_interrupt_locked, point ->do_interrupt to it, remove the BQL from
> cpu-exec.c
>
> 3) modify the cpu_mutex and BQL critical sections around
> ->cpu_exec_interrupt, so that the BQL critical section covers just the
> call to ->cpu_exec_interrupt.  Document which fields are now covered by
> cpu_mutex.
>
> 4/5) same as 1/2 for ->cpu_exec_interrupt
>
> Patches 1/2 would be pretty large, but they're trivial to review just by
> grepping for "->do_interrupt\s*=", and likewise for 4/5.

Thanks for the details!

It seems like we will have 3 separate patches for this series, 1/2, 3, and 4/5.

We will go in this direction.

Thanks,
-Rob

> Thanks,
>
> Paolo
On 06/08/20 18:11, Robert Foley wrote:
>> Indeed the series should be structured like this:
>>
>> [...]
>
> Thanks for the details!
>
> It seems like we will have 3 separate patches for this series, 1/2, 3, and 4/5.

No, five patches. :)

Paolo
The comment around documenting the cpu_mutex fields and critical sections
got us thinking and revisiting our locking assumptions in cpu_handle_interrupt.

Initially we were thinking that removing the BQL from cpu_handle_interrupt
meant that we needed to replace it with the cpu mutex, to protect the per-CPU
data that is accessed there, like interrupt_request. We are reconsidering this
and now thinking that the cpu mutex might not be needed here.

The BQL is clearly needed to protect the critical section around the call to
->cpu_exec_interrupt. What else is the BQL protecting in cpu_handle_interrupt
that we need to consider? Are we missing anything here?

It's also worth mentioning that we did give this approach a try. We tried out
changes to cpu_handle_interrupt(), dropping the BQL from all but around
->cpu_exec_interrupt, and not using the cpu_mutex at all. It seemed to be
functional, passing all the tests that we tried (so far). :)

Thanks,
-Rob

On Thu, 6 Aug 2020 at 12:11, Robert Foley <robert.foley@linaro.org> wrote:
>
> [...]
We found cases where a few of the *_cpu_exec_interrupt per-arch functions
call the CPU class's cc->do_interrupt function pointer (for example,
arm_cpu_exec_interrupt does this). This is an issue because
*_cpu_exec_interrupt will hold the BQL across the call to cc->do_interrupt,
and *_do_interrupt will also take the BQL.

Most of the arches do not have this issue because they call the
*_do_interrupt function for that arch directly, and in those cases we will
change them to call *_do_interrupt_locked.

We see a few possible solutions to this:

1) We could go back to the option of conditionalizing the BQL inside these
few *_do_interrupt functions, only acquiring the BQL if it is not already
held. I counted 3 different arches that directly use the ->do_interrupt
function from their *_cpu_exec_interrupt.

2) Another, perhaps cleaner, option is to add a new CPU class function,
->do_interrupt_locked. This lets callers like *_cpu_exec_interrupt call
->do_interrupt_locked with the lock held, and solves the issue without
resorting to conditional locking.

Another benefit we could gain from this approach is to simplify our solution
overall by adding a common do_interrupt function:

void cpu_common_do_interrupt(CPUState *cs)
{
    CPUClass *cc = CPU_GET_CLASS(cs);

    qemu_mutex_lock_iothread();
    cc->do_interrupt_locked(cs);
    qemu_mutex_unlock_iothread();
}

cc->do_interrupt would be set to cpu_common_do_interrupt by default in
cpu_class_init. In other words, the base CPU class would handle holding the
BQL for us, and we would not need to implement a new *_do_interrupt function
for each arch.

We are thinking that 2) would be a good option. What are the opinions on
these possible solutions? Or are there other solutions that we should
consider here?

Thanks & Regards,
-Rob

On Thu, 6 Aug 2020 at 16:04, Robert Foley <robert.foley@linaro.org> wrote:
>
> The comment around documenting the cpu_mutex fields and critical sections
> got us thinking and revisiting our locking assumptions in cpu_handle_interrupt.
>
> [...]
On 08/08/20 00:18, Robert Foley wrote:
> 2) Another, perhaps cleaner, option is to add a new CPU class function,
> ->do_interrupt_locked. This lets callers like *_cpu_exec_interrupt call
> ->do_interrupt_locked with the lock held, and solves the issue without
> resorting to conditional locking.
>
> [...]
>
> We are thinking that 2) would be a good option.

Yes, it is.  The only slight complication is that you'd have both
->do_interrupt and ->do_interrupt_locked, so you probably should add some
consistency check, for example:

    /*
     * cc->do_interrupt_locked should only be needed if
     * the class uses cpu_common_do_interrupt.
     */
    assert(cc->do_interrupt == cpu_common_do_interrupt ||
           !cc->do_interrupt_locked);

Therefore, a variant is to add ->do_interrupt_locked to ARMCPUClass and
CRISCPUClass (target/avr/helper.c can just call avr_cpu_do_interrupt_locked,
because that's the only value that cc->do_interrupt can have).  Then ARM and
CRIS can have a do_interrupt like you wrote above:

void arm_do_interrupt(CPUState *cs)
{
    ARMCPUClass *acc = ARM_CPU_GET_CLASS(cs);

    qemu_mutex_lock_iothread();
    acc->do_interrupt_locked(cs);
    qemu_mutex_unlock_iothread();
}

with a small duplication between ARM and CRIS, but on the other hand a
simpler definition of the common CPUClass.

Paolo
On Sat, 8 Aug 2020 at 08:00, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > We are thinking that 2) would be a good option.
>
> Yes, it is.  The only slight complication is that you'd have both
> ->do_interrupt and ->do_interrupt_locked so you probably should add some
> consistency check [...]
>
> with a small duplication between ARM and CRIS but on the other hand a
> simpler definition of the common CPUClass.

Thanks for all the details! It sounds like a good approach and we will move
forward with it.

Thanks & Regards,
-Rob

> Paolo
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 80d0e649b2..8e2bfd97a1 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -517,9 +517,7 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
 #else
     if (replay_exception()) {
         CPUClass *cc = CPU_GET_CLASS(cpu);
-        qemu_mutex_lock_iothread();
         cc->do_interrupt(cpu);
-        qemu_mutex_unlock_iothread();
         cpu->exception_index = -1;

         if (unlikely(cpu->singlestep_enabled)) {
@@ -558,7 +556,7 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
     if (unlikely(cpu_interrupt_request(cpu))) {
         int interrupt_request;

-        qemu_mutex_lock_iothread();
+        cpu_mutex_lock(cpu);
         interrupt_request = cpu_interrupt_request(cpu);
         if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
             /* Mask out external interrupts for this step. */
@@ -567,7 +565,7 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
         if (interrupt_request & CPU_INTERRUPT_DEBUG) {
             cpu_reset_interrupt(cpu, CPU_INTERRUPT_DEBUG);
             cpu->exception_index = EXCP_DEBUG;
-            qemu_mutex_unlock_iothread();
+            cpu_mutex_unlock(cpu);
             return true;
         }
         if (replay_mode == REPLAY_MODE_PLAY && !replay_has_interrupt()) {
@@ -577,13 +575,15 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
             cpu_reset_interrupt(cpu, CPU_INTERRUPT_HALT);
             cpu_halted_set(cpu, 1);
             cpu->exception_index = EXCP_HLT;
-            qemu_mutex_unlock_iothread();
+            cpu_mutex_unlock(cpu);
             return true;
         }
 #if defined(TARGET_I386)
         else if (interrupt_request & CPU_INTERRUPT_INIT) {
             X86CPU *x86_cpu = X86_CPU(cpu);
             CPUArchState *env = &x86_cpu->env;
+            cpu_mutex_unlock(cpu);
+            qemu_mutex_lock_iothread();
             replay_interrupt();
             cpu_svm_check_intercept_param(env, SVM_EXIT_INIT, 0, 0);
             do_cpu_init(x86_cpu);
@@ -595,7 +595,7 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
         else if (interrupt_request & CPU_INTERRUPT_RESET) {
             replay_interrupt();
             cpu_reset(cpu);
-            qemu_mutex_unlock_iothread();
+            cpu_mutex_unlock(cpu);
             return true;
         }
 #endif
@@ -604,7 +604,9 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
            True when it is, and we should restart on a new TB,
            and via longjmp via cpu_loop_exit.  */
         else {
+            cpu_mutex_unlock(cpu);
             if (cc->cpu_exec_interrupt(cpu, interrupt_request)) {
+                cpu_mutex_lock(cpu);
                 replay_interrupt();
                 /*
                  * After processing the interrupt, ensure an EXCP_DEBUG is
@@ -614,6 +616,8 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
                 cpu->exception_index =
                     (cpu->singlestep_enabled ? EXCP_DEBUG : -1);
                 *last_tb = NULL;
+            } else {
+                cpu_mutex_lock(cpu);
             }
             /* The target hook may have updated the 'cpu->interrupt_request';
              * reload the 'interrupt_request' value */
@@ -627,7 +631,7 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
         }

         /* If we exit via cpu_loop_exit/longjmp it is reset in cpu_exec */
-        qemu_mutex_unlock_iothread();
+        cpu_mutex_unlock(cpu);
     }

     /* Finally, check if we need to exit to the main loop. */
@@ -691,7 +695,6 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
     }
 #endif
 }
-
 /* main execution loop */

 int cpu_exec(CPUState *cpu)
This change removes the implied BQL from the cpu_handle_interrupt,
and cpu_handle_exception paths. This BQL acquire is being pushed
down into the per arch implementation.

Signed-off-by: Robert Foley <robert.foley@linaro.org>
---
 accel/tcg/cpu-exec.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)