Message ID | 20240615185423.49474-1-florian.lugou@provenrun.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | target/arm/helper: Fix timer interrupt masking when HCR_EL2.E2H == 0 | expand |
On Sat, 15 Jun 2024 at 19:56, Florian Lugou <florian.lugou@provenrun.com> wrote: > > CNTHCTL_EL2 based masking of timer interrupts was introduced in > f6fc36deef6abcee406211f3e2f11ff894b87fa4. This masking was however > effective no matter whether EL2 was enabled in the current security > state or not, contrary to arm specification. > > Signed-off-by: Florian Lugou <florian.lugou@provenrun.com> > --- > target/arm/helper.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/target/arm/helper.c b/target/arm/helper.c > index ce31957235..60e2344c68 100644 > --- a/target/arm/helper.c > +++ b/target/arm/helper.c > @@ -2684,7 +2684,8 @@ static void gt_update_irq(ARMCPU *cpu, int timeridx) > * If bit CNTHCTL_EL2.CNT[VP]MASK is set, it overrides IMASK. > * It is RES0 in Secure and NonSecure state. > */ > - if ((ss == ARMSS_Root || ss == ARMSS_Realm) && > + if ((arm_hcr_el2_eff(env) & HCR_E2H) && > + (ss == ARMSS_Root || ss == ARMSS_Realm) && When the architecture says "is EL2 enabled in the current security state" it doesn't mean "is HCR_EL2.E2H set?", it means "is this either NonSecure/Realm or else is SCR_EL2.EEL2 set?". Compare the pseudocode EL2Enabled() and QEMU's arm_is_el2_enabled() and arm_is_el2_enabled_secstate() functions. This doesn't mean much in Root state, and for Realm state EL2 is always enabled (assuming it is implemented). For this timer check, we're doing I think the same thing as the pseudocode AArch64.CheckTimerConditions(), which does: if (IsFeatureImplemented(FEAT_RME) && ss IN {SS_Root, SS_Realm} && CNTHCTL_EL2.CNTPMASK == '1') then imask = '1'; so I'm inclined to say that our current implementation in QEMU is correct. > ((timeridx == GTIMER_VIRT && (cnthctl & R_CNTHCTL_CNTVMASK_MASK)) || > (timeridx == GTIMER_PHYS && (cnthctl & R_CNTHCTL_CNTPMASK_MASK)))) { > irqstate = 0; > -- thanks -- PMM
On Thu, Jun 20, 2024 at 11:43:17AM +0100, Peter Maydell wrote: > On Sat, 15 Jun 2024 at 19:56, Florian Lugou <florian.lugou@provenrun.com> wrote: > > > > CNTHCTL_EL2 based masking of timer interrupts was introduced in > > f6fc36deef6abcee406211f3e2f11ff894b87fa4. This masking was however > > effective no matter whether EL2 was enabled in the current security > > state or not, contrary to arm specification. > > > > Signed-off-by: Florian Lugou <florian.lugou@provenrun.com> > > --- > > target/arm/helper.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/target/arm/helper.c b/target/arm/helper.c > > index ce31957235..60e2344c68 100644 > > --- a/target/arm/helper.c > > +++ b/target/arm/helper.c > > @@ -2684,7 +2684,8 @@ static void gt_update_irq(ARMCPU *cpu, int timeridx) > > * If bit CNTHCTL_EL2.CNT[VP]MASK is set, it overrides IMASK. > > * It is RES0 in Secure and NonSecure state. > > */ > > - if ((ss == ARMSS_Root || ss == ARMSS_Realm) && > > + if ((arm_hcr_el2_eff(env) & HCR_E2H) && > > + (ss == ARMSS_Root || ss == ARMSS_Realm) && > > When the architecture says "is EL2 enabled in the current security state" > it doesn't mean "is HCR_EL2.E2H set?", it means "is this either NonSecure/Realm > or else is SCR_EL2.EEL2 set?". Compare the pseudocode EL2Enabled() > and QEMU's arm_is_el2_enabled() and arm_is_el2_enabled_secstate() functions. > This doesn't mean much in Root state, and for Realm state EL2 is always > enabled (assuming it is implemented). > > For this timer check, we're doing I think the same thing as the > pseudocode AArch64.CheckTimerConditions(), which does: > > if (IsFeatureImplemented(FEAT_RME) && ss IN {SS_Root, SS_Realm} && > CNTHCTL_EL2.CNTPMASK == '1') then > imask = '1'; > > so I'm inclined to say that our current implementation in QEMU is correct. Indeed. I got confused with the specification, my apologies. I am facing an issue with QEMU freezing waiting for a timer interrupt when running with -icount shift=0,sleep=off. Bissection has shown that the issue appeared with f6fc36deef6abcee406211f3e2f11ff894b87fa4. Further testing suggests that the issue may come from gt_recalc_timer. Calling gt_update_irq before timer_mod (as it was done before f6fc36deef6a) rather than at the end of the function solves the issue. Is it possible that timer_mod relies on cpu->gt_timer_outputs, which has not been modified at this point to reflect the timer triggering? > > > ((timeridx == GTIMER_VIRT && (cnthctl & R_CNTHCTL_CNTVMASK_MASK)) || > > (timeridx == GTIMER_PHYS && (cnthctl & R_CNTHCTL_CNTPMASK_MASK)))) { > > irqstate = 0; > > -- > > thanks > -- PMM Best,
On Thu, 20 Jun 2024 at 14:56, Florian Lugou <florian.lugou@provenrun.com> wrote: > > On Thu, Jun 20, 2024 at 11:43:17AM +0100, Peter Maydell wrote: > > For this timer check, we're doing I think the same thing as the > > pseudocode AArch64.CheckTimerConditions(), which does: > > > > if (IsFeatureImplemented(FEAT_RME) && ss IN {SS_Root, SS_Realm} && > > CNTHCTL_EL2.CNTPMASK == '1') then > > imask = '1'; > > > > so I'm inclined to say that our current implementation in QEMU is correct. > > Indeed. I got confused with the specification, my apologies. > > I am facing an issue with QEMU freezing waiting for a timer interrupt when > running with -icount shift=0,sleep=off. Bissection has shown that the issue > appeared with f6fc36deef6abcee406211f3e2f11ff894b87fa4. > > Further testing suggests that the issue may come from gt_recalc_timer. Calling > gt_update_irq before timer_mod (as it was done before f6fc36deef6a) rather than > at the end of the function solves the issue. Is it possible that timer_mod > relies on cpu->gt_timer_outputs, which has not been modified at this point to > reflect the timer triggering? I don't *think* it ought to care -- timer_mod() tells QEMU's timer infrastructure when to schedule the next timer callback for, and the gt_timer_outputs qemu_irqs tell the interrupt controller that the interrupt lines have changed state. Do you have a reproduce case? -- PMM
On Thu, Jun 20, 2024 at 08:01:01PM +0100, Peter Maydell wrote: > On Thu, 20 Jun 2024 at 14:56, Florian Lugou <florian.lugou@provenrun.com> wrote: > > > > On Thu, Jun 20, 2024 at 11:43:17AM +0100, Peter Maydell wrote: > > > For this timer check, we're doing I think the same thing as the > > > pseudocode AArch64.CheckTimerConditions(), which does: > > > > > > if (IsFeatureImplemented(FEAT_RME) && ss IN {SS_Root, SS_Realm} && > > > CNTHCTL_EL2.CNTPMASK == '1') then > > > imask = '1'; > > > > > > so I'm inclined to say that our current implementation in QEMU is correct. > > > > Indeed. I got confused with the specification, my apologies. > > > > I am facing an issue with QEMU freezing waiting for a timer interrupt when > > running with -icount shift=0,sleep=off. Bissection has shown that the issue > > appeared with f6fc36deef6abcee406211f3e2f11ff894b87fa4. > > > > Further testing suggests that the issue may come from gt_recalc_timer. Calling > > gt_update_irq before timer_mod (as it was done before f6fc36deef6a) rather than > > at the end of the function solves the issue. Is it possible that timer_mod > > relies on cpu->gt_timer_outputs, which has not been modified at this point to > > reflect the timer triggering? > > I don't *think* it ought to care -- timer_mod() tells QEMU's timer > infrastructure when to schedule the next timer callback for, > and the gt_timer_outputs qemu_irqs tell the interrupt controller > that the interrupt lines have changed state. > > Do you have a reproduce case? I do: $ cat test.S .section .text .global __start __start: /* Setup exception table */ ldr x0, =exc_vector_table msr vbar_el3, x0 /* Enable and mask secure physical timer */ mrs x0, CNTPS_CTL_EL1 orr x0, x0, 3 msr CNTPS_CTL_EL1, x0 mov x0, 0x8000000 /* GIC base address */ /* Enable group 0 */ ldr w1, [x0, 0] /* GICD_CTLR */ orr w1, w1, 0x1 str w1, [x0, 0] /* GICD_CTLR */ /* Enable timer interrupt */ add x0, x0, 0xb0000 /* SGI_base */ mov w1, (1 << 29) str w1, [x0, 0x100] /* GICR_ISENABLER0 */ /* Enable all priorities */ mov x0, 0xff msr ICC_PMR_EL1, x0 mov x0, 1 msr ICC_IGRPEN0_EL1, x0 /* Set timer compare value ~0.8s in the future */ mrs x0, CNTPCT_EL0 mov x1, 0x3000000 add x0, x0, x1 msr CNTPS_CVAL_EL1, x0 /* Unmask the timer */ mrs x0, CNTPS_CTL_EL1 bic x0, x0, 2 msr CNTPS_CTL_EL1, x0 /* Enable interrupts */ mrs x0, SCR_EL3 orr x0, x0, 4 msr SCR_EL3, x0 msr daifclr, 0x1 dsb sy /* Loop on WFI */ 0: wfi b 0b .macro EXIT .p2align 7 /* Issue a SYS_EXIT semihosting call */ mov x0, 0x18 .word 0xD45E0000 /* unreachable */ b . .endm .macro HOLE .p2align 7 b . .endm .p2align 11 exc_vector_table: HOLE /* Synchronous, from EL3, with SP_EL0 */ HOLE /* IRQ, from EL3, with SP_EL0 */ HOLE /* FIQ, from EL3, with SP_EL0 */ HOLE /* SError, from EL3, with SP_EL0 */ HOLE /* Synchronous, from EL3, with SP_ELx */ HOLE /* IRQ, from EL3, with SP_ELx */ EXIT /* FIQ, from EL3, with SP_ELx */ HOLE /* SError, from EL3, with SP_ELx */ HOLE /* Synchronous, from lower, with lvl n-1 aarch64 */ HOLE /* IRQ, from lower, with lvl n-1 aarch64 */ HOLE /* FIQ, from lower, with lvl n-1 aarch64 */ HOLE /* SError, from lower, with lvl n-1 aarch64 */ HOLE /* Synchronous, from lower, with lvl n-1 aarch32 */ HOLE /* IRQ, from lower, with lvl n-1 aarch32 */ HOLE /* FIQ, from lower, with lvl n-1 aarch32 */ HOLE /* SError, from lower, with lvl n-1 aarch32 */ $ aarch64-none-elf-gcc -ffreestanding -nostdlib -T qemu/tests/tcg/aarch64/system/kernel.ld -o test test.S $ qemu-system-aarch64 \ -machine virt,secure=on,gic-version=3 \ -cpu cortex-a57 \ -kernel test \ -display none \ -semihosting $ # Exits after ~1s $ qemu-system-aarch64 \ -machine virt,secure=on,gic-version=3 \ -cpu cortex-a57 \ -kernel test \ -display none \ -semihosting \ -icount shift=0,sleep=off ... (hangs until QEMU is killed) Best,
On Fri, 21 Jun 2024 at 15:07, Florian Lugou <florian.lugou@provenrun.com> wrote: > > On Thu, Jun 20, 2024 at 08:01:01PM +0100, Peter Maydell wrote: > > On Thu, 20 Jun 2024 at 14:56, Florian Lugou <florian.lugou@provenrun.com> wrote: > > > > > > On Thu, Jun 20, 2024 at 11:43:17AM +0100, Peter Maydell wrote: > > > > For this timer check, we're doing I think the same thing as the > > > > pseudocode AArch64.CheckTimerConditions(), which does: > > > > > > > > if (IsFeatureImplemented(FEAT_RME) && ss IN {SS_Root, SS_Realm} && > > > > CNTHCTL_EL2.CNTPMASK == '1') then > > > > imask = '1'; > > > > > > > > so I'm inclined to say that our current implementation in QEMU is correct. > > > > > > Indeed. I got confused with the specification, my apologies. > > > > > > I am facing an issue with QEMU freezing waiting for a timer interrupt when > > > running with -icount shift=0,sleep=off. Bissection has shown that the issue > > > appeared with f6fc36deef6abcee406211f3e2f11ff894b87fa4. > > > > > > Further testing suggests that the issue may come from gt_recalc_timer. Calling > > > gt_update_irq before timer_mod (as it was done before f6fc36deef6a) rather than > > > at the end of the function solves the issue. Is it possible that timer_mod > > > relies on cpu->gt_timer_outputs, which has not been modified at this point to > > > reflect the timer triggering? > > > > I don't *think* it ought to care -- timer_mod() tells QEMU's timer > > infrastructure when to schedule the next timer callback for, > > and the gt_timer_outputs qemu_irqs tell the interrupt controller > > that the interrupt lines have changed state. > > > > Do you have a reproduce case? > > I do: (Sorry I didn't get back to you on this earlier.) > $ aarch64-none-elf-gcc -ffreestanding -nostdlib -T qemu/tests/tcg/aarch64/system/kernel.ld -o test test.S > > $ qemu-system-aarch64 \ > -machine virt,secure=on,gic-version=3 \ > -cpu cortex-a57 \ > -kernel test \ > -display none \ > -semihosting > > $ # Exits after ~1s > > $ qemu-system-aarch64 \ > -machine virt,secure=on,gic-version=3 \ > -cpu cortex-a57 \ > -kernel test \ > -display none \ > -semihosting \ > -icount shift=0,sleep=off > > ... (hangs until QEMU is killed) For me, with QEMU commit 9eb51530c12ae645b, this test case exits (doesn't hang) with both these command lines. Do you still see this bug? I guess it's possible we fixed it in the last month or so, though I can't see anything obviously relevant in the git logs. thanks -- PMM
> > $ aarch64-none-elf-gcc -ffreestanding -nostdlib -T qemu/tests/tcg/aarch64/system/kernel.ld -o test test.S > > > > $ qemu-system-aarch64 \ > > -machine virt,secure=on,gic-version=3 \ > > -cpu cortex-a57 \ > > -kernel test \ > > -display none \ > > -semihosting > > > > $ # Exits after ~1s > > > > $ qemu-system-aarch64 \ > > -machine virt,secure=on,gic-version=3 \ > > -cpu cortex-a57 \ > > -kernel test \ > > -display none \ > > -semihosting \ > > -icount shift=0,sleep=off > > > > ... (hangs until QEMU is killed) > > For me, with QEMU commit 9eb51530c12ae645b, this test case > exits (doesn't hang) with both these command lines. Do you > still see this bug? I guess it's possible we fixed it in > the last month or so, though I can't see anything obviously > relevant in the git logs. Thank you for taking the time to test it. On my machine (Ubuntu 22.04), with QEMU configuration options "--target-list=aarch64-softmmu --enable-debug", running the provided test case with "-icount shift=0,sleep=off" still makes QEMU hang forever on commit 9eb51530c12ae645b. The issue was initially reported by a colleague of mine so I was hoping it would be somehow reliably reproducible. But apparently it is not. I will try to find some time to investigate a bit more. Thank you,
On Tue, 20 Aug 2024 at 12:30, Florian Lugou <florian.lugou@provenrun.com> wrote: > > > > $ aarch64-none-elf-gcc -ffreestanding -nostdlib -T qemu/tests/tcg/aarch64/system/kernel.ld -o test test.S > > > > > > $ qemu-system-aarch64 \ > > > -machine virt,secure=on,gic-version=3 \ > > > -cpu cortex-a57 \ > > > -kernel test \ > > > -display none \ > > > -semihosting > > > > > > $ # Exits after ~1s > > > > > > $ qemu-system-aarch64 \ > > > -machine virt,secure=on,gic-version=3 \ > > > -cpu cortex-a57 \ > > > -kernel test \ > > > -display none \ > > > -semihosting \ > > > -icount shift=0,sleep=off > > > > > > ... (hangs until QEMU is killed) > > > > For me, with QEMU commit 9eb51530c12ae645b, this test case > > exits (doesn't hang) with both these command lines. Do you > > still see this bug? I guess it's possible we fixed it in > > the last month or so, though I can't see anything obviously > > relevant in the git logs. > > Thank you for taking the time to test it. > > On my machine (Ubuntu 22.04), with QEMU configuration options > "--target-list=aarch64-softmmu --enable-debug", running the provided test case > with "-icount shift=0,sleep=off" still makes QEMU hang forever on commit > 9eb51530c12ae645b. > > The issue was initially reported by a colleague of mine so I was hoping it would > be somehow reliably reproducible. But apparently it is not. Somebody else reported a similar issue to this to me this week, which reminded me of a different bug that we'd found and fixed in the interim, which was enough of a hint that I figured out why this wasn't reproducible for me. This bug only reproduces if your QEMU binary isn't compiled with slirp support (which will happen if your host system doesn't have libslirp-dev or libslirp-devel installed). If slirp is present then the usermode networking backend will be present and it always has an active timer. Without slirp, the problem manifests when there are no active timers. You can also repro it on a with-slirp compile by adding "-net none" to the command line. I'll have a look at what the underlying bug is... thanks again for the handy test case. thanks -- PMM
On Thu, 6 Feb 2025 at 15:41, Peter Maydell <peter.maydell@linaro.org> wrote: > > On Tue, 20 Aug 2024 at 12:30, Florian Lugou <florian.lugou@provenrun.com> wrote: > > > > > > $ aarch64-none-elf-gcc -ffreestanding -nostdlib -T qemu/tests/tcg/aarch64/system/kernel.ld -o test test.S > > > > > > > > $ qemu-system-aarch64 \ > > > > -machine virt,secure=on,gic-version=3 \ > > > > -cpu cortex-a57 \ > > > > -kernel test \ > > > > -display none \ > > > > -semihosting > > > > > > > > $ # Exits after ~1s > > > > > > > > $ qemu-system-aarch64 \ > > > > -machine virt,secure=on,gic-version=3 \ > > > > -cpu cortex-a57 \ > > > > -kernel test \ > > > > -display none \ > > > > -semihosting \ > > > > -icount shift=0,sleep=off > > > > > > > > ... (hangs until QEMU is killed) > Somebody else reported a similar issue to this to me this week, > which reminded me of a different bug that we'd found and fixed > in the interim, which was enough of a hint that I figured out > why this wasn't reproducible for me. > > This bug only reproduces if your QEMU binary isn't compiled > with slirp support (which will happen if your host system > doesn't have libslirp-dev or libslirp-devel installed). > If slirp is present then the usermode networking backend > will be present and it always has an active timer. Without > slirp, the problem manifests when there are no active timers. > > You can also repro it on a with-slirp compile by adding > "-net none" to the command line. > > I'll have a look at what the underlying bug is... thanks again > for the handy test case. So I've looked at this, and I can see *why* it hangs, but it looks like a structural problem with icount, which I'm not super familiar with. Richard, Alex, any suggestions? The sequence of events is that the test case sets up the timer with an expiry in the future, enables interrupts, and then executes a WFI insn. What's supposed to happen is that the interrupt fires and the test case exits. In helper_wfi, we set cs->halted and do a cpu_loop_exit(), so the WFI definitely goes to sleep. We do fire the timer's callback: #0 arm_gt_stimer_cb (opaque=0x55870482d670) at ../../target/arm/helper.c:2962 #1 0x00005587013c3c86 in timerlist_run_timers (timer_list=0x5587042f5950) at ../../util/qemu-timer.c:566 #2 0x00005587013c3d3c in qemu_clock_run_timers (type=QEMU_CLOCK_VIRTUAL) at ../../util/qemu-timer.c:580 #3 0x0000558701102f4c in icount_notify_aio_contexts () at ../../accel/tcg/tcg-accel-ops-icount.c:73 #4 0x0000558701102faf in icount_handle_deadline () at ../../accel/tcg/tcg-accel-ops-icount.c:88 #5 0x0000558701103ac9 in rr_cpu_thread_fn (arg=0x55870482d670) at ../../accel/tcg/tcg-accel-ops-rr.c:234 and the timer count is correct for the timer expiry, so this part is OK. First the arm generic timer code reprograms the QEMU timer (for an expiry at INT64_MAX, ie "far far future") by calling timer_mod_ns(). timer_mod_ns() ends up calling timerlist_rearm(), which calls icount_start_warp_timer(). This is where things go wrong -- icount_start_warp_timer() notices that all CPU threads are currently idle, and decides it needs to warp the timer forwards to the next deadline, which is at the end of time -- INT64_MAX. But once timer_mod_ns() returns, the generic timer code is going to raise an interrupt (this goes through the GIC code and comes back into the CPU which calls cpu_interrupt()), so we don't want to warp the timer at all. The clock should stay exactly at the value it has and the CPU is going to have more work to do. How is this supposed to work? Shouldn't we only be doing the "start moving the icount forward to the next deadline" once we've completed all the "run timers and AIO stuff" that icount_handle_deadline() triggers, not randomly in the middle of that when this timer callback or some other one might do something to trigger an interrupt? If the arm_gt_stimer_cb() was written in the other order, so that it first raised the interrupt and then modified the QEMU timer second, this would happen to work, because raising the interrupt sets cpu->interrupt_request, which means that arm_cpu_has_work() returns true, so when icount_start_warp_timer() calls all_cpu_threads_idle() it returns false and icount_start_warp_timer() returns without doing anything. But I don't think there's any reason why timer callbacks should be obliged to reprogram their timers last, and in any case you can imagine scenarios where there are multiple timer callbacks for different timers and it's only the second timer that raises an interrupt... thanks -- PMM
On 2/7/25 07:45, Peter Maydell wrote: > This is where things go wrong -- icount_start_warp_timer() > notices that all CPU threads are currently idle, and > decides it needs to warp the timer forwards to the > next deadline, which is at the end of time -- INT64_MAX. > > But once timer_mod_ns() returns, the generic timer code > is going to raise an interrupt (this goes through the GIC > code and comes back into the CPU which calls cpu_interrupt()), > so we don't want to warp the timer at all. The clock should > stay exactly at the value it has and the CPU is going to > have more work to do. > > How is this supposed to work? Shouldn't we only be doing > the "start moving the icount forward to the next deadline" > once we've completed all the "run timers and AIO stuff" that > icount_handle_deadline() triggers, not randomly in the middle > of that when this timer callback or some other one might do > something to trigger an interrupt? I don't understand timer warping at all. And you're right, it doesn't seem like this should happen outside of a specific point in the main loop. > ... But I don't think there's any reason why > timer callbacks should be obliged to reprogram their timers > last, and in any case you can imagine scenarios where there > are multiple timer callbacks for different timers and it's > only the second timer that raises an interrupt... Agreed. r~
Richard Henderson <richard.henderson@linaro.org> writes: > On 2/7/25 07:45, Peter Maydell wrote: >> This is where things go wrong -- icount_start_warp_timer() >> notices that all CPU threads are currently idle, and >> decides it needs to warp the timer forwards to the >> next deadline, which is at the end of time -- INT64_MAX. >> But once timer_mod_ns() returns, the generic timer code >> is going to raise an interrupt (this goes through the GIC >> code and comes back into the CPU which calls cpu_interrupt()), >> so we don't want to warp the timer at all. The clock should >> stay exactly at the value it has and the CPU is going to >> have more work to do. >> How is this supposed to work? Shouldn't we only be doing >> the "start moving the icount forward to the next deadline" >> once we've completed all the "run timers and AIO stuff" that >> icount_handle_deadline() triggers, not randomly in the middle >> of that when this timer callback or some other one might do >> something to trigger an interrupt? > > I don't understand timer warping at all. And you're right, it doesn't > seem like this should happen outside of a specific point in the main > loop. This has come up before - and the conclusion was we don't know what sleep=on/off is meant to mean. If the processor is asleep and there are no timers to fire then nothing will happen. It was off-list though: Subject: Re: qemu-system-aarch64 & icount behavior Date: Wed, 22 Jul 2020 at 11:21 From: Kumar Gala <kumar.gala@linaro.org> Subject: Fwd: qemu-system-aarch64 & icount behavior Message-ID: <CAFEAcA9DnBQFnOc1HJav2DyLwsQu+YYE5RyZp5wrLxyc1gZqOQ@mail.gmail.com> Date: Fri, 24 Jul 2020 17:25:51 +0100 From: Peter Maydell <peter.maydell@linaro.org> >> ... But I don't think there's any reason why >> timer callbacks should be obliged to reprogram their timers >> last, and in any case you can imagine scenarios where there >> are multiple timer callbacks for different timers and it's >> only the second timer that raises an interrupt... > > Agreed. > > > r~
On Fri, 7 Feb 2025 at 18:29, Alex Bennée <alex.bennee@linaro.org> wrote: > > Richard Henderson <richard.henderson@linaro.org> writes: > > > On 2/7/25 07:45, Peter Maydell wrote: > >> This is where things go wrong -- icount_start_warp_timer() > >> notices that all CPU threads are currently idle, and > >> decides it needs to warp the timer forwards to the > >> next deadline, which is at the end of time -- INT64_MAX. > >> But once timer_mod_ns() returns, the generic timer code > >> is going to raise an interrupt (this goes through the GIC > >> code and comes back into the CPU which calls cpu_interrupt()), > >> so we don't want to warp the timer at all. The clock should > >> stay exactly at the value it has and the CPU is going to > >> have more work to do. > >> How is this supposed to work? Shouldn't we only be doing > >> the "start moving the icount forward to the next deadline" > >> once we've completed all the "run timers and AIO stuff" that > >> icount_handle_deadline() triggers, not randomly in the middle > >> of that when this timer callback or some other one might do > >> something to trigger an interrupt? > > > > I don't understand timer warping at all. And you're right, it doesn't > > seem like this should happen outside of a specific point in the main > > loop. > > This has come up before - and the conclusion was we don't know what > sleep=on/off is meant to mean. If the processor is asleep and there are > no timers to fire then nothing will happen. > > It was off-list though: > > Subject: Re: qemu-system-aarch64 & icount behavior No, that was a different situation. That thread was about when there genuinely is nothing to do (all CPUs asleep and no timers active) for the rest of the life of the simulation. The bug in this thread is that icount incorrectly prematurely decides it should warp the timer forwards, when in fact there is going to be something the CPU should be doing right now (i.e. responding to the interrupt the timer callback is about to raise). It becomes very obvious when there's no other timer callback, because the place that icount incorrectly warps us to is end-of-time, but I'm pretty sure that even when there is another timer active icount will still be wrongly warping time -- it will just be less obvious because the interrupt gets incorrectly delayed to whatever that subsequent timer callback time is, rather than forever. thanks -- PMM
diff --git a/target/arm/helper.c b/target/arm/helper.c index ce31957235..60e2344c68 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -2684,7 +2684,8 @@ static void gt_update_irq(ARMCPU *cpu, int timeridx) * If bit CNTHCTL_EL2.CNT[VP]MASK is set, it overrides IMASK. * It is RES0 in Secure and NonSecure state. */ - if ((ss == ARMSS_Root || ss == ARMSS_Realm) && + if ((arm_hcr_el2_eff(env) & HCR_E2H) && + (ss == ARMSS_Root || ss == ARMSS_Realm) && ((timeridx == GTIMER_VIRT && (cnthctl & R_CNTHCTL_CNTVMASK_MASK)) || (timeridx == GTIMER_PHYS && (cnthctl & R_CNTHCTL_CNTPMASK_MASK)))) { irqstate = 0;
CNTHCTL_EL2 based masking of timer interrupts was introduced in f6fc36deef6abcee406211f3e2f11ff894b87fa4. This masking was however effective no matter whether EL2 was enabled in the current security state or not, contrary to arm specification. Signed-off-by: Florian Lugou <florian.lugou@provenrun.com> --- target/arm/helper.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)