Message ID | 20190711071756.2784-1-peterx@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [kvm-unit-tests] tscdeadline_latency: Check condition first before loop | expand |
On Thu, Jul 11, 2019 at 03:17:56PM +0800, Peter Xu wrote: > This patch fixes a tscdeadline_latency hang when specifying a very > small breakmax value. It's easily reproduced on my host with > parameters like "200000 10000 10" (set breakmax to 10 TSC clocks). > > The problem is test_tsc_deadline_timer() can be very slow because > we've got printf() in there. So when reach the main loop we might > have already triggered the IRQ handler for multiple times and we might > have triggered the hitmax condition which will turn IRQ off. Then > with no IRQ that first HLT instruction can last forever. > > Fix this by simply checking the condition first in the loop. > > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > x86/tscdeadline_latency.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/x86/tscdeadline_latency.c b/x86/tscdeadline_latency.c > index 0617a1b..4ee5917 100644 > --- a/x86/tscdeadline_latency.c > +++ b/x86/tscdeadline_latency.c > @@ -118,9 +118,9 @@ int main(int argc, char **argv) > test_tsc_deadline_timer(); > irq_enable(); > > - do { > + /* The condition might have triggered already, so check before HLT. */ > + while (!hitmax && table_idx < size) Hmm... I think this is not ideal too in that variables (e.g., hitmax) could logically still change between the condition check and HLT below (though this patch already runs nicely here). Maybe we can simply use "nop" or "pause" instead of "hlt". I tested that using pause fixes the problem too. > asm volatile("hlt"); > - } while (!hitmax && table_idx < size); > > for (i = 0; i < table_idx; i++) { > if (hitmax && i == table_idx-1) > -- > 2.21.0 > Regards,
On Thu, Jul 11, 2019 at 03:33:35PM +0800, Peter Xu wrote: > On Thu, Jul 11, 2019 at 03:17:56PM +0800, Peter Xu wrote: > > This patch fixes a tscdeadline_latency hang when specifying a very > > small breakmax value. It's easily reproduced on my host with > > parameters like "200000 10000 10" (set breakmax to 10 TSC clocks). > > > > The problem is test_tsc_deadline_timer() can be very slow because > > we've got printf() in there. So when reach the main loop we might > > have already triggered the IRQ handler for multiple times and we might > > have triggered the hitmax condition which will turn IRQ off. Then > > with no IRQ that first HLT instruction can last forever. > > > > Fix this by simply checking the condition first in the loop. > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > --- > > x86/tscdeadline_latency.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/x86/tscdeadline_latency.c b/x86/tscdeadline_latency.c > > index 0617a1b..4ee5917 100644 > > --- a/x86/tscdeadline_latency.c > > +++ b/x86/tscdeadline_latency.c > > @@ -118,9 +118,9 @@ int main(int argc, char **argv) > > test_tsc_deadline_timer(); > > irq_enable(); > > > > - do { > > + /* The condition might have triggered already, so check before HLT. */ > > + while (!hitmax && table_idx < size) > > Hmm... I think this is not ideal too in that variables (e.g., hitmax) > could logically still change between the condition check and HLT below > (though this patch already runs nicely here). Maybe we can simply use > "nop" or "pause" instead of "hlt". > > I tested that using pause fixes the problem too. Ensuring the first hlt lands in an interrupt shadow should prevent getting into a halted state after the timer has been disabled, e.g.: irq_disable(); test_tsc_deadline_timer(); do { safe_halt(); } while (!hitmax && table_idx < size); > > > asm volatile("hlt"); > > - } while (!hitmax && table_idx < size); > > > > for (i = 0; i < table_idx; i++) { > > if (hitmax && i == table_idx-1) > > -- > > 2.21.0 > > > > Regards, > > -- > Peter Xu
On Thu, Jul 11, 2019 at 07:05:53AM -0700, Sean Christopherson wrote: > Ensuring the first hlt lands in an interrupt shadow should prevent getting > into a halted state after the timer has been disabled, e.g.: > > irq_disable(); > test_tsc_deadline_timer(); > > do { > safe_halt(); > } while (!hitmax && table_idx < size); Yes seems better, thanks for the suggestion (though I'll probably also need to remove the hidden sti in start_tsc_deadline_timer). Is safe_halt() really safe? I mean, IRQ handler could still run before HLT right after STI right? Though no matter what I think it's fine for this test case because we'll skip the first IRQ after all. Just curious. Thanks,
On Fri, Jul 12, 2019 at 07:27:36AM +0800, Peter Xu wrote: > On Thu, Jul 11, 2019 at 07:05:53AM -0700, Sean Christopherson wrote: > > Ensuring the first hlt lands in an interrupt shadow should prevent getting > > into a halted state after the timer has been disabled, e.g.: > > > > irq_disable(); > > test_tsc_deadline_timer(); > > > > do { > > safe_halt(); > > } while (!hitmax && table_idx < size); > > Yes seems better, thanks for the suggestion (though I'll probably also > need to remove the hidden sti in start_tsc_deadline_timer). > > Is safe_halt() really safe? I mean, IRQ handler could still run > before HLT right after STI right? Though no matter what I think it's > fine for this test case because we'll skip the first IRQ after all. > Just curious. It's safe, at least on modern hardware. Everything since P6, and I think all AMD CPUs?, have an interrupt shadow where interrupts are blocked for one additional instruction after being enabled by STI.
diff --git a/x86/tscdeadline_latency.c b/x86/tscdeadline_latency.c index 0617a1b..4ee5917 100644 --- a/x86/tscdeadline_latency.c +++ b/x86/tscdeadline_latency.c @@ -118,9 +118,9 @@ int main(int argc, char **argv) test_tsc_deadline_timer(); irq_enable(); - do { + /* The condition might have triggered already, so check before HLT. */ + while (!hitmax && table_idx < size) asm volatile("hlt"); - } while (!hitmax && table_idx < size); for (i = 0; i < table_idx; i++) { if (hitmax && i == table_idx-1)
This patch fixes a tscdeadline_latency hang when specifying a very small breakmax value. It's easily reproduced on my host with parameters like "200000 10000 10" (set breakmax to 10 TSC clocks). The problem is test_tsc_deadline_timer() can be very slow because we've got printf() in there. So when reach the main loop we might have already triggered the IRQ handler for multiple times and we might have triggered the hitmax condition which will turn IRQ off. Then with no IRQ that first HLT instruction can last forever. Fix this by simply checking the condition first in the loop. Signed-off-by: Peter Xu <peterx@redhat.com> --- x86/tscdeadline_latency.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)