diff mbox series

[4/7] x86/pv: Drop priv_op_ctxt.bpmatch and use pending_dbg instead

Message ID 20230915203628.837732-5-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Headers show
Series x86/pv: #DB vs %dr6 fixes, part 2 | expand

Commit Message

Andrew Cooper Sept. 15, 2023, 8:36 p.m. UTC
With a full pending_dbg field in x86_emulate_ctxt, use it rather than using a
local bpmatch field.

This simplifies the X86EMUL_OKAY path as singlestep is already accumulated by
x86_emulate() when appropriate.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Jinoh Kang <jinoh.kang.kr@gmail.com>

v2:
 * Tweak commit message to avoid referencing X86EMUL_DONE.
---
 xen/arch/x86/pv/emul-priv-op.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

Comments

Jinoh Kang Sept. 16, 2023, 7:36 a.m. UTC | #1
On 9/16/23 05:36, Andrew Cooper wrote:
> @@ -658,7 +660,7 @@ static int cf_check rep_ins(
>  
>          ++*reps;
>  
> -        if ( poc->bpmatch || hypercall_preempt_check() )
> +        if ( poc->ctxt.retire.pending_dbg || hypercall_preempt_check() )
>              break;
>  
>          /* x86_emulate() clips the repetition count to ensure we don't wrap. */

(snip)

> @@ -726,7 +729,7 @@ static int cf_check rep_outs(
>  
>          ++*reps;
>  
> -        if ( poc->bpmatch || hypercall_preempt_check() )
> +        if ( poc->ctxt.retire.pending_dbg || hypercall_preempt_check() )
>              break;
>  
>          /* x86_emulate() clips the repetition count to ensure we don't wrap. */

These two hunks look like a behavioral change in singlestep mode.

This is actually a fix, assuming the emulator previously did not handle
'rep {in,out}s' in singlestep mode correctly, since it now checks for
PENDING_DBG.BS in addition to PENDING_DBG.B[0-4].

If this is the case, (at least) this part of the patch looks like a stable
candidate.  You might want to edit the commit message to reflect that.

(Ideally all the HWBP handling should be part of the emulator logic, but
 I don't see an easy way to generalize the PV-specific logic.  It could
 be its own patch anyway.)
Andrew Cooper Sept. 16, 2023, 2 p.m. UTC | #2
On 16/09/2023 8:36 am, Jinoh Kang wrote:
> On 9/16/23 05:36, Andrew Cooper wrote:
>> @@ -658,7 +660,7 @@ static int cf_check rep_ins(
>>  
>>          ++*reps;
>>  
>> -        if ( poc->bpmatch || hypercall_preempt_check() )
>> +        if ( poc->ctxt.retire.pending_dbg || hypercall_preempt_check() )
>>              break;
>>  
>>          /* x86_emulate() clips the repetition count to ensure we don't wrap. */
> (snip)
>
>> @@ -726,7 +729,7 @@ static int cf_check rep_outs(
>>  
>>          ++*reps;
>>  
>> -        if ( poc->bpmatch || hypercall_preempt_check() )
>> +        if ( poc->ctxt.retire.pending_dbg || hypercall_preempt_check() )
>>              break;
>>  
>>          /* x86_emulate() clips the repetition count to ensure we don't wrap. */
> These two hunks look like a behavioral change in singlestep mode.
>
> This is actually a fix, assuming the emulator previously did not handle
> 'rep {in,out}s' in singlestep mode correctly, since it now checks for
> PENDING_DBG.BS in addition to PENDING_DBG.B[0-4].

The emulator should handle this correctly already.  I had been meaning
to test this, but hadn't so far - guess I should fix that.

x86_emulate.c line 511 in get_rep_prefix() sets max_reps to 1 if
SingleStepping is active.

This in turn causes the emulator to use the io_{read,write}() hook
rather than the rep hook.

This is important, because singlestepping becoming pending is normally
evaluated at the end of the instruction.  i.e. in this example it
wouldn't show up in pending_dbg (yet).

What definitely is broken here is the recognition of a data breakpoint
on the memory operand of the INS/OUTS instruction, but it's broken
everywhere else for PV guest emulation too, so needs to go on the TODO list.

> If this is the case, (at least) this part of the patch looks like a stable
> candidate.  You might want to edit the commit message to reflect that.

We're going to try and get all the %dr6 handling issues sorted, then
decide whether to backport the lot or not.  It will entirely depend on
how invasive the fixes end up being, but I hope they'll be ok to backport.

> (Ideally all the HWBP handling should be part of the emulator logic, but
>  I don't see an easy way to generalize the PV-specific logic.  It could
>  be its own patch anyway.)

The emulator does have HWBP handling for HVM guests, because that's
architectural behaviour to look in the TSS.

PV guests are the odd-ones-out with non-standard behaviour.

~Andrew
Jinoh Kang Sept. 16, 2023, 2:42 p.m. UTC | #3
On 9/16/23 23:00, Andrew Cooper wrote:
> On 16/09/2023 8:36 am, Jinoh Kang wrote:

(snip)

>> These two hunks look like a behavioral change in singlestep mode.
>>
>> This is actually a fix, assuming the emulator previously did not handle
>> 'rep {in,out}s' in singlestep mode correctly, since it now checks for
>> PENDING_DBG.BS in addition to PENDING_DBG.B[0-4].
> 
> The emulator should handle this correctly already.  I had been meaning
> to test this, but hadn't so far - guess I should fix that.
> 
> x86_emulate.c line 511 in get_rep_prefix() sets max_reps to 1 if
> SingleStepping is active.

Thanks for informing.  Although that EFLAGS.TF check in the macro now
makes me--almost reflexively--imagine all sort of creative pathological
cases, like "mov ss, ax; rep ins"...

> 
> This in turn causes the emulator to use the io_{read,write}() hook
> rather than the rep hook.

Right.  (Frankly that part of code has too many branches to be readable.
Also the "presumably most efficient" part of the comment hints at perf
optimization sans any profiling attempts...)

> 
> This is important, because singlestepping becoming pending is normally
> evaluated at the end of the instruction.  i.e. in this example it
> wouldn't show up in pending_dbg (yet).
> 
> What definitely is broken here is the recognition of a data breakpoint
> on the memory operand of the INS/OUTS instruction, but it's broken
> everywhere else for PV guest emulation too, so needs to go on the TODO list.

(Another thing definitely broken here is the recognition of I/O breakpoints
 post the first iteration.  Maybe it would be beneficial to do differential
 testing between the {read,write}_io slowpath and rep_{in,out}s fastpath.)

> 
>> If this is the case, (at least) this part of the patch looks like a stable
>> candidate.  You might want to edit the commit message to reflect that.
> 
> We're going to try and get all the %dr6 handling issues sorted, then
> decide whether to backport the lot or not.  It will entirely depend on
> how invasive the fixes end up being, but I hope they'll be ok to backport.
> 
>> (Ideally all the HWBP handling should be part of the emulator logic, but
>>  I don't see an easy way to generalize the PV-specific logic.  It could
>>  be its own patch anyway.)
> 
> The emulator does have HWBP handling for HVM guests, because that's
> architectural behaviour to look in the TSS.

I was under such impression since I didn't immediately notice I/O
breakpoint handling in ins/outs path; maybe I haven't looked into it deeper...

> 
> PV guests are the odd-ones-out with non-standard behaviour.
> 
> ~Andrew
Andrew Cooper Sept. 16, 2023, 4:18 p.m. UTC | #4
On 16/09/2023 3:42 pm, Jinoh Kang wrote:
> On 9/16/23 23:00, Andrew Cooper wrote:
>> On 16/09/2023 8:36 am, Jinoh Kang wrote:
> (snip)
>
>>> These two hunks look like a behavioral change in singlestep mode.
>>>
>>> This is actually a fix, assuming the emulator previously did not handle
>>> 'rep {in,out}s' in singlestep mode correctly, since it now checks for
>>> PENDING_DBG.BS in addition to PENDING_DBG.B[0-4].
>> The emulator should handle this correctly already.  I had been meaning
>> to test this, but hadn't so far - guess I should fix that.
>>
>> x86_emulate.c line 511 in get_rep_prefix() sets max_reps to 1 if
>> SingleStepping is active.
> Thanks for informing.  Although that EFLAGS.TF check in the macro now
> makes me--almost reflexively--imagine all sort of creative pathological
> cases, like "mov ss, ax; rep ins"...

Yeah isn't x86 fun...

That's why it's important to accumulate in pending_dbg, which does hold
the breakpoint recognition across point where the blocked-by-movss state
inhibits the generation of #DB, and causes it to generate #DB on the
subsequent instruction boundary.

>> This in turn causes the emulator to use the io_{read,write}() hook
>> rather than the rep hook.
> Right.  (Frankly that part of code has too many branches to be readable.
> Also the "presumably most efficient" part of the comment hints at perf
> optimization sans any profiling attempts...)

Yeah it's not the greatest code.  I cant remember the exact history
there, but IIRC we used to handle the rep looping entirely in
x86_emulate() and there were no rep hooks.

There are two cases where the perf hit did get noticed.  REP OUTSB to
the qemu/bochs/Xen debug port, and Windows which does a REP MOVSB across
the various bits of PCI MMIO.

>> This is important, because singlestepping becoming pending is normally
>> evaluated at the end of the instruction.  i.e. in this example it
>> wouldn't show up in pending_dbg (yet).
>>
>> What definitely is broken here is the recognition of a data breakpoint
>> on the memory operand of the INS/OUTS instruction, but it's broken
>> everywhere else for PV guest emulation too, so needs to go on the TODO list.
> (Another thing definitely broken here is the recognition of I/O breakpoints
>  post the first iteration.  Maybe it would be beneficial to do differential
>  testing between the {read,write}_io slowpath and rep_{in,out}s fastpath.)

PV, or HVM guest?

IO breakpoint recognition is missing generally in HVM guests.  I opened
https://gitlab.com/xen-project/xen/-/work_items/171 yesterday for it,
because it's not 30s work to fix.

But, IO breakpoints are invariant of REP.  The IO port doesn't change,
so the breakpoint(s) either match on every iteration, or not at all. 
(Of course, this doesn't mean that Xen is handling the recognition
correctly.)

In this patch, there is IO breakpoint recognition on all 4 of the PV IO
port hooks, so I think it ought to work properly, whether it's the first
iteration or not.

Obviously if not, I've got another bug to figure out...


It's worth saying that patch 2 does fix a bug (an X86EMUL_OKAY/DONE
confusion), but I'm pretty sure it was only a singlestep on admin-ok PV
IO ports which would be skipped, not the breakpoints.

~Andrew
diff mbox series

Patch

diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index 0d9f84f458ba..2f73ed0682e1 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -34,7 +34,6 @@  struct priv_op_ctxt {
         unsigned long base, limit;
     } cs;
     char *io_emul_stub;
-    unsigned int bpmatch;
 };
 
 /* I/O emulation helpers.  Use non-standard calling conventions. */
@@ -367,7 +366,8 @@  static int cf_check read_io(
     if ( !guest_io_okay(port, bytes, curr, ctxt->regs) )
         return X86EMUL_UNHANDLEABLE;
 
-    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes);
+    poc->ctxt.retire.pending_dbg |=
+        check_guest_io_breakpoint(curr, port, bytes);
 
     if ( admin_io_okay(port, bytes, currd) )
     {
@@ -472,7 +472,8 @@  static int cf_check write_io(
     if ( !guest_io_okay(port, bytes, curr, ctxt->regs) )
         return X86EMUL_UNHANDLEABLE;
 
-    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes);
+    poc->ctxt.retire.pending_dbg |=
+        check_guest_io_breakpoint(curr, port, bytes);
 
     if ( admin_io_okay(port, bytes, currd) )
     {
@@ -636,7 +637,8 @@  static int cf_check rep_ins(
         return X86EMUL_EXCEPTION;
     }
 
-    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes_per_rep);
+    poc->ctxt.retire.pending_dbg |=
+        check_guest_io_breakpoint(curr, port, bytes_per_rep);
 
     while ( *reps < goal )
     {
@@ -658,7 +660,7 @@  static int cf_check rep_ins(
 
         ++*reps;
 
-        if ( poc->bpmatch || hypercall_preempt_check() )
+        if ( poc->ctxt.retire.pending_dbg || hypercall_preempt_check() )
             break;
 
         /* x86_emulate() clips the repetition count to ensure we don't wrap. */
@@ -703,7 +705,8 @@  static int cf_check rep_outs(
         return X86EMUL_EXCEPTION;
     }
 
-    poc->bpmatch = check_guest_io_breakpoint(curr, port, bytes_per_rep);
+    poc->ctxt.retire.pending_dbg |=
+        check_guest_io_breakpoint(curr, port, bytes_per_rep);
 
     while ( *reps < goal )
     {
@@ -726,7 +729,7 @@  static int cf_check rep_outs(
 
         ++*reps;
 
-        if ( poc->bpmatch || hypercall_preempt_check() )
+        if ( poc->ctxt.retire.pending_dbg || hypercall_preempt_check() )
             break;
 
         /* x86_emulate() clips the repetition count to ensure we don't wrap. */
@@ -1360,12 +1363,9 @@  int pv_emulate_privileged_op(struct cpu_user_regs *regs)
     case X86EMUL_OKAY:
         ASSERT(!curr->arch.pv.trap_bounce.flags);
 
-        if ( ctxt.ctxt.retire.singlestep )
-            ctxt.bpmatch |= DR_STEP;
-
-        if ( ctxt.bpmatch )
+        if ( ctxt.ctxt.retire.pending_dbg )
         {
-            curr->arch.dr6 |= ctxt.bpmatch | DR_STATUS_RESERVED_ONE;
+            curr->arch.dr6 |= ctxt.ctxt.retire.pending_dbg | DR_STATUS_RESERVED_ONE;
             pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
         }