Message ID | 185bbccb-0156-07fe-d060-6135cae07caf@bitdefender.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 03/29/2017 06:04 PM, Razvan Cojocaru wrote: > On 03/29/2017 05:00 PM, Razvan Cojocaru wrote: >> On 03/29/2017 04:55 PM, Jan Beulich wrote: >>>>>> On 28.03.17 at 12:50, <rcojocaru@bitdefender.com> wrote: >>>> On 03/28/2017 01:47 PM, Jan Beulich wrote: >>>>>>>> On 28.03.17 at 12:27, <rcojocaru@bitdefender.com> wrote: >>>>>> On 03/28/2017 01:03 PM, Jan Beulich wrote: >>>>>>>>>> On 28.03.17 at 11:14, <rcojocaru@bitdefender.com> wrote: >>>>>>>> I'm not sure that the RETRY model is what the guest OS expects. AFAIK, a >>>>>>>> failed CMPXCHG should happen just once, with the proper registers and ZF >>>>>>>> set. The guest surely expects neither that the instruction resume until >>>>>>>> it succeeds, nor that some hidden loop goes on for an undeterminate >>>>>>>> ammount of time until a CMPXCHG succeeds. >>>>>>> >>>>>>> The guest doesn't observe the CMPXCHG failing - RETRY leads to >>>>>>> the instruction being restarted instead of completed. >>>>>> >>>>>> Indeed, but it works differently with hvm_emulate_one_vm_event() where >>>>>> RETRY currently would have the instruction be re-executed (properly >>>>>> re-executed, not just re-emulated) by the guest. >>>>> >>>>> Right - see my other reply to Andrew: The function likely would >>>>> need to tell apart guest CMPXCHG uses from us using the insn to >>>>> carry out the write by some other one. That may involve >>>>> adjustments to the memory write logic in x86_emulate() itself, as >>>>> the late failure of the comparison then would also need to be >>>>> communicated back (via ZF clear) to the guest. >>>> >>>> Exactly, it would require quite some reworking of x86_emulate(). >>> >>> I had imagined it to be less intrusive (outside of x86_emulate()), >>> but I've now learned why Andrew was able to get rid of >>> X86EMUL_CMPXCHG_FAILED - the apparently intended behavior >>> was never implemented. Attached a first take at it, which has >>> seen smoke testing, but nothing more. The way it ends up being >>> I don't think this can reasonably be considered for 4.9 at this >>> point in time. (Also Cc-ing Tim for the shadow code changes, >>> even if this isn't really a proper patch submission.) >> >> Thanks! I'll give a spin with a modified version of my CMPXCHG patch as >> soon as possible. > > With the attached patch with hvmemul_cmpxchg() now returning > X86EMUL_CMPXCHG_FAILED if __cmpxchg() fails my (32-bit) Windows 7 guest > gets stuck at the "Starting Windows" screen. And again this change: 1162 if ( __cmpxchg(map, old, new, bytes) != old ) 1163 { 1164 memcpy(p_old, map, bytes); 1165 rc = X86EMUL_CMPXCHG_FAILED; 1166 } i.e. doing the accumulator <- destination part of a failed CMPXCHG which might be missing from your patch leads me again to BSODs. I'm not sure if __cmpxchg() should work differently and do this atomically, or if this should be done in x86_emulate() and it's not, or if it is done there somewhere I've missed in the first patch. Thanks, Razvan
>>> On 29.03.17 at 17:49, <rcojocaru@bitdefender.com> wrote: > On 03/29/2017 06:04 PM, Razvan Cojocaru wrote: >> On 03/29/2017 05:00 PM, Razvan Cojocaru wrote: >>> On 03/29/2017 04:55 PM, Jan Beulich wrote: >>>>>>> On 28.03.17 at 12:50, <rcojocaru@bitdefender.com> wrote: >>>>> On 03/28/2017 01:47 PM, Jan Beulich wrote: >>>>>>>>> On 28.03.17 at 12:27, <rcojocaru@bitdefender.com> wrote: >>>>>>> On 03/28/2017 01:03 PM, Jan Beulich wrote: >>>>>>>>>>> On 28.03.17 at 11:14, <rcojocaru@bitdefender.com> wrote: >>>>>>>>> I'm not sure that the RETRY model is what the guest OS expects. AFAIK, a >>>>>>>>> failed CMPXCHG should happen just once, with the proper registers and ZF >>>>>>>>> set. The guest surely expects neither that the instruction resume until >>>>>>>>> it succeeds, nor that some hidden loop goes on for an undeterminate >>>>>>>>> ammount of time until a CMPXCHG succeeds. >>>>>>>> >>>>>>>> The guest doesn't observe the CMPXCHG failing - RETRY leads to >>>>>>>> the instruction being restarted instead of completed. >>>>>>> >>>>>>> Indeed, but it works differently with hvm_emulate_one_vm_event() where >>>>>>> RETRY currently would have the instruction be re-executed (properly >>>>>>> re-executed, not just re-emulated) by the guest. >>>>>> >>>>>> Right - see my other reply to Andrew: The function likely would >>>>>> need to tell apart guest CMPXCHG uses from us using the insn to >>>>>> carry out the write by some other one. That may involve >>>>>> adjustments to the memory write logic in x86_emulate() itself, as >>>>>> the late failure of the comparison then would also need to be >>>>>> communicated back (via ZF clear) to the guest. >>>>> >>>>> Exactly, it would require quite some reworking of x86_emulate(). >>>> >>>> I had imagined it to be less intrusive (outside of x86_emulate()), >>>> but I've now learned why Andrew was able to get rid of >>>> X86EMUL_CMPXCHG_FAILED - the apparently intended behavior >>>> was never implemented. Attached a first take at it, which has >>>> seen smoke testing, but nothing more. The way it ends up being >>>> I don't think this can reasonably be considered for 4.9 at this >>>> point in time. (Also Cc-ing Tim for the shadow code changes, >>>> even if this isn't really a proper patch submission.) >>> >>> Thanks! I'll give a spin with a modified version of my CMPXCHG patch as >>> soon as possible. >> >> With the attached patch with hvmemul_cmpxchg() now returning >> X86EMUL_CMPXCHG_FAILED if __cmpxchg() fails my (32-bit) Windows 7 guest >> gets stuck at the "Starting Windows" screen. That's with or without monitoring in use? I specifically did try a 32-bit Win7 guest, and I didn't have an issue. But then again a single run may not mean much. > And again this change: > > 1162 if ( __cmpxchg(map, old, new, bytes) != old ) > 1163 { > 1164 memcpy(p_old, map, bytes); > 1165 rc = X86EMUL_CMPXCHG_FAILED; > 1166 } > > i.e. doing the accumulator <- destination part of a failed CMPXCHG which > might be missing from your patch leads me again to BSODs. Missing from my patch? Why and/or where? It's not clear to me which function the above code fragment is supposed to go into. I might guess hvmemul_cmpxchg(), but then my patch doesn't alter its behavior (from forwarding to hvmeml_write()), and hence I don't see why my patch would need to do such an adjustment. What I do note though is that you don't copy back the value __cmpxchg() returns, yet that's what is needed. *map may have changed again already. Jan
On 03/30/2017 03:05 PM, Jan Beulich wrote: >>>> On 29.03.17 at 17:49, <rcojocaru@bitdefender.com> wrote: >> On 03/29/2017 06:04 PM, Razvan Cojocaru wrote: >>> On 03/29/2017 05:00 PM, Razvan Cojocaru wrote: >>>> On 03/29/2017 04:55 PM, Jan Beulich wrote: >>>>>>>> On 28.03.17 at 12:50, <rcojocaru@bitdefender.com> wrote: >>>>>> On 03/28/2017 01:47 PM, Jan Beulich wrote: >>>>>>>>>> On 28.03.17 at 12:27, <rcojocaru@bitdefender.com> wrote: >>>>>>>> On 03/28/2017 01:03 PM, Jan Beulich wrote: >>>>>>>>>>>> On 28.03.17 at 11:14, <rcojocaru@bitdefender.com> wrote: >>>>>>>>>> I'm not sure that the RETRY model is what the guest OS expects. AFAIK, a >>>>>>>>>> failed CMPXCHG should happen just once, with the proper registers and ZF >>>>>>>>>> set. The guest surely expects neither that the instruction resume until >>>>>>>>>> it succeeds, nor that some hidden loop goes on for an undeterminate >>>>>>>>>> ammount of time until a CMPXCHG succeeds. >>>>>>>>> >>>>>>>>> The guest doesn't observe the CMPXCHG failing - RETRY leads to >>>>>>>>> the instruction being restarted instead of completed. >>>>>>>> >>>>>>>> Indeed, but it works differently with hvm_emulate_one_vm_event() where >>>>>>>> RETRY currently would have the instruction be re-executed (properly >>>>>>>> re-executed, not just re-emulated) by the guest. >>>>>>> >>>>>>> Right - see my other reply to Andrew: The function likely would >>>>>>> need to tell apart guest CMPXCHG uses from us using the insn to >>>>>>> carry out the write by some other one. That may involve >>>>>>> adjustments to the memory write logic in x86_emulate() itself, as >>>>>>> the late failure of the comparison then would also need to be >>>>>>> communicated back (via ZF clear) to the guest. >>>>>> >>>>>> Exactly, it would require quite some reworking of x86_emulate(). >>>>> >>>>> I had imagined it to be less intrusive (outside of x86_emulate()), >>>>> but I've now learned why Andrew was able to get rid of >>>>> X86EMUL_CMPXCHG_FAILED - the apparently intended behavior >>>>> was never implemented. Attached a first take at it, which has >>>>> seen smoke testing, but nothing more. The way it ends up being >>>>> I don't think this can reasonably be considered for 4.9 at this >>>>> point in time. (Also Cc-ing Tim for the shadow code changes, >>>>> even if this isn't really a proper patch submission.) >>>> >>>> Thanks! I'll give a spin with a modified version of my CMPXCHG patch as >>>> soon as possible. >>> >>> With the attached patch with hvmemul_cmpxchg() now returning >>> X86EMUL_CMPXCHG_FAILED if __cmpxchg() fails my (32-bit) Windows 7 guest >>> gets stuck at the "Starting Windows" screen. > > That's with or without monitoring in use? I specifically did try a > 32-bit Win7 guest, and I didn't have an issue. But then again a > single run may not mean much. With monitoring in use - specifically using hvm_emulate_one_vm_event(). Sorry for the ommision. >> And again this change: >> >> 1162 if ( __cmpxchg(map, old, new, bytes) != old ) >> 1163 { >> 1164 memcpy(p_old, map, bytes); >> 1165 rc = X86EMUL_CMPXCHG_FAILED; >> 1166 } >> >> i.e. doing the accumulator <- destination part of a failed CMPXCHG which >> might be missing from your patch leads me again to BSODs. > > Missing from my patch? Why and/or where? It's not clear to me which > function the above code fragment is supposed to go into. I might > guess hvmemul_cmpxchg(), but then my patch doesn't alter its > behavior (from forwarding to hvmeml_write()), and hence I don't > see why my patch would need to do such an adjustment. Right, I was thinking about this bit: 6704 if ( _regs.eflags & X86_EFLAGS_ZF ) 6705 dst.type = OP_NONE; 6706 else 6707 { 6708 /* Failure: write the value we saw to EAX. */ 6709 dst.type = OP_REG; 6710 dst.reg = (unsigned long *)&_regs.r(ax); 6711 } For some reason I had missed it, but I now see it does the writeback. My mistake. > What I do note though is that you don't copy back the value > __cmpxchg() returns, yet that's what is needed. *map may > have changed again already. True, I'll update my tests. Thanks, Razvan
On 03/30/2017 03:05 PM, Jan Beulich wrote: > What I do note though is that you don't copy back the value > __cmpxchg() returns, yet that's what is needed. *map may > have changed again already. Changing the code to: 1162 ret = __cmpxchg(map, old, new, bytes); 1163 1164 if ( ret != old ) 1165 { 1166 memcpy(p_old, &ret, bytes); 1167 rc = X86EMUL_CMPXCHG_FAILED; 1168 } where ret is an unsigned long still triggers BSODs when I add my patch to yours. I'll need to dig deeper. Thanks, Razvan
On 03/30/2017 03:56 PM, Razvan Cojocaru wrote: > On 03/30/2017 03:05 PM, Jan Beulich wrote: >> What I do note though is that you don't copy back the value >> __cmpxchg() returns, yet that's what is needed. *map may >> have changed again already. > > Changing the code to: > > 1162 ret = __cmpxchg(map, old, new, bytes); > 1163 > 1164 if ( ret != old ) > 1165 { > 1166 memcpy(p_old, &ret, bytes); > 1167 rc = X86EMUL_CMPXCHG_FAILED; > 1168 } > > where ret is an unsigned long still triggers BSODs when I add my patch > to yours. I'll need to dig deeper. Nevermind, I've found the culprit: hvm_emulate_one_vm_event()'s code needs to be wrapped in a loop that checks for X86EMUL_RETRY again, since hvmemul_cmpxchg() may return RETRY even for some mapping problems, in which case we again end up with the guest trying to re-execute an emulable CMPXCHG. However, this gets me back to my original problem when I "solved" it in the same manner (looping until emulation succeeds) back when hvmemul_cmpxchg() failures were reported with RETRY: eventually the guest BSODs with code 101. RETRY failures are still possible coming from the hvmemul_vaddr_to_mfn() code in my patch. I wonder if I should just return X86EMUL_CMPXCHG_FAILED for all those as well and just never end up returning RETRY from hvmemul_cmpxchg(). Thanks, Razvan
>>> On 30.03.17 at 16:08, <rcojocaru@bitdefender.com> wrote: > On 03/30/2017 03:56 PM, Razvan Cojocaru wrote: >> On 03/30/2017 03:05 PM, Jan Beulich wrote: >>> What I do note though is that you don't copy back the value >>> __cmpxchg() returns, yet that's what is needed. *map may >>> have changed again already. >> >> Changing the code to: >> >> 1162 ret = __cmpxchg(map, old, new, bytes); >> 1163 >> 1164 if ( ret != old ) >> 1165 { >> 1166 memcpy(p_old, &ret, bytes); >> 1167 rc = X86EMUL_CMPXCHG_FAILED; >> 1168 } >> >> where ret is an unsigned long still triggers BSODs when I add my patch >> to yours. I'll need to dig deeper. > > Nevermind, I've found the culprit: hvm_emulate_one_vm_event()'s code > needs to be wrapped in a loop that checks for X86EMUL_RETRY again, since > hvmemul_cmpxchg() may return RETRY even for some mapping problems, in > which case we again end up with the guest trying to re-execute an > emulable CMPXCHG. This seems wrong to me - note how my patch changes behavior regarding the return value from paging_cmpxchg_guest_entry() in ptwr_emulated_update(). > However, this gets me back to my original problem when I "solved" it in > the same manner (looping until emulation succeeds) back when > hvmemul_cmpxchg() failures were reported with RETRY: eventually the > guest BSODs with code 101. RETRY failures are still possible coming from > the hvmemul_vaddr_to_mfn() code in my patch. > > I wonder if I should just return X86EMUL_CMPXCHG_FAILED for all those as > well and just never end up returning RETRY from hvmemul_cmpxchg(). That would seem similarly wrong to me - it ought to be UNHANDLEABLE, I would think. In any event, never returning RETRY would also be wrong in case you hit emulated MMIO, but that's really the only case where RETRY should be passed back. Jan
On 03/30/2017 05:21 PM, Jan Beulich wrote: >>>> On 30.03.17 at 16:08, <rcojocaru@bitdefender.com> wrote: >> On 03/30/2017 03:56 PM, Razvan Cojocaru wrote: >>> On 03/30/2017 03:05 PM, Jan Beulich wrote: >>>> What I do note though is that you don't copy back the value >>>> __cmpxchg() returns, yet that's what is needed. *map may >>>> have changed again already. >>> >>> Changing the code to: >>> >>> 1162 ret = __cmpxchg(map, old, new, bytes); >>> 1163 >>> 1164 if ( ret != old ) >>> 1165 { >>> 1166 memcpy(p_old, &ret, bytes); >>> 1167 rc = X86EMUL_CMPXCHG_FAILED; >>> 1168 } >>> >>> where ret is an unsigned long still triggers BSODs when I add my patch >>> to yours. I'll need to dig deeper. >> >> Nevermind, I've found the culprit: hvm_emulate_one_vm_event()'s code >> needs to be wrapped in a loop that checks for X86EMUL_RETRY again, since >> hvmemul_cmpxchg() may return RETRY even for some mapping problems, in >> which case we again end up with the guest trying to re-execute an >> emulable CMPXCHG. > > This seems wrong to me - note how my patch changes behavior > regarding the return value from paging_cmpxchg_guest_entry() > in ptwr_emulated_update(). > >> However, this gets me back to my original problem when I "solved" it in >> the same manner (looping until emulation succeeds) back when >> hvmemul_cmpxchg() failures were reported with RETRY: eventually the >> guest BSODs with code 101. RETRY failures are still possible coming from >> the hvmemul_vaddr_to_mfn() code in my patch. >> >> I wonder if I should just return X86EMUL_CMPXCHG_FAILED for all those as >> well and just never end up returning RETRY from hvmemul_cmpxchg(). > > That would seem similarly wrong to me - it ought to be > UNHANDLEABLE, I would think. In any event, never returning > RETRY would also be wrong in case you hit emulated MMIO, but > that's really the only case where RETRY should be passed back. Sorry, I don't follow: hvm_emulate_one_vm_event() calls hvm_emulate_one() and then does this with the return value: switch ( rc ) { case X86EMUL_RETRY: /* * This function is called when handling an EPT-related vm_event * reply. As such, nothing else needs to be done here, since simply * returning makes the current instruction cause a page fault again, * consistent with X86EMUL_RETRY. */ return; case X86EMUL_UNHANDLEABLE: hvm_dump_emulation_state(XENLOG_G_DEBUG "Mem event", &ctx); hvm_inject_hw_exception(trapnr, errcode); break; case X86EMUL_EXCEPTION: if ( ctx.ctxt.event_pending ) hvm_inject_event(&ctx.ctxt.event); break; } hvm_emulate_writeback(&ctx); If I'd return UNHANDLEABLE from hvmemul_cmpxchg() where I'm now returning RETRY from hvmemul_vaddr_to_mfn(), that would inject TRAP_invalid_op in the guest, so it seems rather harsh. And returning RETRY here does nothing, so the guest will resume at the same IP (re-trying CMPXCHG, causing a page fault (or crashing the guest), and re-trying the emulation (if it hasn't)). Please see hvm_do_resume() in arch/x86/hvm/hvm.c. Speaking of emulated MMIO, I've got this when the guest was crashing immediately (pre RETRY loop): MMIO emulation failed: d3v8 32bit @ 0008:82679f3c -> f0 0f ba 30 00 72 07 8b cb e8 da 4b ff ff 8b 45 Again, this is all with guest monitoring enabled. Thanks, Razvan
>>> On 30.03.17 at 17:05, <rcojocaru@bitdefender.com> wrote: > On 03/30/2017 05:21 PM, Jan Beulich wrote: >>>>> On 30.03.17 at 16:08, <rcojocaru@bitdefender.com> wrote: >>> On 03/30/2017 03:56 PM, Razvan Cojocaru wrote: >>>> On 03/30/2017 03:05 PM, Jan Beulich wrote: >>>>> What I do note though is that you don't copy back the value >>>>> __cmpxchg() returns, yet that's what is needed. *map may >>>>> have changed again already. >>>> >>>> Changing the code to: >>>> >>>> 1162 ret = __cmpxchg(map, old, new, bytes); >>>> 1163 >>>> 1164 if ( ret != old ) >>>> 1165 { >>>> 1166 memcpy(p_old, &ret, bytes); >>>> 1167 rc = X86EMUL_CMPXCHG_FAILED; >>>> 1168 } >>>> >>>> where ret is an unsigned long still triggers BSODs when I add my patch >>>> to yours. I'll need to dig deeper. >>> >>> Nevermind, I've found the culprit: hvm_emulate_one_vm_event()'s code >>> needs to be wrapped in a loop that checks for X86EMUL_RETRY again, since >>> hvmemul_cmpxchg() may return RETRY even for some mapping problems, in >>> which case we again end up with the guest trying to re-execute an >>> emulable CMPXCHG. >> >> This seems wrong to me - note how my patch changes behavior >> regarding the return value from paging_cmpxchg_guest_entry() >> in ptwr_emulated_update(). >> >>> However, this gets me back to my original problem when I "solved" it in >>> the same manner (looping until emulation succeeds) back when >>> hvmemul_cmpxchg() failures were reported with RETRY: eventually the >>> guest BSODs with code 101. RETRY failures are still possible coming from >>> the hvmemul_vaddr_to_mfn() code in my patch. >>> >>> I wonder if I should just return X86EMUL_CMPXCHG_FAILED for all those as >>> well and just never end up returning RETRY from hvmemul_cmpxchg(). >> >> That would seem similarly wrong to me - it ought to be >> UNHANDLEABLE, I would think. In any event, never returning >> RETRY would also be wrong in case you hit emulated MMIO, but >> that's really the only case where RETRY should be passed back. > > Sorry, I don't follow: hvm_emulate_one_vm_event() calls > hvm_emulate_one() and then does this with the return value: > > switch ( rc ) > { > case X86EMUL_RETRY: > /* > * This function is called when handling an EPT-related vm_event > * reply. As such, nothing else needs to be done here, since simply > * returning makes the current instruction cause a page fault again, > * consistent with X86EMUL_RETRY. > */ > return; > case X86EMUL_UNHANDLEABLE: > hvm_dump_emulation_state(XENLOG_G_DEBUG "Mem event", &ctx); > hvm_inject_hw_exception(trapnr, errcode); > break; > case X86EMUL_EXCEPTION: > if ( ctx.ctxt.event_pending ) > hvm_inject_event(&ctx.ctxt.event); > break; > } > > hvm_emulate_writeback(&ctx); > > If I'd return UNHANDLEABLE from hvmemul_cmpxchg() where I'm now > returning RETRY from hvmemul_vaddr_to_mfn(), that would inject > TRAP_invalid_op in the guest, so it seems rather harsh. Well, my comment was for normal execution of the guest. In oder to make it into the emulator, the virtual->physical translation in the guest must have worked, so if there is a mapping failure, this indicates something fishy going on inside the guest. > Speaking of emulated MMIO, I've got this when the guest was crashing > immediately (pre RETRY loop): > > MMIO emulation failed: d3v8 32bit @ 0008:82679f3c -> f0 0f ba 30 00 72 > 07 8b cb e8 da 4b ff ff 8b 45 That's a BTR, which we should be emulating fine. More information would need to be collected to have a chance to understand what might be going one (first of all the virtual and physical memory address this was trying to act on). Jan
On 03/30/2017 06:47 PM, Jan Beulich wrote: >> Speaking of emulated MMIO, I've got this when the guest was crashing >> immediately (pre RETRY loop): >> >> MMIO emulation failed: d3v8 32bit @ 0008:82679f3c -> f0 0f ba 30 00 72 >> 07 8b cb e8 da 4b ff ff 8b 45 > > That's a BTR, which we should be emulating fine. More information > would need to be collected to have a chance to understand what > might be going one (first of all the virtual and physical memory > address this was trying to act on). Right, the BTR part should be fine, but I think the LOCK part is what's causing the issue. I've done a few more test runs to see what return RETRY (dumping the instruction with an "(r)" prefix to distinguish from the UNHANDLEABLE dump), and a couple of instructions return RETRY (BTR and XADD, both LOCK-prefixed, which means they now involve CMPXCHG handler, which presumably now fails - possibly simply because it's always LOCKed in my patch): # grep "Mem event" /var/log/xen/console/hypervisor.log | sort | uniq (XEN) (r) Mem event emulation failed: d3v1 32bit @ 0008:8267f1aa -> f0 0f ba 28 07 72 d5 8d 45 f4 50 33 ff 56 47 53 (XEN) (r) Mem event emulation failed: d3v5 32bit @ 0008:8267f1aa -> f0 0f ba 28 07 72 d5 8d 45 f4 50 33 ff 56 47 53 (XEN) (r) Mem event emulation failed: d3v5 32bit @ 0008:826ebc7c -> f0 0f c1 08 85 c9 74 1f f6 c1 02 75 1a 41 8d 41 (XEN) (r) Mem event emulation failed: d3v6 32bit @ 0008:8267f1aa -> f0 0f ba 28 07 72 d5 8d 45 f4 50 33 ff 56 47 53 (XEN) (r) Mem event emulation failed: d3v6 32bit @ 0008:826eb861 -> f0 0f ba 30 00 72 07 8b cb e8 da 4b ff ff 8b 45 (XEN) (r) Mem event emulation failed: d3v6 32bit @ 0008:826ebc7c -> f0 0f c1 08 85 c9 74 1f f6 c1 02 75 1a 41 8d 41 (XEN) (r) Mem event emulation failed: d3v6 32bit @ 0008:826ebce6 -> f0 0f c1 01 8b 7d fc c1 ef 09 81 e7 f8 ff 7f 00 (XEN) (r) Mem event emulation failed: d3v7 32bit @ 0008:8267f1aa -> f0 0f ba 28 07 72 d5 8d 45 f4 50 33 ff 56 47 53 (XEN) (r) Mem event emulation failed: d3v7 32bit @ 0008:826eb861 -> f0 0f ba 30 00 72 07 8b cb e8 da 4b ff ff 8b 45 (XEN) (r) Mem event emulation failed: d3v7 32bit @ 0008:826ebc7c -> f0 0f c1 08 85 c9 74 1f f6 c1 02 75 1a 41 8d 41 (XEN) (r) Mem event emulation failed: d3v7 32bit @ 0008:826ebce6 -> f0 0f c1 01 8b 7d fc c1 ef 09 81 e7 f8 ff 7f 00 (XEN) (r) Mem event emulation failed: d3v7 32bit @ 0008:826ec59a -> f0 0f ba 31 00 72 09 e8 a3 3e ff ff 8b 44 24 18 (XEN) (r) Mem event emulation failed: d3v7 32bit @ 0008:826f6276 -> f0 0f ba 28 07 72 cc 39 53 04 75 5e 8d 43 08 8b (XEN) (r) Mem event emulation failed: d3v8 32bit @ 0008:8267f1aa -> f0 0f ba 28 07 72 d5 8d 45 f4 50 33 ff 56 47 53 (XEN) (r) Mem event emulation failed: d3v8 32bit @ 0008:826eb861 -> f0 0f ba 30 00 72 07 8b cb e8 da 4b ff ff 8b 45 (XEN) (r) Mem event emulation failed: d3v9 32bit @ 0008:8267f1aa -> f0 0f ba 28 07 72 d5 8d 45 f4 50 33 ff 56 47 53 (XEN) (r) Mem event emulation failed: d3v9 32bit @ 0008:826eb861 -> f0 0f ba 30 00 72 07 8b cb e8 da 4b ff ff 8b 45 (XEN) (r) Mem event emulation failed: d3v9 32bit @ 0008:826ebce6 -> f0 0f c1 01 8b 7d fc c1 ef 09 81 e7 f8 ff 7f 00 (XEN) (r) Mem event emulation failed: d3v9 32bit @ 0008:826ec583 -> f0 0f c1 01 64 a1 24 01 00 00 66 ff 88 86 00 00 Thanks, Razvan
>>> On 31.03.17 at 08:17, <rcojocaru@bitdefender.com> wrote: > On 03/30/2017 06:47 PM, Jan Beulich wrote: >>> Speaking of emulated MMIO, I've got this when the guest was crashing >>> immediately (pre RETRY loop): >>> >>> MMIO emulation failed: d3v8 32bit @ 0008:82679f3c -> f0 0f ba 30 00 72 >>> 07 8b cb e8 da 4b ff ff 8b 45 >> >> That's a BTR, which we should be emulating fine. More information >> would need to be collected to have a chance to understand what >> might be going one (first of all the virtual and physical memory >> address this was trying to act on). > > Right, the BTR part should be fine, but I think the LOCK part is what's > causing the issue. I've done a few more test runs to see what return > RETRY (dumping the instruction with an "(r)" prefix to distinguish from > the UNHANDLEABLE dump), and a couple of instructions return RETRY (BTR > and XADD, both LOCK-prefixed, which means they now involve CMPXCHG > handler, which presumably now fails - possibly simply because it's > always LOCKed in my patch): Well, all of that looks to be expected behavior. I'm afraid I don't see how this information helps understanding the MMIO emulation failure above. Jan
On 03/31/2017 10:34 AM, Jan Beulich wrote: >>>> On 31.03.17 at 08:17, <rcojocaru@bitdefender.com> wrote: >> On 03/30/2017 06:47 PM, Jan Beulich wrote: >>>> Speaking of emulated MMIO, I've got this when the guest was crashing >>>> immediately (pre RETRY loop): >>>> >>>> MMIO emulation failed: d3v8 32bit @ 0008:82679f3c -> f0 0f ba 30 00 72 >>>> 07 8b cb e8 da 4b ff ff 8b 45 >>> >>> That's a BTR, which we should be emulating fine. More information >>> would need to be collected to have a chance to understand what >>> might be going one (first of all the virtual and physical memory >>> address this was trying to act on). >> >> Right, the BTR part should be fine, but I think the LOCK part is what's >> causing the issue. I've done a few more test runs to see what return >> RETRY (dumping the instruction with an "(r)" prefix to distinguish from >> the UNHANDLEABLE dump), and a couple of instructions return RETRY (BTR >> and XADD, both LOCK-prefixed, which means they now involve CMPXCHG >> handler, which presumably now fails - possibly simply because it's >> always LOCKed in my patch): > > Well, all of that looks to be expected behavior. I'm afraid I don't see > how this information helps understanding the MMIO emulation failure > above. I've managed to obtain this log of emulation errors: https://pastebin.com/Esy1SkHx The "virtual address" lines that are not followed by any "Mem event" line correspond to CMXCHG_FAILED return codes. The very last line is a MMIO emulation failed. It's probably important that this happens with the model where hvm_emulate_one_vm_event() does _not_ re-try the emulation until it succeeds. The other model allows me to go further with the guest, but eventually I get timeout-related BSODs or the guest becomes unresponsive. Thanks, Razvan
>>> On 31.03.17 at 11:56, <rcojocaru@bitdefender.com> wrote: > On 03/31/2017 10:34 AM, Jan Beulich wrote: >>>>> On 31.03.17 at 08:17, <rcojocaru@bitdefender.com> wrote: >>> On 03/30/2017 06:47 PM, Jan Beulich wrote: >>>>> Speaking of emulated MMIO, I've got this when the guest was crashing >>>>> immediately (pre RETRY loop): >>>>> >>>>> MMIO emulation failed: d3v8 32bit @ 0008:82679f3c -> f0 0f ba 30 00 72 >>>>> 07 8b cb e8 da 4b ff ff 8b 45 >>>> >>>> That's a BTR, which we should be emulating fine. More information >>>> would need to be collected to have a chance to understand what >>>> might be going one (first of all the virtual and physical memory >>>> address this was trying to act on). >>> >>> Right, the BTR part should be fine, but I think the LOCK part is what's >>> causing the issue. I've done a few more test runs to see what return >>> RETRY (dumping the instruction with an "(r)" prefix to distinguish from >>> the UNHANDLEABLE dump), and a couple of instructions return RETRY (BTR >>> and XADD, both LOCK-prefixed, which means they now involve CMPXCHG >>> handler, which presumably now fails - possibly simply because it's >>> always LOCKed in my patch): >> >> Well, all of that looks to be expected behavior. I'm afraid I don't see >> how this information helps understanding the MMIO emulation failure >> above. > > I've managed to obtain this log of emulation errors: > https://pastebin.com/Esy1SkHx > > The "virtual address" lines that are not followed by any "Mem event" > line correspond to CMXCHG_FAILED return codes. > > The very last line is a MMIO emulation failed. > > It's probably important that this happens with the model where > hvm_emulate_one_vm_event() does _not_ re-try the emulation until it > succeeds. The other model allows me to go further with the guest, but > eventually I get timeout-related BSODs or the guest becomes unresponsive. Interesting. You didn't clarify what the printed "offset" values are, and it doesn't look like these have any correlation with the underlying (guest) physical address, which we would also want to see. And then it strikes me as odd that in these last lines (XEN) Mem event (RETRY) emulation failed: d5v8 32bit @ 0008:826bb861 -> f0 0f ba 30 00 72 07 8b cb e8 da 4b ff ff 8b 45 (XEN) virtual address: 0xffd080f0, offset: 4291854576 (XEN) MMIO emulation failed: d5v8 32bit @ 0008:82655f3c -> f0 0f ba 30 00 72 07 8b cb e8 da 4b ff ff 8b 45 the instruction pointers and virtual addresses are different, but the code bytes are exactly the same. This doesn't seem very likely, so I wonder whether there's an issue with us wrongly re-using previously fetched insn bytes. (Of course I'd be happy to be proven wrong with this guessing, by you checking the involved binary/ies.) Jan
On 03/31/2017 05:46 PM, Jan Beulich wrote: >>>> On 31.03.17 at 11:56, <rcojocaru@bitdefender.com> wrote: >> On 03/31/2017 10:34 AM, Jan Beulich wrote: >>>>>> On 31.03.17 at 08:17, <rcojocaru@bitdefender.com> wrote: >>>> On 03/30/2017 06:47 PM, Jan Beulich wrote: >>>>>> Speaking of emulated MMIO, I've got this when the guest was crashing >>>>>> immediately (pre RETRY loop): >>>>>> >>>>>> MMIO emulation failed: d3v8 32bit @ 0008:82679f3c -> f0 0f ba 30 00 72 >>>>>> 07 8b cb e8 da 4b ff ff 8b 45 >>>>> >>>>> That's a BTR, which we should be emulating fine. More information >>>>> would need to be collected to have a chance to understand what >>>>> might be going one (first of all the virtual and physical memory >>>>> address this was trying to act on). >>>> >>>> Right, the BTR part should be fine, but I think the LOCK part is what's >>>> causing the issue. I've done a few more test runs to see what return >>>> RETRY (dumping the instruction with an "(r)" prefix to distinguish from >>>> the UNHANDLEABLE dump), and a couple of instructions return RETRY (BTR >>>> and XADD, both LOCK-prefixed, which means they now involve CMPXCHG >>>> handler, which presumably now fails - possibly simply because it's >>>> always LOCKed in my patch): >>> >>> Well, all of that looks to be expected behavior. I'm afraid I don't see >>> how this information helps understanding the MMIO emulation failure >>> above. >> >> I've managed to obtain this log of emulation errors: >> https://pastebin.com/Esy1SkHx >> >> The "virtual address" lines that are not followed by any "Mem event" >> line correspond to CMXCHG_FAILED return codes. >> >> The very last line is a MMIO emulation failed. >> >> It's probably important that this happens with the model where >> hvm_emulate_one_vm_event() does _not_ re-try the emulation until it >> succeeds. The other model allows me to go further with the guest, but >> eventually I get timeout-related BSODs or the guest becomes unresponsive. > > Interesting. You didn't clarify what the printed "offset" values are, > and it doesn't look like these have any correlation with the underlying > (guest) physical address, which we would also want to see. And then > it strikes me as odd that in these last lines > > (XEN) Mem event (RETRY) emulation failed: d5v8 32bit @ 0008:826bb861 -> f0 0f ba 30 00 72 07 8b cb e8 da 4b ff ff 8b 45 > (XEN) virtual address: 0xffd080f0, offset: 4291854576 > (XEN) MMIO emulation failed: d5v8 32bit @ 0008:82655f3c -> f0 0f ba 30 00 72 07 8b cb e8 da 4b ff ff 8b 45 > > the instruction pointers and virtual addresses are different, but the > code bytes are exactly the same. This doesn't seem very likely, so I > wonder whether there's an issue with us wrongly re-using previously > fetched insn bytes. (Of course I'd be happy to be proven wrong with > this guessing, by you checking the involved binary/ies.) Offset is the actual value of the "offset" parameter of hvmemul_cmpxchg(). I'll output more info and recheck. Thanks, Razvan
>>> On 31.03.17 at 17:01, <rcojocaru@bitdefender.com> wrote: > On 03/31/2017 05:46 PM, Jan Beulich wrote: >>>>> On 31.03.17 at 11:56, <rcojocaru@bitdefender.com> wrote: >>> On 03/31/2017 10:34 AM, Jan Beulich wrote: >>>>>>> On 31.03.17 at 08:17, <rcojocaru@bitdefender.com> wrote: >>>>> On 03/30/2017 06:47 PM, Jan Beulich wrote: >>>>>>> Speaking of emulated MMIO, I've got this when the guest was crashing >>>>>>> immediately (pre RETRY loop): >>>>>>> >>>>>>> MMIO emulation failed: d3v8 32bit @ 0008:82679f3c -> f0 0f ba 30 00 72 >>>>>>> 07 8b cb e8 da 4b ff ff 8b 45 >>>>>> >>>>>> That's a BTR, which we should be emulating fine. More information >>>>>> would need to be collected to have a chance to understand what >>>>>> might be going one (first of all the virtual and physical memory >>>>>> address this was trying to act on). >>>>> >>>>> Right, the BTR part should be fine, but I think the LOCK part is what's >>>>> causing the issue. I've done a few more test runs to see what return >>>>> RETRY (dumping the instruction with an "(r)" prefix to distinguish from >>>>> the UNHANDLEABLE dump), and a couple of instructions return RETRY (BTR >>>>> and XADD, both LOCK-prefixed, which means they now involve CMPXCHG >>>>> handler, which presumably now fails - possibly simply because it's >>>>> always LOCKed in my patch): >>>> >>>> Well, all of that looks to be expected behavior. I'm afraid I don't see >>>> how this information helps understanding the MMIO emulation failure >>>> above. >>> >>> I've managed to obtain this log of emulation errors: >>> https://pastebin.com/Esy1SkHx >>> >>> The "virtual address" lines that are not followed by any "Mem event" >>> line correspond to CMXCHG_FAILED return codes. >>> >>> The very last line is a MMIO emulation failed. >>> >>> It's probably important that this happens with the model where >>> hvm_emulate_one_vm_event() does _not_ re-try the emulation until it >>> succeeds. The other model allows me to go further with the guest, but >>> eventually I get timeout-related BSODs or the guest becomes unresponsive. >> >> Interesting. You didn't clarify what the printed "offset" values are, >> and it doesn't look like these have any correlation with the underlying >> (guest) physical address, which we would also want to see. And then >> it strikes me as odd that in these last lines >> >> (XEN) Mem event (RETRY) emulation failed: d5v8 32bit @ 0008:826bb861 -> f0 0f > ba 30 00 72 07 8b cb e8 da 4b ff ff 8b 45 >> (XEN) virtual address: 0xffd080f0, offset: 4291854576 >> (XEN) MMIO emulation failed: d5v8 32bit @ 0008:82655f3c -> f0 0f ba 30 00 72 > 07 8b cb e8 da 4b ff ff 8b 45 >> >> the instruction pointers and virtual addresses are different, but the >> code bytes are exactly the same. This doesn't seem very likely, so I >> wonder whether there's an issue with us wrongly re-using previously >> fetched insn bytes. (Of course I'd be happy to be proven wrong with >> this guessing, by you checking the involved binary/ies.) > > Offset is the actual value of the "offset" parameter of > hvmemul_cmpxchg(). That's not very useful then, as for flat segments "offset" == "virtual address" (i.e. you merely re-print in decimal what you've already printed in hex). Jan
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c index 2d92957..b946ef7 100644 --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -20,6 +20,7 @@ #include <asm/hvm/emulate.h> #include <asm/hvm/hvm.h> #include <asm/hvm/ioreq.h> +#include <asm/hvm/nestedhvm.h> #include <asm/hvm/trace.h> #include <asm/hvm/support.h> #include <asm/hvm/svm/svm.h> @@ -1029,6 +1030,77 @@ static int hvmemul_wbinvd_discard( return X86EMUL_OKAY; } +static int hvmemul_vaddr_to_mfn( + unsigned long addr, + mfn_t *mfn, + uint32_t pfec, + struct x86_emulate_ctxt *ctxt) +{ + paddr_t gpa = addr & ~PAGE_MASK; + struct page_info *page; + p2m_type_t p2mt; + unsigned long gfn; + struct vcpu *curr = current; + struct hvm_emulate_ctxt *hvmemul_ctxt = + container_of(ctxt, struct hvm_emulate_ctxt, ctxt); + + gfn = paging_gva_to_gfn(curr, addr, &pfec); + + if ( gfn == gfn_x(INVALID_GFN) ) + { + pagefault_info_t pfinfo = {}; + + if ( ( pfec & PFEC_page_paged ) || ( pfec & PFEC_page_shared ) ) + return X86EMUL_RETRY; + + pfinfo.linear = addr; + pfinfo.ec = pfec; + + x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt); + return X86EMUL_EXCEPTION; + } + + gpa |= (paddr_t)gfn << PAGE_SHIFT; + + /* + * No need to do the P2M lookup for internally handled MMIO, benefiting + * - 32-bit WinXP (& older Windows) on AMD CPUs for LAPIC accesses, + * - newer Windows (like Server 2012) for HPET accesses. + */ + if ( !nestedhvm_vcpu_in_guestmode(curr) && hvm_mmio_internal(gpa) ) + return X86EMUL_UNHANDLEABLE; + + page = get_page_from_gfn(curr->domain, gfn, &p2mt, P2M_UNSHARE); + + if ( !page ) + return X86EMUL_UNHANDLEABLE; + + if ( p2m_is_paging(p2mt) ) + { + put_page(page); + p2m_mem_paging_populate(curr->domain, gfn); + return X86EMUL_RETRY; + } + + if ( p2m_is_shared(p2mt) ) + { + put_page(page); + return X86EMUL_RETRY; + } + + if ( p2m_is_grant(p2mt) ) + { + put_page(page); + return X86EMUL_UNHANDLEABLE; + } + + *mfn = _mfn(page_to_mfn(page)); + + put_page(page); + + return X86EMUL_OKAY; +} + static int hvmemul_cmpxchg( enum x86_segment seg, unsigned long offset, @@ -1037,8 +1109,70 @@ static int hvmemul_cmpxchg( unsigned int bytes, struct x86_emulate_ctxt *ctxt) { - /* Fix this in case the guest is really relying on r-m-w atomicity. */ - return hvmemul_write(seg, offset, p_new, bytes, ctxt); + unsigned long addr, reps = 1; + int rc = X86EMUL_OKAY; + unsigned long old = 0, new = 0; + uint32_t pfec = PFEC_page_present | PFEC_write_access; + struct hvm_emulate_ctxt *hvmemul_ctxt = + container_of(ctxt, struct hvm_emulate_ctxt, ctxt); + mfn_t mfn[2]; + void *map = NULL; + struct domain *currd = current->domain; + + if ( is_x86_system_segment(seg) ) + pfec |= PFEC_implicit; + else if ( hvmemul_ctxt->seg_reg[x86_seg_ss].attr.fields.dpl == 3 ) + pfec |= PFEC_user_mode; + + rc = hvmemul_virtual_to_linear( + seg, offset, bytes, &reps, hvm_access_write, hvmemul_ctxt, &addr); + + if ( rc != X86EMUL_OKAY || !bytes ) + return rc; + + rc = hvmemul_vaddr_to_mfn(addr, &mfn[0], pfec, ctxt); + + if ( rc != X86EMUL_OKAY ) + return rc; + + if ( likely(((addr + bytes - 1) & PAGE_MASK) == (addr & PAGE_MASK)) ) + { + /* Whole write fits on a single page. */ + mfn[1] = INVALID_MFN; + map = map_domain_page(mfn[0]); + } + else + { + rc = hvmemul_vaddr_to_mfn((addr + bytes - 1) & PAGE_MASK, &mfn[1], + pfec, ctxt); + if ( rc != X86EMUL_OKAY ) + return rc; + + map = vmap(mfn, 2); + } + + if ( !map ) + return X86EMUL_UNHANDLEABLE; + + map += (addr & ~PAGE_MASK); + + memcpy(&old, p_old, bytes); + memcpy(&new, p_new, bytes); + + if ( __cmpxchg(map, old, new, bytes) != old ) + rc = X86EMUL_CMPXCHG_FAILED; + + paging_mark_dirty(currd, mfn[0]); + + if ( unlikely(mfn_valid(mfn[1])) ) + { + paging_mark_dirty(currd, mfn[1]); + vunmap((void *)((unsigned long)map & PAGE_MASK)); + } + else + unmap_domain_page(map); + + return rc; } static int hvmemul_validate(