Message ID | 1472813240-11011-2-git-send-email-yu.c.zhang@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
>>> On 02.09.16 at 12:47, <yu.c.zhang@linux.intel.com> wrote: > @@ -178,8 +179,27 @@ static int hvmemul_do_io( > break; > case X86EMUL_UNHANDLEABLE: > { > - struct hvm_ioreq_server *s = > - hvm_select_ioreq_server(curr->domain, &p); > + struct hvm_ioreq_server *s = NULL; > + p2m_type_t p2mt = p2m_invalid; > + > + if ( is_mmio ) > + { > + unsigned long gmfn = paddr_to_pfn(addr); > + > + (void) get_gfn_query_unlocked(currd, gmfn, &p2mt); > + > + if ( p2mt == p2m_ioreq_server && dir == IOREQ_WRITE ) > + { > + unsigned int flags; > + > + s = p2m_get_ioreq_server(currd, &flags); > + if ( !(flags & XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE) ) > + s = NULL; > + } > + } > + > + if ( !s && p2mt != p2m_ioreq_server ) > + s = hvm_select_ioreq_server(currd, &p); What I recall is that we had agreed on p2m_ioreq_server pages to be treated as ordinary RAM ones as long as no server can be found. The type check here contradicts that. Is there a reason? > +static int hvmop_map_mem_type_to_ioreq_server( > + XEN_GUEST_HANDLE_PARAM(xen_hvm_map_mem_type_to_ioreq_server_t) uop) > +{ > + xen_hvm_map_mem_type_to_ioreq_server_t op; > + struct domain *d; > + int rc; > + > + if ( copy_from_guest(&op, uop, 1) ) > + return -EFAULT; > + > + rc = rcu_lock_remote_domain_by_id(op.domid, &d); > + if ( rc != 0 ) > + return rc; > + > + rc = -EINVAL; > + if ( !is_hvm_domain(d) ) > + goto out; > + > + if ( op.pad != 0 ) > + goto out; This, I think, should be done first thing after having copied in the structure. No need to lookup domain or anything if this is not zero. > +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id, > + uint32_t type, uint32_t flags) > +{ > + struct hvm_ioreq_server *s; > + int rc; > + > + /* For now, only HVMMEM_ioreq_server is supported. */ > + if ( type != HVMMEM_ioreq_server ) > + return -EINVAL; > + > + /* For now, only write emulation is supported. */ > + if ( flags & ~(XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE) ) > + return -EINVAL; > + > + domain_pause(d); > + spin_lock_recursive(&d->arch.hvm_domain.ioreq_server.lock); > + > + rc = -ENOENT; > + list_for_each_entry ( s, > + &d->arch.hvm_domain.ioreq_server.list, > + list_entry ) > + { > + if ( s == d->arch.hvm_domain.default_ioreq_server ) > + continue; > + > + if ( s->id == id ) > + { > + rc = p2m_set_ioreq_server(d, flags, s); > + break; > + } > + } > + > + spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock); > + domain_unpause(d); > + return rc; > +} Blank line before final return statement of a function please. > +int p2m_set_ioreq_server(struct domain *d, > + unsigned int flags, > + struct hvm_ioreq_server *s) > +{ > + struct p2m_domain *p2m = p2m_get_hostp2m(d); > + int rc; > + > + /* > + * Use lock to prevent concurrent setting requirements > + * from multiple ioreq serers. > + */ "Concurrent setting requirements"? DYM "attempts"? Jan
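A minimal sketch of that reordering, based only on the quoted hunk (everything else left as posted): validate op.pad immediately after copying in the argument structure, so malformed requests are rejected before any domain lookup or XSM check is done.

    static int hvmop_map_mem_type_to_ioreq_server(
        XEN_GUEST_HANDLE_PARAM(xen_hvm_map_mem_type_to_ioreq_server_t) uop)
    {
        xen_hvm_map_mem_type_to_ioreq_server_t op;
        struct domain *d;
        int rc;

        if ( copy_from_guest(&op, uop, 1) )
            return -EFAULT;

        /* Reject unused bits before doing any further work. */
        if ( op.pad != 0 )
            return -EINVAL;

        rc = rcu_lock_remote_domain_by_id(op.domid, &d);
        if ( rc != 0 )
            return rc;

        rc = -EINVAL;
        if ( !is_hvm_domain(d) )
            goto out;

        /* ... hap_enabled() and XSM checks as in the posted patch ... */

        rc = hvm_map_mem_type_to_ioreq_server(d, op.id, op.type, op.flags);

     out:
        rcu_unlock_domain(d);
        return rc;
    }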
On 05/09/16 14:31, Jan Beulich wrote: >>>> On 02.09.16 at 12:47, <yu.c.zhang@linux.intel.com> wrote: >> @@ -178,8 +179,27 @@ static int hvmemul_do_io( >> break; >> case X86EMUL_UNHANDLEABLE: >> { >> - struct hvm_ioreq_server *s = >> - hvm_select_ioreq_server(curr->domain, &p); >> + struct hvm_ioreq_server *s = NULL; >> + p2m_type_t p2mt = p2m_invalid; >> + >> + if ( is_mmio ) >> + { >> + unsigned long gmfn = paddr_to_pfn(addr); >> + >> + (void) get_gfn_query_unlocked(currd, gmfn, &p2mt); >> + >> + if ( p2mt == p2m_ioreq_server && dir == IOREQ_WRITE ) >> + { >> + unsigned int flags; >> + >> + s = p2m_get_ioreq_server(currd, &flags); >> + if ( !(flags & XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE) ) >> + s = NULL; >> + } >> + } >> + >> + if ( !s && p2mt != p2m_ioreq_server ) >> + s = hvm_select_ioreq_server(currd, &p); > > What I recall is that we had agreed on p2m_ioreq_server pages > to be treated as ordinary RAM ones as long as no server can be > found. The type check here contradicts that. Is there a reason? I think it must be a confusion as to what "treated like ordinary RAM ones" means. p2m_ram_rw types that gets here will go through hvm_select_ioreq_server(), and (therefore) potentially be treated as MMIO accesses, which is not how "ordinary RAM" would behave. If what you meant was that you want p2m_ioreq_server to behave like p2m_ram_rw (and be treated as MMIO if it matches an iorange) then yes. If what you want is for p2m_ioreq_server to actually act like RAM, then probably something more needs to happen here. -George
On 02/09/16 11:47, Yu Zhang wrote: > A new HVMOP - HVMOP_map_mem_type_to_ioreq_server, is added to > let one ioreq server claim/disclaim its responsibility for the > handling of guest pages with p2m type p2m_ioreq_server. Users > of this HVMOP can specify which kind of operation is supposed > to be emulated in a parameter named flags. Currently, this HVMOP > only support the emulation of write operations. And it can be > further extended to support the emulation of read ones if an > ioreq server has such requirement in the future. > > For now, we only support one ioreq server for this p2m type, so > once an ioreq server has claimed its ownership, subsequent calls > of the HVMOP_map_mem_type_to_ioreq_server will fail. Users can also > disclaim the ownership of guest ram pages with p2m_ioreq_server, by > triggering this new HVMOP, with ioreq server id set to the current > owner's and flags parameter set to 0. > > Note both HVMOP_map_mem_type_to_ioreq_server and p2m_ioreq_server > are only supported for HVMs with HAP enabled. > > Also note that only after one ioreq server claims its ownership > of p2m_ioreq_server, will the p2m type change to p2m_ioreq_server > be allowed. > > Signed-off-by: Paul Durrant <paul.durrant@citrix.com> > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> > Acked-by: Tim Deegan <tim@xen.org> > --- > Cc: Paul Durrant <paul.durrant@citrix.com> > Cc: Jan Beulich <jbeulich@suse.com> > Cc: Andrew Cooper <andrew.cooper3@citrix.com> > Cc: George Dunlap <george.dunlap@eu.citrix.com> > Cc: Jun Nakajima <jun.nakajima@intel.com> > Cc: Kevin Tian <kevin.tian@intel.com> > Cc: Tim Deegan <tim@xen.org> > > changes in v6: > - Clarify logic in hvmemul_do_io(). > - Use recursive lock for ioreq server lock. > - Remove debug print when mapping ioreq server. > - Clarify code in ept_p2m_type_to_flags() for consistency. > - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS. > - Add comments for HVMMEM_ioreq_server to note only changes > to/from HVMMEM_ram_rw are permitted. > - Add domain_pause/unpause() in hvm_map_mem_type_to_ioreq_server() > to avoid the race condition when a vm exit happens on a write- > protected page, just to find the ioreq server has been unmapped > already. Why do we need to do this? Won't the default case just DTRT if it finds that the ioreq server has been unmapped? -George
>>> On 05.09.16 at 19:20, <george.dunlap@citrix.com> wrote: > On 05/09/16 14:31, Jan Beulich wrote: >>>>> On 02.09.16 at 12:47, <yu.c.zhang@linux.intel.com> wrote: >>> @@ -178,8 +179,27 @@ static int hvmemul_do_io( >>> break; >>> case X86EMUL_UNHANDLEABLE: >>> { >>> - struct hvm_ioreq_server *s = >>> - hvm_select_ioreq_server(curr->domain, &p); >>> + struct hvm_ioreq_server *s = NULL; >>> + p2m_type_t p2mt = p2m_invalid; >>> + >>> + if ( is_mmio ) >>> + { >>> + unsigned long gmfn = paddr_to_pfn(addr); >>> + >>> + (void) get_gfn_query_unlocked(currd, gmfn, &p2mt); >>> + >>> + if ( p2mt == p2m_ioreq_server && dir == IOREQ_WRITE ) >>> + { >>> + unsigned int flags; >>> + >>> + s = p2m_get_ioreq_server(currd, &flags); >>> + if ( !(flags & XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE) ) >>> + s = NULL; >>> + } >>> + } >>> + >>> + if ( !s && p2mt != p2m_ioreq_server ) >>> + s = hvm_select_ioreq_server(currd, &p); >> >> What I recall is that we had agreed on p2m_ioreq_server pages >> to be treated as ordinary RAM ones as long as no server can be >> found. The type check here contradicts that. Is there a reason? > > I think it must be a confusion as to what "treated like ordinary RAM > ones" means. p2m_ram_rw types that gets here will go through > hvm_select_ioreq_server(), and (therefore) potentially be treated as > MMIO accesses, which is not how "ordinary RAM" would behave. If what > you meant was that you want p2m_ioreq_server to behave like p2m_ram_rw > (and be treated as MMIO if it matches an iorange) then yes. If what you > want is for p2m_ioreq_server to actually act like RAM, then probably > something more needs to happen here. Well, all I'm questioning is the special casing of p2m_ioreq_server in the if(). That's imo orthogonal to p2m_ram_rw pages not being supposed to make it here (hence the is_mmio check in the earlier if() also looks questionable). Perhaps it would already help if there was a comment explaining what the exact intended behavior here is. Jan
> -----Original Message----- > From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: 06 September 2016 08:58 > To: George Dunlap <George.Dunlap@citrix.com>; Yu Zhang > <yu.c.zhang@linux.intel.com> > Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Paul Durrant > <Paul.Durrant@citrix.com>; George Dunlap <George.Dunlap@citrix.com>; > JunNakajima <jun.nakajima@intel.com>; Kevin Tian <kevin.tian@intel.com>; > zhiyuan.lv@intel.com; xen-devel@lists.xen.org; Tim (Xen.org) > <tim@xen.org> > Subject: Re: [PATCH v6 1/4] x86/ioreq server: Add HVMOP to map guest ram > with p2m_ioreq_server to an ioreq server. > > >>> On 05.09.16 at 19:20, <george.dunlap@citrix.com> wrote: > > On 05/09/16 14:31, Jan Beulich wrote: > >>>>> On 02.09.16 at 12:47, <yu.c.zhang@linux.intel.com> wrote: > >>> @@ -178,8 +179,27 @@ static int hvmemul_do_io( > >>> break; > >>> case X86EMUL_UNHANDLEABLE: > >>> { > >>> - struct hvm_ioreq_server *s = > >>> - hvm_select_ioreq_server(curr->domain, &p); > >>> + struct hvm_ioreq_server *s = NULL; > >>> + p2m_type_t p2mt = p2m_invalid; > >>> + > >>> + if ( is_mmio ) > >>> + { > >>> + unsigned long gmfn = paddr_to_pfn(addr); > >>> + > >>> + (void) get_gfn_query_unlocked(currd, gmfn, &p2mt); > >>> + > >>> + if ( p2mt == p2m_ioreq_server && dir == IOREQ_WRITE ) > >>> + { > >>> + unsigned int flags; > >>> + > >>> + s = p2m_get_ioreq_server(currd, &flags); > >>> + if ( !(flags & XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE) ) > >>> + s = NULL; > >>> + } > >>> + } > >>> + > >>> + if ( !s && p2mt != p2m_ioreq_server ) > >>> + s = hvm_select_ioreq_server(currd, &p); > >> > >> What I recall is that we had agreed on p2m_ioreq_server pages to be > >> treated as ordinary RAM ones as long as no server can be found. The > >> type check here contradicts that. Is there a reason? > > > > I think it must be a confusion as to what "treated like ordinary RAM > > ones" means. p2m_ram_rw types that gets here will go through > > hvm_select_ioreq_server(), and (therefore) potentially be treated as > > MMIO accesses, which is not how "ordinary RAM" would behave. If what > > you meant was that you want p2m_ioreq_server to behave like > p2m_ram_rw > > (and be treated as MMIO if it matches an iorange) then yes. If what > > you want is for p2m_ioreq_server to actually act like RAM, then > > probably something more needs to happen here. > > Well, all I'm questioning is the special casing of p2m_ioreq_server in the if(). > That's imo orthogonal to p2m_ram_rw pages not being supposed to make it > here (hence the is_mmio check in the earlier if() also looks questionable). > Perhaps it would already help if there was a comment explaining what the > exact intended behavior here is. > My understanding is that we want accesses that make it here for pages that are not of type 'ioreq_server' to result in MMIO emulation (i.e. they hit an emulator's requested ranges, or the access is completed as unclaimed MMIO by Xen). Accesses that make it here because the page *is* of type 'ioreq server' should be sent to the emulator that has claimed the type and, if no emulator does currently have a claim to the type, be handled as if the access was to r/w RAM. Paul > Jan
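Expressed as code, the behaviour Paul describes would look roughly like the sketch below (purely illustrative, reusing the names from the quoted hunk and ignoring the read/write flag filtering; the final "treat as plain RAM" branch is the part the posted code does not currently implement):

    struct hvm_ioreq_server *s = NULL;
    p2m_type_t p2mt = p2m_invalid;
    unsigned int flags;

    if ( is_mmio )
        (void) get_gfn_query_unlocked(currd, paddr_to_pfn(addr), &p2mt);

    if ( p2mt != p2m_ioreq_server )
    {
        /* Ordinary case: match against the emulators' registered
         * ranges, or complete as unclaimed MMIO. */
        s = hvm_select_ioreq_server(currd, &p);
    }
    else if ( (s = p2m_get_ioreq_server(currd, &flags)) != NULL )
    {
        /* The type is claimed: forward the access to that emulator. */
    }
    else
    {
        /* No emulator currently claims the type: complete the access
         * as if the page were ordinary r/w RAM. */
    }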
>>> On 06.09.16 at 10:03, <Paul.Durrant@citrix.com> wrote: >> -----Original Message----- >> From: Jan Beulich [mailto:JBeulich@suse.com] >> Sent: 06 September 2016 08:58 >> To: George Dunlap <George.Dunlap@citrix.com>; Yu Zhang >> <yu.c.zhang@linux.intel.com> >> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Paul Durrant >> <Paul.Durrant@citrix.com>; George Dunlap <George.Dunlap@citrix.com>; >> JunNakajima <jun.nakajima@intel.com>; Kevin Tian <kevin.tian@intel.com>; >> zhiyuan.lv@intel.com; xen-devel@lists.xen.org; Tim (Xen.org) >> <tim@xen.org> >> Subject: Re: [PATCH v6 1/4] x86/ioreq server: Add HVMOP to map guest ram >> with p2m_ioreq_server to an ioreq server. >> >> >>> On 05.09.16 at 19:20, <george.dunlap@citrix.com> wrote: >> > On 05/09/16 14:31, Jan Beulich wrote: >> >>>>> On 02.09.16 at 12:47, <yu.c.zhang@linux.intel.com> wrote: >> >>> @@ -178,8 +179,27 @@ static int hvmemul_do_io( >> >>> break; >> >>> case X86EMUL_UNHANDLEABLE: >> >>> { >> >>> - struct hvm_ioreq_server *s = >> >>> - hvm_select_ioreq_server(curr->domain, &p); >> >>> + struct hvm_ioreq_server *s = NULL; >> >>> + p2m_type_t p2mt = p2m_invalid; >> >>> + >> >>> + if ( is_mmio ) >> >>> + { >> >>> + unsigned long gmfn = paddr_to_pfn(addr); >> >>> + >> >>> + (void) get_gfn_query_unlocked(currd, gmfn, &p2mt); >> >>> + >> >>> + if ( p2mt == p2m_ioreq_server && dir == IOREQ_WRITE ) >> >>> + { >> >>> + unsigned int flags; >> >>> + >> >>> + s = p2m_get_ioreq_server(currd, &flags); >> >>> + if ( !(flags & XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE) ) >> >>> + s = NULL; >> >>> + } >> >>> + } >> >>> + >> >>> + if ( !s && p2mt != p2m_ioreq_server ) >> >>> + s = hvm_select_ioreq_server(currd, &p); >> >> >> >> What I recall is that we had agreed on p2m_ioreq_server pages to be >> >> treated as ordinary RAM ones as long as no server can be found. The >> >> type check here contradicts that. Is there a reason? >> > >> > I think it must be a confusion as to what "treated like ordinary RAM >> > ones" means. p2m_ram_rw types that gets here will go through >> > hvm_select_ioreq_server(), and (therefore) potentially be treated as >> > MMIO accesses, which is not how "ordinary RAM" would behave. If what >> > you meant was that you want p2m_ioreq_server to behave like >> p2m_ram_rw >> > (and be treated as MMIO if it matches an iorange) then yes. If what >> > you want is for p2m_ioreq_server to actually act like RAM, then >> > probably something more needs to happen here. >> >> Well, all I'm questioning is the special casing of p2m_ioreq_server in the if(). >> That's imo orthogonal to p2m_ram_rw pages not being supposed to make it >> here (hence the is_mmio check in the earlier if() also looks questionable). >> Perhaps it would already help if there was a comment explaining what the >> exact intended behavior here is. >> > > My understanding is that we want accesses that make it here for pages that > are not of type 'ioreq_server' to result in MMIO emulation (i.e. they hit an > emulator's requested ranges, or the access is completed as unclaimed MMIO by > Xen). Accesses that make it here because the page *is* of type 'ioreq server' > should be sent to the emulator that has claimed the type and, if no emulator > does currently have a claim to the type, be handled as if the access was to > r/w RAM. Makes sense, but it doesn't look like the code above does that, as - keeping s to be NULL means ignoring the access - finding a server by some means would imply handling the access as I/O instead of RAM. Jan
On 9/6/2016 4:13 PM, Jan Beulich wrote: >>>> On 06.09.16 at 10:03, <Paul.Durrant@citrix.com> wrote: >>> -----Original Message----- >>> From: Jan Beulich [mailto:JBeulich@suse.com] >>> Sent: 06 September 2016 08:58 >>> To: George Dunlap <George.Dunlap@citrix.com>; Yu Zhang >>> <yu.c.zhang@linux.intel.com> >>> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Paul Durrant >>> <Paul.Durrant@citrix.com>; George Dunlap <George.Dunlap@citrix.com>; >>> JunNakajima <jun.nakajima@intel.com>; Kevin Tian <kevin.tian@intel.com>; >>> zhiyuan.lv@intel.com; xen-devel@lists.xen.org; Tim (Xen.org) >>> <tim@xen.org> >>> Subject: Re: [PATCH v6 1/4] x86/ioreq server: Add HVMOP to map guest ram >>> with p2m_ioreq_server to an ioreq server. >>> >>>>>> On 05.09.16 at 19:20, <george.dunlap@citrix.com> wrote: >>>> On 05/09/16 14:31, Jan Beulich wrote: >>>>>>>> On 02.09.16 at 12:47, <yu.c.zhang@linux.intel.com> wrote: >>>>>> @@ -178,8 +179,27 @@ static int hvmemul_do_io( >>>>>> break; >>>>>> case X86EMUL_UNHANDLEABLE: >>>>>> { >>>>>> - struct hvm_ioreq_server *s = >>>>>> - hvm_select_ioreq_server(curr->domain, &p); >>>>>> + struct hvm_ioreq_server *s = NULL; >>>>>> + p2m_type_t p2mt = p2m_invalid; >>>>>> + >>>>>> + if ( is_mmio ) >>>>>> + { >>>>>> + unsigned long gmfn = paddr_to_pfn(addr); >>>>>> + >>>>>> + (void) get_gfn_query_unlocked(currd, gmfn, &p2mt); >>>>>> + >>>>>> + if ( p2mt == p2m_ioreq_server && dir == IOREQ_WRITE ) >>>>>> + { >>>>>> + unsigned int flags; >>>>>> + >>>>>> + s = p2m_get_ioreq_server(currd, &flags); >>>>>> + if ( !(flags & XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE) ) >>>>>> + s = NULL; >>>>>> + } >>>>>> + } >>>>>> + >>>>>> + if ( !s && p2mt != p2m_ioreq_server ) >>>>>> + s = hvm_select_ioreq_server(currd, &p); >>>>> What I recall is that we had agreed on p2m_ioreq_server pages to be >>>>> treated as ordinary RAM ones as long as no server can be found. The >>>>> type check here contradicts that. Is there a reason? >>>> I think it must be a confusion as to what "treated like ordinary RAM >>>> ones" means. p2m_ram_rw types that gets here will go through >>>> hvm_select_ioreq_server(), and (therefore) potentially be treated as >>>> MMIO accesses, which is not how "ordinary RAM" would behave. If what >>>> you meant was that you want p2m_ioreq_server to behave like >>> p2m_ram_rw >>>> (and be treated as MMIO if it matches an iorange) then yes. If what >>>> you want is for p2m_ioreq_server to actually act like RAM, then >>>> probably something more needs to happen here. >>> Well, all I'm questioning is the special casing of p2m_ioreq_server in the if(). >>> That's imo orthogonal to p2m_ram_rw pages not being supposed to make it >>> here (hence the is_mmio check in the earlier if() also looks questionable). >>> Perhaps it would already help if there was a comment explaining what the >>> exact intended behavior here is. >>> >> My understanding is that we want accesses that make it here for pages that >> are not of type 'ioreq_server' to result in MMIO emulation (i.e. they hit an >> emulator's requested ranges, or the access is completed as unclaimed MMIO by >> Xen). Accesses that make it here because the page *is* of type 'ioreq server' >> should be sent to the emulator that has claimed the type and, if no emulator >> does currently have a claim to the type, be handled as if the access was to >> r/w RAM. 
> Makes sense, but it doesn't look like the code above does that, as
> - keeping s to be NULL means ignoring the access
> - finding a server by some means would imply handling the access as I/O
> instead of RAM.
>
> Jan

Thanks Paul for helping me explain this. :) And I'd like to give some clarification:

Handling of accesses to p2m_ioreq_server pages is supposed to be delivered to the device model selected by p2m_get_ioreq_server(); only accesses to port I/O and MMIO are supposed to use routine hvm_select_ioreq_server() to select the device model.

For p2m_ioreq_server pages, variable s (for ioreq server) could be NULL in the following cases:

1> Since we now only support the emulation of write accesses, we shall not forward the current access to the device model if it is a read access. Therefore, you can see that p2m_get_ioreq_server() is only called under the "if ( p2mt == p2m_ioreq_server && dir == IOREQ_WRITE )" restriction. However, we may encounter a read-modify-write instruction in the guest trying to access this page; the read emulation should be handled in the hypervisor first, and only then is the write access delivered to the device model. In such a case, s is NULL, and the access will be handled by the mem_handler in patch 3/4.

2> Even when p2m_get_ioreq_server() returns a valid ioreq server, it could be reset to NULL if (flags & XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE) is not true. But this part is not indispensable; it is there for future support of the emulation of read accesses. For now, we have code in hvm_map_mem_type_to_ioreq_server() to guarantee the flag is XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE.

Note: we do not need to worry about the situation where a p2m_ioreq_server page is written yet no ioreq server is bound, because:
1> hvmop_set_mem_type() requires that an ioreq server has been bound when we set a page to p2m_ioreq_server;
2> hvmop_map_mem_type_to_ioreq_server() also makes sure all pages with p2m_ioreq_server are reset back to p2m_ram_rw when it unbinds an ioreq server (in patch 4/4). So when no ioreq server is bound with p2m_ioreq_server, there should be no such page.

Thanks
Yu
On 9/9/2016 1:22 PM, Yu Zhang wrote: > > > On 9/2/2016 6:47 PM, Yu Zhang wrote: >> A new HVMOP - HVMOP_map_mem_type_to_ioreq_server, is added to >> let one ioreq server claim/disclaim its responsibility for the >> handling of guest pages with p2m type p2m_ioreq_server. Users >> of this HVMOP can specify which kind of operation is supposed >> to be emulated in a parameter named flags. Currently, this HVMOP >> only support the emulation of write operations. And it can be >> further extended to support the emulation of read ones if an >> ioreq server has such requirement in the future. >> >> For now, we only support one ioreq server for this p2m type, so >> once an ioreq server has claimed its ownership, subsequent calls >> of the HVMOP_map_mem_type_to_ioreq_server will fail. Users can also >> disclaim the ownership of guest ram pages with p2m_ioreq_server, by >> triggering this new HVMOP, with ioreq server id set to the current >> owner's and flags parameter set to 0. >> >> Note both HVMOP_map_mem_type_to_ioreq_server and p2m_ioreq_server >> are only supported for HVMs with HAP enabled. >> >> Also note that only after one ioreq server claims its ownership >> of p2m_ioreq_server, will the p2m type change to p2m_ioreq_server >> be allowed. >> >> Signed-off-by: Paul Durrant <paul.durrant@citrix.com> >> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> >> Acked-by: Tim Deegan <tim@xen.org> >> --- >> Cc: Paul Durrant <paul.durrant@citrix.com> >> Cc: Jan Beulich <jbeulich@suse.com> >> Cc: Andrew Cooper <andrew.cooper3@citrix.com> >> Cc: George Dunlap <george.dunlap@eu.citrix.com> >> Cc: Jun Nakajima <jun.nakajima@intel.com> >> Cc: Kevin Tian <kevin.tian@intel.com> >> Cc: Tim Deegan <tim@xen.org> >> >> changes in v6: >> - Clarify logic in hvmemul_do_io(). >> - Use recursive lock for ioreq server lock. >> - Remove debug print when mapping ioreq server. >> - Clarify code in ept_p2m_type_to_flags() for consistency. >> - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS. >> - Add comments for HVMMEM_ioreq_server to note only changes >> to/from HVMMEM_ram_rw are permitted. >> - Add domain_pause/unpause() in hvm_map_mem_type_to_ioreq_server() >> to avoid the race condition when a vm exit happens on a write- >> protected page, just to find the ioreq server has been unmapped >> already. >> - Introduce a seperate patch to delay the release of p2m >> lock to avoid the race condition. >> - Introduce a seperate patch to handle the read-modify-write >> operations on a write protected page. >> > > Why do we need to do this? Won't the default case just DTRT if it finds > that the ioreq server has been unmapped? Well, patch 4 will either mark the remaining p2m_ioreq_server entries as "recal" or reset to p2m_ram_rw directly. So my understanding is that we do not wish to see a ept violation due to a p2m_ioreq_server access after the ioreq server is unmapped. Yet without this domain_pause/unpause() pair, VM accesses may trigger an ept violation during the hvmop hypercall(hvm_map_mem_type_to_ioreq_server), just to find the ioreq server is NULL. Then we would have to provide handlers which just do the copy to/from actions for the VM. This seems awkward to me. IIUC, with domain_pause/unpause(), we can guarantee that the VM will not trigger such ept violation during the unmapping process. After that, accesses to a p2m_ioreq_server page will only trigger ept misconfiguration. :) Thanks Yu
On 9/5/2016 9:31 PM, Jan Beulich wrote: >>>> On 02.09.16 at 12:47, <yu.c.zhang@linux.intel.com> wrote: >> @@ -178,8 +179,27 @@ static int hvmemul_do_io( >> break; >> case X86EMUL_UNHANDLEABLE: >> { >> - struct hvm_ioreq_server *s = >> - hvm_select_ioreq_server(curr->domain, &p); >> + struct hvm_ioreq_server *s = NULL; >> + p2m_type_t p2mt = p2m_invalid; >> + >> + if ( is_mmio ) >> + { >> + unsigned long gmfn = paddr_to_pfn(addr); >> + >> + (void) get_gfn_query_unlocked(currd, gmfn, &p2mt); >> + >> + if ( p2mt == p2m_ioreq_server && dir == IOREQ_WRITE ) >> + { >> + unsigned int flags; >> + >> + s = p2m_get_ioreq_server(currd, &flags); >> + if ( !(flags & XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE) ) >> + s = NULL; >> + } >> + } >> + >> + if ( !s && p2mt != p2m_ioreq_server ) >> + s = hvm_select_ioreq_server(currd, &p); > What I recall is that we had agreed on p2m_ioreq_server pages > to be treated as ordinary RAM ones as long as no server can be > found. The type check here contradicts that. Is there a reason? Thanks Jan. I had given my explaination on Sep 6's reply. :) If s is NULL for a p2m_ioreq_server page, we do not wish to traverse the rangeset by hvm_select_ioreq_server() again, it may probably be a read emulation process for a read-modify-write scenario. > >> +static int hvmop_map_mem_type_to_ioreq_server( >> + XEN_GUEST_HANDLE_PARAM(xen_hvm_map_mem_type_to_ioreq_server_t) uop) >> +{ >> + xen_hvm_map_mem_type_to_ioreq_server_t op; >> + struct domain *d; >> + int rc; >> + >> + if ( copy_from_guest(&op, uop, 1) ) >> + return -EFAULT; >> + >> + rc = rcu_lock_remote_domain_by_id(op.domid, &d); >> + if ( rc != 0 ) >> + return rc; >> + >> + rc = -EINVAL; >> + if ( !is_hvm_domain(d) ) >> + goto out; >> + >> + if ( op.pad != 0 ) >> + goto out; > This, I think, should be done first thing after having copied in the > structure. No need to lookup domain or anything if this is not zero. Right. Thanks! >> +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id, >> + uint32_t type, uint32_t flags) >> +{ >> + struct hvm_ioreq_server *s; >> + int rc; >> + >> + /* For now, only HVMMEM_ioreq_server is supported. */ >> + if ( type != HVMMEM_ioreq_server ) >> + return -EINVAL; >> + >> + /* For now, only write emulation is supported. */ >> + if ( flags & ~(XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE) ) >> + return -EINVAL; >> + >> + domain_pause(d); >> + spin_lock_recursive(&d->arch.hvm_domain.ioreq_server.lock); >> + >> + rc = -ENOENT; >> + list_for_each_entry ( s, >> + &d->arch.hvm_domain.ioreq_server.list, >> + list_entry ) >> + { >> + if ( s == d->arch.hvm_domain.default_ioreq_server ) >> + continue; >> + >> + if ( s->id == id ) >> + { >> + rc = p2m_set_ioreq_server(d, flags, s); >> + break; >> + } >> + } >> + >> + spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock); >> + domain_unpause(d); >> + return rc; >> +} > Blank line before final return statement of a function please. Got it. > >> +int p2m_set_ioreq_server(struct domain *d, >> + unsigned int flags, >> + struct hvm_ioreq_server *s) >> +{ >> + struct p2m_domain *p2m = p2m_get_hostp2m(d); >> + int rc; >> + >> + /* >> + * Use lock to prevent concurrent setting requirements >> + * from multiple ioreq serers. >> + */ > "Concurrent setting requirements"? DYM "attempts"? Yep. "attempts" is more accurate. > Jan >
>>> On 09.09.16 at 07:55, <yu.c.zhang@linux.intel.com> wrote: > > On 9/5/2016 9:31 PM, Jan Beulich wrote: >>>>> On 02.09.16 at 12:47, <yu.c.zhang@linux.intel.com> wrote: >>> @@ -178,8 +179,27 @@ static int hvmemul_do_io( >>> break; >>> case X86EMUL_UNHANDLEABLE: >>> { >>> - struct hvm_ioreq_server *s = >>> - hvm_select_ioreq_server(curr->domain, &p); >>> + struct hvm_ioreq_server *s = NULL; >>> + p2m_type_t p2mt = p2m_invalid; >>> + >>> + if ( is_mmio ) >>> + { >>> + unsigned long gmfn = paddr_to_pfn(addr); >>> + >>> + (void) get_gfn_query_unlocked(currd, gmfn, &p2mt); >>> + >>> + if ( p2mt == p2m_ioreq_server && dir == IOREQ_WRITE ) >>> + { >>> + unsigned int flags; >>> + >>> + s = p2m_get_ioreq_server(currd, &flags); >>> + if ( !(flags & XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE) ) >>> + s = NULL; >>> + } >>> + } >>> + >>> + if ( !s && p2mt != p2m_ioreq_server ) >>> + s = hvm_select_ioreq_server(currd, &p); >> What I recall is that we had agreed on p2m_ioreq_server pages >> to be treated as ordinary RAM ones as long as no server can be >> found. The type check here contradicts that. Is there a reason? > > Thanks Jan. I had given my explaination on Sep 6's reply. :) > If s is NULL for a p2m_ioreq_server page, we do not wish to traverse the > rangeset > by hvm_select_ioreq_server() again, it may probably be a read emulation > process > for a read-modify-write scenario. Well, yes, I recall. Yet my request stands: At the very least there should be a comment here outlining the intended behavior. That'll make it quite a bit easier, I think, for the reader to figure whether the code actually matches those intentions (and also help people needing to touch this code again down the road). Jan
On 9/9/2016 4:09 PM, Jan Beulich wrote: >>>> On 09.09.16 at 07:55, <yu.c.zhang@linux.intel.com> wrote: >> On 9/5/2016 9:31 PM, Jan Beulich wrote: >>>>>> On 02.09.16 at 12:47, <yu.c.zhang@linux.intel.com> wrote: >>>> @@ -178,8 +179,27 @@ static int hvmemul_do_io( >>>> break; >>>> case X86EMUL_UNHANDLEABLE: >>>> { >>>> - struct hvm_ioreq_server *s = >>>> - hvm_select_ioreq_server(curr->domain, &p); >>>> + struct hvm_ioreq_server *s = NULL; >>>> + p2m_type_t p2mt = p2m_invalid; >>>> + >>>> + if ( is_mmio ) >>>> + { >>>> + unsigned long gmfn = paddr_to_pfn(addr); >>>> + >>>> + (void) get_gfn_query_unlocked(currd, gmfn, &p2mt); >>>> + >>>> + if ( p2mt == p2m_ioreq_server && dir == IOREQ_WRITE ) >>>> + { >>>> + unsigned int flags; >>>> + >>>> + s = p2m_get_ioreq_server(currd, &flags); >>>> + if ( !(flags & XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE) ) >>>> + s = NULL; >>>> + } >>>> + } >>>> + >>>> + if ( !s && p2mt != p2m_ioreq_server ) >>>> + s = hvm_select_ioreq_server(currd, &p); >>> What I recall is that we had agreed on p2m_ioreq_server pages >>> to be treated as ordinary RAM ones as long as no server can be >>> found. The type check here contradicts that. Is there a reason? >> Thanks Jan. I had given my explaination on Sep 6's reply. :) >> If s is NULL for a p2m_ioreq_server page, we do not wish to traverse the >> rangeset >> by hvm_select_ioreq_server() again, it may probably be a read emulation >> process >> for a read-modify-write scenario. > Well, yes, I recall. Yet my request stands: At the very least there > should be a comment here outlining the intended behavior. That'll > make it quite a bit easier, I think, for the reader to figure whether > the code actually matches those intentions (and also help people > needing to touch this code again down the road). > OK. Thanks. I'll add a comment here. :) Yu
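For what it is worth, the kind of comment being asked for might read something like this (wording is only a suggestion, summarising the behaviour explained earlier in the thread):

    /*
     * Accesses to pages of type p2m_ioreq_server:
     *  - writes are forwarded to the ioreq server which claimed the
     *    type via HVMOP_map_mem_type_to_ioreq_server (looked up with
     *    p2m_get_ioreq_server());
     *  - reads (e.g. the read half of a read-modify-write) leave s
     *    NULL on purpose, so they get completed by the hypervisor
     *    itself;
     *  - such pages are deliberately not run through
     *    hvm_select_ioreq_server(), i.e. not matched against MMIO
     *    ranges.
     * Anything else that gets here takes the usual
     * hvm_select_ioreq_server() path.
     */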
On Fri, Sep 9, 2016 at 6:51 AM, Yu Zhang <yu.c.zhang@linux.intel.com> wrote: >> On 9/2/2016 6:47 PM, Yu Zhang wrote: >>> A new HVMOP - HVMOP_map_mem_type_to_ioreq_server, is added to >>> let one ioreq server claim/disclaim its responsibility for the >>> handling of guest pages with p2m type p2m_ioreq_server. Users >>> of this HVMOP can specify which kind of operation is supposed >>> to be emulated in a parameter named flags. Currently, this HVMOP >>> only support the emulation of write operations. And it can be >>> further extended to support the emulation of read ones if an >>> ioreq server has such requirement in the future. >>> >>> For now, we only support one ioreq server for this p2m type, so >>> once an ioreq server has claimed its ownership, subsequent calls >>> of the HVMOP_map_mem_type_to_ioreq_server will fail. Users can also >>> disclaim the ownership of guest ram pages with p2m_ioreq_server, by >>> triggering this new HVMOP, with ioreq server id set to the current >>> owner's and flags parameter set to 0. >>> >>> Note both HVMOP_map_mem_type_to_ioreq_server and p2m_ioreq_server >>> are only supported for HVMs with HAP enabled. >>> >>> Also note that only after one ioreq server claims its ownership >>> of p2m_ioreq_server, will the p2m type change to p2m_ioreq_server >>> be allowed. >>> >>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com> >>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> >>> Acked-by: Tim Deegan <tim@xen.org> >>> --- >>> Cc: Paul Durrant <paul.durrant@citrix.com> >>> Cc: Jan Beulich <jbeulich@suse.com> >>> Cc: Andrew Cooper <andrew.cooper3@citrix.com> >>> Cc: George Dunlap <george.dunlap@eu.citrix.com> >>> Cc: Jun Nakajima <jun.nakajima@intel.com> >>> Cc: Kevin Tian <kevin.tian@intel.com> >>> Cc: Tim Deegan <tim@xen.org> >>> >>> changes in v6: >>> - Clarify logic in hvmemul_do_io(). >>> - Use recursive lock for ioreq server lock. >>> - Remove debug print when mapping ioreq server. >>> - Clarify code in ept_p2m_type_to_flags() for consistency. >>> - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS. >>> - Add comments for HVMMEM_ioreq_server to note only changes >>> to/from HVMMEM_ram_rw are permitted. >>> - Add domain_pause/unpause() in hvm_map_mem_type_to_ioreq_server() >>> to avoid the race condition when a vm exit happens on a write- >>> protected page, just to find the ioreq server has been unmapped >>> already. >>> - Introduce a seperate patch to delay the release of p2m >>> lock to avoid the race condition. >>> - Introduce a seperate patch to handle the read-modify-write >>> operations on a write protected page. >>> >> >> Why do we need to do this? Won't the default case just DTRT if it finds >> that the ioreq server has been unmapped? > > > Well, patch 4 will either mark the remaining p2m_ioreq_server entries as > "recal" or > reset to p2m_ram_rw directly. So my understanding is that we do not wish to > see a ept violation due to a p2m_ioreq_server access after the ioreq server > is unmapped. > Yet without this domain_pause/unpause() pair, VM accesses may trigger an ept > violation > during the hvmop hypercall(hvm_map_mem_type_to_ioreq_server), just to find > the ioreq > server is NULL. Then we would have to provide handlers which just do the > copy to/from > actions for the VM. This seems awkward to me. So the race you're worried about is this: 1. Guest fault happens 2. ioreq server calls map_mem_type_to_ioreq_server, unhooking 3. 
guest finds no ioreq server present

I think in that case the easiest thing to do would be to simply assume there was a race and re-execute the instruction. Is that not possible for some reason?

 -George
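If that route were taken, one possible shape for it inside the X86EMUL_UNHANDLEABLE case of hvmemul_do_io() might be the fragment below. This is a hypothetical sketch only: X86EMUL_RETRY is the emulator's existing "try again" return value, and any per-vcpu ioreq bookkeeping would presumably need resetting as well.

    if ( is_mmio && p2mt == p2m_ioreq_server &&
         dir == IOREQ_WRITE && !s )
    {
        /*
         * The type was apparently unclaimed between the fault and
         * this point: back out and let the guest re-execute the
         * instruction, by which time the stale p2m_ioreq_server
         * entry should have been recalculated.
         */
        rc = X86EMUL_RETRY;
        break;
    }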
On 9/21/2016 9:04 PM, George Dunlap wrote: > On Fri, Sep 9, 2016 at 6:51 AM, Yu Zhang <yu.c.zhang@linux.intel.com> wrote: >>> On 9/2/2016 6:47 PM, Yu Zhang wrote: >>>> A new HVMOP - HVMOP_map_mem_type_to_ioreq_server, is added to >>>> let one ioreq server claim/disclaim its responsibility for the >>>> handling of guest pages with p2m type p2m_ioreq_server. Users >>>> of this HVMOP can specify which kind of operation is supposed >>>> to be emulated in a parameter named flags. Currently, this HVMOP >>>> only support the emulation of write operations. And it can be >>>> further extended to support the emulation of read ones if an >>>> ioreq server has such requirement in the future. >>>> >>>> For now, we only support one ioreq server for this p2m type, so >>>> once an ioreq server has claimed its ownership, subsequent calls >>>> of the HVMOP_map_mem_type_to_ioreq_server will fail. Users can also >>>> disclaim the ownership of guest ram pages with p2m_ioreq_server, by >>>> triggering this new HVMOP, with ioreq server id set to the current >>>> owner's and flags parameter set to 0. >>>> >>>> Note both HVMOP_map_mem_type_to_ioreq_server and p2m_ioreq_server >>>> are only supported for HVMs with HAP enabled. >>>> >>>> Also note that only after one ioreq server claims its ownership >>>> of p2m_ioreq_server, will the p2m type change to p2m_ioreq_server >>>> be allowed. >>>> >>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com> >>>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> >>>> Acked-by: Tim Deegan <tim@xen.org> >>>> --- >>>> Cc: Paul Durrant <paul.durrant@citrix.com> >>>> Cc: Jan Beulich <jbeulich@suse.com> >>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com> >>>> Cc: George Dunlap <george.dunlap@eu.citrix.com> >>>> Cc: Jun Nakajima <jun.nakajima@intel.com> >>>> Cc: Kevin Tian <kevin.tian@intel.com> >>>> Cc: Tim Deegan <tim@xen.org> >>>> >>>> changes in v6: >>>> - Clarify logic in hvmemul_do_io(). >>>> - Use recursive lock for ioreq server lock. >>>> - Remove debug print when mapping ioreq server. >>>> - Clarify code in ept_p2m_type_to_flags() for consistency. >>>> - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS. >>>> - Add comments for HVMMEM_ioreq_server to note only changes >>>> to/from HVMMEM_ram_rw are permitted. >>>> - Add domain_pause/unpause() in hvm_map_mem_type_to_ioreq_server() >>>> to avoid the race condition when a vm exit happens on a write- >>>> protected page, just to find the ioreq server has been unmapped >>>> already. >>>> - Introduce a seperate patch to delay the release of p2m >>>> lock to avoid the race condition. >>>> - Introduce a seperate patch to handle the read-modify-write >>>> operations on a write protected page. >>>> >>> Why do we need to do this? Won't the default case just DTRT if it finds >>> that the ioreq server has been unmapped? >> >> Well, patch 4 will either mark the remaining p2m_ioreq_server entries as >> "recal" or >> reset to p2m_ram_rw directly. So my understanding is that we do not wish to >> see a ept violation due to a p2m_ioreq_server access after the ioreq server >> is unmapped. >> Yet without this domain_pause/unpause() pair, VM accesses may trigger an ept >> violation >> during the hvmop hypercall(hvm_map_mem_type_to_ioreq_server), just to find >> the ioreq >> server is NULL. Then we would have to provide handlers which just do the >> copy to/from >> actions for the VM. This seems awkward to me. > So the race you're worried about is this: > > 1. Guest fault happens > 2. 
ioreq server calls map_mem_type_to_ioreq_server, unhooking > 3. guest finds no ioreq server present > > I think in that case the easiest thing to do would be to simply assume > there was a race and re-execute the instruction. Is that not possible > for some reason? > > -George Thanks for your reply, George. :) Two reasons I'd like to use the domain_pause/unpause() to avoid the race condition: 1> Like my previous explanation, in the read-modify-write scenario, the ioreq server will be NULL for the read emulation. But in such case, hypervisor will not discard this trap, instead it is supposed to do the copy work for the read access. So it would be difficult for hypervisor to decide if the ioreq server was detached due to a race condition, or if the ioreq server should be a NULL because we are emulating a read operation first for a read-modify-write instruction. 2> I also realized it can avoid a deadlock possibility - a> dom0 triggers map_mem_type_to_ioreq_server, which spin locks the ioreq_server.lock, and before routine hvm_map_mem_type_to_ioreq_server() returns, b> The HVM triggers an ept violation, and p2m lock is held by hvm_hap_nested_page_fault(); c> hypervisor continues to the I/O handler, which will need the ioreq_server.lock to select an ioreq server, which is being held by the hypercall handler side; d> likewise, the map_mem_type_to_ioreq_server will meet problem when trying to get the p2m lock when it calls p2m_change_entry_type_global(). With a domain_pause/unpause(), we could avoid the ept violation during this period(which I believe won't be long because this pair only covers the p2m_change_entry_type_global() call, not the p2m_finish_type_change() call). Thanks Yu
On 9/22/2016 5:12 PM, Yu Zhang wrote: > > > On 9/21/2016 9:04 PM, George Dunlap wrote: >> On Fri, Sep 9, 2016 at 6:51 AM, Yu Zhang <yu.c.zhang@linux.intel.com> >> wrote: >>>> On 9/2/2016 6:47 PM, Yu Zhang wrote: >>>>> A new HVMOP - HVMOP_map_mem_type_to_ioreq_server, is added to >>>>> let one ioreq server claim/disclaim its responsibility for the >>>>> handling of guest pages with p2m type p2m_ioreq_server. Users >>>>> of this HVMOP can specify which kind of operation is supposed >>>>> to be emulated in a parameter named flags. Currently, this HVMOP >>>>> only support the emulation of write operations. And it can be >>>>> further extended to support the emulation of read ones if an >>>>> ioreq server has such requirement in the future. >>>>> >>>>> For now, we only support one ioreq server for this p2m type, so >>>>> once an ioreq server has claimed its ownership, subsequent calls >>>>> of the HVMOP_map_mem_type_to_ioreq_server will fail. Users can also >>>>> disclaim the ownership of guest ram pages with p2m_ioreq_server, by >>>>> triggering this new HVMOP, with ioreq server id set to the current >>>>> owner's and flags parameter set to 0. >>>>> >>>>> Note both HVMOP_map_mem_type_to_ioreq_server and p2m_ioreq_server >>>>> are only supported for HVMs with HAP enabled. >>>>> >>>>> Also note that only after one ioreq server claims its ownership >>>>> of p2m_ioreq_server, will the p2m type change to p2m_ioreq_server >>>>> be allowed. >>>>> >>>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com> >>>>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> >>>>> Acked-by: Tim Deegan <tim@xen.org> >>>>> --- >>>>> Cc: Paul Durrant <paul.durrant@citrix.com> >>>>> Cc: Jan Beulich <jbeulich@suse.com> >>>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com> >>>>> Cc: George Dunlap <george.dunlap@eu.citrix.com> >>>>> Cc: Jun Nakajima <jun.nakajima@intel.com> >>>>> Cc: Kevin Tian <kevin.tian@intel.com> >>>>> Cc: Tim Deegan <tim@xen.org> >>>>> >>>>> changes in v6: >>>>> - Clarify logic in hvmemul_do_io(). >>>>> - Use recursive lock for ioreq server lock. >>>>> - Remove debug print when mapping ioreq server. >>>>> - Clarify code in ept_p2m_type_to_flags() for consistency. >>>>> - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS. >>>>> - Add comments for HVMMEM_ioreq_server to note only changes >>>>> to/from HVMMEM_ram_rw are permitted. >>>>> - Add domain_pause/unpause() in >>>>> hvm_map_mem_type_to_ioreq_server() >>>>> to avoid the race condition when a vm exit happens on a write- >>>>> protected page, just to find the ioreq server has been unmapped >>>>> already. >>>>> - Introduce a seperate patch to delay the release of p2m >>>>> lock to avoid the race condition. >>>>> - Introduce a seperate patch to handle the read-modify-write >>>>> operations on a write protected page. >>>>> >>>> Why do we need to do this? Won't the default case just DTRT if it >>>> finds >>>> that the ioreq server has been unmapped? >>> >>> Well, patch 4 will either mark the remaining p2m_ioreq_server >>> entries as >>> "recal" or >>> reset to p2m_ram_rw directly. So my understanding is that we do not >>> wish to >>> see a ept violation due to a p2m_ioreq_server access after the ioreq >>> server >>> is unmapped. >>> Yet without this domain_pause/unpause() pair, VM accesses may >>> trigger an ept >>> violation >>> during the hvmop hypercall(hvm_map_mem_type_to_ioreq_server), just >>> to find >>> the ioreq >>> server is NULL. Then we would have to provide handlers which just do >>> the >>> copy to/from >>> actions for the VM. 
This seems awkward to me. >> So the race you're worried about is this: >> >> 1. Guest fault happens >> 2. ioreq server calls map_mem_type_to_ioreq_server, unhooking >> 3. guest finds no ioreq server present >> >> I think in that case the easiest thing to do would be to simply assume >> there was a race and re-execute the instruction. Is that not possible >> for some reason? >> >> -George > > Thanks for your reply, George. :) > Two reasons I'd like to use the domain_pause/unpause() to avoid the > race condition: > > 1> Like my previous explanation, in the read-modify-write scenario, > the ioreq server will > be NULL for the read emulation. But in such case, hypervisor will not > discard this trap, instead > it is supposed to do the copy work for the read access. So it would be > difficult for hypervisor > to decide if the ioreq server was detached due to a race condition, or > if the ioreq server should > be a NULL because we are emulating a read operation first for a > read-modify-write instruction. > > 2> I also realized it can avoid a deadlock possibility - > a> dom0 triggers map_mem_type_to_ioreq_server, which spin locks > the ioreq_server.lock, > and before routine hvm_map_mem_type_to_ioreq_server() returns, > b> The HVM triggers an ept violation, and p2m lock is held by > hvm_hap_nested_page_fault(); > c> hypervisor continues to the I/O handler, which will need the > ioreq_server.lock to select > an ioreq server, which is being held by the hypercall handler > side; > d> likewise, the map_mem_type_to_ioreq_server will meet problem > when trying to get the > p2m lock when it calls p2m_change_entry_type_global(). > With a domain_pause/unpause(), we could avoid the ept violation > during this period(which I > believe won't be long because this pair only covers the > p2m_change_entry_type_global() call, not > the p2m_finish_type_change() call). Sorry, the deadlock issue should be between the p2m->ioreq.lock and the p2m lock which are both inside p2m_set_ioreq_server() called by hvm_map_mem_type_to_ioreq_server(). The sequence when deadlock happens is similar to the above description: a> dom0 triggers map_mem_type_to_ioreq_server, to unmap the p2m_ioreq_server type; b> The HVM triggers an ept violation, and p2m lock is held by hvm_hap_nested_page_fault(); c> hypervisor continues to the I/O handler, which will need the p2m->ioreq.lock to select an ioreq server for the write protected address, which is being held by the hypercall handler side; d> likewise, p2m_set_ioreq_server() will meet problem when trying to get the p2m lock when it calls p2m_change_entry_type_global(). Now I believe we can avoid this deadlock by moving the p2m_change_entry_type_global() outside the p2m_set_ioreq_server() to its caller hvm_map_mem_type_to_ioreq_server(), and there would be no nested lock between the p2m->ioreq.lock and p2m lock. :) A code snippet: @@ -944,6 +944,13 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id, if ( s->id == id ) { rc = p2m_set_ioreq_server(d, flags, s); + if ( rc == 0 && flags == 0 ) + { + struct p2m_domain *p2m = p2m_get_hostp2m(d); + + if ( read_atomic(&p2m->ioreq.entry_count) ) + p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw); + } break; } } Thanks Yu > > Thanks > Yu > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > https://lists.xen.org/xen-devel
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c index c55ad7b..a33346e 100644 --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -100,6 +100,7 @@ static int hvmemul_do_io( uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data) { struct vcpu *curr = current; + struct domain *currd = curr->domain; struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io; ioreq_t p = { .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO, @@ -141,7 +142,7 @@ static int hvmemul_do_io( (p.dir != dir) || (p.df != df) || (p.data_is_ptr != data_is_addr) ) - domain_crash(curr->domain); + domain_crash(currd); if ( data_is_addr ) return X86EMUL_UNHANDLEABLE; @@ -178,8 +179,27 @@ static int hvmemul_do_io( break; case X86EMUL_UNHANDLEABLE: { - struct hvm_ioreq_server *s = - hvm_select_ioreq_server(curr->domain, &p); + struct hvm_ioreq_server *s = NULL; + p2m_type_t p2mt = p2m_invalid; + + if ( is_mmio ) + { + unsigned long gmfn = paddr_to_pfn(addr); + + (void) get_gfn_query_unlocked(currd, gmfn, &p2mt); + + if ( p2mt == p2m_ioreq_server && dir == IOREQ_WRITE ) + { + unsigned int flags; + + s = p2m_get_ioreq_server(currd, &flags); + if ( !(flags & XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE) ) + s = NULL; + } + } + + if ( !s && p2mt != p2m_ioreq_server ) + s = hvm_select_ioreq_server(currd, &p); /* If there is no suitable backing DM, just ignore accesses */ if ( !s ) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 0180f26..e969735 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -5408,9 +5408,14 @@ static int hvmop_get_mem_type( static bool_t hvm_allow_p2m_type_change(p2m_type_t old, p2m_type_t new) { + if ( new == p2m_ioreq_server ) + return old == p2m_ram_rw; + + if ( old == p2m_ioreq_server ) + return new == p2m_ram_rw; + if ( p2m_is_ram(old) || - (p2m_is_hole(old) && new == p2m_mmio_dm) || - (old == p2m_ioreq_server && new == p2m_ram_rw) ) + (p2m_is_hole(old) && new == p2m_mmio_dm) ) return 1; return 0; @@ -5445,6 +5450,19 @@ static int hvmop_set_mem_type( if ( !is_hvm_domain(d) ) goto out; + if ( a.hvmmem_type == HVMMEM_ioreq_server ) + { + unsigned int flags; + + /* HVMMEM_ioreq_server is only supported for HAP enabled hvm. */ + if ( !hap_enabled(d) ) + goto out; + + /* Do not change to HVMMEM_ioreq_server if no ioreq server mapped. */ + if ( !p2m_get_ioreq_server(d, &flags) ) + goto out; + } + rc = xsm_hvm_control(XSM_DM_PRIV, d, HVMOP_set_mem_type); if ( rc ) goto out; @@ -5507,6 +5525,43 @@ static int hvmop_set_mem_type( return rc; } +static int hvmop_map_mem_type_to_ioreq_server( + XEN_GUEST_HANDLE_PARAM(xen_hvm_map_mem_type_to_ioreq_server_t) uop) +{ + xen_hvm_map_mem_type_to_ioreq_server_t op; + struct domain *d; + int rc; + + if ( copy_from_guest(&op, uop, 1) ) + return -EFAULT; + + rc = rcu_lock_remote_domain_by_id(op.domid, &d); + if ( rc != 0 ) + return rc; + + rc = -EINVAL; + if ( !is_hvm_domain(d) ) + goto out; + + if ( op.pad != 0 ) + goto out; + + /* Only support for HAP enabled hvm. 
*/ + if ( !hap_enabled(d) ) + goto out; + + rc = xsm_hvm_ioreq_server(XSM_DM_PRIV, d, + HVMOP_map_mem_type_to_ioreq_server); + if ( rc != 0 ) + goto out; + + rc = hvm_map_mem_type_to_ioreq_server(d, op.id, op.type, op.flags); + + out: + rcu_unlock_domain(d); + return rc; +} + long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) { unsigned long start_iter, mask; @@ -5546,6 +5601,11 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) guest_handle_cast(arg, xen_hvm_io_range_t)); break; + case HVMOP_map_mem_type_to_ioreq_server: + rc = hvmop_map_mem_type_to_ioreq_server( + guest_handle_cast(arg, xen_hvm_map_mem_type_to_ioreq_server_t)); + break; + case HVMOP_set_ioreq_server_state: rc = hvmop_set_ioreq_server_state( guest_handle_cast(arg, xen_hvm_set_ioreq_server_state_t)); diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c index d2245e2..ac84563 100644 --- a/xen/arch/x86/hvm/ioreq.c +++ b/xen/arch/x86/hvm/ioreq.c @@ -753,6 +753,8 @@ int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id) domain_pause(d); + p2m_destroy_ioreq_server(d, s); + hvm_ioreq_server_disable(s, 0); list_del(&s->list_entry); @@ -914,6 +916,43 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, return rc; } +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id, + uint32_t type, uint32_t flags) +{ + struct hvm_ioreq_server *s; + int rc; + + /* For now, only HVMMEM_ioreq_server is supported. */ + if ( type != HVMMEM_ioreq_server ) + return -EINVAL; + + /* For now, only write emulation is supported. */ + if ( flags & ~(XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE) ) + return -EINVAL; + + domain_pause(d); + spin_lock_recursive(&d->arch.hvm_domain.ioreq_server.lock); + + rc = -ENOENT; + list_for_each_entry ( s, + &d->arch.hvm_domain.ioreq_server.list, + list_entry ) + { + if ( s == d->arch.hvm_domain.default_ioreq_server ) + continue; + + if ( s->id == id ) + { + rc = p2m_set_ioreq_server(d, flags, s); + break; + } + } + + spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock); + domain_unpause(d); + return rc; +} + int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id, bool_t enabled) { diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c index d41bb09..aa90a62 100644 --- a/xen/arch/x86/mm/hap/nested_hap.c +++ b/xen/arch/x86/mm/hap/nested_hap.c @@ -174,7 +174,7 @@ nestedhap_walk_L0_p2m(struct p2m_domain *p2m, paddr_t L1_gpa, paddr_t *L0_gpa, if ( *p2mt == p2m_mmio_direct ) goto direct_mmio_out; rc = NESTEDHVM_PAGEFAULT_MMIO; - if ( *p2mt == p2m_mmio_dm ) + if ( *p2mt == p2m_mmio_dm || *p2mt == p2m_ioreq_server ) goto out; rc = NESTEDHVM_PAGEFAULT_L0_ERROR; diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c index 13cab24..700420a 100644 --- a/xen/arch/x86/mm/p2m-ept.c +++ b/xen/arch/x86/mm/p2m-ept.c @@ -132,6 +132,13 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry, entry->r = entry->w = entry->x = 1; entry->a = entry->d = !!cpu_has_vmx_ept_ad; break; + case p2m_ioreq_server: + entry->r = 1; + entry->w = !(p2m->ioreq.flags & XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE); + entry->x = 0; + entry->a = !!cpu_has_vmx_ept_ad; + entry->d = entry->w && entry->a; + break; case p2m_mmio_direct: entry->r = entry->x = 1; entry->w = !rangeset_contains_singleton(mmio_ro_ranges, @@ -171,7 +178,6 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry, entry->a = entry->d = !!cpu_has_vmx_ept_ad; break; case p2m_grant_map_ro: - case p2m_ioreq_server: entry->r = 
1; entry->w = entry->x = 0; entry->a = !!cpu_has_vmx_ept_ad; diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c index 3b025d5..46a56fa 100644 --- a/xen/arch/x86/mm/p2m-pt.c +++ b/xen/arch/x86/mm/p2m-pt.c @@ -72,7 +72,9 @@ static const unsigned long pgt[] = { PGT_l3_page_table }; -static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn, +static unsigned long p2m_type_to_flags(const struct p2m_domain *p2m, + p2m_type_t t, + mfn_t mfn, unsigned int level) { unsigned long flags; @@ -94,8 +96,13 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn, default: return flags | _PAGE_NX_BIT; case p2m_grant_map_ro: - case p2m_ioreq_server: return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT; + case p2m_ioreq_server: + flags |= P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT; + if ( p2m->ioreq.flags & XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE ) + return flags & ~_PAGE_RW; + else + return flags; case p2m_ram_ro: case p2m_ram_logdirty: case p2m_ram_shared: @@ -442,7 +449,8 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn) p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask, gfn | ~mask) ? p2m_ram_logdirty : p2m_ram_rw; unsigned long mfn = l1e_get_pfn(e); - unsigned long flags = p2m_type_to_flags(p2mt, _mfn(mfn), level); + unsigned long flags = p2m_type_to_flags(p2m, p2mt, + _mfn(mfn), level); if ( level ) { @@ -579,7 +587,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct); l3e_content = mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) ? l3e_from_pfn(mfn_x(mfn), - p2m_type_to_flags(p2mt, mfn, 2) | _PAGE_PSE) + p2m_type_to_flags(p2m, p2mt, mfn, 2) | _PAGE_PSE) : l3e_empty(); entry_content.l1 = l3e_content.l3; @@ -615,7 +623,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) ) entry_content = p2m_l1e_from_pfn(mfn_x(mfn), - p2m_type_to_flags(p2mt, mfn, 0)); + p2m_type_to_flags(p2m, p2mt, mfn, 0)); else entry_content = l1e_empty(); @@ -651,7 +659,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct); if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) ) l2e_content = l2e_from_pfn(mfn_x(mfn), - p2m_type_to_flags(p2mt, mfn, 1) | + p2m_type_to_flags(p2m, p2mt, mfn, 1) | _PAGE_PSE); else l2e_content = l2e_empty(); diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index 812dbf6..6e4cb1f 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -83,6 +83,8 @@ static int p2m_initialise(struct domain *d, struct p2m_domain *p2m) else p2m_pt_init(p2m); + spin_lock_init(&p2m->ioreq.lock); + return ret; } @@ -289,6 +291,78 @@ void p2m_memory_type_changed(struct domain *d) } } +int p2m_set_ioreq_server(struct domain *d, + unsigned int flags, + struct hvm_ioreq_server *s) +{ + struct p2m_domain *p2m = p2m_get_hostp2m(d); + int rc; + + /* + * Use lock to prevent concurrent setting requirements + * from multiple ioreq serers. + */ + spin_lock(&p2m->ioreq.lock); + + /* Unmap ioreq server from p2m type by passing flags with 0. 
*/ + if ( flags == 0 ) + { + rc = -EINVAL; + if ( p2m->ioreq.server != s ) + goto out; + + p2m->ioreq.server = NULL; + p2m->ioreq.flags = 0; + } + else + { + rc = -EBUSY; + if ( p2m->ioreq.server != NULL ) + goto out; + + p2m->ioreq.server = s; + p2m->ioreq.flags = flags; + } + + rc = 0; + + out: + spin_unlock(&p2m->ioreq.lock); + + return rc; +} + +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d, + unsigned int *flags) +{ + struct p2m_domain *p2m = p2m_get_hostp2m(d); + struct hvm_ioreq_server *s; + + spin_lock(&p2m->ioreq.lock); + + s = p2m->ioreq.server; + *flags = p2m->ioreq.flags; + + spin_unlock(&p2m->ioreq.lock); + return s; +} + +void p2m_destroy_ioreq_server(const struct domain *d, + const struct hvm_ioreq_server *s) +{ + struct p2m_domain *p2m = p2m_get_hostp2m(d); + + spin_lock(&p2m->ioreq.lock); + + if ( p2m->ioreq.server == s ) + { + p2m->ioreq.server = NULL; + p2m->ioreq.flags = 0; + } + + spin_unlock(&p2m->ioreq.lock); +} + void p2m_enable_hardware_log_dirty(struct domain *d) { struct p2m_domain *p2m = p2m_get_hostp2m(d); diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c index 833f279..4deab39 100644 --- a/xen/arch/x86/mm/shadow/multi.c +++ b/xen/arch/x86/mm/shadow/multi.c @@ -3225,8 +3225,7 @@ static int sh_page_fault(struct vcpu *v, } /* Need to hand off device-model MMIO to the device model */ - if ( p2mt == p2m_mmio_dm - || (p2mt == p2m_ioreq_server && ft == ft_demand_write) ) + if ( p2mt == p2m_mmio_dm ) { gpa = guest_walk_to_gpa(&gw); goto mmio; diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h index fbf2c74..b43667a 100644 --- a/xen/include/asm-x86/hvm/ioreq.h +++ b/xen/include/asm-x86/hvm/ioreq.h @@ -37,6 +37,8 @@ int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, uint32_t type, uint64_t start, uint64_t end); +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id, + uint32_t type, uint32_t flags); int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id, bool_t enabled); diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h index d5fd546..4924c4b 100644 --- a/xen/include/asm-x86/p2m.h +++ b/xen/include/asm-x86/p2m.h @@ -89,7 +89,8 @@ typedef unsigned int p2m_query_t; | p2m_to_mask(p2m_ram_paging_out) \ | p2m_to_mask(p2m_ram_paged) \ | p2m_to_mask(p2m_ram_paging_in) \ - | p2m_to_mask(p2m_ram_shared)) + | p2m_to_mask(p2m_ram_shared) \ + | p2m_to_mask(p2m_ioreq_server)) /* Types that represent a physmap hole that is ok to replace with a shared * entry */ @@ -111,8 +112,7 @@ typedef unsigned int p2m_query_t; #define P2M_RO_TYPES (p2m_to_mask(p2m_ram_logdirty) \ | p2m_to_mask(p2m_ram_ro) \ | p2m_to_mask(p2m_grant_map_ro) \ - | p2m_to_mask(p2m_ram_shared) \ - | p2m_to_mask(p2m_ioreq_server)) + | p2m_to_mask(p2m_ram_shared)) /* Write-discard types, which should discard the write operations */ #define P2M_DISCARD_WRITE_TYPES (p2m_to_mask(p2m_ram_ro) \ @@ -336,6 +336,20 @@ struct p2m_domain { struct ept_data ept; /* NPT-equivalent structure could be added here. */ }; + + struct { + spinlock_t lock; + /* + * ioreq server who's responsible for the emulation of + * gfns with specific p2m type(for now, p2m_ioreq_server). + */ + struct hvm_ioreq_server *server; + /* + * flags specifies whether read, write or both operations + * are to be emulated by an ioreq server. 
+ */ + unsigned int flags; + } ioreq; }; /* get host p2m table */ @@ -845,6 +859,12 @@ static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt) return flags; } +int p2m_set_ioreq_server(struct domain *d, unsigned int flags, + struct hvm_ioreq_server *s); +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d, + unsigned int *flags); +void p2m_destroy_ioreq_server(const struct domain *d, const struct hvm_ioreq_server *s); + #endif /* _XEN_P2M_H */ /* diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h index b3e45cf..4c19fc5 100644 --- a/xen/include/public/hvm/hvm_op.h +++ b/xen/include/public/hvm/hvm_op.h @@ -89,7 +89,13 @@ typedef enum { HVMMEM_unused, /* Placeholder; setting memory to this type will fail for code after 4.7.0 */ #endif - HVMMEM_ioreq_server + HVMMEM_ioreq_server /* Memory type claimed by an ioreq server; type + changes to this value are only allowed after + an ioreq server has claimed its ownership. + Only pages with HVMMEM_ram_rw are allowed to + change to this type; conversely, pages with + this type are only allowed to be changed back + to HVMMEM_ram_rw. */ } hvmmem_type_t; /* Following tools-only interfaces may change in future. */ @@ -383,6 +389,33 @@ struct xen_hvm_set_ioreq_server_state { typedef struct xen_hvm_set_ioreq_server_state xen_hvm_set_ioreq_server_state_t; DEFINE_XEN_GUEST_HANDLE(xen_hvm_set_ioreq_server_state_t); +/* + * HVMOP_map_mem_type_to_ioreq_server : map or unmap the IOREQ Server <id> + * to specific memroy type <type> + * for specific accesses <flags> + * + * For now, flags only accept the value of HVMOP_IOREQ_MEM_ACCESS_WRITE, + * which means only write operations are to be forwarded to an ioreq server. + * Support for the emulation of read operations can be added when an ioreq + * server has such requirement in future. + */ +#define HVMOP_map_mem_type_to_ioreq_server 26 +struct xen_hvm_map_mem_type_to_ioreq_server { + domid_t domid; /* IN - domain to be serviced */ + ioservid_t id; /* IN - ioreq server id */ + uint16_t type; /* IN - memory type */ + uint16_t pad; + uint32_t flags; /* IN - types of accesses to be forwarded to the + ioreq server. flags with 0 means to unmap the + ioreq server */ + +#define XEN_HVMOP_IOREQ_MEM_ACCESS_READ (1u << 0) +#define XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE (1u << 1) +}; +typedef struct xen_hvm_map_mem_type_to_ioreq_server + xen_hvm_map_mem_type_to_ioreq_server_t; +DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_mem_type_to_ioreq_server_t); + #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */ #if defined(__i386__) || defined(__x86_64__)