Message ID: 1488987232-12349-3-git-send-email-yu.c.zhang@linux.intel.com (mailing list archive)
State: New, archived
>>> On 08.03.17 at 16:33, <yu.c.zhang@linux.intel.com> wrote: > changes in v7: > - Use new ioreq server interface - XEN_DMOP_map_mem_type_to_ioreq_server. > - According to comments from George: removed domain_pause/unpause() in > hvm_map_mem_type_to_ioreq_server(), because it's too expensive, > and we can avoid the: > a> deadlock between p2m lock and ioreq server lock by using these locks > in the same order - solved in patch 4; That is, until patch 4 there is deadlock potential? I think you want to re-order the patches if so. Or was it that the type can't really be used until the last patch of the series? (I'm sorry, it's been quite a while since the previous version.) > @@ -365,6 +383,24 @@ static int dm_op(domid_t domid, > break; > } > > + case XEN_DMOP_map_mem_type_to_ioreq_server: > + { > + const struct xen_dm_op_map_mem_type_to_ioreq_server *data = > + &op.u.map_mem_type_to_ioreq_server; > + > + rc = -EINVAL; > + if ( data->pad ) > + break; > + > + /* Only support for HAP enabled hvm. */ > + if ( !hap_enabled(d) ) > + break; Perhaps better to give an error other than -EINVAL in this case? If so, then the same error should likely also be used in your set_mem_type() addition. > --- a/xen/arch/x86/hvm/emulate.c > +++ b/xen/arch/x86/hvm/emulate.c > @@ -99,6 +99,7 @@ static int hvmemul_do_io( > uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data) > { > struct vcpu *curr = current; > + struct domain *currd = curr->domain; > struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io; > ioreq_t p = { > .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO, > @@ -140,7 +141,7 @@ static int hvmemul_do_io( > (p.dir != dir) || > (p.df != df) || > (p.data_is_ptr != data_is_addr) ) > - domain_crash(curr->domain); > + domain_crash(currd); If you mean to do this transformation here, then please do so consistently for the entire function. > @@ -177,8 +178,65 @@ static int hvmemul_do_io( > break; > case X86EMUL_UNHANDLEABLE: > { > - struct hvm_ioreq_server *s = > - hvm_select_ioreq_server(curr->domain, &p); > + /* > + * Xen isn't emulating the instruction internally, so see if > + * there's an ioreq server that can handle it. Rules: > + * > + * - PIO and "normal" mmio run through hvm_select_ioreq_server() > + * to choose the ioreq server by range. If no server is found, > + * the access is ignored. > + * > + * - p2m_ioreq_server accesses are handled by the current > + * ioreq_server for the domain, but there are some corner > + * cases: Who or what is "the current ioreq_server for the domain"? > + * - If the domain ioreq_server is NULL, assume this is a > + * race between the unbinding of ioreq server and guest fault > + * so re-try the instruction. > + * > + * - If the IOREQ_MEM_ACCESS_WRITE flag is not set, treat it > + * like a normal PIO or MMIO that doesn't have an ioreq > + * server (i.e., by ignoring it). > + */ > + struct hvm_ioreq_server *s = NULL; > + p2m_type_t p2mt = p2m_invalid; > + > + if ( is_mmio ) > + { > + unsigned long gmfn = paddr_to_pfn(addr); > + > + (void) get_gfn_query_unlocked(currd, gmfn, &p2mt); Stray cast. > + if ( p2mt == p2m_ioreq_server ) > + { > + unsigned int flags; > + > + s = p2m_get_ioreq_server(currd, &flags); > + > + /* > + * If p2mt is ioreq_server but ioreq_server is NULL, > + * we probably lost a race with unbinding of ioreq > + * server, just retry the access. 
> + */ > + if ( s == NULL ) > + { > + rc = X86EMUL_RETRY; > + vio->io_req.state = STATE_IOREQ_NONE; > + break; > + } > + > + /* > + * If the IOREQ_MEM_ACCESS_WRITE flag is not set, > + * we should set s to NULL, and just ignore such > + * access. > + */ > + if ( !(flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) ) > + s = NULL; What is this about? You only allow WRITE registrations, so this looks to be dead code. Yet if it is meant to guard against future enabling of READ, then this clearly should not be done for reads. > --- a/xen/arch/x86/mm/p2m-ept.c > +++ b/xen/arch/x86/mm/p2m-ept.c > @@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry, > entry->r = entry->w = entry->x = 1; > entry->a = entry->d = !!cpu_has_vmx_ept_ad; > break; > + case p2m_ioreq_server: > + entry->r = 1; > + entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE); Along the lines of the previous comment - if you mean to have the code cope with READ, please also set ->r accordingly, or add a comment why this won't have the intended effect (yielding a not present EPTE). > @@ -92,8 +94,13 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn, > default: > return flags | _PAGE_NX_BIT; > case p2m_grant_map_ro: > - case p2m_ioreq_server: > return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT; > + case p2m_ioreq_server: > + flags |= P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT; > + if ( p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE ) > + return flags & ~_PAGE_RW; > + else > + return flags; Stray else. But even better would imo be if ( p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE ) flags &= ~_PAGE_RW; return flags; > +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d, > + unsigned int *flags) > +{ > + struct p2m_domain *p2m = p2m_get_hostp2m(d); > + struct hvm_ioreq_server *s; > + > + spin_lock(&p2m->ioreq.lock); > + > + s = p2m->ioreq.server; > + *flags = p2m->ioreq.flags; > + > + spin_unlock(&p2m->ioreq.lock); > + return s; > +} I'm afraid this question was asked before, but since there's no comment here or anywhere, I can't recall if there was a reason why s potentially being stale by the time the caller looks at it is not a problem. > --- a/xen/include/public/hvm/dm_op.h > +++ b/xen/include/public/hvm/dm_op.h > @@ -318,6 +318,30 @@ struct xen_dm_op_inject_msi { > uint64_aligned_t addr; > }; > > +/* > + * XEN_DMOP_map_mem_type_to_ioreq_server : map or unmap the IOREQ Server <id> > + * to specific memroy type <type> > + * for specific accesses <flags> > + * > + * For now, flags only accept the value of XEN_DMOP_IOREQ_MEM_ACCESS_WRITE, > + * which means only write operations are to be forwarded to an ioreq server. > + * Support for the emulation of read operations can be added when an ioreq > + * server has such requirement in future. > + */ > +#define XEN_DMOP_map_mem_type_to_ioreq_server 15 > + > +struct xen_dm_op_map_mem_type_to_ioreq_server { > + ioservid_t id; /* IN - ioreq server id */ > + uint16_t type; /* IN - memory type */ > + uint16_t pad; Perhaps there was padding needed when this was a hvmop, but now the padding does exactly the wrong thing. Jan
On 3/10/2017 11:29 PM, Jan Beulich wrote: >>>> On 08.03.17 at 16:33, <yu.c.zhang@linux.intel.com> wrote: >> changes in v7: >> - Use new ioreq server interface - XEN_DMOP_map_mem_type_to_ioreq_server. >> - According to comments from George: removed domain_pause/unpause() in >> hvm_map_mem_type_to_ioreq_server(), because it's too expensive, >> and we can avoid the: >> a> deadlock between p2m lock and ioreq server lock by using these locks >> in the same order - solved in patch 4; > That is, until patch 4 there is deadlock potential? I think you want to > re-order the patches if so. Or was it that the type can't really be used > until the last patch of the series? (I'm sorry, it's been quite a while > since the previous version.) Oh. There's no deadlock potential in this version patch set. But in v6, there was, and I used domain_pause/unpause() to avoid this. Later on, I realized that if I use different locks in the same order, the deadlock potential can be avoid and we do not need domain_pause/unpause in this version. >> @@ -365,6 +383,24 @@ static int dm_op(domid_t domid, >> break; >> } >> >> + case XEN_DMOP_map_mem_type_to_ioreq_server: >> + { >> + const struct xen_dm_op_map_mem_type_to_ioreq_server *data = >> + &op.u.map_mem_type_to_ioreq_server; >> + >> + rc = -EINVAL; >> + if ( data->pad ) >> + break; >> + >> + /* Only support for HAP enabled hvm. */ >> + if ( !hap_enabled(d) ) >> + break; > Perhaps better to give an error other than -EINVAL in this case? > If so, then the same error should likely also be used in your > set_mem_type() addition. How about -ENOTSUP? >> --- a/xen/arch/x86/hvm/emulate.c >> +++ b/xen/arch/x86/hvm/emulate.c >> @@ -99,6 +99,7 @@ static int hvmemul_do_io( >> uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data) >> { >> struct vcpu *curr = current; >> + struct domain *currd = curr->domain; >> struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io; >> ioreq_t p = { >> .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO, >> @@ -140,7 +141,7 @@ static int hvmemul_do_io( >> (p.dir != dir) || >> (p.df != df) || >> (p.data_is_ptr != data_is_addr) ) >> - domain_crash(curr->domain); >> + domain_crash(currd); > If you mean to do this transformation here, then please do so > consistently for the entire function. OK. Thanks. >> @@ -177,8 +178,65 @@ static int hvmemul_do_io( >> break; >> case X86EMUL_UNHANDLEABLE: >> { >> - struct hvm_ioreq_server *s = >> - hvm_select_ioreq_server(curr->domain, &p); >> + /* >> + * Xen isn't emulating the instruction internally, so see if >> + * there's an ioreq server that can handle it. Rules: >> + * >> + * - PIO and "normal" mmio run through hvm_select_ioreq_server() >> + * to choose the ioreq server by range. If no server is found, >> + * the access is ignored. >> + * >> + * - p2m_ioreq_server accesses are handled by the current >> + * ioreq_server for the domain, but there are some corner >> + * cases: > Who or what is "the current ioreq_server for the domain"? It means "the current ioreq_server which maps the p2m_ioreq_server type for this domain"... I'd like to use a succinct phrase, but now seems not accurate enough. Any preference? >> + * - If the domain ioreq_server is NULL, assume this is a >> + * race between the unbinding of ioreq server and guest fault >> + * so re-try the instruction. >> + * >> + * - If the IOREQ_MEM_ACCESS_WRITE flag is not set, treat it >> + * like a normal PIO or MMIO that doesn't have an ioreq >> + * server (i.e., by ignoring it). 
>> + */ >> + struct hvm_ioreq_server *s = NULL; >> + p2m_type_t p2mt = p2m_invalid; >> + >> + if ( is_mmio ) >> + { >> + unsigned long gmfn = paddr_to_pfn(addr); >> + >> + (void) get_gfn_query_unlocked(currd, gmfn, &p2mt); > Stray cast. OK. Will remove it. > >> + if ( p2mt == p2m_ioreq_server ) >> + { >> + unsigned int flags; >> + >> + s = p2m_get_ioreq_server(currd, &flags); >> + >> + /* >> + * If p2mt is ioreq_server but ioreq_server is NULL, >> + * we probably lost a race with unbinding of ioreq >> + * server, just retry the access. >> + */ >> + if ( s == NULL ) >> + { >> + rc = X86EMUL_RETRY; >> + vio->io_req.state = STATE_IOREQ_NONE; >> + break; >> + } >> + >> + /* >> + * If the IOREQ_MEM_ACCESS_WRITE flag is not set, >> + * we should set s to NULL, and just ignore such >> + * access. >> + */ >> + if ( !(flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) ) >> + s = NULL; > What is this about? You only allow WRITE registrations, so this looks > to be dead code. Yet if it is meant to guard against future enabling > of READ, then this clearly should not be done for reads. It's to guard against future emulation of READ. We can remove it for now. >> --- a/xen/arch/x86/mm/p2m-ept.c >> +++ b/xen/arch/x86/mm/p2m-ept.c >> @@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry, >> entry->r = entry->w = entry->x = 1; >> entry->a = entry->d = !!cpu_has_vmx_ept_ad; >> break; >> + case p2m_ioreq_server: >> + entry->r = 1; >> + entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE); > Along the lines of the previous comment - if you mean to have the > code cope with READ, please also set ->r accordingly, or add a > comment why this won't have the intended effect (yielding a not > present EPTE). How about we keep this code and do not support READ? I'll remove above dead code in hvmemul_do_io(). >> @@ -92,8 +94,13 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn, >> default: >> return flags | _PAGE_NX_BIT; >> case p2m_grant_map_ro: >> - case p2m_ioreq_server: >> return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT; >> + case p2m_ioreq_server: >> + flags |= P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT; >> + if ( p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE ) >> + return flags & ~_PAGE_RW; >> + else >> + return flags; > Stray else. But even better would imo be > > if ( p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE ) > flags &= ~_PAGE_RW; > return flags; Oh. Thanks. :) >> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d, >> + unsigned int *flags) >> +{ >> + struct p2m_domain *p2m = p2m_get_hostp2m(d); >> + struct hvm_ioreq_server *s; >> + >> + spin_lock(&p2m->ioreq.lock); >> + >> + s = p2m->ioreq.server; >> + *flags = p2m->ioreq.flags; >> + >> + spin_unlock(&p2m->ioreq.lock); >> + return s; >> +} > I'm afraid this question was asked before, but since there's no > comment here or anywhere, I can't recall if there was a reason why > s potentially being stale by the time the caller looks at it is not a > problem. Well, it is possibe that s is stale. I did not take it as a problem because the device model will later discard such io request. And I believe current hvm_select_ioreq_server() also has the same issue - the returned s should be considered to be stale, if the MMIO/PIO address is removed from the ioreq server's rangeset. 
Another thought is, if you think it is inappropriate for device model to do the check, we can use spin_lock_recursive on ioreq_server.lock to protect all the ioreq server select and release the lock after the ioreq server is sent out. >> --- a/xen/include/public/hvm/dm_op.h >> +++ b/xen/include/public/hvm/dm_op.h >> @@ -318,6 +318,30 @@ struct xen_dm_op_inject_msi { >> uint64_aligned_t addr; >> }; >> >> +/* >> + * XEN_DMOP_map_mem_type_to_ioreq_server : map or unmap the IOREQ Server <id> >> + * to specific memroy type <type> >> + * for specific accesses <flags> >> + * >> + * For now, flags only accept the value of XEN_DMOP_IOREQ_MEM_ACCESS_WRITE, >> + * which means only write operations are to be forwarded to an ioreq server. >> + * Support for the emulation of read operations can be added when an ioreq >> + * server has such requirement in future. >> + */ >> +#define XEN_DMOP_map_mem_type_to_ioreq_server 15 >> + >> +struct xen_dm_op_map_mem_type_to_ioreq_server { >> + ioservid_t id; /* IN - ioreq server id */ >> + uint16_t type; /* IN - memory type */ >> + uint16_t pad; > Perhaps there was padding needed when this was a hvmop, but > now the padding does exactly the wrong thing. Right, padding is useless in this new structure. I will remove it if other field does not change(I proposed a change for flag field in reply to Andrew's comments on patch 5). Thank you, Jan. Yu > > Jan >
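To make the lock-ordering argument concrete: in this version, hvm_map_mem_type_to_ioreq_server() takes the per-domain ioreq server lock first and p2m_set_ioreq_server() then takes p2m->ioreq.lock, so any path that needs both must follow the same order. A sketch of the rule (the lock names mirror the patch, but this is an illustration, not the actual Xen functions):

static void map_type_locked(struct domain *d, struct hvm_ioreq_server *s,
                            unsigned int flags)
{
    struct p2m_domain *p2m = p2m_get_hostp2m(d);

    /* Always first: the per-domain ioreq server lock. */
    spin_lock_recursive(&d->arch.hvm_domain.ioreq_server.lock);

    /* Always second: the per-p2m ioreq lock. */
    spin_lock(&p2m->ioreq.lock);
    p2m->ioreq.server = s;
    p2m->ioreq.flags = flags;
    spin_unlock(&p2m->ioreq.lock);

    spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
}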
>>> On 11.03.17 at 09:42, <yu.c.zhang@linux.intel.com> wrote: > On 3/10/2017 11:29 PM, Jan Beulich wrote: >>>>> On 08.03.17 at 16:33, <yu.c.zhang@linux.intel.com> wrote: >>> changes in v7: >>> - Use new ioreq server interface - XEN_DMOP_map_mem_type_to_ioreq_server. >>> - According to comments from George: removed domain_pause/unpause() in >>> hvm_map_mem_type_to_ioreq_server(), because it's too expensive, >>> and we can avoid the: >>> a> deadlock between p2m lock and ioreq server lock by using these locks >>> in the same order - solved in patch 4; >> That is, until patch 4 there is deadlock potential? I think you want to >> re-order the patches if so. Or was it that the type can't really be used >> until the last patch of the series? (I'm sorry, it's been quite a while >> since the previous version.) > > Oh. There's no deadlock potential in this version patch set. But in v6, there was, and I used > domain_pause/unpause() to avoid this. Later on, I realized that if I use different locks in the > same order, the deadlock potential can be avoid and we do not need domain_pause/unpause > in this version. Well, okay, but in the future please don't add misleading change info then. >>> @@ -365,6 +383,24 @@ static int dm_op(domid_t domid, >>> break; >>> } >>> >>> + case XEN_DMOP_map_mem_type_to_ioreq_server: >>> + { >>> + const struct xen_dm_op_map_mem_type_to_ioreq_server *data = >>> + &op.u.map_mem_type_to_ioreq_server; >>> + >>> + rc = -EINVAL; >>> + if ( data->pad ) >>> + break; >>> + >>> + /* Only support for HAP enabled hvm. */ >>> + if ( !hap_enabled(d) ) >>> + break; >> Perhaps better to give an error other than -EINVAL in this case? >> If so, then the same error should likely also be used in your >> set_mem_type() addition. > How about -ENOTSUP? If you mean -EOPNOTSUPP, then yes. >>> @@ -177,8 +178,65 @@ static int hvmemul_do_io( >>> break; >>> case X86EMUL_UNHANDLEABLE: >>> { >>> - struct hvm_ioreq_server *s = >>> - hvm_select_ioreq_server(curr->domain, &p); >>> + /* >>> + * Xen isn't emulating the instruction internally, so see if >>> + * there's an ioreq server that can handle it. Rules: >>> + * >>> + * - PIO and "normal" mmio run through hvm_select_ioreq_server() >>> + * to choose the ioreq server by range. If no server is found, >>> + * the access is ignored. >>> + * >>> + * - p2m_ioreq_server accesses are handled by the current >>> + * ioreq_server for the domain, but there are some corner >>> + * cases: >> Who or what is "the current ioreq_server for the domain"? > It means "the current ioreq_server which maps the p2m_ioreq_server type > for this domain"... > I'd like to use a succinct phrase, but now seems not accurate enough. > Any preference? Just add "designated" after "current", or replacing "current"? >>> + if ( p2mt == p2m_ioreq_server ) >>> + { >>> + unsigned int flags; >>> + >>> + s = p2m_get_ioreq_server(currd, &flags); >>> + >>> + /* >>> + * If p2mt is ioreq_server but ioreq_server is NULL, >>> + * we probably lost a race with unbinding of ioreq >>> + * server, just retry the access. >>> + */ >>> + if ( s == NULL ) >>> + { >>> + rc = X86EMUL_RETRY; >>> + vio->io_req.state = STATE_IOREQ_NONE; >>> + break; >>> + } >>> + >>> + /* >>> + * If the IOREQ_MEM_ACCESS_WRITE flag is not set, >>> + * we should set s to NULL, and just ignore such >>> + * access. >>> + */ >>> + if ( !(flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) ) >>> + s = NULL; >> What is this about? You only allow WRITE registrations, so this looks >> to be dead code. 
Yet if it is meant to guard against future enabling >> of READ, then this clearly should not be done for reads. > > It's to guard against future emulation of READ. We can remove it for now. I guess first of all you need to settle on what you want the code to look like generally wrt reads: Do you want it to support the option as much as possible, reducing code changes to a minimum when someone wants to actually add support, or do you want to reject such attempts? Whichever variant you choose, you should carry it out consistently rather than mixing both. >>> --- a/xen/arch/x86/mm/p2m-ept.c >>> +++ b/xen/arch/x86/mm/p2m-ept.c >>> @@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry, >>> entry->r = entry->w = entry->x = 1; >>> entry->a = entry->d = !!cpu_has_vmx_ept_ad; >>> break; >>> + case p2m_ioreq_server: >>> + entry->r = 1; >>> + entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE); >> Along the lines of the previous comment - if you mean to have the >> code cope with READ, please also set ->r accordingly, or add a >> comment why this won't have the intended effect (yielding a not >> present EPTE). > > How about we keep this code and do not support READ? I'll remove above > dead code in hvmemul_do_io(). Sure, as said above: All I'd like to push for is that the result is consistent across the code base. >>> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d, >>> + unsigned int *flags) >>> +{ >>> + struct p2m_domain *p2m = p2m_get_hostp2m(d); >>> + struct hvm_ioreq_server *s; >>> + >>> + spin_lock(&p2m->ioreq.lock); >>> + >>> + s = p2m->ioreq.server; >>> + *flags = p2m->ioreq.flags; >>> + >>> + spin_unlock(&p2m->ioreq.lock); >>> + return s; >>> +} >> I'm afraid this question was asked before, but since there's no >> comment here or anywhere, I can't recall if there was a reason why >> s potentially being stale by the time the caller looks at it is not a >> problem. > > Well, it is possibe that s is stale. I did not take it as a problem > because the device model > will later discard such io request. And I believe current > hvm_select_ioreq_server() also > has the same issue - the returned s should be considered to be stale, if > the MMIO/PIO > address is removed from the ioreq server's rangeset. > > Another thought is, if you think it is inappropriate for device model to > do the check, > we can use spin_lock_recursive on ioreq_server.lock to protect all the > ioreq server select > and release the lock after the ioreq server is sent out. Well, let's first ask Paul as to what his perspective here is, both specifically for this change and more generally regarding what you say above. Jan
On 3/13/2017 7:20 PM, Jan Beulich wrote: >>>> On 11.03.17 at 09:42, <yu.c.zhang@linux.intel.com> wrote: >> On 3/10/2017 11:29 PM, Jan Beulich wrote: >>>>>> On 08.03.17 at 16:33, <yu.c.zhang@linux.intel.com> wrote: >>>> changes in v7: >>>> - Use new ioreq server interface - XEN_DMOP_map_mem_type_to_ioreq_server. >>>> - According to comments from George: removed domain_pause/unpause() in >>>> hvm_map_mem_type_to_ioreq_server(), because it's too expensive, >>>> and we can avoid the: >>>> a> deadlock between p2m lock and ioreq server lock by using these locks >>>> in the same order - solved in patch 4; >>> That is, until patch 4 there is deadlock potential? I think you want to >>> re-order the patches if so. Or was it that the type can't really be used >>> until the last patch of the series? (I'm sorry, it's been quite a while >>> since the previous version.) >> Oh. There's no deadlock potential in this version patch set. But in v6, there was, and I used >> domain_pause/unpause() to avoid this. Later on, I realized that if I use different locks in the >> same order, the deadlock potential can be avoid and we do not need domain_pause/unpause >> in this version. > Well, okay, but in the future please don't add misleading change > info then. Got it. Thanks. :-) >>>> @@ -365,6 +383,24 @@ static int dm_op(domid_t domid, >>>> break; >>>> } >>>> >>>> + case XEN_DMOP_map_mem_type_to_ioreq_server: >>>> + { >>>> + const struct xen_dm_op_map_mem_type_to_ioreq_server *data = >>>> + &op.u.map_mem_type_to_ioreq_server; >>>> + >>>> + rc = -EINVAL; >>>> + if ( data->pad ) >>>> + break; >>>> + >>>> + /* Only support for HAP enabled hvm. */ >>>> + if ( !hap_enabled(d) ) >>>> + break; >>> Perhaps better to give an error other than -EINVAL in this case? >>> If so, then the same error should likely also be used in your >>> set_mem_type() addition. >> How about -ENOTSUP? > If you mean -EOPNOTSUPP, then yes. Yes, I mean -EOPNOTSUPP >>>> @@ -177,8 +178,65 @@ static int hvmemul_do_io( >>>> break; >>>> case X86EMUL_UNHANDLEABLE: >>>> { >>>> - struct hvm_ioreq_server *s = >>>> - hvm_select_ioreq_server(curr->domain, &p); >>>> + /* >>>> + * Xen isn't emulating the instruction internally, so see if >>>> + * there's an ioreq server that can handle it. Rules: >>>> + * >>>> + * - PIO and "normal" mmio run through hvm_select_ioreq_server() >>>> + * to choose the ioreq server by range. If no server is found, >>>> + * the access is ignored. >>>> + * >>>> + * - p2m_ioreq_server accesses are handled by the current >>>> + * ioreq_server for the domain, but there are some corner >>>> + * cases: >>> Who or what is "the current ioreq_server for the domain"? >> It means "the current ioreq_server which maps the p2m_ioreq_server type >> for this domain"... >> I'd like to use a succinct phrase, but now seems not accurate enough. >> Any preference? > Just add "designated" after "current", or replacing "current"? Replacing "current" with "designated" sounds good to me. :-) >>>> + if ( p2mt == p2m_ioreq_server ) >>>> + { >>>> + unsigned int flags; >>>> + >>>> + s = p2m_get_ioreq_server(currd, &flags); >>>> + >>>> + /* >>>> + * If p2mt is ioreq_server but ioreq_server is NULL, >>>> + * we probably lost a race with unbinding of ioreq >>>> + * server, just retry the access. 
>>>> + */ >>>> + if ( s == NULL ) >>>> + { >>>> + rc = X86EMUL_RETRY; >>>> + vio->io_req.state = STATE_IOREQ_NONE; >>>> + break; >>>> + } >>>> + >>>> + /* >>>> + * If the IOREQ_MEM_ACCESS_WRITE flag is not set, >>>> + * we should set s to NULL, and just ignore such >>>> + * access. >>>> + */ >>>> + if ( !(flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) ) >>>> + s = NULL; >>> What is this about? You only allow WRITE registrations, so this looks >>> to be dead code. Yet if it is meant to guard against future enabling >>> of READ, then this clearly should not be done for reads. >> It's to guard against future emulation of READ. We can remove it for now. > I guess first of all you need to settle on what you want the code to > look like generally wrt reads: Do you want it to support the option > as much as possible, reducing code changes to a minimum when > someone wants to actually add support, or do you want to reject > such attempts? Whichever variant you choose, you should carry it > out consistently rather than mixing both. > >>>> --- a/xen/arch/x86/mm/p2m-ept.c >>>> +++ b/xen/arch/x86/mm/p2m-ept.c >>>> @@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry, >>>> entry->r = entry->w = entry->x = 1; >>>> entry->a = entry->d = !!cpu_has_vmx_ept_ad; >>>> break; >>>> + case p2m_ioreq_server: >>>> + entry->r = 1; >>>> + entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE); >>> Along the lines of the previous comment - if you mean to have the >>> code cope with READ, please also set ->r accordingly, or add a >>> comment why this won't have the intended effect (yielding a not >>> present EPTE). >> How about we keep this code and do not support READ? I'll remove above >> dead code in hvmemul_do_io(). > Sure, as said above: All I'd like to push for is that the result is > consistent across the code base. Thank you, Jan. I should have keep a consistent principle in this code. My preference is to remove the possible read emulation logic. But could we still keep the definition of XEN_DMOP_IOREQ_MEM_ACCESS_READ, so that people can still know this interface can be extended in the future? >>>> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d, >>>> + unsigned int *flags) >>>> +{ >>>> + struct p2m_domain *p2m = p2m_get_hostp2m(d); >>>> + struct hvm_ioreq_server *s; >>>> + >>>> + spin_lock(&p2m->ioreq.lock); >>>> + >>>> + s = p2m->ioreq.server; >>>> + *flags = p2m->ioreq.flags; >>>> + >>>> + spin_unlock(&p2m->ioreq.lock); >>>> + return s; >>>> +} >>> I'm afraid this question was asked before, but since there's no >>> comment here or anywhere, I can't recall if there was a reason why >>> s potentially being stale by the time the caller looks at it is not a >>> problem. >> Well, it is possibe that s is stale. I did not take it as a problem >> because the device model >> will later discard such io request. And I believe current >> hvm_select_ioreq_server() also >> has the same issue - the returned s should be considered to be stale, if >> the MMIO/PIO >> address is removed from the ioreq server's rangeset. >> >> Another thought is, if you think it is inappropriate for device model to >> do the check, >> we can use spin_lock_recursive on ioreq_server.lock to protect all the >> ioreq server select >> and release the lock after the ioreq server is sent out. > Well, let's first ask Paul as to what his perspective here is, both > specifically for this change and more generally regarding what > you say above. 
Paul, any suggestions on this and the above one? :) Thanks Yu > Jan >
> -----Original Message----- [snip] > >>>> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d, > >>>> + unsigned int *flags) > >>>> +{ > >>>> + struct p2m_domain *p2m = p2m_get_hostp2m(d); > >>>> + struct hvm_ioreq_server *s; > >>>> + > >>>> + spin_lock(&p2m->ioreq.lock); > >>>> + > >>>> + s = p2m->ioreq.server; > >>>> + *flags = p2m->ioreq.flags; > >>>> + > >>>> + spin_unlock(&p2m->ioreq.lock); > >>>> + return s; > >>>> +} > >>> I'm afraid this question was asked before, but since there's no > >>> comment here or anywhere, I can't recall if there was a reason why > >>> s potentially being stale by the time the caller looks at it is not a > >>> problem. > >> Well, it is possibe that s is stale. I did not take it as a problem > >> because the device model > >> will later discard such io request. And I believe current > >> hvm_select_ioreq_server() also > >> has the same issue - the returned s should be considered to be stale, if > >> the MMIO/PIO > >> address is removed from the ioreq server's rangeset. An enabled emulator has to be prepared to receive ioreqs for ranges it has unmapped since there is no domain_pause() to prevent a race. > >> > >> Another thought is, if you think it is inappropriate for device model to > >> do the check, > >> we can use spin_lock_recursive on ioreq_server.lock to protect all the > >> ioreq server select > >> and release the lock after the ioreq server is sent out. > > Well, let's first ask Paul as to what his perspective here is, both > > specifically for this change and more generally regarding what > > you say above. > > Paul, any suggestions on this and the above one? :) Well, as I said above, the device model has to check whether it is willing to handle and ioreq it is passed and terminate it appropriately under all circumstances. There is no option for it to reject the I/O. This may be ok for MMIO regions coming and going, but is there anything more to consider here it we change a page time from ioreq_server back to RAM? Clearly we need to make sure that there is no scope for I/O to that page being incorrectly handled during transition. Paul > > Thanks > Yu > > Jan > >
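To spell out that obligation on the emulator side, here is a sketch of the dispatch path a device model needs (dm_claims() and dm_emulate() are hypothetical local helpers; the ioreq_t fields are the public ones from xen/hvm/ioreq.h):

#include <stdbool.h>
#include <stdint.h>
#include <xen/hvm/ioreq.h>

/* Hypothetical device-model helpers. */
bool dm_claims(uint64_t addr);
void dm_emulate(ioreq_t *req);

static void dm_handle_one(ioreq_t *req)
{
    if ( dm_claims(req->addr) )
        dm_emulate(req);
    else if ( req->dir == IOREQ_READ )
        req->data = ~(uint64_t)0;   /* complete a stale read benignly */
    /* A stale write is simply dropped. */

    /* Always terminate the request; there is no option to reject it. */
    req->state = STATE_IORESP_READY;
}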
On 3/14/2017 5:40 PM, Paul Durrant wrote: >> -----Original Message----- > [snip] >>>>>> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d, >>>>>> + unsigned int *flags) >>>>>> +{ >>>>>> + struct p2m_domain *p2m = p2m_get_hostp2m(d); >>>>>> + struct hvm_ioreq_server *s; >>>>>> + >>>>>> + spin_lock(&p2m->ioreq.lock); >>>>>> + >>>>>> + s = p2m->ioreq.server; >>>>>> + *flags = p2m->ioreq.flags; >>>>>> + >>>>>> + spin_unlock(&p2m->ioreq.lock); >>>>>> + return s; >>>>>> +} >>>>> I'm afraid this question was asked before, but since there's no >>>>> comment here or anywhere, I can't recall if there was a reason why >>>>> s potentially being stale by the time the caller looks at it is not a >>>>> problem. >>>> Well, it is possibe that s is stale. I did not take it as a problem >>>> because the device model >>>> will later discard such io request. And I believe current >>>> hvm_select_ioreq_server() also >>>> has the same issue - the returned s should be considered to be stale, if >>>> the MMIO/PIO >>>> address is removed from the ioreq server's rangeset. > An enabled emulator has to be prepared to receive ioreqs for ranges it has unmapped since there is no domain_pause() to prevent a race. Thank you for the reply, Paul. So you mean using the ioreq server lock around this process will not prevent this either? Why? >>>> Another thought is, if you think it is inappropriate for device model to >>>> do the check, >>>> we can use spin_lock_recursive on ioreq_server.lock to protect all the >>>> ioreq server select >>>> and release the lock after the ioreq server is sent out. >>> Well, let's first ask Paul as to what his perspective here is, both >>> specifically for this change and more generally regarding what >>> you say above. >> Paul, any suggestions on this and the above one? :) > Well, as I said above, the device model has to check whether it is willing to handle and ioreq it is passed and terminate it appropriately under all circumstances. There is no option for it to reject the I/O. > This may be ok for MMIO regions coming and going, but is there anything more to consider here it we change a page time from ioreq_server back to RAM? Clearly we need to make sure that there is no scope for I/O to that page being incorrectly handled during transition. I don't think we need to worry about the p2m type change, patch 1 prevents this. The s returned by p2m_get_ioreq_server() may be stale(and is then discarded by device model in our current code), but p2m type will not be stale. I agree device model has responsibility to do such check, but I also wonder if it is possible for the hypervisor to provide some kind insurance. Thanks Yu > Paul > >> Thanks >> Yu >>> Jan >>> >
>>> On 14.03.17 at 08:28, <yu.c.zhang@linux.intel.com> wrote: > On 3/13/2017 7:20 PM, Jan Beulich wrote: >>>>> On 11.03.17 at 09:42, <yu.c.zhang@linux.intel.com> wrote: >>> On 3/10/2017 11:29 PM, Jan Beulich wrote: >>>>>>> On 08.03.17 at 16:33, <yu.c.zhang@linux.intel.com> wrote: >>>>> --- a/xen/arch/x86/mm/p2m-ept.c >>>>> +++ b/xen/arch/x86/mm/p2m-ept.c >>>>> @@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry, >>>>> entry->r = entry->w = entry->x = 1; >>>>> entry->a = entry->d = !!cpu_has_vmx_ept_ad; >>>>> break; >>>>> + case p2m_ioreq_server: >>>>> + entry->r = 1; >>>>> + entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE); >>>> Along the lines of the previous comment - if you mean to have the >>>> code cope with READ, please also set ->r accordingly, or add a >>>> comment why this won't have the intended effect (yielding a not >>>> present EPTE). >>> How about we keep this code and do not support READ? I'll remove above >>> dead code in hvmemul_do_io(). >> Sure, as said above: All I'd like to push for is that the result is >> consistent across the code base. > > Thank you, Jan. I should have keep a consistent principle in this code. > My preference is to remove the possible read emulation logic. But could > we still keep the > definition of XEN_DMOP_IOREQ_MEM_ACCESS_READ, so that people can still > know this > interface can be extended in the future? Of course. Jan
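With that settled, the EPT side could then stay as posted, gaining only the comment asked for earlier about why ->r is left set (a suggested form, not the final patch):

    case p2m_ioreq_server:
        /*
         * Read emulation is not supported; clearing ->r here would only
         * yield a not-present EPTE instead of forwarding reads to the
         * ioreq server, so reads always go straight to the page.
         */
        entry->r = 1;
        entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);
        entry->x = 0;
        entry->a = !!cpu_has_vmx_ept_ad;
        entry->d = entry->w && entry->a;
        break;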
> -----Original Message----- > From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com] > Sent: 14 March 2017 09:53 > To: Paul Durrant <Paul.Durrant@citrix.com>; Jan Beulich > <JBeulich@suse.com> > Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; George Dunlap > <George.Dunlap@citrix.com>; Jun Nakajima <jun.nakajima@intel.com>; > Kevin Tian <kevin.tian@intel.com>; zhiyuan.lv@intel.com; xen- > devel@lists.xen.org; Tim (Xen.org) <tim@xen.org> > Subject: Re: [PATCH v7 2/5] x86/ioreq server: Add DMOP to map guest ram > with p2m_ioreq_server to an ioreq server. > > > > On 3/14/2017 5:40 PM, Paul Durrant wrote: > >> -----Original Message----- > > [snip] > >>>>>> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain > *d, > >>>>>> + unsigned int *flags) > >>>>>> +{ > >>>>>> + struct p2m_domain *p2m = p2m_get_hostp2m(d); > >>>>>> + struct hvm_ioreq_server *s; > >>>>>> + > >>>>>> + spin_lock(&p2m->ioreq.lock); > >>>>>> + > >>>>>> + s = p2m->ioreq.server; > >>>>>> + *flags = p2m->ioreq.flags; > >>>>>> + > >>>>>> + spin_unlock(&p2m->ioreq.lock); > >>>>>> + return s; > >>>>>> +} > >>>>> I'm afraid this question was asked before, but since there's no > >>>>> comment here or anywhere, I can't recall if there was a reason why > >>>>> s potentially being stale by the time the caller looks at it is not a > >>>>> problem. > >>>> Well, it is possibe that s is stale. I did not take it as a problem > >>>> because the device model > >>>> will later discard such io request. And I believe current > >>>> hvm_select_ioreq_server() also > >>>> has the same issue - the returned s should be considered to be stale, if > >>>> the MMIO/PIO > >>>> address is removed from the ioreq server's rangeset. > > An enabled emulator has to be prepared to receive ioreqs for ranges it has > unmapped since there is no domain_pause() to prevent a race. > > Thank you for the reply, Paul. > So you mean using the ioreq server lock around this process will not > prevent this either? Why? > Well, if emulation as already sampled the value on another vcpu then that emulation request may race with the range being disabled. Remember that (for good reason) the lock is not held by hvm_select_ioreq_server() and is only held by hvm_unmap_io_range_from_ioreq_server() to protect against other invocations of similar manipulation functions. > >>>> Another thought is, if you think it is inappropriate for device model to > >>>> do the check, > >>>> we can use spin_lock_recursive on ioreq_server.lock to protect all the > >>>> ioreq server select > >>>> and release the lock after the ioreq server is sent out. > >>> Well, let's first ask Paul as to what his perspective here is, both > >>> specifically for this change and more generally regarding what > >>> you say above. > >> Paul, any suggestions on this and the above one? :) > > Well, as I said above, the device model has to check whether it is willing to > handle and ioreq it is passed and terminate it appropriately under all > circumstances. There is no option for it to reject the I/O. > > This may be ok for MMIO regions coming and going, but is there anything > more to consider here it we change a page time from ioreq_server back to > RAM? Clearly we need to make sure that there is no scope for I/O to that > page being incorrectly handled during transition. > > I don't think we need to worry about the p2m type change, patch 1 > prevents this. The s returned by > p2m_get_ioreq_server() may be stale(and is then discarded by device > model in our current code), but > p2m type will not be stale. 
I agree device model has responsibility to > do such check, but I also wonder > if it is possible for the hypervisor to provide some kind insurance. > Not in a cheap way. More locking code could be added, but it's likely to be convoluted and have a detrimental effect on performance. Paul > Thanks > Yu > > Paul > > >> Thanks > >> Yu > >>> Jan > >>> > >
On 3/14/2017 6:40 PM, Paul Durrant wrote: >> -----Original Message----- >> From: Yu Zhang [mailto:yu.c.zhang@linux.intel.com] >> Sent: 14 March 2017 09:53 >> To: Paul Durrant <Paul.Durrant@citrix.com>; Jan Beulich >> <JBeulich@suse.com> >> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; George Dunlap >> <George.Dunlap@citrix.com>; Jun Nakajima <jun.nakajima@intel.com>; >> Kevin Tian <kevin.tian@intel.com>; zhiyuan.lv@intel.com; xen- >> devel@lists.xen.org; Tim (Xen.org) <tim@xen.org> >> Subject: Re: [PATCH v7 2/5] x86/ioreq server: Add DMOP to map guest ram >> with p2m_ioreq_server to an ioreq server. >> >> >> >> On 3/14/2017 5:40 PM, Paul Durrant wrote: >>>> -----Original Message----- >>> [snip] >>>>>>>> +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain >> *d, >>>>>>>> + unsigned int *flags) >>>>>>>> +{ >>>>>>>> + struct p2m_domain *p2m = p2m_get_hostp2m(d); >>>>>>>> + struct hvm_ioreq_server *s; >>>>>>>> + >>>>>>>> + spin_lock(&p2m->ioreq.lock); >>>>>>>> + >>>>>>>> + s = p2m->ioreq.server; >>>>>>>> + *flags = p2m->ioreq.flags; >>>>>>>> + >>>>>>>> + spin_unlock(&p2m->ioreq.lock); >>>>>>>> + return s; >>>>>>>> +} >>>>>>> I'm afraid this question was asked before, but since there's no >>>>>>> comment here or anywhere, I can't recall if there was a reason why >>>>>>> s potentially being stale by the time the caller looks at it is not a >>>>>>> problem. >>>>>> Well, it is possibe that s is stale. I did not take it as a problem >>>>>> because the device model >>>>>> will later discard such io request. And I believe current >>>>>> hvm_select_ioreq_server() also >>>>>> has the same issue - the returned s should be considered to be stale, if >>>>>> the MMIO/PIO >>>>>> address is removed from the ioreq server's rangeset. >>> An enabled emulator has to be prepared to receive ioreqs for ranges it has >> unmapped since there is no domain_pause() to prevent a race. >> >> Thank you for the reply, Paul. >> So you mean using the ioreq server lock around this process will not >> prevent this either? Why? >> > Well, if emulation as already sampled the value on another vcpu then that emulation request may race with the range being disabled. Remember that (for good reason) the lock is not held by hvm_select_ioreq_server() and is only held by hvm_unmap_io_range_from_ioreq_server() to protect against other invocations of similar manipulation functions. Oh. So that's why you mentioned the domain_pause() to prevent the race, right? >>>>>> Another thought is, if you think it is inappropriate for device model to >>>>>> do the check, >>>>>> we can use spin_lock_recursive on ioreq_server.lock to protect all the >>>>>> ioreq server select >>>>>> and release the lock after the ioreq server is sent out. >>>>> Well, let's first ask Paul as to what his perspective here is, both >>>>> specifically for this change and more generally regarding what >>>>> you say above. >>>> Paul, any suggestions on this and the above one? :) >>> Well, as I said above, the device model has to check whether it is willing to >> handle and ioreq it is passed and terminate it appropriately under all >> circumstances. There is no option for it to reject the I/O. >>> This may be ok for MMIO regions coming and going, but is there anything >> more to consider here it we change a page time from ioreq_server back to >> RAM? Clearly we need to make sure that there is no scope for I/O to that >> page being incorrectly handled during transition. >> >> I don't think we need to worry about the p2m type change, patch 1 >> prevents this. 
The s returned by >> p2m_get_ioreq_server() may be stale(and is then discarded by device >> model in our current code), but >> p2m type will not be stale. I agree device model has responsibility to >> do such check, but I also wonder >> if it is possible for the hypervisor to provide some kind insurance. >> > Not in a cheap way. More locking code could be added, but it's likely to be convoluted and have a detrimental effect on performance. Yep. Sounds reasonable. :) So, Jan. Could we draw a conclusion here, that although the selected ioreq server may be stale, it should be the device model that does the check? Thanks Yu > Paul > >> Thanks >> Yu >> >>> Paul >>> >>>> Thanks >>>> Yu >>>>> Jan >>>>> > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > https://lists.xen.org/xen-devel
>>> On 14.03.17 at 13:03, <yu.c.zhang@linux.intel.com> wrote: > So, Jan. Could we draw a conclusion here, that although the selected > ioreq server may be stale, > it should be the device model that does the check? Yes. In order for this to not come up yet another time, please add a comment though. Jan
On 3/14/2017 9:10 PM, Jan Beulich wrote: >>>> On 14.03.17 at 13:03, <yu.c.zhang@linux.intel.com> wrote: >> So, Jan. Could we draw a conclusion here, that although the selected >> ioreq server may be stale, but >> it should be the device model to do the check? > Yes. In order for this to not come up yet another time, please add > a comment though. Got it. I should have done this earlier. :-) Yu > Jan > >
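One possible form for that comment, shown against the accessor as posted in this patch (the wording is only a suggestion):

struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
                                              unsigned int *flags)
{
    struct p2m_domain *p2m = p2m_get_hostp2m(d);
    struct hvm_ioreq_server *s;

    spin_lock(&p2m->ioreq.lock);

    s = p2m->ioreq.server;
    *flags = p2m->ioreq.flags;

    spin_unlock(&p2m->ioreq.lock);

    /*
     * The returned server pointer may already be stale by the time the
     * caller uses it, since the server can be unmapped as soon as the
     * lock is dropped.  This is deliberate: the device model is required
     * to check each ioreq it receives and terminate it appropriately,
     * even for ranges or types it has since unmapped.
     */
    return s;
}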
diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c index 2122c45..f97478b 100644 --- a/xen/arch/x86/hvm/dm.c +++ b/xen/arch/x86/hvm/dm.c @@ -173,9 +173,14 @@ static int modified_memory(struct domain *d, static bool allow_p2m_type_change(p2m_type_t old, p2m_type_t new) { + if ( new == p2m_ioreq_server ) + return old == p2m_ram_rw; + + if ( old == p2m_ioreq_server ) + return new == p2m_ram_rw; + return p2m_is_ram(old) || - (p2m_is_hole(old) && new == p2m_mmio_dm) || - (old == p2m_ioreq_server && new == p2m_ram_rw); + (p2m_is_hole(old) && new == p2m_mmio_dm); } static int set_mem_type(struct domain *d, @@ -202,6 +207,19 @@ static int set_mem_type(struct domain *d, unlikely(data->mem_type == HVMMEM_unused) ) return -EINVAL; + if ( data->mem_type == HVMMEM_ioreq_server ) + { + unsigned int flags; + + /* HVMMEM_ioreq_server is only supported for HAP enabled hvm. */ + if ( !hap_enabled(d) ) + return -EINVAL; + + /* Do not change to HVMMEM_ioreq_server if no ioreq server mapped. */ + if ( !p2m_get_ioreq_server(d, &flags) ) + return -EINVAL; + } + while ( iter < data->nr ) { unsigned long pfn = data->first_pfn + iter; @@ -365,6 +383,24 @@ static int dm_op(domid_t domid, break; } + case XEN_DMOP_map_mem_type_to_ioreq_server: + { + const struct xen_dm_op_map_mem_type_to_ioreq_server *data = + &op.u.map_mem_type_to_ioreq_server; + + rc = -EINVAL; + if ( data->pad ) + break; + + /* Only support for HAP enabled hvm. */ + if ( !hap_enabled(d) ) + break; + + rc = hvm_map_mem_type_to_ioreq_server(d, data->id, + data->type, data->flags); + break; + } + case XEN_DMOP_set_ioreq_server_state: { const struct xen_dm_op_set_ioreq_server_state *data = diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c index 1c66010..fb56f7b 100644 --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -99,6 +99,7 @@ static int hvmemul_do_io( uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data) { struct vcpu *curr = current; + struct domain *currd = curr->domain; struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io; ioreq_t p = { .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO, @@ -140,7 +141,7 @@ static int hvmemul_do_io( (p.dir != dir) || (p.df != df) || (p.data_is_ptr != data_is_addr) ) - domain_crash(curr->domain); + domain_crash(currd); if ( data_is_addr ) return X86EMUL_UNHANDLEABLE; @@ -177,8 +178,65 @@ static int hvmemul_do_io( break; case X86EMUL_UNHANDLEABLE: { - struct hvm_ioreq_server *s = - hvm_select_ioreq_server(curr->domain, &p); + /* + * Xen isn't emulating the instruction internally, so see if + * there's an ioreq server that can handle it. Rules: + * + * - PIO and "normal" mmio run through hvm_select_ioreq_server() + * to choose the ioreq server by range. If no server is found, + * the access is ignored. + * + * - p2m_ioreq_server accesses are handled by the current + * ioreq_server for the domain, but there are some corner + * cases: + * + * - If the domain ioreq_server is NULL, assume this is a + * race between the unbinding of ioreq server and guest fault + * so re-try the instruction. + * + * - If the IOREQ_MEM_ACCESS_WRITE flag is not set, treat it + * like a normal PIO or MMIO that doesn't have an ioreq + * server (i.e., by ignoring it). 
+ */ + struct hvm_ioreq_server *s = NULL; + p2m_type_t p2mt = p2m_invalid; + + if ( is_mmio ) + { + unsigned long gmfn = paddr_to_pfn(addr); + + (void) get_gfn_query_unlocked(currd, gmfn, &p2mt); + + if ( p2mt == p2m_ioreq_server ) + { + unsigned int flags; + + s = p2m_get_ioreq_server(currd, &flags); + + /* + * If p2mt is ioreq_server but ioreq_server is NULL, + * we probably lost a race with unbinding of ioreq + * server, just retry the access. + */ + if ( s == NULL ) + { + rc = X86EMUL_RETRY; + vio->io_req.state = STATE_IOREQ_NONE; + break; + } + + /* + * If the IOREQ_MEM_ACCESS_WRITE flag is not set, + * we should set s to NULL, and just ignore such + * access. + */ + if ( !(flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) ) + s = NULL; + } + } + + if ( !s && p2mt != p2m_ioreq_server ) + s = hvm_select_ioreq_server(currd, &p); /* If there is no suitable backing DM, just ignore accesses */ if ( !s ) diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c index ebb3eca..fcb9f38 100644 --- a/xen/arch/x86/hvm/ioreq.c +++ b/xen/arch/x86/hvm/ioreq.c @@ -753,6 +753,8 @@ int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id) domain_pause(d); + p2m_destroy_ioreq_server(d, s); + hvm_ioreq_server_disable(s, 0); list_del(&s->list_entry); @@ -914,6 +916,42 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, return rc; } +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id, + uint32_t type, uint32_t flags) +{ + struct hvm_ioreq_server *s; + int rc; + + /* For now, only HVMMEM_ioreq_server is supported. */ + if ( type != HVMMEM_ioreq_server ) + return -EINVAL; + + /* For now, only write emulation is supported. */ + if ( flags & ~(XEN_DMOP_IOREQ_MEM_ACCESS_WRITE) ) + return -EINVAL; + + spin_lock_recursive(&d->arch.hvm_domain.ioreq_server.lock); + + rc = -ENOENT; + list_for_each_entry ( s, + &d->arch.hvm_domain.ioreq_server.list, + list_entry ) + { + if ( s == d->arch.hvm_domain.default_ioreq_server ) + continue; + + if ( s->id == id ) + { + rc = p2m_set_ioreq_server(d, flags, s); + break; + } + } + + spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock); + + return rc; +} + int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id, bool_t enabled) { diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c index 162afed..408ea7f 100644 --- a/xen/arch/x86/mm/hap/nested_hap.c +++ b/xen/arch/x86/mm/hap/nested_hap.c @@ -172,7 +172,7 @@ nestedhap_walk_L0_p2m(struct p2m_domain *p2m, paddr_t L1_gpa, paddr_t *L0_gpa, if ( *p2mt == p2m_mmio_direct ) goto direct_mmio_out; rc = NESTEDHVM_PAGEFAULT_MMIO; - if ( *p2mt == p2m_mmio_dm ) + if ( *p2mt == p2m_mmio_dm || *p2mt == p2m_ioreq_server ) goto out; rc = NESTEDHVM_PAGEFAULT_L0_ERROR; diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c index 568944f..cc1eb21 100644 --- a/xen/arch/x86/mm/p2m-ept.c +++ b/xen/arch/x86/mm/p2m-ept.c @@ -131,6 +131,13 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry, entry->r = entry->w = entry->x = 1; entry->a = entry->d = !!cpu_has_vmx_ept_ad; break; + case p2m_ioreq_server: + entry->r = 1; + entry->w = !(p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE); + entry->x = 0; + entry->a = !!cpu_has_vmx_ept_ad; + entry->d = entry->w && entry->a; + break; case p2m_mmio_direct: entry->r = entry->x = 1; entry->w = !rangeset_contains_singleton(mmio_ro_ranges, @@ -170,7 +177,6 @@ static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry, entry->a = entry->d = !!cpu_has_vmx_ept_ad; break; case 
p2m_grant_map_ro: - case p2m_ioreq_server: entry->r = 1; entry->w = entry->x = 0; entry->a = !!cpu_has_vmx_ept_ad; diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c index bbfa54e..97dc25d 100644 --- a/xen/arch/x86/mm/p2m-pt.c +++ b/xen/arch/x86/mm/p2m-pt.c @@ -70,7 +70,9 @@ static const unsigned long pgt[] = { PGT_l3_page_table }; -static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn, +static unsigned long p2m_type_to_flags(const struct p2m_domain *p2m, + p2m_type_t t, + mfn_t mfn, unsigned int level) { unsigned long flags; @@ -92,8 +94,13 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn, default: return flags | _PAGE_NX_BIT; case p2m_grant_map_ro: - case p2m_ioreq_server: return flags | P2M_BASE_FLAGS | _PAGE_NX_BIT; + case p2m_ioreq_server: + flags |= P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT; + if ( p2m->ioreq.flags & XEN_DMOP_IOREQ_MEM_ACCESS_WRITE ) + return flags & ~_PAGE_RW; + else + return flags; case p2m_ram_ro: case p2m_ram_logdirty: case p2m_ram_shared: @@ -440,7 +447,8 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn) p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask, gfn | ~mask) ? p2m_ram_logdirty : p2m_ram_rw; unsigned long mfn = l1e_get_pfn(e); - unsigned long flags = p2m_type_to_flags(p2mt, _mfn(mfn), level); + unsigned long flags = p2m_type_to_flags(p2m, p2mt, + _mfn(mfn), level); if ( level ) { @@ -578,7 +586,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct); l3e_content = mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) ? l3e_from_pfn(mfn_x(mfn), - p2m_type_to_flags(p2mt, mfn, 2) | _PAGE_PSE) + p2m_type_to_flags(p2m, p2mt, mfn, 2) | _PAGE_PSE) : l3e_empty(); entry_content.l1 = l3e_content.l3; @@ -615,7 +623,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) ) entry_content = p2m_l1e_from_pfn(mfn_x(mfn), - p2m_type_to_flags(p2mt, mfn, 0)); + p2m_type_to_flags(p2m, p2mt, mfn, 0)); else entry_content = l1e_empty(); @@ -652,7 +660,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct); if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) ) l2e_content = l2e_from_pfn(mfn_x(mfn), - p2m_type_to_flags(p2mt, mfn, 1) | + p2m_type_to_flags(p2m, p2mt, mfn, 1) | _PAGE_PSE); else l2e_content = l2e_empty(); diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index 2eee9cd..0edfc61 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -82,6 +82,8 @@ static int p2m_initialise(struct domain *d, struct p2m_domain *p2m) else p2m_pt_init(p2m); + spin_lock_init(&p2m->ioreq.lock); + return ret; } @@ -286,6 +288,78 @@ void p2m_memory_type_changed(struct domain *d) } } +int p2m_set_ioreq_server(struct domain *d, + unsigned int flags, + struct hvm_ioreq_server *s) +{ + struct p2m_domain *p2m = p2m_get_hostp2m(d); + int rc; + + /* + * Use lock to prevent concurrent setting attempts + * from multiple ioreq serers. + */ + spin_lock(&p2m->ioreq.lock); + + /* Unmap ioreq server from p2m type by passing flags with 0. 
*/ + if ( flags == 0 ) + { + rc = -EINVAL; + if ( p2m->ioreq.server != s ) + goto out; + + p2m->ioreq.server = NULL; + p2m->ioreq.flags = 0; + } + else + { + rc = -EBUSY; + if ( p2m->ioreq.server != NULL ) + goto out; + + p2m->ioreq.server = s; + p2m->ioreq.flags = flags; + } + + rc = 0; + + out: + spin_unlock(&p2m->ioreq.lock); + + return rc; +} + +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d, + unsigned int *flags) +{ + struct p2m_domain *p2m = p2m_get_hostp2m(d); + struct hvm_ioreq_server *s; + + spin_lock(&p2m->ioreq.lock); + + s = p2m->ioreq.server; + *flags = p2m->ioreq.flags; + + spin_unlock(&p2m->ioreq.lock); + return s; +} + +void p2m_destroy_ioreq_server(const struct domain *d, + const struct hvm_ioreq_server *s) +{ + struct p2m_domain *p2m = p2m_get_hostp2m(d); + + spin_lock(&p2m->ioreq.lock); + + if ( p2m->ioreq.server == s ) + { + p2m->ioreq.server = NULL; + p2m->ioreq.flags = 0; + } + + spin_unlock(&p2m->ioreq.lock); +} + void p2m_enable_hardware_log_dirty(struct domain *d) { struct p2m_domain *p2m = p2m_get_hostp2m(d); diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c index 128809d..7414060 100644 --- a/xen/arch/x86/mm/shadow/multi.c +++ b/xen/arch/x86/mm/shadow/multi.c @@ -3211,8 +3211,7 @@ static int sh_page_fault(struct vcpu *v, } /* Need to hand off device-model MMIO to the device model */ - if ( p2mt == p2m_mmio_dm - || (p2mt == p2m_ioreq_server && ft == ft_demand_write) ) + if ( p2mt == p2m_mmio_dm ) { gpa = guest_walk_to_gpa(&gw); goto mmio; diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h index fbf2c74..b43667a 100644 --- a/xen/include/asm-x86/hvm/ioreq.h +++ b/xen/include/asm-x86/hvm/ioreq.h @@ -37,6 +37,8 @@ int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, uint32_t type, uint64_t start, uint64_t end); +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id, + uint32_t type, uint32_t flags); int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id, bool_t enabled); diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h index 470d29d..3786680 100644 --- a/xen/include/asm-x86/p2m.h +++ b/xen/include/asm-x86/p2m.h @@ -89,7 +89,8 @@ typedef unsigned int p2m_query_t; | p2m_to_mask(p2m_ram_paging_out) \ | p2m_to_mask(p2m_ram_paged) \ | p2m_to_mask(p2m_ram_paging_in) \ - | p2m_to_mask(p2m_ram_shared)) + | p2m_to_mask(p2m_ram_shared) \ + | p2m_to_mask(p2m_ioreq_server)) /* Types that represent a physmap hole that is ok to replace with a shared * entry */ @@ -111,8 +112,7 @@ typedef unsigned int p2m_query_t; #define P2M_RO_TYPES (p2m_to_mask(p2m_ram_logdirty) \ | p2m_to_mask(p2m_ram_ro) \ | p2m_to_mask(p2m_grant_map_ro) \ - | p2m_to_mask(p2m_ram_shared) \ - | p2m_to_mask(p2m_ioreq_server)) + | p2m_to_mask(p2m_ram_shared)) /* Write-discard types, which should discard the write operations */ #define P2M_DISCARD_WRITE_TYPES (p2m_to_mask(p2m_ram_ro) \ @@ -336,6 +336,20 @@ struct p2m_domain { struct ept_data ept; /* NPT-equivalent structure could be added here. */ }; + + struct { + spinlock_t lock; + /* + * ioreq server who's responsible for the emulation of + * gfns with specific p2m type(for now, p2m_ioreq_server). + */ + struct hvm_ioreq_server *server; + /* + * flags specifies whether read, write or both operations + * are to be emulated by an ioreq server. 
+ */ + unsigned int flags; + } ioreq; }; /* get host p2m table */ @@ -827,6 +841,12 @@ static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt, mfn_t mfn) return flags; } +int p2m_set_ioreq_server(struct domain *d, unsigned int flags, + struct hvm_ioreq_server *s); +struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d, + unsigned int *flags); +void p2m_destroy_ioreq_server(const struct domain *d, const struct hvm_ioreq_server *s); + #endif /* _XEN_ASM_X86_P2M_H */ /* diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h index f54cece..c643b67 100644 --- a/xen/include/public/hvm/dm_op.h +++ b/xen/include/public/hvm/dm_op.h @@ -318,6 +318,30 @@ struct xen_dm_op_inject_msi { uint64_aligned_t addr; }; +/* + * XEN_DMOP_map_mem_type_to_ioreq_server : map or unmap the IOREQ Server <id> + * to specific memroy type <type> + * for specific accesses <flags> + * + * For now, flags only accept the value of XEN_DMOP_IOREQ_MEM_ACCESS_WRITE, + * which means only write operations are to be forwarded to an ioreq server. + * Support for the emulation of read operations can be added when an ioreq + * server has such requirement in future. + */ +#define XEN_DMOP_map_mem_type_to_ioreq_server 15 + +struct xen_dm_op_map_mem_type_to_ioreq_server { + ioservid_t id; /* IN - ioreq server id */ + uint16_t type; /* IN - memory type */ + uint16_t pad; + uint32_t flags; /* IN - types of accesses to be forwarded to the + ioreq server. flags with 0 means to unmap the + ioreq server */ + +#define XEN_DMOP_IOREQ_MEM_ACCESS_READ (1u << 0) +#define XEN_DMOP_IOREQ_MEM_ACCESS_WRITE (1u << 1) +}; + struct xen_dm_op { uint32_t op; uint32_t pad; @@ -336,6 +360,8 @@ struct xen_dm_op { struct xen_dm_op_set_mem_type set_mem_type; struct xen_dm_op_inject_event inject_event; struct xen_dm_op_inject_msi inject_msi; + struct xen_dm_op_map_mem_type_to_ioreq_server + map_mem_type_to_ioreq_server; } u; }; diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h index bc00ef0..0bdafdf 100644 --- a/xen/include/public/hvm/hvm_op.h +++ b/xen/include/public/hvm/hvm_op.h @@ -93,7 +93,13 @@ typedef enum { HVMMEM_unused, /* Placeholder; setting memory to this type will fail for code after 4.7.0 */ #endif - HVMMEM_ioreq_server + HVMMEM_ioreq_server /* Memory type claimed by an ioreq server; type + changes to this value are only allowed after + an ioreq server has claimed its ownership. + Only pages with HVMMEM_ram_rw are allowed to + change to this type; conversely, pages with + this type are only allowed to be changed back + to HVMMEM_ram_rw. */ } hvmmem_type_t; /* Hint from PV drivers for pagetable destruction. */
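Taken together, the usage flow implied by this patch is roughly the following (a sketch; the step names are shorthand for the corresponding DMOPs, not actual functions):

/*
 * 1. Create and enable an ioreq server <id> for the target domain.
 * 2. XEN_DMOP_map_mem_type_to_ioreq_server(id, HVMMEM_ioreq_server,
 *    XEN_DMOP_IOREQ_MEM_ACCESS_WRITE): claim the type; only write
 *    forwarding is accepted for now.
 * 3. XEN_DMOP_set_mem_type: convert selected gfns from HVMMEM_ram_rw to
 *    HVMMEM_ioreq_server (set_mem_type() refuses this unless step 2 has
 *    been done, and only ram_rw pages may be converted).
 * 4. Guest writes to those gfns are forwarded to server <id>; reads are
 *    still satisfied from the page itself.
 * 5. Teardown: convert the pages back to HVMMEM_ram_rw and unmap the type
 *    by repeating step 2 with flags == 0 (destroying the server also
 *    clears the mapping via p2m_destroy_ioreq_server()).
 */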