Message ID | 569780BD02000078000C6A1E@prv-mh.provo.novell.com (mailing list archive) |
---|---|
State | New, archived |
On Thu, 2016-01-14 at 03:04 -0700, Jan Beulich wrote:
> - ARM side unimplemented (and hence libxc for now made cope with both
> models),

So, one model is the one described in the commit message:

> - zero (success, everything done)
> - positive (success, this many done, more to do: re-invoke)
> - negative (error)

What is the other one? I'd expect ARM to already implement a subset of this
(i.e. 0 or negative, perhaps with a subset of the possible errno values),
which I'd then expect libxc to just cope with without it constituting a
second model.

IOW I don't think there should be (or indeed is) any special casing of ARM
vs x86 here or one model vs another, just a case of one arch only using a
subset of the expressibility of the interface.

What have I missed?

>>> On 15.01.16 at 11:09, <ian.campbell@citrix.com> wrote:
> On Thu, 2016-01-14 at 03:04 -0700, Jan Beulich wrote:
>> - ARM side unimplemented (and hence libxc for now made cope with both
>> models),
>
> So, one model is the one described in the commit message:
>
>> - zero (success, everything done)
>> - positive (success, this many done, more to do: re-invoke)
>> - negative (error)
>
> What is the other one? I'd expect ARM to already implement a subset of this
> (i.e. 0 or negative, perhaps with a subset of the possible errno values),
> which I'd then expect libxc to just cope with without it constituting a
> second model.

Well, first of all ARM doesn't get switched away from the current
model (yet), i.e. returning -E2BIG out of do_domctl(). And then
the difference between what the patch implements and what the
non-commit message comment describes is how errors get handled:
The patch makes a negative error value returned upon error, with
the caller having no way to tell at what point the error occurred
(and with a best effort undo in the case of "map"). The described
alternative would return the number of succeeded entries unless
an error occurred on the very first MFN, without any attempt to
undo the part that was done successfully. I.e. it would leave it
to the caller to decide what to do, and whether/when to roll back.

Jan

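For readers following the thread, the caller side of the convention being debated (zero, positive, negative, plus the older -E2BIG "retry with less" behaviour) can be sketched roughly as below. This is only an illustration, not code from the patch: `map_all`, `map_fn` and the two `demo_*` back ends are hypothetical names, and the error is passed back directly in the return value rather than through `errno` as libxc does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the hypercall. Returns 0 (whole request done),
 * a positive count (that many entries done, re-invoke for the rest), or a
 * negative errno value.
 */
typedef long (*map_fn)(unsigned long first, unsigned long nr, void *ctx);

/* Drive 'op' until all 'nr' entries are handled or a hard error occurs. */
static long map_all(map_fn op, unsigned long nr, unsigned long max_batch,
                    void *ctx)
{
    unsigned long done = 0;

    while ( done < nr )
    {
        unsigned long batch = nr - done;
        long rc;

        if ( batch > max_batch )
            batch = max_batch;

        rc = op(done, batch, ctx);
        if ( rc == -E2BIG )
        {
            /* Older behaviour: halve the batch size and try again. */
            if ( max_batch <= 1 )
                return rc;
            max_batch >>= 1;
            continue;
        }
        if ( rc < 0 )
            return rc;

        /* 0: the whole batch was done; positive: only that many were. */
        done += rc ? (unsigned long)rc : batch;
    }

    return 0;
}

/* Demo back end: handles at most 3 entries per call (partial success). */
static long demo_partial(unsigned long first, unsigned long nr, void *ctx)
{
    (void)first;
    ++*(unsigned int *)ctx;
    return nr <= 3 ? 0 : 3;
}

/* Demo back end: rejects any request above 4 entries with -E2BIG. */
static long demo_e2big(unsigned long first, unsigned long nr, void *ctx)
{
    (void)first;
    ++*(unsigned int *)ctx;
    return nr > 4 ? -E2BIG : 0;
}
```

Both demo back ends are served by the same loop, which is essentially the point Ian raises: a caller written this way does not need to know which subset of the return values a given architecture uses.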
On Fri, 2016-01-15 at 03:47 -0700, Jan Beulich wrote:
> > > > On 15.01.16 at 11:09, <ian.campbell@citrix.com> wrote:
> > On Thu, 2016-01-14 at 03:04 -0700, Jan Beulich wrote:
> > > - ARM side unimplemented (and hence libxc for now made cope with both
> > > models),
> >
> > So, one model is the one described in the commit message:
> >
> > > - zero (success, everything done)
> > > - positive (success, this many done, more to do: re-invoke)
> > > - negative (error)
> >
> > What is the other one? I'd expect ARM to already implement a subset of
> > this
> > (i.e. 0 or negative, perhaps with a subset of the possible errno
> > values),
> > which I'd then expect libxc to just cope with without it constituting a
> > second model.
>
> Well, first of all ARM doesn't get switched away from the current
> model (yet), i.e. returning -E2BIG out of do_domctl().

Which AFAICT is a valid behaviour under the new model described in the
commit message specifically the "negative (error)" case.

I think the core of my objection/confusion here is describing this as two
different models when what is being introduced is a single API which can
fail either partially or entirely, with that being at the discretion of the
internals. In any case libxc needs to cope with the complete gamut of
behaviours of the interface.

IOW rather than describing a new API and referring obliquely to ARM only
supporting an old model I think this needs a complete description of the
interface covering the full possibilities of the API.

> And then
> the difference between what the patch implements and what the
> non-commit message comment describes is how errors get handled:
> The patch makes a negative error value returned upon error, with
> the caller having no way to tell at what point the error occurred
> (and with a best effort undo in the case of "map"). The described
> alternative would return the number of succeeded entries unless
> an error occurred on the very first MFN, without any attempt to
> undo the part that was done successfully. I.e. it would leave it
> to the caller to decide what to do, and whether/when to roll back.

That's (probably, I don't quite follow all the details as written) fine,
but the interface should be described as a single API with the possible
failure cases each spelled out, i.e. not described as a split/contrast
between old vs. new or x86 vs. arm behaviour. The fact that x86 and arm
might currently exhibit different subsets of the possibilities offered by
the API is of at best secondary interest.

Ian.

>>> On 15.01.16 at 14:57, <ian.campbell@citrix.com> wrote:
> On Fri, 2016-01-15 at 03:47 -0700, Jan Beulich wrote:
>> > > > On 15.01.16 at 11:09, <ian.campbell@citrix.com> wrote:
>> > On Thu, 2016-01-14 at 03:04 -0700, Jan Beulich wrote:
>> > > - ARM side unimplemented (and hence libxc for now made cope with both
>> > > models),
>> >
>> > So, one model is the one described in the commit message:
>> >
>> > > - zero (success, everything done)
>> > > - positive (success, this many done, more to do: re-invoke)
>> > > - negative (error)
>> >
>> > What is the other one? I'd expect ARM to already implement a subset of
>> > this
>> > (i.e. 0 or negative, perhaps with a subset of the possible errno
>> > values),
>> > which I'd then expect libxc to just cope with without it constituting a
>> > second model.
>>
>> Well, first of all ARM doesn't get switched away from the current
>> model (yet), i.e. returning -E2BIG out of do_domctl().
>
> Which AFAICT is a valid behaviour under the new model described in the
> commit message specifically the "negative (error)" case.
>
> I think the core of my objection/confusion here is describing this as two
> different models when what is being introduced is a single API which can
> fail either partially or entirely, with that being at the discretion of the
> internals. In any case libxc needs to cope with the complete gamut of
> behaviours of the interface.
>
> IOW rather than describing a new API and referring obliquely to ARM only
> supporting an old model I think this needs a complete description of the
> interface covering the full possibilities of the API.
>
>> And then
>> the difference between what the patch implements and what the
>> non-commit message comment describes is how errors get handled:
>> The patch makes a negative error value returned upon error, with
>> the caller having no way to tell at what point the error occurred
>> (and with a best effort undo in the case of "map"). The described
>> alternative would return the number of succeeded entries unless
>> an error occurred on the very first MFN, without any attempt to
>> undo the part that was done successfully. I.e. it would leave it
>> to the caller to decide what to do, and whether/when to roll back.
>
> That's (probably, I don't quite follow all the details as written) fine,
> but the interface should be described as a single API with the possible
> failure cases each spelled out, i.e. not described as a split/contrast
> between old vs. new or x86 vs. arm behaviour. The fact that x86 and arm
> might currently exhibit different subsets of the possibilities offered by
> the API is of at best secondary interest.

I don't think I agree - there are two models. The meaning of
-E2BIG for the caller to retry with a smaller amount doesn't exist in
the new model anymore, and hence libxc wouldn't need to deal
with that case anymore if the ARM side got updated too. Whereas
positive return values don't exist in the present (prior to the patch)
model.

Jan

On Fri, 2016-01-15 at 07:39 -0700, Jan Beulich wrote:
> > > > On 15.01.16 at 14:57, <ian.campbell@citrix.com> wrote:
> > On Fri, 2016-01-15 at 03:47 -0700, Jan Beulich wrote:
> > > > > > On 15.01.16 at 11:09, <ian.campbell@citrix.com> wrote:
> > > > On Thu, 2016-01-14 at 03:04 -0700, Jan Beulich wrote:
> > > > > - ARM side unimplemented (and hence libxc for now made cope with
> > > > > both
> > > > > models),
> > > >
> > > > So, one model is the one described in the commit message:
> > > >
> > > > > - zero (success, everything done)
> > > > > - positive (success, this many done, more to do: re-invoke)
> > > > > - negative (error)
> > > >
> > > > What is the other one? I'd expect ARM to already implement a subset
> > > > of
> > > > this
> > > > (i.e. 0 or negative, perhaps with a subset of the possible errno
> > > > values),
> > > > which I'd then expect libxc to just cope with without it
> > > > constituting a
> > > > second model.
> > >
> > > Well, first of all ARM doesn't get switched away from the current
> > > model (yet), i.e. returning -E2BIG out of do_domctl().
> >
> > Which AFAICT is a valid behaviour under the new model described in the
> > commit message specifically the "negative (error)" case.
> >
> > I think the core of my objection/confusion here is describing this as
> > two
> > different models when what is being introduced is a single API which
> > can
> > fail either partially or entirely, with that being at the discretion of
> > the
> > internals. In any case libxc needs to cope with the complete gamut of
> > behaviours of the interface.
> >
> > IOW rather than describing a new API and referring obliquely to ARM
> > only
> > supporting an old model I think this needs a complete description of
> > the
> > interface covering the full possibilities of the API.
> >
> > > And then
> > > the difference between what the patch implements and what the
> > > non-commit message comment describes is how errors get handled:
> > > The patch makes a negative error value returned upon error, with
> > > the caller having no way to tell at what point the error occurred
> > > (and with a best effort undo in the case of "map"). The described
> > > alternative would return the number of succeeded entries unless
> > > an error occurred on the very first MFN, without any attempt to
> > > undo the part that was done successfully. I.e. it would leave it
> > > to the caller to decide what to do, and whether/when to roll back.
> >
> > That's (probably, I don't quite follow all the details as written)
> > fine,
> > but the interface should be described as a single API with the possible
> > failure cases each spelled out, i.e. not described as a split/contrast
> > between old vs. new or x86 vs. arm behaviour. The fact that x86 and arm
> > might currently exhibit different subsets of the possibilities offered
> > by
> > the API is of at best secondary interest.
>
> I don't think I agree - there are two models. The meaning of
> -E2BIG for the caller to retry with a smaller amount doesn't exist in
> the new model anymore, and hence libxc wouldn't need to deal
> with that case anymore if the ARM side got updated too.

If ARM still has this behaviour then it is still part of the interface
IMHO, whether or not x86 chooses to use this particular possibility or not.

> Whereas
> positive return values don't exist in the present (prior to the patch)
> model.

If there were two models in the way you suggest then there would surely be
an ifdef somewhere in libxc. The fact that the two behaviours can coexist
means to me that they are two halves of the same model (irrespective of
arch code opting in to different halves, and irrespective if having updated
ARM there are then fewer possible error cases and a follow up
simplification to libxc).

Anyway, the current three-bullet point description of the new ABI in the
commit message is clearly insufficient for the complexity whether we want
to split hairs about how many models there are here or not.

At the very least the interface (_all_ aspects of it) should be thoroughly
described in domctl.h next to XEN_DOMCTL_memory_mapping (which I just
noticed describes E2BIG and isn't changed here at all).

Ian.

>>> On 15.01.16 at 15:55, <ian.campbell@citrix.com> wrote:
> On Fri, 2016-01-15 at 07:39 -0700, Jan Beulich wrote:
>> I don't think I agree - there are two models. The meaning of
>> -E2BIG for the caller to retry with a smaller amount doesn't exist in
>> the new model anymore, and hence libxc wouldn't need to deal
>> with that case anymore if the ARM side got updated too.
>
> If ARM still has this behaviour then it is still part of the interface
> IMHO, whether or not x86 chooses to use this particular possibility or not.

Okay, that's a valid perspective.

>> Whereas
>> positive return values don't exist in the present (prior to the patch)
>> model.
>
> If there were two models in the way you suggest then there would surely be
> an ifdef somewhere in libxc. The fact that the two behaviours can coexist
> means to me that they are two halves of the same model (irrespective of
> arch code opting in to different halves, and irrespective if having updated
> ARM there are then fewer possible error cases and a follow up
> simplification to libxc).

Same here.

> Anyway, the current three-bullet point description of the new ABI in the
> commit message is clearly insufficient for the complexity whether we want
> to split hairs about how many models there are here or not.
>
> At the very least the interface (_all_ aspects of it) should be thoroughly
> described in domctl.h next to XEN_DOMCTL_memory_mapping (which I just
> noticed describes E2BIG and isn't changed here at all).

I can certainly do that, but I'd like to avoid doing this for the current
model before having taken a decision on whether to instead use the
alternative described in the post-commit message issue list. In fact,
the more I think about it, the more I'm convinced that the alternative
provides the more consistent interface, no matter that it leaves more
of the (cleanup) work to the caller.

Jan

On Mon, 2016-01-18 at 01:11 -0700, Jan Beulich wrote:
> > > > On 15.01.16 at 15:55, <ian.campbell@citrix.com> wrote:
> > On Fri, 2016-01-15 at 07:39 -0700, Jan Beulich wrote:
> > > I don't think I agree - there are two models. The meaning of
> > > -E2BIG for the caller to retry with a smaller amount doesn't exist in
> > > the new model anymore, and hence libxc wouldn't need to deal
> > > with that case anymore if the ARM side got updated too.
> >
> > If ARM still has this behaviour then it is still part of the interface
> > IMHO, whether or not x86 chooses to use this particular possibility or
> > not.
>
> Okay, that's a valid perspective.
>
> > > Whereas
> > > positive return values don't exist in the present (prior to the
> > > patch)
> > > model.
> >
> > If there were two models in the way you suggest then there would surely
> > be
> > an ifdef somewhere in libxc. The fact that the two behaviours can
> > coexist
> > means to me that they are two halves of the same model (irrespective of
> > arch code opting in to different halves, and irrespective if having
> > updated
> > ARM there are then fewer possible error cases and a follow up
> > simplification to libxc).
>
> Same here.
>
> > Anyway, the current three-bullet point description of the new ABI in
> > the
> > commit message is clearly insufficient for the complexity whether we
> > want
> > to split hairs about how many models there are here or not.
> >
> > At the very least the interface (_all_ aspects of it) should be
> > thoroughly
> > described in domctl.h next to XEN_DOMCTL_memory_mapping (which I just
> > noticed describes E2BIG and isn't changed here at all).
>
> I can certainly do that, but I'd like to avoid doing this for the current
> model before having taken a decision on whether to instead use the
> alternative described in the post-commit message issue list. In fact,
> the more I think about it, the more I'm convinced that the alternative
> provides the more consistent interface, no matter that it leaves more
> of the (cleanup) work to the caller.

I must confess I'm not entirely following what the various proposals are,
but FWIW I have no in-principal problem with the caller (by which I think
you mean the tools?) having to cleanup partial success in order to allow
incremental attempts to set things up with smaller and smaller page sizes.

Ian.

>>> On 18.01.16 at 17:32, <ian.campbell@citrix.com> wrote:
> I must confess I'm not entirely following what the various proposals are,

What is currently implemented by the patch is that, upon error on
iteration N the hypervisor would clean up on a best effort basis and
return the error indicator. In the alternative suggested model it
wouldn't do any cleanup and return N to indicate how far success
was seen; only in the event that N=0 would an error code be
returned.

> but FWIW I have no in-principal problem with the caller (by which I think
> you mean the tools?)

Yes.

> having to cleanup partial success in order to allow
> incremental attempts to set things up with smaller and smaller page sizes.

Except that in the new x86 model we're not talking about decreasing
page size, but just the splitting the hypervisor does in place of true
preemption. Decreasing page size would actually be harmful to the
goal of using large pages for the mappings.

Jan

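The two error-handling variants Jan contrasts here can be put side by side in a toy model. The sketch below is an assumption-laden illustration rather than hypervisor code: `struct toy`, `map_one`, `map_with_undo` and `map_report_progress` are made-up names, a boolean array stands in for the p2m, and the positive "more to do, re-invoke" continuation of the real patch is left out so that only the behaviour on error differs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_SLOTS 8

/* Toy "p2m": slot i is mapped when mapped[i] is true. */
struct toy {
    bool mapped[NR_SLOTS];
    int bad;                /* index at which mapping fails, -1 for none */
};

static int map_one(struct toy *t, int i)
{
    if ( i == t->bad )
        return -EINVAL;
    t->mapped[i] = true;
    return 0;
}

/* As in the patch: on error at iteration N, undo and return the error. */
static int map_with_undo(struct toy *t, int nr)
{
    for ( int i = 0; i < nr; i++ )
    {
        int rc = map_one(t, i);

        if ( rc < 0 )
        {
            while ( i-- > 0 )
                t->mapped[i] = false;
            return rc;
        }
    }

    return 0;
}

/*
 * Suggested alternative: no cleanup; return N to say how far we got, and
 * only return the error code itself when N is 0.
 */
static int map_report_progress(struct toy *t, int nr)
{
    for ( int i = 0; i < nr; i++ )
    {
        int rc = map_one(t, i);

        if ( rc < 0 )
            return i ? i : rc;
    }

    return 0;
}
```

With the second variant the caller learns where the failure happened (re-invoking from entry N hits it as the first entry and then yields the error code), at the price of having to undo entries 0..N-1 itself if it wants to roll back.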
On Mon, 2016-01-18 at 09:51 -0700, Jan Beulich wrote:
> > > > On 18.01.16 at 17:32, <ian.campbell@citrix.com> wrote:
> > I must confess I'm not entirely following what the various proposals
> > are,
>
> What is currently implemented by the patch is that, upon error on
> iteration N the hypervisor would clean up on a best effort basis and
> return the error indicator. In the alternative suggested model it
> wouldn't do any cleanup and return N to indicate how far success
> was seen; only in the event that N=0 would an error code be
> returned.
>
> > but FWIW I have no in-principal problem with the caller (by which I
> > think
> > you mean the tools?)
>
> Yes.
>
> > having to cleanup partial success in order to allow
> > incremental attempts to set things up with smaller and smaller page
> > sizes.
>
> Except that in the new x86 model we're not talking about decreasing
> page size, but just the splitting the hypervisor does in place of true
> preemption. Decreasing page size would actually be harmful to the
> goal of using large pages for the mappings.

Ah, I assumed it was to allow things to progress if no large pages were
actually around. Doing it for preemption purposes sounds ok too I guess.

Ian.

--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -2206,7 +2206,7 @@ int xc_domain_memory_mapping(
 {
     DECLARE_DOMCTL;
     xc_dominfo_t info;
-    int ret = 0, err;
+    int ret = 0, rc;
     unsigned long done = 0, nr, max_batch_sz;
 
     if ( xc_domain_getinfo(xch, domid, 1, &info) != 1 ||
@@ -2231,19 +2231,24 @@ int xc_domain_memory_mapping(
         domctl.u.memory_mapping.nr_mfns = nr;
         domctl.u.memory_mapping.first_gfn = first_gfn + done;
         domctl.u.memory_mapping.first_mfn = first_mfn + done;
-        err = do_domctl(xch, &domctl);
-        if ( err && errno == E2BIG )
+        rc = do_domctl(xch, &domctl);
+        if ( rc < 0 && errno == E2BIG )
         {
             if ( max_batch_sz <= 1 )
                 break;
             max_batch_sz >>= 1;
             continue;
         }
+        if ( rc > 0 )
+        {
+            done += rc;
+            continue;
+        }
         /* Save the first error... */
         if ( !ret )
-            ret = err;
+            ret = rc;
         /* .. and ignore the rest of them when removing. */
-        if ( err && add_mapping != DPCI_REMOVE_MAPPING )
+        if ( rc && add_mapping != DPCI_REMOVE_MAPPING )
             break;
 
         done += nr;
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -436,7 +436,8 @@ static __init void pvh_add_mem_mapping(s
         else
             a = p2m_access_rw;
 
-        if ( (rc = set_mmio_p2m_entry(d, gfn + i, _mfn(mfn + i), a)) )
+        if ( (rc = set_mmio_p2m_entry(d, gfn + i, _mfn(mfn + i),
+                                      PAGE_ORDER_4K, a)) )
             panic("pvh_add_mem_mapping: gfn:%lx mfn:%lx i:%ld rc:%d\n",
                   gfn, mfn, i, rc);
         if ( !(i & 0xfffff) )
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2491,7 +2491,7 @@ static int vmx_alloc_vlapic_mapping(stru
     share_xen_page_with_guest(pg, d, XENSHARE_writable);
     d->arch.hvm_domain.vmx.apic_access_mfn = mfn;
     set_mmio_p2m_entry(d, paddr_to_pfn(APIC_DEFAULT_PHYS_BASE), _mfn(mfn),
-        p2m_get_hostp2m(d)->default_access);
+                       PAGE_ORDER_4K, p2m_get_hostp2m(d)->default_access);
 
     return 0;
 }
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -899,48 +899,62 @@ void p2m_change_type_range(struct domain
     p2m_unlock(p2m);
 }
 
-/* Returns: 0 for success, -errno for failure */
+/*
+ * Returns:
+ *    0        for success
+ *    -errno   for failure
+ *    order+1  for caller to retry with order (guaranteed smaller than
+ *             the order value passed in)
+ */
 static int set_typed_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
-                               p2m_type_t gfn_p2mt, p2m_access_t access)
+                               unsigned int order, p2m_type_t gfn_p2mt,
+                               p2m_access_t access)
 {
     int rc = 0;
     p2m_access_t a;
     p2m_type_t ot;
     mfn_t omfn;
+    unsigned int cur_order = 0;
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
 
     if ( !paging_mode_translate(d) )
         return -EIO;
 
-    gfn_lock(p2m, gfn, 0);
-    omfn = p2m->get_entry(p2m, gfn, &ot, &a, 0, NULL, NULL);
+    gfn_lock(p2m, gfn, order);
+    omfn = p2m->get_entry(p2m, gfn, &ot, &a, 0, &cur_order, NULL);
+    if ( cur_order < order )
+    {
+        gfn_unlock(p2m, gfn, order);
+        return cur_order + 1;
+    }
     if ( p2m_is_grant(ot) || p2m_is_foreign(ot) )
     {
-        gfn_unlock(p2m, gfn, 0);
+        gfn_unlock(p2m, gfn, order);
         domain_crash(d);
         return -ENOENT;
     }
     else if ( p2m_is_ram(ot) )
     {
+        unsigned long i;
+
         ASSERT(mfn_valid(omfn));
-        set_gpfn_from_mfn(mfn_x(omfn), INVALID_M2P_ENTRY);
+        for ( i = 0; i < (1UL << order); ++i )
+            set_gpfn_from_mfn(mfn_x(omfn) + i, INVALID_M2P_ENTRY);
     }
 
     P2M_DEBUG("set %d %lx %lx\n", gfn_p2mt, gfn, mfn_x(mfn));
-    rc = p2m_set_entry(p2m, gfn, mfn, PAGE_ORDER_4K, gfn_p2mt,
-                       access);
+    rc = p2m_set_entry(p2m, gfn, mfn, order, gfn_p2mt, access);
     if ( rc )
-        gdprintk(XENLOG_ERR,
-                 "p2m_set_entry failed! mfn=%08lx rc:%d\n",
-                 mfn_x(get_gfn_query_unlocked(p2m->domain, gfn, &ot)), rc);
+        gdprintk(XENLOG_ERR, "p2m_set_entry: %#lx:%u -> %d (0x%"PRI_mfn")\n",
+                 gfn, order, rc, mfn_x(mfn));
     else if ( p2m_is_pod(ot) )
     {
         pod_lock(p2m);
-        p2m->pod.entry_count--;
+        p2m->pod.entry_count -= 1UL << order;
         BUG_ON(p2m->pod.entry_count < 0);
         pod_unlock(p2m);
     }
-    gfn_unlock(p2m, gfn, 0);
+    gfn_unlock(p2m, gfn, order);
 
     return rc;
 }
@@ -949,14 +963,21 @@ static int set_typed_p2m_entry(struct do
 static int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
                                  mfn_t mfn)
 {
-    return set_typed_p2m_entry(d, gfn, mfn, p2m_map_foreign,
+    return set_typed_p2m_entry(d, gfn, mfn, PAGE_ORDER_4K, p2m_map_foreign,
                                p2m_get_hostp2m(d)->default_access);
 }
 
 int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
-                       p2m_access_t access)
+                       unsigned int order, p2m_access_t access)
 {
-    return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
+    if ( order &&
+         rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
+                                 mfn_x(mfn) + (1UL << order) - 1) &&
+         !rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
+                                  mfn_x(mfn) + (1UL << order) - 1) )
+        return order;
+
+    return set_typed_p2m_entry(d, gfn, mfn, order, p2m_mmio_direct, access);
 }
 
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
@@ -1009,20 +1030,33 @@ int set_identity_p2m_entry(struct domain
     return ret;
 }
 
-/* Returns: 0 for success, -errno for failure */
-int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+/*
+ * Returns:
+ *    0        for success
+ *    -errno   for failure
+ *    order+1  for caller to retry with order (guaranteed smaller than
+ *             the order value passed in)
+ */
+int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
+                         unsigned int order)
 {
     int rc = -EINVAL;
     mfn_t actual_mfn;
     p2m_access_t a;
     p2m_type_t t;
+    unsigned int cur_order = 0;
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
 
     if ( !paging_mode_translate(d) )
         return -EIO;
 
-    gfn_lock(p2m, gfn, 0);
-    actual_mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL, NULL);
+    gfn_lock(p2m, gfn, order);
+    actual_mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, &cur_order, NULL);
+    if ( cur_order < order )
+    {
+        rc = cur_order + 1;
+        goto out;
+    }
 
     /* Do not use mfn_valid() here as it will usually fail for MMIO pages. */
     if ( (INVALID_MFN == mfn_x(actual_mfn)) || (t != p2m_mmio_direct) )
@@ -1035,11 +1069,11 @@ int clear_mmio_p2m_entry(struct domain *
         gdprintk(XENLOG_WARNING,
                  "no mapping between mfn %08lx and gfn %08lx\n",
                  mfn_x(mfn), gfn);
-    rc = p2m_set_entry(p2m, gfn, _mfn(INVALID_MFN), PAGE_ORDER_4K, p2m_invalid,
+    rc = p2m_set_entry(p2m, gfn, _mfn(INVALID_MFN), order, p2m_invalid,
                        p2m->default_access);
 
  out:
-    gfn_unlock(p2m, gfn, 0);
+    gfn_unlock(p2m, gfn, order);
 
     return rc;
 }
@@ -2095,6 +2129,25 @@ void *map_domain_gfn(struct p2m_domain *
     return map_domain_page(*mfn);
 }
 
+static unsigned int mmio_order(const struct domain *d,
+                               unsigned long start_fn, unsigned long nr)
+{
+    if ( !need_iommu(d) || !iommu_use_hap_pt(d) ||
+         (start_fn & ((1UL << PAGE_ORDER_2M) - 1)) || !(nr >> PAGE_ORDER_2M) )
+        return 0;
+
+    if ( !(start_fn & ((1UL << PAGE_ORDER_1G) - 1)) && (nr >> PAGE_ORDER_1G) &&
+         hap_has_1gb )
+        return PAGE_ORDER_1G;
+
+    if ( hap_has_2mb )
+        return PAGE_ORDER_2M;
+
+    return 0;
+}
+
+#define MAP_MMIO_MAX_ITER 64 /* pretty arbitrary */
+
 int map_mmio_regions(struct domain *d,
                      unsigned long start_gfn,
                      unsigned long nr,
@@ -2102,22 +2155,48 @@ int map_mmio_regions(struct domain *d,
 {
     int ret = 0;
     unsigned long i;
+    unsigned int iter, order;
 
     if ( !paging_mode_translate(d) )
         return 0;
 
-    for ( i = 0; !ret && i < nr; i++ )
+    for ( iter = i = 0; i < nr && iter < MAP_MMIO_MAX_ITER;
+          i += 1UL << order, ++iter )
     {
-        ret = set_mmio_p2m_entry(d, start_gfn + i, _mfn(mfn + i),
-                                 p2m_get_hostp2m(d)->default_access);
-        if ( ret )
+        /* OR'ing gfn and mfn values will return an order suitable to both. */
+        for ( order = mmio_order(d, (start_gfn + i) | (mfn + i), nr - i); ;
+              order = ret - 1 )
+        {
+            ret = set_mmio_p2m_entry(d, start_gfn + i, _mfn(mfn + i), order,
+                                     p2m_get_hostp2m(d)->default_access);
+            if ( ret <= 0 )
+                break;
+            ASSERT(ret <= order);
+        }
+        if ( ret < 0 )
         {
-            unmap_mmio_regions(d, start_gfn, i, mfn);
+            for ( nr = i, iter = i = 0; i < nr ; i += 1UL << order, ++iter )
+            {
+                int rc;
+
+                WARN_ON(iter == MAP_MMIO_MAX_ITER);
+                for ( order = mmio_order(d, (start_gfn + i) | (mfn + i),
+                                         nr - i); ; order = rc - 1 )
+                {
+                    rc = clear_mmio_p2m_entry(d, start_gfn + i,
+                                              _mfn(mfn + i), order);
+                    if ( rc <= 0 )
+                        break;
+                    ASSERT(rc <= order);
+                }
+                if ( rc < 0 )
+                    order = 0;
+            }
             break;
         }
     }
 
-    return ret;
+    return ret < 0 ? ret : i == nr ? 0 : i;
 }
 
 int unmap_mmio_regions(struct domain *d,
@@ -2127,18 +2206,33 @@ int unmap_mmio_regions(struct domain *d,
 {
     int err = 0;
     unsigned long i;
+    unsigned int iter, order;
 
     if ( !paging_mode_translate(d) )
         return 0;
 
-    for ( i = 0; i < nr; i++ )
+    for ( iter = i = 0; i < nr && iter < MAP_MMIO_MAX_ITER;
+          i += 1UL << order, ++iter )
     {
-        int ret = clear_mmio_p2m_entry(d, start_gfn + i, _mfn(mfn + i));
-        if ( ret )
+        int ret;
+
+        /* OR'ing gfn and mfn values will return an order suitable to both. */
+        for ( order = mmio_order(d, (start_gfn + i) | (mfn + i), nr - i); ;
+              order = ret - 1 )
+        {
+            ret = clear_mmio_p2m_entry(d, start_gfn + i, _mfn(mfn + i), order);
+            if ( ret <= 0 )
+                break;
+            ASSERT(ret <= order);
+        }
+        if ( ret < 0 )
+        {
             err = ret;
+            order = 0;
+        }
     }
 
-    return err;
+    return err ?: i == nr ? 0 : i;
 }
 
 unsigned int p2m_find_altp2m_by_eptp(struct domain *d, uint64_t eptp)
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -136,6 +136,7 @@ static void ept_p2m_type_to_flags(struct
             entry->r = entry->x = 1;
             entry->w = !rangeset_contains_singleton(mmio_ro_ranges,
                                                     entry->mfn);
+            ASSERT(entry->w || !is_epte_superpage(entry));
             entry->a = !!cpu_has_vmx_ept_ad;
             entry->d = entry->w && cpu_has_vmx_ept_ad;
             break;
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -72,7 +72,8 @@ static const unsigned long pgt[] = {
     PGT_l3_page_table
 };
 
-static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn)
+static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn,
+                                       unsigned int level)
 {
     unsigned long flags;
     /*
@@ -107,6 +108,8 @@ static unsigned long p2m_type_to_flags(p
     case p2m_mmio_direct:
         if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) )
             flags |= _PAGE_RW;
+        else
+            ASSERT(!level);
         return flags | P2M_BASE_FLAGS | _PAGE_PCD;
     }
 }
@@ -436,7 +449,7 @@ static int do_recalc(struct p2m_domain *
             p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask, gfn | ~mask)
                               ? p2m_ram_logdirty : p2m_ram_rw;
             unsigned long mfn = l1e_get_pfn(e);
-            unsigned long flags = p2m_type_to_flags(p2mt, _mfn(mfn));
+            unsigned long flags = p2m_type_to_flags(p2mt, _mfn(mfn), level);
 
             if ( level )
             {
@@ -573,7 +576,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
         ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
         l3e_content = mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt)
             ? l3e_from_pfn(mfn_x(mfn),
-                           p2m_type_to_flags(p2mt, mfn) | _PAGE_PSE)
+                           p2m_type_to_flags(p2mt, mfn, 2) | _PAGE_PSE)
             : l3e_empty();
         entry_content.l1 = l3e_content.l3;
 
@@ -609,7 +612,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
 
         if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
             entry_content = p2m_l1e_from_pfn(mfn_x(mfn),
-                                             p2m_type_to_flags(p2mt, mfn));
+                                             p2m_type_to_flags(p2mt, mfn, 0));
         else
             entry_content = l1e_empty();
 
@@ -645,7 +648,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m,
         ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
         if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
             l2e_content = l2e_from_pfn(mfn_x(mfn),
-                                       p2m_type_to_flags(p2mt, mfn) |
+                                       p2m_type_to_flags(p2mt, mfn, 1) |
                                        _PAGE_PSE);
         else
             l2e_content = l2e_empty();
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -1046,10 +1046,12 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xe
              (gfn + nr_mfns - 1) < gfn ) /* wrap? */
             break;
 
+#ifndef CONFIG_X86 /* XXX ARM!? */
         ret = -E2BIG;
         /* Must break hypercall up as this could take a while. */
         if ( nr_mfns > 64 )
             break;
+#endif
 
         ret = -EPERM;
         if ( !iomem_access_permitted(current->domain, mfn, mfn_end) ||
@@ -1067,7 +1069,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xe
                    d->domain_id, gfn, mfn, nr_mfns);
 
             ret = map_mmio_regions(d, gfn, nr_mfns, mfn);
-            if ( ret )
+            if ( ret < 0 )
                 printk(XENLOG_G_WARNING
                        "memory_map:fail: dom%d gfn=%lx mfn=%lx nr=%lx ret:%ld\n",
                        d->domain_id, gfn, mfn, nr_mfns, ret);
@@ -1079,7 +1081,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xe
                    d->domain_id, gfn, mfn, nr_mfns);
 
             ret = unmap_mmio_regions(d, gfn, nr_mfns, mfn);
-            if ( ret && is_hardware_domain(current->domain) )
+            if ( ret < 0 && is_hardware_domain(current->domain) )
                 printk(XENLOG_ERR
                        "memory_map: error %ld removing dom%d access to [%lx,%lx]\n",
                        ret, d->domain_id, mfn, mfn_end);
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -259,7 +259,7 @@ int guest_remove_page(struct domain *d,
     }
     if ( p2mt == p2m_mmio_direct )
     {
-        clear_mmio_p2m_entry(d, gmfn, _mfn(mfn));
+        clear_mmio_p2m_entry(d, gmfn, _mfn(mfn), 0);
         put_gfn(d, gmfn);
         return 1;
     }
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -574,8 +574,9 @@ int p2m_is_logdirty_range(struct p2m_dom
 
 /* Set mmio addresses in the p2m table (for pass-through) */
 int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
-                       p2m_access_t access);
-int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
+                       unsigned int order, p2m_access_t access);
+int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
+                         unsigned int order);
 
 /* Set identity addresses in the p2m table (for pass-through) */
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
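
The "OR'ing gfn and mfn values will return an order suitable to both" comment in map_mmio_regions() relies on a small bit trick that may be easier to see in isolation. The following stand-alone sketch is a simplification and not the patch's mmio_order(): `pick_order` is a hypothetical name, the IOMMU and hap_has_2mb / hap_has_1gb checks are dropped, and the order constants are defined locally with the x86 values (9 for 2M, 18 for 1G, in units of 4K frames).

```c
#include <assert.h>

#define ORDER_4K  0
#define ORDER_2M  9     /* 2M = 2^9 frames of 4K  */
#define ORDER_1G 18     /* 1G = 2^18 frames of 4K */

/*
 * Pick the largest order usable for mapping 'nr' frames starting at guest
 * frame 'gfn' and machine frame 'mfn'. A low bit set in either number is
 * also set in (gfn | mfn), so a single alignment test covers both.
 */
static unsigned int pick_order(unsigned long gfn, unsigned long mfn,
                               unsigned long nr)
{
    unsigned long fn = gfn | mfn;

    if ( !(fn & ((1UL << ORDER_1G) - 1)) && (nr >> ORDER_1G) )
        return ORDER_1G;

    if ( !(fn & ((1UL << ORDER_2M) - 1)) && (nr >> ORDER_2M) )
        return ORDER_2M;

    return ORDER_4K;
}
```

For example, gfn 0x200 and mfn 0x400 are both 2M aligned, so with at least 0x200 frames left a 2M mapping is chosen; changing the mfn to 0x401 breaks the alignment of the pair and the function falls back to 4K.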