Message ID | 1491135867-623-6-git-send-email-yu.c.zhang@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
>>> On 02.04.17 at 14:24, <yu.c.zhang@linux.intel.com> wrote:
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -317,6 +317,15 @@ int p2m_set_ioreq_server(struct domain *d,
>          if ( p2m->ioreq.server != NULL )
>              goto out;
>
> +        /*
> +         * It is possible that an ioreq server has just been unmapped,
> +         * released the spin lock, with some p2m_ioreq_server entries
> +         * in p2m table remained. We shall refuse another ioreq server
> +         * mapping request in such case.
> +         */
> +        if ( read_atomic(&p2m->ioreq.entry_count) )
> +            goto out;

So this produces the same -EINVAL as the earlier check in context above. I think it would be nice if neither did: -EINUSE for the first (which we don't have, so -EOPNOTSUPP would seem the second best option there) and -EBUSY for the second would seem more appropriate. If you agree, the respective adjustments could be done while committing, if no other reason for a v11 arises.

Jan
>>> On 03.04.17 at 16:36, <JBeulich@suse.com> wrote:
>>>> On 02.04.17 at 14:24, <yu.c.zhang@linux.intel.com> wrote:
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -317,6 +317,15 @@ int p2m_set_ioreq_server(struct domain *d,
>>          if ( p2m->ioreq.server != NULL )
>>              goto out;
>>
>> +        /*
>> +         * It is possible that an ioreq server has just been unmapped,
>> +         * released the spin lock, with some p2m_ioreq_server entries
>> +         * in p2m table remained. We shall refuse another ioreq server
>> +         * mapping request in such case.
>> +         */
>> +        if ( read_atomic(&p2m->ioreq.entry_count) )
>> +            goto out;
>
> So this produces the same -EINVAL as the earlier check in context above. I think it would be nice if neither did: -EINUSE for the first (which we don't have, so -EOPNOTSUPP would seem the second best option there) and -EBUSY for the second would seem more appropriate. If you agree, the respective adjustments could be done while committing, if no other reason for a v11 arises.

Oh, and with that:

Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan
On 4/3/2017 10:36 PM, Jan Beulich wrote:
> So this produces the same -EINVAL as the earlier check in context above. I think it would be nice if neither did: -EINUSE for the first (which we don't have, so -EOPNOTSUPP would seem the second best option there) and -EBUSY for the second would seem more appropriate. If you agree, the respective adjustments could be done while committing, if no other reason for a v11 arises.

Thanks, Jan. But my code shows both checks will return -EBUSY, not -EINVAL, for the mapping request:

    /* Unmap ioreq server from p2m type by passing flags with 0. */
    if ( flags == 0 )
    {
        rc = -EINVAL;
        if ( p2m->ioreq.server != s )
            goto out;

        p2m->ioreq.server = NULL;
        p2m->ioreq.flags = 0;
    }
    else
    {
        rc = -EBUSY;
        if ( p2m->ioreq.server != NULL )
            goto out;

        /*
         * It is possible that an ioreq server has just been unmapped,
         * released the spin lock, with some p2m_ioreq_server entries
         * in p2m table remained. We shall refuse another ioreq server
         * mapping request in such case.
         */
        if ( read_atomic(&p2m->ioreq.entry_count) )
            goto out;

        p2m->ioreq.server = s;
        p2m->ioreq.flags = flags;
    }

Do you really want to use -EOPNOTSUPP when p2m->ioreq.server is not NULL in the mapping code?

For the unmapping code, -EINVAL is used when the ioreq server to be unmapped is not the designated one. But for this one, I am not sure which is better: -EINVAL or -EBUSY?

Yu
>>> On 05.04.17 at 09:18, <yu.c.zhang@linux.intel.com> wrote:
> On 4/3/2017 10:36 PM, Jan Beulich wrote:
>> So this produces the same -EINVAL as the earlier check in context above. I think it would be nice if neither did: -EINUSE for the first (which we don't have, so -EOPNOTSUPP would seem the second best option there) and -EBUSY for the second would seem more appropriate. If you agree, the respective adjustments could be done while committing, if no other reason for a v11 arises.
>
> Thanks, Jan.

First of all - deleting irrelevant context from replies is always appreciated, but I think you went too far here. It is no longer possible to re-associate your comments with the change you've made (not visible from the code fragment below).

> But my code shows both checks will return -EBUSY, not -EINVAL, for the mapping request:
>
>     /* Unmap ioreq server from p2m type by passing flags with 0. */
>     if ( flags == 0 )
>     {
>         rc = -EINVAL;
>         if ( p2m->ioreq.server != s )
>             goto out;
>
>         p2m->ioreq.server = NULL;
>         p2m->ioreq.flags = 0;
>     }
>     else
>     {
>         rc = -EBUSY;
>         if ( p2m->ioreq.server != NULL )
>             goto out;
>
>         /*
>          * It is possible that an ioreq server has just been unmapped,
>          * released the spin lock, with some p2m_ioreq_server entries
>          * in p2m table remained. We shall refuse another ioreq server
>          * mapping request in such case.
>          */
>         if ( read_atomic(&p2m->ioreq.entry_count) )
>             goto out;

Oh, I see: I had mistakenly assumed the addition was to the if() branch. I clearly didn't look closely enough, I'm sorry.

>         p2m->ioreq.server = s;
>         p2m->ioreq.flags = flags;
>     }
>
> Do you really want to use -EOPNOTSUPP when p2m->ioreq.server is not NULL in the mapping code?

As per above, I really did refer to the wrong piece of code, so all is fine the way the code is.

Jan
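[Editorial note: to summarize where this sub-thread lands, the return-value contract of p2m_set_ioreq_server() as posted can be condensed as follows. This merely restates the code quoted above; it is not part of the patch.]

    /*
     * p2m_set_ioreq_server() outcomes, per the excerpt above:
     *
     *   0         success
     *   -EINVAL   unmap (flags == 0) attempted by a server that is
     *             not the currently mapped one
     *   -EBUSY    map attempted while another server is already
     *             mapped, or while stale p2m_ioreq_server entries
     *             still await lazy recalculation
     */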
On Sun, Apr 2, 2017 at 1:24 PM, Yu Zhang <yu.c.zhang@linux.intel.com> wrote:
> After an ioreq server has unmapped, the remaining p2m_ioreq_server entries need to be reset back to p2m_ram_rw. This patch does this asynchronously with the current p2m_change_entry_type_global() interface.
>
> New field entry_count is introduced in struct p2m_domain, to record the number of p2m_ioreq_server p2m page table entries. One nature of these entries is that they only point to 4K sized page frames, because all p2m_ioreq_server entries are originated from p2m_ram_rw ones in p2m_change_type_one(). We do not need to worry about the counting for 2M/1G sized pages.

Assuming that all p2m_ioreq_server entries are *created* by p2m_change_type_one() may be valid, but can you assume that they are only ever *removed* by p2m_change_type_one() (or recalculation)?

What happens, for instance, if a guest balloons out one of the ram pages? I don't immediately see anything preventing a p2m_ioreq_server page from being ballooned out, nor anything on the decrease_reservation() path decreasing p2m->ioreq.entry_count. Or did I miss something?

Other than that, only one minor comment...

> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
> index a57b385..6ec950a 100644
> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -187,6 +187,15 @@ out:
>   */
>  static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
>  {
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +    /*
> +     * Refuse to turn on global log-dirty mode if
> +     * there's outstanding p2m_ioreq_server pages.

Grammar nit: if *there are* outstanding pages.

-George
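[Editorial note: for context on the 4K-only claim in the commit message, p2m_change_type_one() converts exactly one 4K page at a time, so every p2m_ioreq_server entry it creates is a leaf entry. A simplified extract of the function this patch modifies (see the full diff at the end of the thread); locking and error paths are elided:]

    int p2m_change_type_one(struct domain *d, unsigned long gfn,
                            p2m_type_t ot, p2m_type_t nt)
    {
        struct p2m_domain *p2m = p2m_get_hostp2m(d);
        p2m_access_t a;
        p2m_type_t pt;
        mfn_t mfn;
        int rc;

        /* ... gfn_lock() etc. elided ... */

        mfn = p2m->get_entry(p2m, gfn, &pt, &a, 0, NULL, NULL);
        rc = likely(pt == ot)
             /* Always a single 4K page, hence no 2M/1G
              * p2m_ioreq_server entries can ever exist. */
             ? p2m_set_entry(p2m, gfn, mfn, PAGE_ORDER_4K, nt,
                             p2m->default_access)
             : -EBUSY;

        /* ... gfn_unlock() elided ... */

        return rc;
    }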
On 4/5/2017 10:41 PM, George Dunlap wrote:
> On Sun, Apr 2, 2017 at 1:24 PM, Yu Zhang <yu.c.zhang@linux.intel.com> wrote:
>> [commit message quoted in full above]
>
> Assuming that all p2m_ioreq_server entries are *created* by p2m_change_type_one() may be valid, but can you assume that they are only ever *removed* by p2m_change_type_one() (or recalculation)?
>
> What happens, for instance, if a guest balloons out one of the ram pages? I don't immediately see anything preventing a p2m_ioreq_server page from being ballooned out, nor anything on the decrease_reservation() path decreasing p2m->ioreq.entry_count. Or did I miss something?
>
> Other than that, only one minor comment...

Thanks for your thorough consideration, George. But I do not think we need to worry about this:

If the emulation is in process, the balloon driver cannot get a p2m_ioreq_server page - because it is already allocated.

And even when emulation is finished and the balloon driver successfully gets this page and triggers decrease_reservation, the purpose is to remove the current mapping relation between the gfn and mfn in the p2m. So IIUC, p2m_remove_page() will be triggered if everything goes fine, and then p2m_set_entry(), which will trigger the recalc logic eventually, either in ept_set_entry() or p2m_pt_set_entry(). Then the entry_count will be updated in the recalc logic.

>> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
>> index a57b385..6ec950a 100644
>> --- a/xen/arch/x86/mm/hap/hap.c
>> +++ b/xen/arch/x86/mm/hap/hap.c
>> @@ -187,6 +187,15 @@ out:
>>   */
>>  static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
>>  {
>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +
>> +    /*
>> +     * Refuse to turn on global log-dirty mode if
>> +     * there's outstanding p2m_ioreq_server pages.
>
> Grammar nit: if *there are* outstanding pages.

Oh, right. Thanks.

B.R.
Yu

> -George
On 4/6/2017 12:35 AM, George Dunlap wrote:
> On 05/04/17 17:22, Yu Zhang wrote:
>> [commit message quote and ballooning question trimmed - see above]
>>
>> Thanks for your thorough consideration, George. But I do not think we need to worry about this:
>>
>> If the emulation is in process, the balloon driver cannot get a p2m_ioreq_server page - because it is already allocated.
>
> In theory, yes, the guest *shouldn't* do this. But what if the guest OS makes a mistake? Or, what if the ioreq server makes a mistake and places a watch on a page that *isn't* allocated by the device driver, or forgets to change a page type back to ram when the device driver frees it back to the guest kernel?

Then the lazy p2m change code will be triggered, and this page is reset to p2m_ram_rw before being set to p2m_invalid, just like on the normal path. Will this be a problem?

> It's the hypervisor's job to do the right thing even if the guest and the device model don't.
>
>> And even when emulation is finished and the balloon driver successfully gets this page and triggers decrease_reservation, the purpose is to remove the current mapping relation between the gfn and mfn in the p2m. So IIUC, p2m_remove_page() will be triggered if everything goes fine, and then p2m_set_entry(), which will trigger the recalc logic eventually, either in ept_set_entry() or p2m_pt_set_entry(). Then the entry_count will be updated in the recalc logic.
>
> Yes, once the lazy type change has been made, we can rely on the recalculation logic to make sure that the types are changed appropriately.
>
> -George

Yep. So my understanding is that as long as p2m_set_entry() is used to change the p2m, we do not need to worry about this.

B.R.
Yu
On 05/04/17 17:22, Yu Zhang wrote:
> On 4/5/2017 10:41 PM, George Dunlap wrote:
>> [commit message quote and ballooning question trimmed - see above]
>
> Thanks for your thorough consideration, George. But I do not think we need to worry about this:
>
> If the emulation is in process, the balloon driver cannot get a p2m_ioreq_server page - because it is already allocated.

In theory, yes, the guest *shouldn't* do this. But what if the guest OS makes a mistake? Or, what if the ioreq server makes a mistake and places a watch on a page that *isn't* allocated by the device driver, or forgets to change a page type back to ram when the device driver frees it back to the guest kernel?

It's the hypervisor's job to do the right thing even if the guest and the device model don't.

> And even when emulation is finished and the balloon driver successfully gets this page and triggers decrease_reservation, the purpose is to remove the current mapping relation between the gfn and mfn in the p2m. So IIUC, p2m_remove_page() will be triggered if everything goes fine, and then p2m_set_entry(), which will trigger the recalc logic eventually, either in ept_set_entry() or p2m_pt_set_entry(). Then the entry_count will be updated in the recalc logic.

Yes, once the lazy type change has been made, we can rely on the recalculation logic to make sure that the types are changed appropriately.

-George
On 05/04/17 17:32, Yu Zhang wrote:
> On 4/6/2017 12:35 AM, George Dunlap wrote:
>> [earlier exchange trimmed - see above]
>>
>> In theory, yes, the guest *shouldn't* do this. But what if the guest OS makes a mistake? Or, what if the ioreq server makes a mistake and places a watch on a page that *isn't* allocated by the device driver, or forgets to change a page type back to ram when the device driver frees it back to the guest kernel?
>
> Then the lazy p2m change code will be triggered, and this page is reset to p2m_ram_rw before being set to p2m_invalid, just like on the normal path. Will this be a problem?

No, I'm talking about before the ioreq server detaches.

Scenario 1: Bug in driver
1. Guest driver allocates page A
2. dm marks A as p2m_ioreq_server
3. Guest driver accidentally frees A to the kernel
4. Guest kernel balloons out page A; ioreq.entry_count is wrong

Scenario 2: Bug in the kernel
1. Guest driver allocates page A
2. dm marks A as p2m_ioreq_server
3. Guest kernel tries to balloon out page B, but makes a calculation mistake and balloons out A instead; now ioreq.entry_count is wrong

Scenario 3: Off-by-one bug in devicemodel
1. Guest driver allocates pages A-D
2. dm makes a mistake and marks pages A-E as p2m_ioreq_server (one extra page)
3. Guest kernel balloons out page E; now ioreq.entry_count is wrong

Scenario 4: "Leak" in devicemodel
1. Guest driver allocates page A
2. dm marks A as p2m_ioreq_server
3. Guest driver is done with page A, but DM forgets to reset it to p2m_ram_rw
4. Guest driver frees A to guest kernel
5. Guest kernel balloons out page A; now ioreq.entry_count is wrong

I could keep going; there are *lots* of bugs in the driver, the kernel, or the devicemodel which could cause pages marked p2m_ioreq_server to end up being ballooned out, which under the current code would make ioreq.entry_count wrong. It's the hypervisor's job to do the right thing even when other components have bugs in them.

This is why I initially suggested keeping the count in atomic_write_ept_entry() -- no matter how the entry is changed, we always know exactly how many entries of type p2m_ioreq_server we have.

-George
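[Editorial note: a rough sketch of what George's suggestion might look like. This is purely illustrative and not code from the series - the real atomic_write_ept_entry() takes no p2m argument (the interface change Yu objects to below), only covers EPT, and a real version would also have to consider entry validity and the foreign-page handling elided here:]

    /*
     * Hypothetical variant of atomic_write_ept_entry() that maintains
     * p2m->ioreq.entry_count at the single point through which every
     * EPT entry write passes. Assumes a new p2m parameter.
     */
    static int atomic_write_ept_entry(struct p2m_domain *p2m,
                                      ept_entry_t *entryptr,
                                      ept_entry_t new, int level)
    {
        if ( level == 0 )  /* p2m_ioreq_server entries are 4K leaves */
        {
            bool_t was_ioreq = entryptr->sa_p2mt == p2m_ioreq_server;
            bool_t is_ioreq  = new.sa_p2mt == p2m_ioreq_server;

            if ( was_ioreq && !is_ioreq )
            {
                ASSERT(p2m->ioreq.entry_count > 0);
                p2m->ioreq.entry_count--;
            }
            else if ( !was_ioreq && is_ioreq )
                p2m->ioreq.entry_count++;
        }

        /* The actual entry write, as in the real function. */
        write_atomic(&entryptr->epte, new.epte);

        return 0;
    }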
On 4/6/2017 1:01 AM, George Dunlap wrote:
> On 05/04/17 17:32, Yu Zhang wrote:
>> [earlier exchange trimmed - see above]
>
> No, I'm talking about before the ioreq server detaches.

Sorry, I do not get it. Take scenario 1 for example:

> Scenario 1: Bug in driver
> 1. Guest driver allocates page A
> 2. dm marks A as p2m_ioreq_server

Here, in step 2, the ioreq.entry_count increases;

> 3. Guest driver accidentally frees A to the kernel
> 4. Guest kernel balloons out page A; ioreq.entry_count is wrong

Here, in step 4, the ioreq.entry_count decreases. Isn't this what we are expecting?

Yu

> [scenarios 2-4 trimmed - quoted in full above]
>
> I could keep going; there are *lots* of bugs in the driver, the kernel, or the devicemodel which could cause pages marked p2m_ioreq_server to end up being ballooned out, which under the current code would make ioreq.entry_count wrong.
>
> It's the hypervisor's job to do the right thing even when other components have bugs in them. This is why I initially suggested keeping the count in atomic_write_ept_entry() -- no matter how the entry is changed, we always know exactly how many entries of type p2m_ioreq_server we have.
>
> -George
On 4/6/2017 1:18 AM, Yu Zhang wrote:
> On 4/6/2017 1:01 AM, George Dunlap wrote:
>> [scenarios trimmed - quoted in full above]
>
> Here, in step 2, the ioreq.entry_count increases;
> [...]
> Here, in step 4, the ioreq.entry_count decreases.

Oh, I figured it out: this entry is not invalidated yet if the ioreq server has not been unmapped. Sorry.

> Isn't this what we are expecting?
>
> Yu

>> It's the hypervisor's job to do the right thing even when other components have bugs in them. This is why I initially suggested keeping the count in atomic_write_ept_entry() -- no matter how the entry is changed, we always know exactly how many entries of type p2m_ioreq_server we have.

Well, counting in atomic_write_ept_entry() only works for EPT. Besides, it requires interface changes - we would need to pass in the p2m. Another thought: in XenGT, PoD is already disabled to make sure the gfn->mfn mapping does not change. So how about we disable ballooning if ioreq.entry_count is not 0?

Yu
On 05/04/17 18:18, Yu Zhang wrote:
> On 4/6/2017 1:01 AM, George Dunlap wrote:
>> [scenarios trimmed - quoted in full above]
>
> Sorry, I do not get it. Take scenario 1 for example:
>
>> Scenario 1: Bug in driver
>> 1. Guest driver allocates page A
>> 2. dm marks A as p2m_ioreq_server
>
> Here, in step 2, the ioreq.entry_count increases;
>
>> 3. Guest driver accidentally frees A to the kernel
>> 4. Guest kernel balloons out page A; ioreq.entry_count is wrong
>
> Here, in step 4, the ioreq.entry_count decreases. Isn't this what we are expecting?

So step 4 happens via xen/common/memory.c:decrease_reservation(). Where along that path will ioreq.entry_count be decreased for the freed page? Maybe it is, but I didn't see it when I looked.

As far as I can tell, after step 4, ioreq.entry_count would still be 1, but the actual number of p2m_ioreq_server entries would be 0.

-George
On 4/6/2017 1:28 AM, Yu Zhang wrote:
> On 4/6/2017 1:18 AM, Yu Zhang wrote:
>> [earlier exchange trimmed - see above]
>
> Oh, I figured it out: this entry is not invalidated yet if the ioreq server has not been unmapped. Sorry.
>
>>> It's the hypervisor's job to do the right thing even when other components have bugs in them. This is why I initially suggested keeping the count in atomic_write_ept_entry() -- no matter how the entry is changed, we always know exactly how many entries of type p2m_ioreq_server we have.
>
> Well, counting in atomic_write_ept_entry() only works for EPT. Besides, it requires interface changes - we would need to pass in the p2m. Another thought: in XenGT, PoD is already disabled to make sure the gfn->mfn mapping does not change. So how about we disable ballooning if ioreq.entry_count is not 0?

Or maybe just change p2m_ioreq_server back to p2m_ram_rw before the entry is set to p2m_invalid? Like the code below:

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 7dbddda..40e5f63 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -288,6 +288,10 @@ int guest_remove_page(struct domain *d, unsigned long gmfn)
         put_gfn(d, gmfn);
         return 1;
     }
+    if ( unlikely(p2mt == p2m_ioreq_server) )
+        p2m_change_type_one(d, gmfn,
+                            p2m_ioreq_server, p2m_ram_rw);
+
 #else
     mfn = gfn_to_mfn(d, _gfn(gmfn));
 #endif

Yu
>>> On 05.04.17 at 19:28, <yu.c.zhang@linux.intel.com> wrote:
> Well, counting in atomic_write_ept_entry() only works for EPT. Besides, it requires interface changes - we would need to pass in the p2m. Another thought: in XenGT, PoD is already disabled to make sure the gfn->mfn mapping does not change. So how about we disable ballooning if ioreq.entry_count is not 0?

You can't disable ballooning - the guest is free to make decisions regarding what memory to return to the hypervisor all by itself. Tool stack directions (via xenstore) are merely a hint for well-behaved guests.

Jan
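[Editorial note: to illustrate Jan's point, a guest returns memory with a direct hypercall, entirely outside tool-stack control. A minimal sketch of what a Linux-style balloon driver does; field names follow the public XENMEM interface, error handling elided:]

    #include <xen/memory.h>  /* struct xen_memory_reservation, XENMEM_* */

    /* Hand one 4K frame back to the hypervisor; nothing dom0 or the
     * device model does can veto this. */
    static void balloon_out_one_page(xen_pfn_t gmfn)
    {
        struct xen_memory_reservation reservation = {
            .nr_extents   = 1,
            .extent_order = 0,
            .domid        = DOMID_SELF,
        };

        set_xen_guest_handle(reservation.extent_start, &gmfn);

        /* This ends up in decrease_reservation() -> guest_remove_page()
         * in xen/common/memory.c - the path discussed above. */
        HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation);
    }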
diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index 5bf3b6d..07a6c26 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -955,6 +955,14 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
 
     spin_unlock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
 
+    if ( rc == 0 && flags == 0 )
+    {
+        struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+        if ( read_atomic(&p2m->ioreq.entry_count) )
+            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
+    }
+
     return rc;
 }
 
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index a57b385..6ec950a 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -187,6 +187,15 @@ out:
  */
 static int hap_enable_log_dirty(struct domain *d, bool_t log_global)
 {
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    /*
+     * Refuse to turn on global log-dirty mode if
+     * there's outstanding p2m_ioreq_server pages.
+     */
+    if ( log_global && read_atomic(&p2m->ioreq.entry_count) )
+        return -EBUSY;
+
     /* turn on PG_log_dirty bit in paging mode */
     paging_lock(d);
     d->arch.paging.mode |= PG_log_dirty;
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index cc1eb21..62e9ddf 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -544,6 +544,12 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
                     e.ipat = ipat;
                     if ( e.recalc && p2m_is_changeable(e.sa_p2mt) )
                     {
+                         if ( e.sa_p2mt == p2m_ioreq_server )
+                         {
+                             ASSERT(p2m->ioreq.entry_count > 0);
+                             p2m->ioreq.entry_count--;
+                         }
+
                          e.sa_p2mt = p2m_is_logdirty_range(p2m, gfn + i, gfn + i)
                                      ? p2m_ram_logdirty : p2m_ram_rw;
                          ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
@@ -965,7 +971,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
     if ( is_epte_valid(ept_entry) )
     {
         if ( (recalc || ept_entry->recalc) &&
-             p2m_is_changeable(ept_entry->sa_p2mt) )
+             p2m_check_changeable(ept_entry->sa_p2mt) )
             *t = p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
                                                       : p2m_ram_rw;
         else
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index c0055f3..0a70dc5 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -436,11 +436,13 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
          needs_recalc(l1, *pent) )
     {
         l1_pgentry_t e = *pent;
+        p2m_type_t p2mt_old;
 
         if ( !valid_recalc(l1, e) )
             P2M_DEBUG("bogus recalc leaf at d%d:%lx:%u\n",
                       p2m->domain->domain_id, gfn, level);
-        if ( p2m_is_changeable(p2m_flags_to_type(l1e_get_flags(e))) )
+        p2mt_old = p2m_flags_to_type(l1e_get_flags(e));
+        if ( p2m_is_changeable(p2mt_old) )
         {
             unsigned long mask = ~0UL << (level * PAGETABLE_ORDER);
             p2m_type_t p2mt = p2m_is_logdirty_range(p2m, gfn & mask, gfn | ~mask)
@@ -460,6 +462,13 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
                      mfn &= ~((unsigned long)_PAGE_PSE_PAT >> PAGE_SHIFT);
                 flags |= _PAGE_PSE;
             }
+
+            if ( p2mt_old == p2m_ioreq_server )
+            {
+                ASSERT(p2m->ioreq.entry_count > 0);
+                p2m->ioreq.entry_count--;
+            }
+
             e = l1e_from_pfn(mfn, flags);
             p2m_add_iommu_flags(&e, level,
                                 (p2mt == p2m_ram_rw)
@@ -729,7 +738,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
 static inline p2m_type_t recalc_type(bool_t recalc, p2m_type_t t,
                                      struct p2m_domain *p2m, unsigned long gfn)
 {
-    if ( !recalc || !p2m_is_changeable(t) )
+    if ( !recalc || !p2m_check_changeable(t) )
         return t;
     return p2m_is_logdirty_range(p2m, gfn, gfn) ? p2m_ram_logdirty
                                                 : p2m_ram_rw;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index b84add0..7a1bdd8 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -317,6 +317,15 @@ int p2m_set_ioreq_server(struct domain *d,
         if ( p2m->ioreq.server != NULL )
             goto out;
 
+        /*
+         * It is possible that an ioreq server has just been unmapped,
+         * released the spin lock, with some p2m_ioreq_server entries
+         * in p2m table remained. We shall refuse another ioreq server
+         * mapping request in such case.
+         */
+        if ( read_atomic(&p2m->ioreq.entry_count) )
+            goto out;
+
         p2m->ioreq.server = s;
         p2m->ioreq.flags = flags;
     }
@@ -938,6 +947,26 @@ int p2m_change_type_one(struct domain *d, unsigned long gfn,
                          p2m->default_access)
          : -EBUSY;
 
+    if ( !rc )
+    {
+        switch ( nt )
+        {
+        case p2m_ram_rw:
+            if ( ot == p2m_ioreq_server )
+            {
+                ASSERT(p2m->ioreq.entry_count > 0);
+                p2m->ioreq.entry_count--;
+            }
+            break;
+        case p2m_ioreq_server:
+            ASSERT(ot == p2m_ram_rw);
+            p2m->ioreq.entry_count++;
+            break;
+        default:
+            break;
+        }
+    }
+
     gfn_unlock(p2m, gfn, 0);
 
     return rc;
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 4521620..e7e390d 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -120,7 +120,10 @@ typedef unsigned int p2m_query_t;
 
 /* Types that can be subject to bulk transitions. */
 #define P2M_CHANGEABLE_TYPES (p2m_to_mask(p2m_ram_rw) \
-                              | p2m_to_mask(p2m_ram_logdirty) )
+                              | p2m_to_mask(p2m_ram_logdirty) \
+                              | p2m_to_mask(p2m_ioreq_server) )
+
+#define P2M_IOREQ_TYPES (p2m_to_mask(p2m_ioreq_server))
 
 #define P2M_POD_TYPES (p2m_to_mask(p2m_populate_on_demand))
 
@@ -157,6 +160,7 @@ typedef unsigned int p2m_query_t;
 #define p2m_is_readonly(_t) (p2m_to_mask(_t) & P2M_RO_TYPES)
 #define p2m_is_discard_write(_t) (p2m_to_mask(_t) & P2M_DISCARD_WRITE_TYPES)
 #define p2m_is_changeable(_t) (p2m_to_mask(_t) & P2M_CHANGEABLE_TYPES)
+#define p2m_is_ioreq(_t) (p2m_to_mask(_t) & P2M_IOREQ_TYPES)
 #define p2m_is_pod(_t) (p2m_to_mask(_t) & P2M_POD_TYPES)
 #define p2m_is_grant(_t) (p2m_to_mask(_t) & P2M_GRANT_TYPES)
 /* Grant types are *not* considered valid, because they can be
@@ -178,6 +182,8 @@ typedef unsigned int p2m_query_t;
 
 #define p2m_allows_invalid_mfn(t) (p2m_to_mask(t) & P2M_INVALID_MFN_TYPES)
 
+#define p2m_check_changeable(t) (p2m_is_changeable(t) && !p2m_is_ioreq(t))
+
 typedef enum {
     p2m_host,
     p2m_nested,
@@ -349,6 +355,7 @@ struct p2m_domain {
              * are to be emulated by an ioreq server.
              */
             unsigned int flags;
+            unsigned long entry_count;
         } ioreq;
     };
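[Editorial note: for completeness, a sketch of the device-model side that drives the machinery above. The wrapper names follow the libxendevicemodel interface this series targets; treat the exact names and flag constants as assumptions, and error handling is elided:]

    #include <xendevicemodel.h>

    /* Sketch: write-protect one guest page for emulation, then detach. */
    static void emulate_page_writes(domid_t domid, ioservid_t id,
                                    uint64_t gfn)
    {
        xendevicemodel_handle *dmod = xendevicemodel_open(NULL, 0);

        /* Claim the p2m_ioreq_server type: guest writes to pages of
         * this type are forwarded to ioreq server 'id'. */
        xendevicemodel_map_mem_type_to_ioreq_server(dmod, domid, id,
            HVMMEM_ioreq_server, XEN_DMOP_IOREQ_MEM_ACCESS_WRITE);

        /* p2m_ram_rw -> p2m_ioreq_server: this is where the hypervisor
         * increments p2m->ioreq.entry_count (p2m_change_type_one()). */
        xendevicemodel_set_mem_type(dmod, domid, HVMMEM_ioreq_server,
                                    gfn, 1);

        /* ... trap and emulate guest writes ... */

        /* Detach (flags == 0). Per the ioreq.c hunk above, outstanding
         * entries are swept back to p2m_ram_rw asynchronously via
         * p2m_change_entry_type_global(); entry_count drains to zero
         * as the recalc logic visits each entry. */
        xendevicemodel_map_mem_type_to_ioreq_server(dmod, domid, id,
            HVMMEM_ioreq_server, 0);

        xendevicemodel_close(dmod);
    }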