Message ID | f7ed53c1-768c-cc71-a432-553b56f7f0a7@suse.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | x86/PV: make post-migration page state consistent | expand |
On 11/09/2020 11:34, Jan Beulich wrote: > When a page table page gets de-validated, its type reference count drops > to zero (and PGT_validated gets cleared), but its type remains intact. > XEN_DOMCTL_getpageframeinfo3, therefore, so far reported prior usage for > such pages. An intermediate write to such a page via e.g. > MMU_NORMAL_PT_UPDATE, however, would transition the page's type to > PGT_writable_page, thus altering what XEN_DOMCTL_getpageframeinfo3 would > return. In libxc the decision which pages to normalize / localize > depends solely on the type returned from the domctl. As a result without > further precautions the guest won't be able to tell whether such a page > has had its (apparent) PTE entries transitioned to the new MFNs. I'm afraid I don't follow what the problem is. Yes - unvalidated pages probably ought to be consistently NOTAB, so this is probably a good change, but I don't see how it impacts the migration logic. We already have to cope with a page really changing types in parallel with the normalise/localise logic (that was a "fun" one to debug), which is why errors in that logic are specifically not fatal while the guest is live - the frame gets re-marked as dirty, and deferred until the next round. Errors encountered after the VM has been paused are fatal. However, at no point, even with an unvalidated pagetable type, can the contents of the page be anything other than legal PTEs. (I think) ~Andrew
On 11.09.2020 13:55, Andrew Cooper wrote: > On 11/09/2020 11:34, Jan Beulich wrote: >> When a page table page gets de-validated, its type reference count drops >> to zero (and PGT_validated gets cleared), but its type remains intact. >> XEN_DOMCTL_getpageframeinfo3, therefore, so far reported prior usage for >> such pages. An intermediate write to such a page via e.g. >> MMU_NORMAL_PT_UPDATE, however, would transition the page's type to >> PGT_writable_page, thus altering what XEN_DOMCTL_getpageframeinfo3 would >> return. In libxc the decision which pages to normalize / localize >> depends solely on the type returned from the domctl. As a result without >> further precautions the guest won't be able to tell whether such a page >> has had its (apparent) PTE entries transitioned to the new MFNs. > > I'm afraid I don't follow what the problem is. > > Yes - unvalidated pages probably ought to be consistently NOTAB, so this > is probably a good change, but I don't see how it impacts the migration > logic. It's not the migration logic itself that's impacted, but the state of guest pages after migration. I'm afraid I can only try to expand on the original description. Case 1: Once an Ln page has been unvalidated, due to the described behavior the migration code in libxc will normalize and then localize it. Therefore the guest could go and directly try to use it as a page table again. This should work as long as all of the entries in the page can still be successfully validated (i.e. unless the guest itself has made changes to the state of other pages). Case 2: Once an Ln page has been unvalidated, the guest for whatever reason still writes to it through e.g. MMU_NORMAL_PT_UPDATE. Prior to migration, and provided the new entry can be validated (and no other reference page has changed state), the page can still be converted to a proper page table one again. If, however, migration occurs inbetween, the page now won't get normalized and then localized. The MFNs in it are unlikely to make sense anymore, and hence an attempt to make the page a page table again is likely to fail (or if it doesn't fail the result is unlikely to be what's intended). Since there's no way to make case 2 "work", the only choice is to make case 1 behave like case 2, in order for the behavior to be predictable / consistent. > We already have to cope with a page really changing types in parallel > with the normalise/localise logic (that was a "fun" one to debug), which > is why errors in that logic are specifically not fatal while the guest > is live - the frame gets re-marked as dirty, and deferred until the next > round. > > Errors encountered after the VM has been paused are fatal. > > However, at no point, even with an unvalidated pagetable type, can the > contents of the page be anything other than legal PTEs. (I think) Correct, because in order to write to the page one has to either make it a page table one again (and then write through hypercall or for L1 through PTWR) or the mmu-normal-pt-update would first convert the page to a writable one. Jan
On 11.09.2020 14:37, Jan Beulich wrote: > On 11.09.2020 13:55, Andrew Cooper wrote: >> On 11/09/2020 11:34, Jan Beulich wrote: >>> When a page table page gets de-validated, its type reference count drops >>> to zero (and PGT_validated gets cleared), but its type remains intact. >>> XEN_DOMCTL_getpageframeinfo3, therefore, so far reported prior usage for >>> such pages. An intermediate write to such a page via e.g. >>> MMU_NORMAL_PT_UPDATE, however, would transition the page's type to >>> PGT_writable_page, thus altering what XEN_DOMCTL_getpageframeinfo3 would >>> return. In libxc the decision which pages to normalize / localize >>> depends solely on the type returned from the domctl. As a result without >>> further precautions the guest won't be able to tell whether such a page >>> has had its (apparent) PTE entries transitioned to the new MFNs. >> >> I'm afraid I don't follow what the problem is. >> >> Yes - unvalidated pages probably ought to be consistently NOTAB, so this >> is probably a good change, but I don't see how it impacts the migration >> logic. > > It's not the migration logic itself that's impacted, but the state > of guest pages after migration. I'm afraid I can only try to expand > on the original description. > > Case 1: Once an Ln page has been unvalidated, due to the described > behavior the migration code in libxc will normalize and then localize > it. Therefore the guest could go and directly try to use it as a > page table again. This should work as long as all of the entries in > the page can still be successfully validated (i.e. unless the guest > itself has made changes to the state of other pages). > > Case 2: Once an Ln page has been unvalidated, the guest for whatever > reason still writes to it through e.g. MMU_NORMAL_PT_UPDATE. Prior > to migration, and provided the new entry can be validated (and no > other reference page has changed state), the page can still be > converted to a proper page table one again. If, however, migration > occurs inbetween, the page now won't get normalized and then > localized. The MFNs in it are unlikely to make sense anymore, and > hence an attempt to make the page a page table again is likely to > fail (or if it doesn't fail the result is unlikely to be what's > intended). > > Since there's no way to make case 2 "work", the only choice is to > make case 1 behave like case 2, in order for the behavior to be > predictable / consistent. > >> We already have to cope with a page really changing types in parallel >> with the normalise/localise logic (that was a "fun" one to debug), which >> is why errors in that logic are specifically not fatal while the guest >> is live - the frame gets re-marked as dirty, and deferred until the next >> round. >> >> Errors encountered after the VM has been paused are fatal. >> >> However, at no point, even with an unvalidated pagetable type, can the >> contents of the page be anything other than legal PTEs. (I think) > > Correct, because in order to write to the page one has to either > make it a page table one again (and then write through hypercall > or for L1 through PTWR) or the mmu-normal-pt-update would first > convert the page to a writable one. Besides wanting to ping this change / discussion, I'd also like to correct myself on this last part of the reply: The above applies to pages after having got de-validated. However, prior to validation pages have their type changed early (type ref count remains zero), but at this point it can't be told yet whether the page consists of all legal PTEs. Jan
On 11.09.2020 12:34, Jan Beulich wrote: > When a page table page gets de-validated, its type reference count drops > to zero (and PGT_validated gets cleared), but its type remains intact. > XEN_DOMCTL_getpageframeinfo3, therefore, so far reported prior usage for > such pages. An intermediate write to such a page via e.g. > MMU_NORMAL_PT_UPDATE, however, would transition the page's type to > PGT_writable_page, thus altering what XEN_DOMCTL_getpageframeinfo3 would > return. In libxc the decision which pages to normalize / localize > depends solely on the type returned from the domctl. As a result without > further precautions the guest won't be able to tell whether such a page > has had its (apparent) PTE entries transitioned to the new MFNs. > > Add a check of PGT_validated, thus consistently avoiding normalization / > localization in the tool stack. > > Alongside using XEN_DOMCTL_PFINFO_NOTAB instead of plain zero for the > change at hand, also change the variable's initializer to use this > constant, too. Take the opportunity and also adjust its type. > > Signed-off-by: Jan Beulich <jbeulich@suse.com> I think I did address all questions here. Jan > --- a/xen/arch/x86/domctl.c > +++ b/xen/arch/x86/domctl.c > @@ -215,7 +215,8 @@ long arch_do_domctl( > > for ( i = 0; i < num; ++i ) > { > - unsigned long gfn = 0, type = 0; > + unsigned long gfn = 0; > + unsigned int type = XEN_DOMCTL_PFINFO_NOTAB; > struct page_info *page; > p2m_type_t t; > > @@ -255,6 +256,8 @@ long arch_do_domctl( > > if ( page->u.inuse.type_info & PGT_pinned ) > type |= XEN_DOMCTL_PFINFO_LPINTAB; > + else if ( !(page->u.inuse.type_info & PGT_validated) ) > + type = XEN_DOMCTL_PFINFO_NOTAB; > > if ( page->count_info & PGC_broken ) > type = XEN_DOMCTL_PFINFO_BROKEN; >
On 11.09.2020 12:34, Jan Beulich wrote: > When a page table page gets de-validated, its type reference count drops > to zero (and PGT_validated gets cleared), but its type remains intact. > XEN_DOMCTL_getpageframeinfo3, therefore, so far reported prior usage for > such pages. An intermediate write to such a page via e.g. > MMU_NORMAL_PT_UPDATE, however, would transition the page's type to > PGT_writable_page, thus altering what XEN_DOMCTL_getpageframeinfo3 would > return. In libxc the decision which pages to normalize / localize > depends solely on the type returned from the domctl. As a result without > further precautions the guest won't be able to tell whether such a page > has had its (apparent) PTE entries transitioned to the new MFNs. > > Add a check of PGT_validated, thus consistently avoiding normalization / > localization in the tool stack. > > Alongside using XEN_DOMCTL_PFINFO_NOTAB instead of plain zero for the > change at hand, also change the variable's initializer to use this > constant, too. Take the opportunity and also adjust its type. > > Signed-off-by: Jan Beulich <jbeulich@suse.com> I think I did address all questions here. Jan > --- a/xen/arch/x86/domctl.c > +++ b/xen/arch/x86/domctl.c > @@ -215,7 +215,8 @@ long arch_do_domctl( > > for ( i = 0; i < num; ++i ) > { > - unsigned long gfn = 0, type = 0; > + unsigned long gfn = 0; > + unsigned int type = XEN_DOMCTL_PFINFO_NOTAB; > struct page_info *page; > p2m_type_t t; > > @@ -255,6 +256,8 @@ long arch_do_domctl( > > if ( page->u.inuse.type_info & PGT_pinned ) > type |= XEN_DOMCTL_PFINFO_LPINTAB; > + else if ( !(page->u.inuse.type_info & PGT_validated) ) > + type = XEN_DOMCTL_PFINFO_NOTAB; > > if ( page->count_info & PGC_broken ) > type = XEN_DOMCTL_PFINFO_BROKEN; >
--- a/xen/arch/x86/domctl.c +++ b/xen/arch/x86/domctl.c @@ -215,7 +215,8 @@ long arch_do_domctl( for ( i = 0; i < num; ++i ) { - unsigned long gfn = 0, type = 0; + unsigned long gfn = 0; + unsigned int type = XEN_DOMCTL_PFINFO_NOTAB; struct page_info *page; p2m_type_t t; @@ -255,6 +256,8 @@ long arch_do_domctl( if ( page->u.inuse.type_info & PGT_pinned ) type |= XEN_DOMCTL_PFINFO_LPINTAB; + else if ( !(page->u.inuse.type_info & PGT_validated) ) + type = XEN_DOMCTL_PFINFO_NOTAB; if ( page->count_info & PGC_broken ) type = XEN_DOMCTL_PFINFO_BROKEN;
When a page table page gets de-validated, its type reference count drops to zero (and PGT_validated gets cleared), but its type remains intact. XEN_DOMCTL_getpageframeinfo3, therefore, so far reported prior usage for such pages. An intermediate write to such a page via e.g. MMU_NORMAL_PT_UPDATE, however, would transition the page's type to PGT_writable_page, thus altering what XEN_DOMCTL_getpageframeinfo3 would return. In libxc the decision which pages to normalize / localize depends solely on the type returned from the domctl. As a result without further precautions the guest won't be able to tell whether such a page has had its (apparent) PTE entries transitioned to the new MFNs. Add a check of PGT_validated, thus consistently avoiding normalization / localization in the tool stack. Alongside using XEN_DOMCTL_PFINFO_NOTAB instead of plain zero for the change at hand, also change the variable's initializer to use this constant, too. Take the opportunity and also adjust its type. Signed-off-by: Jan Beulich <jbeulich@suse.com>