Message ID | 20190502221345.18459-2-tamas@tklengyel.com (mailing list archive)
---|---
State | Superseded
Series | [v4,1/4] x86/mem_sharing: reorder when pages are unlocked and released
>>> On 03.05.19 at 00:13, <tamas@tklengyel.com> wrote:
> --- a/xen/arch/x86/mm/mem_sharing.c
> +++ b/xen/arch/x86/mm/mem_sharing.c
> @@ -112,13 +112,48 @@ static inline void page_sharing_dispose(struct page_info *page)
>
>  #endif /* MEM_SHARING_AUDIT */
>
> -static inline int mem_sharing_page_lock(struct page_info *pg)
> +/*
> + * Private implementations of page_lock/unlock to bypass PV-only
> + * sanity checks not applicable to mem-sharing.
> + */
> +static inline bool _page_lock(struct page_info *page)
>  {
> -    int rc;
> +    unsigned long x, nx;
> +
> +    do {
> +        while ( (x = page->u.inuse.type_info) & PGT_locked )
> +            cpu_relax();
> +        nx = x + (1 | PGT_locked);
> +        if ( !(x & PGT_validated) ||
> +             !(x & PGT_count_mask) ||
> +             !(nx & PGT_count_mask) )
> +            return false;

Just for my own understanding: Did you verify that the PGT_validated
check is indeed needed here, or did you copy it "just in case"? In the
latter case a comment may be worthwhile.

Furthermore, are there any mem-sharing specific checks reasonable
to do here in place of the PV ones you want to avoid? Like pages
making it here only ever being of PGT_shared_page type?

> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -356,24 +356,12 @@ struct platform_bad_page {
> >  const struct platform_bad_page *get_platform_badpages(unsigned int *array_size);
>
>  /* Per page locks:
> - * page_lock() is used for two purposes: pte serialization, and memory sharing.
> + * page_lock() is used for pte serialization.
>   *
>   * All users of page lock for pte serialization live in mm.c, use it
>   * to lock a page table page during pte updates, do not take other locks within
>   * the critical section delimited by page_lock/unlock, and perform no
>   * nesting.
> - *
> - * All users of page lock for memory sharing live in mm/mem_sharing.c. Page_lock
> - * is used in memory sharing to protect addition (share) and removal (unshare)
> - * of (gfn,domain) tupples to a list of gfn's that the shared page is currently
> - * backing. Nesting may happen when sharing (and locking) two pages -- deadlock
> - * is avoided by locking pages in increasing order.
> - * All memory sharing code paths take the p2m lock of the affected gfn before
> - * taking the lock for the underlying page. We enforce ordering between page_lock
> - * and p2m_lock using an mm-locks.h construct.
> - *
> - * These two users (pte serialization and memory sharing) do not collide, since
> - * sharing is only supported for hvm guests, which do not perform pv pte updates.
>   */
>  int page_lock(struct page_info *page);
>  void page_unlock(struct page_info *page);

I think it would be helpful to retain (in a slightly adjusted form) the last
sentence of the comment above, to clarify that the PGT_locked uses
are now what does not end up colliding. At this occasion "which do not
perform pv pte updates" would also better be re-worded to e.g.
"which do not have PV PTEs updated" (as PVH Dom0 is very much
expected to issue PV page table operations for PV DomU-s).

Jan
On Fri, May 3, 2019 at 2:27 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 03.05.19 at 00:13, <tamas@tklengyel.com> wrote:
> > --- a/xen/arch/x86/mm/mem_sharing.c
> > +++ b/xen/arch/x86/mm/mem_sharing.c
> > @@ -112,13 +112,48 @@ static inline void page_sharing_dispose(struct page_info *page)
> >
> >  #endif /* MEM_SHARING_AUDIT */
> >
> > -static inline int mem_sharing_page_lock(struct page_info *pg)
> > +/*
> > + * Private implementations of page_lock/unlock to bypass PV-only
> > + * sanity checks not applicable to mem-sharing.
> > + */
> > +static inline bool _page_lock(struct page_info *page)
> >  {
> > -    int rc;
> > +    unsigned long x, nx;
> > +
> > +    do {
> > +        while ( (x = page->u.inuse.type_info) & PGT_locked )
> > +            cpu_relax();
> > +        nx = x + (1 | PGT_locked);
> > +        if ( !(x & PGT_validated) ||
> > +             !(x & PGT_count_mask) ||
> > +             !(nx & PGT_count_mask) )
> > +            return false;
>
> Just for my own understanding: Did you verify that the PGT_validated
> check is indeed needed here, or did you copy it "just in case"? In the
> latter case a comment may be worthwhile.

This is an exact copy of page_lock, sans the asserts that break it
from mem_sharing. I didn't investigate which of these flags are
necessary for mem_sharing. Frankly, I don't fully understand their
meaning and I haven't come across documentation about it yet. I can
certainly add a comment saying "TODO: figure out which of these flags
are actually needed".

> Furthermore, are there any mem-sharing specific checks reasonable
> to do here in place of the PV ones you want to avoid? Like pages
> making it here only ever being of PGT_shared_page type?

There are checks already in place after the lock is taken where
necessary. Those checks can't be moved in here because they don't
apply to all cases uniformly - for example, we also take the lock
when the page type is being converted between shared and not shared.

> > --- a/xen/include/asm-x86/mm.h
> > +++ b/xen/include/asm-x86/mm.h
> > @@ -356,24 +356,12 @@ struct platform_bad_page {
> >  const struct platform_bad_page *get_platform_badpages(unsigned int *array_size);
> >
> >  /* Per page locks:
> > - * page_lock() is used for two purposes: pte serialization, and memory sharing.
> > + * page_lock() is used for pte serialization.
> >   *
> >   * All users of page lock for pte serialization live in mm.c, use it
> >   * to lock a page table page during pte updates, do not take other locks within
> >   * the critical section delimited by page_lock/unlock, and perform no
> >   * nesting.
> > - *
> > - * All users of page lock for memory sharing live in mm/mem_sharing.c. Page_lock
> > - * is used in memory sharing to protect addition (share) and removal (unshare)
> > - * of (gfn,domain) tupples to a list of gfn's that the shared page is currently
> > - * backing. Nesting may happen when sharing (and locking) two pages -- deadlock
> > - * is avoided by locking pages in increasing order.
> > - * All memory sharing code paths take the p2m lock of the affected gfn before
> > - * taking the lock for the underlying page. We enforce ordering between page_lock
> > - * and p2m_lock using an mm-locks.h construct.
> > - *
> > - * These two users (pte serialization and memory sharing) do not collide, since
> > - * sharing is only supported for hvm guests, which do not perform pv pte updates.
> >   */
> >  int page_lock(struct page_info *page);
> >  void page_unlock(struct page_info *page);
>
> I think it would be helpful to retain (in a slightly adjusted form) the last
> sentence of the comment above, to clarify that the PGT_locked uses
> are now what does not end up colliding. At this occasion "which do not
> perform pv pte updates" would also better be re-worded to e.g.
> "which do not have PV PTEs updated" (as PVH Dom0 is very much
> expected to issue PV page table operations for PV DomU-s).

Sure.

Thanks,
Tamas
>>> On 03.05.19 at 15:43, <tamas@tklengyel.com> wrote:
> On Fri, May 3, 2019 at 2:27 AM Jan Beulich <JBeulich@suse.com> wrote:
>>
>> >>> On 03.05.19 at 00:13, <tamas@tklengyel.com> wrote:
>> > --- a/xen/arch/x86/mm/mem_sharing.c
>> > +++ b/xen/arch/x86/mm/mem_sharing.c
>> > @@ -112,13 +112,48 @@ static inline void page_sharing_dispose(struct page_info *page)
>> >
>> >  #endif /* MEM_SHARING_AUDIT */
>> >
>> > -static inline int mem_sharing_page_lock(struct page_info *pg)
>> > +/*
>> > + * Private implementations of page_lock/unlock to bypass PV-only
>> > + * sanity checks not applicable to mem-sharing.
>> > + */
>> > +static inline bool _page_lock(struct page_info *page)
>> >  {
>> > -    int rc;
>> > +    unsigned long x, nx;
>> > +
>> > +    do {
>> > +        while ( (x = page->u.inuse.type_info) & PGT_locked )
>> > +            cpu_relax();
>> > +        nx = x + (1 | PGT_locked);
>> > +        if ( !(x & PGT_validated) ||
>> > +             !(x & PGT_count_mask) ||
>> > +             !(nx & PGT_count_mask) )
>> > +            return false;
>>
>> Just for my own understanding: Did you verify that the PGT_validated
>> check is indeed needed here, or did you copy it "just in case"? In the
>> latter case a comment may be worthwhile.
>
> This is an exact copy of page_lock, sans the asserts that break it
> from mem_sharing. I didn't investigate which of these flags are
> necessary for mem_sharing. Frankly, I don't fully understand their
> meaning and I haven't come across documentation about it yet. I can
> certainly add a comment saying "TODO: figure out which of these flags
> are actually needed".

Yes, something along these lines is what I'm after. But "these flags"
really is just PGT_validated. There's no question the PGT_locked and
PGT_count_mask operations need to remain as they are.

Jan
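The arithmetic Jan refers to is worth unpacking: because PGT_locked sits
just above PGT_count_mask in type_info, the single addition
x + (1 | PGT_locked) bumps the type count and sets the lock bit in one
atomic cmpxchg, and the !(nx & PGT_count_mask) test catches the count
overflowing into the lock bit. Below is a minimal standalone model of
this, using made-up bit positions rather than Xen's actual type_info
layout:

#include <assert.h>
#include <stdio.h>

/* Illustrative layout only: count in bits 0-15, lock bit at bit 16. */
#define DEMO_COUNT_MASK  0x0000fffful
#define DEMO_LOCKED      0x00010000ul

int main(void)
{
    unsigned long x = 3;                       /* count == 3, unlocked */
    unsigned long nx = x + (1 | DEMO_LOCKED);

    /* One addition bumps the count and sets the lock bit together. */
    assert((nx & DEMO_COUNT_MASK) == 4);
    assert(nx & DEMO_LOCKED);

    /* Unlock undoes both with a single subtraction. */
    nx -= (1 | DEMO_LOCKED);
    assert(nx == x);

    /*
     * With the count saturated, the same addition wraps the count to
     * zero -- exactly what the !(nx & PGT_count_mask) check rejects.
     */
    assert(((DEMO_COUNT_MASK + (1 | DEMO_LOCKED)) & DEMO_COUNT_MASK) == 0);

    printf("type_info arithmetic behaves as expected\n");
    return 0;
}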
On 5/3/19 2:57 PM, Jan Beulich wrote:
>>>> On 03.05.19 at 15:43, <tamas@tklengyel.com> wrote:
>> On Fri, May 3, 2019 at 2:27 AM Jan Beulich <JBeulich@suse.com> wrote:
>>>
>>>>>> On 03.05.19 at 00:13, <tamas@tklengyel.com> wrote:
>>>> --- a/xen/arch/x86/mm/mem_sharing.c
>>>> +++ b/xen/arch/x86/mm/mem_sharing.c
>>>> @@ -112,13 +112,48 @@ static inline void page_sharing_dispose(struct page_info *page)
>>>>
>>>>  #endif /* MEM_SHARING_AUDIT */
>>>>
>>>> -static inline int mem_sharing_page_lock(struct page_info *pg)
>>>> +/*
>>>> + * Private implementations of page_lock/unlock to bypass PV-only
>>>> + * sanity checks not applicable to mem-sharing.
>>>> + */
>>>> +static inline bool _page_lock(struct page_info *page)
>>>>  {
>>>> -    int rc;
>>>> +    unsigned long x, nx;
>>>> +
>>>> +    do {
>>>> +        while ( (x = page->u.inuse.type_info) & PGT_locked )
>>>> +            cpu_relax();
>>>> +        nx = x + (1 | PGT_locked);
>>>> +        if ( !(x & PGT_validated) ||
>>>> +             !(x & PGT_count_mask) ||
>>>> +             !(nx & PGT_count_mask) )
>>>> +            return false;
>>>
>>> Just for my own understanding: Did you verify that the PGT_validated
>>> check is indeed needed here, or did you copy it "just in case"? In the
>>> latter case a comment may be worthwhile.
>>
>> This is an exact copy of page_lock, sans the asserts that break it
>> from mem_sharing. I didn't investigate which of these flags are
>> necessary for mem_sharing. Frankly, I don't fully understand their
>> meaning and I haven't come across documentation about it yet.

I hope to add some documentation sometime in the next 6 months or so.
I've submitted a talk on the refcounting system to the XenSummit as well.

Short answer: PGT_validated means that the page in question has been
validated as safe to use as the designated type. For PGT_writable, this
is instantaneous (i.e., the type is set to PGT_writable with the
PGT_validated bit set in the same cmpxchg operation).

Other types (such as PGT_l*_table) actually take time to verify, and so
you need to first change the type to the new type (so nobody tries to
use it as yet a different type), then verify that it's OK to use it as a
page table (by recursively checking all the entries), then set the
PGT_validated bit.

> Yes, something along these lines is what I'm after. But "these flags"
> really is just PGT_validated.

Here's my take:

The sorts of "needs validation" types are PV-only, I believe; if
mem_sharing is only for HVM guests, then the only type involved should
be PGT_writable, for which PGT_validated should always be set.

*If* we somehow do get into a situation where the type count > 0 but
PGT_validated is clear, something has probably gone terribly wrong, and
it would probably be much safer to just fail the page lock.

But perhaps that implies there should also be an
ASSERT(x & PGT_validated)?

 -George
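George's two-phase description can be modelled in miniature. The toy
program below is not Xen code -- the T_* names, bit values, and the
validate_as_l1() stand-in are all invented for illustration -- but it
shows the shape of the protocol: writable types become usable in a
single compare-and-swap, while page-table types are claimed first,
validated, and only then marked with the validated bit.

#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define T_NONE        0x0u
#define T_WRITABLE    0x1u
#define T_L1_TABLE    0x2u
#define T_VALIDATED   0x8u

static _Atomic unsigned int type_info = T_NONE;

/* Stand-in for the recursive page-table walk; always succeeds here. */
static bool validate_as_l1(void) { return true; }

static bool get_type(unsigned int type)
{
    unsigned int x = T_NONE;

    if ( type == T_WRITABLE )
        /* No walk needed: type and validated bit are set in one CAS. */
        return atomic_compare_exchange_strong(&type_info, &x,
                                              T_WRITABLE | T_VALIDATED);

    /* Claim the type first, so nobody starts using it as another type... */
    if ( !atomic_compare_exchange_strong(&type_info, &x, type) )
        return false;

    /* ...then do the (slow) validation walk... */
    if ( !validate_as_l1() )
    {
        atomic_store(&type_info, T_NONE);
        return false;
    }

    /* ...and only then announce the page as safe to use as this type. */
    atomic_fetch_or(&type_info, T_VALIDATED);
    return true;
}

int main(void)
{
    /* Writable: type and validated bit appear atomically together. */
    assert(get_type(T_WRITABLE));
    assert(atomic_load(&type_info) == (T_WRITABLE | T_VALIDATED));

    /* Reset the toy state and take the deferred-validation path. */
    atomic_store(&type_info, T_NONE);
    assert(get_type(T_L1_TABLE));
    assert(atomic_load(&type_info) == (T_L1_TABLE | T_VALIDATED));
    return 0;
}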
On Fri, May 3, 2019 at 8:35 AM George Dunlap <george.dunlap@citrix.com> wrote:
>
> On 5/3/19 2:57 PM, Jan Beulich wrote:
> >>>> On 03.05.19 at 15:43, <tamas@tklengyel.com> wrote:
> >> On Fri, May 3, 2019 at 2:27 AM Jan Beulich <JBeulich@suse.com> wrote:
> >>>
> >>>>>> On 03.05.19 at 00:13, <tamas@tklengyel.com> wrote:
> >>>> --- a/xen/arch/x86/mm/mem_sharing.c
> >>>> +++ b/xen/arch/x86/mm/mem_sharing.c
> >>>> @@ -112,13 +112,48 @@ static inline void page_sharing_dispose(struct page_info *page)
> >>>>
> >>>>  #endif /* MEM_SHARING_AUDIT */
> >>>>
> >>>> -static inline int mem_sharing_page_lock(struct page_info *pg)
> >>>> +/*
> >>>> + * Private implementations of page_lock/unlock to bypass PV-only
> >>>> + * sanity checks not applicable to mem-sharing.
> >>>> + */
> >>>> +static inline bool _page_lock(struct page_info *page)
> >>>>  {
> >>>> -    int rc;
> >>>> +    unsigned long x, nx;
> >>>> +
> >>>> +    do {
> >>>> +        while ( (x = page->u.inuse.type_info) & PGT_locked )
> >>>> +            cpu_relax();
> >>>> +        nx = x + (1 | PGT_locked);
> >>>> +        if ( !(x & PGT_validated) ||
> >>>> +             !(x & PGT_count_mask) ||
> >>>> +             !(nx & PGT_count_mask) )
> >>>> +            return false;
> >>>
> >>> Just for my own understanding: Did you verify that the PGT_validated
> >>> check is indeed needed here, or did you copy it "just in case"? In the
> >>> latter case a comment may be worthwhile.
> >>
> >> This is an exact copy of page_lock, sans the asserts that break it
> >> from mem_sharing. I didn't investigate which of these flags are
> >> necessary for mem_sharing. Frankly, I don't fully understand their
> >> meaning and I haven't come across documentation about it yet.
>
> I hope to add some documentation sometime in the next 6 months or so.
> I've submitted a talk on the refcounting system to the XenSummit as well.

Great, looking forward to it!

> Short answer: PGT_validated means that the page in question has been
> validated as safe to use as the designated type. For PGT_writable, this
> is instantaneous (i.e., the type is set to PGT_writable with the
> PGT_validated bit set in the same cmpxchg operation).
>
> Other types (such as PGT_l*_table) actually take time to verify, and so
> you need to first change the type to the new type (so nobody tries to
> use it as yet a different type), then verify that it's OK to use it as a
> page table (by recursively checking all the entries), then set the
> PGT_validated bit.
>
> > Yes, something along these lines is what I'm after. But "these flags"
> > really is just PGT_validated.
>
> Here's my take:
>
> The sorts of "needs validation" types are PV-only, I believe; if
> mem_sharing is only for HVM guests, then the only type involved should
> be PGT_writable, for which PGT_validated should always be set.

In which case it does sound like it's a superfluous check, but it's
also harmless.

> *If* we somehow do get into a situation where the type count > 0 but
> PGT_validated is clear, something has probably gone terribly wrong, and
> it would probably be much safer to just fail the page lock.

Isn't that exactly what happens?

    if ( !(x & PGT_validated) ||
         !(x & PGT_count_mask) ||
         !(nx & PGT_count_mask) )
        return false;

> But perhaps that implies there should also be an
> ASSERT(x & PGT_validated)?

Well, if the lock fails we already BUG out when it's for runtime
deduplication, and return an error when it's an operation from the
toolstack. So I don't think we need that extra ASSERT.

Tamas
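The two failure modes Tamas mentions might look roughly like the sketch
below. It is a hypothetical, self-contained model -- the function names,
stubbed types, and error handling are illustrative, not the actual
mem_sharing.c call sites:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct page_info { int dummy; };           /* stub, not Xen's struct */

static bool mem_sharing_page_lock(struct page_info *pg)
{
    (void)pg;
    return true;                           /* always succeeds in this model */
}

static void mem_sharing_page_unlock(struct page_info *pg) { (void)pg; }

/* Toolstack-initiated sharing: a failed lock is reported to the caller. */
static int share_pages_op(struct page_info *pg)
{
    if ( !mem_sharing_page_lock(pg) )
        return -1;                         /* toolstack sees an error code */
    /* ... sharing work would happen here ... */
    mem_sharing_page_unlock(pg);
    return 0;
}

/* Runtime deduplication: a failed lock mid-operation is unrecoverable. */
static void unshare_page(struct page_info *pg)
{
    if ( !mem_sharing_page_lock(pg) )
        abort();                           /* stand-in for Xen's BUG() */
    /* ... unshare work would happen here ... */
    mem_sharing_page_unlock(pg);
}

int main(void)
{
    struct page_info pg = { 0 };

    printf("share_pages_op: %d\n", share_pages_op(&pg));
    unshare_page(&pg);
    return 0;
}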
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 4b3a094481..baae7ceeda 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -112,13 +112,48 @@ static inline void page_sharing_dispose(struct page_info *page)
 
 #endif /* MEM_SHARING_AUDIT */
 
-static inline int mem_sharing_page_lock(struct page_info *pg)
+/*
+ * Private implementations of page_lock/unlock to bypass PV-only
+ * sanity checks not applicable to mem-sharing.
+ */
+static inline bool _page_lock(struct page_info *page)
 {
-    int rc;
+    unsigned long x, nx;
+
+    do {
+        while ( (x = page->u.inuse.type_info) & PGT_locked )
+            cpu_relax();
+        nx = x + (1 | PGT_locked);
+        if ( !(x & PGT_validated) ||
+             !(x & PGT_count_mask) ||
+             !(nx & PGT_count_mask) )
+            return false;
+    } while ( cmpxchg(&page->u.inuse.type_info, x, nx) != x );
+
+    return true;
+}
+
+static inline void _page_unlock(struct page_info *page)
+{
+    unsigned long x, nx, y = page->u.inuse.type_info;
+
+    do {
+        x = y;
+        ASSERT((x & PGT_count_mask) && (x & PGT_locked));
+
+        nx = x - (1 | PGT_locked);
+        /* We must not drop the last reference here. */
+        ASSERT(nx & PGT_count_mask);
+    } while ( (y = cmpxchg(&page->u.inuse.type_info, x, nx)) != x );
+}
+
+static inline bool mem_sharing_page_lock(struct page_info *pg)
+{
+    bool rc;
     pg_lock_data_t *pld = &(this_cpu(__pld));
 
     page_sharing_mm_pre_lock();
-    rc = page_lock(pg);
+    rc = _page_lock(pg);
     if ( rc )
     {
         preempt_disable();
@@ -135,7 +170,7 @@ static inline void mem_sharing_page_unlock(struct page_info *pg)
     page_sharing_mm_unlock(pld->mm_unlock_level,
                            &pld->recurse_count);
     preempt_enable();
-    page_unlock(pg);
+    _page_unlock(pg);
 }
 
 static inline shr_handle_t get_next_handle(void)
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 6faa563167..7dc7e33f73 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -356,24 +356,12 @@ struct platform_bad_page {
 const struct platform_bad_page *get_platform_badpages(unsigned int *array_size);
 
 /* Per page locks:
- * page_lock() is used for two purposes: pte serialization, and memory sharing.
+ * page_lock() is used for pte serialization.
  *
  * All users of page lock for pte serialization live in mm.c, use it
  * to lock a page table page during pte updates, do not take other locks within
  * the critical section delimited by page_lock/unlock, and perform no
  * nesting.
- *
- * All users of page lock for memory sharing live in mm/mem_sharing.c. Page_lock
- * is used in memory sharing to protect addition (share) and removal (unshare)
- * of (gfn,domain) tupples to a list of gfn's that the shared page is currently
- * backing. Nesting may happen when sharing (and locking) two pages -- deadlock
- * is avoided by locking pages in increasing order.
- * All memory sharing code paths take the p2m lock of the affected gfn before
- * taking the lock for the underlying page. We enforce ordering between page_lock
- * and p2m_lock using an mm-locks.h construct.
- *
- * These two users (pte serialization and memory sharing) do not collide, since
- * sharing is only supported for hvm guests, which do not perform pv pte updates.
  */
 int page_lock(struct page_info *page);
 void page_unlock(struct page_info *page);
Patch cf4b30dca0a "Add debug code to detect illegal page_lock and
put_page_type ordering" added extra sanity checking to
page_lock/page_unlock for debug builds with the assumption that no
hypervisor path ever locks two pages at once.

This assumption doesn't hold during memory sharing, so we copy a
version of page_lock/unlock to be used exclusively in the memory
sharing subsystem, without the sanity checks.

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Roger Pau Monne <roger.pau@citrix.com>
---
 xen/arch/x86/mm/mem_sharing.c | 43 +++++++++++++++++++++++++++++++----
 xen/include/asm-x86/mm.h      | 14 +-----------
 2 files changed, 40 insertions(+), 17 deletions(-)