Message ID | 1502228707-31883-3-git-send-email-boris.ostrovsky@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
>>> On 08.08.17 at 23:45, <boris.ostrovsky@oracle.com> wrote: > .. so that it's easy to find pages that need to be scrubbed (those pages are > now marked with _PGC_need_scrub bit). > > We keep track of the first unscrubbed page in a page buddy using first_dirty > field. For now it can have two values, 0 (whole buddy needs scrubbing) or > INVALID_DIRTY_IDX (the buddy does not need to be scrubbed). Subsequent > patches > will allow scrubbing to be interrupted, resulting in first_dirty taking any > value. > > Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> with one remark: > --- a/xen/include/asm-x86/mm.h > +++ b/xen/include/asm-x86/mm.h > @@ -88,7 +88,15 @@ struct page_info > /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */ > struct { > /* Do TLBs need flushing for safety before next page use? */ > - bool_t need_tlbflush; > + bool need_tlbflush:1; > + > + /* > + * Index of the first *possibly* unscrubbed page in the buddy. > + * One more bit than maximum possible order to accommodate > + * INVALID_DIRTY_IDX. > + */ > +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1) > + unsigned long first_dirty:MAX_ORDER + 1; > } free; I think generated code will be better with the two fields swapped: That way reading first_dirty won't involve a shift, and accessing a single bit doesn't require shifts at all on many architectures. Jan
Hi Boris, On 08/08/17 22:45, Boris Ostrovsky wrote: > .. so that it's easy to find pages that need to be scrubbed (those pages are > now marked with _PGC_need_scrub bit). > > We keep track of the first unscrubbed page in a page buddy using first_dirty > field. For now it can have two values, 0 (whole buddy needs scrubbing) or > INVALID_DIRTY_IDX (the buddy does not need to be scrubbed). Subsequent patches > will allow scrubbing to be interrupted, resulting in first_dirty taking any > value. > > Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> For the ARM bits and with Jan suggestion: Acked-by: Julien Grall <julien.grall@arm.com> Cheers,
On 08/14/2017 06:37 AM, Jan Beulich wrote: >>>> On 08.08.17 at 23:45, <boris.ostrovsky@oracle.com> wrote: >> .. so that it's easy to find pages that need to be scrubbed (those pages are >> now marked with _PGC_need_scrub bit). >> >> We keep track of the first unscrubbed page in a page buddy using first_dirty >> field. For now it can have two values, 0 (whole buddy needs scrubbing) or >> INVALID_DIRTY_IDX (the buddy does not need to be scrubbed). Subsequent >> patches >> will allow scrubbing to be interrupted, resulting in first_dirty taking any >> value. >> >> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> > Reviewed-by: Jan Beulich <jbeulich@suse.com> > with one remark: > >> --- a/xen/include/asm-x86/mm.h >> +++ b/xen/include/asm-x86/mm.h >> @@ -88,7 +88,15 @@ struct page_info >> /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */ >> struct { >> /* Do TLBs need flushing for safety before next page use? */ >> - bool_t need_tlbflush; >> + bool need_tlbflush:1; >> + >> + /* >> + * Index of the first *possibly* unscrubbed page in the buddy. >> + * One more bit than maximum possible order to accommodate >> + * INVALID_DIRTY_IDX. >> + */ >> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1) >> + unsigned long first_dirty:MAX_ORDER + 1; >> } free; > I think generated code will be better with the two fields swapped: > That way reading first_dirty won't involve a shift, and accessing a > single bit doesn't require shifts at all on many architectures. Ok, I will then keep need_tlbflush as the last field so the final struct (as defined in patch 7) will look like struct { unsigned long first_dirty:MAX_ORDER + 1; unsigned long scrub_state:2; bool need_tlbflush:1; }; -boris
>>> On 14.08.17 at 16:29, <boris.ostrovsky@oracle.com> wrote: > On 08/14/2017 06:37 AM, Jan Beulich wrote: >>>>> On 08.08.17 at 23:45, <boris.ostrovsky@oracle.com> wrote: >>> --- a/xen/include/asm-x86/mm.h >>> +++ b/xen/include/asm-x86/mm.h >>> @@ -88,7 +88,15 @@ struct page_info >>> /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */ >>> struct { >>> /* Do TLBs need flushing for safety before next page use? */ >>> - bool_t need_tlbflush; >>> + bool need_tlbflush:1; >>> + >>> + /* >>> + * Index of the first *possibly* unscrubbed page in the buddy. >>> + * One more bit than maximum possible order to accommodate >>> + * INVALID_DIRTY_IDX. >>> + */ >>> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1) >>> + unsigned long first_dirty:MAX_ORDER + 1; >>> } free; >> I think generated code will be better with the two fields swapped: >> That way reading first_dirty won't involve a shift, and accessing a >> single bit doesn't require shifts at all on many architectures. > > Ok, I will then keep need_tlbflush as the last field so the final struct > (as defined in patch 7) will look like > > struct { > unsigned long first_dirty:MAX_ORDER + 1; > unsigned long scrub_state:2; > bool need_tlbflush:1; > }; Hmm, actually - why do you need bitfields on the x86 side at all? They're needed for 32-bit architectures only, 64-bit ones ought to be fine with struct { unsigned int first_dirty; bool need_tlbflush; uint8_t scrub_state; }; (plus a suitable BUILD_BUG_ON() to make sure first_dirty has at least MAX_ORDER + 1 bits). The ARM maintainers will know whether they would want to also differentiate ARM32 and ARM64 here. Jan
On 08/15/2017 04:18 AM, Jan Beulich wrote: >>>> On 14.08.17 at 16:29, <boris.ostrovsky@oracle.com> wrote: >> On 08/14/2017 06:37 AM, Jan Beulich wrote: >>>>>> On 08.08.17 at 23:45, <boris.ostrovsky@oracle.com> wrote: >>>> --- a/xen/include/asm-x86/mm.h >>>> +++ b/xen/include/asm-x86/mm.h >>>> @@ -88,7 +88,15 @@ struct page_info >>>> /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */ >>>> struct { >>>> /* Do TLBs need flushing for safety before next page use? */ >>>> - bool_t need_tlbflush; >>>> + bool need_tlbflush:1; >>>> + >>>> + /* >>>> + * Index of the first *possibly* unscrubbed page in the buddy. >>>> + * One more bit than maximum possible order to accommodate >>>> + * INVALID_DIRTY_IDX. >>>> + */ >>>> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1) >>>> + unsigned long first_dirty:MAX_ORDER + 1; >>>> } free; >>> I think generated code will be better with the two fields swapped: >>> That way reading first_dirty won't involve a shift, and accessing a >>> single bit doesn't require shifts at all on many architectures. >> Ok, I will then keep need_tlbflush as the last field so the final struct >> (as defined in patch 7) will look like >> >> struct { >> unsigned long first_dirty:MAX_ORDER + 1; >> unsigned long scrub_state:2; >> bool need_tlbflush:1; >> }; > Hmm, actually - why do you need bitfields on the x86 side at all? > They're needed for 32-bit architectures only, 64-bit ones ought > to be fine with > > struct { > unsigned int first_dirty; > bool need_tlbflush; > uint8_t scrub_state; > }; IIRC it was exactly because of ARM32 and at some point you suggested to switch both x86 and ARM to bitfields. > > (plus a suitable BUILD_BUG_ON() to make sure first_dirty has > at least MAX_ORDER + 1 bits). The ARM maintainers will know > whether they would want to also differentiate ARM32 and > ARM64 here. Isn't using bitfields the only possibility for 32-bit? We can't fit first_dirty into 2 bytes. -boris
>>> On 15.08.17 at 16:41, <boris.ostrovsky@oracle.com> wrote: > On 08/15/2017 04:18 AM, Jan Beulich wrote: >>>>> On 14.08.17 at 16:29, <boris.ostrovsky@oracle.com> wrote: >>> On 08/14/2017 06:37 AM, Jan Beulich wrote: >>>>>>> On 08.08.17 at 23:45, <boris.ostrovsky@oracle.com> wrote: >>>>> --- a/xen/include/asm-x86/mm.h >>>>> +++ b/xen/include/asm-x86/mm.h >>>>> @@ -88,7 +88,15 @@ struct page_info >>>>> /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */ >>>>> struct { >>>>> /* Do TLBs need flushing for safety before next page use? */ >>>>> - bool_t need_tlbflush; >>>>> + bool need_tlbflush:1; >>>>> + >>>>> + /* >>>>> + * Index of the first *possibly* unscrubbed page in the buddy. >>>>> + * One more bit than maximum possible order to accommodate >>>>> + * INVALID_DIRTY_IDX. >>>>> + */ >>>>> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1) >>>>> + unsigned long first_dirty:MAX_ORDER + 1; >>>>> } free; >>>> I think generated code will be better with the two fields swapped: >>>> That way reading first_dirty won't involve a shift, and accessing a >>>> single bit doesn't require shifts at all on many architectures. >>> Ok, I will then keep need_tlbflush as the last field so the final struct >>> (as defined in patch 7) will look like >>> >>> struct { >>> unsigned long first_dirty:MAX_ORDER + 1; >>> unsigned long scrub_state:2; >>> bool need_tlbflush:1; >>> }; >> Hmm, actually - why do you need bitfields on the x86 side at all? >> They're needed for 32-bit architectures only, 64-bit ones ought >> to be fine with >> >> struct { >> unsigned int first_dirty; >> bool need_tlbflush; >> uint8_t scrub_state; >> }; > > IIRC it was exactly because of ARM32 and at some point you suggested to > switch both x86 and ARM to bitfields. I don't recall for sure whether I had asked for the change to be done uniformly; it was certainly ARM32 that triggered me notice the structure size change in your original version. >> (plus a suitable BUILD_BUG_ON() to make sure first_dirty has >> at least MAX_ORDER + 1 bits). The ARM maintainers will know >> whether they would want to also differentiate ARM32 and >> ARM64 here. > > Isn't using bitfields the only possibility for 32-bit? We can't fit > first_dirty into 2 bytes. Yes, hence the question whether to stay with bitfields uniformly or make ARM64 follow x86, but ARM32 keep using bitfields. Jan
Hi Jan, On 15/08/17 15:51, Jan Beulich wrote: >>>> On 15.08.17 at 16:41, <boris.ostrovsky@oracle.com> wrote: >> On 08/15/2017 04:18 AM, Jan Beulich wrote: >>>>>> On 14.08.17 at 16:29, <boris.ostrovsky@oracle.com> wrote: >>>> On 08/14/2017 06:37 AM, Jan Beulich wrote: >>>>>>>> On 08.08.17 at 23:45, <boris.ostrovsky@oracle.com> wrote: >>>>>> --- a/xen/include/asm-x86/mm.h >>>>>> +++ b/xen/include/asm-x86/mm.h >>>>>> @@ -88,7 +88,15 @@ struct page_info >>>>>> /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */ >>>>>> struct { >>>>>> /* Do TLBs need flushing for safety before next page use? */ >>>>>> - bool_t need_tlbflush; >>>>>> + bool need_tlbflush:1; >>>>>> + >>>>>> + /* >>>>>> + * Index of the first *possibly* unscrubbed page in the buddy. >>>>>> + * One more bit than maximum possible order to accommodate >>>>>> + * INVALID_DIRTY_IDX. >>>>>> + */ >>>>>> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1) >>>>>> + unsigned long first_dirty:MAX_ORDER + 1; >>>>>> } free; >>>>> I think generated code will be better with the two fields swapped: >>>>> That way reading first_dirty won't involve a shift, and accessing a >>>>> single bit doesn't require shifts at all on many architectures. >>>> Ok, I will then keep need_tlbflush as the last field so the final struct >>>> (as defined in patch 7) will look like >>>> >>>> struct { >>>> unsigned long first_dirty:MAX_ORDER + 1; >>>> unsigned long scrub_state:2; >>>> bool need_tlbflush:1; >>>> }; >>> Hmm, actually - why do you need bitfields on the x86 side at all? >>> They're needed for 32-bit architectures only, 64-bit ones ought >>> to be fine with >>> >>> struct { >>> unsigned int first_dirty; >>> bool need_tlbflush; >>> uint8_t scrub_state; >>> }; >> >> IIRC it was exactly because of ARM32 and at some point you suggested to >> switch both x86 and ARM to bitfields. > > I don't recall for sure whether I had asked for the change to be done > uniformly; it was certainly ARM32 that triggered me notice the > structure size change in your original version. > >>> (plus a suitable BUILD_BUG_ON() to make sure first_dirty has >>> at least MAX_ORDER + 1 bits). The ARM maintainers will know >>> whether they would want to also differentiate ARM32 and >>> ARM64 here. >> >> Isn't using bitfields the only possibility for 32-bit? We can't fit >> first_dirty into 2 bytes. > > Yes, hence the question whether to stay with bitfields uniformly > or make ARM64 follow x86, but ARM32 keep using bitfields. I would prefer to avoid differentiation between Arm32 and Arm64. Cheers,
On 08/15/2017 10:52 AM, Julien Grall wrote: > Hi Jan, > > On 15/08/17 15:51, Jan Beulich wrote: >>>>> On 15.08.17 at 16:41, <boris.ostrovsky@oracle.com> wrote: >>> On 08/15/2017 04:18 AM, Jan Beulich wrote: >>>>>>> On 14.08.17 at 16:29, <boris.ostrovsky@oracle.com> wrote: >>>>> On 08/14/2017 06:37 AM, Jan Beulich wrote: >>>>>>>>> On 08.08.17 at 23:45, <boris.ostrovsky@oracle.com> wrote: >>>>>>> --- a/xen/include/asm-x86/mm.h >>>>>>> +++ b/xen/include/asm-x86/mm.h >>>>>>> @@ -88,7 +88,15 @@ struct page_info >>>>>>> /* Page is on a free list: ((count_info & >>>>>>> PGC_count_mask) == 0). */ >>>>>>> struct { >>>>>>> /* Do TLBs need flushing for safety before next >>>>>>> page use? */ >>>>>>> - bool_t need_tlbflush; >>>>>>> + bool need_tlbflush:1; >>>>>>> + >>>>>>> + /* >>>>>>> + * Index of the first *possibly* unscrubbed page in >>>>>>> the buddy. >>>>>>> + * One more bit than maximum possible order to >>>>>>> accommodate >>>>>>> + * INVALID_DIRTY_IDX. >>>>>>> + */ >>>>>>> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1) >>>>>>> + unsigned long first_dirty:MAX_ORDER + 1; >>>>>>> } free; >>>>>> I think generated code will be better with the two fields swapped: >>>>>> That way reading first_dirty won't involve a shift, and accessing a >>>>>> single bit doesn't require shifts at all on many architectures. >>>>> Ok, I will then keep need_tlbflush as the last field so the final >>>>> struct >>>>> (as defined in patch 7) will look like >>>>> >>>>> struct { >>>>> unsigned long first_dirty:MAX_ORDER + 1; >>>>> unsigned long scrub_state:2; >>>>> bool need_tlbflush:1; >>>>> }; >>>> Hmm, actually - why do you need bitfields on the x86 side at all? >>>> They're needed for 32-bit architectures only, 64-bit ones ought >>>> to be fine with >>>> >>>> struct { >>>> unsigned int first_dirty; >>>> bool need_tlbflush; >>>> uint8_t scrub_state; >>>> }; >>> >>> IIRC it was exactly because of ARM32 and at some point you suggested to >>> switch both x86 and ARM to bitfields. >> >> I don't recall for sure whether I had asked for the change to be done >> uniformly; it was certainly ARM32 that triggered me notice the >> structure size change in your original version. >> >>>> (plus a suitable BUILD_BUG_ON() to make sure first_dirty has >>>> at least MAX_ORDER + 1 bits). The ARM maintainers will know >>>> whether they would want to also differentiate ARM32 and >>>> ARM64 here. >>> >>> Isn't using bitfields the only possibility for 32-bit? We can't fit >>> first_dirty into 2 bytes. >> >> Yes, hence the question whether to stay with bitfields uniformly >> or make ARM64 follow x86, but ARM32 keep using bitfields. > > I would prefer to avoid differentiation between Arm32 and Arm64. I can switch x86 only back to "normal" types but then we also have this in patch 7: static void check_and_stop_scrub(struct page_info *head) { if ( head->u.free.scrub_state == BUDDY_SCRUBBING ) { typeof(head->u.free) pgfree; head->u.free.scrub_state = BUDDY_SCRUB_ABORT; spin_lock_kick(); for ( ; ; ) { /* Can't ACCESS_ONCE() a bitfield. */ pgfree.val = ACCESS_ONCE(head->u.free.val); if ( pgfree.scrub_state != BUDDY_SCRUB_ABORT ) break; cpu_relax(); } } } I'd rather avoid having '#ifdef <arch>' meaning that union { struct { unsigned int first_dirty; bool need_tlbflush; uint8_t scrub_state; }; unsigned long val; } free; is still kept for x86. -boris
>>> On 15.08.17 at 17:03, <boris.ostrovsky@oracle.com> wrote: > I can switch x86 only back to "normal" types but then we also have this > in patch 7: > > static void check_and_stop_scrub(struct page_info *head) > { > if ( head->u.free.scrub_state == BUDDY_SCRUBBING ) > { > typeof(head->u.free) pgfree; > > head->u.free.scrub_state = BUDDY_SCRUB_ABORT; > spin_lock_kick(); > for ( ; ; ) > { > /* Can't ACCESS_ONCE() a bitfield. */ > pgfree.val = ACCESS_ONCE(head->u.free.val); > if ( pgfree.scrub_state != BUDDY_SCRUB_ABORT ) > break; > cpu_relax(); > } > } > } > > I'd rather avoid having '#ifdef <arch>' meaning that > > union { > struct { > unsigned int first_dirty; > bool need_tlbflush; > uint8_t scrub_state; > }; > > unsigned long val; > } free; > > is still kept for x86. That's fine I think. Jan
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c index 330f8ed..b857a44 100644 --- a/xen/common/page_alloc.c +++ b/xen/common/page_alloc.c @@ -383,6 +383,8 @@ typedef struct page_list_head heap_by_zone_and_order_t[NR_ZONES][MAX_ORDER+1]; static heap_by_zone_and_order_t *_heap[MAX_NUMNODES]; #define heap(node, zone, order) ((*_heap[node])[zone][order]) +static unsigned long node_need_scrub[MAX_NUMNODES]; + static unsigned long *avail[MAX_NUMNODES]; static long total_avail_pages; @@ -678,13 +680,30 @@ static void check_low_mem_virq(void) } } +/* Pages that need a scrub are added to tail, otherwise to head. */ +static void page_list_add_scrub(struct page_info *pg, unsigned int node, + unsigned int zone, unsigned int order, + unsigned int first_dirty) +{ + PFN_ORDER(pg) = order; + pg->u.free.first_dirty = first_dirty; + + if ( first_dirty != INVALID_DIRTY_IDX ) + { + ASSERT(first_dirty < (1U << order)); + page_list_add_tail(pg, &heap(node, zone, order)); + } + else + page_list_add(pg, &heap(node, zone, order)); +} + /* Allocate 2^@order contiguous pages. */ static struct page_info *alloc_heap_pages( unsigned int zone_lo, unsigned int zone_hi, unsigned int order, unsigned int memflags, struct domain *d) { - unsigned int i, j, zone = 0, nodemask_retry = 0; + unsigned int i, j, zone = 0, nodemask_retry = 0, first_dirty; nodeid_t first_node, node = MEMF_get_node(memflags), req_node = node; unsigned long request = 1UL << order; struct page_info *pg; @@ -798,12 +817,26 @@ static struct page_info *alloc_heap_pages( return NULL; found: + + first_dirty = pg->u.free.first_dirty; + /* We may have to halve the chunk a number of times. */ while ( j != order ) { - PFN_ORDER(pg) = --j; - page_list_add_tail(pg, &heap(node, zone, j)); - pg += 1 << j; + j--; + page_list_add_scrub(pg, node, zone, j, + (1U << j) > first_dirty ? + first_dirty : INVALID_DIRTY_IDX); + pg += 1U << j; + + if ( first_dirty != INVALID_DIRTY_IDX ) + { + /* Adjust first_dirty */ + if ( first_dirty >= 1U << j ) + first_dirty -= 1U << j; + else + first_dirty = 0; /* We've moved past original first_dirty */ + } } ASSERT(avail[node][zone] >= request); @@ -850,12 +883,20 @@ static int reserve_offlined_page(struct page_info *head) unsigned int node = phys_to_nid(page_to_maddr(head)); int zone = page_to_zone(head), i, head_order = PFN_ORDER(head), count = 0; struct page_info *cur_head; - int cur_order; + unsigned int cur_order, first_dirty; ASSERT(spin_is_locked(&heap_lock)); cur_head = head; + /* + * We may break the buddy so let's mark the head as clean. Then, when + * merging chunks back into the heap, we will see whether the chunk has + * unscrubbed pages and set its first_dirty properly. + */ + first_dirty = head->u.free.first_dirty; + head->u.free.first_dirty = INVALID_DIRTY_IDX; + page_list_del(head, &heap(node, zone, head_order)); while ( cur_head < (head + (1 << head_order)) ) @@ -866,6 +907,8 @@ static int reserve_offlined_page(struct page_info *head) if ( page_state_is(cur_head, offlined) ) { cur_head++; + if ( first_dirty != INVALID_DIRTY_IDX && first_dirty ) + first_dirty--; continue; } @@ -892,9 +935,20 @@ static int reserve_offlined_page(struct page_info *head) { merge: /* We don't consider merging outside the head_order. */ - page_list_add_tail(cur_head, &heap(node, zone, cur_order)); - PFN_ORDER(cur_head) = cur_order; + page_list_add_scrub(cur_head, node, zone, cur_order, + (1U << cur_order) > first_dirty ? + first_dirty : INVALID_DIRTY_IDX); cur_head += (1 << cur_order); + + /* Adjust first_dirty if needed. */ + if ( first_dirty != INVALID_DIRTY_IDX ) + { + if ( first_dirty >= 1U << cur_order ) + first_dirty -= 1U << cur_order; + else + first_dirty = 0; + } + break; } } @@ -919,9 +973,53 @@ static int reserve_offlined_page(struct page_info *head) return count; } +static void scrub_free_pages(unsigned int node) +{ + struct page_info *pg; + unsigned int zone; + + ASSERT(spin_is_locked(&heap_lock)); + + if ( !node_need_scrub[node] ) + return; + + for ( zone = 0; zone < NR_ZONES; zone++ ) + { + unsigned int order = MAX_ORDER; + + do { + while ( !page_list_empty(&heap(node, zone, order)) ) + { + unsigned int i; + + /* Unscrubbed pages are always at the end of the list. */ + pg = page_list_last(&heap(node, zone, order)); + if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX ) + break; + + for ( i = pg->u.free.first_dirty; i < (1U << order); i++) + { + if ( test_bit(_PGC_need_scrub, &pg[i].count_info) ) + { + scrub_one_page(&pg[i]); + pg[i].count_info &= ~PGC_need_scrub; + node_need_scrub[node]--; + } + } + + page_list_del(pg, &heap(node, zone, order)); + page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX); + + if ( node_need_scrub[node] == 0 ) + return; + } + } while ( order-- != 0 ); + } +} + /* Free 2^@order set of pages. */ static void free_heap_pages( - struct page_info *pg, unsigned int order) + struct page_info *pg, unsigned int order, bool need_scrub) { unsigned long mask, mfn = page_to_mfn(pg); unsigned int i, node = phys_to_nid(page_to_maddr(pg)), tainted = 0; @@ -961,10 +1059,20 @@ static void free_heap_pages( /* This page is not a guest frame any more. */ page_set_owner(&pg[i], NULL); /* set_gpfn_from_mfn snoops pg owner */ set_gpfn_from_mfn(mfn + i, INVALID_M2P_ENTRY); + + if ( need_scrub ) + pg[i].count_info |= PGC_need_scrub; } avail[node][zone] += 1 << order; total_avail_pages += 1 << order; + if ( need_scrub ) + { + node_need_scrub[node] += 1 << order; + pg->u.free.first_dirty = 0; + } + else + pg->u.free.first_dirty = INVALID_DIRTY_IDX; if ( tmem_enabled() ) midsize_alloc_zone_pages = max( @@ -988,6 +1096,12 @@ static void free_heap_pages( page_list_del(predecessor, &heap(node, zone, order)); + /* Keep predecessor's first_dirty if it is already set. */ + if ( predecessor->u.free.first_dirty == INVALID_DIRTY_IDX && + pg->u.free.first_dirty != INVALID_DIRTY_IDX ) + predecessor->u.free.first_dirty = (1U << order) + + pg->u.free.first_dirty; + pg = predecessor; } else @@ -1007,12 +1121,14 @@ static void free_heap_pages( order++; } - PFN_ORDER(pg) = order; - page_list_add_tail(pg, &heap(node, zone, order)); + page_list_add_scrub(pg, node, zone, order, pg->u.free.first_dirty); if ( tainted ) reserve_offlined_page(pg); + if ( need_scrub ) + scrub_free_pages(node); + spin_unlock(&heap_lock); } @@ -1233,7 +1349,7 @@ unsigned int online_page(unsigned long mfn, uint32_t *status) spin_unlock(&heap_lock); if ( (y & PGC_state) == PGC_state_offlined ) - free_heap_pages(pg, 0); + free_heap_pages(pg, 0, false); return ret; } @@ -1276,6 +1392,8 @@ static void init_heap_pages( { unsigned long i; + BUILD_BUG_ON(sizeof(pg->u) != sizeof(unsigned long)); + for ( i = 0; i < nr_pages; i++ ) { unsigned int nid = phys_to_nid(page_to_maddr(pg+i)); @@ -1302,7 +1420,7 @@ static void init_heap_pages( nr_pages -= n; } - free_heap_pages(pg+i, 0); + free_heap_pages(pg + i, 0, false); } } @@ -1629,7 +1747,7 @@ void free_xenheap_pages(void *v, unsigned int order) memguard_guard_range(v, 1 << (order + PAGE_SHIFT)); - free_heap_pages(virt_to_page(v), order); + free_heap_pages(virt_to_page(v), order, false); } #else @@ -1683,12 +1801,9 @@ void free_xenheap_pages(void *v, unsigned int order) pg = virt_to_page(v); for ( i = 0; i < (1u << order); i++ ) - { - scrub_one_page(&pg[i]); pg[i].count_info &= ~PGC_xen_heap; - } - free_heap_pages(pg, order); + free_heap_pages(pg, order, true); } #endif @@ -1797,7 +1912,7 @@ struct page_info *alloc_domheap_pages( if ( d && !(memflags & MEMF_no_owner) && assign_pages(d, pg, order, memflags) ) { - free_heap_pages(pg, order); + free_heap_pages(pg, order, false); return NULL; } @@ -1865,11 +1980,7 @@ void free_domheap_pages(struct page_info *pg, unsigned int order) scrub = 1; } - if ( unlikely(scrub) ) - for ( i = 0; i < (1 << order); i++ ) - scrub_one_page(&pg[i]); - - free_heap_pages(pg, order); + free_heap_pages(pg, order, scrub); } if ( drop_dom_ref ) diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h index ef84b72..572337c 100644 --- a/xen/include/asm-arm/mm.h +++ b/xen/include/asm-arm/mm.h @@ -44,7 +44,16 @@ struct page_info /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */ struct { /* Do TLBs need flushing for safety before next page use? */ - bool_t need_tlbflush; + bool need_tlbflush:1; + + /* + * Index of the first *possibly* unscrubbed page in the buddy. + * One more bit than maximum possible order to accommodate + * INVALID_DIRTY_IDX. + */ +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1) + unsigned long first_dirty:MAX_ORDER + 1; + } free; } u; @@ -107,6 +116,13 @@ struct page_info #define PGC_count_width PG_shift(9) #define PGC_count_mask ((1UL<<PGC_count_width)-1) +/* + * Page needs to be scrubbed. Since this bit can only be set on a page that is + * free (i.e. in PGC_state_free) we can reuse PGC_allocated bit. + */ +#define _PGC_need_scrub _PGC_allocated +#define PGC_need_scrub PGC_allocated + extern mfn_t xenheap_mfn_start, xenheap_mfn_end; extern vaddr_t xenheap_virt_end; #ifdef CONFIG_ARM_64 diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h index 2bf3f33..07dc963 100644 --- a/xen/include/asm-x86/mm.h +++ b/xen/include/asm-x86/mm.h @@ -88,7 +88,15 @@ struct page_info /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */ struct { /* Do TLBs need flushing for safety before next page use? */ - bool_t need_tlbflush; + bool need_tlbflush:1; + + /* + * Index of the first *possibly* unscrubbed page in the buddy. + * One more bit than maximum possible order to accommodate + * INVALID_DIRTY_IDX. + */ +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1) + unsigned long first_dirty:MAX_ORDER + 1; } free; } u; @@ -233,6 +241,13 @@ struct page_info #define PGC_count_width PG_shift(9) #define PGC_count_mask ((1UL<<PGC_count_width)-1) +/* + * Page needs to be scrubbed. Since this bit can only be set on a page that is + * free (i.e. in PGC_state_free) we can reuse PGC_allocated bit. + */ +#define _PGC_need_scrub _PGC_allocated +#define PGC_need_scrub PGC_allocated + #define is_xen_heap_page(page) ((page)->count_info & PGC_xen_heap) #define is_xen_heap_mfn(mfn) \ (__mfn_valid(mfn) && is_xen_heap_page(__mfn_to_page(mfn)))
.. so that it's easy to find pages that need to be scrubbed (those pages are now marked with _PGC_need_scrub bit). We keep track of the first unscrubbed page in a page buddy using first_dirty field. For now it can have two values, 0 (whole buddy needs scrubbing) or INVALID_DIRTY_IDX (the buddy does not need to be scrubbed). Subsequent patches will allow scrubbing to be interrupted, resulting in first_dirty taking any value. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- Changes in v7 * Simplified tracking dirty pages in reserve_offlined_page() to match how it is done in alloc_heap_pages() * Changed page_info.u.free.need_tlbflush definition to bool:1 * Added BUILD_BUG_ON() to make sure page_info.u stays within unsigned long xen/common/page_alloc.c | 157 ++++++++++++++++++++++++++++++++++++++++------- xen/include/asm-arm/mm.h | 18 +++++- xen/include/asm-x86/mm.h | 17 ++++- 3 files changed, 167 insertions(+), 25 deletions(-)