Message ID: 1501866346-9774-2-git-send-email-boris.ostrovsky@oracle.com (mailing list archive)
State: New, archived
>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 08/04/17 7:03 PM >>>
>@@ -873,6 +916,8 @@ static int reserve_offlined_page(struct page_info *head)
>
>while ( cur_order < head_order )
>{
>+ unsigned int idx = INVALID_DIRTY_IDX;

Is it correct for the variable to live in this scope, rather than ...

>@@ -892,8 +937,28 @@ static int reserve_offlined_page(struct page_info *head)
>{
>merge:

... in this one? Of course it's less the variable scope itself, but the initial
value at the point here.

>/* We don't consider merging outside the head_order. */
>- page_list_add_tail(cur_head, &heap(node, zone, cur_order));
>- PFN_ORDER(cur_head) = cur_order;
>+
>+ /* See if any of the pages indeed need scrubbing. */
>+ if ( first_dirty != INVALID_DIRTY_IDX )
>+ {
>+ if ( (1U << cur_order) > first_dirty )
>+ {
>+ for ( i = first_dirty ; i < (1U << cur_order); i++ )
>+ if ( test_bit(_PGC_need_scrub,
>+ &cur_head[i].count_info) )
>+ {
>+ idx = i;
>+ break;
>+ }

Why again do you need to look through all the pages here, rather than
simply marking the chunks as needing scrubbing simply based on first_dirty?
It seems to me that I've asked this before, which is a good indication that
such special behavior would better have a comment attached.

>@@ -977,35 +1096,49 @@ static void free_heap_pages(
>
>if ( (page_to_mfn(pg) & mask) )
>{
>+ struct page_info *predecessor = pg - mask;

For this and ...

>}
>else
>{
>+ struct page_info *successor = pg + mask;

... this, it would certainly help readability of the patch here if the
introduction of the new local variables was broken out in a prereq patch.
But yes, I should have asked for this earlier on, so I'm not going to insist.

>--- a/xen/include/asm-x86/mm.h
>+++ b/xen/include/asm-x86/mm.h
>@@ -88,7 +88,15 @@ struct page_info
>/* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>struct {
>/* Do TLBs need flushing for safety before next page use? */
>- bool_t need_tlbflush;
>+ unsigned long need_tlbflush:1;
>+
>+ /*
>+ * Index of the first *possibly* unscrubbed page in the buddy.
>+ * One more than maximum possible order to accommodate

"One more bit than ..." (I think I did point out the ambiguity of the
wording here before).

Jan
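The width of first_dirty comes up repeatedly in this thread, so here is a
small standalone illustration (not Xen code; the MAX_ORDER value is assumed
purely for the printout) of why the field is declared with MAX_ORDER + 1
bits: valid indexes into a maximum-order buddy fit in MAX_ORDER bits, and
the extra bit leaves room for one spare encoding, INVALID_DIRTY_IDX, that
cannot collide with any valid index.

#include <stdio.h>

#define MAX_ORDER 20                                 /* assumed value, for illustration only */
#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)

int main(void)
{
    /* Last valid page index in a buddy of the largest order. */
    unsigned long max_valid = (1UL << MAX_ORDER) - 1;

    printf("valid indexes:     0 .. %lu\n", max_valid);
    printf("INVALID_DIRTY_IDX: %lu\n", INVALID_DIRTY_IDX);
    return 0;
}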
Hi Boris,

I would have appreciated to be CCed as maintainer of the ARM bits...
Please use scripts/get_maintainers.pl in the future.

On 04/08/17 18:05, Boris Ostrovsky wrote:
> . so that it's easy to find pages that need to be scrubbed (those pages are

Pointless .

> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
> index ef84b72..d26b232 100644
> --- a/xen/include/asm-arm/mm.h
> +++ b/xen/include/asm-arm/mm.h
> @@ -44,7 +44,16 @@ struct page_info
> /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
> struct {
> /* Do TLBs need flushing for safety before next page use? */
> - bool_t need_tlbflush;
> + unsigned long need_tlbflush:1;

You've turned need_tlbflush from bool to unsigned long : 1. But some of the
users use true/false or may rely on the boolean property. So it sounds like
to me you want to use bool bitfields here (and in the x86 part).

> +
> + /*
> + * Index of the first *possibly* unscrubbed page in the buddy.
> + * One more than maximum possible order to accommodate
> + * INVALID_DIRTY_IDX.
> + */
> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
> + unsigned long first_dirty:MAX_ORDER + 1;

We need to make sure that this union will not be bigger than unsigned long.
Otherwise this will limit lower down the maximum amount of memory we support.

So this likely means a BUILD_BUG_ON(....).

Cheers,
On 08/06/2017 01:41 PM, Jan Beulich wrote:
>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 08/04/17 7:03 PM >>>
>> @@ -873,6 +916,8 @@ static int reserve_offlined_page(struct page_info *head)
>
>
>> while ( cur_order < head_order )
>> {
>> + unsigned int idx = INVALID_DIRTY_IDX;
> Is it correct for the variable to live in this scope, rather than ...
>
>> @@ -892,8 +937,28 @@ static int reserve_offlined_page(struct page_info *head)
>> {
>> merge:
> ... in this one? Of course it's less the variable scope itself, but the initial
> value at the point here.

I should move it to the inner scope --- I actually *want* to reinitialize it
on each iteration.

>
>> /* We don't consider merging outside the head_order. */
>> - page_list_add_tail(cur_head, &heap(node, zone, cur_order));
>> - PFN_ORDER(cur_head) = cur_order;
>> +
>> + /* See if any of the pages indeed need scrubbing. */
>> + if ( first_dirty != INVALID_DIRTY_IDX )
>> + {
>> + if ( (1U << cur_order) > first_dirty )
>> + {
>> + for ( i = first_dirty ; i < (1U << cur_order); i++ )
>> + if ( test_bit(_PGC_need_scrub,
>> + &cur_head[i].count_info) )
>> + {
>> + idx = i;
>> + break;
>> + }
> Why again do you need to look through all the pages here, rather than
> simply marking the chunks as needing scrubbing simply based on first_dirty?
> It seems to me that I've asked this before, which is a good indication that
> such special behavior would better have a comment attached.

We want to make sure that there are in fact dirty pages in the (sub-)buddy:
first_dirty is only a hint there *may be* some. That's why I have the word
"indeed" in the comment there but it's probably worth expanding on that.

>
>> @@ -977,35 +1096,49 @@ static void free_heap_pages(
>
>
>> if ( (page_to_mfn(pg) & mask) )
>> {
>> + struct page_info *predecessor = pg - mask;
> For this and ...
>
>> }
>> else
>> {
>> + struct page_info *successor = pg + mask;
> ... this, it would certainly help readability of the patch here if the
> introduction of the new local variables was broken out in a prereq patch.
> But yes, I should have asked for this earlier on, so I'm not going to insist.

Since I will be re-spinning this patch I will split this out.

-boris
>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 08/07/17 4:16 PM >>>
>On 08/06/2017 01:41 PM, Jan Beulich wrote:
>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 08/04/17 7:03 PM >>>
>>> + /* See if any of the pages indeed need scrubbing. */
>>> + if ( first_dirty != INVALID_DIRTY_IDX )
>>> + {
>>> + if ( (1U << cur_order) > first_dirty )
>>> + {
>>> + for ( i = first_dirty ; i < (1U << cur_order); i++ )
>>> + if ( test_bit(_PGC_need_scrub,
>>> + &cur_head[i].count_info) )
>>> + {
>>> + idx = i;
>>> + break;
>>> + }
>> Why again do you need to look through all the pages here, rather than
>> simply marking the chunks as needing scrubbing simply based on first_dirty?
>> It seems to me that I've asked this before, which is a good indication that
>> such special behavior would better have a comment attached.
>
>We want to make sure that there are in fact dirty pages in the
>(sub-)buddy: first_dirty is only a hint there *may be* some.

But why is that needed? Iirc you don't go to such length when splitting a
buddy during allocation.

Jan
On 08/07/2017 06:45 AM, Julien Grall wrote:
> Hi Boris,
>
> I would have appreciated to be CCed as maintainer of the ARM bits...
> Please use scripts/get_maintainers.pl in the future.

Ugh, sorry about that. (I did test builds for both ARM64 and ARM32, if
this makes my transgression any less serious ;-))

>
> On 04/08/17 18:05, Boris Ostrovsky wrote:
>> . so that it's easy to find pages that need to be scrubbed (those
>> pages are
>
> Pointless .
>
>> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
>> index ef84b72..d26b232 100644
>> --- a/xen/include/asm-arm/mm.h
>> +++ b/xen/include/asm-arm/mm.h
>> @@ -44,7 +44,16 @@ struct page_info
>> /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>> struct {
>> /* Do TLBs need flushing for safety before next page use? */
>> - bool_t need_tlbflush;
>> + unsigned long need_tlbflush:1;
>
> You've turned need_tlbflush from bool to unsigned long : 1. But some
> of the users use true/false or may rely on the boolean property. So
> it sounds like to me you want to use bool bitfields here (and in the
> x86 part).

This is what we have (with this series applied):

root@ovs104> git grep need_tlbflush .
common/memory.c: bool need_tlbflush = false;
common/memory.c: accumulate_tlbflush(&need_tlbflush, &page[j],
common/memory.c: if ( need_tlbflush )
common/page_alloc.c: bool need_tlbflush = false;
common/page_alloc.c: accumulate_tlbflush(&need_tlbflush, &pg[i],
common/page_alloc.c: if ( need_tlbflush )
* common/page_alloc.c: pg[i].u.free.need_tlbflush = (page_get_owner(&pg[i]) != NULL);
common/page_alloc.c: if ( pg[i].u.free.need_tlbflush )
include/asm-arm/mm.h: unsigned long need_tlbflush:1;
include/asm-x86/mm.h: unsigned long need_tlbflush:1;
include/xen/mm.h:static inline void accumulate_tlbflush(bool *need_tlbflush,
include/xen/mm.h: if ( page->u.free.need_tlbflush &&
include/xen/mm.h: (!*need_tlbflush ||
include/xen/mm.h: *need_tlbflush = true;
root@ovs104>

The only possibly boolean-style use is '*' and even that I think is allowed.

Stand-alone need_tlbflush variables above have nothing to do with this change.

>
>> +
>> + /*
>> + * Index of the first *possibly* unscrubbed page in the buddy.
>> + * One more than maximum possible order to accommodate
>> + * INVALID_DIRTY_IDX.
>> + */
>> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
>> + unsigned long first_dirty:MAX_ORDER + 1;
>
> We need to make sure that this union will not be bigger than unsigned
> long. Otherwise this will limit lower down the maximum amount of
> memory we support.
> So this likely means a BUILD_BUG_ON(....).

Are you concerned that (MAX_ORDER+1) will be larger than sizeof(unsigned
long)? If yes, the compiler should complain anyway, shouldn't it?

-boris
On 08/07/2017 10:37 AM, Jan Beulich wrote:
>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 08/07/17 4:16 PM >>>
>> On 08/06/2017 01:41 PM, Jan Beulich wrote:
>>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 08/04/17 7:03 PM >>>
>>>> + /* See if any of the pages indeed need scrubbing. */
>>>> + if ( first_dirty != INVALID_DIRTY_IDX )
>>>> + {
>>>> + if ( (1U << cur_order) > first_dirty )
>>>> + {
>>>> + for ( i = first_dirty ; i < (1U << cur_order); i++ )
>>>> + if ( test_bit(_PGC_need_scrub,
>>>> + &cur_head[i].count_info) )
>>>> + {
>>>> + idx = i;
>>>> + break;
>>>> + }
>>> Why again do you need to look through all the pages here, rather than
>>> simply marking the chunks as needing scrubbing simply based on first_dirty?
>>> It seems to me that I've asked this before, which is a good indication that
>>> such special behavior would better have a comment attached.
>> We want to make sure that there are in fact dirty pages in the
>> (sub-)buddy: first_dirty is only a hint there *may be* some.
> But why is that needed? Iirc you don't go to such length when splitting a
> buddy during allocation.

I felt that allocation is more time-critical and so I decided against
trying to be "neat" as far as tracking dirty pages exactly.

-boris
Hi,

On 07/08/17 15:46, Boris Ostrovsky wrote:
> On 08/07/2017 06:45 AM, Julien Grall wrote:
>> Hi Boris,
>>
>> I would have appreciated to be CCed as maintainer of the ARM bits...
>> Please use scripts/get_maintainers.pl in the future.
>
> Ugh, sorry about that. (I did test builds for both ARM64 and ARM32, if
> this makes my transgression any less serious ;-))
>
>>
>> On 04/08/17 18:05, Boris Ostrovsky wrote:
>>> . so that it's easy to find pages that need to be scrubbed (those
>>> pages are
>>
>> Pointless .
>>
>>> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
>>> index ef84b72..d26b232 100644
>>> --- a/xen/include/asm-arm/mm.h
>>> +++ b/xen/include/asm-arm/mm.h
>>> @@ -44,7 +44,16 @@ struct page_info
>>> /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>>> struct {
>>> /* Do TLBs need flushing for safety before next page use? */
>>> - bool_t need_tlbflush;
>>> + unsigned long need_tlbflush:1;
>>
>> You've turned need_tlbflush from bool to unsigned long : 1. But some
>> of the users use true/false or may rely on the boolean property. So
>> it sounds like to me you want to use bool bitfields here (and in the
>> x86 part).
>
> This is what we have (with this series applied):
>
> root@ovs104> git grep need_tlbflush .
> common/memory.c: bool need_tlbflush = false;
> common/memory.c: accumulate_tlbflush(&need_tlbflush, &page[j],
> common/memory.c: if ( need_tlbflush )
> common/page_alloc.c: bool need_tlbflush = false;
> common/page_alloc.c: accumulate_tlbflush(&need_tlbflush, &pg[i],
> common/page_alloc.c: if ( need_tlbflush )
> * common/page_alloc.c: pg[i].u.free.need_tlbflush = (page_get_owner(&pg[i]) != NULL);
> common/page_alloc.c: if ( pg[i].u.free.need_tlbflush )
> include/asm-arm/mm.h: unsigned long need_tlbflush:1;
> include/asm-x86/mm.h: unsigned long need_tlbflush:1;
> include/xen/mm.h:static inline void accumulate_tlbflush(bool *need_tlbflush,
> include/xen/mm.h: if ( page->u.free.need_tlbflush &&
> include/xen/mm.h: (!*need_tlbflush ||
> include/xen/mm.h: *need_tlbflush = true;
> root@ovs104>
>
> The only possibly boolean-style use is '*' and even that I think is
> allowed.
>
> Stand-alone need_tlbflush variables above have nothing to do with this
> change.

If you look at it, all of them use bool semantics. Now you mix bool and
int. We are trying to remove that in the new code, so please don't
introduce a new one.

>
>
>>
>>> +
>>> + /*
>>> + * Index of the first *possibly* unscrubbed page in the buddy.
>>> + * One more than maximum possible order to accommodate
>>> + * INVALID_DIRTY_IDX.
>>> + */
>>> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
>>> + unsigned long first_dirty:MAX_ORDER + 1;
>>
>> We need to make sure that this union will not be bigger than unsigned
>> long. Otherwise this will limit lower down the maximum amount of
>> memory we support.
>> So this likely means a BUILD_BUG_ON(....).
>
> Are you concerned that (MAX_ORDER+1) will be larger than sizeof(unsigned
> long)? If yes, the compiler should complain anyway, shouldn't it?

I am more concerned about the sizeof of the union u being larger than
unsigned long.

first_dirty should not be greater than 63 bits (or 31 bits for 32-bit
architectures). Otherwise the compiler will likely add padding between
need_tlbflush and first_dirty. This would increase page_info by 4/8 bytes.

The BUILD_BUG_ON(...) would be here to catch such an error.

Cheers,
>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 08/07/17 4:56 PM >>>
>On 08/07/2017 10:37 AM, Jan Beulich wrote:
>>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 08/07/17 4:16 PM >>>
>>> On 08/06/2017 01:41 PM, Jan Beulich wrote:
>>>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 08/04/17 7:03 PM >>>
>>>>> + /* See if any of the pages indeed need scrubbing. */
>>>>> + if ( first_dirty != INVALID_DIRTY_IDX )
>>>>> + {
>>>>> + if ( (1U << cur_order) > first_dirty )
>>>>> + {
>>>>> + for ( i = first_dirty ; i < (1U << cur_order); i++ )
>>>>> + if ( test_bit(_PGC_need_scrub,
>>>>> + &cur_head[i].count_info) )
>>>>> + {
>>>>> + idx = i;
>>>>> + break;
>>>>> + }
>>>> Why again do you need to look through all the pages here, rather than
>>>> simply marking the chunks as needing scrubbing simply based on first_dirty?
>>>> It seems to me that I've asked this before, which is a good indication that
>>>> such special behavior would better have a comment attached.
>>> We want to make sure that there are in fact dirty pages in the
>>> (sub-)buddy: first_dirty is only a hint there *may be* some.
>> But why is that needed? Iirc you don't go to such length when splitting a
>> buddy during allocation.
>
>I felt that allocation is more time-critical and so I decided against
>trying to be "neat" as far as tracking dirty pages exactly.

I'd suggest using the simpler approach here too, if the more involved one
isn't needed for correctness.

Jan
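To make the trade-off concrete, here is a rough standalone model (not code
from this series, and not necessarily what a later revision does) of the
"simpler approach" being suggested: derive each chunk's first_dirty purely
from the running hint instead of scanning the chunk for pages with
_PGC_need_scrub set. A chunk whose pages all happen to be clean may then
still be queued for scrubbing, which is harmless because first_dirty only
means "first *possibly* dirty page". Names mirror the quoted hunk; the
values in main() are illustrative.

#include <stdio.h>

#define MAX_ORDER 20                                  /* assumed, for illustration */
#define INVALID_DIRTY_IDX ((1U << (MAX_ORDER + 1)) - 1)

/* Returns the chunk's first_dirty and updates the running hint. */
static unsigned int chunk_first_dirty(unsigned int cur_order,
                                      unsigned int *first_dirty)
{
    if ( *first_dirty == INVALID_DIRTY_IDX )
        return INVALID_DIRTY_IDX;        /* rest of the buddy is known clean */

    if ( *first_dirty < (1U << cur_order) )
    {
        unsigned int idx = *first_dirty; /* dirty part starts inside this chunk */
        *first_dirty = 0;                /* everything after it is possibly dirty */
        return idx;
    }

    *first_dirty -= 1U << cur_order;     /* dirty part starts past this chunk */
    return INVALID_DIRTY_IDX;
}

int main(void)
{
    unsigned int first_dirty = 5;        /* example: first possibly-dirty page */
    unsigned int idx0 = chunk_first_dirty(2, &first_dirty); /* 4-page chunk */
    unsigned int idx1 = chunk_first_dirty(3, &first_dirty); /* 8-page chunk */

    printf("chunk0 first_dirty=%u, chunk1 first_dirty=%u\n", idx0, idx1);
    return 0;
}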
>>>
>>>> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
>>>> index ef84b72..d26b232 100644
>>>> --- a/xen/include/asm-arm/mm.h
>>>> +++ b/xen/include/asm-arm/mm.h
>>>> @@ -44,7 +44,16 @@ struct page_info
>>>> /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>>>> struct {
>>>> /* Do TLBs need flushing for safety before next page use? */
>>>> - bool_t need_tlbflush;
>>>> + unsigned long need_tlbflush:1;
>>>
>>> You've turned need_tlbflush from bool to unsigned long : 1. But some
>>> of the users use true/false or may rely on the boolean property. So
>>> it sounds like to me you want to use bool bitfields here (and in the
>>> x86 part).
>>
>> This is what we have (with this series applied):
>>
>> root@ovs104> git grep need_tlbflush .
>> common/memory.c: bool need_tlbflush = false;
>> common/memory.c: accumulate_tlbflush(&need_tlbflush, &page[j],
>> common/memory.c: if ( need_tlbflush )
>> common/page_alloc.c: bool need_tlbflush = false;
>> common/page_alloc.c: accumulate_tlbflush(&need_tlbflush, &pg[i],
>> common/page_alloc.c: if ( need_tlbflush )
>> * common/page_alloc.c: pg[i].u.free.need_tlbflush = (page_get_owner(&pg[i]) != NULL);
>> common/page_alloc.c: if ( pg[i].u.free.need_tlbflush )
>> include/asm-arm/mm.h: unsigned long need_tlbflush:1;
>> include/asm-x86/mm.h: unsigned long need_tlbflush:1;
>> include/xen/mm.h:static inline void accumulate_tlbflush(bool *need_tlbflush,
>> include/xen/mm.h: if ( page->u.free.need_tlbflush &&
>> include/xen/mm.h: (!*need_tlbflush ||
>> include/xen/mm.h: *need_tlbflush = true;
>> root@ovs104>
>>
>> The only possibly boolean-style use is '*' and even that I think is
>> allowed.
>>
>> Stand-alone need_tlbflush variables above have nothing to do with this
>> change.
>
> If you look at it, all of them use bool semantics. Now you mix bool and
> int. We are trying to remove that in the new code, so please don't
> introduce a new one.

I am not sure I see how we are mixing the semantics here. <datatype>:1
is really a tightly-packed bool.

Switching to bitfields was, btw, suggested by Jan at some point so if
the two of you agree on how to proceed I can go either way (but my
preference is to keep it as a single-bit bitfield).

>
>>
>>
>>>
>>>> +
>>>> + /*
>>>> + * Index of the first *possibly* unscrubbed page in the buddy.
>>>> + * One more than maximum possible order to accommodate
>>>> + * INVALID_DIRTY_IDX.
>>>> + */
>>>> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
>>>> + unsigned long first_dirty:MAX_ORDER + 1;
>>>
>>> We need to make sure that this union will not be bigger than unsigned
>>> long. Otherwise this will limit lower down the maximum amount of
>>> memory we support.
>>> So this likely means a BUILD_BUG_ON(....).
>>
>>
>> Are you concerned that (MAX_ORDER+1) will be larger than sizeof(unsigned
>> long)? If yes, the compiler should complain anyway, shouldn't it?
>
> I am more concerned about the sizeof of the union u being larger than
> unsigned long.
>
> first_dirty should not be greater than 63 bits (or 31 bits for 32-bit
> architectures). Otherwise the compiler will likely add padding between
> need_tlbflush and first_dirty. This would increase page_info by 4/8 bytes.
>
> The BUILD_BUG_ON(...) would be here to catch such an error.

Oh, I see. Sure, I'll add it.

-boris
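For reference, a minimal sketch of the kind of compile-time check being
discussed; the exact expression and its placement are assumptions for
illustration, not code taken from a later revision of the series:

/*
 * Somewhere that struct page_info is visible (e.g. an init function in
 * page_alloc.c): fail the build if the free-list bitfields no longer fit
 * in a single word, which would silently grow struct page_info.
 */
BUILD_BUG_ON(sizeof(((struct page_info *)0)->u.free) > sizeof(unsigned long));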
On 07/08/17 17:57, Boris Ostrovsky wrote:
>
>>>>
>>>>> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
>>>>> index ef84b72..d26b232 100644
>>>>> --- a/xen/include/asm-arm/mm.h
>>>>> +++ b/xen/include/asm-arm/mm.h
>>>>> @@ -44,7 +44,16 @@ struct page_info
>>>>> /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>>>>> struct {
>>>>> /* Do TLBs need flushing for safety before next page use? */
>>>>> - bool_t need_tlbflush;
>>>>> + unsigned long need_tlbflush:1;
>>>>
>>>> You've turned need_tlbflush from bool to unsigned long : 1. But some
>>>> of the users use true/false or may rely on the boolean property. So
>>>> it sounds like to me you want to use bool bitfields here (and in the
>>>> x86 part).
>>>
>>> This is what we have (with this series applied):
>>>
>>> root@ovs104> git grep need_tlbflush .
>>> common/memory.c: bool need_tlbflush = false;
>>> common/memory.c: accumulate_tlbflush(&need_tlbflush, &page[j],
>>> common/memory.c: if ( need_tlbflush )
>>> common/page_alloc.c: bool need_tlbflush = false;
>>> common/page_alloc.c: accumulate_tlbflush(&need_tlbflush, &pg[i],
>>> common/page_alloc.c: if ( need_tlbflush )
>>> * common/page_alloc.c: pg[i].u.free.need_tlbflush = (page_get_owner(&pg[i]) != NULL);
>>> common/page_alloc.c: if ( pg[i].u.free.need_tlbflush )
>>> include/asm-arm/mm.h: unsigned long need_tlbflush:1;
>>> include/asm-x86/mm.h: unsigned long need_tlbflush:1;
>>> include/xen/mm.h:static inline void accumulate_tlbflush(bool *need_tlbflush,
>>> include/xen/mm.h: if ( page->u.free.need_tlbflush &&
>>> include/xen/mm.h: (!*need_tlbflush ||
>>> include/xen/mm.h: *need_tlbflush = true;
>>> root@ovs104>
>>>
>>> The only possibly boolean-style use is '*' and even that I think is
>>> allowed.
>>>
>>> Stand-alone need_tlbflush variables above have nothing to do with this
>>> change.
>>
>> If you look at it, all of them use bool semantics. Now you mix bool and
>> int. We are trying to remove that in the new code, so please don't
>> introduce a new one.
>
> I am not sure I see how we are mixing the semantics here. <datatype>:1
> is really a tightly-packed bool.

It is not a bool. It is an unsigned int. So if I am not mistaken, if you
assign 0x2, the variable will be 0 unless you do !!(...). Whilst with bool
you would get 1.

>
> Switching to bitfields was, btw, suggested by Jan at some point so if
> the two of you agree on how to proceed I can go either way (but my
> preference is to keep it as a single-bit bitfield).

If you use a single-bit bitfield of bool (i.e. bool need_flush : 1) you
would address both Jan's comment and mine.

Cheers,
>>
>> Switching to bitfields was, btw, suggested by Jan at some point so if
>> the two of you agree on how to proceed I can go either way (but my
>> preference is to keep it as a single-bit bitfield).
>
> If you use a single-bit bitfield of bool (i.e. bool need_flush : 1) you
> would address both Jan's comment and mine.

Haven't considered this. I thought you meant a fully-sized bool. I'll do
this, thanks.

-boris
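Putting the two review points together, the struct layout being converged
on would look roughly like this (a sketch only; the comment wording follows
Jan's earlier remark, and this is not necessarily the exact code that was
eventually committed):

        /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
        struct {
            /* Do TLBs need flushing for safety before next page use? */
            bool need_tlbflush:1;

            /*
             * Index of the first *possibly* unscrubbed page in the buddy.
             * One more bit than maximum possible order to accommodate
             * INVALID_DIRTY_IDX.
             */
#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
            unsigned long first_dirty:MAX_ORDER + 1;
        } free;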
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 8bcef6a..9e787e0 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -383,6 +383,8 @@ typedef struct page_list_head
     heap_by_zone_and_order_t[NR_ZONES][MAX_ORDER+1];
 static heap_by_zone_and_order_t *_heap[MAX_NUMNODES];
 #define heap(node, zone, order) ((*_heap[node])[zone][order])
 
+static unsigned long node_need_scrub[MAX_NUMNODES];
+
 static unsigned long *avail[MAX_NUMNODES];
 static long total_avail_pages;
 
@@ -678,13 +680,30 @@ static void check_low_mem_virq(void)
     }
 }
 
+/* Pages that need a scrub are added to tail, otherwise to head. */
+static void page_list_add_scrub(struct page_info *pg, unsigned int node,
+                                unsigned int zone, unsigned int order,
+                                unsigned int first_dirty)
+{
+    PFN_ORDER(pg) = order;
+    pg->u.free.first_dirty = first_dirty;
+
+    if ( first_dirty != INVALID_DIRTY_IDX )
+    {
+        ASSERT(first_dirty < (1U << order));
+        page_list_add_tail(pg, &heap(node, zone, order));
+    }
+    else
+        page_list_add(pg, &heap(node, zone, order));
+}
+
 /* Allocate 2^@order contiguous pages. */
 static struct page_info *alloc_heap_pages(
     unsigned int zone_lo, unsigned int zone_hi,
     unsigned int order, unsigned int memflags,
     struct domain *d)
 {
-    unsigned int i, j, zone = 0, nodemask_retry = 0;
+    unsigned int i, j, zone = 0, nodemask_retry = 0, first_dirty;
     nodeid_t first_node, node = MEMF_get_node(memflags), req_node = node;
     unsigned long request = 1UL << order;
     struct page_info *pg;
@@ -798,12 +817,26 @@ static struct page_info *alloc_heap_pages(
     return NULL;
 
  found: 
+
+    first_dirty = pg->u.free.first_dirty;
+
     /* We may have to halve the chunk a number of times. */
     while ( j != order )
     {
-        PFN_ORDER(pg) = --j;
-        page_list_add_tail(pg, &heap(node, zone, j));
-        pg += 1 << j;
+        j--;
+        page_list_add_scrub(pg, node, zone, j,
+                            (1U << j) > first_dirty ?
+                            first_dirty : INVALID_DIRTY_IDX);
+        pg += 1U << j;
+
+        if ( first_dirty != INVALID_DIRTY_IDX )
+        {
+            /* Adjust first_dirty */
+            if ( first_dirty >= 1U << j )
+                first_dirty -= 1U << j;
+            else
+                first_dirty = 0; /* We've moved past original first_dirty */
+        }
     }
 
     ASSERT(avail[node][zone] >= request);
@@ -850,12 +883,20 @@ static int reserve_offlined_page(struct page_info *head)
     unsigned int node = phys_to_nid(page_to_maddr(head));
     int zone = page_to_zone(head), i, head_order = PFN_ORDER(head), count = 0;
     struct page_info *cur_head;
-    int cur_order;
+    unsigned int cur_order, first_dirty;
 
     ASSERT(spin_is_locked(&heap_lock));
 
     cur_head = head;
 
+    /*
+     * We may break the buddy so let's mark the head as clean. Then, when
+     * merging chunks back into the heap, we will see whether the chunk has
+     * unscrubbed pages and set its first_dirty properly.
+     */
+    first_dirty = head->u.free.first_dirty;
+    head->u.free.first_dirty = INVALID_DIRTY_IDX;
+
     page_list_del(head, &heap(node, zone, head_order));
 
     while ( cur_head < (head + (1 << head_order)) )
@@ -866,6 +907,8 @@ static int reserve_offlined_page(struct page_info *head)
         if ( page_state_is(cur_head, offlined) )
         {
             cur_head++;
+            if ( first_dirty != INVALID_DIRTY_IDX && first_dirty )
+                first_dirty--;
             continue;
         }
 
@@ -873,6 +916,8 @@ static int reserve_offlined_page(struct page_info *head)
 
         while ( cur_order < head_order )
         {
+            unsigned int idx = INVALID_DIRTY_IDX;
+
             next_order = cur_order + 1;
 
             if ( (cur_head + (1 << next_order)) >= (head + ( 1 << head_order)) )
@@ -892,8 +937,28 @@ static int reserve_offlined_page(struct page_info *head)
             {
             merge:
                 /* We don't consider merging outside the head_order. */
-                page_list_add_tail(cur_head, &heap(node, zone, cur_order));
-                PFN_ORDER(cur_head) = cur_order;
+
+                /* See if any of the pages indeed need scrubbing. */
+                if ( first_dirty != INVALID_DIRTY_IDX )
+                {
+                    if ( (1U << cur_order) > first_dirty )
+                    {
+                        for ( i = first_dirty ; i < (1U << cur_order); i++ )
+                            if ( test_bit(_PGC_need_scrub,
+                                          &cur_head[i].count_info) )
+                            {
+                                idx = i;
+                                break;
+                            }
+                        ASSERT(idx != INVALID_DIRTY_IDX);
+                        first_dirty = 0;
+                    }
+                    else
+                        first_dirty -= 1U << cur_order;
+                }
+
+                page_list_add_scrub(cur_head, node, zone,
+                                    cur_order, idx);
                 cur_head += (1 << cur_order);
                 break;
             }
@@ -919,9 +984,53 @@ static int reserve_offlined_page(struct page_info *head)
     return count;
 }
 
+static void scrub_free_pages(unsigned int node)
+{
+    struct page_info *pg;
+    unsigned int zone;
+
+    ASSERT(spin_is_locked(&heap_lock));
+
+    if ( !node_need_scrub[node] )
+        return;
+
+    for ( zone = 0; zone < NR_ZONES; zone++ )
+    {
+        unsigned int order = MAX_ORDER;
+
+        do {
+            while ( !page_list_empty(&heap(node, zone, order)) )
+            {
+                unsigned int i;
+
+                /* Unscrubbed pages are always at the end of the list. */
+                pg = page_list_last(&heap(node, zone, order));
+                if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
+                    break;
+
+                for ( i = pg->u.free.first_dirty; i < (1U << order); i++)
+                {
+                    if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+                    {
+                        scrub_one_page(&pg[i]);
+                        pg[i].count_info &= ~PGC_need_scrub;
+                        node_need_scrub[node]--;
+                    }
+                }
+
+                page_list_del(pg, &heap(node, zone, order));
+                page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
+
+                if ( node_need_scrub[node] == 0 )
+                    return;
+            }
+        } while ( order-- != 0 );
+    }
+}
+
 /* Free 2^@order set of pages. */
 static void free_heap_pages(
-    struct page_info *pg, unsigned int order)
+    struct page_info *pg, unsigned int order, bool need_scrub)
 {
     unsigned long mask, mfn = page_to_mfn(pg);
     unsigned int i, node = phys_to_nid(page_to_maddr(pg)), tainted = 0;
@@ -961,10 +1070,20 @@ static void free_heap_pages(
         /* This page is not a guest frame any more. */
         page_set_owner(&pg[i], NULL); /* set_gpfn_from_mfn snoops pg owner */
         set_gpfn_from_mfn(mfn + i, INVALID_M2P_ENTRY);
+
+        if ( need_scrub )
+            pg[i].count_info |= PGC_need_scrub;
     }
 
     avail[node][zone] += 1 << order;
     total_avail_pages += 1 << order;
+    if ( need_scrub )
+    {
+        node_need_scrub[node] += 1 << order;
+        pg->u.free.first_dirty = 0;
+    }
+    else
+        pg->u.free.first_dirty = INVALID_DIRTY_IDX;
 
     if ( tmem_enabled() )
         midsize_alloc_zone_pages = max(
@@ -977,35 +1096,49 @@ static void free_heap_pages(
 
         if ( (page_to_mfn(pg) & mask) )
         {
+            struct page_info *predecessor = pg - mask;
+
             /* Merge with predecessor block? */
-            if ( !mfn_valid(_mfn(page_to_mfn(pg-mask))) ||
-                 !page_state_is(pg-mask, free) ||
-                 (PFN_ORDER(pg-mask) != order) ||
-                 (phys_to_nid(page_to_maddr(pg-mask)) != node) )
+            if ( !mfn_valid(_mfn(page_to_mfn(predecessor))) ||
+                 !page_state_is(predecessor, free) ||
+                 (PFN_ORDER(predecessor) != order) ||
+                 (phys_to_nid(page_to_maddr(predecessor)) != node) )
                 break;
-            pg -= mask;
-            page_list_del(pg, &heap(node, zone, order));
+
+            page_list_del(predecessor, &heap(node, zone, order));
+
+            /* Keep predecessor's first_dirty if it is already set. */
+            if ( predecessor->u.free.first_dirty == INVALID_DIRTY_IDX &&
+                 pg->u.free.first_dirty != INVALID_DIRTY_IDX )
+                predecessor->u.free.first_dirty = (1U << order) +
+                                                  pg->u.free.first_dirty;
+
+            pg = predecessor;
         }
         else
        {
+            struct page_info *successor = pg + mask;
+
             /* Merge with successor block? */
-            if ( !mfn_valid(_mfn(page_to_mfn(pg+mask))) ||
-                 !page_state_is(pg+mask, free) ||
-                 (PFN_ORDER(pg+mask) != order) ||
-                 (phys_to_nid(page_to_maddr(pg+mask)) != node) )
+            if ( !mfn_valid(_mfn(page_to_mfn(successor))) ||
+                 !page_state_is(successor, free) ||
+                 (PFN_ORDER(successor) != order) ||
+                 (phys_to_nid(page_to_maddr(successor)) != node) )
                 break;
-            page_list_del(pg + mask, &heap(node, zone, order));
+            page_list_del(successor, &heap(node, zone, order));
         }
 
         order++;
     }
 
-    PFN_ORDER(pg) = order;
-    page_list_add_tail(pg, &heap(node, zone, order));
+    page_list_add_scrub(pg, node, zone, order, pg->u.free.first_dirty);
 
     if ( tainted )
         reserve_offlined_page(pg);
 
+    if ( need_scrub )
+        scrub_free_pages(node);
+
     spin_unlock(&heap_lock);
 }
 
@@ -1226,7 +1359,7 @@ unsigned int online_page(unsigned long mfn, uint32_t *status)
     spin_unlock(&heap_lock);
 
     if ( (y & PGC_state) == PGC_state_offlined )
-        free_heap_pages(pg, 0);
+        free_heap_pages(pg, 0, false);
 
     return ret;
 }
@@ -1295,7 +1428,7 @@ static void init_heap_pages(
             nr_pages -= n;
         }
 
-        free_heap_pages(pg+i, 0);
+        free_heap_pages(pg + i, 0, false);
     }
 }
 
@@ -1622,7 +1755,7 @@ void free_xenheap_pages(void *v, unsigned int order)
 
     memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
 
-    free_heap_pages(virt_to_page(v), order);
+    free_heap_pages(virt_to_page(v), order, false);
 }
 
 #else
@@ -1676,12 +1809,9 @@ void free_xenheap_pages(void *v, unsigned int order)
     pg = virt_to_page(v);
 
     for ( i = 0; i < (1u << order); i++ )
-    {
-        scrub_one_page(&pg[i]);
         pg[i].count_info &= ~PGC_xen_heap;
-    }
 
-    free_heap_pages(pg, order);
+    free_heap_pages(pg, order, true);
 }
 
 #endif
@@ -1790,7 +1920,7 @@ struct page_info *alloc_domheap_pages(
     if ( d && !(memflags & MEMF_no_owner) &&
          assign_pages(d, pg, order, memflags) )
     {
-        free_heap_pages(pg, order);
+        free_heap_pages(pg, order, false);
         return NULL;
     }
 
@@ -1858,11 +1988,7 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
             scrub = 1;
         }
 
-        if ( unlikely(scrub) )
-            for ( i = 0; i < (1 << order); i++ )
-                scrub_one_page(&pg[i]);
-
-        free_heap_pages(pg, order);
+        free_heap_pages(pg, order, scrub);
     }
 
     if ( drop_dom_ref )
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index ef84b72..d26b232 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -44,7 +44,16 @@ struct page_info
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
         struct {
             /* Do TLBs need flushing for safety before next page use? */
-            bool_t need_tlbflush;
+            unsigned long need_tlbflush:1;
+
+            /*
+             * Index of the first *possibly* unscrubbed page in the buddy.
+             * One more than maximum possible order to accommodate
+             * INVALID_DIRTY_IDX.
+             */
+#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
+            unsigned long first_dirty:MAX_ORDER + 1;
+
         } free;
 
     } u;
@@ -107,6 +116,13 @@ struct page_info
 #define PGC_count_width PG_shift(9)
 #define PGC_count_mask ((1UL<<PGC_count_width)-1)
 
+/*
+ * Page needs to be scrubbed. Since this bit can only be set on a page that is
+ * free (i.e. in PGC_state_free) we can reuse PGC_allocated bit.
+ */
+#define _PGC_need_scrub _PGC_allocated
+#define PGC_need_scrub PGC_allocated
+
 extern mfn_t xenheap_mfn_start, xenheap_mfn_end;
 extern vaddr_t xenheap_virt_end;
 #ifdef CONFIG_ARM_64
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 2bf3f33..9b7fd05 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -88,7 +88,15 @@ struct page_info
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
         struct {
             /* Do TLBs need flushing for safety before next page use? */
-            bool_t need_tlbflush;
+            unsigned long need_tlbflush:1;
+
+            /*
+             * Index of the first *possibly* unscrubbed page in the buddy.
+             * One more than maximum possible order to accommodate
+             * INVALID_DIRTY_IDX.
+             */
+#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
+            unsigned long first_dirty:MAX_ORDER + 1;
         } free;
 
     } u;
@@ -233,6 +241,13 @@ struct page_info
 #define PGC_count_width PG_shift(9)
 #define PGC_count_mask ((1UL<<PGC_count_width)-1)
 
+/*
+ * Page needs to be scrubbed. Since this bit can only be set on a page that is
+ * free (i.e. in PGC_state_free) we can reuse PGC_allocated bit.
+ */
+#define _PGC_need_scrub _PGC_allocated
+#define PGC_need_scrub PGC_allocated
+
 #define is_xen_heap_page(page) ((page)->count_info & PGC_xen_heap)
 #define is_xen_heap_mfn(mfn) \
     (__mfn_valid(mfn) && is_xen_heap_page(__mfn_to_page(mfn)))
. so that it's easy to find pages that need to be scrubbed (those pages are
now marked with _PGC_need_scrub bit).

We keep track of the first unscrubbed page in a page buddy using the
first_dirty field. For now it can have two values, 0 (whole buddy needs
scrubbing) or INVALID_DIRTY_IDX (the buddy does not need to be scrubbed).
Subsequent patches will allow scrubbing to be interrupted, resulting in
first_dirty taking any value.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v6:
* Track indexes and not pointers in alloc_heap_pages() and
  reserve_offlined_page() when breaking a potentially dirty buddy and
  merging the fragments.
* Reduced (by one bit) width of INVALID_DIRTY_IDX
* Added a couple of ASSERT()s

 xen/common/page_alloc.c  | 194 ++++++++++++++++++++++++++++++++++++++---------
 xen/include/asm-arm/mm.h |  18 ++++-
 xen/include/asm-x86/mm.h |  17 ++++-
 3 files changed, 193 insertions(+), 36 deletions(-)