
[v5,3/8] mm: Scrub pages in alloc_heap_pages() if needed

Message ID: 1498157830-21845-4-git-send-email-boris.ostrovsky@oracle.com
State: New, archived

Commit Message

Boris Ostrovsky June 22, 2017, 6:57 p.m. UTC
When allocating pages in alloc_heap_pages(), first look for clean pages. If none
are found, retry, this time taking pages marked as unscrubbed and scrubbing them.

Note that we shouldn't find unscrubbed pages in alloc_heap_pages() yet. However,
this will become possible when we stop scrubbing from free_heap_pages() and
instead do it from the idle loop.

Since not all allocations require clean pages (such as xenheap allocations),
introduce a MEMF_no_scrub flag that callers can set if they are willing to
consume unscrubbed pages.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v5:
* Added comment explaining why we always grab order 0 pages in alloc_heap_pages()
* Dropped the somewhat confusing comment about not needing to set first_dirty
  in alloc_heap_pages().
* Moved the first bit of _MEMF_node up by 8 to accommodate MEMF_no_scrub (bit 7 is
  no longer available)


 xen/common/page_alloc.c | 36 +++++++++++++++++++++++++++++++-----
 xen/include/xen/mm.h    |  4 +++-
 2 files changed, 34 insertions(+), 6 deletions(-)
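
In outline, the allocation path this patch introduces looks as follows (a
simplified sketch of the alloc_heap_pages() hunk in the patch below, with
error handling and the per-page scrub loop elided):

    /* First pass: ask only for clean (already scrubbed) buddies. */
    pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d);

    /* Second pass: if no clean buddy exists and the caller didn't pass
     * MEMF_no_scrub, accept a dirty buddy; its pages are then scrubbed
     * one by one before being handed out. */
    if ( !pg && !(memflags & MEMF_no_scrub) )
        pg = get_free_buddy(zone_lo, zone_hi, order,
                            memflags | MEMF_no_scrub, d);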

Comments

Jan Beulich June 27, 2017, 6 p.m. UTC | #1
>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 06/22/17 8:55 PM >>>
> @@ -862,10 +879,19 @@ static struct page_info *alloc_heap_pages(
>      if ( d != NULL )
>          d->last_alloc_node = node;
>  
> +    need_scrub = !!first_dirty_pg && !(memflags & MEMF_no_scrub);

No need for !! here. But I wonder whether that part of the check is really
useful anyway, considering the sole use ...

>      for ( i = 0; i < (1 << order); i++ )
>      {
>          /* Reference count must continuously be zero for free pages. */
> -        BUG_ON(pg[i].count_info != PGC_state_free);
> +        BUG_ON((pg[i].count_info & ~PGC_need_scrub) != PGC_state_free);
> +
> +        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
> +        {
> +            if ( need_scrub )
> +                scrub_one_page(&pg[i]);

... here. If it isn't, I think the local variable isn't warranted either.
If you agree, the thus adjusted patch can have
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(otherwise I'll wait with it to understand the reason first).

Jan
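
Since && already yields 0 or 1 in C, the !! is redundant; with the
first_dirty_pg part dropped as suggested, the assignment would reduce to
something like this sketch:

    need_scrub = !(memflags & MEMF_no_scrub);

or go away entirely if memflags is tested directly inside the loop, as
discussed below.
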
Boris Ostrovsky July 23, 2017, 2:07 a.m. UTC | #2
On 06/27/2017 02:00 PM, Jan Beulich wrote:
>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 06/22/17 8:55 PM >>>
>> @@ -862,10 +879,19 @@ static struct page_info *alloc_heap_pages(
>>       if ( d != NULL )
>>           d->last_alloc_node = node;
>>   
>> +    need_scrub = !!first_dirty_pg && !(memflags & MEMF_no_scrub);
> 
> No need for !! here. But I wonder whether that part of the check is really
> useful anyway, considering the sole use ...
> 
>>       for ( i = 0; i < (1 << order); i++ )
>>       {
>>           /* Reference count must continuously be zero for free pages. */
>> -        BUG_ON(pg[i].count_info != PGC_state_free);
>> +        BUG_ON((pg[i].count_info & ~PGC_need_scrub) != PGC_state_free);
>> +
>> +        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
>> +        {
>> +            if ( need_scrub )
>> +                scrub_one_page(&pg[i]);
> 
> ... here. If it isn't, I think the local variable isn't warranted either.
> If you agree, the thus adjusted patch can have
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> (otherwise I'll wait with it to understand the reason first).



first_dirty_pg is indeed unnecessary, but I think the local variable is
useful to avoid ANDing memflags inside the loop on each iteration
(unless you think the compiler is smart enough to realize that memflags is
not changing).


-boris
Jan Beulich July 31, 2017, 3:16 p.m. UTC | #3
>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 07/23/17 4:07 AM >>>
>On 06/27/2017 02:00 PM, Jan Beulich wrote:
>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 06/22/17 8:55 PM >>>
>>> @@ -862,10 +879,19 @@ static struct page_info *alloc_heap_pages(
>>>       if ( d != NULL )
>>>           d->last_alloc_node = node;
>>>   
>>> +    need_scrub = !!first_dirty_pg && !(memflags & MEMF_no_scrub);
>> 
>> No need for !! here. But I wonder whether that part of the check is really
>> useful anyway, considering the sole use ...
>> 
>>>       for ( i = 0; i < (1 << order); i++ )
>>>       {
>>>           /* Reference count must continuously be zero for free pages. */
>>> -        BUG_ON(pg[i].count_info != PGC_state_free);
>>> +        BUG_ON((pg[i].count_info & ~PGC_need_scrub) != PGC_state_free);
>>> +
>>> +        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
>>> +        {
>>> +            if ( need_scrub )
>>> +                scrub_one_page(&pg[i]);
>> 
>> ... here. If it isn't, I think the local variable isn't warranted either.
>> If you agree, the thus adjusted patch can have
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> (otherwise I'll wait with it to understand the reason first).
>
>first_dirty_pg is indeed unnecessary, but I think the local variable is
>useful to avoid ANDing memflags inside the loop on each iteration
>(unless you think the compiler is smart enough to realize that memflags is
>not changing).

I don't understand: At least on x86 I'd expect the compiler to use a
single TEST if you used memflags inside the loop, whereas the local
variable would likely be a single CMP inside the loop plus setup code
outside of it.

Jan
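
To make the code-generation point concrete, here is a sketch of the two
variants under discussion (what the compiler actually emits depends on the
compiler and flags; the x86 instruction names follow Jan's description):

    /* Variant 1: test memflags directly. memflags does not change, so on
     * x86 this can be a single TEST of a register per iteration. */
    for ( i = 0; i < (1 << order); i++ )
    {
        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
        {
            if ( !(memflags & MEMF_no_scrub) )
                scrub_one_page(&pg[i]);
            node_need_scrub[node]--;
        }
    }

    /* Variant 2: hoist the test into a local. This still costs a CMP (or
     * TEST) of the local per iteration, plus the setup code outside the
     * loop, so nothing is saved over Variant 1. */
    need_scrub = !(memflags & MEMF_no_scrub);
    for ( i = 0; i < (1 << order); i++ )
    {
        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
        {
            if ( need_scrub )
                scrub_one_page(&pg[i]);
            node_need_scrub[node]--;
        }
    }
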
Boris Ostrovsky July 31, 2017, 4:07 p.m. UTC | #4
On 07/31/2017 11:16 AM, Jan Beulich wrote:
>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 07/23/17 4:07 AM >>>
>> On 06/27/2017 02:00 PM, Jan Beulich wrote:
>>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 06/22/17 8:55 PM >>>
>>>> @@ -862,10 +879,19 @@ static struct page_info *alloc_heap_pages(
>>>>       if ( d != NULL )
>>>>           d->last_alloc_node = node;
>>>>   
>>>> +    need_scrub = !!first_dirty_pg && !(memflags & MEMF_no_scrub);
>>> No need for !! here. But I wonder whether that part of the check is really
>>> useful anyway, considering the sole use ...
>>>
>>>>       for ( i = 0; i < (1 << order); i++ )
>>>>       {
>>>>           /* Reference count must continuously be zero for free pages. */
>>>> -        BUG_ON(pg[i].count_info != PGC_state_free);
>>>> +        BUG_ON((pg[i].count_info & ~PGC_need_scrub) != PGC_state_free);
>>>> +
>>>> +        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
>>>> +        {
>>>> +            if ( need_scrub )
>>>> +                scrub_one_page(&pg[i]);
>>> ... here. If it isn't, I think the local variable isn't warranted either.
>>> If you agree, the thus adjusted patch can have
>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>> (otherwise I'll wait with it to understand the reason first).
>> first_dirty_pg is indeed unnecessary, but I think the local variable is
>> useful to avoid ANDing memflags inside the loop on each iteration
>> (unless you think the compiler is smart enough to realize that memflags is
>> not changing).
> I don't understand: At least on x86 I'd expect the compiler to use a
> single TEST if you used memflags inside the loop, whereas the local
> variable would likely be a single CMP inside the loop plus setup code
> outside of it.

OK, I hadn't considered that you don't actually need to AND and then
CMP. Then yes, the local variable is unnecessary.

-boris

Patch

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 89fe3ce..9aac196 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -703,6 +703,7 @@  static struct page_info *get_free_buddy(unsigned int zone_lo,
     nodemask_t nodemask = d ? d->node_affinity : node_online_map;
     unsigned int j, zone, nodemask_retry = 0;
     struct page_info *pg;
+    bool use_unscrubbed = (memflags & MEMF_no_scrub);
 
     if ( node == NUMA_NO_NODE )
     {
@@ -734,8 +735,20 @@  static struct page_info *get_free_buddy(unsigned int zone_lo,
 
             /* Find smallest order which can satisfy the request. */
             for ( j = order; j <= MAX_ORDER; j++ )
+            {
                 if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
-                    return pg;
+                {
+                    /*
+                     * We grab single pages (order=0) even if they are
+                     * unscrubbed. Given that scrubbing one page is fairly quick
+                     * it is not worth breaking higher orders.
+                     */
+                    if ( (order == 0) || use_unscrubbed ||
+                         pg->u.free.first_dirty == INVALID_DIRTY_IDX)
+                        return pg;
+                    page_list_add_tail(pg, &heap(node, zone, j));
+                }
+            }
         } while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */
 
         if ( (memflags & MEMF_exact_node) && req_node != NUMA_NO_NODE )
@@ -775,7 +788,7 @@  static struct page_info *alloc_heap_pages(
     unsigned int i, buddy_order, zone;
     unsigned long request = 1UL << order;
     struct page_info *pg, *first_dirty_pg = NULL;
-    bool_t need_tlbflush = 0;
+    bool need_scrub, need_tlbflush = false;
     uint32_t tlbflush_timestamp = 0;
 
     /* Make sure there are enough bits in memflags for nodeID. */
@@ -819,6 +832,10 @@  static struct page_info *alloc_heap_pages(
      }
  
     pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d);
+    /* Try getting a dirty buddy if we couldn't get a clean one. */
+    if ( !pg && !(memflags & MEMF_no_scrub) )
+        pg = get_free_buddy(zone_lo, zone_hi, order,
+                            memflags | MEMF_no_scrub, d);
     if ( !pg )
     {
         /* No suitable memory blocks. Fail the request. */
@@ -862,10 +879,19 @@  static struct page_info *alloc_heap_pages(
     if ( d != NULL )
         d->last_alloc_node = node;
 
+    need_scrub = !!first_dirty_pg && !(memflags & MEMF_no_scrub);
     for ( i = 0; i < (1 << order); i++ )
     {
         /* Reference count must continuously be zero for free pages. */
-        BUG_ON(pg[i].count_info != PGC_state_free);
+        BUG_ON((pg[i].count_info & ~PGC_need_scrub) != PGC_state_free);
+
+        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+        {
+            if ( need_scrub )
+                scrub_one_page(&pg[i]);
+            node_need_scrub[node]--;
+        }
+
         pg[i].count_info = PGC_state_inuse;
 
         if ( !(memflags & MEMF_no_tlbflush) )
@@ -1749,7 +1775,7 @@  void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
     ASSERT(!in_irq());
 
     pg = alloc_heap_pages(MEMZONE_XEN, MEMZONE_XEN,
-                          order, memflags, NULL);
+                          order, memflags | MEMF_no_scrub, NULL);
     if ( unlikely(pg == NULL) )
         return NULL;
 
@@ -1799,7 +1825,7 @@  void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
     if ( !(memflags >> _MEMF_bits) )
         memflags |= MEMF_bits(xenheap_bits);
 
-    pg = alloc_domheap_pages(NULL, order, memflags);
+    pg = alloc_domheap_pages(NULL, order, memflags | MEMF_no_scrub);
     if ( unlikely(pg == NULL) )
         return NULL;
 
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 3d3f31b..5f3d84a 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -238,7 +238,9 @@  struct npfec {
 #define  MEMF_no_tlbflush (1U<<_MEMF_no_tlbflush)
 #define _MEMF_no_icache_flush 7
 #define  MEMF_no_icache_flush (1U<<_MEMF_no_icache_flush)
-#define _MEMF_node        8
+#define _MEMF_no_scrub    8
+#define  MEMF_no_scrub    (1U<<_MEMF_no_scrub)
+#define _MEMF_node        16
 #define  MEMF_node_mask   ((1U << (8 * sizeof(nodeid_t))) - 1)
 #define  MEMF_node(n)     ((((n) + 1) & MEMF_node_mask) << _MEMF_node)
 #define  MEMF_get_node(f) ((((f) >> _MEMF_node) - 1) & MEMF_node_mask)
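
A standalone illustration of the adjusted flag layout (a hypothetical test
harness, not Xen code; nodeid_t is assumed to be 8 bits wide, and the macro
values are copied from the hunk above):

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint8_t nodeid_t;

    #define _MEMF_no_scrub    8
    #define  MEMF_no_scrub    (1U << _MEMF_no_scrub)
    #define _MEMF_node        16
    #define  MEMF_node_mask   ((1U << (8 * sizeof(nodeid_t))) - 1)
    #define  MEMF_node(n)     ((((n) + 1) & MEMF_node_mask) << _MEMF_node)
    #define  MEMF_get_node(f) ((((f) >> _MEMF_node) - 1) & MEMF_node_mask)

    int main(void)
    {
        unsigned int flags = MEMF_node(3) | MEMF_no_scrub;

        /* With _MEMF_node moved to bit 16, the node field no longer
         * overlaps MEMF_no_scrub at bit 8. */
        assert(MEMF_get_node(flags) == 3);
        assert(flags & MEMF_no_scrub);
        printf("flags = %#x, node = %u\n", flags, MEMF_get_node(flags));
        return 0;
    }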