
[v7,2/9] mm: Place unscrubbed pages at the end of pagelist

Message ID: 1502228707-31883-3-git-send-email-boris.ostrovsky@oracle.com
State: New, archived

Commit Message

Boris Ostrovsky Aug. 8, 2017, 9:45 p.m. UTC
.. so that it's easy to find pages that need to be scrubbed (those pages are
now marked with the _PGC_need_scrub bit).

We keep track of the first unscrubbed page in a page buddy using the first_dirty
field. For now it can have two values: 0 (the whole buddy needs scrubbing) or
INVALID_DIRTY_IDX (the buddy does not need to be scrubbed). Subsequent patches
will allow scrubbing to be interrupted, resulting in first_dirty taking any
value.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v7
* Simplified tracking of dirty pages in reserve_offlined_page() to match how
  it is done in alloc_heap_pages()
* Changed page_info.u.free.need_tlbflush definition to bool:1
* Added BUILD_BUG_ON() to make sure page_info.u stays within unsigned long

 xen/common/page_alloc.c  | 157 ++++++++++++++++++++++++++++++++++++++++-------
 xen/include/asm-arm/mm.h |  18 +++++-
 xen/include/asm-x86/mm.h |  17 ++++-
 3 files changed, 167 insertions(+), 25 deletions(-)
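
In short, the placement convention reduces to the page_list_add_scrub() helper
this patch introduces (condensed here from the diff below, with the ASSERT()
dropped):

static void page_list_add_scrub(struct page_info *pg, unsigned int node,
                                unsigned int zone, unsigned int order,
                                unsigned int first_dirty)
{
    PFN_ORDER(pg) = order;
    pg->u.free.first_dirty = first_dirty;

    if ( first_dirty != INVALID_DIRTY_IDX )
        /* Buddy still contains dirty pages: queue at the tail for scrubbing. */
        page_list_add_tail(pg, &heap(node, zone, order));
    else
        /* Fully clean buddy: queue at the head. */
        page_list_add(pg, &heap(node, zone, order));
}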

Comments

Jan Beulich Aug. 14, 2017, 10:37 a.m. UTC | #1
>>> On 08.08.17 at 23:45, <boris.ostrovsky@oracle.com> wrote:
> .. so that it's easy to find pages that need to be scrubbed (those pages are
> now marked with _PGC_need_scrub bit).
> 
> We keep track of the first unscrubbed page in a page buddy using first_dirty
> field. For now it can have two values, 0 (whole buddy needs scrubbing) or
> INVALID_DIRTY_IDX (the buddy does not need to be scrubbed). Subsequent 
> patches
> will allow scrubbing to be interrupted, resulting in first_dirty taking any
> value.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with one remark:

> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -88,7 +88,15 @@ struct page_info
>          /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>          struct {
>              /* Do TLBs need flushing for safety before next page use? */
> -            bool_t need_tlbflush;
> +            bool need_tlbflush:1;
> +
> +            /*
> +             * Index of the first *possibly* unscrubbed page in the buddy.
> +             * One more bit than maximum possible order to accommodate
> +             * INVALID_DIRTY_IDX.
> +             */
> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
> +            unsigned long first_dirty:MAX_ORDER + 1;
>          } free;

I think generated code will be better with the two fields swapped:
That way reading first_dirty won't involve a shift, and accessing a
single bit doesn't require shifts at all on many architectures.

Jan
Julien Grall Aug. 14, 2017, 11:16 a.m. UTC | #2
Hi Boris,

On 08/08/17 22:45, Boris Ostrovsky wrote:
> .. so that it's easy to find pages that need to be scrubbed (those pages are
> now marked with _PGC_need_scrub bit).
>
> We keep track of the first unscrubbed page in a page buddy using first_dirty
> field. For now it can have two values, 0 (whole buddy needs scrubbing) or
> INVALID_DIRTY_IDX (the buddy does not need to be scrubbed). Subsequent patches
> will allow scrubbing to be interrupted, resulting in first_dirty taking any
> value.
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

For the ARM bits and with Jan suggestion:

Acked-by: Julien Grall <julien.grall@arm.com>

Cheers,
Boris Ostrovsky Aug. 14, 2017, 2:29 p.m. UTC | #3
On 08/14/2017 06:37 AM, Jan Beulich wrote:
>>>> On 08.08.17 at 23:45, <boris.ostrovsky@oracle.com> wrote:
>> .. so that it's easy to find pages that need to be scrubbed (those pages are
>> now marked with _PGC_need_scrub bit).
>>
>> We keep track of the first unscrubbed page in a page buddy using first_dirty
>> field. For now it can have two values, 0 (whole buddy needs scrubbing) or
>> INVALID_DIRTY_IDX (the buddy does not need to be scrubbed). Subsequent 
>> patches
>> will allow scrubbing to be interrupted, resulting in first_dirty taking any
>> value.
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> with one remark:
>
>> --- a/xen/include/asm-x86/mm.h
>> +++ b/xen/include/asm-x86/mm.h
>> @@ -88,7 +88,15 @@ struct page_info
>>          /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>>          struct {
>>              /* Do TLBs need flushing for safety before next page use? */
>> -            bool_t need_tlbflush;
>> +            bool need_tlbflush:1;
>> +
>> +            /*
>> +             * Index of the first *possibly* unscrubbed page in the buddy.
>> +             * One more bit than maximum possible order to accommodate
>> +             * INVALID_DIRTY_IDX.
>> +             */
>> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
>> +            unsigned long first_dirty:MAX_ORDER + 1;
>>          } free;
> I think generated code will be better with the two fields swapped:
> That way reading first_dirty won't involve a shift, and accessing a
> single bit doesn't require shifts at all on many architectures.

Ok, I will then keep need_tlbflush as the last field so the final struct
(as defined in patch 7) will look like

struct {
        unsigned long first_dirty:MAX_ORDER + 1;
        unsigned long scrub_state:2;
        bool need_tlbflush:1;
};


-boris
Jan Beulich Aug. 15, 2017, 8:18 a.m. UTC | #4
>>> On 14.08.17 at 16:29, <boris.ostrovsky@oracle.com> wrote:
> On 08/14/2017 06:37 AM, Jan Beulich wrote:
>>>>> On 08.08.17 at 23:45, <boris.ostrovsky@oracle.com> wrote:
>>> --- a/xen/include/asm-x86/mm.h
>>> +++ b/xen/include/asm-x86/mm.h
>>> @@ -88,7 +88,15 @@ struct page_info
>>>          /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>>>          struct {
>>>              /* Do TLBs need flushing for safety before next page use? */
>>> -            bool_t need_tlbflush;
>>> +            bool need_tlbflush:1;
>>> +
>>> +            /*
>>> +             * Index of the first *possibly* unscrubbed page in the buddy.
>>> +             * One more bit than maximum possible order to accommodate
>>> +             * INVALID_DIRTY_IDX.
>>> +             */
>>> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
>>> +            unsigned long first_dirty:MAX_ORDER + 1;
>>>          } free;
>> I think generated code will be better with the two fields swapped:
>> That way reading first_dirty won't involve a shift, and accessing a
>> single bit doesn't require shifts at all on many architectures.
> 
> Ok, I will then keep need_tlbflush as the last field so the final struct
> (as defined in patch 7) will look like
> 
> struct {
>         unsigned long first_dirty:MAX_ORDER + 1;
>         unsigned long scrub_state:2;
>         bool need_tlbflush:1;
> };

Hmm, actually - why do you need bitfields on the x86 side at all?
They're needed for 32-bit architectures only, 64-bit ones ought
to be fine with

struct {
        unsigned int first_dirty;
        bool need_tlbflush;
        uint8_t scrub_state;
};

(plus a suitable BUILD_BUG_ON() to make sure first_dirty has
at least MAX_ORDER + 1 bits). The ARM maintainers will know
whether they would want to also differentiate ARM32 and
ARM64 here.

Jan
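
One possible shape for such a check, as a sketch only (the exact form is not
part of the posted series), next to the BUILD_BUG_ON() this patch already adds
in init_heap_pages():

    /* A plain first_dirty must still hold INVALID_DIRTY_IDX, i.e. provide
     * at least MAX_ORDER + 1 bits. */
    BUILD_BUG_ON(MAX_ORDER + 1 >
                 sizeof(((struct page_info *)0)->u.free.first_dirty) * 8);
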
Boris Ostrovsky Aug. 15, 2017, 2:41 p.m. UTC | #5
On 08/15/2017 04:18 AM, Jan Beulich wrote:
>>>> On 14.08.17 at 16:29, <boris.ostrovsky@oracle.com> wrote:
>> On 08/14/2017 06:37 AM, Jan Beulich wrote:
>>>>>> On 08.08.17 at 23:45, <boris.ostrovsky@oracle.com> wrote:
>>>> --- a/xen/include/asm-x86/mm.h
>>>> +++ b/xen/include/asm-x86/mm.h
>>>> @@ -88,7 +88,15 @@ struct page_info
>>>>          /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>>>>          struct {
>>>>              /* Do TLBs need flushing for safety before next page use? */
>>>> -            bool_t need_tlbflush;
>>>> +            bool need_tlbflush:1;
>>>> +
>>>> +            /*
>>>> +             * Index of the first *possibly* unscrubbed page in the buddy.
>>>> +             * One more bit than maximum possible order to accommodate
>>>> +             * INVALID_DIRTY_IDX.
>>>> +             */
>>>> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
>>>> +            unsigned long first_dirty:MAX_ORDER + 1;
>>>>          } free;
>>> I think generated code will be better with the two fields swapped:
>>> That way reading first_dirty won't involve a shift, and accessing a
>>> single bit doesn't require shifts at all on many architectures.
>> Ok, I will then keep need_tlbflush as the last field so the final struct
>> (as defined in patch 7) will look like
>>
>> struct {
>>         unsigned long first_dirty:MAX_ORDER + 1;
>>         unsigned long scrub_state:2;
>>         bool need_tlbflush:1;
>> };
> Hmm, actually - why do you need bitfields on the x86 side at all?
> They're needed for 32-bit architectures only, 64-bit ones ought
> to be fine with
>
> struct {
>         unsigned int first_dirty;
>         bool need_tlbflush;
>         uint8_t scrub_state;
> };

IIRC it was exactly because of ARM32 and at some point you suggested to
switch both x86 and ARM to bitfields.


>
> (plus a suitable BUILD_BUG_ON() to make sure first_dirty has
> at least MAX_ORDER + 1 bits). The ARM maintainers will know
> whether they would want to also differentiate ARM32 and
> ARM64 here.

Isn't using bitfields the only possibility for 32-bit? We can't fit
first_dirty into 2 bytes.

-boris
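
The arithmetic behind that, for reference (assuming Xen's default MAX_ORDER of
20 and counting the scrub_state field patch 7 adds):

/*
 * first_dirty must be able to hold INVALID_DIRTY_IDX, i.e. needs
 * MAX_ORDER + 1 = 21 bits, which is why it cannot fit in 2 bytes.
 *
 *   bitfield layout: 21 + 2 + 1 = 24 bits          -> fits one 32-bit long
 *   plain layout:    unsigned int + bool + uint8_t -> more than 32 bits,
 *                    tripping the new BUILD_BUG_ON(sizeof(pg->u) !=
 *                    sizeof(unsigned long)) on a 32-bit arch
 */
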
Jan Beulich Aug. 15, 2017, 2:51 p.m. UTC | #6
>>> On 15.08.17 at 16:41, <boris.ostrovsky@oracle.com> wrote:
> On 08/15/2017 04:18 AM, Jan Beulich wrote:
>>>>> On 14.08.17 at 16:29, <boris.ostrovsky@oracle.com> wrote:
>>> On 08/14/2017 06:37 AM, Jan Beulich wrote:
>>>>>>> On 08.08.17 at 23:45, <boris.ostrovsky@oracle.com> wrote:
>>>>> --- a/xen/include/asm-x86/mm.h
>>>>> +++ b/xen/include/asm-x86/mm.h
>>>>> @@ -88,7 +88,15 @@ struct page_info
>>>>>          /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>>>>>          struct {
>>>>>              /* Do TLBs need flushing for safety before next page use? */
>>>>> -            bool_t need_tlbflush;
>>>>> +            bool need_tlbflush:1;
>>>>> +
>>>>> +            /*
>>>>> +             * Index of the first *possibly* unscrubbed page in the buddy.
>>>>> +             * One more bit than maximum possible order to accommodate
>>>>> +             * INVALID_DIRTY_IDX.
>>>>> +             */
>>>>> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
>>>>> +            unsigned long first_dirty:MAX_ORDER + 1;
>>>>>          } free;
>>>> I think generated code will be better with the two fields swapped:
>>>> That way reading first_dirty won't involve a shift, and accessing a
>>>> single bit doesn't require shifts at all on many architectures.
>>> Ok, I will then keep need_tlbflush as the last field so the final struct
>>> (as defined in patch 7) will look like
>>>
>>> struct {
>>>         unsigned long first_dirty:MAX_ORDER + 1;
>>>         unsigned long scrub_state:2;
>>>         bool need_tlbflush:1;
>>> };
>> Hmm, actually - why do you need bitfields on the x86 side at all?
>> They're needed for 32-bit architectures only, 64-bit ones ought
>> to be fine with
>>
>> struct {
>>         unsigned int first_dirty;
>>         bool need_tlbflush;
>>         uint8_t scrub_state;
>> };
> 
> IIRC it was exactly because of ARM32 and at some point you suggested to
> switch both x86 and ARM to bitfields.

I don't recall for sure whether I had asked for the change to be done
uniformly; it was certainly ARM32 that triggered me to notice the
structure size change in your original version.

>> (plus a suitable BUILD_BUG_ON() to make sure first_dirty has
>> at least MAX_ORDER + 1 bits). The ARM maintainers will know
>> whether they would want to also differentiate ARM32 and
>> ARM64 here.
> 
> Isn't using bitfields the only possibility for 32-bit? We can't fit
> first_dirty into 2 bytes.

Yes, hence the question whether to stay with bitfields uniformly
or make ARM64 follow x86, but ARM32 keep using bitfields.

Jan
Julien Grall Aug. 15, 2017, 2:52 p.m. UTC | #7
Hi Jan,

On 15/08/17 15:51, Jan Beulich wrote:
>>>> On 15.08.17 at 16:41, <boris.ostrovsky@oracle.com> wrote:
>> On 08/15/2017 04:18 AM, Jan Beulich wrote:
>>>>>> On 14.08.17 at 16:29, <boris.ostrovsky@oracle.com> wrote:
>>>> On 08/14/2017 06:37 AM, Jan Beulich wrote:
>>>>>>>> On 08.08.17 at 23:45, <boris.ostrovsky@oracle.com> wrote:
>>>>>> --- a/xen/include/asm-x86/mm.h
>>>>>> +++ b/xen/include/asm-x86/mm.h
>>>>>> @@ -88,7 +88,15 @@ struct page_info
>>>>>>          /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>>>>>>          struct {
>>>>>>              /* Do TLBs need flushing for safety before next page use? */
>>>>>> -            bool_t need_tlbflush;
>>>>>> +            bool need_tlbflush:1;
>>>>>> +
>>>>>> +            /*
>>>>>> +             * Index of the first *possibly* unscrubbed page in the buddy.
>>>>>> +             * One more bit than maximum possible order to accommodate
>>>>>> +             * INVALID_DIRTY_IDX.
>>>>>> +             */
>>>>>> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
>>>>>> +            unsigned long first_dirty:MAX_ORDER + 1;
>>>>>>          } free;
>>>>> I think generated code will be better with the two fields swapped:
>>>>> That way reading first_dirty won't involve a shift, and accessing a
>>>>> single bit doesn't require shifts at all on many architectures.
>>>> Ok, I will then keep need_tlbflush as the last field so the final struct
>>>> (as defined in patch 7) will look like
>>>>
>>>> struct {
>>>>         unsigned long first_dirty:MAX_ORDER + 1;
>>>>         unsigned long scrub_state:2;
>>>>         bool need_tlbflush:1;
>>>> };
>>> Hmm, actually - why do you need bitfields on the x86 side at all?
>>> They're needed for 32-bit architectures only, 64-bit ones ought
>>> to be fine with
>>>
>>> struct {
>>>         unsigned int first_dirty;
>>>         bool need_tlbflush;
>>>         uint8_t scrub_state;
>>> };
>>
>> IIRC it was exactly because of ARM32 and at some point you suggested to
>> switch both x86 and ARM to bitfields.
>
> I don't recall for sure whether I had asked for the change to be done
> uniformly; it was certainly ARM32 that triggered me to notice the
> structure size change in your original version.
>
>>> (plus a suitable BUILD_BUG_ON() to make sure first_dirty has
>>> at least MAX_ORDER + 1 bits). The ARM maintainers will know
>>> whether they would want to also differentiate ARM32 and
>>> ARM64 here.
>>
>> Isn't using bitfields the only possibility for 32-bit? We can't fit
>> first_dirty into 2 bytes.
>
> Yes, hence the question whether to stay with bitfields uniformly
> or make ARM64 follow x86, but ARM32 keep using bitfields.

I would prefer to avoid differentiation between Arm32 and Arm64.

Cheers,
Boris Ostrovsky Aug. 15, 2017, 3:03 p.m. UTC | #8
On 08/15/2017 10:52 AM, Julien Grall wrote:
> Hi Jan,
>
> On 15/08/17 15:51, Jan Beulich wrote:
>>>>> On 15.08.17 at 16:41, <boris.ostrovsky@oracle.com> wrote:
>>> On 08/15/2017 04:18 AM, Jan Beulich wrote:
>>>>>>> On 14.08.17 at 16:29, <boris.ostrovsky@oracle.com> wrote:
>>>>> On 08/14/2017 06:37 AM, Jan Beulich wrote:
>>>>>>>>> On 08.08.17 at 23:45, <boris.ostrovsky@oracle.com> wrote:
>>>>>>> --- a/xen/include/asm-x86/mm.h
>>>>>>> +++ b/xen/include/asm-x86/mm.h
>>>>>>> @@ -88,7 +88,15 @@ struct page_info
>>>>>>>          /* Page is on a free list: ((count_info &
>>>>>>> PGC_count_mask) == 0). */
>>>>>>>          struct {
>>>>>>>              /* Do TLBs need flushing for safety before next
>>>>>>> page use? */
>>>>>>> -            bool_t need_tlbflush;
>>>>>>> +            bool need_tlbflush:1;
>>>>>>> +
>>>>>>> +            /*
>>>>>>> +             * Index of the first *possibly* unscrubbed page in
>>>>>>> the buddy.
>>>>>>> +             * One more bit than maximum possible order to
>>>>>>> accommodate
>>>>>>> +             * INVALID_DIRTY_IDX.
>>>>>>> +             */
>>>>>>> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
>>>>>>> +            unsigned long first_dirty:MAX_ORDER + 1;
>>>>>>>          } free;
>>>>>> I think generated code will be better with the two fields swapped:
>>>>>> That way reading first_dirty won't involve a shift, and accessing a
>>>>>> single bit doesn't require shifts at all on many architectures.
>>>>> Ok, I will then keep need_tlbflush as the last field so the final
>>>>> struct
>>>>> (as defined in patch 7) will look like
>>>>>
>>>>> struct {
>>>>>         unsigned long first_dirty:MAX_ORDER + 1;
>>>>>         unsigned long scrub_state:2;
>>>>>         bool need_tlbflush:1;
>>>>> };
>>>> Hmm, actually - why do you need bitfields on the x86 side at all?
>>>> They're needed for 32-bit architectures only, 64-bit ones ought
>>>> to be fine with
>>>>
>>>> struct {
>>>>         unsigned int first_dirty;
>>>>         bool need_tlbflush;
>>>>         uint8_t scrub_state;
>>>> };
>>>
>>> IIRC it was exactly because of ARM32 and at some point you suggested to
>>> switch both x86 and ARM to bitfields.
>>
>> I don't recall for sure whether I had asked for the change to be done
>> uniformly; it was certainly ARM32 that triggered me notice the
>> structure size change in your original version.
>>
>>>> (plus a suitable BUILD_BUG_ON() to make sure first_dirty has
>>>> at least MAX_ORDER + 1 bits). The ARM maintainers will know
>>>> whether they would want to also differentiate ARM32 and
>>>> ARM64 here.
>>>
>>> Isn't using bitfields the only possibility for 32-bit? We can't fit
>>> first_dirty into 2 bytes.
>>
>> Yes, hence the question whether to stay with bitfields uniformly
>> or make ARM64 follow x86, but ARM32 keep using bitfields.
>
> I would prefer to avoid differentiation between Arm32 and Arm64.

I can switch x86 only back to "normal" types but then we also have this
in patch 7:

static void check_and_stop_scrub(struct page_info *head)
{
    if ( head->u.free.scrub_state == BUDDY_SCRUBBING )
    {
        typeof(head->u.free) pgfree;

        head->u.free.scrub_state = BUDDY_SCRUB_ABORT;
        spin_lock_kick();
        for ( ; ; )
        {
            /* Can't ACCESS_ONCE() a bitfield. */
            pgfree.val = ACCESS_ONCE(head->u.free.val);
            if ( pgfree.scrub_state != BUDDY_SCRUB_ABORT )
                break;
            cpu_relax();
        }
    }
}

I'd rather avoid having '#ifdef <arch>' meaning that

union {
            struct {
                unsigned int first_dirty;
                bool need_tlbflush;
                uint8_t scrub_state;
            };

            unsigned long val;
} free;

is still kept for x86.

-boris
Jan Beulich Aug. 15, 2017, 3:08 p.m. UTC | #9
>>> On 15.08.17 at 17:03, <boris.ostrovsky@oracle.com> wrote:
> I can switch x86 only back to "normal" types but then we also have this
> in patch 7:
> 
> static void check_and_stop_scrub(struct page_info *head)
> {
>     if ( head->u.free.scrub_state == BUDDY_SCRUBBING )
>     {
>         typeof(head->u.free) pgfree;
> 
>         head->u.free.scrub_state = BUDDY_SCRUB_ABORT;
>         spin_lock_kick();
>         for ( ; ; )
>         {
>             /* Can't ACCESS_ONCE() a bitfield. */
>             pgfree.val = ACCESS_ONCE(head->u.free.val);
>             if ( pgfree.scrub_state != BUDDY_SCRUB_ABORT )
>                 break;
>             cpu_relax();
>         }
>     }
> }
> 
> I'd rather avoid having '#ifdef <arch>' meaning that
> 
> union {
>             struct {
>                 unsigned int first_dirty;
>                 bool need_tlbflush;
>                 uint8_t scrub_state;
>             };
> 
>             unsigned long val;
> } free;
> 
> is still kept for x86.

That's fine I think.

Jan

Patch

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 330f8ed..b857a44 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -383,6 +383,8 @@  typedef struct page_list_head heap_by_zone_and_order_t[NR_ZONES][MAX_ORDER+1];
 static heap_by_zone_and_order_t *_heap[MAX_NUMNODES];
 #define heap(node, zone, order) ((*_heap[node])[zone][order])
 
+static unsigned long node_need_scrub[MAX_NUMNODES];
+
 static unsigned long *avail[MAX_NUMNODES];
 static long total_avail_pages;
 
@@ -678,13 +680,30 @@  static void check_low_mem_virq(void)
     }
 }
 
+/* Pages that need a scrub are added to tail, otherwise to head. */
+static void page_list_add_scrub(struct page_info *pg, unsigned int node,
+                                unsigned int zone, unsigned int order,
+                                unsigned int first_dirty)
+{
+    PFN_ORDER(pg) = order;
+    pg->u.free.first_dirty = first_dirty;
+
+    if ( first_dirty != INVALID_DIRTY_IDX )
+    {
+        ASSERT(first_dirty < (1U << order));
+        page_list_add_tail(pg, &heap(node, zone, order));
+    }
+    else
+        page_list_add(pg, &heap(node, zone, order));
+}
+
 /* Allocate 2^@order contiguous pages. */
 static struct page_info *alloc_heap_pages(
     unsigned int zone_lo, unsigned int zone_hi,
     unsigned int order, unsigned int memflags,
     struct domain *d)
 {
-    unsigned int i, j, zone = 0, nodemask_retry = 0;
+    unsigned int i, j, zone = 0, nodemask_retry = 0, first_dirty;
     nodeid_t first_node, node = MEMF_get_node(memflags), req_node = node;
     unsigned long request = 1UL << order;
     struct page_info *pg;
@@ -798,12 +817,26 @@  static struct page_info *alloc_heap_pages(
     return NULL;
 
  found: 
+
+    first_dirty = pg->u.free.first_dirty;
+
     /* We may have to halve the chunk a number of times. */
     while ( j != order )
     {
-        PFN_ORDER(pg) = --j;
-        page_list_add_tail(pg, &heap(node, zone, j));
-        pg += 1 << j;
+        j--;
+        page_list_add_scrub(pg, node, zone, j,
+                            (1U << j) > first_dirty ?
+                            first_dirty : INVALID_DIRTY_IDX);
+        pg += 1U << j;
+
+        if ( first_dirty != INVALID_DIRTY_IDX )
+        {
+            /* Adjust first_dirty */
+            if ( first_dirty >= 1U << j )
+                first_dirty -= 1U << j;
+            else
+                first_dirty = 0; /* We've moved past original first_dirty */
+        }
     }
 
     ASSERT(avail[node][zone] >= request);
@@ -850,12 +883,20 @@  static int reserve_offlined_page(struct page_info *head)
     unsigned int node = phys_to_nid(page_to_maddr(head));
     int zone = page_to_zone(head), i, head_order = PFN_ORDER(head), count = 0;
     struct page_info *cur_head;
-    int cur_order;
+    unsigned int cur_order, first_dirty;
 
     ASSERT(spin_is_locked(&heap_lock));
 
     cur_head = head;
 
+    /*
+     * We may break the buddy so let's mark the head as clean. Then, when
+     * merging chunks back into the heap, we will see whether the chunk has
+     * unscrubbed pages and set its first_dirty properly.
+     */
+    first_dirty = head->u.free.first_dirty;
+    head->u.free.first_dirty = INVALID_DIRTY_IDX;
+
     page_list_del(head, &heap(node, zone, head_order));
 
     while ( cur_head < (head + (1 << head_order)) )
@@ -866,6 +907,8 @@  static int reserve_offlined_page(struct page_info *head)
         if ( page_state_is(cur_head, offlined) )
         {
             cur_head++;
+            if ( first_dirty != INVALID_DIRTY_IDX && first_dirty )
+                first_dirty--;
             continue;
         }
 
@@ -892,9 +935,20 @@  static int reserve_offlined_page(struct page_info *head)
             {
             merge:
                 /* We don't consider merging outside the head_order. */
-                page_list_add_tail(cur_head, &heap(node, zone, cur_order));
-                PFN_ORDER(cur_head) = cur_order;
+                page_list_add_scrub(cur_head, node, zone, cur_order,
+                                    (1U << cur_order) > first_dirty ?
+                                    first_dirty : INVALID_DIRTY_IDX);
                 cur_head += (1 << cur_order);
+
+                /* Adjust first_dirty if needed. */
+                if ( first_dirty != INVALID_DIRTY_IDX )
+                {
+                    if ( first_dirty >=  1U << cur_order )
+                        first_dirty -= 1U << cur_order;
+                    else
+                        first_dirty = 0;
+                }
+
                 break;
             }
         }
@@ -919,9 +973,53 @@  static int reserve_offlined_page(struct page_info *head)
     return count;
 }
 
+static void scrub_free_pages(unsigned int node)
+{
+    struct page_info *pg;
+    unsigned int zone;
+
+    ASSERT(spin_is_locked(&heap_lock));
+
+    if ( !node_need_scrub[node] )
+        return;
+
+    for ( zone = 0; zone < NR_ZONES; zone++ )
+    {
+        unsigned int order = MAX_ORDER;
+
+        do {
+            while ( !page_list_empty(&heap(node, zone, order)) )
+            {
+                unsigned int i;
+
+                /* Unscrubbed pages are always at the end of the list. */
+                pg = page_list_last(&heap(node, zone, order));
+                if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
+                    break;
+
+                for ( i = pg->u.free.first_dirty; i < (1U << order); i++)
+                {
+                    if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+                    {
+                        scrub_one_page(&pg[i]);
+                        pg[i].count_info &= ~PGC_need_scrub;
+                        node_need_scrub[node]--;
+                    }
+                }
+
+                page_list_del(pg, &heap(node, zone, order));
+                page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
+
+                if ( node_need_scrub[node] == 0 )
+                    return;
+            }
+        } while ( order-- != 0 );
+    }
+}
+
 /* Free 2^@order set of pages. */
 static void free_heap_pages(
-    struct page_info *pg, unsigned int order)
+    struct page_info *pg, unsigned int order, bool need_scrub)
 {
     unsigned long mask, mfn = page_to_mfn(pg);
     unsigned int i, node = phys_to_nid(page_to_maddr(pg)), tainted = 0;
@@ -961,10 +1059,20 @@  static void free_heap_pages(
         /* This page is not a guest frame any more. */
         page_set_owner(&pg[i], NULL); /* set_gpfn_from_mfn snoops pg owner */
         set_gpfn_from_mfn(mfn + i, INVALID_M2P_ENTRY);
+
+        if ( need_scrub )
+            pg[i].count_info |= PGC_need_scrub;
     }
 
     avail[node][zone] += 1 << order;
     total_avail_pages += 1 << order;
+    if ( need_scrub )
+    {
+        node_need_scrub[node] += 1 << order;
+        pg->u.free.first_dirty = 0;
+    }
+    else
+        pg->u.free.first_dirty = INVALID_DIRTY_IDX;
 
     if ( tmem_enabled() )
         midsize_alloc_zone_pages = max(
@@ -988,6 +1096,12 @@  static void free_heap_pages(
 
             page_list_del(predecessor, &heap(node, zone, order));
 
+            /* Keep predecessor's first_dirty if it is already set. */
+            if ( predecessor->u.free.first_dirty == INVALID_DIRTY_IDX &&
+                 pg->u.free.first_dirty != INVALID_DIRTY_IDX )
+                predecessor->u.free.first_dirty = (1U << order) +
+                                                  pg->u.free.first_dirty;
+
             pg = predecessor;
         }
         else
@@ -1007,12 +1121,14 @@  static void free_heap_pages(
         order++;
     }
 
-    PFN_ORDER(pg) = order;
-    page_list_add_tail(pg, &heap(node, zone, order));
+    page_list_add_scrub(pg, node, zone, order, pg->u.free.first_dirty);
 
     if ( tainted )
         reserve_offlined_page(pg);
 
+    if ( need_scrub )
+        scrub_free_pages(node);
+
     spin_unlock(&heap_lock);
 }
 
@@ -1233,7 +1349,7 @@  unsigned int online_page(unsigned long mfn, uint32_t *status)
     spin_unlock(&heap_lock);
 
     if ( (y & PGC_state) == PGC_state_offlined )
-        free_heap_pages(pg, 0);
+        free_heap_pages(pg, 0, false);
 
     return ret;
 }
@@ -1276,6 +1392,8 @@  static void init_heap_pages(
 {
     unsigned long i;
 
+    BUILD_BUG_ON(sizeof(pg->u) != sizeof(unsigned long));
+
     for ( i = 0; i < nr_pages; i++ )
     {
         unsigned int nid = phys_to_nid(page_to_maddr(pg+i));
@@ -1302,7 +1420,7 @@  static void init_heap_pages(
             nr_pages -= n;
         }
 
-        free_heap_pages(pg+i, 0);
+        free_heap_pages(pg + i, 0, false);
     }
 }
 
@@ -1629,7 +1747,7 @@  void free_xenheap_pages(void *v, unsigned int order)
 
     memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
 
-    free_heap_pages(virt_to_page(v), order);
+    free_heap_pages(virt_to_page(v), order, false);
 }
 
 #else
@@ -1683,12 +1801,9 @@  void free_xenheap_pages(void *v, unsigned int order)
     pg = virt_to_page(v);
 
     for ( i = 0; i < (1u << order); i++ )
-    {
-        scrub_one_page(&pg[i]);
         pg[i].count_info &= ~PGC_xen_heap;
-    }
 
-    free_heap_pages(pg, order);
+    free_heap_pages(pg, order, true);
 }
 
 #endif
@@ -1797,7 +1912,7 @@  struct page_info *alloc_domheap_pages(
     if ( d && !(memflags & MEMF_no_owner) &&
          assign_pages(d, pg, order, memflags) )
     {
-        free_heap_pages(pg, order);
+        free_heap_pages(pg, order, false);
         return NULL;
     }
     
@@ -1865,11 +1980,7 @@  void free_domheap_pages(struct page_info *pg, unsigned int order)
             scrub = 1;
         }
 
-        if ( unlikely(scrub) )
-            for ( i = 0; i < (1 << order); i++ )
-                scrub_one_page(&pg[i]);
-
-        free_heap_pages(pg, order);
+        free_heap_pages(pg, order, scrub);
     }
 
     if ( drop_dom_ref )
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index ef84b72..572337c 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -44,7 +44,16 @@  struct page_info
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
         struct {
             /* Do TLBs need flushing for safety before next page use? */
-            bool_t need_tlbflush;
+            bool need_tlbflush:1;
+
+            /*
+             * Index of the first *possibly* unscrubbed page in the buddy.
+             * One more bit than maximum possible order to accommodate
+             * INVALID_DIRTY_IDX.
+             */
+#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
+            unsigned long first_dirty:MAX_ORDER + 1;
+
         } free;
 
     } u;
@@ -107,6 +116,13 @@  struct page_info
 #define PGC_count_width   PG_shift(9)
 #define PGC_count_mask    ((1UL<<PGC_count_width)-1)
 
+/*
+ * Page needs to be scrubbed. Since this bit can only be set on a page that is
+ * free (i.e. in PGC_state_free) we can reuse PGC_allocated bit.
+ */
+#define _PGC_need_scrub   _PGC_allocated
+#define PGC_need_scrub    PGC_allocated
+
 extern mfn_t xenheap_mfn_start, xenheap_mfn_end;
 extern vaddr_t xenheap_virt_end;
 #ifdef CONFIG_ARM_64
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 2bf3f33..07dc963 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -88,7 +88,15 @@  struct page_info
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
         struct {
             /* Do TLBs need flushing for safety before next page use? */
-            bool_t need_tlbflush;
+            bool need_tlbflush:1;
+
+            /*
+             * Index of the first *possibly* unscrubbed page in the buddy.
+             * One more bit than maximum possible order to accommodate
+             * INVALID_DIRTY_IDX.
+             */
+#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
+            unsigned long first_dirty:MAX_ORDER + 1;
         } free;
 
     } u;
@@ -233,6 +241,13 @@  struct page_info
 #define PGC_count_width   PG_shift(9)
 #define PGC_count_mask    ((1UL<<PGC_count_width)-1)
 
+/*
+ * Page needs to be scrubbed. Since this bit can only be set on a page that is
+ * free (i.e. in PGC_state_free) we can reuse PGC_allocated bit.
+ */
+#define _PGC_need_scrub   _PGC_allocated
+#define PGC_need_scrub    PGC_allocated
+
 #define is_xen_heap_page(page) ((page)->count_info & PGC_xen_heap)
 #define is_xen_heap_mfn(mfn) \
     (__mfn_valid(mfn) && is_xen_heap_page(__mfn_to_page(mfn)))
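
As a worked example of the first_dirty bookkeeping in the alloc_heap_pages()
halving loop above (illustration only, not part of the posted patch):

/*
 * An order-3 buddy (pages 0..7) with first_dirty = 5 is split down for an
 * order-1 request, so j starts at 3:
 *
 *   j = 2: pages 0..3 go back to the heap; 1U << 2 = 4 is not > 5, so they
 *          are queued with INVALID_DIRTY_IDX (the dirty region lies in the
 *          upper half).  pg advances to page 4; first_dirty = 5 - 4 = 1.
 *   j = 1: pages 4..5 go back with first_dirty = 1 (2 > 1), i.e. page 5 is
 *          the first dirty page of that chunk.  pg advances to page 6 and,
 *          since 1 < 2, first_dirty is clamped to 0: everything from here
 *          on is treated as possibly dirty.
 *   j == order: pages 6..7 form the allocated chunk.
 */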