[v5,12/22] xen/balloon: Don't rely on the page granularity is the same for Xen and Linux

Message ID 560E9004.8030604@citrix.com (mailing list archive)
State New, archived

Commit Message

David Vrabel Oct. 2, 2015, 2:09 p.m. UTC
On 30/09/15 11:45, Julien Grall wrote:
> For ARM64 guests, Linux is able to support either 64K or 4K page
> granularity. Although, the hypercall interface is always based on 4K
> page granularity.
> 
> With 64K page granularity, a single page will be spread over multiple
> Xen frame.
> 
> To avoid splitting the page into 4K frame, take advantage of the
> extent_order field to directly allocate/free chunk of the Linux page
> size.
> 
> Note that PVMMU is only used for PV guest (which is x86) and the page
> granularity is always 4KB. Some BUILD_BUG_ON has been added to ensure
> that because the code has not been modified.

This causes a BUG() in x86 PV guests when decreasing the reservation.

Xen says:

(XEN) d0v2 Error pfn 0: rd=0 od=32753 caf=8000000000000001 taf=7400000000000001
(XEN) memory.c:250:d0v2 Bad page free for domain 0

And Linux BUGs with:

[   82.032654] kernel BUG at /anfs/drall/scratch/davidvr/x86/linux/drivers/xen/balloon.c:540!

Which is a non-zero return value from the decrease_reservation hypercall.

The frame_list[] has been incorrectly populated.  The below patch fixes
it for me.  Please test as well.
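
For illustration only, here is a minimal standalone sketch of the failure mode, not the kernel code itself. It assumes the index was left at a stale value by an earlier loop and was never advanced while the frame list was filled. fake_page_to_gfn(), fill_frame_list_buggy(), fill_frame_list_fixed() and the stale_i parameter are hypothetical names introduced for this sketch.

```c
#include <stddef.h>

/* Hypothetical stand-in for xen_page_to_gfn(): derives a fake GFN from a page id. */
static unsigned long fake_page_to_gfn(unsigned long page)
{
	return 0x1000UL + page;
}

/*
 * Buggy fill (assumed failure mode): the index is neither reset nor
 * advanced, so every GFN overwrites the single slot at the stale index.
 * frame_list must have room for at least stale_i + 1 entries.
 */
static void fill_frame_list_buggy(unsigned long *frame_list,
				  const unsigned long *pages, size_t n,
				  size_t stale_i)
{
	size_t i = stale_i;
	size_t k;

	for (k = 0; k < n; k++)
		frame_list[i] = fake_page_to_gfn(pages[k]);
}

/*
 * Fixed fill: reset the index before the loop and advance it on each
 * iteration, so slot k receives the GFN of page k.
 */
static void fill_frame_list_fixed(unsigned long *frame_list,
				  const unsigned long *pages, size_t n)
{
	size_t i = 0;
	size_t k;

	for (k = 0; k < n; k++)
		frame_list[i++] = fake_page_to_gfn(pages[k]);
}
```

With three pages and a stale index of 3, the buggy variant leaves slots 0 to 2 untouched and only writes slot 3, whereas the fixed variant fills slots 0 to 2 in order. A hypercall that reads the first three slots would therefore see whatever was there before.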

David

Comments

Julien Grall Oct. 2, 2015, 2:31 p.m. UTC | #1
Hi David,

On 02/10/15 15:09, David Vrabel wrote:
> On 30/09/15 11:45, Julien Grall wrote:
>> For ARM64 guests, Linux is able to support either 64K or 4K page
>> granularity. Although, the hypercall interface is always based on 4K
>> page granularity.
>>
>> With 64K page granularity, a single page will be spread over multiple
>> Xen frame.
>>
>> To avoid splitting the page into 4K frame, take advantage of the
>> extent_order field to directly allocate/free chunk of the Linux page
>> size.
>>
>> Note that PVMMU is only used for PV guest (which is x86) and the page
>> granularity is always 4KB. Some BUILD_BUG_ON has been added to ensure
>> that because the code has not been modified.
> 
> This causes a BUG() in x86 PV guests when decreasing the reservation.
> 
> Xen says:
> 
> (XEN) d0v2 Error pfn 0: rd=0 od=32753 caf=8000000000000001 taf=7400000000000001
> (XEN) memory.c:250:d0v2 Bad page free for domain 0
> 
> And Linux BUGs with:
> 
> [   82.032654] kernel BUG at /anfs/drall/scratch/davidvr/x86/linux/drivers/xen/balloon.c:540!
> 
> Which is a non-zero return value from the decrease_reservation hypercall.
> 
> The frame_list[] has been incorrectly populated.  The below patch fixes
> it for me.  Please test as well.

Sorry for the breakage. I think I didn't spot the bug on my board
because most of the PV drivers allocate one balloon page at a time
by default.

This patch looks valid to me. The index i was reset and incremented for
each loop in an earlier version, but I dropped that by mistake when I
switched to a different way of decreasing the reservation.

Regards,
Julien Grall Oct. 2, 2015, 2:52 p.m. UTC | #2
On 02/10/15 15:31, Julien Grall wrote:
> Hi David,
> 
> On 02/10/15 15:09, David Vrabel wrote:
>> On 30/09/15 11:45, Julien Grall wrote:
>>> For ARM64 guests, Linux is able to support either 64K or 4K page
>>> granularity. Although, the hypercall interface is always based on 4K
>>> page granularity.
>>>
>>> With 64K page granularity, a single page will be spread over multiple
>>> Xen frame.
>>>
>>> To avoid splitting the page into 4K frame, take advantage of the
>>> extent_order field to directly allocate/free chunk of the Linux page
>>> size.
>>>
>>> Note that PVMMU is only used for PV guest (which is x86) and the page
>>> granularity is always 4KB. Some BUILD_BUG_ON has been added to ensure
>>> that because the code has not been modified.
>>
>> This causes a BUG() in x86 PV guests when decreasing the reservation.
>>
>> Xen says:
>>
>> (XEN) d0v2 Error pfn 0: rd=0 od=32753 caf=8000000000000001 taf=7400000000000001
>> (XEN) memory.c:250:d0v2 Bad page free for domain 0
>>
>> And Linux BUGs with:
>>
>> [   82.032654] kernel BUG at /anfs/drall/scratch/davidvr/x86/linux/drivers/xen/balloon.c:540!
>>
>> Which is a non-zero return value from the decrease_reservation hypercall.
>>
>> The frame_list[] has been incorrectly populated.  The below patch fixes
>> it for me.  Please test as well.

FYI, I've just tested with the patch on ARM64 and I haven't seen any issues.

> Sorry for the breakage. I think I didn't spot the bug on my board
> because most of the PV drivers allocate one balloon page at a time
> by default.
> 
> This patch looks valid to me. The index i was reset and incremented for
> each loop in an earlier version, but I dropped that by mistake when I
> switched to a different way of decreasing the reservation.
Boris Ostrovsky Oct. 2, 2015, 3:18 p.m. UTC | #3
On 10/02/2015 10:52 AM, Julien Grall wrote:
> On 02/10/15 15:31, Julien Grall wrote:
>> Hi David,
>>
>> On 02/10/15 15:09, David Vrabel wrote:
>>> On 30/09/15 11:45, Julien Grall wrote:
>>>> For ARM64 guests, Linux is able to support either 64K or 4K page
>>>> granularity. Although, the hypercall interface is always based on 4K
>>>> page granularity.
>>>>
>>>> With 64K page granularity, a single page will be spread over multiple
>>>> Xen frame.
>>>>
>>>> To avoid splitting the page into 4K frame, take advantage of the
>>>> extent_order field to directly allocate/free chunk of the Linux page
>>>> size.
>>>>
>>>> Note that PVMMU is only used for PV guest (which is x86) and the page
>>>> granularity is always 4KB. Some BUILD_BUG_ON has been added to ensure
>>>> that because the code has not been modified.
>>> This causes a BUG() in x86 PV guests when decreasing the reservation.
>>>
>>> Xen says:
>>>
>>> (XEN) d0v2 Error pfn 0: rd=0 od=32753 caf=8000000000000001 taf=7400000000000001
>>> (XEN) memory.c:250:d0v2 Bad page free for domain 0
>>>
>>> And Linux BUGs with:
>>>
>>> [   82.032654] kernel BUG at /anfs/drall/scratch/davidvr/x86/linux/drivers/xen/balloon.c:540!
>>>
>>> Which is a non-zero return value from the decrease_reservation hypercall.
>>>
>>> The frame_list[] has been incorrectly populated.  The below patch fixes
>>> it for me.  Please test as well.
> FYI, I've just tested with the patch on ARM64 and I haven't seen any issues.


I ran a quick one-off test and this fixes it on x86. I'll schedule it
for the overnight run too.

-boris


>
>> Sorry for the breakage. I think I didn't spot the bug on my board
>> because most of the PV drivers allocate one balloon page at a time
>> by default.
>>
>> This patch looks valid to me. The index i was reset and incremented for
>> each loop in an earlier version, but I dropped that by mistake when I
>> switched to a different way of decreasing the reservation.
>
>
>
David Vrabel Oct. 2, 2015, 3:19 p.m. UTC | #4
On 02/10/15 15:52, Julien Grall wrote:
> On 02/10/15 15:31, Julien Grall wrote:
>> Hi David,
>>
>> On 02/10/15 15:09, David Vrabel wrote:
>>> On 30/09/15 11:45, Julien Grall wrote:
>>>> For ARM64 guests, Linux is able to support either 64K or 4K page
>>>> granularity. Although, the hypercall interface is always based on 4K
>>>> page granularity.
>>>>
>>>> With 64K page granularity, a single page will be spread over multiple
>>>> Xen frame.
>>>>
>>>> To avoid splitting the page into 4K frame, take advantage of the
>>>> extent_order field to directly allocate/free chunk of the Linux page
>>>> size.
>>>>
>>>> Note that PVMMU is only used for PV guest (which is x86) and the page
>>>> granularity is always 4KB. Some BUILD_BUG_ON has been added to ensure
>>>> that because the code has not been modified.
>>>
>>> This causes a BUG() in x86 PV guests when decreasing the reservation.
>>>
>>> Xen says:
>>>
>>> (XEN) d0v2 Error pfn 0: rd=0 od=32753 caf=8000000000000001 taf=7400000000000001
>>> (XEN) memory.c:250:d0v2 Bad page free for domain 0
>>>
>>> And Linux BUGs with:
>>>
>>> [   82.032654] kernel BUG at /anfs/drall/scratch/davidvr/x86/linux/drivers/xen/balloon.c:540!
>>>
>>> Which is a non-zero return value from the decrease_reservation hypercall.
>>>
>>> The frame_list[] has been incorrectly populated.  The below patch fixes
>>> it for me.  Please test as well.
> 
> FYI, I've just tested with the patch on ARM64 and I haven't seen any issues.

Thanks, I've folded it in.

Boris, can we get another test run on the for-linus-4.4 branch, please?

David

Patch

--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -504,9 +504,10 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
 	 * Setup the frame, update direct mapping, invalidate P2M,
 	 * and add to balloon.
 	 */
+	i = 0;
 	list_for_each_entry_safe(page, tmp, &pages, lru) {
 		/* XENMEM_decrease_reservation requires a GFN */
-		frame_list[i] = xen_page_to_gfn(page);
+		frame_list[i++] = xen_page_to_gfn(page);

 #ifdef CONFIG_XEN_HAVE_PVMMU