
[V4] xen/arm: domain_build: allocate lowmem for dom0 as much as possible

Message ID 1474599334-30709-1-git-send-email-peng.fan@nxp.com (mailing list archive)
State New, archived

Commit Message

Peng Fan Sept. 23, 2016, 2:55 a.m. UTC
On AArch64 SoCs, some IPs may only have the capability to access a
32-bit address space. The physical memory assigned to Dom0 may not be
within the 4GB address space, in which case those IPs will not work
properly. So memory under 4GB needs to be allocated for Dom0.

There is no restriction on how much lowmem needs to be allocated for
Dom0, so allocate as much lowmem as possible for Dom0.

This patch does not affect 32-bit domains, because the variable "lowmem"
is set to true at the beginning. If allocating bank0 under 4GB fails,
we panic for a 32-bit domain, because a 32-bit domain requires bank0 to
be allocated under 4GB.

For a 64-bit domain, set "lowmem" to false and continue allocating
memory from above 4GB.

Signed-off-by: Peng Fan <peng.fan@nxp.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
---

This patch is to resolve the issue mentioned in
https://lists.xen.org/archives/html/xen-devel/2016-09/msg00235.html

 Tested results:
 (XEN) Allocating 1:1 mappings totalling 2048MB for dom0:
 (XEN) BANK[0] 0x00000088000000-0x000000f8000000 (1792MB)
 (XEN) BANK[1] 0x000009e0000000-0x000009f0000000 (256MB)
 1792M allocated in 4GB address space.
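 For reference, BANK[0] spans 0xf8000000 - 0x88000000 = 0x70000000 bytes
 = 1792MB, entirely below the 4GB boundary at 0x100000000, while BANK[1]
 (0x9e0000000-0x9f0000000) provides the remaining 256MB above 4GB for the
 2048MB total.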

V4:
 Address comments in V3: https://lists.xen.org/archives/html/xen-devel/2016-09/msg02499.html
 Drop unnecessary check when allocation of memory under 4GB fails
 Refine comments according to Julien's suggestion in V3.
 Keep "bits <= (lowmem ? 32 : PADDR_BITS)" rather than changing it to "bits <= 32"

V3:
 Add more commit log
 Add more comments
 Add back panic if failed to allocate bank0 under 4GB for 32-bit domain.

V2:
 Remove the bootargs dom0_lowmem introduced in V1.
 Following "https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg01459.html"
 to allocate as much lowmem as possible.

 xen/arch/arm/domain_build.c | 33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)
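
For readers skimming the thread before the full diff at the bottom of this
page, here is a self-contained sketch of the allocation policy described
above. It is not the Xen code: allocate_bank0(), try_alloc_below_4g() and
try_alloc_any() are hypothetical stand-ins for the real
alloc_domheap_pages(d, order, MEMF_bits(bits)) calls, and the retry loop
over decreasing allocation orders is elided.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical allocators standing in for alloc_domheap_pages(). */
extern void *try_alloc_below_4g(size_t size); /* NULL if no lowmem is left  */
extern void *try_alloc_any(size_t size);      /* may return memory above 4GB */

void *allocate_bank0(size_t size, bool domain_is_32bit)
{
    /* Both 32-bit and 64-bit dom0 first try to place bank0 below 4GB... */
    void *bank0 = try_alloc_below_4g(size);

    if ( bank0 )
        return bank0;

    /* ...but only a 32-bit dom0 strictly requires it. */
    if ( domain_is_32bit )
    {
        fprintf(stderr, "Unable to allocate first memory bank.\n");
        abort();
    }

    /* 64-bit dom0: warn and fall back to allocating from above 4GB. */
    fprintf(stderr, "No bank has been allocated below 4GB.\n");
    return try_alloc_any(size);
}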

Comments

Peng Fan Oct. 8, 2016, 2:25 a.m. UTC | #1
Hi Stefano, Julien

Any comments on this v4 patch?

Thanks,
Peng
On Fri, Sep 23, 2016 at 10:55:34AM +0800, Peng Fan wrote:
>On AArch64 SoCs, some IPs may only have the capability to access
>32 bits address space. The physical memory assigned for Dom0 maybe
>not in 4GB address space, then the IPs will not work properly.
>So need to allocate memory under 4GB for Dom0.
>
>There is no restriction that how much lowmem needs to be allocated for
>Dom0 ,so allocate lowmem as much as possible for Dom0.
>
>This patch does not affect 32-bit domain, because Variable "lowmem" is
>set to true at the beginning. If failed to allocate bank0 under 4GB,
>need to panic for 32-bit domain, because 32-bit domain requires bank0
>be allocated under 4GB.
>
>For 64-bit domain, set "lowmem" to false, and continue allocating
>memory from above 4GB.
>
>Signed-off-by: Peng Fan <peng.fan@nxp.com>
>Cc: Stefano Stabellini <sstabellini@kernel.org>
>Cc: Julien Grall <julien.grall@arm.com>
>---
>
>This patch is to resolve the issue mentioned in
>https://lists.xen.org/archives/html/xen-devel/2016-09/msg00235.html
>
> Tested results:
> (XEN) Allocating 1:1 mappings totalling 2048MB for dom0:
> (XEN) BANK[0] 0x00000088000000-0x000000f8000000 (1792MB)
> (XEN) BANK[1] 0x000009e0000000-0x000009f0000000 (256MB)
> 1792M allocated in 4GB address space.
>
>V4:
> Address comments in V3: https://lists.xen.org/archives/html/xen-devel/2016-09/msg02499.html
> Drop uneccessary check when failed to allocate memory under 4GB
> Refine comments according to Julien's suggestion in V3.
> Keep "bits <= (lowmem ? 32 : PADDR_BITS)", but not changed to "bits <= 32"
>
>V3:
> Add more commit log
> Add more comments
> Add back panic if failed to allocate bank0 under 4GB for 32-bit domain.
>
>V2:
> Remove the bootargs dom0_lowmem introduced in V1.
> Following "https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg01459.html"
> to allocate as much as possible lowmem.
>
> xen/arch/arm/domain_build.c | 33 ++++++++++++++++++++++-----------
> 1 file changed, 22 insertions(+), 11 deletions(-)
>
>diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
>index 35ab08d..6b5ac8d 100644
>--- a/xen/arch/arm/domain_build.c
>+++ b/xen/arch/arm/domain_build.c
>@@ -195,9 +195,9 @@ fail:
>  *    bank. Partly this is just easier for us to deal with, but also
>  *    the ramdisk and DTB must be placed within a certain proximity of
>  *    the kernel within RAM.
>- * 3. For 32-bit dom0 we want to place as much of the RAM as we
>- *    reasonably can below 4GB, so that it can be used by non-LPAE
>- *    enabled kernels.
>+ * 3. For dom0 we want to place as much of the RAM as we reasonably can
>+ *    below 4GB, so that it can be used by non-LPAE enabled kernels (32-bit)
>+ *    or when a device assigned to dom0 can only do 32-bit DMA access.
>  * 4. For 32-bit dom0 the kernel must be located below 4GB.
>  * 5. We want to have a few largers banks rather than many smaller ones.
>  *
>@@ -230,7 +230,8 @@ fail:
>  * we give up.
>  *
>  * For 32-bit domain we require that the initial allocation for the
>- * first bank is under 4G. Then for the subsequent allocations we
>+ * first bank is under 4G. For 64-bit domain, the first bank is preferred
>+ * to be allocated under 4G. Then for the subsequent allocations we
>  * initially allocate memory only from below 4GB. Once that runs out
>  * (as described above) we allow higher allocations and continue until
>  * that runs out (or we have allocated sufficient dom0 memory).
>@@ -244,7 +245,7 @@ static void allocate_memory(struct domain *d, struct kernel_info *kinfo)
>     unsigned int order = get_11_allocation_size(kinfo->unassigned_mem);
>     int i;
> 
>-    bool_t lowmem = is_32bit_domain(d);
>+    bool_t lowmem = true;
>     unsigned int bits;
> 
>     /*
>@@ -269,20 +270,30 @@ static void allocate_memory(struct domain *d, struct kernel_info *kinfo)
>         {
>             pg = alloc_domheap_pages(d, order, MEMF_bits(bits));
>             if ( pg != NULL )
>+            {
>+                if ( !insert_11_bank(d, kinfo, pg, order) )
>+                    BUG(); /* Cannot fail for first bank */
>+
>                 goto got_bank0;
>+            }
>         }
>         order--;
>     }
> 
>-    panic("Unable to allocate first memory bank");
>-
>- got_bank0:
>+    /* Failed to allocate bank0 under 4GB */
>+    if ( is_32bit_domain(d) )
>+        panic("Unable to allocate first memory bank.");
> 
>-    if ( !insert_11_bank(d, kinfo, pg, order) )
>-        BUG(); /* Cannot fail for first bank */
>+    /* Try to allocate memory from above 4GB */
>+    printk(XENLOG_INFO "No bank has been allocated below 4GB.\n");
>+    lowmem = false;
> 
>-    /* Now allocate more memory and fill in additional banks */
>+ got_bank0:
> 
>+    /*
>+     * If we failed to allocate bank0 under 4GB, continue allocating
>+     * memory from above 4GB and fill in banks.
>+     */
>     order = get_11_allocation_size(kinfo->unassigned_mem);
>     while ( kinfo->unassigned_mem && kinfo->mem.nr_banks < NR_MEM_BANKS )
>     {
>-- 
>2.6.6
>
Julien Grall Nov. 1, 2016, 2:42 p.m. UTC | #2
Hi Peng,

Sorry for the late answer.

On 23/09/2016 03:55, Peng Fan wrote:
> On AArch64 SoCs, some IPs may only have the capability to access
> 32 bits address space. The physical memory assigned for Dom0 maybe
> not in 4GB address space, then the IPs will not work properly.
> So need to allocate memory under 4GB for Dom0.
>
> There is no restriction that how much lowmem needs to be allocated for
> Dom0 ,so allocate lowmem as much as possible for Dom0.
>
> This patch does not affect 32-bit domain, because Variable "lowmem" is
> set to true at the beginning. If failed to allocate bank0 under 4GB,
> need to panic for 32-bit domain, because 32-bit domain requires bank0
> be allocated under 4GB.
>
> For 64-bit domain, set "lowmem" to false, and continue allocating
> memory from above 4GB.
>
> Signed-off-by: Peng Fan <peng.fan@nxp.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Julien Grall <julien.grall@arm.com>

Reviewed-by: Julien Grall <julien.grall@arm.com>

I am undecided whether this should be considered as a bug fix for Xen 
4.8. Are you aware of any ARM64 platform we currently support requiring 
allocation of memory below 4GB?

Regards,
Stefano Stabellini Nov. 1, 2016, 6:12 p.m. UTC | #3
On Tue, 1 Nov 2016, Julien Grall wrote:
> Hi Peng,
> 
> Sorry for the late answer.
> 
> On 23/09/2016 03:55, Peng Fan wrote:
> > On AArch64 SoCs, some IPs may only have the capability to access
> > 32 bits address space. The physical memory assigned for Dom0 maybe
> > not in 4GB address space, then the IPs will not work properly.
> > So need to allocate memory under 4GB for Dom0.
> > 
> > There is no restriction that how much lowmem needs to be allocated for
> > Dom0 ,so allocate lowmem as much as possible for Dom0.
> > 
> > This patch does not affect 32-bit domain, because Variable "lowmem" is
> > set to true at the beginning. If failed to allocate bank0 under 4GB,
> > need to panic for 32-bit domain, because 32-bit domain requires bank0
> > be allocated under 4GB.
> > 
> > For 64-bit domain, set "lowmem" to false, and continue allocating
> > memory from above 4GB.
> > 
> > Signed-off-by: Peng Fan <peng.fan@nxp.com>
> > Cc: Stefano Stabellini <sstabellini@kernel.org>
> > Cc: Julien Grall <julien.grall@arm.com>
> 
> Reviewed-by: Julien Grall <julien.grall@arm.com>
> 
> I am undecided whether this should be considered as a bug fix for Xen 4.8. Are
> you aware of any ARM64 platform we currently support requiring allocation of
> memory below 4GB?

I am more comfortable having this in 4.9 (I queued it up in xen-arm-next
for now), unless we have a regression (a concrete problem with an
existing supported platform), as you wrote.
Peng Fan Nov. 10, 2016, 8:30 a.m. UTC | #4
Hi Julien,

Sorry for late reply.

On Tue, Nov 01, 2016 at 02:42:06PM +0000, Julien Grall wrote:
>Hi Peng,
>
>Sorry for the late answer.
>
>On 23/09/2016 03:55, Peng Fan wrote:
>>On AArch64 SoCs, some IPs may only have the capability to access
>>32 bits address space. The physical memory assigned for Dom0 maybe
>>not in 4GB address space, then the IPs will not work properly.
>>So need to allocate memory under 4GB for Dom0.
>>
>>There is no restriction that how much lowmem needs to be allocated for
>>Dom0 ,so allocate lowmem as much as possible for Dom0.
>>
>>This patch does not affect 32-bit domain, because Variable "lowmem" is
>>set to true at the beginning. If failed to allocate bank0 under 4GB,
>>need to panic for 32-bit domain, because 32-bit domain requires bank0
>>be allocated under 4GB.
>>
>>For 64-bit domain, set "lowmem" to false, and continue allocating
>>memory from above 4GB.
>>
>>Signed-off-by: Peng Fan <peng.fan@nxp.com>
>>Cc: Stefano Stabellini <sstabellini@kernel.org>
>>Cc: Julien Grall <julien.grall@arm.com>
>
>Reviewed-by: Julien Grall <julien.grall@arm.com>
>
>I am undecided whether this should be considered as a bug fix for Xen 4.8.
>Are you aware of any ARM64 platform we currently support requiring allocation
>of memory below 4GB?

I have no idea about this (:, but I think this is a bug fix. Although
currently supported platforms work well, users may choose 4.8 to support
a new platform with devices limited to 32-bit addressing.

Regards,
Peng.

>
>Regards,
>
>-- 
>Julien Grall
Julien Grall Nov. 10, 2016, 1:01 p.m. UTC | #5
(CC Wei as release manager)

On 10/11/16 08:30, Peng Fan wrote:
> Hi Julien,

Hi Peng,

> On Tue, Nov 01, 2016 at 02:42:06PM +0000, Julien Grall wrote:
>> Hi Peng,
>>
>> Sorry for the late answer.
>>
>> On 23/09/2016 03:55, Peng Fan wrote:
>>> On AArch64 SoCs, some IPs may only have the capability to access
>>> 32 bits address space. The physical memory assigned for Dom0 maybe
>>> not in 4GB address space, then the IPs will not work properly.
>>> So need to allocate memory under 4GB for Dom0.
>>>
>>> There is no restriction that how much lowmem needs to be allocated for
>>> Dom0 ,so allocate lowmem as much as possible for Dom0.
>>>
>>> This patch does not affect 32-bit domain, because Variable "lowmem" is
>>> set to true at the beginning. If failed to allocate bank0 under 4GB,
>>> need to panic for 32-bit domain, because 32-bit domain requires bank0
>>> be allocated under 4GB.
>>>
>>> For 64-bit domain, set "lowmem" to false, and continue allocating
>>> memory from above 4GB.
>>>
>>> Signed-off-by: Peng Fan <peng.fan@nxp.com>
>>> Cc: Stefano Stabellini <sstabellini@kernel.org>
>>> Cc: Julien Grall <julien.grall@arm.com>
>>
>> Reviewed-by: Julien Grall <julien.grall@arm.com>
>>
>> I am undecided whether this should be considered as a bug fix for Xen 4.8.
>> Are you aware of any ARM64 platform we currently support requiring allocation
>> of memory below 4GB?
>
> I have no idea about this (:, but I think this is a bug fix. Alought current
> supported platforms works well, users may choose 4.8 to support their
> new platform which has the limitation to access 64bit address.

We are already late in the release process (rc5) for Xen 4.8, so we need 
to be careful when including a bug fix and evaluate the pros and cons.

This patch modifies the DOM0 memory layout for all 64-bit platforms, so 
it could potentially break one of the platforms we officially support 
(see [1] for a non-exhaustive list). We don't have a test suite running 
automatically for ARM64 at the moment (it is being worked on), which 
means that manual testing needs to be done. I am not aware of any 
platform in the list we support having this issue, so I prefer to stay 
on the safe side and defer this patch to Xen 4.9.

If a user cares about Xen 4.8 for their platform, they could request 
the patch to be backported to Xen 4.8 after the release and after 
extensive testing in staging.

Regards,

[1] 
https://wiki.xenproject.org/wiki/Xen_ARM_with_Virtualization_Extensions#Hardware

>
> Regards,
> Peng.
>
>>
>> Regards,
>>
>> --
>> Julien Grall
Peng Fan Nov. 11, 2016, 1:10 a.m. UTC | #6
On Thu, Nov 10, 2016 at 01:01:38PM +0000, Julien Grall wrote:
>(CC Wei as release manager)
>
>On 10/11/16 08:30, Peng Fan wrote:
>>Hi Julien,
>
>Hi Peng,
>
>>On Tue, Nov 01, 2016 at 02:42:06PM +0000, Julien Grall wrote:
>>>Hi Peng,
>>>
>>>Sorry for the late answer.
>>>
>>>On 23/09/2016 03:55, Peng Fan wrote:
>>>>On AArch64 SoCs, some IPs may only have the capability to access
>>>>32 bits address space. The physical memory assigned for Dom0 maybe
>>>>not in 4GB address space, then the IPs will not work properly.
>>>>So need to allocate memory under 4GB for Dom0.
>>>>
>>>>There is no restriction that how much lowmem needs to be allocated for
>>>>Dom0 ,so allocate lowmem as much as possible for Dom0.
>>>>
>>>>This patch does not affect 32-bit domain, because Variable "lowmem" is
>>>>set to true at the beginning. If failed to allocate bank0 under 4GB,
>>>>need to panic for 32-bit domain, because 32-bit domain requires bank0
>>>>be allocated under 4GB.
>>>>
>>>>For 64-bit domain, set "lowmem" to false, and continue allocating
>>>>memory from above 4GB.
>>>>
>>>>Signed-off-by: Peng Fan <peng.fan@nxp.com>
>>>>Cc: Stefano Stabellini <sstabellini@kernel.org>
>>>>Cc: Julien Grall <julien.grall@arm.com>
>>>
>>>Reviewed-by: Julien Grall <julien.grall@arm.com>
>>>
>>>I am undecided whether this should be considered as a bug fix for Xen 4.8.
>>>Are you aware of any ARM64 platform we currently support requiring allocation
>>>of memory below 4GB?
>>
>>I have no idea about this (:, but I think this is a bug fix. Alought current
>>supported platforms works well, users may choose 4.8 to support their
>>new platform which has the limitation to access 64bit address.
>
>We are already late in the release process (rc5) for Xen 4.8. So we need to
>be careful when including a bug fix and evaluate the pros and cons.
>
>This patch is modifying the DOM0 memory layout for all 64-bit platforms. So
>it could potentially break one of the platform we  officially support (see
>[1] for a non-exhaustive list). We don't have a test suite running
>automatically for ARM64 at the moment (it is been working on), this means
>that manual testing needs to be done. I am not aware of any platform, in the
>list we supports, having this issue so I prefer to stay on the safe side and
>defer this patch for Xen 4.9.

Ok. Defer it for 4.9 to avoid breaking any platforms. :)

>
>If a user cares about Xen 4.8 for their platforms, then they could request
>the patch to be backported in Xen 4.8 after the release and after extensive
>testing in staging.

Yeah. Agree

Thanks,
Peng.
Wei Liu Nov. 11, 2016, 1:41 a.m. UTC | #7
On Thu, Nov 10, 2016 at 01:01:38PM +0000, Julien Grall wrote:
> (CC Wei as release manager)
> 
> On 10/11/16 08:30, Peng Fan wrote:
> >Hi Julien,
> 
> Hi Peng,
> 
> >On Tue, Nov 01, 2016 at 02:42:06PM +0000, Julien Grall wrote:
> >>Hi Peng,
> >>
> >>Sorry for the late answer.
> >>
> >>On 23/09/2016 03:55, Peng Fan wrote:
> >>>On AArch64 SoCs, some IPs may only have the capability to access
> >>>32 bits address space. The physical memory assigned for Dom0 maybe
> >>>not in 4GB address space, then the IPs will not work properly.
> >>>So need to allocate memory under 4GB for Dom0.
> >>>
> >>>There is no restriction that how much lowmem needs to be allocated for
> >>>Dom0 ,so allocate lowmem as much as possible for Dom0.
> >>>
> >>>This patch does not affect 32-bit domain, because Variable "lowmem" is
> >>>set to true at the beginning. If failed to allocate bank0 under 4GB,
> >>>need to panic for 32-bit domain, because 32-bit domain requires bank0
> >>>be allocated under 4GB.
> >>>
> >>>For 64-bit domain, set "lowmem" to false, and continue allocating
> >>>memory from above 4GB.
> >>>
> >>>Signed-off-by: Peng Fan <peng.fan@nxp.com>
> >>>Cc: Stefano Stabellini <sstabellini@kernel.org>
> >>>Cc: Julien Grall <julien.grall@arm.com>
> >>
> >>Reviewed-by: Julien Grall <julien.grall@arm.com>
> >>
> >>I am undecided whether this should be considered as a bug fix for Xen 4.8.
> >>Are you aware of any ARM64 platform we currently support requiring allocation
> >>of memory below 4GB?
> >
> >I have no idea about this (:, but I think this is a bug fix. Alought current
> >supported platforms works well, users may choose 4.8 to support their
> >new platform which has the limitation to access 64bit address.
> 
> We are already late in the release process (rc5) for Xen 4.8. So we need to
> be careful when including a bug fix and evaluate the pros and cons.
> 
> This patch is modifying the DOM0 memory layout for all 64-bit platforms. So
> it could potentially break one of the platform we  officially support (see
> [1] for a non-exhaustive list). We don't have a test suite running
> automatically for ARM64 at the moment (it is been working on), this means
> that manual testing needs to be done. I am not aware of any platform, in the
> list we supports, having this issue so I prefer to stay on the safe side and
> defer this patch for Xen 4.9.
> 
> If a user cares about Xen 4.8 for their platforms, then they could request
> the patch to be backported in Xen 4.8 after the release and after extensive
> testing in staging.
> 

I agree with your reasoning.

Wei.
Andrii Anisov Nov. 11, 2016, 11:35 a.m. UTC | #8
Sorry for the late intrusion into this discussion. I would like to
present my view of the issues behind 32-bit-addressing DMA controllers
in ARMv7/v8 SoCs.

> On AArch64 SoCs, some IPs may only have the capability to access
> 32 bits address space. The physical memory assigned for Dom0 maybe
> not in 4GB address space, then the IPs will not work properly.
> So need to allocate memory under 4GB for Dom0.
>
IMHO that is the wrong approach. Unfortunately the problem is much bigger.
Normally you need to run guest domains as well, with at least PV block
and PV net drivers. Because PV drivers are built in such a way that the
DMA controller ultimately works with the DomU's pages, those pages should
be from below 4GB.
So any DomU running PV drivers should have some amount of pages from below
4GB. Moreover, the OS running in the DomU should know that only those
pages are DMA-able, and PV drivers should work with DMA-able pages only;
i.e. pages should be mapped correspondingly into different banks below
and above 4GB.

The approach I believe is more suitable is to explicitly specify the
amount of RAM below 4GB and above 4GB for any domain: for dom0 through
the Xen command line, for domU through the domain configuration file.

Such an approach was implemented by GL. You can find preliminary patches
here: https://lists.xen.org/archives/html/xen-devel/2016-05/msg01785.html
https://lists.xen.org/archives/html/xen-devel/2016-05/msg01786.html .
I really hope GL will decide to tailor and upstream the feature.

> I am undecided whether this should be considered as a bug fix for Xen 4.8.
> Are you aware of any ARM64 platform we currently support requiring
> allocation of memory below 4GB?

That is not only an ARM64 problem. Any ARMv7/v8-based platform with
32-bit DMA controllers and RAM above 4GB, and without an IOMMU supported
(or owned) by Xen, will suffer from this problem. Among shipping
products: the new J6 EVM with 4GB RAM and the Salvator-X.

Sincerely,
Andrii Anisov.
Julien Grall Nov. 11, 2016, 11:59 a.m. UTC | #9
Hello,

On 11/11/16 11:35, Andrii Anisov wrote:
> Sorry for the late intrusion into this discussion. I would introduce my
> vision of the issues behind the 32 bits addressing DMA controllers in
> ARMv7/v8 SoCs.
>
>     On AArch64 SoCs, some IPs may only have the capability to access
>     32 bits address space. The physical memory assigned for Dom0 maybe
>     not in 4GB address space, then the IPs will not work properly.
>     So need to allocate memory under 4GB for Dom0.
>
> IMHO that is a wrong approach. Unfortunately the problem is much bigger.
> Normally you would need to run guest domains as well. With at least PV
> Block and PV NET drivers. Due to the fact that PV drivers made in a way
> that DMA controller at last will work with DomU's pages, those pages
> should be from below 4GB.
> So any DomU running PV drivers should have some amount of pages from
> below 4GB. Moreover, the OS running in DomU should be knowing that only
> those pages are DMA-able, and that PV drivers should be working with
> DMA-able pages only: I.e. pages should be mapped correspondingly into
> different banks under and over 4GB.

From my understanding of what you say, the problem is not that domU is 
using memory above 4GB but that the backend driver does not make the 
right decision (e.g. using a bounce buffer when required).

The guest should be IPA-agnostic and not care how the physical device 
works when using PV drivers. So for me, this should be fixed in the 
DOM0 OS.

Regards,
Andrii Anisov Nov. 11, 2016, 2:24 p.m. UTC | #10
Hello Julien,

Please see my comments below:

> From my understanding of what you say, the problem is not because domU is using memory above 4GB but the fact that >the backend driver does not take the right decision

Yep, the problem could be treated in such a way.

> (e.g using bounce buffer when required).
I would expect an unacceptable performance drop from that kind of solution.

An alternative here could be reverting of the FE-BE interaction scheme
in a following way: BE side domain provides buffers and maps them to
the FE side domain. Some time ago we estimated this approach as huge
architecture change and enormous implementation efforts. Also it does
answer to the next question:

> The guest should be IPA agnostic and not care how the physical device is working when using PV drivers. So for me,
> this should be fixed in the DOM0 OS.
Do you consider driver domain guests?

Sincerely,
Andrii Anisov.
Andrii Anisov Nov. 11, 2016, 2:39 p.m. UTC | #11
Sorry for a confusion.

The sentence:
> Also it does answer to the next question:
should be typed as:
> Also it does NOT answer to the next question:

> > The guest should be IPA agnostic and not care how the physical device is working when using PV drivers. So for me,
> > this should be fixed in the DOM0 OS.
> Do you consider driver domain guests?

Sincerely,
Andrii Anisov.

On Fri, Nov 11, 2016 at 4:24 PM, Andrii Anisov <andrii.anisov@gmail.com> wrote:
>
> Hello Julien,
>
> Please see my comments below:
>
> > From my understanding of what you say, the problem is not because domU is using memory above 4GB but the fact that >the backend driver does not take the right decision
>
> Yep, the problem could be treated in such a way.
>
> > (e.g using bounce buffer when required).
> I suppose unacceptable performance drop for such kind of solution.
>
> An alternative here could be reverting of the FE-BE interaction scheme
> in a following way: BE side domain provides buffers and maps them to
> the FE side domain. Some time ago we estimated this approach as huge
> architecture change and enormous implementation efforts. Also it does
> answer to the next question:
>
> > The guest should be IPA agnostic and not care how the physical device is working when using PV drivers. So for me,
> > this should be fixed in the DOM0 OS.
> Do you consider driver domain guests?
>
> Sincerely,
> Andrii Anisov.
Julien Grall Nov. 11, 2016, 4:02 p.m. UTC | #12
On 11/11/16 14:24, Andrii Anisov wrote:
> Hello Julien,
>
> Please see my comments below:
>
>> From my understanding of what you say, the problem is not because domU is using memory above 4GB but the fact that >the backend driver does not take the right decision
>
> Yep, the problem could be treated in such a way.
>
>> (e.g using bounce buffer when required).
> I suppose unacceptable performance drop for such kind of solution.

Could you define unacceptable performance drop? Have you tried to 
measure what would be the impact?

> An alternative here could be reverting of the FE-BE interaction scheme
> in a following way: BE side domain provides buffers and maps them to
> the FE side domain. Some time ago we estimated this approach as huge
> architecture change and enormous implementation efforts.

You could also exhaust the memory of the backend domain.

> Also it does
> answer to the next question:
>
>> The guest should be IPA agnostic and not care how the physical device is working when using PV drivers. So for me,
>> this should be fixed in the DOM0 OS.
> Do you consider driver domain guests?

The main point of a driver domain is isolating a device/driver in a 
specific guest. For that you need an SMMU to secure the device, which 
would also solve the issue with 32-bit DMA-capable devices.

So why would you want to use a driver domain without an SMMU present?

Regards,
Stefano Stabellini Nov. 11, 2016, 7:25 p.m. UTC | #13
On Fri, 11 Nov 2016, Andrii Anisov wrote:
> Hello Julien,
> 
> Please see my comments below:
> 
> > From my understanding of what you say, the problem is not because domU is using memory above 4GB but the fact that >the backend driver does not take the right decision
> 
> Yep, the problem could be treated in such a way.

That is the solution that was adopted on x86 to solve the same problem,
see drivers/xen/swiotlb-xen.c in Linux.
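
To make the trade-off concrete, here is a minimal sketch of the
bounce-buffer idea under discussion. It is not the swiotlb-xen code:
DMA_LIMIT, bounce_pool_alloc(), bounce_pool_free(), virt_to_phys_addr(),
device_dma_from() and dma_send() are all hypothetical names standing in
for a real DMA API. The extra memcpy() on the fallback path is the cost
being debated in this sub-thread.

#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define DMA_LIMIT (1ULL << 32)  /* the device can only address 32 bits */

/* Hypothetical helpers standing in for a real DMA API. */
extern void *bounce_pool_alloc(size_t len);            /* memory below 4GB  */
extern void bounce_pool_free(void *buf, size_t len);
extern uint64_t virt_to_phys_addr(const void *p);
extern int device_dma_from(uint64_t phys, size_t len); /* start the transfer */

/* Send 'len' bytes from 'buf', bouncing through low memory if necessary. */
int dma_send(const void *buf, size_t len)
{
    uint64_t phys = virt_to_phys_addr(buf);
    void *bounce = NULL;
    int rc;

    if ( phys + len > DMA_LIMIT )
    {
        /* The buffer is above 4GB: copy it somewhere the device can reach. */
        bounce = bounce_pool_alloc(len);
        if ( !bounce )
            return -1;
        memcpy(bounce, buf, len);               /* the extra copy */
        phys = virt_to_phys_addr(bounce);
    }

    rc = device_dma_from(phys, len);

    if ( bounce )
        bounce_pool_free(bounce, len);
    return rc;
}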


> > (e.g using bounce buffer when required).
> I suppose unacceptable performance drop for such kind of solution.

I know it can be bad, depending on the class of protocols. I think that
if numbers were provided to demonstrate that bounce buffers (the swiotlb
in Linux) are too slow for a given use case, we could consider the
approach you suggested. However given that it increases complexity I
would rather avoid it unless the performance benefits are major.


> An alternative here could be reverting of the FE-BE interaction scheme
> in a following way: BE side domain provides buffers and maps them to
> the FE side domain. Some time ago we estimated this approach as huge
> architecture change and enormous implementation efforts. Also it does
> answer to the next question:

The problem with this is not so much the code changes as the risk of
exhausting Dom0 memory. I think the approach you proposed previously,
explicitly giving memory below 4G to DomUs, is better.


> > The guest should be IPA agnostic and not care how the physical device is working when using PV drivers. So for me,
> > this should be fixed in the DOM0 OS.
> Do you consider driver domain guests?

Yes, they are guests, but Dom0 is a guest too. Maybe a better question
is: are driver domains unprivileged guests? Yes, they should only be
privileged enough to have control over the device assigned to them.

However without an SMMU there is no way to enforce security, because
driver domains could use the device to DMA anything they want into Dom0
or Xen memory. In practice without an SMMU driver domains are just like
Dom0.
Stefano Stabellini Nov. 11, 2016, 7:27 p.m. UTC | #14
On Fri, 11 Nov 2016, Julien Grall wrote:
> > > The guest should be IPA agnostic and not care how the physical device is
> > > working when using PV drivers. So for me,
> > > this should be fixed in the DOM0 OS.
> > Do you consider driver domain guests?
> 
> The main point of driver domain is isolating a device/driver in a specific
> guest. For that you need an SMMU to secure the device, which would also solve
> the issue with 32-bit DMA-capable device.
> 
> So why would you want to do driver domain without an SMMU present?

There are many reasons: for example because you want Dom0 to be Linux
and the storage driver domain to be FreeBSD. Or because you want the
network driver domain to be QNX.

Without an SMMU, driver domains are not about security anymore, they are
about disaggregation and componentization, but they are still a valid
choice.
Andrii Anisov Nov. 14, 2016, 8:54 a.m. UTC | #15
> Without an SMMU, driver domains are not about security anymore, they are
> about disaggregation and componentization
That is our case. And the thing we can provide to customers on chips
without SMMU.

Sincerely,
Andrii Anisov.
Andrii Anisov Nov. 14, 2016, 9:11 a.m. UTC | #16
> There are many reasons: for example because you want Dom0 to be Linux
> and the storage driver domain to be FreeBSD. Or because you want the
> network driver domain to be QNX.
What we are evaluating now is a thin Dom0 without any drivers, running
from a ramdisk. All drivers would be moved to a special guest domain.

Sincerely,
Andrii Anisov.
Andrii Anisov Nov. 14, 2016, 9:25 a.m. UTC | #17
> You could also exhaust the memory of the backend domain.

> The problem with this is not much the code changes but the risk of
> exhausting Dom0 memory. I think the approach you proposed previously,
> explicitly giving memory below 4G to DomUs, is better.

I see the point.

Sincerely,
Andrii Anisov.
Andrii Anisov Nov. 14, 2016, 9:43 a.m. UTC | #18
> Could you define unacceptable performance drop? Have you tried to measure
> what would be the impact?

> I know it can be bad, depending on the class of protocols. I think that
> if numbers were provided to demonstrate that bounce buffers (the swiotlb
> in Linux) are too slow for a given use case

Unfortunately I cannot come up with exact numbers for the requirements.
Introducing another memcpy (which is what the bounce-buffer approach
does) for block or network I/O would not only reduce operation
performance but also increase the overall system load.
Everything we do in our PV driver solutions is aimed at avoiding data
copying inside the FE-BE pair in order to increase performance and
reduce latency and system load.

Sincerely,
Andrii Anisov.
Julien Grall Nov. 14, 2016, 8:30 p.m. UTC | #19
Hi Andrii,

On 14/11/2016 03:11, Andrii Anisov wrote:
>> There are many reasons: for example because you want Dom0 to be Linux
>> and the storage driver domain to be FreeBSD. Or because you want the
>> network driver domain to be QNX.
> What we estimate now is a thin Dom0 without any drivers running with
> ramdisk. All drivers would be moved to a special guest domain.

You may want to take a look at what has been done on x86 with the 
"Dedicated hardware domain".

Another solution is, rather than moving the devices into a separate 
domain, to move the toolstack. The latter may cause less trouble on a 
platform without an SMMU.

Regards,
Stefano Stabellini Nov. 14, 2016, 11:28 p.m. UTC | #20
On Mon, 14 Nov 2016, Andrii Anisov wrote:
> > Could you define unacceptable performance drop? Have you tried to measure
> > what would be the impact?
> 
> > I know it can be bad, depending on the class of protocols. I think that
> > if numbers were provided to demonstrate that bounce buffers (the swiotlb
> > in Linux) are too slow for a given use case
> 
> Unfortunately I could not come up with exact requirements numbers.
> Introducing another memcpy (what bouncing buffer approach does) for
> block or network IO would not only reduce the operation performance
> but also increase the overall system load.
> All what we does for any of our PV driver solutions is avoiding data
> copying inside FE-BE pair in order to increase performance, reduce
> latency and system load.
 
I think it might be worth running those numbers: you might be surprised
by how well a simple data copy protocol can perform, even on ARM.

For example, take a look at PVCalls which is entirely based on data
copies:

http://marc.info/?l=xen-devel&m=147639616310487 


I have already shown that it performs better than netfront/netback on
x86 in this blog post:

https://blog.xenproject.org/2016/08/30/pv-calls-a-new-paravirtualized-protocol-for-posix-syscalls/


I have just run the numbers on ARM64 (APM m400) and it is still much
faster than netfront/netback. This is what I get by running iperf -c in
a VM and iperf -s in Dom0:

        PVCalls             Netfront/Netback
-P 1    9.9 gbit/s          4.53 gbit/s
-P 2    17.4 gbit/s         5.57 gbit/s
-P 4    24.36 gbit/s        5.34 gbit/s

PVCalls is still significantly faster than Netfront/Netback.
Andrii Anisov Nov. 16, 2016, 2:49 p.m. UTC | #21
Julien,

>> What we estimate now is a thin Dom0 without any drivers running with
>> ramdisk. All drivers would be moved to a special guest domain.
>
> You may want to give a look what has been done on x86 with the "Dedicated
> hardware domain".
I have to look at the stuff.

> Another solution, is rather than moving the devices in a separate domain,
> you move the toolstack.
I see the point.
But there are a number of different reasons to have a thin initial
domain, e.g. system boot time optimization, which is critical for the
applications we focus on.
For example: a thin initial domain would first start a special domain
responsible for CAN communication (with the minimal needed device set),
before the one with the rich device set, and a domain actually running
the IVI (with PV drivers only) would be started last.

> The latter may cause less trouble on platform without SMMU.
I hope we do switch to an IOMMU-capable platform. But we still have some
flashbacks to IOMMU-less systems.

Sincerely,
Andrii Anisov.


On Mon, Nov 14, 2016 at 10:30 PM, Julien Grall <julien.grall@arm.com> wrote:
> Hi Andrii,
>
> On 14/11/2016 03:11, Andrii Anisov wrote:
>>>
>>> There are many reasons: for example because you want Dom0 to be Linux
>>> and the storage driver domain to be FreeBSD. Or because you want the
>>> network driver domain to be QNX.
>>
>> What we estimate now is a thin Dom0 without any drivers running with
>> ramdisk. All drivers would be moved to a special guest domain.
>
>
> You may want to give a look what has been done on x86 with the "Dedicated
> hardware domain".
>
> Another solution, is rather than moving the devices in a separate domain,
> you move the toolstack. The latter may cause less trouble on platform
> without SMMU.
>
> Regards,
>
> --
> Julien Grall
Andrii Anisov Nov. 16, 2016, 3:28 p.m. UTC | #22
> For example, take a look at PVCalls which is entirely based on data
> copies:
>
> http://marc.info/?l=xen-devel&m=147639616310487
>
>
> I have already shown that it performs better than netfront/netback on
> x86 in this blog post:
>
> https://blog.xenproject.org/2016/08/30/pv-calls-a-new-paravirtualized-protocol-for-posix-syscalls/
>
>
> I have just run the numbers on ARM64 (APM m400) and it is still much
> faster than netfront/netback. This is what I get by running iperf -c in
> a VM and iperf -s in Dom0:
>
>         PVCalls             Netfront/Netback
> -P 1    9.9 gbit/s          4.53 gbit/s
> -P 2    17.4 gbit/s         5.57 gbit/s
> -P 4    24.36 gbit/s        5.34 gbit/s
>
> PVCalls is still significantly faster than Netfront/Netback.
This does not seem to be a really fair comparison, and it does not
reflect the performance impact of the data copying itself.
Among other things, our team is working on a PV DRM implementation now.
I guess the first implementation will use data copying, and then we
will introduce zero-copy, so it should be a good example for collecting
and sharing impact numbers.

In the embedded applications area we are currently focused on, the
acceptable performance drop, e.g. for I/O operations, is estimated at
3-5% compared to a bare-metal system.

Anyway, thank you for your comments, suggestions and examples.
I've got the point that we need solid reasoning backed by a set of
numbers to get something specific to us accepted by the community.

Sincerely,
Andrii Anisov.
Stefano Stabellini Nov. 17, 2016, 5:21 p.m. UTC | #23
On Wed, 16 Nov 2016, Andrii Anisov wrote:
> > For example, take a look at PVCalls which is entirely based on data
> > copies:
> >
> > http://marc.info/?l=xen-devel&m=147639616310487
> >
> >
> > I have already shown that it performs better than netfront/netback on
> > x86 in this blog post:
> >
> > https://blog.xenproject.org/2016/08/30/pv-calls-a-new-paravirtualized-protocol-for-posix-syscalls/
> >
> >
> > I have just run the numbers on ARM64 (APM m400) and it is still much
> > faster than netfront/netback. This is what I get by running iperf -c in
> > a VM and iperf -s in Dom0:
> >
> >         PVCalls             Netfront/Netback
> > -P 1    9.9 gbit/s          4.53 gbit/s
> > -P 2    17.4 gbit/s         5.57 gbit/s
> > -P 4    24.36 gbit/s        5.34 gbit/s
> >
> > PVCalls is still significantly faster than Netfront/Netback.
> This seems to be not a really fair comparison. And does not reflect
> performance impact of the data copying itself.

Why is it not a fair comparison? Because the design is different, or
because of the settings? I am happy to adjust the benchmarking settings
to make the comparison fairer.


> Among all, our team is working on PV DRM implementation now. I guess
> the first implementation would have a data copying, then we will
> introduce a zero-copy. So it should be a good example to collect and
> share impact numbers.

Looking forward to them.


> In embedded applications area, we are currently focused on, acceptable
> performance drop, f.e. for io operations, is estimated as 3-5%
> comparing to bare-metal system.

That's very interesting info and certainly a difficult mark to reach.


> Anyway thank you for your comments, suggestions and examples.
> I've got the point that we have to have solid reasoning baked with
> pack of numbers to get something specific to us accepted by community.

Cheers,

Stefano
Stefano Stabellini Nov. 18, 2016, 6 p.m. UTC | #24
On Thu, 17 Nov 2016, Stefano Stabellini wrote:
> > > I have just run the numbers on ARM64 (APM m400) and it is still much
> > > faster than netfront/netback. This is what I get by running iperf -c in
> > > a VM and iperf -s in Dom0:
> > >
> > >         PVCalls             Netfront/Netback
> > > -P 1    9.9 gbit/s          4.53 gbit/s
> > > -P 2    17.4 gbit/s         5.57 gbit/s
> > > -P 4    24.36 gbit/s        5.34 gbit/s
> > >
> > > PVCalls is still significantly faster than Netfront/Netback.
> > This seems to be not a really fair comparison. And does not reflect
> > performance impact of the data copying itself.
> 
> Why it is not a fair comparison? Because the design is different or
> because of the settings? I am happy to adjust benchmarking settings to
> make the comparison fairer.
 
Actually it turns out that Netfront/Netback use another form of copy:
grant copies. So you are right that this comparison doesn't really
reflect copy vs. mapping performance.
Andrii Anisov Nov. 21, 2016, 11:04 a.m. UTC | #25
> Why it is not a fair comparison? Because the design is different or
> because of the settings?
Because of the design difference.
It's not about memcpy vs. mapping within the same stack (design). And
you measured inter-domain communication only, without involving the
hardware interfaces.

> I am happy to adjust benchmarking settings to make the comparison fairer.
BTW, what about communication between DomU and the external network
through the hw interface?

Sincerely,
Andrii Anisov.
Stefano Stabellini Nov. 21, 2016, 5:52 p.m. UTC | #26
On Mon, 21 Nov 2016, Andrii Anisov wrote:
> > Why it is not a fair comparison? Because the design is different or
> > because of the settings?
> Because the design difference.
> It's not about memcpy vs mapping within the same stack (design). And
> you measured interdomain communication only, not involving hardware
> interfaces.
> 
> > I am happy to adjust benchmarking settings to make the comparison fairer.
> BTW, what about communication between DomU and the external network
> through the hw interface?

Unfortunately I don't have the HW to do that test at the moment, but I
would be interested in seeing the numbers.

Patch

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 35ab08d..6b5ac8d 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -195,9 +195,9 @@  fail:
  *    bank. Partly this is just easier for us to deal with, but also
  *    the ramdisk and DTB must be placed within a certain proximity of
  *    the kernel within RAM.
- * 3. For 32-bit dom0 we want to place as much of the RAM as we
- *    reasonably can below 4GB, so that it can be used by non-LPAE
- *    enabled kernels.
+ * 3. For dom0 we want to place as much of the RAM as we reasonably can
+ *    below 4GB, so that it can be used by non-LPAE enabled kernels (32-bit)
+ *    or when a device assigned to dom0 can only do 32-bit DMA access.
  * 4. For 32-bit dom0 the kernel must be located below 4GB.
  * 5. We want to have a few largers banks rather than many smaller ones.
  *
@@ -230,7 +230,8 @@  fail:
  * we give up.
  *
  * For 32-bit domain we require that the initial allocation for the
- * first bank is under 4G. Then for the subsequent allocations we
+ * first bank is under 4G. For 64-bit domain, the first bank is preferred
+ * to be allocated under 4G. Then for the subsequent allocations we
  * initially allocate memory only from below 4GB. Once that runs out
  * (as described above) we allow higher allocations and continue until
  * that runs out (or we have allocated sufficient dom0 memory).
@@ -244,7 +245,7 @@  static void allocate_memory(struct domain *d, struct kernel_info *kinfo)
     unsigned int order = get_11_allocation_size(kinfo->unassigned_mem);
     int i;
 
-    bool_t lowmem = is_32bit_domain(d);
+    bool_t lowmem = true;
     unsigned int bits;
 
     /*
@@ -269,20 +270,30 @@  static void allocate_memory(struct domain *d, struct kernel_info *kinfo)
         {
             pg = alloc_domheap_pages(d, order, MEMF_bits(bits));
             if ( pg != NULL )
+            {
+                if ( !insert_11_bank(d, kinfo, pg, order) )
+                    BUG(); /* Cannot fail for first bank */
+
                 goto got_bank0;
+            }
         }
         order--;
     }
 
-    panic("Unable to allocate first memory bank");
-
- got_bank0:
+    /* Failed to allocate bank0 under 4GB */
+    if ( is_32bit_domain(d) )
+        panic("Unable to allocate first memory bank.");
 
-    if ( !insert_11_bank(d, kinfo, pg, order) )
-        BUG(); /* Cannot fail for first bank */
+    /* Try to allocate memory from above 4GB */
+    printk(XENLOG_INFO "No bank has been allocated below 4GB.\n");
+    lowmem = false;
 
-    /* Now allocate more memory and fill in additional banks */
+ got_bank0:
 
+    /*
+     * If we failed to allocate bank0 under 4GB, continue allocating
+     * memory from above 4GB and fill in banks.
+     */
     order = get_11_allocation_size(kinfo->unassigned_mem);
     while ( kinfo->unassigned_mem && kinfo->mem.nr_banks < NR_MEM_BANKS )
     {