[00/12] Arm cache coloring

Message ID 20220826125111.152261-1-carlo.nonato@minervasys.tech

Message

Carlo Nonato Aug. 26, 2022, 12:50 p.m. UTC
Shared caches in multi-core CPU architectures are a problem for the
predictability of memory access latency. This jeopardizes the applicability
of many Arm platforms in real-time critical and mixed-criticality
scenarios. We introduce support for cache partitioning with page
coloring, a transparent software technique that enables isolation
between domains and Xen, and thus avoids cache interference.

When creating a domain, a simple syntax (e.g. `0-3` or `4-11`) allows
the user to define assignments of cache partition IDs, called colors,
where assigning different colors guarantees that no mutual eviction in the
cache will ever happen. This instructs the Xen memory allocator to provide
the i-th color assignee only with pages that map to color i, i.e. pages
that are indexed in the i-th cache partition.

The proposed implementation supports the dom0less feature.
The solution has been tested in several scenarios, including Xilinx Zynq
MPSoCs.

Overview of implementation and commits structure
------------------------------------------------

- [1-3] Coloring initialization, cache layout auto-probing and coloring
  data for domains are added.
- [4-5] xl and Device Tree support for coloring is added.
- [6-7] A new page allocator for domain memory that implements the cache
  coloring mechanism is introduced.
- [8-12] Coloring support is added for Xen .text region.

Changes in v2
-------------

A lot of things changed between the two versions; mainly, I tried to address
all the comments left by the maintainers in the review of the previous version.
Here is a brief list of the major points (even if, imho, it's easier to
repeat the whole review process):

 - One of the easiest changes to spot is the reduced number of patches in the
   series. Previously, many commits were badly split (documentation only in
   the last commits, functionality introduced in one commit and used only in
   later ones, etc.).
 - The definition of LLC (Last Level Cache) as the place where coloring
   applies should be more consistent throughout the whole series
   (documentation and cache layout auto-probing code).
 - Kconfig option to configure the maximum number of cache colors.
 - Only one kind of syntax to specify color configurations.
 - Only arrays to store colors (no more need for bitmaps).
 - No more limitations on the max number of colors (previously, because of
   a static assert failure, it was limited to 64).
 - Kconfig option to configure the buddy allocator reserved size.
 - Removed the duplicated version of setup_pagetables.
 - No more need to expose the vm_alloc function as non-static.

Open points and possible problems
---------------------------------

- The way xl passes user space memory to Xen is adapted from various
  points of the xl code itself (e.g. xc_domain_node_setaffinity). It
  works, but it really needs attention from expert maintainers since
  I'm not completely sure this is the correct way of doing things.
- We still need to bring back (part of) the relocation feature in order
  to move Xen memory to a colored space, where the hypervisor can be
  isolated from VM interference (see revert commit #10 and the
  get_xen_paddr function in #12).
- Revert commits #8 and #9 are needed because coloring initialization
  requires command line parsing, and setup_pagetables must be called
  after coloring initialization in order to color the Xen mapping. The
  DTB mapping is then added to the boot page tables instead of the Xen
  ones. The way this is done is probably a bit simplistic; I'm looking
  forward to comments on the subject.
- A temporary mapping of the old Xen code (old here means non-colored)
  is used to reach variables in the old physical space so that secondary
  CPUs can boot. There were some comments on this in the previous version
  because the mapping is available to all the CPUs, while only CPU0 is
  supposed to access it. I'm not sure how to temporarily map things only
  for the master CPU.
- Many #ifdefs for cache coloring are introduced because I prefer to
  define functions only if they are actually needed. Let me know if you
  prefer a different approach.
- Julien posted an RFC to address a problem with the switch_ttbr function.
  For the moment I haven't considered it, since it's still a work in progress.

Acknowledgements
----------------

This work is sponsored by Xilinx Inc. and supported by the University of
Modena and Reggio Emilia and Minerva Systems.


Carlo Nonato (10):
  xen/arm: add cache coloring initialization
  xen/arm: add cache coloring initialization for domains
  xen/arm: dump cache colors in domain info debug-key
  tools/xl: add support for cache coloring configuration
  xen/arm: add support for cache coloring configuration via device-tree
  xen/common: add cache coloring allocator for domains
  xen/common: add colored heap info debug-key
  Revert "xen/arm: Remove unused BOOT_RELOC_VIRT_START"
  xen/arm: add Xen cache colors command line parameter
  xen/arm: add cache coloring support for Xen

Luca Miccio (2):
  Revert "xen/arm: setup: Add Xen as boot module before printing all
    boot modules"
  Revert "xen/arm: mm: Initialize page-tables earlier"

 docs/man/xl.cfg.5.pod.in              |  10 +
 docs/misc/arm/cache-coloring.rst      | 201 ++++++++++++++
 docs/misc/arm/device-tree/booting.txt |   4 +
 docs/misc/xen-command-line.pandoc     |  45 ++++
 tools/libs/light/libxl_create.c       |  12 +
 tools/libs/light/libxl_types.idl      |   1 +
 tools/xl/xl_parse.c                   |  52 +++-
 xen/arch/arm/Kconfig                  |  28 ++
 xen/arch/arm/Makefile                 |   1 +
 xen/arch/arm/alternative.c            |   5 +
 xen/arch/arm/coloring.c               | 367 ++++++++++++++++++++++++++
 xen/arch/arm/domain.c                 |  14 +
 xen/arch/arm/domain_build.c           |  22 +-
 xen/arch/arm/include/asm/coloring.h   |  60 +++++
 xen/arch/arm/include/asm/config.h     |   4 +-
 xen/arch/arm/include/asm/domain.h     |   4 +
 xen/arch/arm/include/asm/mm.h         |  22 +-
 xen/arch/arm/include/asm/processor.h  |  16 ++
 xen/arch/arm/mm.c                     | 144 ++++++++--
 xen/arch/arm/psci.c                   |   4 +-
 xen/arch/arm/setup.c                  |  90 ++++++-
 xen/arch/arm/smpboot.c                |   3 +-
 xen/arch/arm/xen.lds.S                |   2 +-
 xen/common/page_alloc.c               | 237 ++++++++++++++++-
 xen/common/vmap.c                     |  25 ++
 xen/include/public/arch-arm.h         |   8 +
 xen/include/xen/vmap.h                |   4 +
 27 files changed, 1333 insertions(+), 52 deletions(-)
 create mode 100644 docs/misc/arm/cache-coloring.rst
 create mode 100644 xen/arch/arm/coloring.c
 create mode 100644 xen/arch/arm/include/asm/coloring.h

Comments

Julien Grall Sept. 10, 2022, 3:12 p.m. UTC | #1
Hi Carlo,

On 26/08/2022 13:50, Carlo Nonato wrote:
> [...]

The series doesn't build on Arm64 without cache coloring. Please make 
sure to compile your series with cache coloring disabled and check that 
Xen still boots on systems after applying it.

Cheers,
Carlo Nonato Sept. 12, 2022, 1:24 p.m. UTC | #2
Hi Julien,

On Sat, Sep 10, 2022 at 5:12 PM Julien Grall <julien@xen.org> wrote:
>
> Hi Carlo,
>
> On 26/08/2022 13:50, Carlo Nonato wrote:
> > [...]
>
> The series doesn't build on Arm64 without cache coloring. Please make
> sure to compile your series with cache coloring disabled and check that
> Xen still boots on systems after applying it.

I'm sorry about that. I tested it multiple times, but probably missed it
after a last-minute change. The following diff fixes it.

diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 00351ee014..6abe2fdef7 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -411,7 +411,7 @@ static inline void page_set_xenheap_gfn(struct page_info *p, gfn_t gfn)
 #else
 #define virt_boot_xen(virt) virt
 #define set_value_for_secondary(var, val) \
-    var = val;
+    var = val; \
     clean_dcache(var);
 #endif

>
> Cheers,
>
> --
> Julien Grall

Thanks.

- Carlo Nonato
Jan Beulich Sept. 15, 2022, 1:29 p.m. UTC | #3
On 26.08.2022 14:50, Carlo Nonato wrote:
> [...]
> The proposed implementation supports the dom0less feature.
> The solution has been tested in several scenarios, including Xilinx Zynq
> MPSoCs.

Having looked at the non-Arm-specific parts of this I have one basic
question: Wouldn't it be possible to avoid the addition of entirely
new logic by treating the current model as just using a single color,
therefore merely becoming a special case of what you want?

Plus an advanced question: To what extent does this interoperate with
static allocation, which again is (for now) an Arm-only feature?
Your reference to dom0less above doesn't cover this afaict.

Jan
Marco Solieri Sept. 15, 2022, 2:52 p.m. UTC | #4
Hi Jan,

On Thu, Sep 15, 2022 at 03:29:08PM +0200, Jan Beulich wrote:
> On 26.08.2022 14:50, Carlo Nonato wrote:
> > [...]
> 
> Having looked at the non-Arm-specific parts of this I have one basic
> question: Wouldn't it be possible to avoid the addition of entirely
> new logic by treating the current model as just using a single color,
> therefore merely becoming a special case of what you want?

Nice question.  Thanks!

In principle, you are quite right: monochrome is just a degenerate
choice of colouring.  The colouring implementation with a single colour
allows assigning all the available pages, exactly as happens with the
ordinary allocator.  The difference lies in the allocation algorithm.

In practice, that would be quite inefficient.  This is because the
allocation logic used by the coloured allocator is much simpler, since
it operates on lists instead of binary trees.  Upgrading the logic of
the coloured allocator would be overkill, because lowering the
complexity of insertion/removal operations from linear to logarithmic
does not change much: in the real world, the longest sequence of
physically contiguous pages that may be assigned is
max_colours - 1.

Cheers.
Stefano Stabellini Sept. 15, 2022, 6:15 p.m. UTC | #5
On Thu, 15 Sep 2022, Jan Beulich wrote:
> Plus an advanced question: To what extent does this interoperate with
> static allocation, which again is (for now) an Arm-only feature?
> Your reference to dom0less above doesn't cover this afaict.

I take it you are referring to static-mem, the static memory ranges for
dom0less domUs described in docs/misc/arm/device-tree/booting.txt.

static-mem doesn't interoperate with cache coloring: each static range
would span multiple colors. You have to choose one feature or the other;
using both at the same time doesn't make sense.

Cheers,

Stefano
Julien Grall Oct. 22, 2022, 3:13 p.m. UTC | #6
Hi Carlo,

On 26/08/2022 13:50, Carlo Nonato wrote:
> - The way xl passes user space memory to Xen is adapted from various
>    points of the xl code itself (e.g. xc_domain_node_setaffinity) and it
>    works, but it really needs attention from expert maintainers since
>    I'm not completely sure this is the correct way of doing things.
> - We still need to bring back the relocation feature (part of) in order
>    to move Xen memory to a colored space where the hypervisor could be
>    isolated from VMs interference (see the revert commit #10 and the
>    get_xen_paddr function in #12).
> - Revert commits #8 and #9 are needed because coloring has the command
>    line parsing as a prerequisite for its initialization and
>    setup_pagetables must be called after it in order to color the Xen
>    mapping. The DTB mapping is then added to the boot page tables instead
>    of the Xen ones. Probably the way this is done is a bit simplistic.
>    Looking forward for comments on the subject.
> - A temporary mapping of the old Xen code (old here means non-colored)
>    is used to reach variables in the old physical space so that secondary
>    CPUs can boot. There were some comments in the previous version on that
>    because the mapping is available for all the CPUs while only CPU0 is
>    the one supposed to access it. I'm not sure how to temporarily map
>    things only for the master CPU.

On Arm64, Xen only uses one set of page-tables for all the CPUs, so it 
will not be possible to have a temporary mapping for a single CPU. What 
you can do, though, is map the region and unmap it when you are done.

That said, I would prefer that we get rid of the old copy of Xen. This 
would mean that secondary CPUs jump directly to the new Xen.

> - A lot of #ifdef for cache coloring are introduced because I prefer to
>    define functions only if they are actually needed. Let me know if you
>    prefer a different approach.

The preferred approach in Xen is to provide stub helpers in the #else part.

> - Julien posted an RFC to address a problem with the switch_ttbr function.
>    For the moment I haven't considered it since it's still a work in progress.

I have posted a new version for this:

https://lore.kernel.org/xen-devel/20221022150422.17707-1-julien@xen.org/

There are a couple of open questions about the interaction with cache 
coloring. Please have a look there.

Cheers,