
[v6,01/15] xen/common: add cache coloring common code

Message ID 20240129171811.21382-2-carlo.nonato@minervasys.tech (mailing list archive)
State Superseded
Series Arm cache coloring

Commit Message

Carlo Nonato Jan. 29, 2024, 5:17 p.m. UTC
Last Level Cache (LLC) coloring allows partitioning the cache into smaller
chunks called cache colors. Since not all architectures can implement it,
add a HAS_LLC_COLORING Kconfig option and put the other options under
xen/arch.

LLC colors are a property of the domain, so the domain struct has to be
extended.

Based on original work from: Luca Miccio <lucmiccio@gmail.com>

Signed-off-by: Carlo Nonato <carlo.nonato@minervasys.tech>
Signed-off-by: Marco Solieri <marco.solieri@minervasys.tech>
---
v6:
- moved almost all code in common
- moved documentation in this patch
- reintroduced range for CONFIG_NR_LLC_COLORS
- reintroduced some stub functions to reduce the number of checks on
  llc_coloring_enabled
- moved domain_llc_coloring_free() in same patch where allocation happens
- turned "d->llc_colors" to pointer-to-const
- llc_coloring_init() now returns void and panics if errors are found
v5:
- used - instead of _ for filenames
- removed domain_create_llc_colored()
- removed stub functions
- coloring domain fields are now #ifdef protected
v4:
- Kconfig options moved to xen/arch
- removed range for CONFIG_NR_LLC_COLORS
- added "llc_coloring_enabled" global to later implement the boot-time
  switch
- added domain_create_llc_colored() to be able to pass colors
- added is_domain_llc_colored() macro
---
 docs/misc/cache-coloring.rst      | 87 +++++++++++++++++++++++++++++++
 docs/misc/xen-command-line.pandoc | 27 ++++++++++
 xen/arch/Kconfig                  | 17 ++++++
 xen/common/Kconfig                |  3 ++
 xen/common/Makefile               |  1 +
 xen/common/keyhandler.c           |  3 ++
 xen/common/llc-coloring.c         | 87 +++++++++++++++++++++++++++++++
 xen/include/xen/llc-coloring.h    | 38 ++++++++++++++
 xen/include/xen/sched.h           |  5 ++
 9 files changed, 268 insertions(+)
 create mode 100644 docs/misc/cache-coloring.rst
 create mode 100644 xen/common/llc-coloring.c
 create mode 100644 xen/include/xen/llc-coloring.h

Comments

Jan Beulich Jan. 31, 2024, 3:57 p.m. UTC | #1
On 29.01.2024 18:17, Carlo Nonato wrote:
> Last Level Cache (LLC) coloring allows partitioning the cache into smaller
> chunks called cache colors. Since not all architectures can implement it,
> add a HAS_LLC_COLORING Kconfig option and put the other options under
> xen/arch.
> 
> LLC colors are a property of the domain, so the domain struct has to be
> extended.
> 
> Based on original work from: Luca Miccio <lucmiccio@gmail.com>
> 
> Signed-off-by: Carlo Nonato <carlo.nonato@minervasys.tech>
> Signed-off-by: Marco Solieri <marco.solieri@minervasys.tech>
> ---
> v6:
> - moved almost all code in common
> - moved documentation in this patch
> - reintroduced range for CONFIG_NR_LLC_COLORS
> - reintroduced some stub functions to reduce the number of checks on
>   llc_coloring_enabled
> - moved domain_llc_coloring_free() in same patch where allocation happens
> - turned "d->llc_colors" to pointer-to-const
> - llc_coloring_init() now returns void and panics if errors are found
> v5:
> - used - instead of _ for filenames
> - removed domain_create_llc_colored()
> - removed stub functions
> - coloring domain fields are now #ifdef protected
> v4:
> - Kconfig options moved to xen/arch
> - removed range for CONFIG_NR_LLC_COLORS
> - added "llc_coloring_enabled" global to later implement the boot-time
>   switch
> - added domain_create_llc_colored() to be able to pass colors
> - added is_domain_llc_colored() macro
> ---
>  docs/misc/cache-coloring.rst      | 87 +++++++++++++++++++++++++++++++
>  docs/misc/xen-command-line.pandoc | 27 ++++++++++
>  xen/arch/Kconfig                  | 17 ++++++
>  xen/common/Kconfig                |  3 ++
>  xen/common/Makefile               |  1 +
>  xen/common/keyhandler.c           |  3 ++
>  xen/common/llc-coloring.c         | 87 +++++++++++++++++++++++++++++++
>  xen/include/xen/llc-coloring.h    | 38 ++++++++++++++
>  xen/include/xen/sched.h           |  5 ++
>  9 files changed, 268 insertions(+)
>  create mode 100644 docs/misc/cache-coloring.rst
>  create mode 100644 xen/common/llc-coloring.c
>  create mode 100644 xen/include/xen/llc-coloring.h
> 
> diff --git a/docs/misc/cache-coloring.rst b/docs/misc/cache-coloring.rst
> new file mode 100644
> index 0000000000..9fe01e99e1
> --- /dev/null
> +++ b/docs/misc/cache-coloring.rst
> @@ -0,0 +1,87 @@
> +Xen cache coloring user guide
> +=============================
> +
> +The cache coloring support in Xen allows reserving Last Level Cache (LLC)
> +partitions for Dom0, DomUs and Xen itself. Currently only ARM64 is supported.
> +
> +To compile LLC coloring support set ``CONFIG_LLC_COLORING=y``.
> +
> +If needed, change the maximum number of colors with
> +``CONFIG_NR_LLC_COLORS=<n>``.
> +
> +Compile Xen and the toolstack and then configure it via
> +`Command line parameters`_.
> +
> +Background
> +**********
> +
> +The cache hierarchy of a modern multi-core CPU typically has its first levels
> +dedicated to each core (hence using multiple cache units), while the last level
> +is shared among all of them. Such a configuration implies that memory operations
> +on one core (e.g. running a DomU) are able to generate interference on another
> +core (e.g. hosting another DomU). Cache coloring allows eliminating this
> +mutual interference, and thus guaranteeing higher and more predictable
> +performance for memory accesses.
> +The key concept underlying cache coloring is a fragmentation of the memory
> +space into a set of sub-spaces called colors, which are mapped to disjoint
> +cache partitions. Technically, the whole memory space is first divided into a
> +number of consecutive regions. Each region is in turn divided into a number of
> +consecutive sub-colors. The i-th color is then obtained as the union of the
> +i-th sub-colors in each region.
> +
> +::
> +
> +                            Region j            Region j+1
> +                .....................   ............
> +                .                     . .
> +                .                       .
> +            _ _ _______________ _ _____________________ _ _
> +                |     |     |     |     |     |     |
> +                | c_0 | c_1 |     | c_n | c_0 | c_1 |
> +           _ _ _|_____|_____|_ _ _|_____|_____|_____|_ _ _
> +                    :                       :
> +                    :                       :...         ... .
> +                    :                            color 0
> +                    :...........................         ... .
> +                                                :
> +          . . ..................................:
> +
> +There are two pragmatic lessons to be learnt.
> +
> +1. If one wants to avoid cache interference between two domains, different
> +   colors need to be used for their memory.
> +
> +2. Color assignment must privilege contiguity in the partitioning. E.g.,
> +   assigning colors (0,1) to domain I and (2,3) to domain J is better than
> +   assigning colors (0,2) to I and (1,3) to J.

I can't connect this 2nd point with any of what was said above.

> +How to compute the number of colors
> +***********************************
> +
> +To compute the number of available colors for a specific platform, the size of
> +an LLC way and the page size used by Xen must be known. The first parameter can
> +be found in the processor manual, or it can be computed by dividing the total
> +cache size by the number of its ways. The second parameter is the minimum
> +amount of memory that can be mapped by the hypervisor,

I find "amount of memory that can be mapped" quite confusing here. Don't you
really mean the granularity at which memory can be mapped?

> thus dividing the way
> +size by the page size gives the total number of cache partitions. So, for
> +example, an Arm Cortex-A53 with a 16-way set-associative 1 MiB LLC can isolate
> +up to 16 colors when pages are 4 KiB in size.

I guess it's a matter of what one's used to, but to me talking of "way size"
and how the calculation is described is, well, unusual. What I would start
from is the smallest entity, i.e. a cache line. Then it would be relevant
to describe how, after removing the low so many bits to cover for cache line
size, the remaining address bits are used to map to a particular set. It
looks to me as if you're assuming that this mapping is linear, using the
next so many bits from the address. Afaik this isn't true on various modern
CPUs; instead hash functions are used. Without knowing at least certain
properties of such a hash function, I'm afraid your mapping from address to
color isn't necessarily guaranteeing the promised isolation. The guarantee
may hold for processors you specifically target, but then I think in this
description it would help if you would fully spell out any assumptions you
make on how hardware maps addresses to elements of the cache.

> +Cache layout is probed automatically by looking at the CLIDR_EL1 Arm register.
> +This means that other system caches that aren't visible there are ignored.
> +The possibility of manually setting the way size is left to the user, to
> +overcome probing failures or for debugging/testing purposes. See
> +`Command line parameters`_ for more information on that.
> +
> +Command line parameters
> +***********************
> +
> +More specific documentation is available at `docs/misc/xen-command-line.pandoc`.
> +
> ++----------------------+-------------------------------+
> +| **Parameter**        | **Description**               |
> ++----------------------+-------------------------------+
> +| ``llc-coloring``     | enable coloring at runtime    |
> ++----------------------+-------------------------------+
> +| ``llc-way-size``     | set the LLC way size          |
> ++----------------------+-------------------------------+

As a result of the above, I also find it confusing to specify "way size"
as a command line option. Cache size, number of ways, and cache line size
would seem more natural to me.

I'll get to looking at the actual code later.

Jan
Jan Beulich Feb. 1, 2024, 12:14 p.m. UTC | #2
On 31.01.2024 16:57, Jan Beulich wrote:
> On 29.01.2024 18:17, Carlo Nonato wrote:
>> +Command line parameters
>> +***********************
>> +
>> +More specific documentation is available at `docs/misc/xen-command-line.pandoc`.
>> +
>> ++----------------------+-------------------------------+
>> +| **Parameter**        | **Description**               |
>> ++----------------------+-------------------------------+
>> +| ``llc-coloring``     | enable coloring at runtime    |
>> ++----------------------+-------------------------------+
>> +| ``llc-way-size``     | set the LLC way size          |
>> ++----------------------+-------------------------------+
> 
> As a result of the above, I also find it confusing to specify "way size"
> as a command line option. Cache size, number of ways, and cache line size
> would seem more natural to me.

Or, alternatively, have the number of colors be specifiable directly.

Jan
Jan Beulich Feb. 1, 2024, 12:18 p.m. UTC | #3
On 29.01.2024 18:17, Carlo Nonato wrote:
> --- /dev/null
> +++ b/docs/misc/cache-coloring.rst
> @@ -0,0 +1,87 @@
> +Xen cache coloring user guide
> +=============================
> +
> +The cache coloring support in Xen allows reserving Last Level Cache (LLC)
> +partitions for Dom0, DomUs and Xen itself. Currently only ARM64 is supported.
> +
> +To compile LLC coloring support set ``CONFIG_LLC_COLORING=y``.
> +
> +If needed, change the maximum number of colors with
> +``CONFIG_NR_LLC_COLORS=<n>``.
> +
> +Compile Xen and the toolstack and then configure it via
> +`Command line parameters`_.
> +
> +Background
> +**********
> +
> +The cache hierarchy of a modern multi-core CPU typically has its first levels
> +dedicated to each core (hence using multiple cache units), while the last level
> +is shared among all of them. Such a configuration implies that memory operations
> +on one core (e.g. running a DomU) are able to generate interference on another
> +core (e.g. hosting another DomU). Cache coloring allows eliminating this
> +mutual interference, and thus guaranteeing higher and more predictable
> +performance for memory accesses.

Since you say "eliminating" - what about shared mid-level caches? What about
shared TLBs?

Jan
Jan Beulich Feb. 1, 2024, 12:59 p.m. UTC | #4
On 29.01.2024 18:17, Carlo Nonato wrote:
> --- a/xen/arch/Kconfig
> +++ b/xen/arch/Kconfig
> @@ -31,3 +31,20 @@ config NR_NUMA_NODES
>  	  associated with multiple-nodes management. It is the upper bound of
>  	  the number of NUMA nodes that the scheduler, memory allocation and
>  	  other NUMA-aware components can handle.
> +
> +config LLC_COLORING
> +	bool "Last Level Cache (LLC) coloring" if EXPERT
> +	depends on HAS_LLC_COLORING
> +
> +config NR_LLC_COLORS
> +	int "Maximum number of LLC colors"
> +	range 2 1024

What's the reasoning behind this upper bound? IOW - can something to this
effect be said in the description, please?

> +	default 128
> +	depends on LLC_COLORING
> +	help
> +	  Controls the build-time size of various arrays associated with LLC
> +	  coloring. Refer to cache coloring documentation for how to compute the
> +	  number of colors supported by the platform. This is only an upper
> +	  bound. The runtime value is autocomputed or manually set via cmdline.
> +	  The default value corresponds to an 8 MiB 16-way LLC, which should be
> +	  more than what is needed in the general case.

Aiui while not outright wrong, non-power-of-2 values are meaningless to
specify. Perhaps that is worth mentioning (if not making this a value
that's used as exponent of 2 in the first place)?

As to the default and its description: As said for the documentation,
doesn't what this corresponds to also depend on cache line size? Even
if this was still Arm-specific rather than common code, I'd question
whether now and forever Arm chips may only use one pre-determined cache
line size.

> --- /dev/null
> +++ b/xen/common/llc-coloring.c
> @@ -0,0 +1,87 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Last Level Cache (LLC) coloring common code
> + *
> + * Copyright (C) 2022 Xilinx Inc.
> + */
> +#include <xen/keyhandler.h>
> +#include <xen/llc-coloring.h>
> +#include <xen/param.h>
> +
> +bool __ro_after_init llc_coloring_enabled;
> +boolean_param("llc-coloring", llc_coloring_enabled);

The variable has no use right now afaics, so it's unclear whether (a) it
is legitimately non-static and (b) placed in an appropriate section.

> +/* Size of an LLC way */
> +static unsigned int __ro_after_init llc_way_size;
> +size_param("llc-way-size", llc_way_size);
> +/* Number of colors available in the LLC */
> +static unsigned int __ro_after_init max_nr_colors = CONFIG_NR_LLC_COLORS;
> +
> +static void print_colors(const unsigned int *colors, unsigned int num_colors)
> +{
> +    unsigned int i;
> +
> +    printk("{ ");
> +    for ( i = 0; i < num_colors; i++ ) {

Nit (style): Brace placement.

> +        unsigned int start = colors[i], end = colors[i];
> +
> +        printk("%u", start);
> +
> +        for ( ;
> +              i < num_colors - 1 && colors[i] + 1 == colors[i + 1];

To reduce the number of array accesses, may I suggest to use "end + 1"
here instead of "colors[i] + 1"? (The initializer of "end" could also
be "start", but I guess the compiler will recognize this anyway.) This
would then (imo) also better justify the desire for having "end" in
the first place.

> +              i++, end++ );

Imo for clarity the semicolon wants to live on its own line.

> +static void dump_coloring_info(unsigned char key)

This being common code now, I think it would be good practice to have
cf_check here right away, even if for now (for whatever reason) the
feature is meant to be limited to Arm. (Albeit see below for whether
this is to remain that way.)

> +void __init llc_coloring_init(void)
> +{
> +    if ( !llc_way_size && !(llc_way_size = get_llc_way_size()) )
> +        panic("Probed LLC coloring way size is 0 and no custom value found\n");
> +
> +    /*
> +     * The maximum number of colors must be a power of 2 in order to correctly
> +     * map them to bits of an address, so the LLC way size must be a power
> +     * of 2 as well.
> +     */
> +    if ( llc_way_size & (llc_way_size - 1) )
> +        panic("LLC coloring way size (%u) isn't a power of 2\n", llc_way_size);
> +
> +    max_nr_colors = llc_way_size >> PAGE_SHIFT;

With this unconditionally initialized here, what's the purpose of the
variable's initializer?

> +    if ( max_nr_colors < 2 || max_nr_colors > CONFIG_NR_LLC_COLORS )
> +        panic("Number of LLC colors (%u) not in range [2, %u]\n",
> +              max_nr_colors, CONFIG_NR_LLC_COLORS);

I'm not convinced of panic()ing here (including the earlier two
instances). You could warn, taint, disable, and continue. If you want
to stick to panic(), please justify doing so in the description.

Plus, if you panic(), shouldn't that be limited to llc_coloring_enabled
being true? Or - not visible here, due to the lack of a caller of the
function - is that meant to be taken care of by the caller (to not call
here when the flag is off)? I think it would be cleaner if the check
lived here; quite possibly that would then further permit the flag
variable to become static.

> +    register_keyhandler('K', dump_coloring_info, "dump LLC coloring info", 1);

I'm also not convinced of using a separate key for this little bit of
information. How about attaching this to what 'm' or 'H' produce?

> +    arch_llc_coloring_init();
> +}
> +
> +void domain_dump_llc_colors(const struct domain *d)
> +{
> +    printk("Domain %pd has %u LLC colors: ", d, d->num_llc_colors);

%pd resolves to d<N> - why "Domain" as a prefix? And really - why the
domain identifier in the first place? All surrounding information is
already for this very domain.

> +    print_colors(d->llc_colors, d->num_llc_colors);

Imo this (or perhaps even the entire function) wants skipping when
num_llc_colors is zero, which would in particular also cover the
!llc_coloring_enabled case.

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -626,6 +626,11 @@ struct domain
>  
>      /* Holding CDF_* constant. Internal flags for domain creation. */
>      unsigned int cdf;
> +
> +#ifdef CONFIG_LLC_COLORING
> +    unsigned const int *llc_colors;

const unsigned int * please.

Jan
Jan Beulich Feb. 1, 2024, 1:32 p.m. UTC | #5
On 29.01.2024 18:17, Carlo Nonato wrote:
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -626,6 +626,11 @@ struct domain
>  
>      /* Holding CDF_* constant. Internal flags for domain creation. */
>      unsigned int cdf;
> +
> +#ifdef CONFIG_LLC_COLORING
> +    unsigned const int *llc_colors;
> +    unsigned int num_llc_colors;
> +#endif
>  };

Btw, at this point flipping the order of the two fields will be more
efficient for 64-bit architectures (consuming a padding hole rather
than adding yet another one).

Jan
Carlo Nonato Feb. 3, 2024, 10:57 a.m. UTC | #6
Hi Jan,

On Wed, Jan 31, 2024 at 4:57 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 29.01.2024 18:17, Carlo Nonato wrote:
> > Last Level Cache (LLC) coloring allows partitioning the cache into smaller
> > chunks called cache colors. Since not all architectures can implement it,
> > add a HAS_LLC_COLORING Kconfig option and put the other options under
> > xen/arch.
> >
> > LLC colors are a property of the domain, so the domain struct has to be
> > extended.
> >
> > Based on original work from: Luca Miccio <lucmiccio@gmail.com>
> >
> > Signed-off-by: Carlo Nonato <carlo.nonato@minervasys.tech>
> > Signed-off-by: Marco Solieri <marco.solieri@minervasys.tech>
> > ---
> > v6:
> > - moved almost all code in common
> > - moved documentation in this patch
> > - reintroduced range for CONFIG_NR_LLC_COLORS
> > - reintroduced some stub functions to reduce the number of checks on
> >   llc_coloring_enabled
> > - moved domain_llc_coloring_free() in same patch where allocation happens
> > - turned "d->llc_colors" to pointer-to-const
> > - llc_coloring_init() now returns void and panics if errors are found
> > v5:
> > - used - instead of _ for filenames
> > - removed domain_create_llc_colored()
> > - removed stub functions
> > - coloring domain fields are now #ifdef protected
> > v4:
> > - Kconfig options moved to xen/arch
> > - removed range for CONFIG_NR_LLC_COLORS
> > - added "llc_coloring_enabled" global to later implement the boot-time
> >   switch
> > - added domain_create_llc_colored() to be able to pass colors
> > - added is_domain_llc_colored() macro
> > ---
> >  docs/misc/cache-coloring.rst      | 87 +++++++++++++++++++++++++++++++
> >  docs/misc/xen-command-line.pandoc | 27 ++++++++++
> >  xen/arch/Kconfig                  | 17 ++++++
> >  xen/common/Kconfig                |  3 ++
> >  xen/common/Makefile               |  1 +
> >  xen/common/keyhandler.c           |  3 ++
> >  xen/common/llc-coloring.c         | 87 +++++++++++++++++++++++++++++++
> >  xen/include/xen/llc-coloring.h    | 38 ++++++++++++++
> >  xen/include/xen/sched.h           |  5 ++
> >  9 files changed, 268 insertions(+)
> >  create mode 100644 docs/misc/cache-coloring.rst
> >  create mode 100644 xen/common/llc-coloring.c
> >  create mode 100644 xen/include/xen/llc-coloring.h
> >
> > diff --git a/docs/misc/cache-coloring.rst b/docs/misc/cache-coloring.rst
> > new file mode 100644
> > index 0000000000..9fe01e99e1
> > --- /dev/null
> > +++ b/docs/misc/cache-coloring.rst
> > @@ -0,0 +1,87 @@
> > +Xen cache coloring user guide
> > +=============================
> > +
> > +The cache coloring support in Xen allows reserving Last Level Cache (LLC)
> > +partitions for Dom0, DomUs and Xen itself. Currently only ARM64 is supported.
> > +
> > +To compile LLC coloring support set ``CONFIG_LLC_COLORING=y``.
> > +
> > +If needed, change the maximum number of colors with
> > +``CONFIG_NR_LLC_COLORS=<n>``.
> > +
> > +Compile Xen and the toolstack and then configure it via
> > +`Command line parameters`_.
> > +
> > +Background
> > +**********
> > +
> > +The cache hierarchy of a modern multi-core CPU typically has its first levels
> > +dedicated to each core (hence using multiple cache units), while the last level
> > +is shared among all of them. Such a configuration implies that memory operations
> > +on one core (e.g. running a DomU) are able to generate interference on another
> > +core (e.g. hosting another DomU). Cache coloring allows eliminating this
> > +mutual interference, and thus guaranteeing higher and more predictable
> > +performance for memory accesses.
> > +The key concept underlying cache coloring is a fragmentation of the memory
> > +space into a set of sub-spaces called colors, which are mapped to disjoint
> > +cache partitions. Technically, the whole memory space is first divided into a
> > +number of consecutive regions. Each region is in turn divided into a number of
> > +consecutive sub-colors. The i-th color is then obtained as the union of the
> > +i-th sub-colors in each region.
> > +
> > +::
> > +
> > +                            Region j            Region j+1
> > +                .....................   ............
> > +                .                     . .
> > +                .                       .
> > +            _ _ _______________ _ _____________________ _ _
> > +                |     |     |     |     |     |     |
> > +                | c_0 | c_1 |     | c_n | c_0 | c_1 |
> > +           _ _ _|_____|_____|_ _ _|_____|_____|_____|_ _ _
> > +                    :                       :
> > +                    :                       :...         ... .
> > +                    :                            color 0
> > +                    :...........................         ... .
> > +                                                :
> > +          . . ..................................:
> > +
> > +There are two pragmatic lessons to be learnt.
> > +
> > +1. If one wants to avoid cache interference between two domains, different
> > +   colors need to be used for their memory.
> > +
> > +2. Color assignment must privilege contiguity in the partitioning. E.g.,
> > +   assigning colors (0,1) to domain I  and (2,3) to domain  J is better than
> > +   assigning colors (0,2) to I and (1,3) to J.
>
> I can't connect this 2nd point with any of what was said above.

If colors are contiguous, then greater spatial locality is achievable. Do you
mean we should explain this better?

> > +How to compute the number of colors
> > +***********************************
> > +
> > +To compute the number of available colors for a specific platform, the size of
> > +an LLC way and the page size used by Xen must be known. The first parameter can
> > +be found in the processor manual, or it can be computed by dividing the total
> > +cache size by the number of its ways. The second parameter is the minimum
> > +amount of memory that can be mapped by the hypervisor,
>
> I find "amount of memory that can be mapped" quite confusing here. Don't you
> really mean the granularity at which memory can be mapped?

Yes that's what I wanted to describe. I'll change it.

> > thus dividing the way
> > +size by the page size gives the total number of cache partitions. So, for
> > +example, an Arm Cortex-A53 with a 16-way set-associative 1 MiB LLC can isolate
> > +up to 16 colors when pages are 4 KiB in size.
>
> I guess it's a matter of what one's used to, but to me talking of "way size"
> and how the calculation is described is, well, unusual. What I would start
> from is the smallest entity, i.e. a cache line. Then it would be relevant
> to describe how, after removing the low so many bits to cover for cache line
> size, the remaining address bits are used to map to a particular set. It
> looks to me as if you're assuming that this mapping is linear, using the
> next so many bits from the address. Afaik this isn't true on various modern
> CPUs; instead hash functions are used. Without knowing at least certain
> properties of such a hash function, I'm afraid your mapping from address to
> color isn't necessarily guaranteeing the promised isolation. The guarantee
> may hold for processors you specifically target, but then I think in this
> description it would help if you would fully spell out any assumptions you
> make on how hardware maps addresses to elements of the cache.

You're right, we are assuming a linear mapping. We are going to review and
extend the documentation in order to fully specify when coloring can be
applied.

About the "way size": it's a way of summarizing all the parameters into one.
We could ask for different cache parameters as you said, but in the end what
we are interested in is how many partitions the cache is capable of isolating
and how big they are. The answer is, in theory, as many partitions as the
number of sets, each one as big as a cache line, because we can't have
isolation inside a set.
Then memory mapping comes into play and the minimum granularity at which
mapping can happen actually lowers the number of partitions.
To recap, we can isolate:
    nr_sets * line_size / page_size
Then we simply named:
    way_size = nr_sets * line_size
Another way of computing it:
    way_size = cache_size / nr_ways

We are ok with having two parameters: cache_size and nr_ways, which are even
easier and more intuitive to find for a normal user.

> Or, alternatively, have the number of colors be specifiable directly.

This is to be avoided in my opinion since it's more difficult to compute and
it requires more knowledge.

Thanks.

>
> > +Cache layout is probed automatically by looking at the CLIDR_EL1 Arm register.
> > +This means that other system caches that aren't visible there are ignored.
> > +The possibility of manually setting the way size is left to the user, to
> > +overcome probing failures or for debugging/testing purposes. See
> > +`Command line parameters`_ for more information on that.
> > +
> > +Command line parameters
> > +***********************
> > +
> > +More specific documentation is available at `docs/misc/xen-command-line.pandoc`.
> > +
> > ++----------------------+-------------------------------+
> > +| **Parameter**        | **Description**               |
> > ++----------------------+-------------------------------+
> > +| ``llc-coloring``     | enable coloring at runtime    |
> > ++----------------------+-------------------------------+
> > +| ``llc-way-size``     | set the LLC way size          |
> > ++----------------------+-------------------------------+
>
> As a result of the above, I also find it confusing to specify "way size"
> as a command line option. Cache size, number of ways, and cache line size
> would seem more natural to me.
>
> I'll get to looking at the actual code later.
>
> Jan
Carlo Nonato Feb. 3, 2024, 10:57 a.m. UTC | #7
Hi Jan,

On Thu, Feb 1, 2024 at 1:59 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 29.01.2024 18:17, Carlo Nonato wrote:
> > --- a/xen/arch/Kconfig
> > +++ b/xen/arch/Kconfig
> > @@ -31,3 +31,20 @@ config NR_NUMA_NODES
> >         associated with multiple-nodes management. It is the upper bound of
> >         the number of NUMA nodes that the scheduler, memory allocation and
> >         other NUMA-aware components can handle.
> > +
> > +config LLC_COLORING
> > +     bool "Last Level Cache (LLC) coloring" if EXPERT
> > +     depends on HAS_LLC_COLORING
> > +
> > +config NR_LLC_COLORS
> > +     int "Maximum number of LLC colors"
> > +     range 2 1024
>
> What's the reasoning behind this upper bound? IOW - can something to this
> effect be said in the description, please?

The only reason is that this is the number of colors that fit in a 4 KiB page.
I don't have any other good way of picking a number here. 1024 is already big
and probably nobody would use such a configuration. But 512 or 256 would be
equally arbitrary.

> > +     default 128
> > +     depends on LLC_COLORING
> > +     help
> > +       Controls the build-time size of various arrays associated with LLC
> > +       coloring. Refer to cache coloring documentation for how to compute the
> > +       number of colors supported by the platform. This is only an upper
> > +       bound. The runtime value is autocomputed or manually set via cmdline.
> > +       The default value corresponds to an 8 MiB 16-way LLC, which should be
> > +       more than what is needed in the general case.
>
> Aiui while not outright wrong, non-power-of-2 values are meaningless to
> specify. Perhaps that is worth mentioning (if not making this a value
> that's used as exponent of 2 in the first place)?

Yes, I prefer a better help message.

> As to the default and its description: As said for the documentation,
> doesn't what this corresponds to also depend on cache line size? Even
> if this was still Arm-specific rather than common code, I'd question
> whether now and forever Arm chips may only use one pre-determined cache
> line size.

I hope I answered in the previous mail why the line size (in the specific case
we are applying coloring to) can be ignored as a parameter in favor of cache
size and number of ways.

> > --- /dev/null
> > +++ b/xen/common/llc-coloring.c
> > @@ -0,0 +1,87 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Last Level Cache (LLC) coloring common code
> > + *
> > + * Copyright (C) 2022 Xilinx Inc.
> > + */
> > +#include <xen/keyhandler.h>
> > +#include <xen/llc-coloring.h>
> > +#include <xen/param.h>
> > +
> > +bool __ro_after_init llc_coloring_enabled;
> > +boolean_param("llc-coloring", llc_coloring_enabled);
>
> The variable has no use right now afaics, so it's unclear whether (a) it
> is legitimately non-static and (b) placed in an appropriate section.

My bad here. The variable should be tested for in llc_coloring_init() and in
domain_dump_llc_colors() (in domain_llc_coloring_free() as well, in later
patches). That change was lost in the rebase of the series.

Anyway per this patch, the global is only accessed from this file while it's
going to be accessed from outside in later patches. In this case what should
I do? Declare it static and then make it non-static afterwards?

> > +/* Size of an LLC way */
> > +static unsigned int __ro_after_init llc_way_size;
> > +size_param("llc-way-size", llc_way_size);
> > +/* Number of colors available in the LLC */
> > +static unsigned int __ro_after_init max_nr_colors = CONFIG_NR_LLC_COLORS;
> > +
> > +static void print_colors(const unsigned int *colors, unsigned int num_colors)
> > +{
> > +    unsigned int i;
> > +
> > +    printk("{ ");
> > +    for ( i = 0; i < num_colors; i++ ) {
>
> Nit (style): Brace placement.
>
> > +        unsigned int start = colors[i], end = colors[i];
> > +
> > +        printk("%u", start);
> > +
> > +        for ( ;
> > +              i < num_colors - 1 && colors[i] + 1 == colors[i + 1];
>
> To reduce the number of array accesses, may I suggest to use "end + 1"
> here instead of "colors[i] + 1"? (The initializer of "end" could also
> be "start", but I guess the compiler will recognize this anyway.) This
> would then (imo) also better justify the desire for having "end" in
> the first place.
>
> > +              i++, end++ );
>
> Imo for clarity the semicolon wants to live on its own line.
>
> > +static void dump_coloring_info(unsigned char key)
>
> This being common code now, I think it would be good practice to have
> cf_check here right away, even if for now (for whatever reason) the
> feature is meant to be limited to Arm. (Albeit see below for whether
> this is to remain that way.)
>
> > +void __init llc_coloring_init(void)
> > +{
> > +    if ( !llc_way_size && !(llc_way_size = get_llc_way_size()) )
> > +        panic("Probed LLC coloring way size is 0 and no custom value found\n");
> > +
> > +    /*
> > +     * The maximum number of colors must be a power of 2 in order to correctly
> > +     * map them to bits of an address, so the LLC way size must be a power of 2 too.
> > +     */
> > +    if ( llc_way_size & (llc_way_size - 1) )
> > +        panic("LLC coloring way size (%u) isn't a power of 2\n", llc_way_size);
> > +
> > +    max_nr_colors = llc_way_size >> PAGE_SHIFT;
>
> With this unconditionally initialized here, what's the purpose of the
> variable's initializer?

Previously I was using the global in parse_color_config() (later introduced),
but since now I'm not doing it anymore I can drop the initializer.

> > +    if ( max_nr_colors < 2 || max_nr_colors > CONFIG_NR_LLC_COLORS )
> > +        panic("Number of LLC colors (%u) not in range [2, %u]\n",
> > +              max_nr_colors, CONFIG_NR_LLC_COLORS);
>
> I'm not convinced of panic()ing here (including the earlier two
> instances). You could warn, taint, disable, and continue. If you want
> to stick to panic(), please justify doing so in the description.
>
> Plus, if you panic(), shouldn't that be limited to llc_coloring_enabled
> being true? Or - not visible here, due to the lack of a caller of the
> function - is that meant to be taken care of by the caller (to not call
> here when the flag is off)? I think it would be cleaner if the check
> lived here; quite possibly that would then further permit the flag
> variable to become static.

You're right. As I said here the check on llc_coloring_enabled is missing.
Obviously it's an error doing the initialization no matter what.

> > +    register_keyhandler('K', dump_coloring_info, "dump LLC coloring info", 1);
>
> I'm also not convinced of using a separate key for this little bit of
> information. How about attaching this to what 'm' or 'H' produce?

Ok. 'm' seems the right place.

> > +    arch_llc_coloring_init();
> > +}
> > +
> > +void domain_dump_llc_colors(const struct domain *d)
> > +{
> > +    printk("Domain %pd has %u LLC colors: ", d, d->num_llc_colors);
>
> %pd resolves to d<N> - why "Domain" as a prefix? And really - why the
> domain identifier in the first place? All surrounding information is
> already for this very domain.
>
> > +    print_colors(d->llc_colors, d->num_llc_colors);
>
> Imo this (or perhaps even the entire function) wants skipping when
> num_llc_colors is zero, which would in particular also cover the
> !llc_coloring_enabled case.

This shouldn't be possible. As I said this function should be a no-op when
!llc_coloring_enabled.

Thanks.

> > --- a/xen/include/xen/sched.h
> > +++ b/xen/include/xen/sched.h
> > @@ -626,6 +626,11 @@ struct domain
> >
> >      /* Holding CDF_* constant. Internal flags for domain creation. */
> >      unsigned int cdf;
> > +
> > +#ifdef CONFIG_LLC_COLORING
> > +    unsigned const int *llc_colors;
>
> const unsigned int * please.
>
> Jan
Carlo Nonato Feb. 3, 2024, 11:31 a.m. UTC | #8
Hi Jan,

On Thu, Feb 1, 2024 at 1:18 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 29.01.2024 18:17, Carlo Nonato wrote:
> > --- /dev/null
> > +++ b/docs/misc/cache-coloring.rst
> > @@ -0,0 +1,87 @@
> > +Xen cache coloring user guide
> > +=============================
> > +
> > +The cache coloring support in Xen allows reserving Last Level Cache (LLC)
> > +partitions for Dom0, DomUs and Xen itself. Currently only ARM64 is supported.
> > +
> > +To compile LLC coloring support set ``CONFIG_LLC_COLORING=y``.
> > +
> > +If needed, change the maximum number of colors with
> > +``CONFIG_NR_LLC_COLORS=<n>``.
> > +
> > +Compile Xen and the toolstack and then configure it via
> > +`Command line parameters`_.
> > +
> > +Background
> > +**********
> > +
> > +Cache hierarchy of a modern multi-core CPU typically has first levels dedicated
> > +to each core (hence using multiple cache units), while the last level is shared
> > +among all of them. Such a configuration implies that memory operations on one
> > +core (e.g. running a DomU) are able to generate interference on another core
> > +(e.g. hosting another DomU). Cache coloring allows eliminating this
> > +mutual interference, and thus guaranteeing higher and more predictable
> > +performance for memory accesses.
>
> Since you say "eliminating" - what about shared mid-level caches? What about
> shared TLBs?

Cache coloring can help in reducing the interference, but you're right and
there are other factors to be considered. We will update the documentation to
better specify the applicability range and relax the terminology concerning
"eliminating" etc.

Thanks

> Jan
Jan Beulich Feb. 5, 2024, 9:21 a.m. UTC | #9
On 03.02.2024 11:57, Carlo Nonato wrote:
> On Wed, Jan 31, 2024 at 4:57 PM Jan Beulich <jbeulich@suse.com> wrote:
>> On 29.01.2024 18:17, Carlo Nonato wrote:
>>> +Background
>>> +**********
>>> +
>>> +Cache hierarchy of a modern multi-core CPU typically has first levels dedicated
>>> +to each core (hence using multiple cache units), while the last level is shared
>>> +among all of them. Such a configuration implies that memory operations on one
>>> +core (e.g. running a DomU) are able to generate interference on another core
>>> +(e.g. hosting another DomU). Cache coloring allows eliminating this
>>> +mutual interference, and thus guaranteeing higher and more predictable
>>> +performance for memory accesses.
>>> +The key concept underlying cache coloring is a fragmentation of the memory
>>> +space into a set of sub-spaces called colors that are mapped to disjoint cache
>>> +partitions. Technically, the whole memory space is first divided into a number
>>> +of subsequent regions. Then each region is in turn divided into a number of
>>> +subsequent sub-colors. The generic i-th color is then the union of all the
>>> +i-th sub-colors in each region.
>>> +
>>> +::
>>> +
>>> +                            Region j            Region j+1
>>> +                .....................   ............
>>> +                .                     . .
>>> +                .                       .
>>> +            _ _ _______________ _ _____________________ _ _
>>> +                |     |     |     |     |     |     |
>>> +                | c_0 | c_1 |     | c_n | c_0 | c_1 |
>>> +           _ _ _|_____|_____|_ _ _|_____|_____|_____|_ _ _
>>> +                    :                       :
>>> +                    :                       :...         ... .
>>> +                    :                            color 0
>>> +                    :...........................         ... .
>>> +                                                :
>>> +          . . ..................................:
>>> +
>>> +There are two pragmatic lessons to be learnt.
>>> +
>>> +1. If one wants to avoid cache interference between two domains, different
>>> +   colors need to be used for their memory.
>>> +
>>> +2. Color assignment must privilege contiguity in the partitioning. E.g.,
>>> +   assigning colors (0,1) to domain I and (2,3) to domain J is better than
>>> +   assigning colors (0,2) to I and (1,3) to J.
>>
>> I can't connect this 2nd point with any of what was said above.
> 
> If colors are contiguous then a greater spatial locality is achievable. Do
> you mean we should explain this better?

Yes, but not just that. See how your use of "must" in the text contradicts
your now suggesting this is merely an optimization.

>>> +How to compute the number of colors
>>> +***********************************
>>> +
>>> +To compute the number of available colors for a specific platform, the size of
>>> +an LLC way and the page size used by Xen must be known. The first parameter can
>>> +be found in the processor manual or can be also computed dividing the total
>>> +cache size by the number of its ways. The second parameter is the minimum
>>> +amount of memory that can be mapped by the hypervisor,
>>
>> I find "amount of memory that can be mapped" quite confusing here. Don't you
>> really mean the granularity at which memory can be mapped?
> 
> Yes that's what I wanted to describe. I'll change it.
> 
>>> thus dividing the way
>>> +size by the page size, the number of total cache partitions is found. So for
>>> +example, an Arm Cortex-A53 with a 16-way set associative 1 MiB LLC can isolate up
>>> +to 16 colors when pages are 4 KiB in size.
>>
>> I guess it's a matter of what one's used to, but to me talking of "way size"
>> and how the calculation is described is, well, unusual. What I would start
>> from is the smallest entity, i.e. a cache line. Then it would be relevant
>> to describe how, after removing the low so many bits to cover for cache line
>> size, the remaining address bits are used to map to a particular set. It
>> looks to me as if you're assuming that this mapping is linear, using the
>> next so many bits from the address. Afaik this isn't true on various modern
>> CPUs; instead hash functions are used. Without knowing at least certain
>> properties of such a hash function, I'm afraid your mapping from address to
>> color isn't necessarily guaranteeing the promised isolation. The guarantee
>> may hold for processors you specifically target, but then I think in this
>> description it would help if you would fully spell out any assumptions you
>> make on how hardware maps addresses to elements of the cache.
> 
> You're right, we are assuming a linear mapping. We are going to review and
> extend the documentation in order to fully specify when coloring can be
> applied.
> 
> About the "way size" it's a way of summarizing all the parameters into one.
> We could ask for different cache parameters as you said, but in the end what
> we are interested in is how many partitions is the cache capable of isolate
> and how big they are. The answer is, in theory, as many partitions as the
> number of sets, each one as big as a cache line, because we can't have
> isolation inside a set.
> Then memory mapping comes into play and the minimum granularity at which
> mapping can happen actually lowers the number of partitions.
> To recap we can isolate:
>     nr_sets * line_size / page_size
> Then we simply named:
>     way_size = nr_sets * line_size
> Another way of computing it:
>     way_size = cache_size / nr_ways
> 
> We are ok with having two parameters: cache_size and nr_ways which are even
> easier and intuitive to find for a normal user.

Right, that's the aspect I was actually after.

Jan
Jan Beulich Feb. 5, 2024, 9:28 a.m. UTC | #10
On 03.02.2024 11:57, Carlo Nonato wrote:
> On Thu, Feb 1, 2024 at 1:59 PM Jan Beulich <jbeulich@suse.com> wrote:
>> On 29.01.2024 18:17, Carlo Nonato wrote:
>>> --- a/xen/arch/Kconfig
>>> +++ b/xen/arch/Kconfig
>>> @@ -31,3 +31,20 @@ config NR_NUMA_NODES
>>>         associated with multiple-nodes management. It is the upper bound of
>>>         the number of NUMA nodes that the scheduler, memory allocation and
>>>         other NUMA-aware components can handle.
>>> +
>>> +config LLC_COLORING
>>> +     bool "Last Level Cache (LLC) coloring" if EXPERT
>>> +     depends on HAS_LLC_COLORING
>>> +
>>> +config NR_LLC_COLORS
>>> +     int "Maximum number of LLC colors"
>>> +     range 2 1024
>>
>> What's the reasoning behind this upper bound? IOW - can something to this
>> effect be said in the description, please?
> 
> The only reason is that this is the number of colors that fit in a 4 KiB page.
> I don't have any other good way of picking a number here. 1024 is already big
> and probably nobody would use such a configuration. But 512 or 256 would be
> equally arbitrary.

And because of this I'm asking that you say in the description how you
arrived at this value. As to fitting in 4k-page: That makes two
assumptions (both true for all ports right now, but liable to be missed if
either changed down the road): PAGE_SIZE == 0x1000 && sizeof(int) == 4.

>>> --- /dev/null
>>> +++ b/xen/common/llc-coloring.c
>>> @@ -0,0 +1,87 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>> +/*
>>> + * Last Level Cache (LLC) coloring common code
>>> + *
>>> + * Copyright (C) 2022 Xilinx Inc.
>>> + */
>>> +#include <xen/keyhandler.h>
>>> +#include <xen/llc-coloring.h>
>>> +#include <xen/param.h>
>>> +
>>> +bool __ro_after_init llc_coloring_enabled;
>>> +boolean_param("llc-coloring", llc_coloring_enabled);
>>
>> The variable has no use right now afaics, so it's unclear whether (a) it
>> is legitimately non-static and (b) placed in an appropriate section.
> 
> My bad here. The variable should be tested for in llc_coloring_init() and in
> domain_dump_llc_colors() (in domain_llc_coloring_free() as well, in later
> patches). That change was lost in the rebase of the series.
> 
> Anyway per this patch, the global is only accessed from this file while it's
> going to be accessed from outside in later patches. In this case what should
> I do? Declare it static and then make it non-static afterwards?

That would be preferred, considering that there may be an extended time
period between the 1st and 2nd patches going in. Explaining why a
variable is non-static despite not needing to be just yet would be an
alternative, but then you'd also need to justify why transiently
violating the respective Misra guideline is acceptable.

Jan
diff mbox series

Patch

diff --git a/docs/misc/cache-coloring.rst b/docs/misc/cache-coloring.rst
new file mode 100644
index 0000000000..9fe01e99e1
--- /dev/null
+++ b/docs/misc/cache-coloring.rst
@@ -0,0 +1,87 @@ 
+Xen cache coloring user guide
+=============================
+
+The cache coloring support in Xen allows reserving Last Level Cache (LLC)
+partitions for Dom0, DomUs and Xen itself. Currently only ARM64 is supported.
+
+To compile LLC coloring support set ``CONFIG_LLC_COLORING=y``.
+
+If needed, change the maximum number of colors with
+``CONFIG_NR_LLC_COLORS=<n>``.
+
+Compile Xen and the toolstack and then configure it via
+`Command line parameters`_.
+
+Background
+**********
+
+Cache hierarchy of a modern multi-core CPU typically has first levels dedicated
+to each core (hence using multiple cache units), while the last level is shared
+among all of them. Such a configuration implies that memory operations on one
+core (e.g. running a DomU) are able to generate interference on another core
+(e.g. hosting another DomU). Cache coloring allows eliminating this
+mutual interference, and thus guaranteeing higher and more predictable
+performance for memory accesses.
+The key concept underlying cache coloring is a fragmentation of the memory
+space into a set of sub-spaces called colors that are mapped to disjoint cache
+partitions. Technically, the whole memory space is first divided into a number
+of subsequent regions. Then each region is in turn divided into a number of
+subsequent sub-colors. The generic i-th color is then the union of all the
+i-th sub-colors in each region.
+
+::
+
+                            Region j            Region j+1
+                .....................   ............
+                .                     . .
+                .                       .
+            _ _ _______________ _ _____________________ _ _
+                |     |     |     |     |     |     |
+                | c_0 | c_1 |     | c_n | c_0 | c_1 |
+           _ _ _|_____|_____|_ _ _|_____|_____|_____|_ _ _
+                    :                       :
+                    :                       :...         ... .
+                    :                            color 0
+                    :...........................         ... .
+                                                :
+          . . ..................................:
+
+There are two pragmatic lessons to be learnt.
+
+1. If one wants to avoid cache interference between two domains, different
+   colors need to be used for their memory.
+
+2. Color assignment must privilege contiguity in the partitioning. E.g.,
+   assigning colors (0,1) to domain I and (2,3) to domain J is better than
+   assigning colors (0,2) to I and (1,3) to J.
+
+How to compute the number of colors
+***********************************
+
+To compute the number of available colors for a specific platform, the size of
+an LLC way and the page size used by Xen must be known. The first parameter can
+be found in the processor manual or can also be computed by dividing the total
+cache size by the number of its ways. The second parameter is the minimum
+amount of memory that can be mapped by the hypervisor, thus dividing the way
+size by the page size, the number of total cache partitions is found. So for
+example, an Arm Cortex-A53 with a 16-way set associative 1 MiB LLC can isolate up
+to 16 colors when pages are 4 KiB in size.
+
+Cache layout is probed automatically by looking at the CLIDR_EL1 Arm register.
+This means that other system caches that aren't visible there are ignored.
+The possibility of manually setting the way size is left to the user to overcome
+failing situations or for debugging/testing purposes. See
+`Command line parameters`_ for more information on that.
+
+Command line parameters
+***********************
+
+More specific documentation is available at `docs/misc/xen-command-line.pandoc`.
+
++----------------------+-------------------------------+
+| **Parameter**        | **Description**               |
++----------------------+-------------------------------+
+| ``llc-coloring``     | enable coloring at runtime    |
++----------------------+-------------------------------+
+| ``llc-way-size``     | set the LLC way size          |
++----------------------+-------------------------------+
diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 8e65f8bd18..11f9f209d1 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1713,6 +1713,33 @@  This option is intended for debugging purposes only.  Enable MSR_DEBUGCTL.LBR
 in hypervisor context to be able to dump the Last Interrupt/Exception To/From
 record with other registers.
 
+### llc-coloring
+> `= <boolean>`
+
+> Default: `false`
+
+Flag to enable or disable LLC coloring support at runtime. This option is
+available only when `CONFIG_LLC_COLORING` is enabled. See the general
+cache coloring documentation for more info.
+
+### llc-way-size
+> `= <size>`
+
+> Default: `Obtained from the hardware`
+
+Specify the way size of the Last Level Cache. This option is available only
+when `CONFIG_LLC_COLORING` is enabled. It is an optional, expert-only parameter
+and it is used to calculate the number of available LLC colors on the platform.
+It can be obtained by dividing the total LLC size by the number of its
+associative ways.
+By default, the value is automatically computed by probing the hardware, but in
+case of specific needs, it can be manually set. Those include probing failures
+and debugging/testing purposes, so that it's possible to emulate platforms with
+a different number of supported colors.
+An important detail to highlight is that the current implementation of the
+cache coloring technique requires the number of colors to be a power of 2, and
+consequently, the LLC way size must be a power of 2 as well.
+
 ### lock-depth-size
 > `= <integer>`
 
diff --git a/xen/arch/Kconfig b/xen/arch/Kconfig
index 67ba38f32f..c1157bcbcb 100644
--- a/xen/arch/Kconfig
+++ b/xen/arch/Kconfig
@@ -31,3 +31,20 @@  config NR_NUMA_NODES
 	  associated with multiple-nodes management. It is the upper bound of
 	  the number of NUMA nodes that the scheduler, memory allocation and
 	  other NUMA-aware components can handle.
+
+config LLC_COLORING
+	bool "Last Level Cache (LLC) coloring" if EXPERT
+	depends on HAS_LLC_COLORING
+
+config NR_LLC_COLORS
+	int "Maximum number of LLC colors"
+	range 2 1024
+	default 128
+	depends on LLC_COLORING
+	help
+	  Controls the build-time size of various arrays associated with LLC
+	  coloring. Refer to cache coloring documentation for how to compute the
+	  number of colors supported by the platform. This is only an upper
+	  bound. The runtime value is autocomputed or manually set via cmdline.
	  The default value corresponds to an 8 MiB 16-way LLC, which should be
	  more than needed in the general case.
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 310ad4229c..e383f09d97 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -71,6 +71,9 @@  config HAS_IOPORTS
 config HAS_KEXEC
 	bool
 
+config HAS_LLC_COLORING
+	bool
+
 config HAS_PMAP
 	bool
 
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 69d6aa626c..409cc53e2a 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -24,6 +24,7 @@  obj-y += keyhandler.o
 obj-$(CONFIG_KEXEC) += kexec.o
 obj-$(CONFIG_KEXEC) += kimage.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o livepatch_elf.o
+obj-$(CONFIG_LLC_COLORING) += llc-coloring.o
 obj-$(CONFIG_MEM_ACCESS) += mem_access.o
 obj-y += memory.o
 obj-y += multicall.o
diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c
index 99a2d72a02..8d90b613f7 100644
--- a/xen/common/keyhandler.c
+++ b/xen/common/keyhandler.c
@@ -6,6 +6,7 @@ 
 #include <xen/debugger.h>
 #include <xen/delay.h>
 #include <xen/keyhandler.h>
+#include <xen/llc-coloring.h>
 #include <xen/param.h>
 #include <xen/shutdown.h>
 #include <xen/event.h>
@@ -307,6 +308,8 @@  static void cf_check dump_domains(unsigned char key)
 
         arch_dump_domain_info(d);
 
+        domain_dump_llc_colors(d);
+
         rangeset_domain_printk(d);
 
         dump_pageframe_info(d);
diff --git a/xen/common/llc-coloring.c b/xen/common/llc-coloring.c
new file mode 100644
index 0000000000..10729e70c1
--- /dev/null
+++ b/xen/common/llc-coloring.c
@@ -0,0 +1,87 @@ 
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Last Level Cache (LLC) coloring common code
+ *
+ * Copyright (C) 2022 Xilinx Inc.
+ */
+#include <xen/keyhandler.h>
+#include <xen/llc-coloring.h>
+#include <xen/param.h>
+
+bool __ro_after_init llc_coloring_enabled;
+boolean_param("llc-coloring", llc_coloring_enabled);
+
+/* Size of an LLC way */
+static unsigned int __ro_after_init llc_way_size;
+size_param("llc-way-size", llc_way_size);
+/* Number of colors available in the LLC */
+static unsigned int __ro_after_init max_nr_colors = CONFIG_NR_LLC_COLORS;
+
+static void print_colors(const unsigned int *colors, unsigned int num_colors)
+{
+    unsigned int i;
+
+    printk("{ ");
+    for ( i = 0; i < num_colors; i++ ) {
+        unsigned int start = colors[i], end = colors[i];
+
+        printk("%u", start);
+
+        for ( ;
+              i < num_colors - 1 && colors[i] + 1 == colors[i + 1];
+              i++, end++ );
+
+        if ( start != end )
+            printk("-%u", end);
+
+        if ( i < num_colors - 1 )
+            printk(", ");
+    }
+    printk(" }\n");
+}
+
+static void dump_coloring_info(unsigned char key)
+{
+    printk("'%c' pressed -> dumping LLC coloring general info\n", key);
+    printk("LLC way size: %u KiB\n", llc_way_size >> 10);
+    printk("Number of LLC colors supported: %u\n", max_nr_colors);
+}
+
+void __init llc_coloring_init(void)
+{
+    if ( !llc_way_size && !(llc_way_size = get_llc_way_size()) )
+        panic("Probed LLC coloring way size is 0 and no custom value found\n");
+
+    /*
+     * The maximum number of colors must be a power of 2 in order to correctly
+     * map them to bits of an address, so the LLC way size must be a power of 2 too.
+     */
+    if ( llc_way_size & (llc_way_size - 1) )
+        panic("LLC coloring way size (%u) isn't a power of 2\n", llc_way_size);
+
+    max_nr_colors = llc_way_size >> PAGE_SHIFT;
+
+    if ( max_nr_colors < 2 || max_nr_colors > CONFIG_NR_LLC_COLORS )
+        panic("Number of LLC colors (%u) not in range [2, %u]\n",
+              max_nr_colors, CONFIG_NR_LLC_COLORS);
+
+    register_keyhandler('K', dump_coloring_info, "dump LLC coloring info", 1);
+
+    arch_llc_coloring_init();
+}
+
+void domain_dump_llc_colors(const struct domain *d)
+{
+    printk("Domain %pd has %u LLC colors: ", d, d->num_llc_colors);
+    print_colors(d->llc_colors, d->num_llc_colors);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/llc-coloring.h b/xen/include/xen/llc-coloring.h
new file mode 100644
index 0000000000..5e12eb426f
--- /dev/null
+++ b/xen/include/xen/llc-coloring.h
@@ -0,0 +1,38 @@ 
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Last Level Cache (LLC) coloring common header
+ *
+ * Copyright (C) 2022 Xilinx Inc.
+ */
+#ifndef __COLORING_H__
+#define __COLORING_H__
+
+#include <xen/sched.h>
+#include <public/domctl.h>
+
+#ifdef CONFIG_LLC_COLORING
+extern bool llc_coloring_enabled;
+
+void llc_coloring_init(void);
+void domain_dump_llc_colors(const struct domain *d);
+#else
+#define llc_coloring_enabled false
+
+static inline void llc_coloring_init(void) {}
+static inline void domain_dump_llc_colors(const struct domain *d) {}
+#endif
+
+unsigned int get_llc_way_size(void);
+void arch_llc_coloring_init(void);
+
+#endif /* __COLORING_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 9da91e0e62..8df0f29335 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -626,6 +626,11 @@  struct domain
 
     /* Holding CDF_* constant. Internal flags for domain creation. */
     unsigned int cdf;
+
+#ifdef CONFIG_LLC_COLORING
+    unsigned const int *llc_colors;
+    unsigned int num_llc_colors;
+#endif
 };
 
 static inline struct page_list_head *page_to_list(