Message ID | 20220826125111.152261-1-carlo.nonato@minervasys.tech (mailing list archive) |
---|---|
Series | Arm cache coloring |
Hi Carlo,

On 26/08/2022 13:50, Carlo Nonato wrote:
> Shared caches in multi-core CPU architectures represent a problem for
> predictability of memory access latency. This jeopardizes applicability
> of many Arm platforms in real-time critical and mixed-criticality
> scenarios. We introduce support for cache partitioning with page
> coloring, a transparent software technique that enables isolation
> between domains and Xen, and thus avoids cache interference.
>
> When creating a domain, a simple syntax (e.g. `0-3` or `4-11`) allows
> the user to define assignments of cache partition IDs, called colors,
> where assigning different colors guarantees no mutual eviction on cache
> will ever happen. This instructs the Xen memory allocator to provide
> the i-th color assignee only with pages that map to color i, i.e. that
> are indexed in the i-th cache partition.
>
> The proposed implementation supports the dom0less feature.
> The solution has been tested in several scenarios, including Xilinx Zynq
> MPSoCs.
>
> Overview of implementation and commits structure
> ------------------------------------------------
>
> - [1-3] Coloring initialization, cache layout auto-probing and coloring
>   data for domains are added.
> - [4-5] xl and Device Tree support for coloring is added.
> - [6-7] A new page allocator for domain memory that implements the cache
>   coloring mechanism is introduced.
> - [8-12] Coloring support is added for Xen .text region.
>
> Changes in v2
> -------------
>
> A lot of things changed between the two versions, mainly I tried to follow
> all the comments left by the maintainers after the previous version review.
> Here is a brief list of the major points (even if, imho, it's easier to
> repeat all the review process):

The series doesn't build on Arm64 without cache coloring. Please make sure to compile and check that Xen still boots on a system after your series with cache coloring disabled.

Cheers,
Hi Julien,

On Sat, Sep 10, 2022 at 5:12 PM Julien Grall <julien@xen.org> wrote:
>
> Hi Carlo,
>
> On 26/08/2022 13:50, Carlo Nonato wrote:
> > Shared caches in multi-core CPU architectures represent a problem for
> > predictability of memory access latency. This jeopardizes applicability
> > of many Arm platforms in real-time critical and mixed-criticality
> > scenarios. We introduce support for cache partitioning with page
> > coloring, a transparent software technique that enables isolation
> > between domains and Xen, and thus avoids cache interference.
> >
> > When creating a domain, a simple syntax (e.g. `0-3` or `4-11`) allows
> > the user to define assignments of cache partition IDs, called colors,
> > where assigning different colors guarantees no mutual eviction on cache
> > will ever happen. This instructs the Xen memory allocator to provide
> > the i-th color assignee only with pages that map to color i, i.e. that
> > are indexed in the i-th cache partition.
> >
> > The proposed implementation supports the dom0less feature.
> > The solution has been tested in several scenarios, including Xilinx Zynq
> > MPSoCs.
> >
> > Overview of implementation and commits structure
> > ------------------------------------------------
> >
> > - [1-3] Coloring initialization, cache layout auto-probing and coloring
> >   data for domains are added.
> > - [4-5] xl and Device Tree support for coloring is added.
> > - [6-7] A new page allocator for domain memory that implements the cache
> >   coloring mechanism is introduced.
> > - [8-12] Coloring support is added for Xen .text region.
> >
> > Changes in v2
> > -------------
> >
> > A lot of things changed between the two versions, mainly I tried to follow
> > all the comments left by the maintainers after the previous version review.
> > Here is a brief list of the major points (even if, imho, it's easier to
> > repeat all the review process):
>
> The series doesn't build on Arm64 without cache coloring. Please make
> sure to compile and check that Xen still boots on a system after your
> series with cache coloring disabled.

I'm sorry for that. I tested it multiple times, but probably missed it after some last-minute change. The following diff fixes it.

diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 00351ee014..6abe2fdef7 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -411,7 +411,7 @@ static inline void page_set_xenheap_gfn(struct page_info *p, gfn_t gfn)
 #else
 #define virt_boot_xen(virt) virt
 #define set_value_for_secondary(var, val) \
-    var = val;
+    var = val; \
     clean_dcache(var);
 #endif

>
> Cheers,
>
> --
> Julien Grall

Thanks.

- Carlo Nonato
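The one-character fix matters because a `#define` ends at the first newline that is not escaped with a backslash. Without the backslash after `var = val;`, the `clean_dcache(var);` line falls outside the macro and is left as a stray statement in the header. A minimal host-side sketch of the corrected macro follows; `clean_dcache()` is replaced here by a hypothetical counting stub, and `boot_flag` and `publish_boot_flag()` are illustrative names that do not come from the series.

```c
#include <assert.h>

/* Hypothetical stand-in for Xen's clean_dcache(): it only counts calls so
 * that the effect of the macro can be observed on any host. */
static int clean_count;
#define clean_dcache(var) ((void)&(var), clean_count++)

/* Every line but the last ends in a backslash, so both statements belong
 * to the macro body. Dropping the backslash after "var = val;" would end
 * the macro there and leave the clean_dcache() line outside of it. */
#define set_value_for_secondary(var, val) \
    var = val;                            \
    clean_dcache(var);

/* Illustrative variable that a secondary CPU would read. */
static int boot_flag;

/* Store a value through the macro and return the number of cache cleans
 * performed so far. */
static int publish_boot_flag(int val)
{
    set_value_for_secondary(boot_flag, val);
    assert(boot_flag == val); /* the assignment is part of the expansion */
    return clean_count;
}
```

Each call to `publish_boot_flag()` performs exactly one assignment and one (stubbed) cache clean, which is what the two-statement macro is meant to guarantee.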
On 26.08.2022 14:50, Carlo Nonato wrote:
> Shared caches in multi-core CPU architectures represent a problem for
> predictability of memory access latency. This jeopardizes applicability
> of many Arm platforms in real-time critical and mixed-criticality
> scenarios. We introduce support for cache partitioning with page
> coloring, a transparent software technique that enables isolation
> between domains and Xen, and thus avoids cache interference.
>
> When creating a domain, a simple syntax (e.g. `0-3` or `4-11`) allows
> the user to define assignments of cache partition IDs, called colors,
> where assigning different colors guarantees no mutual eviction on cache
> will ever happen. This instructs the Xen memory allocator to provide
> the i-th color assignee only with pages that map to color i, i.e. that
> are indexed in the i-th cache partition.
>
> The proposed implementation supports the dom0less feature.
> The solution has been tested in several scenarios, including Xilinx Zynq
> MPSoCs.

Having looked at the non-Arm-specific parts of this I have one basic question: Wouldn't it be possible to avoid the addition of entirely new logic by treating the current model as just using a single color, therefore merely becoming a special case of what you want?

Plus an advanced question: To what extent does this interoperate with static allocation, which again is (for now) an Arm-only feature? Your reference to dom0less above doesn't cover this afaict.

Jan
Hi Jan,

On Thu, Sep 15, 2022 at 03:29:08PM +0200, Jan Beulich wrote:
> On 26.08.2022 14:50, Carlo Nonato wrote:
> > Shared caches in multi-core CPU architectures represent a problem for
> > predictability of memory access latency. This jeopardizes applicability
> > of many Arm platforms in real-time critical and mixed-criticality
> > scenarios. We introduce support for cache partitioning with page
> > coloring, a transparent software technique that enables isolation
> > between domains and Xen, and thus avoids cache interference.
> >
> > When creating a domain, a simple syntax (e.g. `0-3` or `4-11`) allows
> > the user to define assignments of cache partition IDs, called colors,
> > where assigning different colors guarantees no mutual eviction on cache
> > will ever happen. This instructs the Xen memory allocator to provide
> > the i-th color assignee only with pages that map to color i, i.e. that
> > are indexed in the i-th cache partition.
> >
> > The proposed implementation supports the dom0less feature.
> > The solution has been tested in several scenarios, including Xilinx Zynq
> > MPSoCs.
>
> Having looked at the non-Arm-specific parts of this I have one basic
> question: Wouldn't it be possible to avoid the addition of entirely
> new logic by treating the current model as just using a single color,
> therefore merely becoming a special case of what you want?

Nice question. Thanks!

In principle, you are quite right: monochrome is just a degenerate choice of colouring. The colouring implementation with a single colour allows assigning all the available pages, exactly as happens with the ordinary allocator. The difference lies in the allocation algorithm.

In practice, that would be quite inefficient. This is because the allocation logic used by the coloured allocator is much simpler, since it operates on lists instead of binary trees.

Now, upgrading the logic of the coloured allocator would be overkill: lowering the complexity of insertion/removal operations from linear to logarithmic does not change much, since in the real world the longest sequence of physically contiguous pages that may be assigned is max_colours - 1.

Cheers.
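To make the list-based design concrete, here is a minimal sketch of a per-colour free-list allocator. It is not the code from the series: it assumes 4 KiB pages and 16 colours taken from the low bits of the page frame number, all names (`struct page`, `free_colored_page()`, `alloc_colored_page()`) are hypothetical, and the lists are kept unsorted for brevity, so insertion here is a constant-time push rather than the linear insertion mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4 KiB pages (assumption) */
#define NR_COLORS  16u  /* hypothetical colour count, must be a power of two */

struct page {
    uint64_t paddr;     /* physical address of the page */
    struct page *next;  /* link in the free list of its colour */
};

/* One singly linked free list per colour. */
static struct page *free_list[NR_COLORS];

/* The colour is given by the low bits of the page frame number. */
static unsigned int page_color(uint64_t paddr)
{
    return (unsigned int)((paddr >> PAGE_SHIFT) & (NR_COLORS - 1));
}

/* Return a page to the free list of its colour (push at the head). */
static void free_colored_page(struct page *pg)
{
    unsigned int c;

    assert(pg != NULL);
    c = page_color(pg->paddr);
    pg->next = free_list[c];
    free_list[c] = pg;
}

/* Take a free page whose colour is set in color_mask, or NULL if none. */
static struct page *alloc_colored_page(uint32_t color_mask)
{
    for (unsigned int c = 0; c < NR_COLORS; c++) {
        if ((color_mask & (1u << c)) && free_list[c] != NULL) {
            struct page *pg = free_list[c];

            free_list[c] = pg->next;
            pg->next = NULL;
            return pg;
        }
    }
    return NULL;
}
```

With a mask that has every bit set, this degenerates to "hand out any free page", which is the single-colour special case raised in the question; what it lacks is the multi-page contiguous allocation that the ordinary allocator supports.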
On Thu, 15 Sep 2022, Jan Beulich wrote:
> Plus an advanced question: In how far does this interoperate with
> static allocation, which again is (for now) an Arm-only feature?
> Your reference to dom0less above doesn't cover this afaict.

I take it you are referring to static-mem, the static memory ranges for dom0less domUs described in docs/misc/arm/device-tree/booting.txt.

static-mem doesn't interoperate with cache coloring: each static range would span multiple colors. You have to choose one feature or the other; using both at the same time doesn't make sense.

Cheers,

Stefano
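The reason a static range spans multiple colors can be checked with a few lines of arithmetic: consecutive physical pages cycle through the colors, so any contiguous range of at least as many pages as there are colors touches all of them. The sketch below assumes 4 KiB pages and 16 colors derived from the low bits of the page frame number; `range_color_mask()` is a hypothetical helper, not a Xen function.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4 KiB pages (assumption) */
#define NR_COLORS  16u  /* hypothetical color count, must be a power of two */

/* Bitmask of the colors touched by the range [start, start + size). */
static uint32_t range_color_mask(uint64_t start, uint64_t size)
{
    const uint32_t all_colors = (1u << NR_COLORS) - 1;
    uint32_t mask = 0;
    uint64_t first, last;

    assert(size > 0);
    first = start >> PAGE_SHIFT;
    last = (start + size - 1) >> PAGE_SHIFT;

    for (uint64_t pfn = first; pfn <= last; pfn++) {
        mask |= 1u << (pfn & (NR_COLORS - 1));
        if (mask == all_colors)
            break; /* every color is already covered */
    }
    return mask;
}
```

Under these assumptions a 128 MiB range starting at 0x40000000 yields the full mask 0xFFFF, so it could not be given to a domain restricted to, say, colors `0-3`.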
Hi Carlo,

On 26/08/2022 13:50, Carlo Nonato wrote:
> - The way xl passes user space memory to Xen is adapted from various
>   points of the xl code itself (e.g. xc_domain_node_setaffinity) and it
>   works, but it really needs attention from expert maintainers since
>   I'm not completely sure this is the correct way of doing things.
> - We still need to bring back the relocation feature (part of) in order
>   to move Xen memory to a colored space where the hypervisor could be
>   isolated from VMs interference (see the revert commit #10 and the
>   get_xen_paddr function in #12).
> - Revert commits #8 and #9 are needed because coloring has the command
>   line parsing as a prerequisite for its initialization and
>   setup_pagetables must be called after it in order to color the Xen
>   mapping. The DTB mapping is then added to the boot page tables instead
>   of the Xen ones. Probably the way this is done is a bit simplistic.
>   Looking forward to comments on the subject.
> - A temporary mapping of the old Xen code (old here means non-colored)
>   is used to reach variables in the old physical space so that secondary
>   CPUs can boot. There were some comments in the previous version on that
>   because the mapping is available for all the CPUs while only CPU0 is
>   the one supposed to access it. I'm not sure how to temporarily map
>   things only for the master CPU.

On Arm64, Xen will only use one set of page-tables for all the CPUs. So it will not be possible to have a temporary mapping for a single CPU. But what you can do is map the region and unmap it when you are done.

That said, I would prefer that we get rid of the old copy of Xen. This would mean secondary CPUs will directly jump to the new Xen.

> - A lot of #ifdef for cache coloring are introduced because I prefer to
>   define functions only if they are actually needed. Let me know if you
>   prefer a different approach.

The preferred approach in Xen is to provide stub helpers in the #else part.

> - Julien posted an RFC to address a problem with the switch_ttbr function.
>   For the moment I haven't considered it since it's still a work in progress.

I have posted a new version for this:

https://lore.kernel.org/xen-devel/20221022150422.17707-1-julien@xen.org/

There are a couple of open questions about the interaction with cache coloring. Please have a look there.

Cheers,
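The stub-helper convention mentioned above for `#ifdef`-heavy code can be sketched as follows. This is only an illustration of the pattern: `CONFIG_CACHE_COLORING` is assumed as the option name, and `coloring_init()`, `domain_nr_colors()` and `setup_coloring()` are hypothetical helpers rather than functions taken from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* CONFIG_CACHE_COLORING is an assumed option name used for illustration. */
#ifdef CONFIG_CACHE_COLORING

/* Real implementations, built only when the feature is enabled. */
bool coloring_init(void);
unsigned int domain_nr_colors(unsigned int domid);

#else /* !CONFIG_CACHE_COLORING */

/* Stubs: same signatures, trivial behaviour, no code generated. */
static inline bool coloring_init(void)
{
    return true;
}

static inline unsigned int domain_nr_colors(unsigned int domid)
{
    (void)domid; /* unused when coloring is compiled out */
    return 0;
}

#endif /* CONFIG_CACHE_COLORING */

/* The caller needs no #ifdef: it builds in both configurations. */
static bool setup_coloring(unsigned int domid, unsigned int *nr_colors)
{
    assert(nr_colors != NULL);

    if (!coloring_init())
        return false;

    *nr_colors = domain_nr_colors(domid);
    return true;
}
```

Keeping the `#ifdef` in one header means call sites stay identical in both configurations, which also makes it harder to break the build with the option disabled.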