diff mbox series

[RFC,v2,11/27] arm64: mte: Reserve tag storage memory

Message ID 20231119165721.9849-12-alexandru.elisei@arm.com (mailing list archive)
State New, archived
Headers show
Series [RFC,v2,01/27] arm64: mte: Rework naming for tag manipulation functions | expand

Commit Message

Alexandru Elisei Nov. 19, 2023, 4:57 p.m. UTC
Allow the kernel to get the size and location of the MTE tag storage
regions from the DTB. This memory is marked as reserved for now.

The DTB node for the tag storage region is defined as:

        tags0: tag-storage@8f8000000 {
                compatible = "arm,mte-tag-storage";
                reg = <0x08 0xf8000000 0x00 0x4000000>;
                block-size = <0x1000>;
                memory = <&memory0>;	// Associated tagged memory node
        };

The tag storage region represents the largest contiguous memory region that
holds all the tags for the associated contiguous memory region which can be
tagged. For example, for a 32GB contiguous tagged memory the corresponding
tag storage region is 1GB of contiguous memory, not two adjacent 512M of
tag storage memory.

"block-size" represents the minimum multiple of 4K of tag storage where all
the tags stored in the block correspond to a contiguous memory region. This
is needed for platforms where the memory controller interleaves tag writes
to memory. For example, if the memory controller interleaves tag writes for
256KB of contiguous memory across 8K of tag storage (2-way interleave),
then the correct value for "block-size" is 0x2000. This value is a hardware
property, independent of the selected kernel page size.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/Kconfig                       |  12 ++
 arch/arm64/include/asm/mte_tag_storage.h |  15 ++
 arch/arm64/kernel/Makefile               |   1 +
 arch/arm64/kernel/mte_tag_storage.c      | 256 +++++++++++++++++++++++
 arch/arm64/kernel/setup.c                |   7 +
 5 files changed, 291 insertions(+)
 create mode 100644 arch/arm64/include/asm/mte_tag_storage.h
 create mode 100644 arch/arm64/kernel/mte_tag_storage.c

Comments

Hyesoo Yu Nov. 29, 2023, 8:44 a.m. UTC | #1
Hello.

On Sun, Nov 19, 2023 at 04:57:05PM +0000, Alexandru Elisei wrote:
> Allow the kernel to get the size and location of the MTE tag storage
> regions from the DTB. This memory is marked as reserved for now.
> 
> The DTB node for the tag storage region is defined as:
> 
>         tags0: tag-storage@8f8000000 {
>                 compatible = "arm,mte-tag-storage";
>                 reg = <0x08 0xf8000000 0x00 0x4000000>;
>                 block-size = <0x1000>;
>                 memory = <&memory0>;	// Associated tagged memory node
>         };
>

How about using compatible = "shared-dma-pool" like below ?

&reserved_memory {
	tags0: tag0@8f8000000 {
		compatible = "arm,mte-tag-storage";
        	reg = <0x08 0xf8000000 0x00 0x4000000>;
	};
}

tag-storage {
        compatible = "arm,mte-tag-storage";
	memory-region = <&tag>;
        memory = <&memory0>;
	block-size = <0x1000>;
}

And then, the activation of CMA would be performed in the CMA code.
We just can get the region information from memory-region and allocate it directly
like alloc_contig_range, take_page_off_buddy. It seems like we can remove a lots of code.

> The tag storage region represents the largest contiguous memory region that
> holds all the tags for the associated contiguous memory region which can be
> tagged. For example, for a 32GB contiguous tagged memory the corresponding
> tag storage region is 1GB of contiguous memory, not two adjacent 512M of
> tag storage memory.
> 
> "block-size" represents the minimum multiple of 4K of tag storage where all
> the tags stored in the block correspond to a contiguous memory region. This
> is needed for platforms where the memory controller interleaves tag writes
> to memory. For example, if the memory controller interleaves tag writes for
> 256KB of contiguous memory across 8K of tag storage (2-way interleave),
> then the correct value for "block-size" is 0x2000. This value is a hardware
> property, independent of the selected kernel page size.
>

Is it considered for kernel page size like 16K page, 64K page ? The comment says
it should be a multiple of 4K, but it should be a multiple of the "page size" more accurately.
Please let me know if there's anything I misunderstood. :-)


> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  arch/arm64/Kconfig                       |  12 ++
>  arch/arm64/include/asm/mte_tag_storage.h |  15 ++
>  arch/arm64/kernel/Makefile               |   1 +
>  arch/arm64/kernel/mte_tag_storage.c      | 256 +++++++++++++++++++++++
>  arch/arm64/kernel/setup.c                |   7 +
>  5 files changed, 291 insertions(+)
>  create mode 100644 arch/arm64/include/asm/mte_tag_storage.h
>  create mode 100644 arch/arm64/kernel/mte_tag_storage.c
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 7b071a00425d..fe8276fdc7a8 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2062,6 +2062,18 @@ config ARM64_MTE
>  
>  	  Documentation/arch/arm64/memory-tagging-extension.rst.
>  
> +if ARM64_MTE
> +config ARM64_MTE_TAG_STORAGE
> +	bool "Dynamic MTE tag storage management"
> +	help
> +	  Adds support for dynamic management of the memory used by the hardware
> +	  for storing MTE tags. This memory, unlike normal memory, cannot be
> +	  tagged. When it is used to store tags for another memory location it
> +	  cannot be used for any type of allocation.
> +
> +	  If unsure, say N
> +endif # ARM64_MTE
> +
>  endmenu # "ARMv8.5 architectural features"
>  
>  menu "ARMv8.7 architectural features"
> diff --git a/arch/arm64/include/asm/mte_tag_storage.h b/arch/arm64/include/asm/mte_tag_storage.h
> new file mode 100644
> index 000000000000..8f86c4f9a7c3
> --- /dev/null
> +++ b/arch/arm64/include/asm/mte_tag_storage.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2023 ARM Ltd.
> + */
> +#ifndef __ASM_MTE_TAG_STORAGE_H
> +#define __ASM_MTE_TAG_STORAGE_H
> +
> +#ifdef CONFIG_ARM64_MTE_TAG_STORAGE
> +void mte_tag_storage_init(void);
> +#else
> +static inline void mte_tag_storage_init(void)
> +{
> +}
> +#endif /* CONFIG_ARM64_MTE_TAG_STORAGE */
> +#endif /* __ASM_MTE_TAG_STORAGE_H  */
> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> index d95b3d6b471a..5f031bf9f8f1 100644
> --- a/arch/arm64/kernel/Makefile
> +++ b/arch/arm64/kernel/Makefile
> @@ -70,6 +70,7 @@ obj-$(CONFIG_CRASH_CORE)		+= crash_core.o
>  obj-$(CONFIG_ARM_SDE_INTERFACE)		+= sdei.o
>  obj-$(CONFIG_ARM64_PTR_AUTH)		+= pointer_auth.o
>  obj-$(CONFIG_ARM64_MTE)			+= mte.o
> +obj-$(CONFIG_ARM64_MTE_TAG_STORAGE)	+= mte_tag_storage.o
>  obj-y					+= vdso-wrap.o
>  obj-$(CONFIG_COMPAT_VDSO)		+= vdso32-wrap.o
>  obj-$(CONFIG_UNWIND_PATCH_PAC_INTO_SCS)	+= patch-scs.o
> diff --git a/arch/arm64/kernel/mte_tag_storage.c b/arch/arm64/kernel/mte_tag_storage.c
> new file mode 100644
> index 000000000000..fa6267ef8392
> --- /dev/null
> +++ b/arch/arm64/kernel/mte_tag_storage.c
> @@ -0,0 +1,256 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Support for dynamic tag storage.
> + *
> + * Copyright (C) 2023 ARM Ltd.
> + */
> +
> +#include <linux/memblock.h>
> +#include <linux/mm.h>
> +#include <linux/of_device.h>
> +#include <linux/of_fdt.h>
> +#include <linux/range.h>
> +#include <linux/string.h>
> +#include <linux/xarray.h>
> +
> +#include <asm/mte_tag_storage.h>
> +
> +struct tag_region {
> +	struct range mem_range;	/* Memory associated with the tag storage, in PFNs. */
> +	struct range tag_range;	/* Tag storage memory, in PFNs. */
> +	u32 block_size;		/* Tag block size, in pages. */
> +};
> +
> +#define MAX_TAG_REGIONS	32
> +
> +static struct tag_region tag_regions[MAX_TAG_REGIONS];
> +static int num_tag_regions;
> +
> +static int __init tag_storage_of_flat_get_range(unsigned long node, const __be32 *reg,
> +						int reg_len, struct range *range)
> +{
> +	int addr_cells = dt_root_addr_cells;
> +	int size_cells = dt_root_size_cells;
> +	u64 size;
> +
> +	if (reg_len / 4 > addr_cells + size_cells)
> +		return -EINVAL;
> +
> +	range->start = PHYS_PFN(of_read_number(reg, addr_cells));
> +	size = PHYS_PFN(of_read_number(reg + addr_cells, size_cells));
> +	if (size == 0) {
> +		pr_err("Invalid node");
> +		return -EINVAL;
> +	}
> +	range->end = range->start + size - 1;
> +
> +	return 0;
> +}
> +
> +static int __init tag_storage_of_flat_get_tag_range(unsigned long node,
> +						    struct range *tag_range)
> +{
> +	const __be32 *reg;
> +	int reg_len;
> +
> +	reg = of_get_flat_dt_prop(node, "reg", &reg_len);
> +	if (reg == NULL) {
> +		pr_err("Invalid metadata node");
> +		return -EINVAL;
> +	}
> +
> +	return tag_storage_of_flat_get_range(node, reg, reg_len, tag_range);
> +}
> +
> +static int __init tag_storage_of_flat_get_memory_range(unsigned long node, struct range *mem)
> +{
> +	const __be32 *reg;
> +	int reg_len;
> +
> +	reg = of_get_flat_dt_prop(node, "linux,usable-memory", &reg_len);
> +	if (reg == NULL)
> +		reg = of_get_flat_dt_prop(node, "reg", &reg_len);
> +
> +	if (reg == NULL) {
> +		pr_err("Invalid memory node");
> +		return -EINVAL;
> +	}
> +
> +	return tag_storage_of_flat_get_range(node, reg, reg_len, mem);
> +}
> +
> +struct find_memory_node_arg {
> +	unsigned long node;
> +	u32 phandle;
> +};
> +
> +static int __init fdt_find_memory_node(unsigned long node, const char *uname,
> +				       int depth, void *data)
> +{
> +	const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
> +	struct find_memory_node_arg *arg = data;
> +
> +	if (depth != 1 || !type || strcmp(type, "memory") != 0)
> +		return 0;
> +
> +	if (of_get_flat_dt_phandle(node) == arg->phandle) {
> +		arg->node = node;
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int __init tag_storage_get_memory_node(unsigned long tag_node, unsigned long *mem_node)
> +{
> +	struct find_memory_node_arg arg = { 0 };
> +	const __be32 *memory_prop;
> +	u32 mem_phandle;
> +	int ret, reg_len;
> +
> +	memory_prop = of_get_flat_dt_prop(tag_node, "memory", &reg_len);
> +	if (!memory_prop) {
> +		pr_err("Missing 'memory' property in the tag storage node");
> +		return -EINVAL;
> +	}
> +
> +	mem_phandle = be32_to_cpup(memory_prop);
> +	arg.phandle = mem_phandle;
> +
> +	ret = of_scan_flat_dt(fdt_find_memory_node, &arg);
> +	if (ret != 1) {
> +		pr_err("Associated memory node not found");
> +		return -EINVAL;
> +	}
> +
> +	*mem_node = arg.node;
> +
> +	return 0;
> +}
> +
> +static int __init tag_storage_of_flat_read_u32(unsigned long node, const char *propname,
> +					       u32 *retval)
> +{
> +	const __be32 *reg;
> +
> +	reg = of_get_flat_dt_prop(node, propname, NULL);
> +	if (!reg)
> +		return -EINVAL;
> +
> +	*retval = be32_to_cpup(reg);
> +	return 0;
> +}
> +
> +static u32 __init get_block_size_pages(u32 block_size_bytes)
> +{
> +	u32 a = PAGE_SIZE;
> +	u32 b = block_size_bytes;
> +	u32 r;
> +
> +	/* Find greatest common divisor using the Euclidian algorithm. */
> +	do {
> +		r = a % b;
> +		a = b;
> +		b = r;
> +	} while (b != 0);
> +
> +	return PHYS_PFN(PAGE_SIZE * block_size_bytes / a);
> +}
> +
> +static int __init fdt_init_tag_storage(unsigned long node, const char *uname,
> +				       int depth, void *data)
> +{
> +	struct tag_region *region;
> +	unsigned long mem_node;
> +	struct range *mem_range;
> +	struct range *tag_range;
> +	u32 block_size_bytes;
> +	u32 nid = 0;
> +	int ret;
> +
> +	if (depth != 1 || !strstr(uname, "tag-storage"))
> +		return 0;
> +
> +	if (!of_flat_dt_is_compatible(node, "arm,mte-tag-storage"))
> +		return 0;
> +
> +	if (num_tag_regions == MAX_TAG_REGIONS) {
> +		pr_err("Maximum number of tag storage regions exceeded");
> +		return -EINVAL;
> +	}
> +
> +	region = &tag_regions[num_tag_regions];
> +	mem_range = &region->mem_range;
> +	tag_range = &region->tag_range;
> +
> +	ret = tag_storage_of_flat_get_tag_range(node, tag_range);
> +	if (ret) {
> +		pr_err("Invalid tag storage node");
> +		return ret;
> +	}
> +
> +	ret = tag_storage_get_memory_node(node, &mem_node);
> +	if (ret)
> +		return ret;
> +
> +	ret = tag_storage_of_flat_get_memory_range(mem_node, mem_range);
> +	if (ret) {
> +		pr_err("Invalid address for associated data memory node");
> +		return ret;
> +	}
> +
> +	/* The tag region must exactly match the corresponding memory. */
> +	if (range_len(tag_range) * 32 != range_len(mem_range)) {
> +		pr_err("Tag storage region 0x%llx-0x%llx does not cover the memory region 0x%llx-0x%llx",
> +		       PFN_PHYS(tag_range->start), PFN_PHYS(tag_range->end),
> +		       PFN_PHYS(mem_range->start), PFN_PHYS(mem_range->end));
> +		return -EINVAL;
> +	}
> +
> +	ret = tag_storage_of_flat_read_u32(node, "block-size", &block_size_bytes);
> +	if (ret || block_size_bytes == 0) {
> +		pr_err("Invalid or missing 'block-size' property");
> +		return -EINVAL;
> +	}
> +	region->block_size = get_block_size_pages(block_size_bytes);
> +	if (range_len(tag_range) % region->block_size != 0) {
> +		pr_err("Tag storage region size 0x%llx is not a multiple of block size %u",
> +		       PFN_PHYS(range_len(tag_range)), region->block_size);
> +		return -EINVAL;
> +	}
> +

I was confused about the variable "block_size", The block size declared in the device tree is
in bytes, but the actual block size used is in pages. I think the term "block_size" can cause
confusion as it might be interpreted as bytes. If possible, I suggest changing the term "block_size"
to something more readable, such as "block_nr_pages" (This is just a example!)

Thanks,
Regards.

> +	ret = tag_storage_of_flat_read_u32(mem_node, "numa-node-id", &nid);
> +	if (ret)
> +		nid = numa_node_id();
> +
> +	ret = memblock_add_node(PFN_PHYS(tag_range->start), PFN_PHYS(range_len(tag_range)),
> +				nid, MEMBLOCK_NONE);
> +	if (ret) {
> +		pr_err("Error adding tag memblock (%d)", ret);
> +		return ret;
> +	}
> +	memblock_reserve(PFN_PHYS(tag_range->start), PFN_PHYS(range_len(tag_range)));
> +
> +	pr_info("Found tag storage region 0x%llx-0x%llx, block size %u pages",
> +		PFN_PHYS(tag_range->start), PFN_PHYS(tag_range->end), region->block_size);
> +
> +	num_tag_regions++;
> +
> +	return 0;
> +}
> +
> +void __init mte_tag_storage_init(void)
> +{
> +	struct range *tag_range;
> +	int i, ret;
> +
> +	ret = of_scan_flat_dt(fdt_init_tag_storage, NULL);
> +	if (ret) {
> +		for (i = 0; i < num_tag_regions; i++) {
> +			tag_range = &tag_regions[i].tag_range;
> +			memblock_remove(PFN_PHYS(tag_range->start), PFN_PHYS(range_len(tag_range)));
> +		}
> +		num_tag_regions = 0;
> +		pr_info("MTE tag storage region management disabled");
> +	}
> +}
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 417a8a86b2db..1b77138c1aa5 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -42,6 +42,7 @@
>  #include <asm/cpufeature.h>
>  #include <asm/cpu_ops.h>
>  #include <asm/kasan.h>
> +#include <asm/mte_tag_storage.h>
>  #include <asm/numa.h>
>  #include <asm/scs.h>
>  #include <asm/sections.h>
> @@ -342,6 +343,12 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p)
>  			   FW_BUG "Booted with MMU enabled!");
>  	}
>  
> +	/*
> +	 * Must be called before memory limits are enforced by
> +	 * arm64_memblock_init().
> +	 */
> +	mte_tag_storage_init();
> +
>  	arm64_memblock_init();
>  
>  	paging_init();
> -- 
> 2.42.1
> 
>
Alexandru Elisei Nov. 30, 2023, 11:56 a.m. UTC | #2
Hi,

On Wed, Nov 29, 2023 at 05:44:24PM +0900, Hyesoo Yu wrote:
> Hello.
> 
> On Sun, Nov 19, 2023 at 04:57:05PM +0000, Alexandru Elisei wrote:
> > Allow the kernel to get the size and location of the MTE tag storage
> > regions from the DTB. This memory is marked as reserved for now.
> > 
> > The DTB node for the tag storage region is defined as:
> > 
> >         tags0: tag-storage@8f8000000 {
> >                 compatible = "arm,mte-tag-storage";
> >                 reg = <0x08 0xf8000000 0x00 0x4000000>;
> >                 block-size = <0x1000>;
> >                 memory = <&memory0>;	// Associated tagged memory node
> >         };
> >
> 
> How about using compatible = "shared-dma-pool" like below ?
> 
> &reserved_memory {
> 	tags0: tag0@8f8000000 {
> 		compatible = "arm,mte-tag-storage";
>         	reg = <0x08 0xf8000000 0x00 0x4000000>;
> 	};
> }
> 
> tag-storage {
>         compatible = "arm,mte-tag-storage";
> 	memory-region = <&tag>;
>         memory = <&memory0>;
> 	block-size = <0x1000>;
> }

I'm sorry, but I don't follow where compatible = "shared-dma-pool" fits
with the examples.

> 
> And then, the activation of CMA would be performed in the CMA code.
> We just can get the region information from memory-region and allocate it directly
> like alloc_contig_range, take_page_off_buddy. It seems like we can remove a lots of code.

For the next iteration I am planning to integrate the code more tightly
with CMA, so any suggestions to that effect are very welcome :)

> 
> > The tag storage region represents the largest contiguous memory region that
> > holds all the tags for the associated contiguous memory region which can be
> > tagged. For example, for a 32GB contiguous tagged memory the corresponding
> > tag storage region is 1GB of contiguous memory, not two adjacent 512M of
> > tag storage memory.
> > 
> > "block-size" represents the minimum multiple of 4K of tag storage where all
> > the tags stored in the block correspond to a contiguous memory region. This
> > is needed for platforms where the memory controller interleaves tag writes
> > to memory. For example, if the memory controller interleaves tag writes for
> > 256KB of contiguous memory across 8K of tag storage (2-way interleave),
> > then the correct value for "block-size" is 0x2000. This value is a hardware
> > property, independent of the selected kernel page size.
> >
> 
> Is it considered for kernel page size like 16K page, 64K page ? The comment says
> it should be a multiple of 4K, but it should be a multiple of the "page size" more accurately.
> Please let me know if there's anything I misunderstood. :-)

The block size in the DTB is a hardware property, it's independent of the
kernel page size, which is a compile time option.

The function get_block_size_pages(), which computes the tag storage block
size as the kernel will use it, takes into account the fact that the
hardware block size is not necessarily a multiple of the kernel page size,
and computes the least common multiple by doing:

(kernel page size in bytes x DTB block size in bytes) / greatest common divisor

As for why the hardware block size is a multiple of 4k, that was chosen
because it will be part of the architecture update. Since the minimum
hardware page size is 4K, it doesn't make much sense to have the DTB
block-size smaller than that.

Hope that makes sense!

Thanks,
Alex

> 
> 
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >  arch/arm64/Kconfig                       |  12 ++
> >  arch/arm64/include/asm/mte_tag_storage.h |  15 ++
> >  arch/arm64/kernel/Makefile               |   1 +
> >  arch/arm64/kernel/mte_tag_storage.c      | 256 +++++++++++++++++++++++
> >  arch/arm64/kernel/setup.c                |   7 +
> >  5 files changed, 291 insertions(+)
> >  create mode 100644 arch/arm64/include/asm/mte_tag_storage.h
> >  create mode 100644 arch/arm64/kernel/mte_tag_storage.c
> > 
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index 7b071a00425d..fe8276fdc7a8 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -2062,6 +2062,18 @@ config ARM64_MTE
> >  
> >  	  Documentation/arch/arm64/memory-tagging-extension.rst.
> >  
> > +if ARM64_MTE
> > +config ARM64_MTE_TAG_STORAGE
> > +	bool "Dynamic MTE tag storage management"
> > +	help
> > +	  Adds support for dynamic management of the memory used by the hardware
> > +	  for storing MTE tags. This memory, unlike normal memory, cannot be
> > +	  tagged. When it is used to store tags for another memory location it
> > +	  cannot be used for any type of allocation.
> > +
> > +	  If unsure, say N
> > +endif # ARM64_MTE
> > +
> >  endmenu # "ARMv8.5 architectural features"
> >  
> >  menu "ARMv8.7 architectural features"
> > diff --git a/arch/arm64/include/asm/mte_tag_storage.h b/arch/arm64/include/asm/mte_tag_storage.h
> > new file mode 100644
> > index 000000000000..8f86c4f9a7c3
> > --- /dev/null
> > +++ b/arch/arm64/include/asm/mte_tag_storage.h
> > @@ -0,0 +1,15 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright (C) 2023 ARM Ltd.
> > + */
> > +#ifndef __ASM_MTE_TAG_STORAGE_H
> > +#define __ASM_MTE_TAG_STORAGE_H
> > +
> > +#ifdef CONFIG_ARM64_MTE_TAG_STORAGE
> > +void mte_tag_storage_init(void);
> > +#else
> > +static inline void mte_tag_storage_init(void)
> > +{
> > +}
> > +#endif /* CONFIG_ARM64_MTE_TAG_STORAGE */
> > +#endif /* __ASM_MTE_TAG_STORAGE_H  */
> > diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> > index d95b3d6b471a..5f031bf9f8f1 100644
> > --- a/arch/arm64/kernel/Makefile
> > +++ b/arch/arm64/kernel/Makefile
> > @@ -70,6 +70,7 @@ obj-$(CONFIG_CRASH_CORE)		+= crash_core.o
> >  obj-$(CONFIG_ARM_SDE_INTERFACE)		+= sdei.o
> >  obj-$(CONFIG_ARM64_PTR_AUTH)		+= pointer_auth.o
> >  obj-$(CONFIG_ARM64_MTE)			+= mte.o
> > +obj-$(CONFIG_ARM64_MTE_TAG_STORAGE)	+= mte_tag_storage.o
> >  obj-y					+= vdso-wrap.o
> >  obj-$(CONFIG_COMPAT_VDSO)		+= vdso32-wrap.o
> >  obj-$(CONFIG_UNWIND_PATCH_PAC_INTO_SCS)	+= patch-scs.o
> > diff --git a/arch/arm64/kernel/mte_tag_storage.c b/arch/arm64/kernel/mte_tag_storage.c
> > new file mode 100644
> > index 000000000000..fa6267ef8392
> > --- /dev/null
> > +++ b/arch/arm64/kernel/mte_tag_storage.c
> > @@ -0,0 +1,256 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Support for dynamic tag storage.
> > + *
> > + * Copyright (C) 2023 ARM Ltd.
> > + */
> > +
> > +#include <linux/memblock.h>
> > +#include <linux/mm.h>
> > +#include <linux/of_device.h>
> > +#include <linux/of_fdt.h>
> > +#include <linux/range.h>
> > +#include <linux/string.h>
> > +#include <linux/xarray.h>
> > +
> > +#include <asm/mte_tag_storage.h>
> > +
> > +struct tag_region {
> > +	struct range mem_range;	/* Memory associated with the tag storage, in PFNs. */
> > +	struct range tag_range;	/* Tag storage memory, in PFNs. */
> > +	u32 block_size;		/* Tag block size, in pages. */
> > +};
> > +
> > +#define MAX_TAG_REGIONS	32
> > +
> > +static struct tag_region tag_regions[MAX_TAG_REGIONS];
> > +static int num_tag_regions;
> > +
> > +static int __init tag_storage_of_flat_get_range(unsigned long node, const __be32 *reg,
> > +						int reg_len, struct range *range)
> > +{
> > +	int addr_cells = dt_root_addr_cells;
> > +	int size_cells = dt_root_size_cells;
> > +	u64 size;
> > +
> > +	if (reg_len / 4 > addr_cells + size_cells)
> > +		return -EINVAL;
> > +
> > +	range->start = PHYS_PFN(of_read_number(reg, addr_cells));
> > +	size = PHYS_PFN(of_read_number(reg + addr_cells, size_cells));
> > +	if (size == 0) {
> > +		pr_err("Invalid node");
> > +		return -EINVAL;
> > +	}
> > +	range->end = range->start + size - 1;
> > +
> > +	return 0;
> > +}
> > +
> > +static int __init tag_storage_of_flat_get_tag_range(unsigned long node,
> > +						    struct range *tag_range)
> > +{
> > +	const __be32 *reg;
> > +	int reg_len;
> > +
> > +	reg = of_get_flat_dt_prop(node, "reg", &reg_len);
> > +	if (reg == NULL) {
> > +		pr_err("Invalid metadata node");
> > +		return -EINVAL;
> > +	}
> > +
> > +	return tag_storage_of_flat_get_range(node, reg, reg_len, tag_range);
> > +}
> > +
> > +static int __init tag_storage_of_flat_get_memory_range(unsigned long node, struct range *mem)
> > +{
> > +	const __be32 *reg;
> > +	int reg_len;
> > +
> > +	reg = of_get_flat_dt_prop(node, "linux,usable-memory", &reg_len);
> > +	if (reg == NULL)
> > +		reg = of_get_flat_dt_prop(node, "reg", &reg_len);
> > +
> > +	if (reg == NULL) {
> > +		pr_err("Invalid memory node");
> > +		return -EINVAL;
> > +	}
> > +
> > +	return tag_storage_of_flat_get_range(node, reg, reg_len, mem);
> > +}
> > +
> > +struct find_memory_node_arg {
> > +	unsigned long node;
> > +	u32 phandle;
> > +};
> > +
> > +static int __init fdt_find_memory_node(unsigned long node, const char *uname,
> > +				       int depth, void *data)
> > +{
> > +	const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
> > +	struct find_memory_node_arg *arg = data;
> > +
> > +	if (depth != 1 || !type || strcmp(type, "memory") != 0)
> > +		return 0;
> > +
> > +	if (of_get_flat_dt_phandle(node) == arg->phandle) {
> > +		arg->node = node;
> > +		return 1;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int __init tag_storage_get_memory_node(unsigned long tag_node, unsigned long *mem_node)
> > +{
> > +	struct find_memory_node_arg arg = { 0 };
> > +	const __be32 *memory_prop;
> > +	u32 mem_phandle;
> > +	int ret, reg_len;
> > +
> > +	memory_prop = of_get_flat_dt_prop(tag_node, "memory", &reg_len);
> > +	if (!memory_prop) {
> > +		pr_err("Missing 'memory' property in the tag storage node");
> > +		return -EINVAL;
> > +	}
> > +
> > +	mem_phandle = be32_to_cpup(memory_prop);
> > +	arg.phandle = mem_phandle;
> > +
> > +	ret = of_scan_flat_dt(fdt_find_memory_node, &arg);
> > +	if (ret != 1) {
> > +		pr_err("Associated memory node not found");
> > +		return -EINVAL;
> > +	}
> > +
> > +	*mem_node = arg.node;
> > +
> > +	return 0;
> > +}
> > +
> > +static int __init tag_storage_of_flat_read_u32(unsigned long node, const char *propname,
> > +					       u32 *retval)
> > +{
> > +	const __be32 *reg;
> > +
> > +	reg = of_get_flat_dt_prop(node, propname, NULL);
> > +	if (!reg)
> > +		return -EINVAL;
> > +
> > +	*retval = be32_to_cpup(reg);
> > +	return 0;
> > +}
> > +
> > +static u32 __init get_block_size_pages(u32 block_size_bytes)
> > +{
> > +	u32 a = PAGE_SIZE;
> > +	u32 b = block_size_bytes;
> > +	u32 r;
> > +
> > +	/* Find greatest common divisor using the Euclidian algorithm. */
> > +	do {
> > +		r = a % b;
> > +		a = b;
> > +		b = r;
> > +	} while (b != 0);
> > +
> > +	return PHYS_PFN(PAGE_SIZE * block_size_bytes / a);
> > +}
> > +
> > +static int __init fdt_init_tag_storage(unsigned long node, const char *uname,
> > +				       int depth, void *data)
> > +{
> > +	struct tag_region *region;
> > +	unsigned long mem_node;
> > +	struct range *mem_range;
> > +	struct range *tag_range;
> > +	u32 block_size_bytes;
> > +	u32 nid = 0;
> > +	int ret;
> > +
> > +	if (depth != 1 || !strstr(uname, "tag-storage"))
> > +		return 0;
> > +
> > +	if (!of_flat_dt_is_compatible(node, "arm,mte-tag-storage"))
> > +		return 0;
> > +
> > +	if (num_tag_regions == MAX_TAG_REGIONS) {
> > +		pr_err("Maximum number of tag storage regions exceeded");
> > +		return -EINVAL;
> > +	}
> > +
> > +	region = &tag_regions[num_tag_regions];
> > +	mem_range = &region->mem_range;
> > +	tag_range = &region->tag_range;
> > +
> > +	ret = tag_storage_of_flat_get_tag_range(node, tag_range);
> > +	if (ret) {
> > +		pr_err("Invalid tag storage node");
> > +		return ret;
> > +	}
> > +
> > +	ret = tag_storage_get_memory_node(node, &mem_node);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = tag_storage_of_flat_get_memory_range(mem_node, mem_range);
> > +	if (ret) {
> > +		pr_err("Invalid address for associated data memory node");
> > +		return ret;
> > +	}
> > +
> > +	/* The tag region must exactly match the corresponding memory. */
> > +	if (range_len(tag_range) * 32 != range_len(mem_range)) {
> > +		pr_err("Tag storage region 0x%llx-0x%llx does not cover the memory region 0x%llx-0x%llx",
> > +		       PFN_PHYS(tag_range->start), PFN_PHYS(tag_range->end),
> > +		       PFN_PHYS(mem_range->start), PFN_PHYS(mem_range->end));
> > +		return -EINVAL;
> > +	}
> > +
> > +	ret = tag_storage_of_flat_read_u32(node, "block-size", &block_size_bytes);
> > +	if (ret || block_size_bytes == 0) {
> > +		pr_err("Invalid or missing 'block-size' property");
> > +		return -EINVAL;
> > +	}
> > +	region->block_size = get_block_size_pages(block_size_bytes);
> > +	if (range_len(tag_range) % region->block_size != 0) {
> > +		pr_err("Tag storage region size 0x%llx is not a multiple of block size %u",
> > +		       PFN_PHYS(range_len(tag_range)), region->block_size);
> > +		return -EINVAL;
> > +	}
> > +
> 
> I was confused about the variable "block_size", The block size declared in the device tree is
> in bytes, but the actual block size used is in pages. I think the term "block_size" can cause
> confusion as it might be interpreted as bytes. If possible, I suggest changing the term "block_size"
> to something more readable, such as "block_nr_pages" (This is just a example!)
> 
> Thanks,
> Regards.
> 
> > +	ret = tag_storage_of_flat_read_u32(mem_node, "numa-node-id", &nid);
> > +	if (ret)
> > +		nid = numa_node_id();
> > +
> > +	ret = memblock_add_node(PFN_PHYS(tag_range->start), PFN_PHYS(range_len(tag_range)),
> > +				nid, MEMBLOCK_NONE);
> > +	if (ret) {
> > +		pr_err("Error adding tag memblock (%d)", ret);
> > +		return ret;
> > +	}
> > +	memblock_reserve(PFN_PHYS(tag_range->start), PFN_PHYS(range_len(tag_range)));
> > +
> > +	pr_info("Found tag storage region 0x%llx-0x%llx, block size %u pages",
> > +		PFN_PHYS(tag_range->start), PFN_PHYS(tag_range->end), region->block_size);
> > +
> > +	num_tag_regions++;
> > +
> > +	return 0;
> > +}
> > +
> > +void __init mte_tag_storage_init(void)
> > +{
> > +	struct range *tag_range;
> > +	int i, ret;
> > +
> > +	ret = of_scan_flat_dt(fdt_init_tag_storage, NULL);
> > +	if (ret) {
> > +		for (i = 0; i < num_tag_regions; i++) {
> > +			tag_range = &tag_regions[i].tag_range;
> > +			memblock_remove(PFN_PHYS(tag_range->start), PFN_PHYS(range_len(tag_range)));
> > +		}
> > +		num_tag_regions = 0;
> > +		pr_info("MTE tag storage region management disabled");
> > +	}
> > +}
> > diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> > index 417a8a86b2db..1b77138c1aa5 100644
> > --- a/arch/arm64/kernel/setup.c
> > +++ b/arch/arm64/kernel/setup.c
> > @@ -42,6 +42,7 @@
> >  #include <asm/cpufeature.h>
> >  #include <asm/cpu_ops.h>
> >  #include <asm/kasan.h>
> > +#include <asm/mte_tag_storage.h>
> >  #include <asm/numa.h>
> >  #include <asm/scs.h>
> >  #include <asm/sections.h>
> > @@ -342,6 +343,12 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p)
> >  			   FW_BUG "Booted with MMU enabled!");
> >  	}
> >  
> > +	/*
> > +	 * Must be called before memory limits are enforced by
> > +	 * arm64_memblock_init().
> > +	 */
> > +	mte_tag_storage_init();
> > +
> >  	arm64_memblock_init();
> >  
> >  	paging_init();
> > -- 
> > 2.42.1
> > 
> >
Alexandru Elisei Dec. 3, 2023, 12:14 p.m. UTC | #3
Hi,

On Wed, Nov 29, 2023 at 05:44:24PM +0900, Hyesoo Yu wrote:
> Hello.
> 
> On Sun, Nov 19, 2023 at 04:57:05PM +0000, Alexandru Elisei wrote:
> > Allow the kernel to get the size and location of the MTE tag storage
> > regions from the DTB. This memory is marked as reserved for now.
> > 
> > The DTB node for the tag storage region is defined as:
> > 
> >         tags0: tag-storage@8f8000000 {
> >                 compatible = "arm,mte-tag-storage";
> >                 reg = <0x08 0xf8000000 0x00 0x4000000>;
> >                 block-size = <0x1000>;
> >                 memory = <&memory0>;	// Associated tagged memory node
> >         };
> >
> 
> How about using compatible = "shared-dma-pool" like below ?
> 
> &reserved_memory {
> 	tags0: tag0@8f8000000 {
> 		compatible = "arm,mte-tag-storage";
>         	reg = <0x08 0xf8000000 0x00 0x4000000>;
> 	};
> }
> 
> tag-storage {
>         compatible = "arm,mte-tag-storage";
> 	memory-region = <&tag>;
>         memory = <&memory0>;
> 	block-size = <0x1000>;
> }
> 
> And then, the activation of CMA would be performed in the CMA code.
> We just can get the region information from memory-region and allocate it directly
> like alloc_contig_range, take_page_off_buddy. It seems like we can remove a lots of code.

Played with reserved_mem a bit. I don't think that's the correct path
forward.

The location of the tag storage is a hardware property, independent of how
Linux is configured.

early_init_fdt_scan_reserved_mem() is called from arm64_memblock_init(),
**after** the kernel enforces an upper address for various reasons. One of
the reasons can be that it's been compiled with 39 bits VA.

After early_init_fdt_scan_reserved_mem() returns, the kernel sets the
maximum address, stored in the variable "high_memory".

What can happen is that tag storage is present at an address above the
maximum addressable by the kernel, and the CMA code will trigger an
unrecovrable page fault.

I was able to trigger this with the dts change:

diff --git a/arch/arm64/boot/dts/arm/fvp-base-revc.dts b/arch/arm64/boot/dts/arm/fvp-base-revc.dts
index 60472d65a355..201359d014e4 100644
--- a/arch/arm64/boot/dts/arm/fvp-base-revc.dts
+++ b/arch/arm64/boot/dts/arm/fvp-base-revc.dts
@@ -183,6 +183,13 @@ vram: vram@18000000 {
                        reg = <0x00000000 0x18000000 0 0x00800000>;
                        no-map;
                };
+
+
+               linux,cma {
+                       compatible = "shared-dma-pool";
+                       reg = <0x100 0x0 0x00 0x4000000>;
+                       reusable;
+               };
        };

        gic: interrupt-controller@2f000000 {

And the error I got:

[    0.000000] Reserved memory: created CMA memory pool at 0x0000010000000000, size 64 MiB
[    0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
[    0.000000] OF: reserved mem: 0x0000010000000000..0x0000010003ffffff (65536 KiB) map reusable linux,cma
[..]
[    0.793193] WARNING: CPU: 0 PID: 1 at mm/cma.c:111 cma_init_reserved_areas+0xa8/0x378
[..]
[    0.806945] Unable to handle kernel paging request at virtual address 00000001fe000000
[    0.807277] Mem abort info:
[    0.807277]   ESR = 0x0000000096000005
[    0.807693]   EC = 0x25: DABT (current EL), IL = 32 bits
[    0.808110]   SET = 0, FnV = 0
[    0.808443]   EA = 0, S1PTW = 0
[    0.808526]   FSC = 0x05: level 1 translation fault
[    0.808943] Data abort info:
[    0.808943]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[    0.809360]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[    0.809776]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[    0.810221] [00000001fe000000] user address but active_mm is swapper
[..]
[    0.820887] Call trace:
[    0.821027]  cma_init_reserved_areas+0xc4/0x378
[    0.821443]  do_one_initcall+0x7c/0x1c0
[    0.821860]  kernel_init_freeable+0x1bc/0x284
[    0.822277]  kernel_init+0x24/0x1dc
[    0.822693]  ret_from_fork+0x10/0x20
[    0.823554] Code: 9127a29a cb813321 d37ae421 8b030020 (f8636822)
[    0.823554] ---[ end trace 0000000000000000 ]---
[    0.824360] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    0.824443] SMP: stopping secondary CPUs
[    0.825193] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Should reserved mem check if the reserved memory is actually addressable by
the kernel if it's not "no-map"? Should cma fail gracefully if
!pfn_valid(base_pfn)? Shold early_init_fdt_scan_reserved_mem() be moved
because arm64_bootmem_init()? I don't have the answer to any of those. And
I got a kernel panic because the kernel cannot address that memory (39 bits
VA). I don't know what would happen if the upper limit is reduced for
another reason.

What I think should happen:

1. Add the tag storage memory before any limits are enforced by
arm64_bootmem_init().

2. Call cma_declare_contiguous_nid() after arm64_bootmem_init(), because
the function will check the memory limit.

3. Have an arch initcall that checks that the CMA regions corresponding to
the tag storage have been activated successfully (cma_init_reserved_areas()
is a core initcall). If not, then don't enable tag storage.

How does that sound to you?

Thanks,
Alex

> 
> > The tag storage region represents the largest contiguous memory region that
> > holds all the tags for the associated contiguous memory region which can be
> > tagged. For example, for a 32GB contiguous tagged memory the corresponding
> > tag storage region is 1GB of contiguous memory, not two adjacent 512M of
> > tag storage memory.
> > 
> > "block-size" represents the minimum multiple of 4K of tag storage where all
> > the tags stored in the block correspond to a contiguous memory region. This
> > is needed for platforms where the memory controller interleaves tag writes
> > to memory. For example, if the memory controller interleaves tag writes for
> > 256KB of contiguous memory across 8K of tag storage (2-way interleave),
> > then the correct value for "block-size" is 0x2000. This value is a hardware
> > property, independent of the selected kernel page size.
> >
> 
> Is it considered for kernel page size like 16K page, 64K page ? The comment says
> it should be a multiple of 4K, but it should be a multiple of the "page size" more accurately.
> Please let me know if there's anything I misunderstood. :-)
> 
> 
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >  arch/arm64/Kconfig                       |  12 ++
> >  arch/arm64/include/asm/mte_tag_storage.h |  15 ++
> >  arch/arm64/kernel/Makefile               |   1 +
> >  arch/arm64/kernel/mte_tag_storage.c      | 256 +++++++++++++++++++++++
> >  arch/arm64/kernel/setup.c                |   7 +
> >  5 files changed, 291 insertions(+)
> >  create mode 100644 arch/arm64/include/asm/mte_tag_storage.h
> >  create mode 100644 arch/arm64/kernel/mte_tag_storage.c
> > 
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index 7b071a00425d..fe8276fdc7a8 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -2062,6 +2062,18 @@ config ARM64_MTE
> >  
> >  	  Documentation/arch/arm64/memory-tagging-extension.rst.
> >  
> > +if ARM64_MTE
> > +config ARM64_MTE_TAG_STORAGE
> > +	bool "Dynamic MTE tag storage management"
> > +	help
> > +	  Adds support for dynamic management of the memory used by the hardware
> > +	  for storing MTE tags. This memory, unlike normal memory, cannot be
> > +	  tagged. When it is used to store tags for another memory location it
> > +	  cannot be used for any type of allocation.
> > +
> > +	  If unsure, say N
> > +endif # ARM64_MTE
> > +
> >  endmenu # "ARMv8.5 architectural features"
> >  
> >  menu "ARMv8.7 architectural features"
> > diff --git a/arch/arm64/include/asm/mte_tag_storage.h b/arch/arm64/include/asm/mte_tag_storage.h
> > new file mode 100644
> > index 000000000000..8f86c4f9a7c3
> > --- /dev/null
> > +++ b/arch/arm64/include/asm/mte_tag_storage.h
> > @@ -0,0 +1,15 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright (C) 2023 ARM Ltd.
> > + */
> > +#ifndef __ASM_MTE_TAG_STORAGE_H
> > +#define __ASM_MTE_TAG_STORAGE_H
> > +
> > +#ifdef CONFIG_ARM64_MTE_TAG_STORAGE
> > +void mte_tag_storage_init(void);
> > +#else
> > +static inline void mte_tag_storage_init(void)
> > +{
> > +}
> > +#endif /* CONFIG_ARM64_MTE_TAG_STORAGE */
> > +#endif /* __ASM_MTE_TAG_STORAGE_H  */
> > diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> > index d95b3d6b471a..5f031bf9f8f1 100644
> > --- a/arch/arm64/kernel/Makefile
> > +++ b/arch/arm64/kernel/Makefile
> > @@ -70,6 +70,7 @@ obj-$(CONFIG_CRASH_CORE)		+= crash_core.o
> >  obj-$(CONFIG_ARM_SDE_INTERFACE)		+= sdei.o
> >  obj-$(CONFIG_ARM64_PTR_AUTH)		+= pointer_auth.o
> >  obj-$(CONFIG_ARM64_MTE)			+= mte.o
> > +obj-$(CONFIG_ARM64_MTE_TAG_STORAGE)	+= mte_tag_storage.o
> >  obj-y					+= vdso-wrap.o
> >  obj-$(CONFIG_COMPAT_VDSO)		+= vdso32-wrap.o
> >  obj-$(CONFIG_UNWIND_PATCH_PAC_INTO_SCS)	+= patch-scs.o
> > diff --git a/arch/arm64/kernel/mte_tag_storage.c b/arch/arm64/kernel/mte_tag_storage.c
> > new file mode 100644
> > index 000000000000..fa6267ef8392
> > --- /dev/null
> > +++ b/arch/arm64/kernel/mte_tag_storage.c
> > @@ -0,0 +1,256 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Support for dynamic tag storage.
> > + *
> > + * Copyright (C) 2023 ARM Ltd.
> > + */
> > +
> > +#include <linux/memblock.h>
> > +#include <linux/mm.h>
> > +#include <linux/of_device.h>
> > +#include <linux/of_fdt.h>
> > +#include <linux/range.h>
> > +#include <linux/string.h>
> > +#include <linux/xarray.h>
> > +
> > +#include <asm/mte_tag_storage.h>
> > +
> > +struct tag_region {
> > +	struct range mem_range;	/* Memory associated with the tag storage, in PFNs. */
> > +	struct range tag_range;	/* Tag storage memory, in PFNs. */
> > +	u32 block_size;		/* Tag block size, in pages. */
> > +};
> > +
> > +#define MAX_TAG_REGIONS	32
> > +
> > +static struct tag_region tag_regions[MAX_TAG_REGIONS];
> > +static int num_tag_regions;
> > +
> > +static int __init tag_storage_of_flat_get_range(unsigned long node, const __be32 *reg,
> > +						int reg_len, struct range *range)
> > +{
> > +	int addr_cells = dt_root_addr_cells;
> > +	int size_cells = dt_root_size_cells;
> > +	u64 size;
> > +
> > +	if (reg_len / 4 > addr_cells + size_cells)
> > +		return -EINVAL;
> > +
> > +	range->start = PHYS_PFN(of_read_number(reg, addr_cells));
> > +	size = PHYS_PFN(of_read_number(reg + addr_cells, size_cells));
> > +	if (size == 0) {
> > +		pr_err("Invalid node");
> > +		return -EINVAL;
> > +	}
> > +	range->end = range->start + size - 1;
> > +
> > +	return 0;
> > +}
> > +
> > +static int __init tag_storage_of_flat_get_tag_range(unsigned long node,
> > +						    struct range *tag_range)
> > +{
> > +	const __be32 *reg;
> > +	int reg_len;
> > +
> > +	reg = of_get_flat_dt_prop(node, "reg", &reg_len);
> > +	if (reg == NULL) {
> > +		pr_err("Invalid metadata node");
> > +		return -EINVAL;
> > +	}
> > +
> > +	return tag_storage_of_flat_get_range(node, reg, reg_len, tag_range);
> > +}
> > +
> > +static int __init tag_storage_of_flat_get_memory_range(unsigned long node, struct range *mem)
> > +{
> > +	const __be32 *reg;
> > +	int reg_len;
> > +
> > +	reg = of_get_flat_dt_prop(node, "linux,usable-memory", &reg_len);
> > +	if (reg == NULL)
> > +		reg = of_get_flat_dt_prop(node, "reg", &reg_len);
> > +
> > +	if (reg == NULL) {
> > +		pr_err("Invalid memory node");
> > +		return -EINVAL;
> > +	}
> > +
> > +	return tag_storage_of_flat_get_range(node, reg, reg_len, mem);
> > +}
> > +
> > +struct find_memory_node_arg {
> > +	unsigned long node;
> > +	u32 phandle;
> > +};
> > +
> > +static int __init fdt_find_memory_node(unsigned long node, const char *uname,
> > +				       int depth, void *data)
> > +{
> > +	const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
> > +	struct find_memory_node_arg *arg = data;
> > +
> > +	if (depth != 1 || !type || strcmp(type, "memory") != 0)
> > +		return 0;
> > +
> > +	if (of_get_flat_dt_phandle(node) == arg->phandle) {
> > +		arg->node = node;
> > +		return 1;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int __init tag_storage_get_memory_node(unsigned long tag_node, unsigned long *mem_node)
> > +{
> > +	struct find_memory_node_arg arg = { 0 };
> > +	const __be32 *memory_prop;
> > +	u32 mem_phandle;
> > +	int ret, reg_len;
> > +
> > +	memory_prop = of_get_flat_dt_prop(tag_node, "memory", &reg_len);
> > +	if (!memory_prop) {
> > +		pr_err("Missing 'memory' property in the tag storage node");
> > +		return -EINVAL;
> > +	}
> > +
> > +	mem_phandle = be32_to_cpup(memory_prop);
> > +	arg.phandle = mem_phandle;
> > +
> > +	ret = of_scan_flat_dt(fdt_find_memory_node, &arg);
> > +	if (ret != 1) {
> > +		pr_err("Associated memory node not found");
> > +		return -EINVAL;
> > +	}
> > +
> > +	*mem_node = arg.node;
> > +
> > +	return 0;
> > +}
> > +
> > +static int __init tag_storage_of_flat_read_u32(unsigned long node, const char *propname,
> > +					       u32 *retval)
> > +{
> > +	const __be32 *reg;
> > +
> > +	reg = of_get_flat_dt_prop(node, propname, NULL);
> > +	if (!reg)
> > +		return -EINVAL;
> > +
> > +	*retval = be32_to_cpup(reg);
> > +	return 0;
> > +}
> > +
> > +static u32 __init get_block_size_pages(u32 block_size_bytes)
> > +{
> > +	u32 a = PAGE_SIZE;
> > +	u32 b = block_size_bytes;
> > +	u32 r;
> > +
> > +	/* Find greatest common divisor using the Euclidian algorithm. */
> > +	do {
> > +		r = a % b;
> > +		a = b;
> > +		b = r;
> > +	} while (b != 0);
> > +
> > +	return PHYS_PFN(PAGE_SIZE * block_size_bytes / a);
> > +}
> > +
> > +static int __init fdt_init_tag_storage(unsigned long node, const char *uname,
> > +				       int depth, void *data)
> > +{
> > +	struct tag_region *region;
> > +	unsigned long mem_node;
> > +	struct range *mem_range;
> > +	struct range *tag_range;
> > +	u32 block_size_bytes;
> > +	u32 nid = 0;
> > +	int ret;
> > +
> > +	if (depth != 1 || !strstr(uname, "tag-storage"))
> > +		return 0;
> > +
> > +	if (!of_flat_dt_is_compatible(node, "arm,mte-tag-storage"))
> > +		return 0;
> > +
> > +	if (num_tag_regions == MAX_TAG_REGIONS) {
> > +		pr_err("Maximum number of tag storage regions exceeded");
> > +		return -EINVAL;
> > +	}
> > +
> > +	region = &tag_regions[num_tag_regions];
> > +	mem_range = &region->mem_range;
> > +	tag_range = &region->tag_range;
> > +
> > +	ret = tag_storage_of_flat_get_tag_range(node, tag_range);
> > +	if (ret) {
> > +		pr_err("Invalid tag storage node");
> > +		return ret;
> > +	}
> > +
> > +	ret = tag_storage_get_memory_node(node, &mem_node);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = tag_storage_of_flat_get_memory_range(mem_node, mem_range);
> > +	if (ret) {
> > +		pr_err("Invalid address for associated data memory node");
> > +		return ret;
> > +	}
> > +
> > +	/* The tag region must exactly match the corresponding memory. */
> > +	if (range_len(tag_range) * 32 != range_len(mem_range)) {
> > +		pr_err("Tag storage region 0x%llx-0x%llx does not cover the memory region 0x%llx-0x%llx",
> > +		       PFN_PHYS(tag_range->start), PFN_PHYS(tag_range->end),
> > +		       PFN_PHYS(mem_range->start), PFN_PHYS(mem_range->end));
> > +		return -EINVAL;
> > +	}
> > +
> > +	ret = tag_storage_of_flat_read_u32(node, "block-size", &block_size_bytes);
> > +	if (ret || block_size_bytes == 0) {
> > +		pr_err("Invalid or missing 'block-size' property");
> > +		return -EINVAL;
> > +	}
> > +	region->block_size = get_block_size_pages(block_size_bytes);
> > +	if (range_len(tag_range) % region->block_size != 0) {
> > +		pr_err("Tag storage region size 0x%llx is not a multiple of block size %u",
> > +		       PFN_PHYS(range_len(tag_range)), region->block_size);
> > +		return -EINVAL;
> > +	}
> > +
> 
> I was confused about the variable "block_size", The block size declared in the device tree is
> in bytes, but the actual block size used is in pages. I think the term "block_size" can cause
> confusion as it might be interpreted as bytes. If possible, I suggest changing the term "block_size"
> to something more readable, such as "block_nr_pages" (This is just a example!)
> 
> Thanks,
> Regards.
> 
> > +	ret = tag_storage_of_flat_read_u32(mem_node, "numa-node-id", &nid);
> > +	if (ret)
> > +		nid = numa_node_id();
> > +
> > +	ret = memblock_add_node(PFN_PHYS(tag_range->start), PFN_PHYS(range_len(tag_range)),
> > +				nid, MEMBLOCK_NONE);
> > +	if (ret) {
> > +		pr_err("Error adding tag memblock (%d)", ret);
> > +		return ret;
> > +	}
> > +	memblock_reserve(PFN_PHYS(tag_range->start), PFN_PHYS(range_len(tag_range)));
> > +
> > +	pr_info("Found tag storage region 0x%llx-0x%llx, block size %u pages",
> > +		PFN_PHYS(tag_range->start), PFN_PHYS(tag_range->end), region->block_size);
> > +
> > +	num_tag_regions++;
> > +
> > +	return 0;
> > +}
> > +
> > +void __init mte_tag_storage_init(void)
> > +{
> > +	struct range *tag_range;
> > +	int i, ret;
> > +
> > +	ret = of_scan_flat_dt(fdt_init_tag_storage, NULL);
> > +	if (ret) {
> > +		for (i = 0; i < num_tag_regions; i++) {
> > +			tag_range = &tag_regions[i].tag_range;
> > +			memblock_remove(PFN_PHYS(tag_range->start), PFN_PHYS(range_len(tag_range)));
> > +		}
> > +		num_tag_regions = 0;
> > +		pr_info("MTE tag storage region management disabled");
> > +	}
> > +}
> > diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> > index 417a8a86b2db..1b77138c1aa5 100644
> > --- a/arch/arm64/kernel/setup.c
> > +++ b/arch/arm64/kernel/setup.c
> > @@ -42,6 +42,7 @@
> >  #include <asm/cpufeature.h>
> >  #include <asm/cpu_ops.h>
> >  #include <asm/kasan.h>
> > +#include <asm/mte_tag_storage.h>
> >  #include <asm/numa.h>
> >  #include <asm/scs.h>
> >  #include <asm/sections.h>
> > @@ -342,6 +343,12 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p)
> >  			   FW_BUG "Booted with MMU enabled!");
> >  	}
> >  
> > +	/*
> > +	 * Must be called before memory limits are enforced by
> > +	 * arm64_memblock_init().
> > +	 */
> > +	mte_tag_storage_init();
> > +
> >  	arm64_memblock_init();
> >  
> >  	paging_init();
> > -- 
> > 2.42.1
> > 
> >
Hyesoo Yu Dec. 8, 2023, 5:03 a.m. UTC | #4
Hi, 

I'm sorry for the late response, I was on vacation.

On Sun, Dec 03, 2023 at 12:14:30PM +0000, Alexandru Elisei wrote:
> Hi,
> 
> On Wed, Nov 29, 2023 at 05:44:24PM +0900, Hyesoo Yu wrote:
> > Hello.
> > 
> > On Sun, Nov 19, 2023 at 04:57:05PM +0000, Alexandru Elisei wrote:
> > > Allow the kernel to get the size and location of the MTE tag storage
> > > regions from the DTB. This memory is marked as reserved for now.
> > > 
> > > The DTB node for the tag storage region is defined as:
> > > 
> > >         tags0: tag-storage@8f8000000 {
> > >                 compatible = "arm,mte-tag-storage";
> > >                 reg = <0x08 0xf8000000 0x00 0x4000000>;
> > >                 block-size = <0x1000>;
> > >                 memory = <&memory0>;	// Associated tagged memory node
> > >         };
> > >
> > 
> > How about using compatible = "shared-dma-pool" like below ?
> > 
> > &reserved_memory {
> > 	tags0: tag0@8f8000000 {
> > 		compatible = "arm,mte-tag-storage";
> >         	reg = <0x08 0xf8000000 0x00 0x4000000>;
> > 	};
> > }
> > 
> > tag-storage {
> >         compatible = "arm,mte-tag-storage";
> > 	memory-region = <&tag>;
> >         memory = <&memory0>;
> > 	block-size = <0x1000>;
> > }
> > 
> > And then, the activation of CMA would be performed in the CMA code.
> > We just can get the region information from memory-region and allocate it directly
> > like alloc_contig_range, take_page_off_buddy. It seems like we can remove a lots of code.
>

Sorry, that example was my mistake. Actually I wanted to write like this. 

&reserved_memory {
	tags0: tag0@8f8000000 {
		compatible = "shared-dma-pool";
        	reg = <0x08 0xf8000000 0x00 0x4000000>;
		reusable;
	};
}

tag-storage {
        compatible = "arm,mte-tag-storage";
 	memory-region = <&tag>;
        memory = <&memory0>;
	block-size = <0x1000>;
}


> Played with reserved_mem a bit. I don't think that's the correct path
> forward.
> 
> The location of the tag storage is a hardware property, independent of how
> Linux is configured.
> 
> early_init_fdt_scan_reserved_mem() is called from arm64_memblock_init(),
> **after** the kernel enforces an upper address for various reasons. One of
> the reasons can be that it's been compiled with 39 bits VA.
> 

I'm not sure about this part. What is the upper address enforced by the kernel ?
Where can I check the code ? Do you means that memblock_end_of_DRAM() ?  

> After early_init_fdt_scan_reserved_mem() returns, the kernel sets the
> maximum address, stored in the variable "high_memory".
>
> What can happen is that tag storage is present at an address above the
> maximum addressable by the kernel, and the CMA code will trigger an
> unrecovrable page fault.
> 
> I was able to trigger this with the dts change:
> 
> diff --git a/arch/arm64/boot/dts/arm/fvp-base-revc.dts b/arch/arm64/boot/dts/arm/fvp-base-revc.dts
> index 60472d65a355..201359d014e4 100644
> --- a/arch/arm64/boot/dts/arm/fvp-base-revc.dts
> +++ b/arch/arm64/boot/dts/arm/fvp-base-revc.dts
> @@ -183,6 +183,13 @@ vram: vram@18000000 {
>                         reg = <0x00000000 0x18000000 0 0x00800000>;
>                         no-map;
>                 };
> +
> +
> +               linux,cma {
> +                       compatible = "shared-dma-pool";
> +                       reg = <0x100 0x0 0x00 0x4000000>;
> +                       reusable;
> +               };
>         };
> 
>         gic: interrupt-controller@2f000000 {
> 
> And the error I got:
> 
> [    0.000000] Reserved memory: created CMA memory pool at 0x0000010000000000, size 64 MiB
> [    0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
> [    0.000000] OF: reserved mem: 0x0000010000000000..0x0000010003ffffff (65536 KiB) map reusable linux,cma
> [..]
> [    0.793193] WARNING: CPU: 0 PID: 1 at mm/cma.c:111 cma_init_reserved_areas+0xa8/0x378
> [..]
> [    0.806945] Unable to handle kernel paging request at virtual address 00000001fe000000
> [    0.807277] Mem abort info:
> [    0.807277]   ESR = 0x0000000096000005
> [    0.807693]   EC = 0x25: DABT (current EL), IL = 32 bits
> [    0.808110]   SET = 0, FnV = 0
> [    0.808443]   EA = 0, S1PTW = 0
> [    0.808526]   FSC = 0x05: level 1 translation fault
> [    0.808943] Data abort info:
> [    0.808943]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> [    0.809360]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [    0.809776]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [    0.810221] [00000001fe000000] user address but active_mm is swapper
> [..]
> [    0.820887] Call trace:
> [    0.821027]  cma_init_reserved_areas+0xc4/0x378
> [    0.821443]  do_one_initcall+0x7c/0x1c0
> [    0.821860]  kernel_init_freeable+0x1bc/0x284
> [    0.822277]  kernel_init+0x24/0x1dc
> [    0.822693]  ret_from_fork+0x10/0x20
> [    0.823554] Code: 9127a29a cb813321 d37ae421 8b030020 (f8636822)
> [    0.823554] ---[ end trace 0000000000000000 ]---
> [    0.824360] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> [    0.824443] SMP: stopping secondary CPUs
> [    0.825193] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
> 
> Should reserved mem check if the reserved memory is actually addressable by
> the kernel if it's not "no-map"? Should cma fail gracefully if
> !pfn_valid(base_pfn)? Shold early_init_fdt_scan_reserved_mem() be moved
> because arm64_bootmem_init()? I don't have the answer to any of those. And
> I got a kernel panic because the kernel cannot address that memory (39 bits
> VA). I don't know what would happen if the upper limit is reduced for
> another reason.
> 

My answer may not be accurate because I don't understand what this upper limit is.
Is this a problem caused by the tag storage area not being included in the memory node ?

The reason for not including it in the memory node is to enable static mte when dmte
initilization fails, right ? I think I missed that. I thought the tag storage is included
in the memory node and registered as cma.

> What I think should happen:
> 
> 1. Add the tag storage memory before any limits are enforced by
> arm64_bootmem_init().
>
> 2. Call cma_declare_contiguous_nid() after arm64_bootmem_init(), because
> the function will check the memory limit.
> 
> 3. Have an arch initcall that checks that the CMA regions corresponding to
> the tag storage have been activated successfully (cma_init_reserved_areas()
> is a core initcall). If not, then don't enable tag storage.
> 
> How does that sound to you?
> 
> Thanks,
> Alex
> 

I think this is a good way to utilize the cma code !

Thanks,
Regards.

> > > +	ret = tag_storage_of_flat_read_u32(node, "block-size", &block_size_bytes);
> > > +	if (ret || block_size_bytes == 0) {
> > > +		pr_err("Invalid or missing 'block-size' property");
> > > +		return -EINVAL;
> > > +	}
> > > +	region->block_size = get_block_size_pages(block_size_bytes);
> > > +	if (range_len(tag_range) % region->block_size != 0) {
> > > +		pr_err("Tag storage region size 0x%llx is not a multiple of block size %u",
> > > +		       PFN_PHYS(range_len(tag_range)), region->block_size);
> > > +		return -EINVAL;
> > > +	}
> > > +
> > 
> > I was confused about the variable "block_size", The block size declared in the device tree is
> > in bytes, but the actual block size used is in pages. I think the term "block_size" can cause
> > confusion as it might be interpreted as bytes. If possible, I suggest changing the term "block_size"
> > to something more readable, such as "block_nr_pages" (This is just a example!)
> > 
> > Thanks,
> > Regards.
>>

What do you think about this ?

Thanks,
Regards.

> > > +	ret = tag_storage_of_flat_read_u32(mem_node, "numa-node-id", &nid);
> > > +	if (ret)
> > > +		nid = numa_node_id();
> > > +
> > > +	ret = memblock_add_node(PFN_PHYS(tag_range->start), PFN_PHYS(range_len(tag_range)),
> > > +				nid, MEMBLOCK_NONE);
> > > +	if (ret) {
> > > +		pr_err("Error adding tag memblock (%d)", ret);
> > > +		return ret;
> > > +	}
> > > +	memblock_reserve(PFN_PHYS(tag_range->start), PFN_PHYS(range_len(tag_range)));
> > > +
> > > +	pr_info("Found tag storage region 0x%llx-0x%llx, block size %u pages",
> > > +		PFN_PHYS(tag_range->start), PFN_PHYS(tag_range->end), region->block_size);
> > > +
> > > +	num_tag_regions++;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +void __init mte_tag_storage_init(void)
> > > +{
> > > +	struct range *tag_range;
> > > +	int i, ret;
> > > +
> > > +	ret = of_scan_flat_dt(fdt_init_tag_storage, NULL);
> > > +	if (ret) {
> > > +		for (i = 0; i < num_tag_regions; i++) {
> > > +			tag_range = &tag_regions[i].tag_range;
> > > +			memblock_remove(PFN_PHYS(tag_range->start), PFN_PHYS(range_len(tag_range)));
> > > +		}
> > > +		num_tag_regions = 0;
> > > +		pr_info("MTE tag storage region management disabled");
> > > +	}
> > > +}
> > > diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> > > index 417a8a86b2db..1b77138c1aa5 100644
> > > --- a/arch/arm64/kernel/setup.c
> > > +++ b/arch/arm64/kernel/setup.c
> > > @@ -42,6 +42,7 @@
> > >  #include <asm/cpufeature.h>
> > >  #include <asm/cpu_ops.h>
> > >  #include <asm/kasan.h>
> > > +#include <asm/mte_tag_storage.h>
> > >  #include <asm/numa.h>
> > >  #include <asm/scs.h>
> > >  #include <asm/sections.h>
> > > @@ -342,6 +343,12 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p)
> > >  			   FW_BUG "Booted with MMU enabled!");
> > >  	}
> > >  
> > > +	/*
> > > +	 * Must be called before memory limits are enforced by
> > > +	 * arm64_memblock_init().
> > > +	 */
> > > +	mte_tag_storage_init();
> > > +
> > >  	arm64_memblock_init();
> > >  
> > >  	paging_init();
> > > -- 
> > > 2.42.1
> > > 
> > > 
> 
> 
>
Alexandru Elisei Dec. 11, 2023, 2:45 p.m. UTC | #5
Hi,

On Fri, Dec 08, 2023 at 02:03:44PM +0900, Hyesoo Yu wrote:
> Hi, 
> 
> I'm sorry for the late response, I was on vacation.
> 
> On Sun, Dec 03, 2023 at 12:14:30PM +0000, Alexandru Elisei wrote:
> > Hi,
> > 
> > On Wed, Nov 29, 2023 at 05:44:24PM +0900, Hyesoo Yu wrote:
> > > Hello.
> > > 
> > > On Sun, Nov 19, 2023 at 04:57:05PM +0000, Alexandru Elisei wrote:
> > > > Allow the kernel to get the size and location of the MTE tag storage
> > > > regions from the DTB. This memory is marked as reserved for now.
> > > > 
> > > > The DTB node for the tag storage region is defined as:
> > > > 
> > > >         tags0: tag-storage@8f8000000 {
> > > >                 compatible = "arm,mte-tag-storage";
> > > >                 reg = <0x08 0xf8000000 0x00 0x4000000>;
> > > >                 block-size = <0x1000>;
> > > >                 memory = <&memory0>;	// Associated tagged memory node
> > > >         };
> > > >
> > > 
> > > How about using compatible = "shared-dma-pool" like below ?
> > > 
> > > &reserved_memory {
> > > 	tags0: tag0@8f8000000 {
> > > 		compatible = "arm,mte-tag-storage";
> > >         	reg = <0x08 0xf8000000 0x00 0x4000000>;
> > > 	};
> > > }
> > > 
> > > tag-storage {
> > >         compatible = "arm,mte-tag-storage";
> > > 	memory-region = <&tag>;
> > >         memory = <&memory0>;
> > > 	block-size = <0x1000>;
> > > }
> > > 
> > > And then, the activation of CMA would be performed in the CMA code.
> > > We just can get the region information from memory-region and allocate it directly
> > > like alloc_contig_range, take_page_off_buddy. It seems like we can remove a lots of code.
> >
> 
> Sorry, that example was my mistake. Actually I wanted to write like this. 
> 
> &reserved_memory {
> 	tags0: tag0@8f8000000 {
> 		compatible = "shared-dma-pool";
>         	reg = <0x08 0xf8000000 0x00 0x4000000>;
> 		reusable;
> 	};
> }
> 
> tag-storage {
>         compatible = "arm,mte-tag-storage";
>  	memory-region = <&tag>;
>         memory = <&memory0>;
> 	block-size = <0x1000>;
> }

I prototyped your suggestion with this change to the device tree:

            reserved-memory {
                    #address-cells = <0x02>;
                    #size-cells = <0x02>;
                    ranges;

                    tags0: tag-storage@8f8000000 {
                            compatible = "arm,mte-tag-storage";
                            reg = <0x08 0xf8000000 0x00 0x4000000>;
                            block-size = <0x1000>;
                            memory = <&memory0>;
                            reusable;
                    };
            };

Would you mind explaining what we are gaining by using reserved mem?

Struct reserved_mem only has the base and size of the tag storage region,
and initialization for reserved mem happens before the DTB is unflattened.
When I prototyped using reserved mem, I still had to write the code to
parse the memory node address and size. This code was the same as the code
needed to parse the tag storage region address and size, so having that
information in struct reserved_mem does not reduce the size of the code by
a meaningful amount.

> 
> 
> > Played with reserved_mem a bit. I don't think that's the correct path
> > forward.
> > 
> > The location of the tag storage is a hardware property, independent of how
> > Linux is configured.
> > 
> > early_init_fdt_scan_reserved_mem() is called from arm64_memblock_init(),
> > **after** the kernel enforces an upper address for various reasons. One of
> > the reasons can be that it's been compiled with 39 bits VA.
> > 
> 
> I'm not sure about this part. What is the upper address enforced by the kernel ?
> Where can I check the code ? Do you means that memblock_end_of_DRAM() ?  

I am referring to arch/arm64/mm/init.c:: arm64_memblock_init(). The
function initializes reserved mem (in early_init_fdt_scan_reserved_mem())
**after**removing memory from memblock that the kernel cannot address.

> 
> > After early_init_fdt_scan_reserved_mem() returns, the kernel sets the
> > maximum address, stored in the variable "high_memory".
> >
> > What can happen is that tag storage is present at an address above the
> > maximum addressable by the kernel, and the CMA code will trigger an
> > unrecovrable page fault.
> > 
> > I was able to trigger this with the dts change:
> > 
> > diff --git a/arch/arm64/boot/dts/arm/fvp-base-revc.dts b/arch/arm64/boot/dts/arm/fvp-base-revc.dts
> > index 60472d65a355..201359d014e4 100644
> > --- a/arch/arm64/boot/dts/arm/fvp-base-revc.dts
> > +++ b/arch/arm64/boot/dts/arm/fvp-base-revc.dts
> > @@ -183,6 +183,13 @@ vram: vram@18000000 {
> >                         reg = <0x00000000 0x18000000 0 0x00800000>;
> >                         no-map;
> >                 };
> > +
> > +
> > +               linux,cma {
> > +                       compatible = "shared-dma-pool";
> > +                       reg = <0x100 0x0 0x00 0x4000000>;
> > +                       reusable;
> > +               };
> >         };
> > 
> >         gic: interrupt-controller@2f000000 {
> > 
> > And the error I got:
> > 
> > [    0.000000] Reserved memory: created CMA memory pool at 0x0000010000000000, size 64 MiB
> > [    0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
> > [    0.000000] OF: reserved mem: 0x0000010000000000..0x0000010003ffffff (65536 KiB) map reusable linux,cma
> > [..]
> > [    0.793193] WARNING: CPU: 0 PID: 1 at mm/cma.c:111 cma_init_reserved_areas+0xa8/0x378
> > [..]
> > [    0.806945] Unable to handle kernel paging request at virtual address 00000001fe000000
> > [    0.807277] Mem abort info:
> > [    0.807277]   ESR = 0x0000000096000005
> > [    0.807693]   EC = 0x25: DABT (current EL), IL = 32 bits
> > [    0.808110]   SET = 0, FnV = 0
> > [    0.808443]   EA = 0, S1PTW = 0
> > [    0.808526]   FSC = 0x05: level 1 translation fault
> > [    0.808943] Data abort info:
> > [    0.808943]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> > [    0.809360]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > [    0.809776]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> > [    0.810221] [00000001fe000000] user address but active_mm is swapper
> > [..]
> > [    0.820887] Call trace:
> > [    0.821027]  cma_init_reserved_areas+0xc4/0x378
> > [    0.821443]  do_one_initcall+0x7c/0x1c0
> > [    0.821860]  kernel_init_freeable+0x1bc/0x284
> > [    0.822277]  kernel_init+0x24/0x1dc
> > [    0.822693]  ret_from_fork+0x10/0x20
> > [    0.823554] Code: 9127a29a cb813321 d37ae421 8b030020 (f8636822)
> > [    0.823554] ---[ end trace 0000000000000000 ]---
> > [    0.824360] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> > [    0.824443] SMP: stopping secondary CPUs
> > [    0.825193] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
> > 
> > Should reserved mem check if the reserved memory is actually addressable by
> > the kernel if it's not "no-map"? Should cma fail gracefully if
> > !pfn_valid(base_pfn)? Shold early_init_fdt_scan_reserved_mem() be moved
> > because arm64_bootmem_init()? I don't have the answer to any of those. And
> > I got a kernel panic because the kernel cannot address that memory (39 bits
> > VA). I don't know what would happen if the upper limit is reduced for
> > another reason.
> > 
> 
> My answer may not be accurate because I don't understand what this upper limit is.
> Is this a problem caused by the tag storage area not being included in the memory node ?

This problem is caused by the kernel cannot using virtual addresses in the
linear map (where VA == PA) to access the tag storage region.

> 
> The reason for not including it in the memory node is to enable static mte when dmte
> initilization fails, right ? I think I missed that. I thought the tag storage is included
> in the memory node and registered as cma.
> 
> > What I think should happen:
> > 
> > 1. Add the tag storage memory before any limits are enforced by
> > arm64_bootmem_init().
> >
> > 2. Call cma_declare_contiguous_nid() after arm64_bootmem_init(), because
> > the function will check the memory limit.
> > 
> > 3. Have an arch initcall that checks that the CMA regions corresponding to
> > the tag storage have been activated successfully (cma_init_reserved_areas()
> > is a core initcall). If not, then don't enable tag storage.
> > 
> > How does that sound to you?
> > 
> > Thanks,
> > Alex
> > 
> 
> I think this is a good way to utilize the cma code !

Cool, thanks!

Alex

> 
> Thanks,
> Regards.
> 
> > > > +	ret = tag_storage_of_flat_read_u32(node, "block-size", &block_size_bytes);
> > > > +	if (ret || block_size_bytes == 0) {
> > > > +		pr_err("Invalid or missing 'block-size' property");
> > > > +		return -EINVAL;
> > > > +	}
> > > > +	region->block_size = get_block_size_pages(block_size_bytes);
> > > > +	if (range_len(tag_range) % region->block_size != 0) {
> > > > +		pr_err("Tag storage region size 0x%llx is not a multiple of block size %u",
> > > > +		       PFN_PHYS(range_len(tag_range)), region->block_size);
> > > > +		return -EINVAL;
> > > > +	}
> > > > +
> > > 
> > > I was confused about the variable "block_size", The block size declared in the device tree is
> > > in bytes, but the actual block size used is in pages. I think the term "block_size" can cause
> > > confusion as it might be interpreted as bytes. If possible, I suggest changing the term "block_size"
> > > to something more readable, such as "block_nr_pages" (This is just a example!)
> > > 
> > > Thanks,
> > > Regards.
> >>
> 
> What do you think about this ?
> 
> Thanks,
> Regards.
> 
> > > > +	ret = tag_storage_of_flat_read_u32(mem_node, "numa-node-id", &nid);
> > > > +	if (ret)
> > > > +		nid = numa_node_id();
> > > > +
> > > > +	ret = memblock_add_node(PFN_PHYS(tag_range->start), PFN_PHYS(range_len(tag_range)),
> > > > +				nid, MEMBLOCK_NONE);
> > > > +	if (ret) {
> > > > +		pr_err("Error adding tag memblock (%d)", ret);
> > > > +		return ret;
> > > > +	}
> > > > +	memblock_reserve(PFN_PHYS(tag_range->start), PFN_PHYS(range_len(tag_range)));
> > > > +
> > > > +	pr_info("Found tag storage region 0x%llx-0x%llx, block size %u pages",
> > > > +		PFN_PHYS(tag_range->start), PFN_PHYS(tag_range->end), region->block_size);
> > > > +
> > > > +	num_tag_regions++;
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +void __init mte_tag_storage_init(void)
> > > > +{
> > > > +	struct range *tag_range;
> > > > +	int i, ret;
> > > > +
> > > > +	ret = of_scan_flat_dt(fdt_init_tag_storage, NULL);
> > > > +	if (ret) {
> > > > +		for (i = 0; i < num_tag_regions; i++) {
> > > > +			tag_range = &tag_regions[i].tag_range;
> > > > +			memblock_remove(PFN_PHYS(tag_range->start), PFN_PHYS(range_len(tag_range)));
> > > > +		}
> > > > +		num_tag_regions = 0;
> > > > +		pr_info("MTE tag storage region management disabled");
> > > > +	}
> > > > +}
> > > > diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> > > > index 417a8a86b2db..1b77138c1aa5 100644
> > > > --- a/arch/arm64/kernel/setup.c
> > > > +++ b/arch/arm64/kernel/setup.c
> > > > @@ -42,6 +42,7 @@
> > > >  #include <asm/cpufeature.h>
> > > >  #include <asm/cpu_ops.h>
> > > >  #include <asm/kasan.h>
> > > > +#include <asm/mte_tag_storage.h>
> > > >  #include <asm/numa.h>
> > > >  #include <asm/scs.h>
> > > >  #include <asm/sections.h>
> > > > @@ -342,6 +343,12 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p)
> > > >  			   FW_BUG "Booted with MMU enabled!");
> > > >  	}
> > > >  
> > > > +	/*
> > > > +	 * Must be called before memory limits are enforced by
> > > > +	 * arm64_memblock_init().
> > > > +	 */
> > > > +	mte_tag_storage_init();
> > > > +
> > > >  	arm64_memblock_init();
> > > >  
> > > >  	paging_init();
> > > > -- 
> > > > 2.42.1
> > > > 
> > > > 
> > 
> > 
> >
Rob Herring (Arm) Dec. 11, 2023, 5:29 p.m. UTC | #6
On Sun, Nov 19, 2023 at 10:59 AM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> Allow the kernel to get the size and location of the MTE tag storage
> regions from the DTB. This memory is marked as reserved for now.
>
> The DTB node for the tag storage region is defined as:
>
>         tags0: tag-storage@8f8000000 {
>                 compatible = "arm,mte-tag-storage";
>                 reg = <0x08 0xf8000000 0x00 0x4000000>;
>                 block-size = <0x1000>;
>                 memory = <&memory0>;    // Associated tagged memory node
>         };

I skimmed thru the discussion some. If this memory range is within
main RAM, then it definitely belongs in /reserved-memory.

You need a binding for this too.

> The tag storage region represents the largest contiguous memory region that
> holds all the tags for the associated contiguous memory region which can be
> tagged. For example, for a 32GB contiguous tagged memory the corresponding
> tag storage region is 1GB of contiguous memory, not two adjacent 512M of
> tag storage memory.
>
> "block-size" represents the minimum multiple of 4K of tag storage where all
> the tags stored in the block correspond to a contiguous memory region. This
> is needed for platforms where the memory controller interleaves tag writes
> to memory. For example, if the memory controller interleaves tag writes for
> 256KB of contiguous memory across 8K of tag storage (2-way interleave),
> then the correct value for "block-size" is 0x2000. This value is a hardware
> property, independent of the selected kernel page size.
>
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  arch/arm64/Kconfig                       |  12 ++
>  arch/arm64/include/asm/mte_tag_storage.h |  15 ++
>  arch/arm64/kernel/Makefile               |   1 +
>  arch/arm64/kernel/mte_tag_storage.c      | 256 +++++++++++++++++++++++
>  arch/arm64/kernel/setup.c                |   7 +
>  5 files changed, 291 insertions(+)
>  create mode 100644 arch/arm64/include/asm/mte_tag_storage.h
>  create mode 100644 arch/arm64/kernel/mte_tag_storage.c
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 7b071a00425d..fe8276fdc7a8 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2062,6 +2062,18 @@ config ARM64_MTE
>
>           Documentation/arch/arm64/memory-tagging-extension.rst.
>
> +if ARM64_MTE
> +config ARM64_MTE_TAG_STORAGE
> +       bool "Dynamic MTE tag storage management"
> +       help
> +         Adds support for dynamic management of the memory used by the hardware
> +         for storing MTE tags. This memory, unlike normal memory, cannot be
> +         tagged. When it is used to store tags for another memory location it
> +         cannot be used for any type of allocation.
> +
> +         If unsure, say N
> +endif # ARM64_MTE
> +
>  endmenu # "ARMv8.5 architectural features"
>
>  menu "ARMv8.7 architectural features"
> diff --git a/arch/arm64/include/asm/mte_tag_storage.h b/arch/arm64/include/asm/mte_tag_storage.h
> new file mode 100644
> index 000000000000..8f86c4f9a7c3
> --- /dev/null
> +++ b/arch/arm64/include/asm/mte_tag_storage.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2023 ARM Ltd.
> + */
> +#ifndef __ASM_MTE_TAG_STORAGE_H
> +#define __ASM_MTE_TAG_STORAGE_H
> +
> +#ifdef CONFIG_ARM64_MTE_TAG_STORAGE
> +void mte_tag_storage_init(void);
> +#else
> +static inline void mte_tag_storage_init(void)
> +{
> +}
> +#endif /* CONFIG_ARM64_MTE_TAG_STORAGE */
> +#endif /* __ASM_MTE_TAG_STORAGE_H  */
> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> index d95b3d6b471a..5f031bf9f8f1 100644
> --- a/arch/arm64/kernel/Makefile
> +++ b/arch/arm64/kernel/Makefile
> @@ -70,6 +70,7 @@ obj-$(CONFIG_CRASH_CORE)              += crash_core.o
>  obj-$(CONFIG_ARM_SDE_INTERFACE)                += sdei.o
>  obj-$(CONFIG_ARM64_PTR_AUTH)           += pointer_auth.o
>  obj-$(CONFIG_ARM64_MTE)                        += mte.o
> +obj-$(CONFIG_ARM64_MTE_TAG_STORAGE)    += mte_tag_storage.o
>  obj-y                                  += vdso-wrap.o
>  obj-$(CONFIG_COMPAT_VDSO)              += vdso32-wrap.o
>  obj-$(CONFIG_UNWIND_PATCH_PAC_INTO_SCS)        += patch-scs.o
> diff --git a/arch/arm64/kernel/mte_tag_storage.c b/arch/arm64/kernel/mte_tag_storage.c
> new file mode 100644
> index 000000000000..fa6267ef8392
> --- /dev/null
> +++ b/arch/arm64/kernel/mte_tag_storage.c
> @@ -0,0 +1,256 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Support for dynamic tag storage.
> + *
> + * Copyright (C) 2023 ARM Ltd.
> + */
> +
> +#include <linux/memblock.h>
> +#include <linux/mm.h>
> +#include <linux/of_device.h>

You probably don't need this header. If you depend on what it
implicitly includes, then that will now break in linux-next.

> +#include <linux/of_fdt.h>
> +#include <linux/range.h>
> +#include <linux/string.h>
> +#include <linux/xarray.h>
> +
> +#include <asm/mte_tag_storage.h>
> +
> +struct tag_region {
> +       struct range mem_range; /* Memory associated with the tag storage, in PFNs. */
> +       struct range tag_range; /* Tag storage memory, in PFNs. */
> +       u32 block_size;         /* Tag block size, in pages. */
> +};
> +
> +#define MAX_TAG_REGIONS        32
> +
> +static struct tag_region tag_regions[MAX_TAG_REGIONS];
> +static int num_tag_regions;
> +
> +static int __init tag_storage_of_flat_get_range(unsigned long node, const __be32 *reg,
> +                                               int reg_len, struct range *range)
> +{
> +       int addr_cells = dt_root_addr_cells;
> +       int size_cells = dt_root_size_cells;
> +       u64 size;
> +
> +       if (reg_len / 4 > addr_cells + size_cells)
> +               return -EINVAL;
> +
> +       range->start = PHYS_PFN(of_read_number(reg, addr_cells));
> +       size = PHYS_PFN(of_read_number(reg + addr_cells, size_cells));
> +       if (size == 0) {
> +               pr_err("Invalid node");
> +               return -EINVAL;
> +       }
> +       range->end = range->start + size - 1;

We have a function to read (and translate which you forgot) addresses.
Add what's missing rather than open code your own.

> +
> +       return 0;
> +}
> +
> +static int __init tag_storage_of_flat_get_tag_range(unsigned long node,
> +                                                   struct range *tag_range)
> +{
> +       const __be32 *reg;
> +       int reg_len;
> +
> +       reg = of_get_flat_dt_prop(node, "reg", &reg_len);
> +       if (reg == NULL) {
> +               pr_err("Invalid metadata node");
> +               return -EINVAL;
> +       }
> +
> +       return tag_storage_of_flat_get_range(node, reg, reg_len, tag_range);
> +}
> +
> +static int __init tag_storage_of_flat_get_memory_range(unsigned long node, struct range *mem)
> +{
> +       const __be32 *reg;
> +       int reg_len;
> +
> +       reg = of_get_flat_dt_prop(node, "linux,usable-memory", &reg_len);
> +       if (reg == NULL)
> +               reg = of_get_flat_dt_prop(node, "reg", &reg_len);
> +
> +       if (reg == NULL) {
> +               pr_err("Invalid memory node");
> +               return -EINVAL;
> +       }
> +
> +       return tag_storage_of_flat_get_range(node, reg, reg_len, mem);
> +}
> +
> +struct find_memory_node_arg {
> +       unsigned long node;
> +       u32 phandle;
> +};
> +
> +static int __init fdt_find_memory_node(unsigned long node, const char *uname,
> +                                      int depth, void *data)
> +{
> +       const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
> +       struct find_memory_node_arg *arg = data;
> +
> +       if (depth != 1 || !type || strcmp(type, "memory") != 0)
> +               return 0;
> +
> +       if (of_get_flat_dt_phandle(node) == arg->phandle) {
> +               arg->node = node;
> +               return 1;
> +       }
> +
> +       return 0;
> +}
> +
> +static int __init tag_storage_get_memory_node(unsigned long tag_node, unsigned long *mem_node)
> +{
> +       struct find_memory_node_arg arg = { 0 };
> +       const __be32 *memory_prop;
> +       u32 mem_phandle;
> +       int ret, reg_len;
> +
> +       memory_prop = of_get_flat_dt_prop(tag_node, "memory", &reg_len);
> +       if (!memory_prop) {
> +               pr_err("Missing 'memory' property in the tag storage node");
> +               return -EINVAL;
> +       }
> +
> +       mem_phandle = be32_to_cpup(memory_prop);
> +       arg.phandle = mem_phandle;
> +
> +       ret = of_scan_flat_dt(fdt_find_memory_node, &arg);

Do not use of_scan_flat_dt. It is a relic predating libfdt which can
get a node by phandle directly.

> +       if (ret != 1) {
> +               pr_err("Associated memory node not found");
> +               return -EINVAL;
> +       }
> +
> +       *mem_node = arg.node;
> +
> +       return 0;
> +}
> +
> +static int __init tag_storage_of_flat_read_u32(unsigned long node, const char *propname,
> +                                              u32 *retval)

If you are going to make a generic function, make it for everyone.

> +{
> +       const __be32 *reg;
> +
> +       reg = of_get_flat_dt_prop(node, propname, NULL);
> +       if (!reg)
> +               return -EINVAL;
> +
> +       *retval = be32_to_cpup(reg);
> +       return 0;
> +}
> +
> +static u32 __init get_block_size_pages(u32 block_size_bytes)
> +{
> +       u32 a = PAGE_SIZE;
> +       u32 b = block_size_bytes;
> +       u32 r;
> +
> +       /* Find greatest common divisor using the Euclidian algorithm. */
> +       do {
> +               r = a % b;
> +               a = b;
> +               b = r;
> +       } while (b != 0);
> +
> +       return PHYS_PFN(PAGE_SIZE * block_size_bytes / a);
> +}
> +
> +static int __init fdt_init_tag_storage(unsigned long node, const char *uname,
> +                                      int depth, void *data)
> +{
> +       struct tag_region *region;
> +       unsigned long mem_node;
> +       struct range *mem_range;
> +       struct range *tag_range;
> +       u32 block_size_bytes;
> +       u32 nid = 0;
> +       int ret;
> +
> +       if (depth != 1 || !strstr(uname, "tag-storage"))
> +               return 0;
> +
> +       if (!of_flat_dt_is_compatible(node, "arm,mte-tag-storage"))
> +               return 0;
> +
> +       if (num_tag_regions == MAX_TAG_REGIONS) {
> +               pr_err("Maximum number of tag storage regions exceeded");
> +               return -EINVAL;
> +       }
> +
> +       region = &tag_regions[num_tag_regions];
> +       mem_range = &region->mem_range;
> +       tag_range = &region->tag_range;
> +
> +       ret = tag_storage_of_flat_get_tag_range(node, tag_range);
> +       if (ret) {
> +               pr_err("Invalid tag storage node");
> +               return ret;
> +       }
> +
> +       ret = tag_storage_get_memory_node(node, &mem_node);
> +       if (ret)
> +               return ret;
> +
> +       ret = tag_storage_of_flat_get_memory_range(mem_node, mem_range);
> +       if (ret) {
> +               pr_err("Invalid address for associated data memory node");
> +               return ret;
> +       }
> +
> +       /* The tag region must exactly match the corresponding memory. */
> +       if (range_len(tag_range) * 32 != range_len(mem_range)) {
> +               pr_err("Tag storage region 0x%llx-0x%llx does not cover the memory region 0x%llx-0x%llx",
> +                      PFN_PHYS(tag_range->start), PFN_PHYS(tag_range->end),
> +                      PFN_PHYS(mem_range->start), PFN_PHYS(mem_range->end));
> +               return -EINVAL;
> +       }
> +
> +       ret = tag_storage_of_flat_read_u32(node, "block-size", &block_size_bytes);
> +       if (ret || block_size_bytes == 0) {
> +               pr_err("Invalid or missing 'block-size' property");
> +               return -EINVAL;
> +       }
> +       region->block_size = get_block_size_pages(block_size_bytes);
> +       if (range_len(tag_range) % region->block_size != 0) {
> +               pr_err("Tag storage region size 0x%llx is not a multiple of block size %u",
> +                      PFN_PHYS(range_len(tag_range)), region->block_size);
> +               return -EINVAL;
> +       }
> +
> +       ret = tag_storage_of_flat_read_u32(mem_node, "numa-node-id", &nid);

I was going to say we already have a way to associate memory nodes
other nodes using "numa-node-id", so the "memory" phandle property is
somewhat redundant. Maybe the tag node should have a numa-node-id.
With that, it looks like you don't even need to access the /memory
node. Avoiding that would be good for 2 reasons. It avoids parsing
memory nodes twice and it's not the kernel's job to validate the DT.
Really, if you want memory info, you should use memblock to get it
because all the special cases of memory layout are handled. For
example you can have memory nodes with multiple 'reg' entries or
multiple memory nodes or both, and then some of those could be
contiguous.

Rob
Alexandru Elisei Dec. 12, 2023, 4:38 p.m. UTC | #7
Hi Rob,

Thank you so much for the feedback, I'm not very familiar with device tree,
and any comments are very useful.

On Mon, Dec 11, 2023 at 11:29:40AM -0600, Rob Herring wrote:
> On Sun, Nov 19, 2023 at 10:59 AM Alexandru Elisei
> <alexandru.elisei@arm.com> wrote:
> >
> > Allow the kernel to get the size and location of the MTE tag storage
> > regions from the DTB. This memory is marked as reserved for now.
> >
> > The DTB node for the tag storage region is defined as:
> >
> >         tags0: tag-storage@8f8000000 {
> >                 compatible = "arm,mte-tag-storage";
> >                 reg = <0x08 0xf8000000 0x00 0x4000000>;
> >                 block-size = <0x1000>;
> >                 memory = <&memory0>;    // Associated tagged memory node
> >         };
> 
> I skimmed thru the discussion some. If this memory range is within
> main RAM, then it definitely belongs in /reserved-memory.

Ok, will do that.

If you don't mind, why do you say that it definitely belongs in
reserved-memory? I'm not trying to argue otherwise, I'm curious about the
motivation.

Tag storage is not DMA and can live anywhere in memory. In
arm64_memblock_init(), the kernel first removes the memory that it cannot
address from memblock. For example, because it has been compiled with
CONFIG_ARM64_VA_BITS_39=y. And then calls
early_init_fdt_scan_reserved_mem().

What happens if reserved memory is above what the kernel can address?

From my testing, when the kernel is compiled with 39 bit VA, if I use
reserved memory to discover tag storage the lives above the virtua address
limit and then I try to use CMA to manage the tag storage memory, I get a
kernel panic:

[    0.000000] Reserved memory: created CMA memory pool at 0x0000010000000000, size 64 MiB
[    0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
[    0.000000] OF: reserved mem: 0x0000010000000000..0x0000010003ffffff (65536 KiB) map reusable linux,cma
[..]
[    0.806945] Unable to handle kernel paging request at virtual address 00000001fe000000
[    0.807277] Mem abort info:
[    0.807277]   ESR = 0x0000000096000005
[    0.807693]   EC = 0x25: DABT (current EL), IL = 32 bits
[    0.808110]   SET = 0, FnV = 0
[    0.808443]   EA = 0, S1PTW = 0
[    0.808526]   FSC = 0x05: level 1 translation fault
[    0.808943] Data abort info:
[    0.808943]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[    0.809360]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[    0.809776]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[    0.810221] [00000001fe000000] user address but active_mm is swapper
[..]
[    0.820887] Call trace:
[    0.821027]  cma_init_reserved_areas+0xc4/0x378

> 
> You need a binding for this too.

By binding you mean having an yaml file in dt-schem [1] describing the tag
storage node, right?

[1] https://github.com/devicetree-org/dt-schema

> 
> > The tag storage region represents the largest contiguous memory region that
> > holds all the tags for the associated contiguous memory region which can be
> > tagged. For example, for a 32GB contiguous tagged memory the corresponding
> > tag storage region is 1GB of contiguous memory, not two adjacent 512M of
> > tag storage memory.
> >
> > "block-size" represents the minimum multiple of 4K of tag storage where all
> > the tags stored in the block correspond to a contiguous memory region. This
> > is needed for platforms where the memory controller interleaves tag writes
> > to memory. For example, if the memory controller interleaves tag writes for
> > 256KB of contiguous memory across 8K of tag storage (2-way interleave),
> > then the correct value for "block-size" is 0x2000. This value is a hardware
> > property, independent of the selected kernel page size.
> >
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >  arch/arm64/Kconfig                       |  12 ++
> >  arch/arm64/include/asm/mte_tag_storage.h |  15 ++
> >  arch/arm64/kernel/Makefile               |   1 +
> >  arch/arm64/kernel/mte_tag_storage.c      | 256 +++++++++++++++++++++++
> >  arch/arm64/kernel/setup.c                |   7 +
> >  5 files changed, 291 insertions(+)
> >  create mode 100644 arch/arm64/include/asm/mte_tag_storage.h
> >  create mode 100644 arch/arm64/kernel/mte_tag_storage.c
> >
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index 7b071a00425d..fe8276fdc7a8 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -2062,6 +2062,18 @@ config ARM64_MTE
> >
> >           Documentation/arch/arm64/memory-tagging-extension.rst.
> >
> > +if ARM64_MTE
> > +config ARM64_MTE_TAG_STORAGE
> > +       bool "Dynamic MTE tag storage management"
> > +       help
> > +         Adds support for dynamic management of the memory used by the hardware
> > +         for storing MTE tags. This memory, unlike normal memory, cannot be
> > +         tagged. When it is used to store tags for another memory location it
> > +         cannot be used for any type of allocation.
> > +
> > +         If unsure, say N
> > +endif # ARM64_MTE
> > +
> >  endmenu # "ARMv8.5 architectural features"
> >
> >  menu "ARMv8.7 architectural features"
> > diff --git a/arch/arm64/include/asm/mte_tag_storage.h b/arch/arm64/include/asm/mte_tag_storage.h
> > new file mode 100644
> > index 000000000000..8f86c4f9a7c3
> > --- /dev/null
> > +++ b/arch/arm64/include/asm/mte_tag_storage.h
> > @@ -0,0 +1,15 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright (C) 2023 ARM Ltd.
> > + */
> > +#ifndef __ASM_MTE_TAG_STORAGE_H
> > +#define __ASM_MTE_TAG_STORAGE_H
> > +
> > +#ifdef CONFIG_ARM64_MTE_TAG_STORAGE
> > +void mte_tag_storage_init(void);
> > +#else
> > +static inline void mte_tag_storage_init(void)
> > +{
> > +}
> > +#endif /* CONFIG_ARM64_MTE_TAG_STORAGE */
> > +#endif /* __ASM_MTE_TAG_STORAGE_H  */
> > diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> > index d95b3d6b471a..5f031bf9f8f1 100644
> > --- a/arch/arm64/kernel/Makefile
> > +++ b/arch/arm64/kernel/Makefile
> > @@ -70,6 +70,7 @@ obj-$(CONFIG_CRASH_CORE)              += crash_core.o
> >  obj-$(CONFIG_ARM_SDE_INTERFACE)                += sdei.o
> >  obj-$(CONFIG_ARM64_PTR_AUTH)           += pointer_auth.o
> >  obj-$(CONFIG_ARM64_MTE)                        += mte.o
> > +obj-$(CONFIG_ARM64_MTE_TAG_STORAGE)    += mte_tag_storage.o
> >  obj-y                                  += vdso-wrap.o
> >  obj-$(CONFIG_COMPAT_VDSO)              += vdso32-wrap.o
> >  obj-$(CONFIG_UNWIND_PATCH_PAC_INTO_SCS)        += patch-scs.o
> > diff --git a/arch/arm64/kernel/mte_tag_storage.c b/arch/arm64/kernel/mte_tag_storage.c
> > new file mode 100644
> > index 000000000000..fa6267ef8392
> > --- /dev/null
> > +++ b/arch/arm64/kernel/mte_tag_storage.c
> > @@ -0,0 +1,256 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Support for dynamic tag storage.
> > + *
> > + * Copyright (C) 2023 ARM Ltd.
> > + */
> > +
> > +#include <linux/memblock.h>
> > +#include <linux/mm.h>
> > +#include <linux/of_device.h>
> 
> You probably don't need this header. If you depend on what it
> implicitly includes, then that will now break in linux-next.

I'll have a look if I can remove it. Might be an artifact from an earlier
version of the patches.

> 
> > +#include <linux/of_fdt.h>
> > +#include <linux/range.h>
> > +#include <linux/string.h>
> > +#include <linux/xarray.h>
> > +
> > +#include <asm/mte_tag_storage.h>
> > +
> > +struct tag_region {
> > +       struct range mem_range; /* Memory associated with the tag storage, in PFNs. */
> > +       struct range tag_range; /* Tag storage memory, in PFNs. */
> > +       u32 block_size;         /* Tag block size, in pages. */
> > +};
> > +
> > +#define MAX_TAG_REGIONS        32
> > +
> > +static struct tag_region tag_regions[MAX_TAG_REGIONS];
> > +static int num_tag_regions;
> > +
> > +static int __init tag_storage_of_flat_get_range(unsigned long node, const __be32 *reg,
> > +                                               int reg_len, struct range *range)
> > +{
> > +       int addr_cells = dt_root_addr_cells;
> > +       int size_cells = dt_root_size_cells;
> > +       u64 size;
> > +
> > +       if (reg_len / 4 > addr_cells + size_cells)
> > +               return -EINVAL;
> > +
> > +       range->start = PHYS_PFN(of_read_number(reg, addr_cells));
> > +       size = PHYS_PFN(of_read_number(reg + addr_cells, size_cells));
> > +       if (size == 0) {
> > +               pr_err("Invalid node");
> > +               return -EINVAL;
> > +       }
> > +       range->end = range->start + size - 1;
> 
> We have a function to read (and translate which you forgot) addresses.
> Add what's missing rather than open code your own.

I must have missed that there's already a function to read addresses. Would
you mind pointing me in the right direction?

> 
> > +
> > +       return 0;
> > +}
> > +
> > +static int __init tag_storage_of_flat_get_tag_range(unsigned long node,
> > +                                                   struct range *tag_range)
> > +{
> > +       const __be32 *reg;
> > +       int reg_len;
> > +
> > +       reg = of_get_flat_dt_prop(node, "reg", &reg_len);
> > +       if (reg == NULL) {
> > +               pr_err("Invalid metadata node");
> > +               return -EINVAL;
> > +       }
> > +
> > +       return tag_storage_of_flat_get_range(node, reg, reg_len, tag_range);
> > +}
> > +
> > +static int __init tag_storage_of_flat_get_memory_range(unsigned long node, struct range *mem)
> > +{
> > +       const __be32 *reg;
> > +       int reg_len;
> > +
> > +       reg = of_get_flat_dt_prop(node, "linux,usable-memory", &reg_len);
> > +       if (reg == NULL)
> > +               reg = of_get_flat_dt_prop(node, "reg", &reg_len);
> > +
> > +       if (reg == NULL) {
> > +               pr_err("Invalid memory node");
> > +               return -EINVAL;
> > +       }
> > +
> > +       return tag_storage_of_flat_get_range(node, reg, reg_len, mem);
> > +}
> > +
> > +struct find_memory_node_arg {
> > +       unsigned long node;
> > +       u32 phandle;
> > +};
> > +
> > +static int __init fdt_find_memory_node(unsigned long node, const char *uname,
> > +                                      int depth, void *data)
> > +{
> > +       const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
> > +       struct find_memory_node_arg *arg = data;
> > +
> > +       if (depth != 1 || !type || strcmp(type, "memory") != 0)
> > +               return 0;
> > +
> > +       if (of_get_flat_dt_phandle(node) == arg->phandle) {
> > +               arg->node = node;
> > +               return 1;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +static int __init tag_storage_get_memory_node(unsigned long tag_node, unsigned long *mem_node)
> > +{
> > +       struct find_memory_node_arg arg = { 0 };
> > +       const __be32 *memory_prop;
> > +       u32 mem_phandle;
> > +       int ret, reg_len;
> > +
> > +       memory_prop = of_get_flat_dt_prop(tag_node, "memory", &reg_len);
> > +       if (!memory_prop) {
> > +               pr_err("Missing 'memory' property in the tag storage node");
> > +               return -EINVAL;
> > +       }
> > +
> > +       mem_phandle = be32_to_cpup(memory_prop);
> > +       arg.phandle = mem_phandle;
> > +
> > +       ret = of_scan_flat_dt(fdt_find_memory_node, &arg);
> 
> Do not use of_scan_flat_dt. It is a relic predating libfdt which can
> get a node by phandle directly.

I used that because that's what drivers/of/fdt.c uses. With reserved memory
I shouldn't need it, because struct reserved_mem already includes a
phandle.

> 
> > +       if (ret != 1) {
> > +               pr_err("Associated memory node not found");
> > +               return -EINVAL;
> > +       }
> > +
> > +       *mem_node = arg.node;
> > +
> > +       return 0;
> > +}
> > +
> > +static int __init tag_storage_of_flat_read_u32(unsigned long node, const char *propname,
> > +                                              u32 *retval)
> 
> If you are going to make a generic function, make it for everyone.

Sure. If I still need it, should I put the function in
include/linux/of_fdt.h?

> 
> > +{
> > +       const __be32 *reg;
> > +
> > +       reg = of_get_flat_dt_prop(node, propname, NULL);
> > +       if (!reg)
> > +               return -EINVAL;
> > +
> > +       *retval = be32_to_cpup(reg);
> > +       return 0;
> > +}
> > +
> > +static u32 __init get_block_size_pages(u32 block_size_bytes)
> > +{
> > +       u32 a = PAGE_SIZE;
> > +       u32 b = block_size_bytes;
> > +       u32 r;
> > +
> > +       /* Find greatest common divisor using the Euclidian algorithm. */
> > +       do {
> > +               r = a % b;
> > +               a = b;
> > +               b = r;
> > +       } while (b != 0);
> > +
> > +       return PHYS_PFN(PAGE_SIZE * block_size_bytes / a);
> > +}
> > +
> > +static int __init fdt_init_tag_storage(unsigned long node, const char *uname,
> > +                                      int depth, void *data)
> > +{
> > +       struct tag_region *region;
> > +       unsigned long mem_node;
> > +       struct range *mem_range;
> > +       struct range *tag_range;
> > +       u32 block_size_bytes;
> > +       u32 nid = 0;
> > +       int ret;
> > +
> > +       if (depth != 1 || !strstr(uname, "tag-storage"))
> > +               return 0;
> > +
> > +       if (!of_flat_dt_is_compatible(node, "arm,mte-tag-storage"))
> > +               return 0;
> > +
> > +       if (num_tag_regions == MAX_TAG_REGIONS) {
> > +               pr_err("Maximum number of tag storage regions exceeded");
> > +               return -EINVAL;
> > +       }
> > +
> > +       region = &tag_regions[num_tag_regions];
> > +       mem_range = &region->mem_range;
> > +       tag_range = &region->tag_range;
> > +
> > +       ret = tag_storage_of_flat_get_tag_range(node, tag_range);
> > +       if (ret) {
> > +               pr_err("Invalid tag storage node");
> > +               return ret;
> > +       }
> > +
> > +       ret = tag_storage_get_memory_node(node, &mem_node);
> > +       if (ret)
> > +               return ret;
> > +
> > +       ret = tag_storage_of_flat_get_memory_range(mem_node, mem_range);
> > +       if (ret) {
> > +               pr_err("Invalid address for associated data memory node");
> > +               return ret;
> > +       }
> > +
> > +       /* The tag region must exactly match the corresponding memory. */
> > +       if (range_len(tag_range) * 32 != range_len(mem_range)) {
> > +               pr_err("Tag storage region 0x%llx-0x%llx does not cover the memory region 0x%llx-0x%llx",
> > +                      PFN_PHYS(tag_range->start), PFN_PHYS(tag_range->end),
> > +                      PFN_PHYS(mem_range->start), PFN_PHYS(mem_range->end));
> > +               return -EINVAL;
> > +       }
> > +
> > +       ret = tag_storage_of_flat_read_u32(node, "block-size", &block_size_bytes);
> > +       if (ret || block_size_bytes == 0) {
> > +               pr_err("Invalid or missing 'block-size' property");
> > +               return -EINVAL;
> > +       }
> > +       region->block_size = get_block_size_pages(block_size_bytes);
> > +       if (range_len(tag_range) % region->block_size != 0) {
> > +               pr_err("Tag storage region size 0x%llx is not a multiple of block size %u",
> > +                      PFN_PHYS(range_len(tag_range)), region->block_size);
> > +               return -EINVAL;
> > +       }
> > +
> > +       ret = tag_storage_of_flat_read_u32(mem_node, "numa-node-id", &nid);
> 
> I was going to say we already have a way to associate memory nodes
> other nodes using "numa-node-id", so the "memory" phandle property is
> somewhat redundant. Maybe the tag node should have a numa-node-id.
> With that, it looks like you don't even need to access the /memory
> node. Avoiding that would be good for 2 reasons. It avoids parsing
> memory nodes twice and it's not the kernel's job to validate the DT.
> Really, if you want memory info, you should use memblock to get it
> because all the special cases of memory layout are handled. For
> example you can have memory nodes with multiple 'reg' entries or
> multiple memory nodes or both, and then some of those could be
> contiguous.

I need to have a memory node associated with the tag storage node because
there is a static relationship between a page from "normal" memory and its
associated tag storage. If the code doesn't know that the memory region
A..B has the corresponding tag storage in the region X..Y, then it doesn't
know which tag storage to reserve when a page is allocated as tagged.

In the example above, assuming that page P is allocated as tagged, the
corresponding tag storage page that needs to be reserved is:

tag_storage_pfn = (page_to_pfn(P) - PHYS_PFN(A)) / 32* + PHYS_PFN(X)

numa-node-id is not enough for this, because as far I know you can have
multiple memory regions withing the same numa node.

*32 tagged pages use one tag storage page to store the tags.

Thanks,
Alex
Rob Herring (Arm) Dec. 12, 2023, 6:44 p.m. UTC | #8
On Tue, Dec 12, 2023 at 10:38 AM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> Hi Rob,
>
> Thank you so much for the feedback, I'm not very familiar with device tree,
> and any comments are very useful.
>
> On Mon, Dec 11, 2023 at 11:29:40AM -0600, Rob Herring wrote:
> > On Sun, Nov 19, 2023 at 10:59 AM Alexandru Elisei
> > <alexandru.elisei@arm.com> wrote:
> > >
> > > Allow the kernel to get the size and location of the MTE tag storage
> > > regions from the DTB. This memory is marked as reserved for now.
> > >
> > > The DTB node for the tag storage region is defined as:
> > >
> > >         tags0: tag-storage@8f8000000 {
> > >                 compatible = "arm,mte-tag-storage";
> > >                 reg = <0x08 0xf8000000 0x00 0x4000000>;
> > >                 block-size = <0x1000>;
> > >                 memory = <&memory0>;    // Associated tagged memory node
> > >         };
> >
> > I skimmed thru the discussion some. If this memory range is within
> > main RAM, then it definitely belongs in /reserved-memory.
>
> Ok, will do that.
>
> If you don't mind, why do you say that it definitely belongs in
> reserved-memory? I'm not trying to argue otherwise, I'm curious about the
> motivation.

Simply so that /memory nodes describe all possible memory and
/reserved-memory is just adding restrictions. It's also because
/reserved-memory is what gets handled early, and we don't need
multiple things to handle early.

> Tag storage is not DMA and can live anywhere in memory.

Then why put it in DT at all? The only reason CMA is there is to set
the size. It's not even clear to me we need CMA in DT either. The
reasoning long ago was the kernel didn't do a good job of moving and
reclaiming contiguous space, but that's supposed to be better now (and
most h/w figured out they need IOMMUs).

But for tag storage you know the size as it is a function of the
memory size, right? After all, you are validating the size is correct.
I guess there is still the aspect of whether you want enable MTE or
not which could be done in a variety of ways.

> In
> arm64_memblock_init(), the kernel first removes the memory that it cannot
> address from memblock. For example, because it has been compiled with
> CONFIG_ARM64_VA_BITS_39=y. And then calls
> early_init_fdt_scan_reserved_mem().
>
> What happens if reserved memory is above what the kernel can address?

I would hope the kernel handles it. That's the kernel's problem unless
there's some h/w limitation to access some region. The DT can't have
things dependent on the kernel config.

> From my testing, when the kernel is compiled with 39 bit VA, if I use
> reserved memory to discover tag storage the lives above the virtua address
> limit and then I try to use CMA to manage the tag storage memory, I get a
> kernel panic:

Looks like we should handle that better...

>> [    0.000000] Reserved memory: created CMA memory pool at 0x0000010000000000, size 64 MiB
> [    0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
> [    0.000000] OF: reserved mem: 0x0000010000000000..0x0000010003ffffff (65536 KiB) map reusable linux,cma
> [..]
> [    0.806945] Unable to handle kernel paging request at virtual address 00000001fe000000
> [    0.807277] Mem abort info:
> [    0.807277]   ESR = 0x0000000096000005
> [    0.807693]   EC = 0x25: DABT (current EL), IL = 32 bits
> [    0.808110]   SET = 0, FnV = 0
> [    0.808443]   EA = 0, S1PTW = 0
> [    0.808526]   FSC = 0x05: level 1 translation fault
> [    0.808943] Data abort info:
> [    0.808943]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> [    0.809360]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [    0.809776]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [    0.810221] [00000001fe000000] user address but active_mm is swapper
> [..]
> [    0.820887] Call trace:
> [    0.821027]  cma_init_reserved_areas+0xc4/0x378
>
> >
> > You need a binding for this too.
>
> By binding you mean having an yaml file in dt-schem [1] describing the tag
> storage node, right?

Yes, but in the kernel tree is fine.

[...]

> > > +static int __init tag_storage_of_flat_get_range(unsigned long node, const __be32 *reg,
> > > +                                               int reg_len, struct range *range)
> > > +{
> > > +       int addr_cells = dt_root_addr_cells;
> > > +       int size_cells = dt_root_size_cells;
> > > +       u64 size;
> > > +
> > > +       if (reg_len / 4 > addr_cells + size_cells)
> > > +               return -EINVAL;
> > > +
> > > +       range->start = PHYS_PFN(of_read_number(reg, addr_cells));
> > > +       size = PHYS_PFN(of_read_number(reg + addr_cells, size_cells));
> > > +       if (size == 0) {
> > > +               pr_err("Invalid node");
> > > +               return -EINVAL;
> > > +       }
> > > +       range->end = range->start + size - 1;
> >
> > We have a function to read (and translate which you forgot) addresses.
> > Add what's missing rather than open code your own.
>
> I must have missed that there's already a function to read addresses. Would
> you mind pointing me in the right direction?

drivers/of/fdt_address.c

Though it doesn't provide getting the size, so that will have to be added.


> > > +
> > > +       return 0;
> > > +}
> > > +
> > > +static int __init tag_storage_of_flat_get_tag_range(unsigned long node,
> > > +                                                   struct range *tag_range)
> > > +{
> > > +       const __be32 *reg;
> > > +       int reg_len;
> > > +
> > > +       reg = of_get_flat_dt_prop(node, "reg", &reg_len);
> > > +       if (reg == NULL) {
> > > +               pr_err("Invalid metadata node");
> > > +               return -EINVAL;
> > > +       }
> > > +
> > > +       return tag_storage_of_flat_get_range(node, reg, reg_len, tag_range);
> > > +}
> > > +
> > > +static int __init tag_storage_of_flat_get_memory_range(unsigned long node, struct range *mem)
> > > +{
> > > +       const __be32 *reg;
> > > +       int reg_len;
> > > +
> > > +       reg = of_get_flat_dt_prop(node, "linux,usable-memory", &reg_len);
> > > +       if (reg == NULL)
> > > +               reg = of_get_flat_dt_prop(node, "reg", &reg_len);
> > > +
> > > +       if (reg == NULL) {
> > > +               pr_err("Invalid memory node");
> > > +               return -EINVAL;
> > > +       }
> > > +
> > > +       return tag_storage_of_flat_get_range(node, reg, reg_len, mem);
> > > +}
> > > +
> > > +struct find_memory_node_arg {
> > > +       unsigned long node;
> > > +       u32 phandle;
> > > +};
> > > +
> > > +static int __init fdt_find_memory_node(unsigned long node, const char *uname,
> > > +                                      int depth, void *data)
> > > +{
> > > +       const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
> > > +       struct find_memory_node_arg *arg = data;
> > > +
> > > +       if (depth != 1 || !type || strcmp(type, "memory") != 0)
> > > +               return 0;
> > > +
> > > +       if (of_get_flat_dt_phandle(node) == arg->phandle) {
> > > +               arg->node = node;
> > > +               return 1;
> > > +       }
> > > +
> > > +       return 0;
> > > +}
> > > +
> > > +static int __init tag_storage_get_memory_node(unsigned long tag_node, unsigned long *mem_node)
> > > +{
> > > +       struct find_memory_node_arg arg = { 0 };
> > > +       const __be32 *memory_prop;
> > > +       u32 mem_phandle;
> > > +       int ret, reg_len;
> > > +
> > > +       memory_prop = of_get_flat_dt_prop(tag_node, "memory", &reg_len);
> > > +       if (!memory_prop) {
> > > +               pr_err("Missing 'memory' property in the tag storage node");
> > > +               return -EINVAL;
> > > +       }
> > > +
> > > +       mem_phandle = be32_to_cpup(memory_prop);
> > > +       arg.phandle = mem_phandle;
> > > +
> > > +       ret = of_scan_flat_dt(fdt_find_memory_node, &arg);
> >
> > Do not use of_scan_flat_dt. It is a relic predating libfdt which can
> > get a node by phandle directly.
>
> I used that because that's what drivers/of/fdt.c uses. With reserved memory
> I shouldn't need it, because struct reserved_mem already includes a
> phandle.

Check again. Only some arch/ code (mostly powerpc) uses it. I've
killed off most of it.


> > > +       if (ret != 1) {
> > > +               pr_err("Associated memory node not found");
> > > +               return -EINVAL;
> > > +       }
> > > +
> > > +       *mem_node = arg.node;
> > > +
> > > +       return 0;
> > > +}
> > > +
> > > +static int __init tag_storage_of_flat_read_u32(unsigned long node, const char *propname,
> > > +                                              u32 *retval)
> >
> > If you are going to make a generic function, make it for everyone.
>
> Sure. If I still need it, should I put the function in
> include/linux/of_fdt.h?

Yes.

> > > +{
> > > +       const __be32 *reg;
> > > +
> > > +       reg = of_get_flat_dt_prop(node, propname, NULL);
> > > +       if (!reg)
> > > +               return -EINVAL;
> > > +
> > > +       *retval = be32_to_cpup(reg);
> > > +       return 0;
> > > +}
> > > +
> > > +static u32 __init get_block_size_pages(u32 block_size_bytes)
> > > +{
> > > +       u32 a = PAGE_SIZE;
> > > +       u32 b = block_size_bytes;
> > > +       u32 r;
> > > +
> > > +       /* Find greatest common divisor using the Euclidian algorithm. */
> > > +       do {
> > > +               r = a % b;
> > > +               a = b;
> > > +               b = r;
> > > +       } while (b != 0);
> > > +
> > > +       return PHYS_PFN(PAGE_SIZE * block_size_bytes / a);
> > > +}
> > > +
> > > +static int __init fdt_init_tag_storage(unsigned long node, const char *uname,
> > > +                                      int depth, void *data)
> > > +{
> > > +       struct tag_region *region;
> > > +       unsigned long mem_node;
> > > +       struct range *mem_range;
> > > +       struct range *tag_range;
> > > +       u32 block_size_bytes;
> > > +       u32 nid = 0;
> > > +       int ret;
> > > +
> > > +       if (depth != 1 || !strstr(uname, "tag-storage"))
> > > +               return 0;
> > > +
> > > +       if (!of_flat_dt_is_compatible(node, "arm,mte-tag-storage"))
> > > +               return 0;
> > > +
> > > +       if (num_tag_regions == MAX_TAG_REGIONS) {
> > > +               pr_err("Maximum number of tag storage regions exceeded");
> > > +               return -EINVAL;
> > > +       }
> > > +
> > > +       region = &tag_regions[num_tag_regions];
> > > +       mem_range = &region->mem_range;
> > > +       tag_range = &region->tag_range;
> > > +
> > > +       ret = tag_storage_of_flat_get_tag_range(node, tag_range);
> > > +       if (ret) {
> > > +               pr_err("Invalid tag storage node");
> > > +               return ret;
> > > +       }
> > > +
> > > +       ret = tag_storage_get_memory_node(node, &mem_node);
> > > +       if (ret)
> > > +               return ret;
> > > +
> > > +       ret = tag_storage_of_flat_get_memory_range(mem_node, mem_range);
> > > +       if (ret) {
> > > +               pr_err("Invalid address for associated data memory node");
> > > +               return ret;
> > > +       }
> > > +
> > > +       /* The tag region must exactly match the corresponding memory. */
> > > +       if (range_len(tag_range) * 32 != range_len(mem_range)) {
> > > +               pr_err("Tag storage region 0x%llx-0x%llx does not cover the memory region 0x%llx-0x%llx",
> > > +                      PFN_PHYS(tag_range->start), PFN_PHYS(tag_range->end),
> > > +                      PFN_PHYS(mem_range->start), PFN_PHYS(mem_range->end));
> > > +               return -EINVAL;
> > > +       }
> > > +
> > > +       ret = tag_storage_of_flat_read_u32(node, "block-size", &block_size_bytes);
> > > +       if (ret || block_size_bytes == 0) {
> > > +               pr_err("Invalid or missing 'block-size' property");
> > > +               return -EINVAL;
> > > +       }
> > > +       region->block_size = get_block_size_pages(block_size_bytes);
> > > +       if (range_len(tag_range) % region->block_size != 0) {
> > > +               pr_err("Tag storage region size 0x%llx is not a multiple of block size %u",
> > > +                      PFN_PHYS(range_len(tag_range)), region->block_size);
> > > +               return -EINVAL;
> > > +       }
> > > +
> > > +       ret = tag_storage_of_flat_read_u32(mem_node, "numa-node-id", &nid);
> >
> > I was going to say we already have a way to associate memory nodes
> > other nodes using "numa-node-id", so the "memory" phandle property is
> > somewhat redundant. Maybe the tag node should have a numa-node-id.
> > With that, it looks like you don't even need to access the /memory
> > node. Avoiding that would be good for 2 reasons. It avoids parsing
> > memory nodes twice and it's not the kernel's job to validate the DT.
> > Really, if you want memory info, you should use memblock to get it
> > because all the special cases of memory layout are handled. For
> > example you can have memory nodes with multiple 'reg' entries or
> > multiple memory nodes or both, and then some of those could be
> > contiguous.
>
> I need to have a memory node associated with the tag storage node because
> there is a static relationship between a page from "normal" memory and its
> associated tag storage. If the code doesn't know that the memory region
> A..B has the corresponding tag storage in the region X..Y, then it doesn't
> know which tag storage to reserve when a page is allocated as tagged.
>
> In the example above, assuming that page P is allocated as tagged, the
> corresponding tag storage page that needs to be reserved is:
>
> tag_storage_pfn = (page_to_pfn(P) - PHYS_PFN(A)) / 32* + PHYS_PFN(X)
>
> numa-node-id is not enough for this, because as far I know you can have
> multiple memory regions withing the same numa node.

Okay.

Rob
Alexandru Elisei Dec. 13, 2023, 1:04 p.m. UTC | #9
Hi Rob,

On Tue, Dec 12, 2023 at 12:44:06PM -0600, Rob Herring wrote:
> On Tue, Dec 12, 2023 at 10:38 AM Alexandru Elisei
> <alexandru.elisei@arm.com> wrote:
> >
> > Hi Rob,
> >
> > Thank you so much for the feedback, I'm not very familiar with device tree,
> > and any comments are very useful.
> >
> > On Mon, Dec 11, 2023 at 11:29:40AM -0600, Rob Herring wrote:
> > > On Sun, Nov 19, 2023 at 10:59 AM Alexandru Elisei
> > > <alexandru.elisei@arm.com> wrote:
> > > >
> > > > Allow the kernel to get the size and location of the MTE tag storage
> > > > regions from the DTB. This memory is marked as reserved for now.
> > > >
> > > > The DTB node for the tag storage region is defined as:
> > > >
> > > >         tags0: tag-storage@8f8000000 {
> > > >                 compatible = "arm,mte-tag-storage";
> > > >                 reg = <0x08 0xf8000000 0x00 0x4000000>;
> > > >                 block-size = <0x1000>;
> > > >                 memory = <&memory0>;    // Associated tagged memory node
> > > >         };
> > >
> > > I skimmed thru the discussion some. If this memory range is within
> > > main RAM, then it definitely belongs in /reserved-memory.
> >
> > Ok, will do that.
> >
> > If you don't mind, why do you say that it definitely belongs in
> > reserved-memory? I'm not trying to argue otherwise, I'm curious about the
> > motivation.
> 
> Simply so that /memory nodes describe all possible memory and
> /reserved-memory is just adding restrictions. It's also because
> /reserved-memory is what gets handled early, and we don't need
> multiple things to handle early.
> 
> > Tag storage is not DMA and can live anywhere in memory.
> 
> Then why put it in DT at all? The only reason CMA is there is to set
> the size. It's not even clear to me we need CMA in DT either. The
> reasoning long ago was the kernel didn't do a good job of moving and
> reclaiming contiguous space, but that's supposed to be better now (and
> most h/w figured out they need IOMMUs).
> 
> But for tag storage you know the size as it is a function of the
> memory size, right? After all, you are validating the size is correct.
> I guess there is still the aspect of whether you want enable MTE or
> not which could be done in a variety of ways.

Oh, sorry, my bad, I should have been clearer about this. I don't want to
put it in the DT as a "linux,cma" node. But I want it to be managed by CMA.

> 
> > In
> > arm64_memblock_init(), the kernel first removes the memory that it cannot
> > address from memblock. For example, because it has been compiled with
> > CONFIG_ARM64_VA_BITS_39=y. And then calls
> > early_init_fdt_scan_reserved_mem().
> >
> > What happens if reserved memory is above what the kernel can address?
> 
> I would hope the kernel handles it. That's the kernel's problem unless
> there's some h/w limitation to access some region. The DT can't have
> things dependent on the kernel config.

I would hope so too, that's why I was surprised when I put reserved memory
at 1TB in a 39 bit VA kernel and got a panic.

> 
> > From my testing, when the kernel is compiled with 39 bit VA, if I use
> > reserved memory to discover tag storage the lives above the virtua address
> > limit and then I try to use CMA to manage the tag storage memory, I get a
> > kernel panic:
> 
> Looks like we should handle that better...

I guess we don't need to tackle that problem right now. I don't know of
many systems in the wild that have memory above the 1TB address.

> 
> >> [    0.000000] Reserved memory: created CMA memory pool at 0x0000010000000000, size 64 MiB
> > [    0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
> > [    0.000000] OF: reserved mem: 0x0000010000000000..0x0000010003ffffff (65536 KiB) map reusable linux,cma
> > [..]
> > [    0.806945] Unable to handle kernel paging request at virtual address 00000001fe000000
> > [    0.807277] Mem abort info:
> > [    0.807277]   ESR = 0x0000000096000005
> > [    0.807693]   EC = 0x25: DABT (current EL), IL = 32 bits
> > [    0.808110]   SET = 0, FnV = 0
> > [    0.808443]   EA = 0, S1PTW = 0
> > [    0.808526]   FSC = 0x05: level 1 translation fault
> > [    0.808943] Data abort info:
> > [    0.808943]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> > [    0.809360]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > [    0.809776]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> > [    0.810221] [00000001fe000000] user address but active_mm is swapper
> > [..]
> > [    0.820887] Call trace:
> > [    0.821027]  cma_init_reserved_areas+0xc4/0x378
> >
> > >
> > > You need a binding for this too.
> >
> > By binding you mean having an yaml file in dt-schem [1] describing the tag
> > storage node, right?
> 
> Yes, but in the kernel tree is fine.

Cool, thanks.

> 
> [...]
> 
> > > > +static int __init tag_storage_of_flat_get_range(unsigned long node, const __be32 *reg,
> > > > +                                               int reg_len, struct range *range)
> > > > +{
> > > > +       int addr_cells = dt_root_addr_cells;
> > > > +       int size_cells = dt_root_size_cells;
> > > > +       u64 size;
> > > > +
> > > > +       if (reg_len / 4 > addr_cells + size_cells)
> > > > +               return -EINVAL;
> > > > +
> > > > +       range->start = PHYS_PFN(of_read_number(reg, addr_cells));
> > > > +       size = PHYS_PFN(of_read_number(reg + addr_cells, size_cells));
> > > > +       if (size == 0) {
> > > > +               pr_err("Invalid node");
> > > > +               return -EINVAL;
> > > > +       }
> > > > +       range->end = range->start + size - 1;
> > >
> > > We have a function to read (and translate which you forgot) addresses.
> > > Add what's missing rather than open code your own.
> >
> > I must have missed that there's already a function to read addresses. Would
> > you mind pointing me in the right direction?
> 
> drivers/of/fdt_address.c
> 
> Though it doesn't provide getting the size, so that will have to be added.

Ok, will do!

> 
> 
> > > > +
> > > > +       return 0;
> > > > +}
> > > > +
> > > > +static int __init tag_storage_of_flat_get_tag_range(unsigned long node,
> > > > +                                                   struct range *tag_range)
> > > > +{
> > > > +       const __be32 *reg;
> > > > +       int reg_len;
> > > > +
> > > > +       reg = of_get_flat_dt_prop(node, "reg", &reg_len);
> > > > +       if (reg == NULL) {
> > > > +               pr_err("Invalid metadata node");
> > > > +               return -EINVAL;
> > > > +       }
> > > > +
> > > > +       return tag_storage_of_flat_get_range(node, reg, reg_len, tag_range);
> > > > +}
> > > > +
> > > > +static int __init tag_storage_of_flat_get_memory_range(unsigned long node, struct range *mem)
> > > > +{
> > > > +       const __be32 *reg;
> > > > +       int reg_len;
> > > > +
> > > > +       reg = of_get_flat_dt_prop(node, "linux,usable-memory", &reg_len);
> > > > +       if (reg == NULL)
> > > > +               reg = of_get_flat_dt_prop(node, "reg", &reg_len);
> > > > +
> > > > +       if (reg == NULL) {
> > > > +               pr_err("Invalid memory node");
> > > > +               return -EINVAL;
> > > > +       }
> > > > +
> > > > +       return tag_storage_of_flat_get_range(node, reg, reg_len, mem);
> > > > +}
> > > > +
> > > > +struct find_memory_node_arg {
> > > > +       unsigned long node;
> > > > +       u32 phandle;
> > > > +};
> > > > +
> > > > +static int __init fdt_find_memory_node(unsigned long node, const char *uname,
> > > > +                                      int depth, void *data)
> > > > +{
> > > > +       const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
> > > > +       struct find_memory_node_arg *arg = data;
> > > > +
> > > > +       if (depth != 1 || !type || strcmp(type, "memory") != 0)
> > > > +               return 0;
> > > > +
> > > > +       if (of_get_flat_dt_phandle(node) == arg->phandle) {
> > > > +               arg->node = node;
> > > > +               return 1;
> > > > +       }
> > > > +
> > > > +       return 0;
> > > > +}
> > > > +
> > > > +static int __init tag_storage_get_memory_node(unsigned long tag_node, unsigned long *mem_node)
> > > > +{
> > > > +       struct find_memory_node_arg arg = { 0 };
> > > > +       const __be32 *memory_prop;
> > > > +       u32 mem_phandle;
> > > > +       int ret, reg_len;
> > > > +
> > > > +       memory_prop = of_get_flat_dt_prop(tag_node, "memory", &reg_len);
> > > > +       if (!memory_prop) {
> > > > +               pr_err("Missing 'memory' property in the tag storage node");
> > > > +               return -EINVAL;
> > > > +       }
> > > > +
> > > > +       mem_phandle = be32_to_cpup(memory_prop);
> > > > +       arg.phandle = mem_phandle;
> > > > +
> > > > +       ret = of_scan_flat_dt(fdt_find_memory_node, &arg);
> > >
> > > Do not use of_scan_flat_dt. It is a relic predating libfdt which can
> > > get a node by phandle directly.
> >
> > I used that because that's what drivers/of/fdt.c uses. With reserved memory
> > I shouldn't need it, because struct reserved_mem already includes a
> > phandle.
> 
> Check again. Only some arch/ code (mostly powerpc) uses it. I've
> killed off most of it.

You're right, I think I grep'ed for a different function name in
drivers/of/fdt.c Either way, the message is clear: no of_scan_flat_dt().

> 
> 
> > > > +       if (ret != 1) {
> > > > +               pr_err("Associated memory node not found");
> > > > +               return -EINVAL;
> > > > +       }
> > > > +
> > > > +       *mem_node = arg.node;
> > > > +
> > > > +       return 0;
> > > > +}
> > > > +
> > > > +static int __init tag_storage_of_flat_read_u32(unsigned long node, const char *propname,
> > > > +                                              u32 *retval)
> > >
> > > If you are going to make a generic function, make it for everyone.
> >
> > Sure. If I still need it, should I put the function in
> > include/linux/of_fdt.h?
> 
> Yes.

Noted.

> 
> > > > +{
> > > > +       const __be32 *reg;
> > > > +
> > > > +       reg = of_get_flat_dt_prop(node, propname, NULL);
> > > > +       if (!reg)
> > > > +               return -EINVAL;
> > > > +
> > > > +       *retval = be32_to_cpup(reg);
> > > > +       return 0;
> > > > +}
> > > > +
> > > > +static u32 __init get_block_size_pages(u32 block_size_bytes)
> > > > +{
> > > > +       u32 a = PAGE_SIZE;
> > > > +       u32 b = block_size_bytes;
> > > > +       u32 r;
> > > > +
> > > > +       /* Find greatest common divisor using the Euclidian algorithm. */
> > > > +       do {
> > > > +               r = a % b;
> > > > +               a = b;
> > > > +               b = r;
> > > > +       } while (b != 0);
> > > > +
> > > > +       return PHYS_PFN(PAGE_SIZE * block_size_bytes / a);
> > > > +}
> > > > +
> > > > +static int __init fdt_init_tag_storage(unsigned long node, const char *uname,
> > > > +                                      int depth, void *data)
> > > > +{
> > > > +       struct tag_region *region;
> > > > +       unsigned long mem_node;
> > > > +       struct range *mem_range;
> > > > +       struct range *tag_range;
> > > > +       u32 block_size_bytes;
> > > > +       u32 nid = 0;
> > > > +       int ret;
> > > > +
> > > > +       if (depth != 1 || !strstr(uname, "tag-storage"))
> > > > +               return 0;
> > > > +
> > > > +       if (!of_flat_dt_is_compatible(node, "arm,mte-tag-storage"))
> > > > +               return 0;
> > > > +
> > > > +       if (num_tag_regions == MAX_TAG_REGIONS) {
> > > > +               pr_err("Maximum number of tag storage regions exceeded");
> > > > +               return -EINVAL;
> > > > +       }
> > > > +
> > > > +       region = &tag_regions[num_tag_regions];
> > > > +       mem_range = &region->mem_range;
> > > > +       tag_range = &region->tag_range;
> > > > +
> > > > +       ret = tag_storage_of_flat_get_tag_range(node, tag_range);
> > > > +       if (ret) {
> > > > +               pr_err("Invalid tag storage node");
> > > > +               return ret;
> > > > +       }
> > > > +
> > > > +       ret = tag_storage_get_memory_node(node, &mem_node);
> > > > +       if (ret)
> > > > +               return ret;
> > > > +
> > > > +       ret = tag_storage_of_flat_get_memory_range(mem_node, mem_range);
> > > > +       if (ret) {
> > > > +               pr_err("Invalid address for associated data memory node");
> > > > +               return ret;
> > > > +       }
> > > > +
> > > > +       /* The tag region must exactly match the corresponding memory. */
> > > > +       if (range_len(tag_range) * 32 != range_len(mem_range)) {
> > > > +               pr_err("Tag storage region 0x%llx-0x%llx does not cover the memory region 0x%llx-0x%llx",
> > > > +                      PFN_PHYS(tag_range->start), PFN_PHYS(tag_range->end),
> > > > +                      PFN_PHYS(mem_range->start), PFN_PHYS(mem_range->end));
> > > > +               return -EINVAL;
> > > > +       }
> > > > +
> > > > +       ret = tag_storage_of_flat_read_u32(node, "block-size", &block_size_bytes);
> > > > +       if (ret || block_size_bytes == 0) {
> > > > +               pr_err("Invalid or missing 'block-size' property");
> > > > +               return -EINVAL;
> > > > +       }
> > > > +       region->block_size = get_block_size_pages(block_size_bytes);
> > > > +       if (range_len(tag_range) % region->block_size != 0) {
> > > > +               pr_err("Tag storage region size 0x%llx is not a multiple of block size %u",
> > > > +                      PFN_PHYS(range_len(tag_range)), region->block_size);
> > > > +               return -EINVAL;
> > > > +       }
> > > > +
> > > > +       ret = tag_storage_of_flat_read_u32(mem_node, "numa-node-id", &nid);
> > >
> > > I was going to say we already have a way to associate memory nodes
> > > other nodes using "numa-node-id", so the "memory" phandle property is
> > > somewhat redundant. Maybe the tag node should have a numa-node-id.
> > > With that, it looks like you don't even need to access the /memory
> > > node. Avoiding that would be good for 2 reasons. It avoids parsing
> > > memory nodes twice and it's not the kernel's job to validate the DT.
> > > Really, if you want memory info, you should use memblock to get it
> > > because all the special cases of memory layout are handled. For
> > > example you can have memory nodes with multiple 'reg' entries or
> > > multiple memory nodes or both, and then some of those could be
> > > contiguous.
> >
> > I need to have a memory node associated with the tag storage node because
> > there is a static relationship between a page from "normal" memory and its
> > associated tag storage. If the code doesn't know that the memory region
> > A..B has the corresponding tag storage in the region X..Y, then it doesn't
> > know which tag storage to reserve when a page is allocated as tagged.
> >
> > In the example above, assuming that page P is allocated as tagged, the
> > corresponding tag storage page that needs to be reserved is:
> >
> > tag_storage_pfn = (page_to_pfn(P) - PHYS_PFN(A)) / 32* + PHYS_PFN(X)
> >
> > numa-node-id is not enough for this, because as far I know you can have
> > multiple memory regions withing the same numa node.
> 
> Okay.

Great, glad we are on the same page.

Thanks,
Alex
Rob Herring (Arm) Dec. 13, 2023, 2:06 p.m. UTC | #10
On Wed, Dec 13, 2023 at 7:05 AM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> Hi Rob,
>
> On Tue, Dec 12, 2023 at 12:44:06PM -0600, Rob Herring wrote:
> > On Tue, Dec 12, 2023 at 10:38 AM Alexandru Elisei
> > <alexandru.elisei@arm.com> wrote:
> > >
> > > Hi Rob,
> > >
> > > Thank you so much for the feedback, I'm not very familiar with device tree,
> > > and any comments are very useful.
> > >
> > > On Mon, Dec 11, 2023 at 11:29:40AM -0600, Rob Herring wrote:
> > > > On Sun, Nov 19, 2023 at 10:59 AM Alexandru Elisei
> > > > <alexandru.elisei@arm.com> wrote:
> > > > >
> > > > > Allow the kernel to get the size and location of the MTE tag storage
> > > > > regions from the DTB. This memory is marked as reserved for now.
> > > > >
> > > > > The DTB node for the tag storage region is defined as:
> > > > >
> > > > >         tags0: tag-storage@8f8000000 {
> > > > >                 compatible = "arm,mte-tag-storage";
> > > > >                 reg = <0x08 0xf8000000 0x00 0x4000000>;
> > > > >                 block-size = <0x1000>;
> > > > >                 memory = <&memory0>;    // Associated tagged memory node
> > > > >         };
> > > >
> > > > I skimmed thru the discussion some. If this memory range is within
> > > > main RAM, then it definitely belongs in /reserved-memory.
> > >
> > > Ok, will do that.
> > >
> > > If you don't mind, why do you say that it definitely belongs in
> > > reserved-memory? I'm not trying to argue otherwise, I'm curious about the
> > > motivation.
> >
> > Simply so that /memory nodes describe all possible memory and
> > /reserved-memory is just adding restrictions. It's also because
> > /reserved-memory is what gets handled early, and we don't need
> > multiple things to handle early.
> >
> > > Tag storage is not DMA and can live anywhere in memory.
> >
> > Then why put it in DT at all? The only reason CMA is there is to set
> > the size. It's not even clear to me we need CMA in DT either. The
> > reasoning long ago was the kernel didn't do a good job of moving and
> > reclaiming contiguous space, but that's supposed to be better now (and
> > most h/w figured out they need IOMMUs).
> >
> > But for tag storage you know the size as it is a function of the
> > memory size, right? After all, you are validating the size is correct.
> > I guess there is still the aspect of whether you want enable MTE or
> > not which could be done in a variety of ways.
>
> Oh, sorry, my bad, I should have been clearer about this. I don't want to
> put it in the DT as a "linux,cma" node. But I want it to be managed by CMA.

Yes, I understand, but my point remains. Why do you need this in DT?
If the location doesn't matter and you can calculate the size from the
memory size, what else is there to add to the DT?

Rob
Alexandru Elisei Dec. 13, 2023, 2:51 p.m. UTC | #11
Hi,

On Wed, Dec 13, 2023 at 08:06:44AM -0600, Rob Herring wrote:
> On Wed, Dec 13, 2023 at 7:05 AM Alexandru Elisei
> <alexandru.elisei@arm.com> wrote:
> >
> > Hi Rob,
> >
> > On Tue, Dec 12, 2023 at 12:44:06PM -0600, Rob Herring wrote:
> > > On Tue, Dec 12, 2023 at 10:38 AM Alexandru Elisei
> > > <alexandru.elisei@arm.com> wrote:
> > > >
> > > > Hi Rob,
> > > >
> > > > Thank you so much for the feedback, I'm not very familiar with device tree,
> > > > and any comments are very useful.
> > > >
> > > > On Mon, Dec 11, 2023 at 11:29:40AM -0600, Rob Herring wrote:
> > > > > On Sun, Nov 19, 2023 at 10:59 AM Alexandru Elisei
> > > > > <alexandru.elisei@arm.com> wrote:
> > > > > >
> > > > > > Allow the kernel to get the size and location of the MTE tag storage
> > > > > > regions from the DTB. This memory is marked as reserved for now.
> > > > > >
> > > > > > The DTB node for the tag storage region is defined as:
> > > > > >
> > > > > >         tags0: tag-storage@8f8000000 {
> > > > > >                 compatible = "arm,mte-tag-storage";
> > > > > >                 reg = <0x08 0xf8000000 0x00 0x4000000>;
> > > > > >                 block-size = <0x1000>;
> > > > > >                 memory = <&memory0>;    // Associated tagged memory node
> > > > > >         };
> > > > >
> > > > > I skimmed thru the discussion some. If this memory range is within
> > > > > main RAM, then it definitely belongs in /reserved-memory.
> > > >
> > > > Ok, will do that.
> > > >
> > > > If you don't mind, why do you say that it definitely belongs in
> > > > reserved-memory? I'm not trying to argue otherwise, I'm curious about the
> > > > motivation.
> > >
> > > Simply so that /memory nodes describe all possible memory and
> > > /reserved-memory is just adding restrictions. It's also because
> > > /reserved-memory is what gets handled early, and we don't need
> > > multiple things to handle early.
> > >
> > > > Tag storage is not DMA and can live anywhere in memory.
> > >
> > > Then why put it in DT at all? The only reason CMA is there is to set
> > > the size. It's not even clear to me we need CMA in DT either. The
> > > reasoning long ago was the kernel didn't do a good job of moving and
> > > reclaiming contiguous space, but that's supposed to be better now (and
> > > most h/w figured out they need IOMMUs).
> > >
> > > But for tag storage you know the size as it is a function of the
> > > memory size, right? After all, you are validating the size is correct.
> > > I guess there is still the aspect of whether you want enable MTE or
> > > not which could be done in a variety of ways.
> >
> > Oh, sorry, my bad, I should have been clearer about this. I don't want to
> > put it in the DT as a "linux,cma" node. But I want it to be managed by CMA.
> 
> Yes, I understand, but my point remains. Why do you need this in DT?
> If the location doesn't matter and you can calculate the size from the
> memory size, what else is there to add to the DT?

I am afraid there has been a misunderstanding. What do you mean by
"location doesn't matter"?

At the very least, Linux needs to know the address and size of a memory
region to use it. The series is about using the tag storage memory for
data. Tag storage cannot be described as a regular memory node because it
cannot be tagged (and normal memory can).

Then there's the matter of the tag storage block size (explained in this
commit message), and also knowing the memory range for which a tag storage
region stores the tags. This is explained in the cover letter.

Is there something that you feel that is not clear enough? I am more than
happy to go into details.

Thanks,
Alex
Rob Herring (Arm) Dec. 13, 2023, 5:22 p.m. UTC | #12
On Wed, Dec 13, 2023 at 8:51 AM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> Hi,
>
> On Wed, Dec 13, 2023 at 08:06:44AM -0600, Rob Herring wrote:
> > On Wed, Dec 13, 2023 at 7:05 AM Alexandru Elisei
> > <alexandru.elisei@arm.com> wrote:
> > >
> > > Hi Rob,
> > >
> > > On Tue, Dec 12, 2023 at 12:44:06PM -0600, Rob Herring wrote:
> > > > On Tue, Dec 12, 2023 at 10:38 AM Alexandru Elisei
> > > > <alexandru.elisei@arm.com> wrote:
> > > > >
> > > > > Hi Rob,
> > > > >
> > > > > Thank you so much for the feedback, I'm not very familiar with device tree,
> > > > > and any comments are very useful.
> > > > >
> > > > > On Mon, Dec 11, 2023 at 11:29:40AM -0600, Rob Herring wrote:
> > > > > > On Sun, Nov 19, 2023 at 10:59 AM Alexandru Elisei
> > > > > > <alexandru.elisei@arm.com> wrote:
> > > > > > >
> > > > > > > Allow the kernel to get the size and location of the MTE tag storage
> > > > > > > regions from the DTB. This memory is marked as reserved for now.
> > > > > > >
> > > > > > > The DTB node for the tag storage region is defined as:
> > > > > > >
> > > > > > >         tags0: tag-storage@8f8000000 {
> > > > > > >                 compatible = "arm,mte-tag-storage";
> > > > > > >                 reg = <0x08 0xf8000000 0x00 0x4000000>;
> > > > > > >                 block-size = <0x1000>;
> > > > > > >                 memory = <&memory0>;    // Associated tagged memory node
> > > > > > >         };
> > > > > >
> > > > > > I skimmed thru the discussion some. If this memory range is within
> > > > > > main RAM, then it definitely belongs in /reserved-memory.
> > > > >
> > > > > Ok, will do that.
> > > > >
> > > > > If you don't mind, why do you say that it definitely belongs in
> > > > > reserved-memory? I'm not trying to argue otherwise, I'm curious about the
> > > > > motivation.
> > > >
> > > > Simply so that /memory nodes describe all possible memory and
> > > > /reserved-memory is just adding restrictions. It's also because
> > > > /reserved-memory is what gets handled early, and we don't need
> > > > multiple things to handle early.
> > > >
> > > > > Tag storage is not DMA and can live anywhere in memory.
> > > >
> > > > Then why put it in DT at all? The only reason CMA is there is to set
> > > > the size. It's not even clear to me we need CMA in DT either. The
> > > > reasoning long ago was the kernel didn't do a good job of moving and
> > > > reclaiming contiguous space, but that's supposed to be better now (and
> > > > most h/w figured out they need IOMMUs).
> > > >
> > > > But for tag storage you know the size as it is a function of the
> > > > memory size, right? After all, you are validating the size is correct.
> > > > I guess there is still the aspect of whether you want enable MTE or
> > > > not which could be done in a variety of ways.
> > >
> > > Oh, sorry, my bad, I should have been clearer about this. I don't want to
> > > put it in the DT as a "linux,cma" node. But I want it to be managed by CMA.
> >
> > Yes, I understand, but my point remains. Why do you need this in DT?
> > If the location doesn't matter and you can calculate the size from the
> > memory size, what else is there to add to the DT?
>
> I am afraid there has been a misunderstanding. What do you mean by
> "location doesn't matter"?

You said:
> Tag storage is not DMA and can live anywhere in memory.

Which I took as the kernel can figure out where to put it. But maybe
you meant the h/w platform can hard code it to be anywhere in memory?
If so, then yes, DT is needed.

> At the very least, Linux needs to know the address and size of a memory
> region to use it. The series is about using the tag storage memory for
> data. Tag storage cannot be described as a regular memory node because it
> cannot be tagged (and normal memory can).

If the tag storage lives in the middle of memory, then it would be
described in the memory node, but removed by being in reserved-memory
node.

> Then there's the matter of the tag storage block size (explained in this
> commit message), and also knowing the memory range for which a tag storage
> region stores the tags. This is explained in the cover letter.

Honestly, I just forgot about that part.

Rob
Alexandru Elisei Dec. 13, 2023, 5:44 p.m. UTC | #13
On Wed, Dec 13, 2023 at 11:22:17AM -0600, Rob Herring wrote:
> On Wed, Dec 13, 2023 at 8:51 AM Alexandru Elisei
> <alexandru.elisei@arm.com> wrote:
> >
> > Hi,
> >
> > On Wed, Dec 13, 2023 at 08:06:44AM -0600, Rob Herring wrote:
> > > On Wed, Dec 13, 2023 at 7:05 AM Alexandru Elisei
> > > <alexandru.elisei@arm.com> wrote:
> > > >
> > > > Hi Rob,
> > > >
> > > > On Tue, Dec 12, 2023 at 12:44:06PM -0600, Rob Herring wrote:
> > > > > On Tue, Dec 12, 2023 at 10:38 AM Alexandru Elisei
> > > > > <alexandru.elisei@arm.com> wrote:
> > > > > >
> > > > > > Hi Rob,
> > > > > >
> > > > > > Thank you so much for the feedback, I'm not very familiar with device tree,
> > > > > > and any comments are very useful.
> > > > > >
> > > > > > On Mon, Dec 11, 2023 at 11:29:40AM -0600, Rob Herring wrote:
> > > > > > > On Sun, Nov 19, 2023 at 10:59 AM Alexandru Elisei
> > > > > > > <alexandru.elisei@arm.com> wrote:
> > > > > > > >
> > > > > > > > Allow the kernel to get the size and location of the MTE tag storage
> > > > > > > > regions from the DTB. This memory is marked as reserved for now.
> > > > > > > >
> > > > > > > > The DTB node for the tag storage region is defined as:
> > > > > > > >
> > > > > > > >         tags0: tag-storage@8f8000000 {
> > > > > > > >                 compatible = "arm,mte-tag-storage";
> > > > > > > >                 reg = <0x08 0xf8000000 0x00 0x4000000>;
> > > > > > > >                 block-size = <0x1000>;
> > > > > > > >                 memory = <&memory0>;    // Associated tagged memory node
> > > > > > > >         };
> > > > > > >
> > > > > > > I skimmed thru the discussion some. If this memory range is within
> > > > > > > main RAM, then it definitely belongs in /reserved-memory.
> > > > > >
> > > > > > Ok, will do that.
> > > > > >
> > > > > > If you don't mind, why do you say that it definitely belongs in
> > > > > > reserved-memory? I'm not trying to argue otherwise, I'm curious about the
> > > > > > motivation.
> > > > >
> > > > > Simply so that /memory nodes describe all possible memory and
> > > > > /reserved-memory is just adding restrictions. It's also because
> > > > > /reserved-memory is what gets handled early, and we don't need
> > > > > multiple things to handle early.
> > > > >
> > > > > > Tag storage is not DMA and can live anywhere in memory.
> > > > >
> > > > > Then why put it in DT at all? The only reason CMA is there is to set
> > > > > the size. It's not even clear to me we need CMA in DT either. The
> > > > > reasoning long ago was the kernel didn't do a good job of moving and
> > > > > reclaiming contiguous space, but that's supposed to be better now (and
> > > > > most h/w figured out they need IOMMUs).
> > > > >
> > > > > But for tag storage you know the size as it is a function of the
> > > > > memory size, right? After all, you are validating the size is correct.
> > > > > I guess there is still the aspect of whether you want enable MTE or
> > > > > not which could be done in a variety of ways.
> > > >
> > > > Oh, sorry, my bad, I should have been clearer about this. I don't want to
> > > > put it in the DT as a "linux,cma" node. But I want it to be managed by CMA.
> > >
> > > Yes, I understand, but my point remains. Why do you need this in DT?
> > > If the location doesn't matter and you can calculate the size from the
> > > memory size, what else is there to add to the DT?
> >
> > I am afraid there has been a misunderstanding. What do you mean by
> > "location doesn't matter"?
> 
> You said:
> > Tag storage is not DMA and can live anywhere in memory.
> 
> Which I took as the kernel can figure out where to put it. But maybe
> you meant the h/w platform can hard code it to be anywhere in memory?
> If so, then yes, DT is needed.

Ah, I see, sorry for not being clear enough, you are correct: tag storage
is a hardware property, and software needs a mechanism (in this case, the
dt) to discover its properties.

> 
> > At the very least, Linux needs to know the address and size of a memory
> > region to use it. The series is about using the tag storage memory for
> > data. Tag storage cannot be described as a regular memory node because it
> > cannot be tagged (and normal memory can).
> 
> If the tag storage lives in the middle of memory, then it would be
> described in the memory node, but removed by being in reserved-memory
> node.

I don't follow. Would you mind going into more details?

> 
> > Then there's the matter of the tag storage block size (explained in this
> > commit message), and also knowing the memory range for which a tag storage
> > region stores the tags. This is explained in the cover letter.
> 
> Honestly, I just forgot about that part.

I totally understand, there are a lot of things to consider at the same
time.

Thanks,
Alex
Rob Herring (Arm) Dec. 13, 2023, 8:30 p.m. UTC | #14
On Wed, Dec 13, 2023 at 11:44 AM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> On Wed, Dec 13, 2023 at 11:22:17AM -0600, Rob Herring wrote:
> > On Wed, Dec 13, 2023 at 8:51 AM Alexandru Elisei
> > <alexandru.elisei@arm.com> wrote:
> > >
> > > Hi,
> > >
> > > On Wed, Dec 13, 2023 at 08:06:44AM -0600, Rob Herring wrote:
> > > > On Wed, Dec 13, 2023 at 7:05 AM Alexandru Elisei
> > > > <alexandru.elisei@arm.com> wrote:
> > > > >
> > > > > Hi Rob,
> > > > >
> > > > > On Tue, Dec 12, 2023 at 12:44:06PM -0600, Rob Herring wrote:
> > > > > > On Tue, Dec 12, 2023 at 10:38 AM Alexandru Elisei
> > > > > > <alexandru.elisei@arm.com> wrote:
> > > > > > >
> > > > > > > Hi Rob,
> > > > > > >
> > > > > > > Thank you so much for the feedback, I'm not very familiar with device tree,
> > > > > > > and any comments are very useful.
> > > > > > >
> > > > > > > On Mon, Dec 11, 2023 at 11:29:40AM -0600, Rob Herring wrote:
> > > > > > > > On Sun, Nov 19, 2023 at 10:59 AM Alexandru Elisei
> > > > > > > > <alexandru.elisei@arm.com> wrote:
> > > > > > > > >
> > > > > > > > > Allow the kernel to get the size and location of the MTE tag storage
> > > > > > > > > regions from the DTB. This memory is marked as reserved for now.
> > > > > > > > >
> > > > > > > > > The DTB node for the tag storage region is defined as:
> > > > > > > > >
> > > > > > > > >         tags0: tag-storage@8f8000000 {
> > > > > > > > >                 compatible = "arm,mte-tag-storage";
> > > > > > > > >                 reg = <0x08 0xf8000000 0x00 0x4000000>;
> > > > > > > > >                 block-size = <0x1000>;
> > > > > > > > >                 memory = <&memory0>;    // Associated tagged memory node
> > > > > > > > >         };
> > > > > > > >
> > > > > > > > I skimmed thru the discussion some. If this memory range is within
> > > > > > > > main RAM, then it definitely belongs in /reserved-memory.
> > > > > > >
> > > > > > > Ok, will do that.
> > > > > > >
> > > > > > > If you don't mind, why do you say that it definitely belongs in
> > > > > > > reserved-memory? I'm not trying to argue otherwise, I'm curious about the
> > > > > > > motivation.
> > > > > >
> > > > > > Simply so that /memory nodes describe all possible memory and
> > > > > > /reserved-memory is just adding restrictions. It's also because
> > > > > > /reserved-memory is what gets handled early, and we don't need
> > > > > > multiple things to handle early.
> > > > > >
> > > > > > > Tag storage is not DMA and can live anywhere in memory.
> > > > > >
> > > > > > Then why put it in DT at all? The only reason CMA is there is to set
> > > > > > the size. It's not even clear to me we need CMA in DT either. The
> > > > > > reasoning long ago was the kernel didn't do a good job of moving and
> > > > > > reclaiming contiguous space, but that's supposed to be better now (and
> > > > > > most h/w figured out they need IOMMUs).
> > > > > >
> > > > > > But for tag storage you know the size as it is a function of the
> > > > > > memory size, right? After all, you are validating the size is correct.
> > > > > > I guess there is still the aspect of whether you want enable MTE or
> > > > > > not which could be done in a variety of ways.
> > > > >
> > > > > Oh, sorry, my bad, I should have been clearer about this. I don't want to
> > > > > put it in the DT as a "linux,cma" node. But I want it to be managed by CMA.
> > > >
> > > > Yes, I understand, but my point remains. Why do you need this in DT?
> > > > If the location doesn't matter and you can calculate the size from the
> > > > memory size, what else is there to add to the DT?
> > >
> > > I am afraid there has been a misunderstanding. What do you mean by
> > > "location doesn't matter"?
> >
> > You said:
> > > Tag storage is not DMA and can live anywhere in memory.
> >
> > Which I took as the kernel can figure out where to put it. But maybe
> > you meant the h/w platform can hard code it to be anywhere in memory?
> > If so, then yes, DT is needed.
>
> Ah, I see, sorry for not being clear enough, you are correct: tag storage
> is a hardware property, and software needs a mechanism (in this case, the
> dt) to discover its properties.
>
> >
> > > At the very least, Linux needs to know the address and size of a memory
> > > region to use it. The series is about using the tag storage memory for
> > > data. Tag storage cannot be described as a regular memory node because it
> > > cannot be tagged (and normal memory can).
> >
> > If the tag storage lives in the middle of memory, then it would be
> > described in the memory node, but removed by being in reserved-memory
> > node.
>
> I don't follow. Would you mind going into more details?

It goes back to what I said earlier about /memory nodes describing all
the memory. There's no reason to reserve memory if you haven't
described that range as memory to begin with. One could presumably
just have a memory node for each contiguous chunk and not need
/reserved-memory (ignoring the need to say what things are reserved
for). That would become very difficult to adjust. Note that the kernel
has a hardcoded limit of 64 reserved regions currently and that is not
enough for some people. Seems like a lot, but I have no idea how they
are (ab)using /reserved-memory.

Let me give an example. Presumably using MTE at all is configurable.
If you boot a kernel with MTE disabled (or older and not supporting
it), then I'd assume you'd want to use the tag storage for regular
memory. Well, If tag storage is already part of /memory, then all you
have to do is ignore the tag reserved-memory region. Tweaking the
memory nodes would be more work.


Also, I should point out that /memory and /reserved-memory nodes are
not used for UEFI boot.

Rob
Alexandru Elisei Dec. 14, 2023, 3:45 p.m. UTC | #15
Hi,

On Wed, Dec 13, 2023 at 02:30:42PM -0600, Rob Herring wrote:
> On Wed, Dec 13, 2023 at 11:44 AM Alexandru Elisei
> <alexandru.elisei@arm.com> wrote:
> >
> > On Wed, Dec 13, 2023 at 11:22:17AM -0600, Rob Herring wrote:
> > > On Wed, Dec 13, 2023 at 8:51 AM Alexandru Elisei
> > > <alexandru.elisei@arm.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Wed, Dec 13, 2023 at 08:06:44AM -0600, Rob Herring wrote:
> > > > > On Wed, Dec 13, 2023 at 7:05 AM Alexandru Elisei
> > > > > <alexandru.elisei@arm.com> wrote:
> > > > > >
> > > > > > Hi Rob,
> > > > > >
> > > > > > On Tue, Dec 12, 2023 at 12:44:06PM -0600, Rob Herring wrote:
> > > > > > > On Tue, Dec 12, 2023 at 10:38 AM Alexandru Elisei
> > > > > > > <alexandru.elisei@arm.com> wrote:
> > > > > > > >
> > > > > > > > Hi Rob,
> > > > > > > >
> > > > > > > > Thank you so much for the feedback, I'm not very familiar with device tree,
> > > > > > > > and any comments are very useful.
> > > > > > > >
> > > > > > > > On Mon, Dec 11, 2023 at 11:29:40AM -0600, Rob Herring wrote:
> > > > > > > > > On Sun, Nov 19, 2023 at 10:59 AM Alexandru Elisei
> > > > > > > > > <alexandru.elisei@arm.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Allow the kernel to get the size and location of the MTE tag storage
> > > > > > > > > > regions from the DTB. This memory is marked as reserved for now.
> > > > > > > > > >
> > > > > > > > > > The DTB node for the tag storage region is defined as:
> > > > > > > > > >
> > > > > > > > > >         tags0: tag-storage@8f8000000 {
> > > > > > > > > >                 compatible = "arm,mte-tag-storage";
> > > > > > > > > >                 reg = <0x08 0xf8000000 0x00 0x4000000>;
> > > > > > > > > >                 block-size = <0x1000>;
> > > > > > > > > >                 memory = <&memory0>;    // Associated tagged memory node
> > > > > > > > > >         };
> > > > > > > > >
> > > > > > > > > I skimmed thru the discussion some. If this memory range is within
> > > > > > > > > main RAM, then it definitely belongs in /reserved-memory.
> > > > > > > >
> > > > > > > > Ok, will do that.
> > > > > > > >
> > > > > > > > If you don't mind, why do you say that it definitely belongs in
> > > > > > > > reserved-memory? I'm not trying to argue otherwise, I'm curious about the
> > > > > > > > motivation.
> > > > > > >
> > > > > > > Simply so that /memory nodes describe all possible memory and
> > > > > > > /reserved-memory is just adding restrictions. It's also because
> > > > > > > /reserved-memory is what gets handled early, and we don't need
> > > > > > > multiple things to handle early.
> > > > > > >
> > > > > > > > Tag storage is not DMA and can live anywhere in memory.
> > > > > > >
> > > > > > > Then why put it in DT at all? The only reason CMA is there is to set
> > > > > > > the size. It's not even clear to me we need CMA in DT either. The
> > > > > > > reasoning long ago was the kernel didn't do a good job of moving and
> > > > > > > reclaiming contiguous space, but that's supposed to be better now (and
> > > > > > > most h/w figured out they need IOMMUs).
> > > > > > >
> > > > > > > But for tag storage you know the size as it is a function of the
> > > > > > > memory size, right? After all, you are validating the size is correct.
> > > > > > > I guess there is still the aspect of whether you want enable MTE or
> > > > > > > not which could be done in a variety of ways.
> > > > > >
> > > > > > Oh, sorry, my bad, I should have been clearer about this. I don't want to
> > > > > > put it in the DT as a "linux,cma" node. But I want it to be managed by CMA.
> > > > >
> > > > > Yes, I understand, but my point remains. Why do you need this in DT?
> > > > > If the location doesn't matter and you can calculate the size from the
> > > > > memory size, what else is there to add to the DT?
> > > >
> > > > I am afraid there has been a misunderstanding. What do you mean by
> > > > "location doesn't matter"?
> > >
> > > You said:
> > > > Tag storage is not DMA and can live anywhere in memory.
> > >
> > > Which I took as the kernel can figure out where to put it. But maybe
> > > you meant the h/w platform can hard code it to be anywhere in memory?
> > > If so, then yes, DT is needed.
> >
> > Ah, I see, sorry for not being clear enough, you are correct: tag storage
> > is a hardware property, and software needs a mechanism (in this case, the
> > dt) to discover its properties.
> >
> > >
> > > > At the very least, Linux needs to know the address and size of a memory
> > > > region to use it. The series is about using the tag storage memory for
> > > > data. Tag storage cannot be described as a regular memory node because it
> > > > cannot be tagged (and normal memory can).
> > >
> > > If the tag storage lives in the middle of memory, then it would be
> > > described in the memory node, but removed by being in reserved-memory
> > > node.
> >
> > I don't follow. Would you mind going into more details?
> 
> It goes back to what I said earlier about /memory nodes describing all
> the memory. There's no reason to reserve memory if you haven't
> described that range as memory to begin with. One could presumably
> just have a memory node for each contiguous chunk and not need
> /reserved-memory (ignoring the need to say what things are reserved
> for). That would become very difficult to adjust. Note that the kernel
> has a hardcoded limit of 64 reserved regions currently and that is not
> enough for some people. Seems like a lot, but I have no idea how they
> are (ab)using /reserved-memory.

Ah, I see what you mean, reserved memory is about marking existing memory
(from a /memory node) as special, not about adding new memory.

After the memblock allocator is initialized, the kernel can use it for its
own allocations. Kernel allocations are not movable.

When a page is allocated as tagged, the associated tag storage cannot be
used for data, otherwise the tags would corrupt that data. To avoid this,
the requirement is that tag storage pages are only used for movable
allocations. When a page is allocated as tagged, the data in the associated
tag storage is migrated and the tag storage is taken from the page
allocator (via alloc_contig_range()).

My understanding is that the memblock allocator can use all the memory from
a /memory node. If the tags storage memory is declared in a /memory node,
there exists the possibility that Linux will use tag storage memory for its
own allocation, which would make that tags storage memory unmovable, and
thus unusable for storing tags.

Looking at early_init_dt_scan_memory(), even if a /memory node if marked at
hotpluggable, memblock will still use it, unless "movable_node" is set on
the kernel command line.

That's the reason why I'm not describing tag storage in a /memory node.  Is
there way to tell the memblock allocator not to use memory from a /memory
node?

> 
> Let me give an example. Presumably using MTE at all is configurable.
> If you boot a kernel with MTE disabled (or older and not supporting
> it), then I'd assume you'd want to use the tag storage for regular
> memory. Well, If tag storage is already part of /memory, then all you
> have to do is ignore the tag reserved-memory region. Tweaking the
> memory nodes would be more work.

Right now, memory is added via memblock_reserve(), and if MTE is disabled
(for example, via the kernel command line), the code calls
free_reserved_page() for each tag storage page. I find that straightfoward
to implement.

Thanks,
Alex

> 
> 
> Also, I should point out that /memory and /reserved-memory nodes are
> not used for UEFI boot.
> 
> Rob
>
Rob Herring (Arm) Dec. 14, 2023, 6:55 p.m. UTC | #16
On Thu, Dec 14, 2023 at 9:45 AM Alexandru Elisei
<alexandru.elisei@arm.com> wrote:
>
> Hi,
>
> On Wed, Dec 13, 2023 at 02:30:42PM -0600, Rob Herring wrote:
> > On Wed, Dec 13, 2023 at 11:44 AM Alexandru Elisei
> > <alexandru.elisei@arm.com> wrote:
> > >
> > > On Wed, Dec 13, 2023 at 11:22:17AM -0600, Rob Herring wrote:
> > > > On Wed, Dec 13, 2023 at 8:51 AM Alexandru Elisei
> > > > <alexandru.elisei@arm.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Wed, Dec 13, 2023 at 08:06:44AM -0600, Rob Herring wrote:
> > > > > > On Wed, Dec 13, 2023 at 7:05 AM Alexandru Elisei
> > > > > > <alexandru.elisei@arm.com> wrote:
> > > > > > >
> > > > > > > Hi Rob,
> > > > > > >
> > > > > > > On Tue, Dec 12, 2023 at 12:44:06PM -0600, Rob Herring wrote:
> > > > > > > > On Tue, Dec 12, 2023 at 10:38 AM Alexandru Elisei
> > > > > > > > <alexandru.elisei@arm.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Rob,
> > > > > > > > >
> > > > > > > > > Thank you so much for the feedback, I'm not very familiar with device tree,
> > > > > > > > > and any comments are very useful.
> > > > > > > > >
> > > > > > > > > On Mon, Dec 11, 2023 at 11:29:40AM -0600, Rob Herring wrote:
> > > > > > > > > > On Sun, Nov 19, 2023 at 10:59 AM Alexandru Elisei
> > > > > > > > > > <alexandru.elisei@arm.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Allow the kernel to get the size and location of the MTE tag storage
> > > > > > > > > > > regions from the DTB. This memory is marked as reserved for now.
> > > > > > > > > > >
> > > > > > > > > > > The DTB node for the tag storage region is defined as:
> > > > > > > > > > >
> > > > > > > > > > >         tags0: tag-storage@8f8000000 {
> > > > > > > > > > >                 compatible = "arm,mte-tag-storage";
> > > > > > > > > > >                 reg = <0x08 0xf8000000 0x00 0x4000000>;
> > > > > > > > > > >                 block-size = <0x1000>;
> > > > > > > > > > >                 memory = <&memory0>;    // Associated tagged memory node
> > > > > > > > > > >         };
> > > > > > > > > >
> > > > > > > > > > I skimmed thru the discussion some. If this memory range is within
> > > > > > > > > > main RAM, then it definitely belongs in /reserved-memory.
> > > > > > > > >
> > > > > > > > > Ok, will do that.
> > > > > > > > >
> > > > > > > > > If you don't mind, why do you say that it definitely belongs in
> > > > > > > > > reserved-memory? I'm not trying to argue otherwise, I'm curious about the
> > > > > > > > > motivation.
> > > > > > > >
> > > > > > > > Simply so that /memory nodes describe all possible memory and
> > > > > > > > /reserved-memory is just adding restrictions. It's also because
> > > > > > > > /reserved-memory is what gets handled early, and we don't need
> > > > > > > > multiple things to handle early.
> > > > > > > >
> > > > > > > > > Tag storage is not DMA and can live anywhere in memory.
> > > > > > > >
> > > > > > > > Then why put it in DT at all? The only reason CMA is there is to set
> > > > > > > > the size. It's not even clear to me we need CMA in DT either. The
> > > > > > > > reasoning long ago was the kernel didn't do a good job of moving and
> > > > > > > > reclaiming contiguous space, but that's supposed to be better now (and
> > > > > > > > most h/w figured out they need IOMMUs).
> > > > > > > >
> > > > > > > > But for tag storage you know the size as it is a function of the
> > > > > > > > memory size, right? After all, you are validating the size is correct.
> > > > > > > > I guess there is still the aspect of whether you want enable MTE or
> > > > > > > > not which could be done in a variety of ways.
> > > > > > >
> > > > > > > Oh, sorry, my bad, I should have been clearer about this. I don't want to
> > > > > > > put it in the DT as a "linux,cma" node. But I want it to be managed by CMA.
> > > > > >
> > > > > > Yes, I understand, but my point remains. Why do you need this in DT?
> > > > > > If the location doesn't matter and you can calculate the size from the
> > > > > > memory size, what else is there to add to the DT?
> > > > >
> > > > > I am afraid there has been a misunderstanding. What do you mean by
> > > > > "location doesn't matter"?
> > > >
> > > > You said:
> > > > > Tag storage is not DMA and can live anywhere in memory.
> > > >
> > > > Which I took as the kernel can figure out where to put it. But maybe
> > > > you meant the h/w platform can hard code it to be anywhere in memory?
> > > > If so, then yes, DT is needed.
> > >
> > > Ah, I see, sorry for not being clear enough, you are correct: tag storage
> > > is a hardware property, and software needs a mechanism (in this case, the
> > > dt) to discover its properties.
> > >
> > > >
> > > > > At the very least, Linux needs to know the address and size of a memory
> > > > > region to use it. The series is about using the tag storage memory for
> > > > > data. Tag storage cannot be described as a regular memory node because it
> > > > > cannot be tagged (and normal memory can).
> > > >
> > > > If the tag storage lives in the middle of memory, then it would be
> > > > described in the memory node, but removed by being in reserved-memory
> > > > node.
> > >
> > > I don't follow. Would you mind going into more details?
> >
> > It goes back to what I said earlier about /memory nodes describing all
> > the memory. There's no reason to reserve memory if you haven't
> > described that range as memory to begin with. One could presumably
> > just have a memory node for each contiguous chunk and not need
> > /reserved-memory (ignoring the need to say what things are reserved
> > for). That would become very difficult to adjust. Note that the kernel
> > has a hardcoded limit of 64 reserved regions currently and that is not
> > enough for some people. Seems like a lot, but I have no idea how they
> > are (ab)using /reserved-memory.
>
> Ah, I see what you mean, reserved memory is about marking existing memory
> (from a /memory node) as special, not about adding new memory.
>
> After the memblock allocator is initialized, the kernel can use it for its
> own allocations. Kernel allocations are not movable.
>
> When a page is allocated as tagged, the associated tag storage cannot be
> used for data, otherwise the tags would corrupt that data. To avoid this,
> the requirement is that tag storage pages are only used for movable
> allocations. When a page is allocated as tagged, the data in the associated
> tag storage is migrated and the tag storage is taken from the page
> allocator (via alloc_contig_range()).
>
> My understanding is that the memblock allocator can use all the memory from
> a /memory node. If the tags storage memory is declared in a /memory node,
> there exists the possibility that Linux will use tag storage memory for its
> own allocation, which would make that tags storage memory unmovable, and
> thus unusable for storing tags.

No, because the tag storage would be reserved in /reserved-memory.

Of course, the arch code could do something between scanning /memory
nodes and /reserved-memory, but that would be broken arch code.
Ideally, there wouldn't be any arch code in between those 2 points,
but it's complicated. It used to mainly be powerpc, but we keep adding
to the complexity on arm64.

> Looking at early_init_dt_scan_memory(), even if a /memory node if marked at
> hotpluggable, memblock will still use it, unless "movable_node" is set on
> the kernel command line.
>
> That's the reason why I'm not describing tag storage in a /memory node.  Is
> there way to tell the memblock allocator not to use memory from a /memory
> node?
>
> >
> > Let me give an example. Presumably using MTE at all is configurable.
> > If you boot a kernel with MTE disabled (or older and not supporting
> > it), then I'd assume you'd want to use the tag storage for regular
> > memory. Well, If tag storage is already part of /memory, then all you
> > have to do is ignore the tag reserved-memory region. Tweaking the
> > memory nodes would be more work.
>
> Right now, memory is added via memblock_reserve(), and if MTE is disabled
> (for example, via the kernel command line), the code calls
> free_reserved_page() for each tag storage page. I find that straightfoward
> to implement.

But better to just not reserve the region in the first place. Also, it
needs to be simple enough to back port.

Also, does free_reserved_page() work on ranges outside of memblock
range (e.g. beyond end_of_DRAM())? If the tag storage happened to live
at the end of DRAM and you shorten the /memory node size to remove tag
storage, is it still going to work?

Rob
Alexandru Elisei Dec. 18, 2023, 10:59 a.m. UTC | #17
Hi,

On Thu, Dec 14, 2023 at 12:55:14PM -0600, Rob Herring wrote:
> On Thu, Dec 14, 2023 at 9:45 AM Alexandru Elisei
> <alexandru.elisei@arm.com> wrote:
> >
> > Hi,
> >
> > On Wed, Dec 13, 2023 at 02:30:42PM -0600, Rob Herring wrote:
> > > On Wed, Dec 13, 2023 at 11:44 AM Alexandru Elisei
> > > <alexandru.elisei@arm.com> wrote:
> > > >
> > > > On Wed, Dec 13, 2023 at 11:22:17AM -0600, Rob Herring wrote:
> > > > > On Wed, Dec 13, 2023 at 8:51 AM Alexandru Elisei
> > > > > <alexandru.elisei@arm.com> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > On Wed, Dec 13, 2023 at 08:06:44AM -0600, Rob Herring wrote:
> > > > > > > On Wed, Dec 13, 2023 at 7:05 AM Alexandru Elisei
> > > > > > > <alexandru.elisei@arm.com> wrote:
> > > > > > > >
> > > > > > > > Hi Rob,
> > > > > > > >
> > > > > > > > On Tue, Dec 12, 2023 at 12:44:06PM -0600, Rob Herring wrote:
> > > > > > > > > On Tue, Dec 12, 2023 at 10:38 AM Alexandru Elisei
> > > > > > > > > <alexandru.elisei@arm.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Rob,
> > > > > > > > > >
> > > > > > > > > > Thank you so much for the feedback, I'm not very familiar with device tree,
> > > > > > > > > > and any comments are very useful.
> > > > > > > > > >
> > > > > > > > > > On Mon, Dec 11, 2023 at 11:29:40AM -0600, Rob Herring wrote:
> > > > > > > > > > > On Sun, Nov 19, 2023 at 10:59 AM Alexandru Elisei
> > > > > > > > > > > <alexandru.elisei@arm.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Allow the kernel to get the size and location of the MTE tag storage
> > > > > > > > > > > > regions from the DTB. This memory is marked as reserved for now.
> > > > > > > > > > > >
> > > > > > > > > > > > The DTB node for the tag storage region is defined as:
> > > > > > > > > > > >
> > > > > > > > > > > >         tags0: tag-storage@8f8000000 {
> > > > > > > > > > > >                 compatible = "arm,mte-tag-storage";
> > > > > > > > > > > >                 reg = <0x08 0xf8000000 0x00 0x4000000>;
> > > > > > > > > > > >                 block-size = <0x1000>;
> > > > > > > > > > > >                 memory = <&memory0>;    // Associated tagged memory node
> > > > > > > > > > > >         };
> > > > > > > > > > >
> > > > > > > > > > > I skimmed thru the discussion some. If this memory range is within
> > > > > > > > > > > main RAM, then it definitely belongs in /reserved-memory.
> > > > > > > > > >
> > > > > > > > > > Ok, will do that.
> > > > > > > > > >
> > > > > > > > > > If you don't mind, why do you say that it definitely belongs in
> > > > > > > > > > reserved-memory? I'm not trying to argue otherwise, I'm curious about the
> > > > > > > > > > motivation.
> > > > > > > > >
> > > > > > > > > Simply so that /memory nodes describe all possible memory and
> > > > > > > > > /reserved-memory is just adding restrictions. It's also because
> > > > > > > > > /reserved-memory is what gets handled early, and we don't need
> > > > > > > > > multiple things to handle early.
> > > > > > > > >
> > > > > > > > > > Tag storage is not DMA and can live anywhere in memory.
> > > > > > > > >
> > > > > > > > > Then why put it in DT at all? The only reason CMA is there is to set
> > > > > > > > > the size. It's not even clear to me we need CMA in DT either. The
> > > > > > > > > reasoning long ago was the kernel didn't do a good job of moving and
> > > > > > > > > reclaiming contiguous space, but that's supposed to be better now (and
> > > > > > > > > most h/w figured out they need IOMMUs).
> > > > > > > > >
> > > > > > > > > But for tag storage you know the size as it is a function of the
> > > > > > > > > memory size, right? After all, you are validating the size is correct.
> > > > > > > > > I guess there is still the aspect of whether you want enable MTE or
> > > > > > > > > not which could be done in a variety of ways.
> > > > > > > >
> > > > > > > > Oh, sorry, my bad, I should have been clearer about this. I don't want to
> > > > > > > > put it in the DT as a "linux,cma" node. But I want it to be managed by CMA.
> > > > > > >
> > > > > > > Yes, I understand, but my point remains. Why do you need this in DT?
> > > > > > > If the location doesn't matter and you can calculate the size from the
> > > > > > > memory size, what else is there to add to the DT?
> > > > > >
> > > > > > I am afraid there has been a misunderstanding. What do you mean by
> > > > > > "location doesn't matter"?
> > > > >
> > > > > You said:
> > > > > > Tag storage is not DMA and can live anywhere in memory.
> > > > >
> > > > > Which I took as the kernel can figure out where to put it. But maybe
> > > > > you meant the h/w platform can hard code it to be anywhere in memory?
> > > > > If so, then yes, DT is needed.
> > > >
> > > > Ah, I see, sorry for not being clear enough, you are correct: tag storage
> > > > is a hardware property, and software needs a mechanism (in this case, the
> > > > dt) to discover its properties.
> > > >
> > > > >
> > > > > > At the very least, Linux needs to know the address and size of a memory
> > > > > > region to use it. The series is about using the tag storage memory for
> > > > > > data. Tag storage cannot be described as a regular memory node because it
> > > > > > cannot be tagged (and normal memory can).
> > > > >
> > > > > If the tag storage lives in the middle of memory, then it would be
> > > > > described in the memory node, but removed by being in reserved-memory
> > > > > node.
> > > >
> > > > I don't follow. Would you mind going into more details?
> > >
> > > It goes back to what I said earlier about /memory nodes describing all
> > > the memory. There's no reason to reserve memory if you haven't
> > > described that range as memory to begin with. One could presumably
> > > just have a memory node for each contiguous chunk and not need
> > > /reserved-memory (ignoring the need to say what things are reserved
> > > for). That would become very difficult to adjust. Note that the kernel
> > > has a hardcoded limit of 64 reserved regions currently and that is not
> > > enough for some people. Seems like a lot, but I have no idea how they
> > > are (ab)using /reserved-memory.
> >
> > Ah, I see what you mean, reserved memory is about marking existing memory
> > (from a /memory node) as special, not about adding new memory.
> >
> > After the memblock allocator is initialized, the kernel can use it for its
> > own allocations. Kernel allocations are not movable.
> >
> > When a page is allocated as tagged, the associated tag storage cannot be
> > used for data, otherwise the tags would corrupt that data. To avoid this,
> > the requirement is that tag storage pages are only used for movable
> > allocations. When a page is allocated as tagged, the data in the associated
> > tag storage is migrated and the tag storage is taken from the page
> > allocator (via alloc_contig_range()).
> >
> > My understanding is that the memblock allocator can use all the memory from
> > a /memory node. If the tags storage memory is declared in a /memory node,
> > there exists the possibility that Linux will use tag storage memory for its
> > own allocation, which would make that tags storage memory unmovable, and
> > thus unusable for storing tags.
> 
> No, because the tag storage would be reserved in /reserved-memory.
> 
> Of course, the arch code could do something between scanning /memory
> nodes and /reserved-memory, but that would be broken arch code.
> Ideally, there wouldn't be any arch code in between those 2 points,
> but it's complicated. It used to mainly be powerpc, but we keep adding
> to the complexity on arm64.

Ah, yes, thats what I was referring to, the fact that the memory nodes are
parsed in setup_arch -> setup_machine_fdt -> early_init_dt_scan, and the
reserved memory is parsed later in setup_arch -> arm64_memblock_init.

If the rule is that no memblock allocations can take place between
setup_machine_fdt() and arm64_memblock_init(), then putting tag storage in
a /memory node will work, thank you for the clarification.

> 
> > Looking at early_init_dt_scan_memory(), even if a /memory node if marked at
> > hotpluggable, memblock will still use it, unless "movable_node" is set on
> > the kernel command line.
> >
> > That's the reason why I'm not describing tag storage in a /memory node.  Is
> > there way to tell the memblock allocator not to use memory from a /memory
> > node?
> >
> > >
> > > Let me give an example. Presumably using MTE at all is configurable.
> > > If you boot a kernel with MTE disabled (or older and not supporting
> > > it), then I'd assume you'd want to use the tag storage for regular
> > > memory. Well, If tag storage is already part of /memory, then all you
> > > have to do is ignore the tag reserved-memory region. Tweaking the
> > > memory nodes would be more work.
> >
> > Right now, memory is added via memblock_reserve(), and if MTE is disabled
> > (for example, via the kernel command line), the code calls
> > free_reserved_page() for each tag storage page. I find that straightfoward
> > to implement.
> 
> But better to just not reserve the region in the first place. Also, it
> needs to be simple enough to back port.

I don't think that works - reserved memory is parsed in setup_arch ->
arm64_memblock_init, and the cpu capabilities are initialized later, in
smp_prepare_boot_cpu.

> 
> Also, does free_reserved_page() work on ranges outside of memblock
> range (e.g. beyond end_of_DRAM())? If the tag storage happened to live
> at the end of DRAM and you shorten the /memory node size to remove tag
> storage, is it still going to work?

Tag storage memory is discovered in 2 staged: first it is added to memblock
with memblock_add(), then reserved with memblock_reserve().  This is
performed in setup_arch(), after setup_machine_fdt(), and before
arm64_memblock_init(). The tag torage code keeps an array of the discovered
tag regions. This is implemented in this patch.

The next patch [1] adds an arch_initcall that checks if
memblock_end_of_DRAM() is less than the upper address of a tag storage
region. If that is the case, then tag storage memory is kept as reserved
and remains unused by the kernel.

The next check is for mte enabled: if it is disabled, then the pages are
unreserved by doing free_reserved_page().

And finally, if all the checks pass, the tag storage pages are put on the
MIGRATE_CMA lists with init_cma_reserved_pageblock().

[1] https://lore.kernel.org/all/20231119165721.9849-12-alexandru.elisei@arm.com/

Thanks,
Alex

> 
> Rob
diff mbox series

Patch

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7b071a00425d..fe8276fdc7a8 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2062,6 +2062,18 @@  config ARM64_MTE
 
 	  Documentation/arch/arm64/memory-tagging-extension.rst.
 
+if ARM64_MTE
+config ARM64_MTE_TAG_STORAGE
+	bool "Dynamic MTE tag storage management"
+	help
+	  Adds support for dynamic management of the memory used by the hardware
+	  for storing MTE tags. This memory, unlike normal memory, cannot be
+	  tagged. When it is used to store tags for another memory location it
+	  cannot be used for any type of allocation.
+
+	  If unsure, say N
+endif # ARM64_MTE
+
 endmenu # "ARMv8.5 architectural features"
 
 menu "ARMv8.7 architectural features"
diff --git a/arch/arm64/include/asm/mte_tag_storage.h b/arch/arm64/include/asm/mte_tag_storage.h
new file mode 100644
index 000000000000..8f86c4f9a7c3
--- /dev/null
+++ b/arch/arm64/include/asm/mte_tag_storage.h
@@ -0,0 +1,15 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023 ARM Ltd.
+ */
+#ifndef __ASM_MTE_TAG_STORAGE_H
+#define __ASM_MTE_TAG_STORAGE_H
+
+#ifdef CONFIG_ARM64_MTE_TAG_STORAGE
+void mte_tag_storage_init(void);
+#else
+static inline void mte_tag_storage_init(void)
+{
+}
+#endif /* CONFIG_ARM64_MTE_TAG_STORAGE */
+#endif /* __ASM_MTE_TAG_STORAGE_H  */
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index d95b3d6b471a..5f031bf9f8f1 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -70,6 +70,7 @@  obj-$(CONFIG_CRASH_CORE)		+= crash_core.o
 obj-$(CONFIG_ARM_SDE_INTERFACE)		+= sdei.o
 obj-$(CONFIG_ARM64_PTR_AUTH)		+= pointer_auth.o
 obj-$(CONFIG_ARM64_MTE)			+= mte.o
+obj-$(CONFIG_ARM64_MTE_TAG_STORAGE)	+= mte_tag_storage.o
 obj-y					+= vdso-wrap.o
 obj-$(CONFIG_COMPAT_VDSO)		+= vdso32-wrap.o
 obj-$(CONFIG_UNWIND_PATCH_PAC_INTO_SCS)	+= patch-scs.o
diff --git a/arch/arm64/kernel/mte_tag_storage.c b/arch/arm64/kernel/mte_tag_storage.c
new file mode 100644
index 000000000000..fa6267ef8392
--- /dev/null
+++ b/arch/arm64/kernel/mte_tag_storage.c
@@ -0,0 +1,256 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Support for dynamic tag storage.
+ *
+ * Copyright (C) 2023 ARM Ltd.
+ */
+
+#include <linux/memblock.h>
+#include <linux/mm.h>
+#include <linux/of_device.h>
+#include <linux/of_fdt.h>
+#include <linux/range.h>
+#include <linux/string.h>
+#include <linux/xarray.h>
+
+#include <asm/mte_tag_storage.h>
+
+struct tag_region {
+	struct range mem_range;	/* Memory associated with the tag storage, in PFNs. */
+	struct range tag_range;	/* Tag storage memory, in PFNs. */
+	u32 block_size;		/* Tag block size, in pages. */
+};
+
+#define MAX_TAG_REGIONS	32
+
+static struct tag_region tag_regions[MAX_TAG_REGIONS];
+static int num_tag_regions;
+
+static int __init tag_storage_of_flat_get_range(unsigned long node, const __be32 *reg,
+						int reg_len, struct range *range)
+{
+	int addr_cells = dt_root_addr_cells;
+	int size_cells = dt_root_size_cells;
+	u64 size;
+
+	if (reg_len / 4 > addr_cells + size_cells)
+		return -EINVAL;
+
+	range->start = PHYS_PFN(of_read_number(reg, addr_cells));
+	size = PHYS_PFN(of_read_number(reg + addr_cells, size_cells));
+	if (size == 0) {
+		pr_err("Invalid node");
+		return -EINVAL;
+	}
+	range->end = range->start + size - 1;
+
+	return 0;
+}
+
+static int __init tag_storage_of_flat_get_tag_range(unsigned long node,
+						    struct range *tag_range)
+{
+	const __be32 *reg;
+	int reg_len;
+
+	reg = of_get_flat_dt_prop(node, "reg", &reg_len);
+	if (reg == NULL) {
+		pr_err("Invalid metadata node");
+		return -EINVAL;
+	}
+
+	return tag_storage_of_flat_get_range(node, reg, reg_len, tag_range);
+}
+
+static int __init tag_storage_of_flat_get_memory_range(unsigned long node, struct range *mem)
+{
+	const __be32 *reg;
+	int reg_len;
+
+	reg = of_get_flat_dt_prop(node, "linux,usable-memory", &reg_len);
+	if (reg == NULL)
+		reg = of_get_flat_dt_prop(node, "reg", &reg_len);
+
+	if (reg == NULL) {
+		pr_err("Invalid memory node");
+		return -EINVAL;
+	}
+
+	return tag_storage_of_flat_get_range(node, reg, reg_len, mem);
+}
+
+struct find_memory_node_arg {
+	unsigned long node;
+	u32 phandle;
+};
+
+static int __init fdt_find_memory_node(unsigned long node, const char *uname,
+				       int depth, void *data)
+{
+	const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
+	struct find_memory_node_arg *arg = data;
+
+	if (depth != 1 || !type || strcmp(type, "memory") != 0)
+		return 0;
+
+	if (of_get_flat_dt_phandle(node) == arg->phandle) {
+		arg->node = node;
+		return 1;
+	}
+
+	return 0;
+}
+
+static int __init tag_storage_get_memory_node(unsigned long tag_node, unsigned long *mem_node)
+{
+	struct find_memory_node_arg arg = { 0 };
+	const __be32 *memory_prop;
+	u32 mem_phandle;
+	int ret, reg_len;
+
+	memory_prop = of_get_flat_dt_prop(tag_node, "memory", &reg_len);
+	if (!memory_prop) {
+		pr_err("Missing 'memory' property in the tag storage node");
+		return -EINVAL;
+	}
+
+	mem_phandle = be32_to_cpup(memory_prop);
+	arg.phandle = mem_phandle;
+
+	ret = of_scan_flat_dt(fdt_find_memory_node, &arg);
+	if (ret != 1) {
+		pr_err("Associated memory node not found");
+		return -EINVAL;
+	}
+
+	*mem_node = arg.node;
+
+	return 0;
+}
+
+static int __init tag_storage_of_flat_read_u32(unsigned long node, const char *propname,
+					       u32 *retval)
+{
+	const __be32 *reg;
+
+	reg = of_get_flat_dt_prop(node, propname, NULL);
+	if (!reg)
+		return -EINVAL;
+
+	*retval = be32_to_cpup(reg);
+	return 0;
+}
+
+static u32 __init get_block_size_pages(u32 block_size_bytes)
+{
+	u32 a = PAGE_SIZE;
+	u32 b = block_size_bytes;
+	u32 r;
+
+	/* Find greatest common divisor using the Euclidian algorithm. */
+	do {
+		r = a % b;
+		a = b;
+		b = r;
+	} while (b != 0);
+
+	return PHYS_PFN(PAGE_SIZE * block_size_bytes / a);
+}
+
+static int __init fdt_init_tag_storage(unsigned long node, const char *uname,
+				       int depth, void *data)
+{
+	struct tag_region *region;
+	unsigned long mem_node;
+	struct range *mem_range;
+	struct range *tag_range;
+	u32 block_size_bytes;
+	u32 nid = 0;
+	int ret;
+
+	if (depth != 1 || !strstr(uname, "tag-storage"))
+		return 0;
+
+	if (!of_flat_dt_is_compatible(node, "arm,mte-tag-storage"))
+		return 0;
+
+	if (num_tag_regions == MAX_TAG_REGIONS) {
+		pr_err("Maximum number of tag storage regions exceeded");
+		return -EINVAL;
+	}
+
+	region = &tag_regions[num_tag_regions];
+	mem_range = &region->mem_range;
+	tag_range = &region->tag_range;
+
+	ret = tag_storage_of_flat_get_tag_range(node, tag_range);
+	if (ret) {
+		pr_err("Invalid tag storage node");
+		return ret;
+	}
+
+	ret = tag_storage_get_memory_node(node, &mem_node);
+	if (ret)
+		return ret;
+
+	ret = tag_storage_of_flat_get_memory_range(mem_node, mem_range);
+	if (ret) {
+		pr_err("Invalid address for associated data memory node");
+		return ret;
+	}
+
+	/* The tag region must exactly match the corresponding memory. */
+	if (range_len(tag_range) * 32 != range_len(mem_range)) {
+		pr_err("Tag storage region 0x%llx-0x%llx does not cover the memory region 0x%llx-0x%llx",
+		       PFN_PHYS(tag_range->start), PFN_PHYS(tag_range->end),
+		       PFN_PHYS(mem_range->start), PFN_PHYS(mem_range->end));
+		return -EINVAL;
+	}
+
+	ret = tag_storage_of_flat_read_u32(node, "block-size", &block_size_bytes);
+	if (ret || block_size_bytes == 0) {
+		pr_err("Invalid or missing 'block-size' property");
+		return -EINVAL;
+	}
+	region->block_size = get_block_size_pages(block_size_bytes);
+	if (range_len(tag_range) % region->block_size != 0) {
+		pr_err("Tag storage region size 0x%llx is not a multiple of block size %u",
+		       PFN_PHYS(range_len(tag_range)), region->block_size);
+		return -EINVAL;
+	}
+
+	ret = tag_storage_of_flat_read_u32(mem_node, "numa-node-id", &nid);
+	if (ret)
+		nid = numa_node_id();
+
+	ret = memblock_add_node(PFN_PHYS(tag_range->start), PFN_PHYS(range_len(tag_range)),
+				nid, MEMBLOCK_NONE);
+	if (ret) {
+		pr_err("Error adding tag memblock (%d)", ret);
+		return ret;
+	}
+	memblock_reserve(PFN_PHYS(tag_range->start), PFN_PHYS(range_len(tag_range)));
+
+	pr_info("Found tag storage region 0x%llx-0x%llx, block size %u pages",
+		PFN_PHYS(tag_range->start), PFN_PHYS(tag_range->end), region->block_size);
+
+	num_tag_regions++;
+
+	return 0;
+}
+
+void __init mte_tag_storage_init(void)
+{
+	struct range *tag_range;
+	int i, ret;
+
+	ret = of_scan_flat_dt(fdt_init_tag_storage, NULL);
+	if (ret) {
+		for (i = 0; i < num_tag_regions; i++) {
+			tag_range = &tag_regions[i].tag_range;
+			memblock_remove(PFN_PHYS(tag_range->start), PFN_PHYS(range_len(tag_range)));
+		}
+		num_tag_regions = 0;
+		pr_info("MTE tag storage region management disabled");
+	}
+}
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 417a8a86b2db..1b77138c1aa5 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -42,6 +42,7 @@ 
 #include <asm/cpufeature.h>
 #include <asm/cpu_ops.h>
 #include <asm/kasan.h>
+#include <asm/mte_tag_storage.h>
 #include <asm/numa.h>
 #include <asm/scs.h>
 #include <asm/sections.h>
@@ -342,6 +343,12 @@  void __init __no_sanitize_address setup_arch(char **cmdline_p)
 			   FW_BUG "Booted with MMU enabled!");
 	}
 
+	/*
+	 * Must be called before memory limits are enforced by
+	 * arm64_memblock_init().
+	 */
+	mte_tag_storage_init();
+
 	arm64_memblock_init();
 
 	paging_init();