Message ID | 20220711122459.13773-5-me@linux.beauty (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | add struct page and Direct I/O support to reserved memory | expand |
On Mon, Jul 11, 2022 at 2:24 PM Li Chen <me@linux.beauty> wrote: > > From: Li Chen <lchen@ambarella.com> > > This sample driver shows how to build struct pages support to no-map rmem. > > Signed-off-by: Li Chen <lchen@ambarella.com> Not sure what a sample driver helps here if there are no actual users in-tree. It would make more sense to merge the driver that wants to actually use this first, and then add the additional feature. > Change-Id: Ie78494fa86fda40ceb73eab3b8ba505d0ad851a1 Please drop these lines, the Change-Id fields are useless in a public repository. > +/* > + * dts example > + * rmem: rmem@1 { > + * compatible = "shared-dma-pool"; > + * no-map; > + * size = <0x0 0x20000000>; > + * }; > + * perf { > + * compatible = "example,rmem"; > + * memory-region = <&rmem>; > + * }; The problem here is that the DT is meant to describe the platform in an OS independent way, so having a binding that just corresponds to a user space interface is not a good abstraction. > + vaddr = reserved_mem_memremap_pages(dev, rmem); > + if (IS_ERR_OR_NULL(vaddr)) > + return PTR_ERR(vaddr); Using IS_ERR_OR_NULL() is usually an indication of a bad interface. For the reserved_mem_memremap_pages(), you should decide whether to return NULL on error or an error pointer, but not both. Arnd
Hi Arnd, ---- On Mon, 11 Jul 2022 21:28:10 +0800 Arnd Bergmann <arnd@arndb.de> wrote --- > On Mon, Jul 11, 2022 at 2:24 PM Li Chen <me@linux.beauty> wrote: > > > > From: Li Chen <lchen@ambarella.com> > > > > This sample driver shows how to build struct pages support to no-map rmem. > > > > Signed-off-by: Li Chen <lchen@ambarella.com> > > Not sure what a sample driver helps here if there are no actual users in-tree. > > It would make more sense to merge the driver that wants to actually use this > first, and then add the additional feature. Totally agree, but we plan to start rewriting our video driver in a long time, it has many legacy codes and I need to rewrite a lot of codes to migrate to v4l2. That's why I also submit a sample driver here: to make the review progress easier and don't need reviewers to read video driver codes. > > +/* > > + * dts example > > + * rmem: rmem@1 { > > + * compatible = "shared-dma-pool"; > > + * no-map; > > + * size = <0x0 0x20000000>; > > + * }; > > + * perf { > > + * compatible = "example,rmem"; > > + * memory-region = <&rmem>; > > + * }; > > The problem here is that the DT is meant to describe the platform in an OS > independent way, so having a binding that just corresponds to a user space > interface is not a good abstraction. Gotcha, but IMO dts + rmem is the only choice for our use case. In our real case, we use reg instead of size to specify the physical address, so memremap cannot be used. > > > + vaddr = reserved_mem_memremap_pages(dev, rmem); > > + if (IS_ERR_OR_NULL(vaddr)) > > + return PTR_ERR(vaddr); > > Using IS_ERR_OR_NULL() is usually an indication of a bad interface. > > For the reserved_mem_memremap_pages(), you should decide whether to return > NULL on error or an error pointer, but not both. Thanks, will fix in v2. > > Arnd >
On Tue, Jul 12, 2022 at 2:26 AM Li Chen <me@linux.beauty> wrote: > ---- On Mon, 11 Jul 2022 21:28:10 +0800 Arnd Bergmann <arnd@arndb.de> wrote --- > > On Mon, Jul 11, 2022 at 2:24 PM Li Chen <me@linux.beauty> wrote: > > > > > > From: Li Chen <lchen@ambarella.com> > > > > > > This sample driver shows how to build struct pages support to no-map rmem. > > > > > > Signed-off-by: Li Chen <lchen@ambarella.com> > > > > Not sure what a sample driver helps here if there are no actual users in-tree. > > > > It would make more sense to merge the driver that wants to actually use this > > first, and then add the additional feature. > > Totally agree, but we plan to start rewriting our video driver in a long time, it > has many legacy codes and I need to rewrite a lot of codes to migrate to v4l2. > That's why I also submit a sample driver here: to make the review progress > easier and don't need reviewers to read video driver codes. The problem is that this patch may not be the right solution for your new driver either. As Christoph also commented, what you do here is rather unusual, and without seeing the video driver first, we have no way of knowing whether there is something the driver should be doing differently to solve the original problem. > > > +/* > > > + * dts example > > > + * rmem: rmem@1 { > > > + * compatible = "shared-dma-pool"; > > > + * no-map; > > > + * size = <0x0 0x20000000>; > > > + * }; > > > + * perf { > > > + * compatible = "example,rmem"; > > > + * memory-region = <&rmem>; > > > + * }; > > > > The problem here is that the DT is meant to describe the platform in an OS > > independent way, so having a binding that just corresponds to a user space > > interface is not a good abstraction. > > Gotcha, but IMO dts + rmem is the only choice for our use case. In our real > case, we use reg instead of size to specify the physical address, so > memremap cannot be used. Does your hardware require a fixed address for the buffer? If it can be anywhere in memory (or at least within a certain range) but just has to be physically contiguous, the normal way would be to use a CMA area to allocate from, which gives you 'struct page' backed pages. Arnd
Hi Arnd, ---- On Tue, 12 Jul 2022 15:50:46 +0800 Arnd Bergmann <arnd@arndb.de> wrote --- > On Tue, Jul 12, 2022 at 2:26 AM Li Chen <me@linux.beauty> wrote: > > ---- On Mon, 11 Jul 2022 21:28:10 +0800 Arnd Bergmann <arnd@arndb.de> wrote --- > > > On Mon, Jul 11, 2022 at 2:24 PM Li Chen <me@linux.beauty> wrote: > > > > > > > > From: Li Chen <lchen@ambarella.com> > > > > > > > > This sample driver shows how to build struct pages support to no-map rmem. > > > > > > > > Signed-off-by: Li Chen <lchen@ambarella.com> > > > > > > Not sure what a sample driver helps here if there are no actual users in-tree. > > > > > > It would make more sense to merge the driver that wants to actually use this > > > first, and then add the additional feature. > > > > Totally agree, but we plan to start rewriting our video driver in a long time, it > > has many legacy codes and I need to rewrite a lot of codes to migrate to v4l2. > > That's why I also submit a sample driver here: to make the review progress > > easier and don't need reviewers to read video driver codes. > > The problem is that this patch may not be the right solution for your new > driver either. As Christoph also commented, what you do here is rather > unusual, and without seeing the video driver first, we have no way of > knowing whether there is something the driver should be doing > differently to solve the original problem. Ok, I will update the patch series after rewriting and upstreaming our video driver. > > > > > +/* > > > > + * dts example > > > > + * rmem: rmem@1 { > > > > + * compatible = "shared-dma-pool"; > > > > + * no-map; > > > > + * size = <0x0 0x20000000>; > > > > + * }; > > > > + * perf { > > > > + * compatible = "example,rmem"; > > > > + * memory-region = <&rmem>; > > > > + * }; > > > > > > The problem here is that the DT is meant to describe the platform in an OS > > > independent way, so having a binding that just corresponds to a user space > > > interface is not a good abstraction. > > > > Gotcha, but IMO dts + rmem is the only choice for our use case. In our real > > case, we use reg instead of size to specify the physical address, so > > memremap cannot be used. > > Does your hardware require a fixed address for the buffer? If it can be > anywhere in memory (or at least within a certain range) but just has to > be physically contiguous, the normal way would be to use a CMA area > to allocate from, which gives you 'struct page' backed pages. The limitation is our DSP can only access 32bit memory, but total dram is > 4G, so I cannot use "size = <...>" in our real case (it might get memory above 4G). I'm not sure if other vendors' DSP also has this limitation, if so, how do they deal with it if throughput matters. Regards, Li
On Tue, Jul 12, 2022 at 11:58 AM Li Chen <me@linux.beauty> wrote: > > On Tue, Jul 12, 2022 at 2:26 AM Li Chen <me@linux.beauty> wrote: > > > ---- On Mon, 11 Jul 2022 21:28:10 +0800 Arnd Bergmann <arnd@arndb.de> wrote --- > > > > On Mon, Jul 11, 2022 at 2:24 PM Li Chen <me@linux.beauty> wrote: > > > > The problem here is that the DT is meant to describe the platform in an OS > > > > independent way, so having a binding that just corresponds to a user space > > > > interface is not a good abstraction. > > > > > > Gotcha, but IMO dts + rmem is the only choice for our use case. In our real > > > case, we use reg instead of size to specify the physical address, so > > > memremap cannot be used. > > > > Does your hardware require a fixed address for the buffer? If it can be > > anywhere in memory (or at least within a certain range) but just has to > > be physically contiguous, the normal way would be to use a CMA area > > to allocate from, which gives you 'struct page' backed pages. > > The limitation is our DSP can only access 32bit memory, but total dram is > 4G, so I cannot use > "size = <...>" in our real case (it might get memory above 4G). I'm not sure if other vendors' DSP also has > this limitation, if so, how do they deal with it if throughput matters. This is a common limitation that gets handled automatically by setting the dma_mask of the device through the dma-ranges property in DT. When the driver does dma_alloc_coherent() or similar to gets its buffer, it will then allocate pages below this boundary. If you need a large contiguous memory area, then using CMA allows you to specify a region of memory that is kept reserved for DMA allocations, so a call to dma_alloc_coherent() on your device will get contiguous pages from that area, and move other data in those pages elsewhere if necessary. non-movable data is allocated from pages outside of the CMA reserved area in this case. Arnd
On Tue, Jul 12, 2022 at 12:08 PM Arnd Bergmann <arnd@arndb.de> wrote: > On Tue, Jul 12, 2022 at 11:58 AM Li Chen <me@linux.beauty> wrote: > > > > The limitation is our DSP can only access 32bit memory, but total dram is > 4G, so I cannot use > > "size = <...>" in our real case (it might get memory above 4G). I'm not sure if other vendors' DSP also has > > this limitation, if so, how do they deal with it if throughput matters. > > This is a common limitation that gets handled automatically by setting > the dma_mask of the device through the dma-ranges property in DT. > When the driver does dma_alloc_coherent() or similar to gets its buffer, > it will then allocate pages below this boundary. To clarify: in the DT, you can either add a 'alloc-ranges' property to limit where the CMA area gets allocated, or use a 'reg' property instead of the 'size' property to force a specific address. I would expect that in either case, you get the type of memory area you want (a reserved set of addresses with page structures that you can use for contiguous allocations within the first 4GB), but it's possible that I'm missing some more details here. Arnd
Hi Arnd, ---- On Tue, 12 Jul 2022 18:08:10 +0800 Arnd Bergmann <arnd@arndb.de> wrote --- > On Tue, Jul 12, 2022 at 11:58 AM Li Chen <me@linux.beauty> wrote: > > > On Tue, Jul 12, 2022 at 2:26 AM Li Chen <me@linux.beauty> wrote: > > > > ---- On Mon, 11 Jul 2022 21:28:10 +0800 Arnd Bergmann <arnd@arndb.de> wrote --- > > > > > On Mon, Jul 11, 2022 at 2:24 PM Li Chen <me@linux.beauty> wrote: > > > > > The problem here is that the DT is meant to describe the platform in an OS > > > > > independent way, so having a binding that just corresponds to a user space > > > > > interface is not a good abstraction. > > > > > > > > Gotcha, but IMO dts + rmem is the only choice for our use case. In our real > > > > case, we use reg instead of size to specify the physical address, so > > > > memremap cannot be used. > > > > > > Does your hardware require a fixed address for the buffer? If it can be > > > anywhere in memory (or at least within a certain range) but just has to > > > be physically contiguous, the normal way would be to use a CMA area > > > to allocate from, which gives you 'struct page' backed pages. > > > > The limitation is our DSP can only access 32bit memory, but total dram is > 4G, so I cannot use > > "size = <...>" in our real case (it might get memory above 4G). I'm not sure if other vendors' DSP also has > > this limitation, if so, how do they deal with it if throughput matters. > > This is a common limitation that gets handled automatically by setting > the dma_mask of the device through the dma-ranges property in DT. > When the driver does dma_alloc_coherent() or similar to gets its buffer, > it will then allocate pages below this boundary. Thanks for the tip! I wasn't aware that dma-ranges can be used by devices other than PCIe controllers. > If you need a large contiguous memory area, then using CMA allows > you to specify a region of memory that is kept reserved for DMA > allocations, so a call to dma_alloc_coherent() on your device will > get contiguous pages from that area, and move other data in those > pages elsewhere if necessary. non-movable data is allocated from > pages outside of the CMA reserved area in this case. We need a large memory pool, around 2G. I will try CMA and dma-ranges later! Regards, Li
On Tue, Jul 12, 2022 at 12:55 PM Li Chen <me@linux.beauty> wrote: > > > > This is a common limitation that gets handled automatically by setting > > the dma_mask of the device through the dma-ranges property in DT. > > When the driver does dma_alloc_coherent() or similar to gets its buffer, > > it will then allocate pages below this boundary. > > Thanks for the tip! I wasn't aware that dma-ranges can be used by devices other than PCIe controllers. You should actually have dma-ranges listed in the parent of any DMA master capable device, to list the exact DMA capabilities. Without this, devices fall back to the default 32-bit address limit, which would be correct for your video device but is often wrong in other devices. Arnd
Hi Arnd, ---- On Tue, 12 Jul 2022 16:50:46 +0900 Arnd Bergmann wrote --- > Does your hardware require a fixed address for the buffer? If it can be > anywhere in memory (or at least within a certain range) but just has to > be physically contiguous, the normal way would be to use a CMA area > to allocate from, which gives you 'struct page' backed pages. CMA does support Direct I/O, but it has its own issue: It does not guarantee that the memory previously borrowed by the OS will be returned to the device. We've been plagued by examples like this in the past: Many other kernel modules/subsystems have already allocated much memory from both non-CMA and CMA memory, When our DSP driver got probed then, cma_alloc will fail in that non-CMA system memory is not enough for CMA memory to migrate. Regards, Li
On Thu, Aug 4, 2022 at 9:17 AM Li Chen <me@linux.beauty> wrote: > ---- On Tue, 12 Jul 2022 16:50:46 +0900 Arnd Bergmann wrote --- > > Does your hardware require a fixed address for the buffer? If it can be > > anywhere in memory (or at least within a certain range) but just has to > > be physically contiguous, the normal way would be to use a CMA area > > to allocate from, which gives you 'struct page' backed pages. > > CMA does support Direct I/O, but it has its own issue: > It does not guarantee that the memory previously borrowed by the OS will be returned to the device. > > We've been plagued by examples like this in the past: > Many other kernel modules/subsystems have already allocated much memory from both non-CMA and CMA memory, > When our DSP driver got probed then, cma_alloc will fail in that non-CMA system memory is not enough > for CMA memory to migrate. This part should at least be possible to solve by declaring the amount and location of CMA areas in the right way. It's not great to fine-tune the DT for a particular kernel's use, but if you know which other drivers require CMA type allocations you can find a lower bound that should suffice. Most coherent allocations tend to be long-lived and only for very small memory regions. If you have another driver that uses large or periodic dma_alloc_coherent() type allocations, you can consider either giving that device its own CMA area, or fixing it to use streaming mappings. Arnd
---- On Thu, 04 Aug 2022 17:24:20 +0900 Arnd Bergmann wrote --- > On Thu, Aug 4, 2022 at 9:17 AM Li Chen me@linux.beauty> wrote: > > ---- On Tue, 12 Jul 2022 16:50:46 +0900 Arnd Bergmann wrote --- > > > Does your hardware require a fixed address for the buffer? If it can be > > > anywhere in memory (or at least within a certain range) but just has to > > > be physically contiguous, the normal way would be to use a CMA area > > > to allocate from, which gives you 'struct page' backed pages. > > > > CMA does support Direct I/O, but it has its own issue: > > It does not guarantee that the memory previously borrowed by the OS will be returned to the device. > > > > We've been plagued by examples like this in the past: > > Many other kernel modules/subsystems have already allocated much memory from both non-CMA and CMA memory, > > When our DSP driver got probed then, cma_alloc will fail in that non-CMA system memory is not enough > > for CMA memory to migrate. > > This part should at least be possible to solve by declaring the amount > and location of > CMA areas in the right way. It's not great to fine-tune the DT for a > particular kernel's > use, but if you know which other drivers require CMA type allocations > you can find a lower > bound that should suffice. That's the problem, haha. End users(customers) may modprobe many other modules and modprobe our driver in the end. We cannot decide the probe order for end users. Apart from our cases, I heard there are some other cases where cma_alloc failed even non-cma system memory has enough memory because pages in CMA memory are pinned and cannot move out of CMA. There are some fixes like 1. move these memory out of CMA before pinned 2. only allow non-long-time pinned memory allocation from CMA. But these two solutions are not merged into the mainline yet. > > Most coherent allocations tend to be long-lived and only for very > small memory regions. > If you have another driver that uses large or periodic > dma_alloc_coherent() type allocations, > you can consider either giving that device its own CMA area, or fixing > it to use streaming > mappings. Device-wise CMA also suffers from the problems I mentioned, other modules/subsystems may have already alloc from this CMA area and refuse to return it back. Regards, Li
On Thu, Aug 4, 2022 at 12:07 PM Li Chen <me@linux.beauty> wrote: > Apart from our cases, I heard there are some other cases where cma_alloc > failed even non-cma system memory has enough memory because pages in > CMA memory are pinned and cannot move out of CMA. There are some fixes like > 1. move these memory out of CMA before pinned > 2. only allow non-long-time pinned memory allocation from CMA. > > But these two solutions are not merged into the mainline yet. Right, I think this has come up before, not sure why it wasn't implemented. My feeling is that 2. cannot work because you don't know if memory will be pinned in the future at the time of allocation, but 1. should be doable. A possible alternative here would be to avoid the pinning. In most workloads it should not be possible to pin a large number of pages, but I assume there is a good reason to do so here. Arnd
On 05.08.22 16:09, Arnd Bergmann wrote: > On Thu, Aug 4, 2022 at 12:07 PM Li Chen <me@linux.beauty> wrote: > >> Apart from our cases, I heard there are some other cases where cma_alloc >> failed even non-cma system memory has enough memory because pages in >> CMA memory are pinned and cannot move out of CMA. There are some fixes like >> 1. move these memory out of CMA before pinned >> 2. only allow non-long-time pinned memory allocation from CMA. >> >> But these two solutions are not merged into the mainline yet. > > Right, I think this has come up before, not sure why it wasn't implemented. > My feeling is that 2. cannot work because you don't know if memory will be > pinned in the future at the time of allocation, but 1. should be doable. We disallow longterm pinning of CMA memory already and migrate it out of the CMA region. If migration fails, we reject pinning. See 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region") and recent 1c563432588d ("mm: fix is_pinnable_page against a cma page") It's worth nothing that is_pinnable_page() will be renamed to is_longterm_pinnable_page() soon to express what it actually means. Note that some FOLL_GET users (vmsplice, O_DIRECT) still have to be converted to FOLL_PIN, and especially also set FOLL_LONGTERM (vmsplice).
diff --git a/samples/Kconfig b/samples/Kconfig index b0503ef058d3..d83ba02ec215 100644 --- a/samples/Kconfig +++ b/samples/Kconfig @@ -6,6 +6,13 @@ menuconfig SAMPLES if SAMPLES +config RESERVED_MEM_DIO + tristate "Build example reserved mem dio support" + depends on OF_RESERVED_MEM_DIO_SUPPORT + help + Build kernel space sample module to show how to add struct + page and dio support to reserved memory. + config SAMPLE_AUXDISPLAY bool "auxdisplay sample" depends on CC_CAN_LINK diff --git a/samples/Makefile b/samples/Makefile index 087e0988ccc5..106a386869eb 100644 --- a/samples/Makefile +++ b/samples/Makefile @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 # Makefile for Linux samples code +obj-$(CONFIG_RESERVED_MEM_DIO) += reserved_mem/ subdir-$(CONFIG_SAMPLE_AUXDISPLAY) += auxdisplay subdir-$(CONFIG_SAMPLE_ANDROID_BINDERFS) += binderfs obj-$(CONFIG_SAMPLE_CONFIGFS) += configfs/ diff --git a/samples/reserved_mem/Makefile b/samples/reserved_mem/Makefile new file mode 100755 index 000000000000..a4f5c8bc08dc --- /dev/null +++ b/samples/reserved_mem/Makefile @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0-only +obj-$(CONFIG_RESERVED_MEM_DIO) += rmem_dio.o diff --git a/samples/reserved_mem/rmem_dio.c b/samples/reserved_mem/rmem_dio.c new file mode 100755 index 000000000000..ffb3395de63c --- /dev/null +++ b/samples/reserved_mem/rmem_dio.c @@ -0,0 +1,116 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Sample driver for reserved memory with struct page and dio support. + * + * wsps is abbr for with struct page support + * + * Copyright (C) 2022 Li Chen <lchen@ambarella.com> + */ + +#include <linux/module.h> +#include <linux/init.h> +#include <linux/kernel.h> +#include <linux/fs.h> +#include <linux/errno.h> +#include <linux/types.h> +#include <linux/mm.h> +#include <linux/cdev.h> +#include <linux/module.h> +#include <linux/io.h> +#include <linux/miscdevice.h> +#include <linux/of_reserved_mem.h> +#include <linux/platform_device.h> + +/* + * dts example + * rmem: rmem@1 { + * compatible = "shared-dma-pool"; + * no-map; + * size = <0x0 0x20000000>; + * }; + * perf { + * compatible = "example,rmem"; + * memory-region = <&rmem>; + * }; + */ + +static struct reserved_mem *rmem; + +static int rmem_wsps_open(struct inode *inode, struct file *filp) +{ + return 0; +} + +static int rmem_wsps_release(struct inode *inode, + struct file *filp) +{ + return 0; +} + +static int rmem_wsps_mmap(struct file *file, + struct vm_area_struct *vma) +{ + return reserved_mem_dio_mmap(file, vma, rmem); +} + +static const struct file_operations rmem_wsps_remap_ops = { + .owner = THIS_MODULE, + .open = rmem_wsps_open, + .release = rmem_wsps_release, + .mmap = rmem_wsps_mmap, +}; + +static struct miscdevice rmem_wsps_miscdev = { + .minor = MISC_DYNAMIC_MINOR, + .name = "rmem_wsps", + .fops = &rmem_wsps_remap_ops, +}; + +static int rmem_wsps_probe(struct platform_device *pdev) +{ + struct device *dev = &pdev->dev; + void *vaddr; + int ret; + + ret = misc_register(&rmem_wsps_miscdev); + if (ret) { + dev_err(dev, "%s: misc_register failed, %d\n", __func__, ret); + return -ENODEV; + } + + rmem = get_reserved_mem_from_dev(dev); + if (!rmem) { + dev_err(dev, "%s: failed to find reserved\n", __func__); + return -ENODEV; + } + + vaddr = reserved_mem_memremap_pages(dev, rmem); + if (IS_ERR_OR_NULL(vaddr)) + return PTR_ERR(vaddr); + + return 0; +} + +static const struct of_device_id rmem_dio_match[] = { + { + .compatible = "example,rmem", + }, + { /* sentinel */ }, +}; +MODULE_DEVICE_TABLE(of, rmem_dio_match); + +static struct platform_driver rmem_wsps_driver = { + .probe = rmem_wsps_probe, + .driver = { + .name = "rmem_wsps", + .of_match_table = rmem_dio_match, + }, +}; + +static int __init rmem_wsps_init(void) +{ + return platform_driver_register(&rmem_wsps_driver); +} + +module_init(rmem_wsps_init); +MODULE_LICENSE("GPL");