Message ID | 20231109074545.148149-2-contact@emersion.fr (mailing list archive)
---|---
State | New, archived
Series | [RFC,1/2] dma-buf/dma-heap: export dma_heap_add and dma_heap_get_drvdata
Hi Simon, On Thu, Nov 09, 2023 at 07:45:58AM +0000, Simon Ser wrote: > User-space sometimes needs to allocate scanout-capable memory for > GPU rendering purposes. On a vc4/v3d split render/display SoC, this > is achieved via DRM dumb buffers: the v3d user-space driver opens > the primary vc4 node, allocates a DRM dumb buffer there, exports it > as a DMA-BUF, imports it into the v3d render node, and renders to it. > > However, DRM dumb buffers are only meant for CPU rendering, they are > not intended to be used for GPU rendering. Primary nodes should only > be used for mode-setting purposes, other programs should not attempt > to open it. Moreover, opening the primary node is already broken on > some setups: systemd grants permission to open primary nodes to > physically logged in users, but this breaks when the user is not > physically logged in (e.g. headless setup) and when the distribution > is using a different init (e.g. Alpine Linux uses openrc). > > We need an alternate way for v3d to allocate scanout-capable memory. > Leverage DMA heaps for this purpose: expose a CMA heap to user-space. > Preliminary testing has been done with wlroots [1]. > > This is still an RFC. Open questions: > > - Does this approach make sense to y'all in general? Makes sense to me :) > - What would be a good name for the heap? "vc4" is maybe a bit naive and > not precise enough. Something with "cma"? Do we need to plan a naming > scheme to accomodate for multiple vc4 devices? That's a general issue though that happens with pretty much all devices with a separate node for modesetting and rendering, so I don't think addressing it only for vc4 make sense, we should make it generic. So maybe something like "scanout"? One thing we need to consider too is that the Pi5 will have multiple display nodes (4(?) iirc) with different capabilities, vc4 being only one of them. This will impact that solution too. > - Right now root privileges are necessary to open the heap. Should we > allow everybody to open the heap by default (after all, user-space can > already allocate arbitrary amounts of GPU memory)? Should we leave it > up to user-space to solve this issue (via logind/seatd or a Wayland > protocol or something else)? I would have expected a udev rule to handle that? > TODO: > > - Need to add !vc5 support. If by !vc5 you mean RPi0-3, then it's probably not needed there at all since it has a single node for both modesetting and rendering? Maxime
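For reference, the dumb-buffer dance the commit message describes boils down to roughly the following. This is an untested sketch; the device paths, dimensions, and function name are assumptions, and error handling is trimmed.

```c
/*
 * Rough sketch of the current workaround: allocate a dumb buffer on the vc4
 * primary node, export it as a dma-buf, import it into the v3d render node.
 */
#include <fcntl.h>
#include <stdint.h>
#include <xf86drm.h>

static int alloc_scanout_via_dumb(uint32_t width, uint32_t height)
{
	int card = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);        /* vc4 primary node */
	int render = open("/dev/dri/renderD128", O_RDWR | O_CLOEXEC); /* v3d render node */
	struct drm_mode_create_dumb create = {
		.width = width,
		.height = height,
		.bpp = 32,
	};
	int dmabuf_fd = -1;
	uint32_t v3d_handle = 0;

	/* Allocate CPU-rendering-oriented memory on the display device... */
	drmIoctl(card, DRM_IOCTL_MODE_CREATE_DUMB, &create);
	/* ...export it as a dma-buf... */
	drmPrimeHandleToFD(card, create.handle, DRM_CLOEXEC, &dmabuf_fd);
	/* ...and import it into the render node so the GPU can draw to it. */
	drmPrimeFDToHandle(render, dmabuf_fd, &v3d_handle);

	return dmabuf_fd;
}
```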
Hi Simon, thanks for looking into this! A few comments below, inline... On Thu, 09-11-2023 at 07:45 +0000, Simon Ser wrote: > User-space sometimes needs to allocate scanout-capable memory for > GPU rendering purposes. On a vc4/v3d split render/display SoC, this > is achieved via DRM dumb buffers: the v3d user-space driver opens > the primary vc4 node, allocates a DRM dumb buffer there, exports it > as a DMA-BUF, imports it into the v3d render node, and renders to it. > > However, DRM dumb buffers are only meant for CPU rendering, they are > not intended to be used for GPU rendering. Primary nodes should only > be used for mode-setting purposes, other programs should not attempt > to open it. Moreover, opening the primary node is already broken on > some setups: systemd grants permission to open primary nodes to > physically logged in users, but this breaks when the user is not > physically logged in (e.g. headless setup) and when the distribution > is using a different init (e.g. Alpine Linux uses openrc). > > We need an alternate way for v3d to allocate scanout-capable memory. > Leverage DMA heaps for this purpose: expose a CMA heap to user-space. > Preliminary testing has been done with wlroots [1]. > > This is still an RFC. Open questions: I think Maxime is the right person to answer about vc4 but I'll answer a few of these questions from my perspective: > > - Does this approach make sense to y'all in general? Yeah, this sounds sensible to me. > - What would be a good name for the heap? "vc4" is maybe a bit naive > and > not precise enough. Something with "cma"? Do we need to plan a > naming > scheme to accomodate for multiple vc4 devices? > - Right now root privileges are necessary to open the heap. Should we > allow everybody to open the heap by default (after all, user-space > can > already allocate arbitrary amounts of GPU memory)? Should we leave > it > up to user-space to solve this issue (via logind/seatd or a Wayland > protocol or something else)? I think so, yes; after all, the main user of this will be the Mesa driver, so root-only would not be great. > > TODO: > > - Need to add !vc5 support. I believe that would be models before Raspberry Pi 4, which don't have v3d, so maybe we don't need this there? Iago > - Need to the extend DMA heaps API to allow vc4 to unregister the > heap > on unload. 
Hi Simon, Thanks for working on this feature! On 11/9/23 04:45, Simon Ser wrote: > User-space sometimes needs to allocate scanout-capable memory for > GPU rendering purposes. On a vc4/v3d split render/display SoC, this > is achieved via DRM dumb buffers: the v3d user-space driver opens > the primary vc4 node, allocates a DRM dumb buffer there, exports it > as a DMA-BUF, imports it into the v3d render node, and renders to it. > > However, DRM dumb buffers are only meant for CPU rendering, they are > not intended to be used for GPU rendering. Primary nodes should only > be used for mode-setting purposes, other programs should not attempt > to open it. Moreover, opening the primary node is already broken on > some setups: systemd grants permission to open primary nodes to > physically logged in users, but this breaks when the user is not > physically logged in (e.g. headless setup) and when the distribution > is using a different init (e.g. Alpine Linux uses openrc). > > We need an alternate way for v3d to allocate scanout-capable memory. For me, it is a bit unclear how we import the vc4 DMA-BUF heap into v3d. Should we create an IOCTL for it on v3d? Also, if you need some help testing with the RPi 5, you can ping on IRC and I can try to help by testing. Best Regards, - Maíra > Leverage DMA heaps for this purpose: expose a CMA heap to user-space. > Preliminary testing has been done with wlroots [1]. > > This is still an RFC. Open questions: > > - Does this approach make sense to y'all in general? > - What would be a good name for the heap? "vc4" is maybe a bit naive and > not precise enough. Something with "cma"? Do we need to plan a naming > scheme to accomodate for multiple vc4 devices? > - Right now root privileges are necessary to open the heap. Should we > allow everybody to open the heap by default (after all, user-space can > already allocate arbitrary amounts of GPU memory)? Should we leave it > up to user-space to solve this issue (via logind/seatd or a Wayland > protocol or something else)? > > TODO: > > - Need to add !vc5 support. > - Need to the extend DMA heaps API to allow vc4 to unregister the heap > on unload. 
> > [1]: https://gitlab.freedesktop.org/wlroots/wlroots/-/merge_requests/4414 > > Signed-off-by: Simon Ser <contact@emersion.fr> > Cc: Maxime Ripard <mripard@kernel.org> > Cc: Daniel Vetter <daniel@ffwll.ch> > Cc: Daniel Stone <daniel@fooishbar.org> > Cc: Erico Nunes <nunes.erico@gmail.com> > Cc: Iago Toral Quiroga <itoral@igalia.com> > Cc: Maíra Canal <mcanal@igalia.com> > Cc: Thomas Zimmermann <tzimmermann@suse.de> > --- > drivers/gpu/drm/vc4/Makefile | 2 ++ > drivers/gpu/drm/vc4/vc4_dma_heap.c | 51 ++++++++++++++++++++++++++++++ > drivers/gpu/drm/vc4/vc4_drv.c | 6 ++++ > drivers/gpu/drm/vc4/vc4_drv.h | 5 +++ > 4 files changed, 64 insertions(+) > create mode 100644 drivers/gpu/drm/vc4/vc4_dma_heap.c > > diff --git a/drivers/gpu/drm/vc4/Makefile b/drivers/gpu/drm/vc4/Makefile > index c41f89a15a55..e4048870cec7 100644 > --- a/drivers/gpu/drm/vc4/Makefile > +++ b/drivers/gpu/drm/vc4/Makefile > @@ -34,4 +34,6 @@ vc4-$(CONFIG_DRM_VC4_KUNIT_TEST) += \ > > vc4-$(CONFIG_DEBUG_FS) += vc4_debugfs.o > > +vc4-$(CONFIG_DMABUF_HEAPS) += vc4_dma_heap.o > + > obj-$(CONFIG_DRM_VC4) += vc4.o > diff --git a/drivers/gpu/drm/vc4/vc4_dma_heap.c b/drivers/gpu/drm/vc4/vc4_dma_heap.c > new file mode 100644 > index 000000000000..010d0a88f3fa > --- /dev/null > +++ b/drivers/gpu/drm/vc4/vc4_dma_heap.c > @@ -0,0 +1,51 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* > + * Copyright © 2023 Simon Ser > + */ > + > +#include <linux/dma-buf.h> > +#include <linux/dma-heap.h> > + > +#include "vc4_drv.h" > + > +static struct dma_buf *vc4_dma_heap_allocate(struct dma_heap *heap, > + unsigned long size, > + unsigned long fd_flags, > + unsigned long heap_flags) > +{ > + struct vc4_dev *vc4 = dma_heap_get_drvdata(heap); > + //DEFINE_DMA_BUF_EXPORT_INFO(exp_info); > + struct drm_gem_dma_object *dma_obj; > + struct dma_buf *dmabuf; > + > + if (WARN_ON_ONCE(!vc4->is_vc5)) > + return ERR_PTR(-ENODEV); > + > + dma_obj = drm_gem_dma_create(&vc4->base, size); > + if (IS_ERR(dma_obj)) > + return ERR_CAST(dma_obj); > + > + dmabuf = drm_gem_prime_export(&dma_obj->base, fd_flags); > + drm_gem_object_put(&dma_obj->base); > + return dmabuf; > +} > + > +static const struct dma_heap_ops vc4_dma_heap_ops = { > + .allocate = vc4_dma_heap_allocate, > +}; > + > +int vc4_dma_heap_create(struct vc4_dev *vc4) > +{ > + struct dma_heap_export_info exp_info; > + struct dma_heap *heap; > + > + exp_info.name = "vc4"; /* TODO: allow multiple? 
*/ > + exp_info.ops = &vc4_dma_heap_ops; > + exp_info.priv = vc4; /* TODO: unregister when unloading */ > + > + heap = dma_heap_add(&exp_info); > + if (IS_ERR(heap)) > + return PTR_ERR(heap); > + > + return 0; > +} > diff --git a/drivers/gpu/drm/vc4/vc4_drv.c b/drivers/gpu/drm/vc4/vc4_drv.c > index c133e96b8aca..c7297dd7d9d5 100644 > --- a/drivers/gpu/drm/vc4/vc4_drv.c > +++ b/drivers/gpu/drm/vc4/vc4_drv.c > @@ -391,6 +391,12 @@ static int vc4_drm_bind(struct device *dev) > > drm_fbdev_dma_setup(drm, 16); > > +#if IS_ENABLED(CONFIG_DMABUF_HEAPS) > + ret = vc4_dma_heap_create(vc4); > + if (ret) > + goto err; > +#endif > + > return 0; > > err: > diff --git a/drivers/gpu/drm/vc4/vc4_drv.h b/drivers/gpu/drm/vc4/vc4_drv.h > index ab61e96e7e14..d5c5dd18815c 100644 > --- a/drivers/gpu/drm/vc4/vc4_drv.h > +++ b/drivers/gpu/drm/vc4/vc4_drv.h > @@ -1078,4 +1078,9 @@ int vc4_perfmon_destroy_ioctl(struct drm_device *dev, void *data, > int vc4_perfmon_get_values_ioctl(struct drm_device *dev, void *data, > struct drm_file *file_priv); > > +/* vc4_dma_heap.c */ > +#if IS_ENABLED(CONFIG_DMABUF_HEAPS) > +int vc4_dma_heap_create(struct vc4_dev *vc4); > +#endif > + > #endif /* _VC4_DRV_H_ */
Hi Maira, On Thu, Nov 09, 2023 at 09:14:29AM -0300, Maira Canal wrote: > On 11/9/23 04:45, Simon Ser wrote: > > User-space sometimes needs to allocate scanout-capable memory for > > GPU rendering purposes. On a vc4/v3d split render/display SoC, this > > is achieved via DRM dumb buffers: the v3d user-space driver opens > > the primary vc4 node, allocates a DRM dumb buffer there, exports it > > as a DMA-BUF, imports it into the v3d render node, and renders to it. > > > > However, DRM dumb buffers are only meant for CPU rendering, they are > > not intended to be used for GPU rendering. Primary nodes should only > > be used for mode-setting purposes, other programs should not attempt > > to open it. Moreover, opening the primary node is already broken on > > some setups: systemd grants permission to open primary nodes to > > physically logged in users, but this breaks when the user is not > > physically logged in (e.g. headless setup) and when the distribution > > is using a different init (e.g. Alpine Linux uses openrc). > > > > We need an alternate way for v3d to allocate scanout-capable memory. > > For me, it is a bit unclear how we import the vc4 DMA-BUF heap into v3d. > Should we create an IOCTL for it on v3d? dma-heaps are separate device files (under /dev/dma_heap) that have their specific uapi to allocate buffers: https://elixir.bootlin.com/linux/latest/source/drivers/dma-buf/dma-heap.c#L126 You then get a dma-buf handle you can import into whatever device you want. Maxime
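To make the uapi Maxime points at concrete, here is a minimal allocation sketch against /dev/dma_heap. The heap name below matches what this RFC registers ("vc4"); error handling and cleanup are trimmed.

```c
/*
 * Minimal dma-heap allocation sketch: open the heap's device node, ask it
 * for a buffer, get a dma-buf fd back.
 */
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/dma-heap.h>

static int alloc_from_heap(size_t size)
{
	int heap_fd = open("/dev/dma_heap/vc4", O_RDWR | O_CLOEXEC);
	struct dma_heap_allocation_data alloc = {
		.len = size,
		.fd_flags = O_RDWR | O_CLOEXEC,
	};

	if (heap_fd < 0)
		return -1;
	if (ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc) < 0)
		return -1;

	/* alloc.fd is now a dma-buf, ready to be imported into any device. */
	return alloc.fd;
}
```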
On Wed, Nov 8, 2023 at 11:46 PM Simon Ser <contact@emersion.fr> wrote: > > +int vc4_dma_heap_create(struct vc4_dev *vc4) > +{ > + struct dma_heap_export_info exp_info; > + struct dma_heap *heap; > + > + exp_info.name = "vc4"; /* TODO: allow multiple? */ > + exp_info.ops = &vc4_dma_heap_ops; > + exp_info.priv = vc4; /* TODO: unregister when unloading */ > + So unregistering a heap isn't currently possible, but we're trying to enable that here: https://lore.kernel.org/all/20231106120423.23364-7-yunfei.dong@mediatek.com/
On Thursday, November 9th, 2023 at 10:11, Maxime Ripard <mripard@kernel.org> wrote: > > - Does this approach make sense to y'all in general? > > Makes sense to me :) Great to hear! > > - What would be a good name for the heap? "vc4" is maybe a bit naive and > > not precise enough. Something with "cma"? Do we need to plan a naming > > scheme to accomodate for multiple vc4 devices? > > That's a general issue though that happens with pretty much all devices > with a separate node for modesetting and rendering, so I don't think > addressing it only for vc4 make sense, we should make it generic. > > So maybe something like "scanout"? > > One thing we need to consider too is that the Pi5 will have multiple > display nodes (4(?) iirc) with different capabilities, vc4 being only > one of them. This will impact that solution too. I'm not sure trying to find a unique generic name for all split render/display SoC is a good idea: - For the purposes of replacing DRM dumb buffers usage from v3d, we don't actually need a generic name, it's perfectly fine to hardcode a name here since vc4 is pretty much hardcoded there already. - As you said, "scanout" may be ill-defined depending on the system. What if a PCIe or USB device is plugged in? What if vkms is loaded? What if there are multiple platform display devices? What does "scanout" mean then? - A generic solution to advertise what DMA heaps a DRM display device is compatible with is probably better addressed by populating sysfs with new files. We've talked with Sima at XDC about adding a symlink pointing to the DMA heap and extra metadata files describing priorities and such. However we don't actually need that part for the purposes of v3d -- I think I'd rather defer that until more DMA heaps are plumbed across the DRM drivers. So I'd be in favor of using something descriptive, platform-specific for the heap name, something that user-space can't generically try to interpret or parse. > > - Right now root privileges are necessary to open the heap. Should we > > allow everybody to open the heap by default (after all, user-space can > > already allocate arbitrary amounts of GPU memory)? Should we leave it > > up to user-space to solve this issue (via logind/seatd or a Wayland > > protocol or something else)? > > I would have expected a udev rule to handle that? Right. That sounds fine to me. (It's not the end of the world if v3d can't allocate scanout-capable memory if the heap is not accessible -- just means we will go through an extra copy in the compositor. And we can always keep the DRM legacy primary node stuff as a fallback if there really are concerns.) > > TODO: > > > > - Need to add !vc5 support. > > If by !vc5 you mean RPi0-3, then it's probably not needed there at all > since it has a single node for both modesetting and rendering? Ah that's a good point. So we can just not expose any heap for !vc5.
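As an aside on the udev point above: a rule along these lines would be enough to open up the heap to non-root clients. This is purely illustrative and not part of the series; the heap name, group, and mode are assumptions.

```
# Illustrative only: let members of the "video" group open the vc4 heap.
SUBSYSTEM=="dma_heap", KERNEL=="vc4", GROUP="video", MODE="0660"
```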
Thanks for the feedback, Iago! I've replied to Maxime and I believe that covers your questions as well.
On Thursday, November 9th, 2023 at 13:14, Maira Canal <mcanal@igalia.com> wrote: > On 11/9/23 04:45, Simon Ser wrote: > > User-space sometimes needs to allocate scanout-capable memory for > > GPU rendering purposes. On a vc4/v3d split render/display SoC, this > > is achieved via DRM dumb buffers: the v3d user-space driver opens > > the primary vc4 node, allocates a DRM dumb buffer there, exports it > > as a DMA-BUF, imports it into the v3d render node, and renders to it. > > > > However, DRM dumb buffers are only meant for CPU rendering, they are > > not intended to be used for GPU rendering. Primary nodes should only > > be used for mode-setting purposes, other programs should not attempt > > to open it. Moreover, opening the primary node is already broken on > > some setups: systemd grants permission to open primary nodes to > > physically logged in users, but this breaks when the user is not > > physically logged in (e.g. headless setup) and when the distribution > > is using a different init (e.g. Alpine Linux uses openrc). > > > > We need an alternate way for v3d to allocate scanout-capable memory. > > For me, it is a bit unclear how we import the vc4 DMA-BUF heap into v3d. > Should we create an IOCTL for it on v3d? Hm, but v3d is the render driver, so it shouldn't have any knowledge of what vc4 -- the display driver -- actually needs to be able to scanout a piece of memory. It's true that other x86 drivers like amdgpu and i915 just have their own IOCTL to allocate scanout-capable memory, however the split render/display SoC situation makes the situation a bit more hairy for vc4/v3d. You can think of the vc4 DMA-BUF heaps as a vc4 alloc IOCTL, except it's using a more standard framework instead of a custom IOCTL. Does this make sense? > Also, if you need some help testing with the RPi 5, you can ping on IRC > and I can try to help by testing. Thank you, appreciated!
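A short sketch of what this means for the import side Maíra asks about: no new v3d ioctl is needed, the dma-buf fd allocated from the heap goes through the existing generic PRIME import on the render node. Untested illustration; the function name is an assumption.

```c
#include <stdint.h>
#include <xf86drm.h>

static uint32_t import_heap_buffer(int v3d_render_fd, int dmabuf_fd)
{
	uint32_t handle = 0;

	/* Standard PRIME import, same as for any other dma-buf. */
	drmPrimeFDToHandle(v3d_render_fd, dmabuf_fd, &handle);
	return handle;
}
```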
On Thursday, November 9th, 2023 at 16:14, T.J. Mercier <tjmercier@google.com> wrote: > > + exp_info.priv = vc4; /* TODO: unregister when unloading */ > > + > So unregistering a heap isn't currently possible, but we're trying to enable that here: > https://lore.kernel.org/all/20231106120423.23364-7-yunfei.dong@mediatek.com/ Great! Note, I wouldn't actually need any ref'counting, a simpler function to always unregister would be enough for my purposes I think. But if you have an actual use-case for the ref'counting, that's completely fine for me as well.
Hi Simon and Maxime On Thu, 9 Nov 2023 at 09:12, Maxime Ripard <mripard@kernel.org> wrote: > > Hi Simon, > > On Thu, Nov 09, 2023 at 07:45:58AM +0000, Simon Ser wrote: > > User-space sometimes needs to allocate scanout-capable memory for > > GPU rendering purposes. On a vc4/v3d split render/display SoC, this > > is achieved via DRM dumb buffers: the v3d user-space driver opens > > the primary vc4 node, allocates a DRM dumb buffer there, exports it > > as a DMA-BUF, imports it into the v3d render node, and renders to it. > > > > However, DRM dumb buffers are only meant for CPU rendering, they are > > not intended to be used for GPU rendering. Primary nodes should only > > be used for mode-setting purposes, other programs should not attempt > > to open it. Moreover, opening the primary node is already broken on > > some setups: systemd grants permission to open primary nodes to > > physically logged in users, but this breaks when the user is not > > physically logged in (e.g. headless setup) and when the distribution > > is using a different init (e.g. Alpine Linux uses openrc). > > > > We need an alternate way for v3d to allocate scanout-capable memory. > > Leverage DMA heaps for this purpose: expose a CMA heap to user-space. > > Preliminary testing has been done with wlroots [1]. > > > > This is still an RFC. Open questions: > > > > - Does this approach make sense to y'all in general? > > Makes sense to me :) > > > - What would be a good name for the heap? "vc4" is maybe a bit naive and > > not precise enough. Something with "cma"? Do we need to plan a naming > > scheme to accomodate for multiple vc4 devices? > > That's a general issue though that happens with pretty much all devices > with a separate node for modesetting and rendering, so I don't think > addressing it only for vc4 make sense, we should make it generic. > > So maybe something like "scanout"? > > One thing we need to consider too is that the Pi5 will have multiple > display nodes (4(?) iirc) with different capabilities, vc4 being only > one of them. This will impact that solution too. It does need to scale. Pi5 adds 4 additional DRM devices (2xDSI, 1xDPI, and 1xComposite Video), and just this last week I've been running Wayfire with TinyDRM drivers for SPI displays and UDL (DisplayLink) outputs as well. Presumably all of those drivers need to have appropriate hooks added so they each expose a dma-heap to enable scanout buffers to be allocated. Can we add another function pointer to the struct drm_driver (or similar) to do the allocation, and move the actual dmabuf handling code into the core? > > - Right now root privileges are necessary to open the heap. Should we > > allow everybody to open the heap by default (after all, user-space can > > already allocate arbitrary amounts of GPU memory)? Should we leave it > > up to user-space to solve this issue (via logind/seatd or a Wayland > > protocol or something else)? > > I would have expected a udev rule to handle that? > > > TODO: > > > > - Need to add !vc5 support. > > If by !vc5 you mean RPi0-3, then it's probably not needed there at all > since it has a single node for both modesetting and rendering? That is true, but potentially vc4 could be rendering for scanout via UDL or SPI. Is it easier to always have the dma-heap allocator for every DRM card rather than making userspace mix and match depending on whether it is all in one vs split? Dave
On Thursday, November 9th, 2023 at 16:42, Dave Stevenson <dave.stevenson@raspberrypi.com> wrote: > > > - What would be a good name for the heap? "vc4" is maybe a bit naive and > > > not precise enough. Something with "cma"? Do we need to plan a naming > > > scheme to accomodate for multiple vc4 devices? > > > > That's a general issue though that happens with pretty much all devices > > with a separate node for modesetting and rendering, so I don't think > > addressing it only for vc4 make sense, we should make it generic. > > > > So maybe something like "scanout"? > > > > One thing we need to consider too is that the Pi5 will have multiple > > display nodes (4(?) iirc) with different capabilities, vc4 being only > > one of them. This will impact that solution too. > > It does need to scale. > > Pi5 adds 4 additional DRM devices (2xDSI, 1xDPI, and 1xComposite > Video), and just this last week I've been running Wayfire with TinyDRM > drivers for SPI displays and UDL (DisplayLink) outputs as well. > Presumably all of those drivers need to have appropriate hooks added > so they each expose a dma-heap to enable scanout buffers to be > allocated. I'm not sure this makes sense necessarily for all of these devices. For vc4 and the 4 additional RPi 5 DRM devices, probably. For the rest, e.g. UDL, I'm not sure it makes sense to expose scanout memory allocation via DMA heaps: AFAIK UDL needs CPU access to the buffers, it can't "scanout", and thus directly rendering via v3d to a "scanout-capable" buffer doesn't make sense. There will be a copy (CPU download) either way, and allocating via v3d wouldn't make a difference. > Can we add another function pointer to the struct drm_driver (or > similar) to do the allocation, and move the actual dmabuf handling > code into the core? Do you mean that the new logic introduced in this patch should be in DRM core to allow other drivers to more easily re-use it? Or do you mean something else? Indeed, there is nothing vc4-specific in this patch, the only requirement is that the driver uses drm_gem_dma_helper. So this code could be moved into (or alongside) that helper in DRM core. However, maybe it would be best to wait to have a second user for this infrastructure before we move into core. > > > - Need to add !vc5 support. > > > > If by !vc5 you mean RPi0-3, then it's probably not needed there at all > > since it has a single node for both modesetting and rendering? > > That is true, but potentially vc4 could be rendering for scanout via UDL or SPI. > Is it easier to always have the dma-heap allocator for every DRM card > rather than making userspace mix and match depending on whether it is > all in one vs split? I don't think it's realistic to try to always make DMA heaps available for each and every driver which might need it from day one. It's too big of a task. And it's an even bigger task to try to design a fully generic heap compatibility uAPI from day one. I'd much rather add the heaps one by one, if and when we figure that it makes sense, and incrementally work our way through.
Hi Simon On Thu, 9 Nov 2023 at 17:46, Simon Ser <contact@emersion.fr> wrote: > > On Thursday, November 9th, 2023 at 16:42, Dave Stevenson <dave.stevenson@raspberrypi.com> wrote: > > > > > - What would be a good name for the heap? "vc4" is maybe a bit naive and > > > > not precise enough. Something with "cma"? Do we need to plan a naming > > > > scheme to accomodate for multiple vc4 devices? > > > > > > That's a general issue though that happens with pretty much all devices > > > with a separate node for modesetting and rendering, so I don't think > > > addressing it only for vc4 make sense, we should make it generic. > > > > > > So maybe something like "scanout"? > > > > > > One thing we need to consider too is that the Pi5 will have multiple > > > display nodes (4(?) iirc) with different capabilities, vc4 being only > > > one of them. This will impact that solution too. > > > > It does need to scale. > > > > Pi5 adds 4 additional DRM devices (2xDSI, 1xDPI, and 1xComposite > > Video), and just this last week I've been running Wayfire with TinyDRM > > drivers for SPI displays and UDL (DisplayLink) outputs as well. > > Presumably all of those drivers need to have appropriate hooks added > > so they each expose a dma-heap to enable scanout buffers to be > > allocated. > > I'm not sure this makes sense necessarily for all of these devices. For vc4 and > the 4 additional RPi 5 DRM devices, probably. For the rest, e.g. UDL, I'm not > sure it makes sense to expose scanout memory allocation via DMA heaps: AFAIK > UDL needs CPU access to the buffers, it can't "scanout", and thus directly > rendering via v3d to a "scanout-capable" buffer doesn't make sense. There will > be a copy (CPU download) either way, and allocating via v3d wouldn't make a > difference. You as a developer may know that UDL is going to need CPU access, but how does a generic userspace app know? Is it a case of falling back to allocating via the renderer if there is no suitably named heap? > > Can we add another function pointer to the struct drm_driver (or > > similar) to do the allocation, and move the actual dmabuf handling > > code into the core? > > Do you mean that the new logic introduced in this patch should be in DRM core > to allow other drivers to more easily re-use it? Or do you mean something else? Yes, make it easy to reuse between drivers. > Indeed, there is nothing vc4-specific in this patch, the only requirement is > that the driver uses drm_gem_dma_helper. So this code could be moved into (or > alongside) that helper in DRM core. However, maybe it would be best to wait to > have a second user for this infrastructure before we move into core. Upstreaming of the DSI / DPI / composite drivers for Pi5 should only be a few months away, and they can all directly scanout. I expect the Rockchip platforms to also fall into the same category as the Pi, with Mali as the 3D IP, and the VOP block as the scanout engine. Have I missed some detail that means that they aren't a second user for this? > > > > - Need to add !vc5 support. > > > > > > If by !vc5 you mean RPi0-3, then it's probably not needed there at all > > > since it has a single node for both modesetting and rendering? > > > > That is true, but potentially vc4 could be rendering for scanout via UDL or SPI. > > Is it easier to always have the dma-heap allocator for every DRM card > > rather than making userspace mix and match depending on whether it is > > all in one vs split? 
> > I don't think it's realistic to try to always make DMA heaps available for each > and every driver which might need it from day one. It's too big of a task. And > it's an even bigger task to try to design a fully generic heap compatibility > uAPI from day one. I'd much rather add the heaps one by one, if and when we > figure that it makes sense, and incrementally work our way through. Is it really that massive a task? We have the dma heap UAPI for handling the allocations, so what new UAPI is required? If it's a new function pointer in struct drm_driver, then the heap is only created by the core if that function is defined using the driver name. The function returns a struct dma_buf *. Any driver using drm_gem_dma_helper can use the new helper function that is basically your vc4_dma_heap_allocate. The "if (WARN_ON_ONCE(!vc4->is_vc5))" could be handled by not setting the function pointer on those platforms. Sorry, I feel I must be missing some critical piece of information here. Dave
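For concreteness, Dave's suggestion might look roughly like the following. This is a hypothetical sketch, not existing DRM API: scanout_heap_allocate, drm_scanout_heap_ops, and drm_maybe_add_scanout_heap are made-up names.

```c
/* New, optional member added to struct drm_driver: */
	struct dma_buf *(*scanout_heap_allocate)(struct drm_device *dev,
						 unsigned long size,
						 unsigned long fd_flags);

/* In core registration code, a heap is only exposed when the hook is set: */
static int drm_maybe_add_scanout_heap(struct drm_device *dev)
{
	struct dma_heap_export_info exp_info = {
		.name = dev->driver->name,     /* heap named after the driver */
		.ops  = &drm_scanout_heap_ops, /* thin wrapper calling the hook */
		.priv = dev,
	};

	if (!dev->driver->scanout_heap_allocate)
		return 0;

	return PTR_ERR_OR_ZERO(dma_heap_add(&exp_info));
}
```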
On Thu, Nov 09, 2023 at 03:31:44PM +0000, Simon Ser wrote: > > > - What would be a good name for the heap? "vc4" is maybe a bit naive and > > > not precise enough. Something with "cma"? Do we need to plan a naming > > > scheme to accomodate for multiple vc4 devices? > > > > That's a general issue though that happens with pretty much all devices > > with a separate node for modesetting and rendering, so I don't think > > addressing it only for vc4 make sense, we should make it generic. > > > > So maybe something like "scanout"? > > > > One thing we need to consider too is that the Pi5 will have multiple > > display nodes (4(?) iirc) with different capabilities, vc4 being only > > one of them. This will impact that solution too. > > I'm not sure trying to find a unique generic name for all split render/display > SoC is a good idea: > > - For the purposes of replacing DRM dumb buffers usage from v3d, we don't > actually need a generic name, it's perfectly fine to hardcode a name here > since vc4 is pretty much hardcoded there already. Right. I'm wondering how that will work with drivers like panfrost or powervr that will need to interact with different display drivers. We will have the same issue for those, but we won't be able to hardcode the driver name. Similarly, if you have multiple display drivers, what "scanout-capable" will mean might differ from one device to the other. Some have constraints on the memory they can access, for example, so you cannot just assume that a scanout-capable buffer allocated by vc4 will also be scanout-capable for one of the RP1 DSI devices. > - As you said, "scanout" may be ill-defined depending on the system. What if > a PCIe or USB device is plugged in? What if vkms is loaded? What if there are > multiple platform display devices? What does "scanout" mean then? > - A generic solution to advertise what DMA heaps a DRM display device is > compatible with is probably better addressed by populating sysfs with new > files. That would be a good idea indeed > We've talked with Sima at XDC about adding a symlink pointing to the > DMA heap and extra metadata files describing priorities and such. > However we don't actually need that part for the purposes of v3d -- > I think I'd rather defer that until more DMA heaps are plumbed > across the DRM drivers. Honestly, I don't think we can afford to only consider vc4/v3d here. The issue you described seems to affect any SoC with a split scanout/GPU, which is pretty much all of them? And if they are all affected, we should design something that fixes it once and for all. Maxime
On Thu, Nov 09, 2023 at 03:42:38PM +0000, Dave Stevenson wrote: > Hi Simon and Maxime > > On Thu, 9 Nov 2023 at 09:12, Maxime Ripard <mripard@kernel.org> wrote: > > > > Hi Simon, > > > > On Thu, Nov 09, 2023 at 07:45:58AM +0000, Simon Ser wrote: > > > User-space sometimes needs to allocate scanout-capable memory for > > > GPU rendering purposes. On a vc4/v3d split render/display SoC, this > > > is achieved via DRM dumb buffers: the v3d user-space driver opens > > > the primary vc4 node, allocates a DRM dumb buffer there, exports it > > > as a DMA-BUF, imports it into the v3d render node, and renders to it. > > > > > > However, DRM dumb buffers are only meant for CPU rendering, they are > > > not intended to be used for GPU rendering. Primary nodes should only > > > be used for mode-setting purposes, other programs should not attempt > > > to open it. Moreover, opening the primary node is already broken on > > > some setups: systemd grants permission to open primary nodes to > > > physically logged in users, but this breaks when the user is not > > > physically logged in (e.g. headless setup) and when the distribution > > > is using a different init (e.g. Alpine Linux uses openrc). > > > > > > We need an alternate way for v3d to allocate scanout-capable memory. > > > Leverage DMA heaps for this purpose: expose a CMA heap to user-space. > > > Preliminary testing has been done with wlroots [1]. > > > > > > This is still an RFC. Open questions: > > > > > > - Does this approach make sense to y'all in general? > > > > Makes sense to me :) > > > > > - What would be a good name for the heap? "vc4" is maybe a bit naive and > > > not precise enough. Something with "cma"? Do we need to plan a naming > > > scheme to accomodate for multiple vc4 devices? > > > > That's a general issue though that happens with pretty much all devices > > with a separate node for modesetting and rendering, so I don't think > > addressing it only for vc4 make sense, we should make it generic. > > > > So maybe something like "scanout"? > > > > One thing we need to consider too is that the Pi5 will have multiple > > display nodes (4(?) iirc) with different capabilities, vc4 being only > > one of them. This will impact that solution too. > > It does need to scale. > > Pi5 adds 4 additional DRM devices (2xDSI, 1xDPI, and 1xComposite > Video), and just this last week I've been running Wayfire with TinyDRM > drivers for SPI displays and UDL (DisplayLink) outputs as well. > Presumably all of those drivers need to have appropriate hooks added > so they each expose a dma-heap to enable scanout buffers to be > allocated. > > Can we add another function pointer to the struct drm_driver (or > similar) to do the allocation, and move the actual dmabuf handling > code into the core? Yeah, I agree here, it just seems easier to provide a global hook and a somewhat sane default to cover all drivers in one go (potentially with additional restrictions, like only for modeset-only drivers or something). Maxime
On Thu, Nov 09, 2023 at 06:38:20PM +0000, Dave Stevenson wrote: > Hi Simon > > On Thu, 9 Nov 2023 at 17:46, Simon Ser <contact@emersion.fr> wrote: > > > > On Thursday, November 9th, 2023 at 16:42, Dave Stevenson <dave.stevenson@raspberrypi.com> wrote: > > > > > > > - What would be a good name for the heap? "vc4" is maybe a bit naive and > > > > > not precise enough. Something with "cma"? Do we need to plan a naming > > > > > scheme to accomodate for multiple vc4 devices? > > > > > > > > That's a general issue though that happens with pretty much all devices > > > > with a separate node for modesetting and rendering, so I don't think > > > > addressing it only for vc4 make sense, we should make it generic. > > > > > > > > So maybe something like "scanout"? > > > > > > > > One thing we need to consider too is that the Pi5 will have multiple > > > > display nodes (4(?) iirc) with different capabilities, vc4 being only > > > > one of them. This will impact that solution too. > > > > > > It does need to scale. > > > > > > Pi5 adds 4 additional DRM devices (2xDSI, 1xDPI, and 1xComposite > > > Video), and just this last week I've been running Wayfire with TinyDRM > > > drivers for SPI displays and UDL (DisplayLink) outputs as well. > > > Presumably all of those drivers need to have appropriate hooks added > > > so they each expose a dma-heap to enable scanout buffers to be > > > allocated. > > > > I'm not sure this makes sense necessarily for all of these devices. For vc4 and > > the 4 additional RPi 5 DRM devices, probably. For the rest, e.g. UDL, I'm not > > sure it makes sense to expose scanout memory allocation via DMA heaps: AFAIK > > UDL needs CPU access to the buffers, it can't "scanout", and thus directly > > rendering via v3d to a "scanout-capable" buffer doesn't make sense. There will > > be a copy (CPU download) either way, and allocating via v3d wouldn't make a > > difference. > > You as a developer may know that UDL is going to need CPU access, but > how does a generic userspace app know? Is it a case of falling back to > allocating via the renderer if there is no suitably named heap? If we're going with the idea of using sysfs to link a device to a heap (which I think we should), then those devices all should be linked to the system memory heap, no? > > > Can we add another function pointer to the struct drm_driver (or > > > similar) to do the allocation, and move the actual dmabuf handling > > > code into the core? > > > > Do you mean that the new logic introduced in this patch should be in DRM core > > to allow other drivers to more easily re-use it? Or do you mean something else? > > Yes, make it easy to reuse between drivers. > > > Indeed, there is nothing vc4-specific in this patch, the only requirement is > > that the driver uses drm_gem_dma_helper. So this code could be moved into (or > > alongside) that helper in DRM core. However, maybe it would be best to wait to > > have a second user for this infrastructure before we move into core. > > Upstreaming of the DSI / DPI / composite drivers for Pi5 should only > be a few months away, and they can all directly scanout. > > I expect the Rockchip platforms to also fall into the same category as > the Pi, with Mali as the 3D IP, and the VOP block as the scanout > engine. Have I missed some detail that means that they aren't a second > user for this? > > > > > > - Need to add !vc5 support. 
> > > > > > > > If by !vc5 you mean RPi0-3, then it's probably not needed there at all > > > > since it has a single node for both modesetting and rendering? > > > > > > That is true, but potentially vc4 could be rendering for scanout via UDL or SPI. > > > Is it easier to always have the dma-heap allocator for every DRM card > > > rather than making userspace mix and match depending on whether it is > > > all in one vs split? > > > > I don't think it's realistic to try to always make DMA heaps available for each > > and every driver which might need it from day one. It's too big of a task. And > > it's an even bigger task to try to design a fully generic heap compatibility > > uAPI from day one. I'd much rather add the heaps one by one, if and when we > > figure that it makes sense, and incrementally work our way through. > > Is it really that massive a task? We have the dma heap UAPI for > handling the allocations, so what new UAPI is required? > > If it's a new function pointer in struct drm_driver, then the heap is > only created by the core if that function is defined using the driver > name. The function returns a struct dma_buf *. > Any driver using drm_gem_dma_helper can use the new helper function > that is basically your vc4_dma_heap_allocate. The "if > (WARN_ON_ONCE(!vc4->is_vc5))" could be handled by not setting the > function pointer on those platforms. > > Sorry, I feel I must be missing some critical piece of information here. > > Dave >
On Thursday, November 9th, 2023 at 19:38, Dave Stevenson <dave.stevenson@raspberrypi.com> wrote: > Hi Simon > > On Thu, 9 Nov 2023 at 17:46, Simon Ser <contact@emersion.fr> wrote: > > > > On Thursday, November 9th, 2023 at 16:42, Dave Stevenson <dave.stevenson@raspberrypi.com> wrote: > > > > > > > - What would be a good name for the heap? "vc4" is maybe a bit naive and > > > > > not precise enough. Something with "cma"? Do we need to plan a naming > > > > > scheme to accomodate for multiple vc4 devices? > > > > > > > > That's a general issue though that happens with pretty much all devices > > > > with a separate node for modesetting and rendering, so I don't think > > > > addressing it only for vc4 make sense, we should make it generic. > > > > > > > > So maybe something like "scanout"? > > > > > > > > One thing we need to consider too is that the Pi5 will have multiple > > > > display nodes (4(?) iirc) with different capabilities, vc4 being only > > > > one of them. This will impact that solution too. > > > > > > It does need to scale. > > > > > > Pi5 adds 4 additional DRM devices (2xDSI, 1xDPI, and 1xComposite > > > Video), and just this last week I've been running Wayfire with TinyDRM > > > drivers for SPI displays and UDL (DisplayLink) outputs as well. > > > Presumably all of those drivers need to have appropriate hooks added > > > so they each expose a dma-heap to enable scanout buffers to be > > > allocated. > > > > I'm not sure this makes sense necessarily for all of these devices. For vc4 and > > the 4 additional RPi 5 DRM devices, probably. For the rest, e.g. UDL, I'm not > > sure it makes sense to expose scanout memory allocation via DMA heaps: AFAIK > > UDL needs CPU access to the buffers, it can't "scanout", and thus directly > > rendering via v3d to a "scanout-capable" buffer doesn't make sense. There will > > be a copy (CPU download) either way, and allocating via v3d wouldn't make a > > difference. > > You as a developer may know that UDL is going to need CPU access, but > how does a generic userspace app know? Is it a case of falling back to > allocating via the renderer if there is no suitably named heap? Yeah, just like how things are working today. Then try to do direct scanout with the buffer allocated on the render node, and if it doesn't work blit into a DRM dumb buffer. > > Indeed, there is nothing vc4-specific in this patch, the only requirement is > > that the driver uses drm_gem_dma_helper. So this code could be moved into (or > > alongside) that helper in DRM core. However, maybe it would be best to wait to > > have a second user for this infrastructure before we move into core. > > Upstreaming of the DSI / DPI / composite drivers for Pi5 should only > be a few months away, and they can all directly scanout. > > I expect the Rockchip platforms to also fall into the same category as > the Pi, with Mali as the 3D IP, and the VOP block as the scanout > engine. Have I missed some detail that means that they aren't a second > user for this? > > > > > > - Need to add !vc5 support. > > > > > > > > If by !vc5 you mean RPi0-3, then it's probably not needed there at all > > > > since it has a single node for both modesetting and rendering? > > > > > > That is true, but potentially vc4 could be rendering for scanout via UDL or SPI. > > > Is it easier to always have the dma-heap allocator for every DRM card > > > rather than making userspace mix and match depending on whether it is > > > all in one vs split? 
> > > > I don't think it's realistic to try to always make DMA heaps available for each > > and every driver which might need it from day one. It's too big of a task. And > > it's an even bigger task to try to design a fully generic heap compatibility > > uAPI from day one. I'd much rather add the heaps one by one, if and when we > > figure that it makes sense, and incrementally work our way through. > > Is it really that massive a task? We have the dma heap UAPI for > handling the allocations, so what new UAPI is required? I'm only focused on fixing v3d for now. Some split render/display drivers do similar things, some drivers not. I don't have hardware to test, I'm not willing to commit a lot of time writing a patchset I'm not even sure will make it. I don't really understand the hurry of doing all of the things at once here. If you do happen to have the time, feel free to write the patches or even take over this series. I'm worried about blindly adding heaps all over the place, because as said above, I don't believe it makes sense for all drivers. I think it's probably a safe bet to add heaps in cases where we use kmsro with dumb buffers on the Mesa side. But anything else and it really needs to be discussed on a case-by-case basis. > If it's a new function pointer in struct drm_driver, then the heap is > only created by the core if that function is defined using the driver > name. The function returns a struct dma_buf *. > Any driver using drm_gem_dma_helper can use the new helper function > that is basically your vc4_dma_heap_allocate. The "if > (WARN_ON_ONCE(!vc4->is_vc5))" could be handled by not setting the > function pointer on those platforms. I would advise against a function pointer, because that kind of API design makes DRM core more of a midlayer than a helper library, and this is something we prefer to avoid [1]. I would instead suggest introducing a DRM core function which creates the heap and can be called from the driver. 1-line vs. 3-line in driver code doesn't really make such a big difference. [1]: https://blog.ffwll.ch/2016/12/midlayers-once-more-with-feeling.html
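Simon's preferred shape, sketched here for comparison, is an explicit helper the driver opts into rather than a core-invoked hook. Again hypothetical: drm_gem_dma_heap_create is a made-up name, not an existing DRM function.

```c
/* Core-side helper, living next to the GEM DMA helpers: */
int drm_gem_dma_heap_create(struct drm_device *dev, const char *heap_name);

/* Driver side stays a few explicit lines, e.g. in vc4_drm_bind(): */
	ret = drm_gem_dma_heap_create(drm, "vc4");
	if (ret)
		goto err;
```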
On Thursday, November 9th, 2023 at 20:09, Maxime Ripard <mripard@kernel.org> wrote: > On Thu, Nov 09, 2023 at 03:31:44PM +0000, Simon Ser wrote: > > > > - What would be a good name for the heap? "vc4" is maybe a bit naive and > > > > not precise enough. Something with "cma"? Do we need to plan a naming > > > > scheme to accomodate for multiple vc4 devices? > > > > > > That's a general issue though that happens with pretty much all devices > > > with a separate node for modesetting and rendering, so I don't think > > > addressing it only for vc4 make sense, we should make it generic. > > > > > > So maybe something like "scanout"? > > > > > > One thing we need to consider too is that the Pi5 will have multiple > > > display nodes (4(?) iirc) with different capabilities, vc4 being only > > > one of them. This will impact that solution too. > > > > I'm not sure trying to find a unique generic name for all split render/display > > SoC is a good idea: > > > > - For the purposes of replacing DRM dumb buffers usage from v3d, we don't > > actually need a generic name, it's perfectly fine to hardcode a name here > > since vc4 is pretty much hardcoded there already. > > Right. I'm wondering how that will work with drivers like panfrost or > powervr that will need to interact with different display drivers. We > will have the same issue for those, but we won't be able to hardcode the > driver name. We will be able to hardcode the driver name. In fact, the driver name is already hardcoded in kmsro today (Mesa creates one kmsro .so per supported display driver in /usr/lib/dri). It's just a matter of passing through the actual display device to panfrost in Mesa and adding a very simple mapping of driver name -> heap name. > Similarly, if you have multiple display drivers, what "scanout-capable" > will mean might differ from one device to the other. Some have > constraints on the memory they can access for example, so you cannot > just assume that a scanout-capable buffer allocated by vc4 might not be > scanout-capable for one of the RP1 DSI device. > > > - As you said, "scanout" may be ill-defined depending on the system. What if > > a PCIe or USB device is plugged in? What if vkms is loaded? What if there are > > multiple platform display devices? What does "scanout" mean then? > > - A generic solution to advertise what DMA heaps a DRM display device is > > compatible with is probably better addressed by populating sysfs with new > > files. > > That would be a good idea indeed > > > We've talked with Sima at XDC about adding a symlink pointing to the > > DMA heap and extra metadata files describing priorities and such. > > However we don't actually need that part for the purposes of v3d -- > > I think I'd rather defer that until more DMA heaps are plumbed > > across the DRM drivers. > > Honestly, I don't think we can afford to only consider vc4/v3d here. The > issue you described seem to affect any SoC with a split scanout/GPU, > which is pretty much all of them? And if they are all affected, we > should design something that fixes it once and for all. We don't need any sysfs stuff to fix the primary node and DRM dumb buffer issues in Mesa's kmsro/renderonly. The sysfs stuff is only required for a fully generic buffer placement constraint/compatibility uAPI. Which would be super useful in compositors, but let's do one step at a time.
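The driver-name to heap-name mapping mentioned here could be as small as the following sketch on the Mesa side. Illustrative only: the table contents and function name are assumptions, not existing kmsro code.

```c
#include <stddef.h>
#include <string.h>

static const struct {
	const char *display_driver;
	const char *heap_name;
} scanout_heaps[] = {
	{ "vc4", "vc4" },
	/* more split render/display pairs as their heaps land upstream */
};

static const char *scanout_heap_for(const char *display_driver)
{
	for (size_t i = 0; i < sizeof(scanout_heaps) / sizeof(scanout_heaps[0]); i++) {
		if (!strcmp(scanout_heaps[i].display_driver, display_driver))
			return scanout_heaps[i].heap_name;
	}
	return NULL; /* no heap known: fall back to dumb buffers */
}
```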
On Thursday, November 9th, 2023 at 20:17, Maxime Ripard <mripard@kernel.org> wrote: > > Can we add another function pointer to the struct drm_driver (or > > similar) to do the allocation, and move the actual dmabuf handling > > code into the core? > > Yeah, I agree here, it just seems easier to provide a global hook and a > somewhat sane default to cover all drivers in one go (potentially with > additional restrictions, like only for modeset-only drivers or > something). First off not all drivers are using the GEM DMA helpers (e.g. vc4 with !vc5 does not). The heap code in this patch only works with drivers leveraging GEM DMA helpers. Then maybe it's somewhat simple to cover typical display devices found on split render/display SoCs, however for the rest it's going to be much more complicated. For x86 typically there are multiple buffer placements supported by the GPU and we need to have one heap per possible placement. And then figuring out all of the rules, priority and compatibility stuff is a whole other task in and of itself. In short: I think it's unreasonable to expose a heap without good knowledge of the specific hardware. In other words, unreasonable to expose a heap for all drivers at once.
On Fri, Nov 10, 2023 at 11:21:15AM +0000, Simon Ser wrote: > On Thursday, November 9th, 2023 at 20:17, Maxime Ripard <mripard@kernel.org> wrote: > > > > Can we add another function pointer to the struct drm_driver (or > > > similar) to do the allocation, and move the actual dmabuf handling > > > code into the core? > > > > Yeah, I agree here, it just seems easier to provide a global hook and a > > somewhat sane default to cover all drivers in one go (potentially with > > additional restrictions, like only for modeset-only drivers or > > something). > > First off not all drivers are using the GEM DMA helpers (e.g. vc4 with > !vc5 does not). Right. vc4 pre-RPi4 is the exception though, so it's kind of what I meant by providing sane defaults: the vast majority of drivers are using GEM DMA helpers, and we should totally let drivers override that if relevant. Basically, we already have 2.5 citizen classes, I'd really like to avoid having 3 officially, even more so if it's not super difficult to do. > The heap code in this patch only works with drivers leveraging GEM DMA > helpers. We could add a new hook to drm_driver to allocate heaps, link to a default implementation in DRM_GEM_DMA_DRIVER_OPS_WITH_DUMB_CREATE(), and then use that new hook in your new heap. It would already cover 40 drivers at the moment, instead of just one, with all of them (but atmel-hlcdc maybe?) being in the same situation than RPi4-vc4 is. > Then maybe it's somewhat simple to cover typical display devices found > on split render/display SoCs, however for the rest it's going to be much > more complicated. For x86 typically there are multiple buffer placements > supported by the GPU and we need to have one heap per possible placement. > And then figuring out all of the rules, priority and compatibility stuff > is a whole other task in and of itself. But x86 typically doesn't have a split render/display setup, right? Maxime
On Fri, Nov 10, 2023 at 11:08:06AM +0000, Simon Ser wrote: > > On Thu, Nov 09, 2023 at 03:31:44PM +0000, Simon Ser wrote: > > > > > - What would be a good name for the heap? "vc4" is maybe a bit naive and > > > > > not precise enough. Something with "cma"? Do we need to plan a naming > > > > > scheme to accomodate for multiple vc4 devices? > > > > > > > > That's a general issue though that happens with pretty much all devices > > > > with a separate node for modesetting and rendering, so I don't think > > > > addressing it only for vc4 make sense, we should make it generic. > > > > > > > > So maybe something like "scanout"? > > > > > > > > One thing we need to consider too is that the Pi5 will have multiple > > > > display nodes (4(?) iirc) with different capabilities, vc4 being only > > > > one of them. This will impact that solution too. > > > > > > I'm not sure trying to find a unique generic name for all split render/display > > > SoC is a good idea: > > > > > > - For the purposes of replacing DRM dumb buffers usage from v3d, we don't > > > actually need a generic name, it's perfectly fine to hardcode a name here > > > since vc4 is pretty much hardcoded there already. > > > > Right. I'm wondering how that will work with drivers like panfrost or > > powervr that will need to interact with different display drivers. We > > will have the same issue for those, but we won't be able to hardcode the > > driver name. > > We will be able to hardcode the driver name. In fact, the driver name is > already hardcoded in kmsro today (Mesa creates one kmsro .so per supported > display driver in /usr/lib/dri). > > It's just a matter of passing through the actual display device to panfrost in > Mesa and adding a very simple mapping of driver name -> heap name. > > > Similarly, if you have multiple display drivers, what "scanout-capable" > > will mean might differ from one device to the other. Some have > > constraints on the memory they can access for example, so you cannot > > just assume that a scanout-capable buffer allocated by vc4 might not be > > scanout-capable for one of the RP1 DSI device. > > > > > - As you said, "scanout" may be ill-defined depending on the system. What if > > > a PCIe or USB device is plugged in? What if vkms is loaded? What if there are > > > multiple platform display devices? What does "scanout" mean then? > > > - A generic solution to advertise what DMA heaps a DRM display device is > > > compatible with is probably better addressed by populating sysfs with new > > > files. > > > > That would be a good idea indeed > > > > > We've talked with Sima at XDC about adding a symlink pointing to the > > > DMA heap and extra metadata files describing priorities and such. > > > However we don't actually need that part for the purposes of v3d -- > > > I think I'd rather defer that until more DMA heaps are plumbed > > > across the DRM drivers. > > > > Honestly, I don't think we can afford to only consider vc4/v3d here. The > > issue you described seem to affect any SoC with a split scanout/GPU, > > which is pretty much all of them? And if they are all affected, we > > should design something that fixes it once and for all. > > We don't need any sysfs stuff to fix the primary node and DRM dumb buffer > issues in Mesa's kmsro/renderonly. The sysfs stuff is only required for a fully > generic buffer placement constraint/compatibility uAPI. Which would be super > useful in compositors, but let's do one step at a time. 
I don't think a solution that further fragments the ecosystem is worth taking, sorry. What you're doing is valuable, we should totally fix the issue you have, but not at the expense of making vc4 special on one of the platforms it supports. Maxime
On Friday, November 10th, 2023 at 15:13, Maxime Ripard <mripard@kernel.org> wrote: > > > > We've talked with Sima at XDC about adding a symlink pointing to the > > > > DMA heap and extra metadata files describing priorities and such. > > > > However we don't actually need that part for the purposes of v3d -- > > > > I think I'd rather defer that until more DMA heaps are plumbed > > > > across the DRM drivers. > > > > > > Honestly, I don't think we can afford to only consider vc4/v3d here. The > > > issue you described seem to affect any SoC with a split scanout/GPU, > > > which is pretty much all of them? And if they are all affected, we > > > should design something that fixes it once and for all. > > > > We don't need any sysfs stuff to fix the primary node and DRM dumb buffer > > issues in Mesa's kmsro/renderonly. The sysfs stuff is only required for a fully > > generic buffer placement constraint/compatibility uAPI. Which would be super > > useful in compositors, but let's do one step at a time. > > I don't think a solution that further fragments the ecosystem is worth > taking, sorry. What you're doing is valuable, we should totally fix the > issue you have, but not at the expense of making vc4 special on one of > the platforms it supports. This does not fragment the ecosystem. It moves the ecosystem bit by bit towards the final solution.
On Friday, November 10th, 2023 at 15:01, Maxime Ripard <mripard@kernel.org> wrote: > On Fri, Nov 10, 2023 at 11:21:15AM +0000, Simon Ser wrote: > > > On Thursday, November 9th, 2023 at 20:17, Maxime Ripard mripard@kernel.org wrote: > > > > > > Can we add another function pointer to the struct drm_driver (or > > > > similar) to do the allocation, and move the actual dmabuf handling > > > > code into the core? > > > > > > Yeah, I agree here, it just seems easier to provide a global hook and a > > > somewhat sane default to cover all drivers in one go (potentially with > > > additional restrictions, like only for modeset-only drivers or > > > something). > > > > First off not all drivers are using the GEM DMA helpers (e.g. vc4 with > > !vc5 does not). > > Right. vc4 pre-RPi4 is the exception though, so it's kind of what I > meant by providing sane defaults: the vast majority of drivers are using > GEM DMA helpers, and we should totally let drivers override that if > relevant. > > Basically, we already have 2.5 citizen classes, I'd really like to avoid > having 3 officially, even more so if it's not super difficult to do. > > > The heap code in this patch only works with drivers leveraging GEM DMA > > helpers. > > We could add a new hook to drm_driver to allocate heaps, link to a > default implementation in DRM_GEM_DMA_DRIVER_OPS_WITH_DUMB_CREATE(), and > then use that new hook in your new heap. It would already cover 40 > drivers at the moment, instead of just one, with all of them (but > atmel-hlcdc maybe?) being in the same situation than RPi4-vc4 is. As said in another e-mail, I really think the consequences of DMA heaps need to be thought out on a per-driver basis. Moreover this approach makes core DRM go in the wrong direction, deeper in midlayer territory. > > Then maybe it's somewhat simple to cover typical display devices found > > on split render/display SoCs, however for the rest it's going to be much > > more complicated. For x86 typically there are multiple buffer placements > > supported by the GPU and we need to have one heap per possible placement. > > And then figuring out all of the rules, priority and compatibility stuff > > is a whole other task in and of itself. > > But x86 typically doesn't have a split render/display setup, right? So you're saying we should fix everything at once, but why is x86 not part of "everything" then? x86 also has issues regarding buffer placement. You're saying you don't want to fragment the ecosystem, but it seems like there would still be "fragmentation" between split render/display SoCs and the rest? I'm having a hard time understanding your goals here.
On Fri, Nov 10, 2023 at 02:23:16PM +0000, Simon Ser wrote: > On Friday, November 10th, 2023 at 15:01, Maxime Ripard <mripard@kernel.org> wrote: > > > On Fri, Nov 10, 2023 at 11:21:15AM +0000, Simon Ser wrote: > > > > > On Thursday, November 9th, 2023 at 20:17, Maxime Ripard mripard@kernel.org wrote: > > > > > > > > Can we add another function pointer to the struct drm_driver (or > > > > > similar) to do the allocation, and move the actual dmabuf handling > > > > > code into the core? > > > > > > > > Yeah, I agree here, it just seems easier to provide a global hook and a > > > > somewhat sane default to cover all drivers in one go (potentially with > > > > additional restrictions, like only for modeset-only drivers or > > > > something). > > > > > > First off not all drivers are using the GEM DMA helpers (e.g. vc4 with > > > !vc5 does not). > > > > Right. vc4 pre-RPi4 is the exception though, so it's kind of what I > > meant by providing sane defaults: the vast majority of drivers are using > > GEM DMA helpers, and we should totally let drivers override that if > > relevant. > > > > Basically, we already have 2.5 citizen classes, I'd really like to avoid > > having 3 officially, even more so if it's not super difficult to do. > > > > > The heap code in this patch only works with drivers leveraging GEM DMA > > > helpers. > > > > We could add a new hook to drm_driver to allocate heaps, link to a > > default implementation in DRM_GEM_DMA_DRIVER_OPS_WITH_DUMB_CREATE(), and > > then use that new hook in your new heap. It would already cover 40 > > drivers at the moment, instead of just one, with all of them (but > > atmel-hlcdc maybe?) being in the same situation than RPi4-vc4 is. > > As said in another e-mail, I really think the consequences of DMA heaps > need to be thought out on a per-driver basis. The issue you pointed out doesn't show up on a per-driver basis though. > Moreover this approach makes core DRM go in the wrong direction, > deeper in midlayer territory. I have no idea why that makes it more of a midlayer here, but even if it does, just because it does doesn't mean it's bad or something to avoid. We've been striving for more than a decade now to make drivers easy to write and easy to extend / deviate from the norm. AFAIK, what I suggested provides both. Yours creates unnecessary boilerplate and will leave a lot of drivers out of a needed solution. > > > Then maybe it's somewhat simple to cover typical display devices found > > > on split render/display SoCs, however for the rest it's going to be much > > > more complicated. For x86 typically there are multiple buffer placements > > > supported by the GPU and we need to have one heap per possible placement. > > > And then figuring out all of the rules, priority and compatibility stuff > > > is a whole other task in and of itself. > > > > But x86 typically doesn't have a split render/display setup, right? > > So you're saying we should fix everything at once, but why is x86 not > part of "everything" then? "everything" is your word, not mine. I'm saying that the issue you've mentioned for this series applies to a lot of other drivers, and we should fix it for those too. x86 doesn't suffer from the issue you've mentioned, just like the Pi0-3, and thus we don't have to come up with a solution for them. > x86 also has issues regarding buffer placement. You're saying you > don't want to fragment the ecosystem, but it seems like there would > still be "fragmentation" between split render/display SoCs and the > rest?
If you want to come up with a generic solution to buffer placement, then you need to consider both sides. A buffer that can be addressed by the scanout engine might not be addressable by the codec that will fill that buffer for example. The problem is far broader than what you described here, and the solution far more involved too. I don't see why you're bringing that up here, I don't think we need a solution for that at this point, and yet I think your current patch is a step in the right direction if we enroll every relevant driver. Maxime > I'm having a hard time understanding your goals here.
On Fri, Nov 10, 2023 at 02:17:45PM +0000, Simon Ser wrote: > On Friday, November 10th, 2023 at 15:13, Maxime Ripard <mripard@kernel.org> wrote: > > > > > > We've talked with Sima at XDC about adding a symlink pointing to the > > > > > DMA heap and extra metadata files describing priorities and such. > > > > > However we don't actually need that part for the purposes of v3d -- > > > > > I think I'd rather defer that until more DMA heaps are plumbed > > > > > across the DRM drivers. > > > > > > > > Honestly, I don't think we can afford to only consider vc4/v3d here. The > > > > issue you described seem to affect any SoC with a split scanout/GPU, > > > > which is pretty much all of them? And if they are all affected, we > > > > should design something that fixes it once and for all. > > > > > > We don't need any sysfs stuff to fix the primary node and DRM dumb buffer > > > issues in Mesa's kmsro/renderonly. The sysfs stuff is only required for a fully > > > generic buffer placement constraint/compatibility uAPI. Which would be super > > > useful in compositors, but let's do one step at a time. > > > > I don't think a solution that further fragments the ecosystem is worth > > taking, sorry. What you're doing is valuable, we should totally fix the > > issue you have, but not at the expense of making vc4 special on one of > > the platforms it supports. > > This does not fragment the ecosystem. It moves the ecosystem bit by bit > towards the final solution. You can rephrase that any way you want, it moves one driver towards the final solution, thus making it deviate from the norm and leaving the rest behind. Maxime
On Thursday, November 9th, 2023 at 08:45, Simon Ser <contact@emersion.fr> wrote: > User-space sometimes needs to allocate scanout-capable memory for > GPU rendering purposes. On a vc4/v3d split render/display SoC, this > is achieved via DRM dumb buffers: the v3d user-space driver opens > the primary vc4 node, allocates a DRM dumb buffer there, exports it > as a DMA-BUF, imports it into the v3d render node, and renders to it. > > However, DRM dumb buffers are only meant for CPU rendering, they are > not intended to be used for GPU rendering. Primary nodes should only > be used for mode-setting purposes, other programs should not attempt > to open it. Moreover, opening the primary node is already broken on > some setups: systemd grants permission to open primary nodes to > physically logged in users, but this breaks when the user is not > physically logged in (e.g. headless setup) and when the distribution > is using a different init (e.g. Alpine Linux uses openrc). > > We need an alternate way for v3d to allocate scanout-capable memory. > Leverage DMA heaps for this purpose: expose a CMA heap to user-space. So we've discussed this patch on IRC [1] [2]. Some random notes: - We shouldn't create per-DRM-device heaps in general. Instead, we should try using centralized heaps like the existing system and cma ones. That way other drivers (video, render, etc) can also link to these heaps without depending on the display driver. - We can't generically link to heaps in core DRM, however we can probably provide a default for shmem and cma helpers. - We're missing a bunch of heaps, e.g. sometimes there are multiple cma areas but only a single cma heap is created right now. - Some hw needs the memory to be in a specific region for scanout (e.g. lower 256MB of RAM for Allwinner). We could create one heap per such region (but is it fine to have overlapping heaps?). Also I tried using the default CMA heap on a Pi 4 for scanout and it works fine. Not super sure it's strictly equivalent to allocations done via dumb buffers (e.g. WC etc). [1]: https://oftc.irclog.whitequark.org/dri-devel/2023-11-13#1699899003-1699919633; [2]: https://oftc.irclog.whitequark.org/dri-devel/2023-11-14
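For context, a minimal user-space sketch of the flow being tested here: allocate from a CMA DMA heap with DMA_HEAP_IOCTL_ALLOC and import the resulting DMA-BUF into a DRM device via PRIME. The heap node name varies per system ("linux,cma" is only an example), and error handling is trimmed:

#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

#include <linux/dma-heap.h>
#include <xf86drm.h>

/* Allocate `size` bytes from a CMA heap and turn it into a GEM handle. */
static int alloc_scanout_buffer(int drm_fd, uint64_t size, uint32_t *handle)
{
	struct dma_heap_allocation_data alloc = {
		.len = size,
		.fd_flags = O_RDWR | O_CLOEXEC,
	};
	int heap_fd, ret;

	heap_fd = open("/dev/dma_heap/linux,cma", O_RDWR | O_CLOEXEC);
	if (heap_fd < 0)
		return -1;

	ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
	close(heap_fd);
	if (ret < 0)
		return -1;

	/* Import the DMA-BUF so it can back a KMS framebuffer. */
	ret = drmPrimeFDToHandle(drm_fd, alloc.fd, handle);
	close(alloc.fd);
	return ret;
}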
Hi, Thanks for writing this down. On Thu, Nov 16, 2023 at 03:53:20PM +0000, Simon Ser wrote: > On Thursday, November 9th, 2023 at 08:45, Simon Ser <contact@emersion.fr> wrote: > > > User-space sometimes needs to allocate scanout-capable memory for > > GPU rendering purposes. On a vc4/v3d split render/display SoC, this > > is achieved via DRM dumb buffers: the v3d user-space driver opens > > the primary vc4 node, allocates a DRM dumb buffer there, exports it > > as a DMA-BUF, imports it into the v3d render node, and renders to it. > > > > However, DRM dumb buffers are only meant for CPU rendering, they are > > not intended to be used for GPU rendering. Primary nodes should only > > be used for mode-setting purposes, other programs should not attempt > > to open it. Moreover, opening the primary node is already broken on > > some setups: systemd grants permission to open primary nodes to > > physically logged in users, but this breaks when the user is not > > physically logged in (e.g. headless setup) and when the distribution > > is using a different init (e.g. Alpine Linux uses openrc). > > > > We need an alternate way for v3d to allocate scanout-capable memory. > > Leverage DMA heaps for this purpose: expose a CMA heap to user-space. > > So we've discussed about this patch on IRC [1] [2]. Some random notes: > > - We shouldn't create per-DRM-device heaps in general. Instead, we should try > using centralized heaps like the existing system and cma ones. That way other > drivers (video, render, etc) can also link to these heaps without depending > on the display driver. > - We can't generically link to heaps in core DRM, however we probably provide > a default for shmem and cma helpers. > - We're missing a bunch of heaps, e.g. sometimes there are multiple cma areas > but only a single cma heap is created right now. > - Some hw needs the memory to be in a specific region for scanout (e.g. lower > 256MB of RAM for Allwinner). We could create one heap per such region (but is > it fine to have overlapping heaps?). Just for reference, it's not the scanout itself that has that requirement on Allwinner SoCs, it's the HW codec. But if you want to display the decoded frame directly using dma-buf, you'll still need to either allocate a scanout buffer and hope it'll be in the lower 256MB, or allocate a buffer from the codec in the lower 256MB and then hope it's scanout-capable (which it is, so that's what we do, but there's no guarantee about it). I think the logicvc is a much better example for this, since it requires framebuffers to be in a specific area, with each plane having a dedicated area. AFAIK that's the most extreme example we have upstream. Maxime
On Wednesday, November 29th, 2023 at 13:45, Maxime Ripard <mripard@kernel.org> wrote: > > > Hi, > > Thanks for writing this down > > On Thu, Nov 16, 2023 at 03:53:20PM +0000, Simon Ser wrote: > > > On Thursday, November 9th, 2023 at 08:45, Simon Ser contact@emersion.fr wrote: > > > > > User-space sometimes needs to allocate scanout-capable memory for > > > GPU rendering purposes. On a vc4/v3d split render/display SoC, this > > > is achieved via DRM dumb buffers: the v3d user-space driver opens > > > the primary vc4 node, allocates a DRM dumb buffer there, exports it > > > as a DMA-BUF, imports it into the v3d render node, and renders to it. > > > > > > However, DRM dumb buffers are only meant for CPU rendering, they are > > > not intended to be used for GPU rendering. Primary nodes should only > > > be used for mode-setting purposes, other programs should not attempt > > > to open it. Moreover, opening the primary node is already broken on > > > some setups: systemd grants permission to open primary nodes to > > > physically logged in users, but this breaks when the user is not > > > physically logged in (e.g. headless setup) and when the distribution > > > is using a different init (e.g. Alpine Linux uses openrc). > > > > > > We need an alternate way for v3d to allocate scanout-capable memory. > > > Leverage DMA heaps for this purpose: expose a CMA heap to user-space. > > > > So we've discussed about this patch on IRC [1] [2]. Some random notes: > > > > - We shouldn't create per-DRM-device heaps in general. Instead, we should try > > using centralized heaps like the existing system and cma ones. That way other > > drivers (video, render, etc) can also link to these heaps without depending > > on the display driver. > > - We can't generically link to heaps in core DRM, however we probably provide > > a default for shmem and cma helpers. > > - We're missing a bunch of heaps, e.g. sometimes there are multiple cma areas > > but only a single cma heap is created right now. > > - Some hw needs the memory to be in a specific region for scanout (e.g. lower > > 256MB of RAM for Allwinner). We could create one heap per such region (but is > > it fine to have overlapping heaps?). > > Just for reference, it's not the scanout itself that has that > requirement on Allwinner SoCs, it's the HW codec. But if you want to > display the decoded frame directly using dma-buf, you'll still need to > either allocate a scanout buffer and hope it'll be in the lower 256MB, > or allocate a buffer from the codec in the lower 256MB and then hope > it's scanout-capable (which it is, so that's we do, but there's no > guarantee about it). OK. Yeah, the problem remains. > I think the logicvc is a much better example for this, since it requires > framebuffers to be in a specific area, with each plane having a > dedicated area. > > AFAIK that's the most extreme example we have upstream. That kind of restriction is not supported by generic user-space. As far as user-space is concerned, scanout-capable buffers aren't tied to any plane in particular. Generic user-space allocates via GBM or dumb buffers, and at allocation time there is no hint about the plane the buffer will be attached to. I'm personally not super excited/interested about supporting this kind of weird setup which doesn't match the KMS uAPI.
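As an aside, this is roughly how generic user-space asks for scanout-capable memory via GBM today; no plane can be named at allocation time. Illustrative snippet only, error handling omitted:

#include <stdint.h>
#include <gbm.h>

/* Request a buffer that is both scanout-capable and usable for rendering. */
static struct gbm_bo *alloc_scanout_bo(struct gbm_device *gbm,
				       uint32_t width, uint32_t height)
{
	return gbm_bo_create(gbm, width, height, GBM_FORMAT_XRGB8888,
			     GBM_BO_USE_SCANOUT | GBM_BO_USE_RENDERING);
}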
On Tue, Dec 05, 2023 at 04:52:06PM +0000, Simon Ser wrote: > > I think the logicvc is a much better example for this, since it requires > > framebuffers to be in a specific area, with each plane having a > > dedicated area. > > > > AFAIK that's the most extreme example we have upstream. > > That kind of restriction is not supported by generic user-space. As far > as user-space is concerned, scanout-capable buffers aren't tied to any > plane in particular. Generic user-space allocates via GBM or dumb > buffers, and at allocation time there is no hint about the plane the > buffer will be attached to. > > I'm personally not super excited/interested about supporting this kind > of weird setup which doesn't match the KMS uAPI. Yeah... I think if we ever come to this, we can support it in two ways: - Either we handle it like simpledrm and the like by allocating a buffer and then memcpy'ing it to the right place; - Or if it's not fast enough, we create a single heap for the entire device, and then create some dynamic association between the framebuffer and the (hardware) plane depending on where the plane is in that area. That would be quite difficult to get right but not impossible imo. Maxime
diff --git a/drivers/gpu/drm/vc4/Makefile b/drivers/gpu/drm/vc4/Makefile
index c41f89a15a55..e4048870cec7 100644
--- a/drivers/gpu/drm/vc4/Makefile
+++ b/drivers/gpu/drm/vc4/Makefile
@@ -34,4 +34,6 @@ vc4-$(CONFIG_DRM_VC4_KUNIT_TEST) += \
 
 vc4-$(CONFIG_DEBUG_FS) += vc4_debugfs.o
 
+vc4-$(CONFIG_DMABUF_HEAPS) += vc4_dma_heap.o
+
 obj-$(CONFIG_DRM_VC4) += vc4.o
diff --git a/drivers/gpu/drm/vc4/vc4_dma_heap.c b/drivers/gpu/drm/vc4/vc4_dma_heap.c
new file mode 100644
index 000000000000..010d0a88f3fa
--- /dev/null
+++ b/drivers/gpu/drm/vc4/vc4_dma_heap.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright © 2023 Simon Ser
+ */
+
+#include <linux/dma-buf.h>
+#include <linux/dma-heap.h>
+
+#include "vc4_drv.h"
+
+static struct dma_buf *vc4_dma_heap_allocate(struct dma_heap *heap,
+					     unsigned long size,
+					     unsigned long fd_flags,
+					     unsigned long heap_flags)
+{
+	struct vc4_dev *vc4 = dma_heap_get_drvdata(heap);
+	//DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+	struct drm_gem_dma_object *dma_obj;
+	struct dma_buf *dmabuf;
+
+	if (WARN_ON_ONCE(!vc4->is_vc5))
+		return ERR_PTR(-ENODEV);
+
+	dma_obj = drm_gem_dma_create(&vc4->base, size);
+	if (IS_ERR(dma_obj))
+		return ERR_CAST(dma_obj);
+
+	dmabuf = drm_gem_prime_export(&dma_obj->base, fd_flags);
+	drm_gem_object_put(&dma_obj->base);
+	return dmabuf;
+}
+
+static const struct dma_heap_ops vc4_dma_heap_ops = {
+	.allocate = vc4_dma_heap_allocate,
+};
+
+int vc4_dma_heap_create(struct vc4_dev *vc4)
+{
+	struct dma_heap_export_info exp_info;
+	struct dma_heap *heap;
+
+	exp_info.name = "vc4"; /* TODO: allow multiple? */
+	exp_info.ops = &vc4_dma_heap_ops;
+	exp_info.priv = vc4; /* TODO: unregister when unloading */
+
+	heap = dma_heap_add(&exp_info);
+	if (IS_ERR(heap))
+		return PTR_ERR(heap);
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/vc4/vc4_drv.c b/drivers/gpu/drm/vc4/vc4_drv.c
index c133e96b8aca..c7297dd7d9d5 100644
--- a/drivers/gpu/drm/vc4/vc4_drv.c
+++ b/drivers/gpu/drm/vc4/vc4_drv.c
@@ -391,6 +391,12 @@ static int vc4_drm_bind(struct device *dev)
 
 	drm_fbdev_dma_setup(drm, 16);
 
+#if IS_ENABLED(CONFIG_DMABUF_HEAPS)
+	ret = vc4_dma_heap_create(vc4);
+	if (ret)
+		goto err;
+#endif
+
 	return 0;
 
 err:
diff --git a/drivers/gpu/drm/vc4/vc4_drv.h b/drivers/gpu/drm/vc4/vc4_drv.h
index ab61e96e7e14..d5c5dd18815c 100644
--- a/drivers/gpu/drm/vc4/vc4_drv.h
+++ b/drivers/gpu/drm/vc4/vc4_drv.h
@@ -1078,4 +1078,9 @@ int vc4_perfmon_destroy_ioctl(struct drm_device *dev, void *data,
 int vc4_perfmon_get_values_ioctl(struct drm_device *dev, void *data,
 				 struct drm_file *file_priv);
 
+/* vc4_dma_heap.c */
+#if IS_ENABLED(CONFIG_DMABUF_HEAPS)
+int vc4_dma_heap_create(struct vc4_dev *vc4);
+#endif
+
 #endif /* _VC4_DRV_H_ */
User-space sometimes needs to allocate scanout-capable memory for GPU rendering purposes. On a vc4/v3d split render/display SoC, this is achieved via DRM dumb buffers: the v3d user-space driver opens the primary vc4 node, allocates a DRM dumb buffer there, exports it as a DMA-BUF, imports it into the v3d render node, and renders to it.

However, DRM dumb buffers are only meant for CPU rendering; they are not intended to be used for GPU rendering. Primary nodes should only be used for mode-setting purposes, and other programs should not attempt to open them. Moreover, opening the primary node is already broken on some setups: systemd grants permission to open primary nodes to physically logged in users, but this breaks when the user is not physically logged in (e.g. headless setup) and when the distribution is using a different init (e.g. Alpine Linux uses openrc).

We need an alternate way for v3d to allocate scanout-capable memory. Leverage DMA heaps for this purpose: expose a CMA heap to user-space. Preliminary testing has been done with wlroots [1].

This is still an RFC. Open questions:

- Does this approach make sense to y'all in general?
- What would be a good name for the heap? "vc4" is maybe a bit naive and not precise enough. Something with "cma"? Do we need to plan a naming scheme to accommodate multiple vc4 devices?
- Right now root privileges are necessary to open the heap. Should we allow everybody to open the heap by default (after all, user-space can already allocate arbitrary amounts of GPU memory)? Should we leave it up to user-space to solve this issue (via logind/seatd or a Wayland protocol or something else)?

TODO:

- Need to add !vc5 support.
- Need to extend the DMA heaps API to allow vc4 to unregister the heap on unload.

[1]: https://gitlab.freedesktop.org/wlroots/wlroots/-/merge_requests/4414

Signed-off-by: Simon Ser <contact@emersion.fr>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Daniel Stone <daniel@fooishbar.org>
Cc: Erico Nunes <nunes.erico@gmail.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: Maíra Canal <mcanal@igalia.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
---
 drivers/gpu/drm/vc4/Makefile       |  2 ++
 drivers/gpu/drm/vc4/vc4_dma_heap.c | 51 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/vc4/vc4_drv.c      |  6 ++++
 drivers/gpu/drm/vc4/vc4_drv.h      |  5 +++
 4 files changed, 64 insertions(+)
 create mode 100644 drivers/gpu/drm/vc4/vc4_dma_heap.c