Message ID | 20240424063753.3740664-1-tomeu@tomeuvizoso.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/etnaviv: Create an accel device node if compute-only | expand |
Hi Tomeu, > > If we expose a render node for NPUs without rendering capabilities, the > userspace stack will offer it to compositors and applications for > rendering, which of course won't work. > > Userspace is probably right in not questioning whether a render node > might not be capable of supporting rendering, so change it in the kernel > instead by exposing a /dev/accel node. > > Before we bring the device up we don't know whether it is capable of > rendering or not (depends on the features of its blocks), so first try > to probe a rendering node, and if we find out that there is no rendering > hardware, abort and retry with an accel node. > I really love this idea of moving away from a render node. What needs to be done on the userspace side? > Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> > Cc: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> > --- > drivers/gpu/drm/etnaviv/etnaviv_drv.c | 46 ++++++++++++++++++++++----- > 1 file changed, 38 insertions(+), 8 deletions(-) > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c b/drivers/gpu/drm/etnaviv/etnaviv_drv.c > index 6500f3999c5f..8e7dd23115f4 100644 > --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c > +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c > @@ -11,6 +11,7 @@ > #include <linux/platform_device.h> > #include <linux/uaccess.h> > > +#include <drm/drm_accel.h> > #include <drm/drm_debugfs.h> > #include <drm/drm_drv.h> > #include <drm/drm_file.h> > @@ -488,10 +489,10 @@ static const struct drm_ioctl_desc etnaviv_ioctls[] = { > ETNA_IOCTL(PM_QUERY_SIG, pm_query_sig, DRM_RENDER_ALLOW), > }; > > -DEFINE_DRM_GEM_FOPS(fops); > +DEFINE_DRM_GEM_FOPS(render_fops); > +DEFINE_DRM_ACCEL_FOPS(accel_fops); > > -static const struct drm_driver etnaviv_drm_driver = { > - .driver_features = DRIVER_GEM | DRIVER_RENDER, > +static struct drm_driver etnaviv_drm_driver = { > .open = etnaviv_open, > .postclose = etnaviv_postclose, > .gem_prime_import_sg_table = etnaviv_gem_prime_import_sg_table, > @@ -500,7 +501,6 @@ static const struct drm_driver etnaviv_drm_driver = { > #endif > .ioctls = etnaviv_ioctls, > .num_ioctls = DRM_ETNAVIV_NUM_IOCTLS, > - .fops = &fops, > .name = "etnaviv", > .desc = "etnaviv DRM", > .date = "20151214", > @@ -508,15 +508,20 @@ static const struct drm_driver etnaviv_drm_driver = { > .minor = 4, > }; > > -/* > - * Platform driver: > - */ > -static int etnaviv_bind(struct device *dev) > +static int etnaviv_bind_with_type(struct device *dev, u32 type) > { > struct etnaviv_drm_private *priv; > struct drm_device *drm; > + bool is_compute_only = true; > int ret; > > + etnaviv_drm_driver.driver_features = DRIVER_GEM | type; > + > + if (type == DRIVER_RENDER) > + etnaviv_drm_driver.fops = &render_fops; > + else > + etnaviv_drm_driver.fops = &accel_fops; > + > drm = drm_dev_alloc(&etnaviv_drm_driver, dev); > if (IS_ERR(drm)) > return PTR_ERR(drm); > @@ -553,6 +558,18 @@ static int etnaviv_bind(struct device *dev) > > load_gpu(drm); > > + for (unsigned int i = 0; i < ETNA_MAX_PIPES; i++) { > + struct etnaviv_gpu *g = priv->gpu[i]; > + > + if (g && (g->identity.minor_features8 & chipMinorFeatures8_COMPUTE_ONLY) == 0) > + is_compute_only = false; > + } > + > + if (type == DRIVER_RENDER && is_compute_only) { > + ret = -EINVAL; > + goto out_unbind; > + } > + > ret = drm_dev_register(drm, 0); > if (ret) > goto out_unbind; > @@ -571,6 +588,19 @@ static int etnaviv_bind(struct device *dev) > return ret; > } > > +/* > + * Platform driver: > + */ > +static int etnaviv_bind(struct device *dev) > +{ > + int ret = etnaviv_bind_with_type(dev, DRIVER_RENDER); > + > + if (ret == -EINVAL) > + return etnaviv_bind_with_type(dev, DRIVER_COMPUTE_ACCEL); > + > + return ret; > +} > + > static void etnaviv_unbind(struct device *dev) > { > struct drm_device *drm = dev_get_drvdata(dev); > -- > 2.44.0 >
On 4/24/2024 12:37 AM, Tomeu Vizoso wrote: > If we expose a render node for NPUs without rendering capabilities, the > userspace stack will offer it to compositors and applications for > rendering, which of course won't work. > > Userspace is probably right in not questioning whether a render node > might not be capable of supporting rendering, so change it in the kernel > instead by exposing a /dev/accel node. > > Before we bring the device up we don't know whether it is capable of > rendering or not (depends on the features of its blocks), so first try > to probe a rendering node, and if we find out that there is no rendering > hardware, abort and retry with an accel node. > > Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> > Cc: Oded Gabbay <ogabbay@kernel.org> I hope Oded chimes in as Accel maintainer. I think Airlie/Vetter had also previously mentioned they'd have opinions on what is Accel vs DRM. This gets a nack from me in its current state. This is not a strong nack, and I don't want to discourage you. I think there is a path forward. The Accel subsystem documentation says that accel drivers will reside in drivers/accel/ but this does not. Also, the commit text for "accel: add dedicated minor for accelerator devices" mentions - "for drivers that declare they handle compute accelerator, using a new driver feature flag called DRIVER_COMPUTE_ACCEL. It is important to note that this driver feature is mutually exclusive with DRIVER_RENDER. Devices that want to expose both graphics and compute device char files should be handled by two drivers that are connected using the auxiliary bus framework." I don't see any of that happening here (two drivers connected by aux bus, one in drivers/accel). I think this is the first case we've had of a combo DRM/Accel usecase, and so there isn't an existing example to refer you to on how to structure things. I think you are going to be the first example where we figure all of this out. On a more implementation note, ioctls for Accel devices should not be marked DRM_RENDER_ALLOW. Seems like your attempt to reuse as much of the code as possible trips over this. -Jeff
On Thu, Apr 25, 2024 at 1:32 PM Christian Gmeiner <christian.gmeiner@gmail.com> wrote: > > Hi Tomeu, > > > > > If we expose a render node for NPUs without rendering capabilities, the > > userspace stack will offer it to compositors and applications for > > rendering, which of course won't work. > > > > Userspace is probably right in not questioning whether a render node > > might not be capable of supporting rendering, so change it in the kernel > > instead by exposing a /dev/accel node. > > > > Before we bring the device up we don't know whether it is capable of > > rendering or not (depends on the features of its blocks), so first try > > to probe a rendering node, and if we find out that there is no rendering > > hardware, abort and retry with an accel node. > > > > I really love this idea of moving away from a render node. What needs to be done > on the userspace side? Doesn't seem that bad, here is a proof of concept: https://gitlab.freedesktop.org/tomeu/mesa/-/tree/teflon-accel Thanks for taking a look. Tomeu > > Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> > > Cc: Oded Gabbay <ogabbay@kernel.org> > > Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> > > > --- > > drivers/gpu/drm/etnaviv/etnaviv_drv.c | 46 ++++++++++++++++++++++----- > > 1 file changed, 38 insertions(+), 8 deletions(-) > > > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c b/drivers/gpu/drm/etnaviv/etnaviv_drv.c > > index 6500f3999c5f..8e7dd23115f4 100644 > > --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c > > @@ -11,6 +11,7 @@ > > #include <linux/platform_device.h> > > #include <linux/uaccess.h> > > > > +#include <drm/drm_accel.h> > > #include <drm/drm_debugfs.h> > > #include <drm/drm_drv.h> > > #include <drm/drm_file.h> > > @@ -488,10 +489,10 @@ static const struct drm_ioctl_desc etnaviv_ioctls[] = { > > ETNA_IOCTL(PM_QUERY_SIG, pm_query_sig, DRM_RENDER_ALLOW), > > }; > > > > -DEFINE_DRM_GEM_FOPS(fops); > > +DEFINE_DRM_GEM_FOPS(render_fops); > > +DEFINE_DRM_ACCEL_FOPS(accel_fops); > > > > -static const struct drm_driver etnaviv_drm_driver = { > > - .driver_features = DRIVER_GEM | DRIVER_RENDER, > > +static struct drm_driver etnaviv_drm_driver = { > > .open = etnaviv_open, > > .postclose = etnaviv_postclose, > > .gem_prime_import_sg_table = etnaviv_gem_prime_import_sg_table, > > @@ -500,7 +501,6 @@ static const struct drm_driver etnaviv_drm_driver = { > > #endif > > .ioctls = etnaviv_ioctls, > > .num_ioctls = DRM_ETNAVIV_NUM_IOCTLS, > > - .fops = &fops, > > .name = "etnaviv", > > .desc = "etnaviv DRM", > > .date = "20151214", > > @@ -508,15 +508,20 @@ static const struct drm_driver etnaviv_drm_driver = { > > .minor = 4, > > }; > > > > -/* > > - * Platform driver: > > - */ > > -static int etnaviv_bind(struct device *dev) > > +static int etnaviv_bind_with_type(struct device *dev, u32 type) > > { > > struct etnaviv_drm_private *priv; > > struct drm_device *drm; > > + bool is_compute_only = true; > > int ret; > > > > + etnaviv_drm_driver.driver_features = DRIVER_GEM | type; > > + > > + if (type == DRIVER_RENDER) > > + etnaviv_drm_driver.fops = &render_fops; > > + else > > + etnaviv_drm_driver.fops = &accel_fops; > > + > > drm = drm_dev_alloc(&etnaviv_drm_driver, dev); > > if (IS_ERR(drm)) > > return PTR_ERR(drm); > > @@ -553,6 +558,18 @@ static int etnaviv_bind(struct device *dev) > > > > load_gpu(drm); > > > > + for (unsigned int i = 0; i < ETNA_MAX_PIPES; i++) { > > + struct etnaviv_gpu *g = priv->gpu[i]; > > + > > + if (g && (g->identity.minor_features8 & chipMinorFeatures8_COMPUTE_ONLY) == 0) > > + is_compute_only = false; > > + } > > + > > + if (type == DRIVER_RENDER && is_compute_only) { > > + ret = -EINVAL; > > + goto out_unbind; > > + } > > + > > ret = drm_dev_register(drm, 0); > > if (ret) > > goto out_unbind; > > @@ -571,6 +588,19 @@ static int etnaviv_bind(struct device *dev) > > return ret; > > } > > > > +/* > > + * Platform driver: > > + */ > > +static int etnaviv_bind(struct device *dev) > > +{ > > + int ret = etnaviv_bind_with_type(dev, DRIVER_RENDER); > > + > > + if (ret == -EINVAL) > > + return etnaviv_bind_with_type(dev, DRIVER_COMPUTE_ACCEL); > > + > > + return ret; > > +} > > + > > static void etnaviv_unbind(struct device *dev) > > { > > struct drm_device *drm = dev_get_drvdata(dev); > > -- > > 2.44.0 > > > > > -- > greets > -- > Christian Gmeiner, MSc > > https://christian-gmeiner.info/privacypolicy
On Thu, Apr 25, 2024 at 8:59 PM Jeffrey Hugo <quic_jhugo@quicinc.com> wrote: > > On 4/24/2024 12:37 AM, Tomeu Vizoso wrote: > > If we expose a render node for NPUs without rendering capabilities, the > > userspace stack will offer it to compositors and applications for > > rendering, which of course won't work. > > > > Userspace is probably right in not questioning whether a render node > > might not be capable of supporting rendering, so change it in the kernel > > instead by exposing a /dev/accel node. > > > > Before we bring the device up we don't know whether it is capable of > > rendering or not (depends on the features of its blocks), so first try > > to probe a rendering node, and if we find out that there is no rendering > > hardware, abort and retry with an accel node. > > > > Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> > > Cc: Oded Gabbay <ogabbay@kernel.org> > > I hope Oded chimes in as Accel maintainer. I think Airlie/Vetter had > also previously mentioned they'd have opinions on what is Accel vs DRM. > > This gets a nack from me in its current state. This is not a strong > nack, and I don't want to discourage you. I think there is a path forward. > > The Accel subsystem documentation says that accel drivers will reside in > drivers/accel/ but this does not. Indeed, there is that code organization aspect. > Also, the commit text for "accel: add dedicated minor for accelerator > devices" mentions - > > "for drivers that > declare they handle compute accelerator, using a new driver feature > flag called DRIVER_COMPUTE_ACCEL. It is important to note that this > driver feature is mutually exclusive with DRIVER_RENDER. Devices that > want to expose both graphics and compute device char files should be > handled by two drivers that are connected using the auxiliary bus > framework." > > I don't see any of that happening here (two drivers connected by aux > bus, one in drivers/accel). Well, the text refers to devices, not drivers. The case we are talking about is a driver that wants to sometimes expose an accel node, and sometimes a render node, depending on the hardware it is dealing with. So there would either be a device exposing a single render node, or a device exposing a single accel node. Though by using the auxiliary bus we could in theory solve the code organization problem mentioned above, I'm not quite seeing how to do this in a clean way. The driver in /drivers/gpu/drm would have to be a DRM driver that doesn't register a DRM device, but registers a device in the auxiliary bus for the driver in /drivers/accel to bind to? Or are you seeing some possibility that would fit better in the current DRM framework? > I think this is the first case we've had of a combo DRM/Accel usecase, > and so there isn't an existing example to refer you to on how to > structure things. I think you are going to be the first example where > we figure all of this out. Yep, I will be grateful for any ideas on how to structure this. > On a more implementation note, ioctls for Accel devices should not be > marked DRM_RENDER_ALLOW. Seems like your attempt to reuse as much of > the code as possible trips over this. Indeed, thanks. Cheers, Tomeu > -Jeff
Oded, Dave, Do you have an opinion on this? Thanks, Tomeu On Fri, Apr 26, 2024 at 8:10 AM Tomeu Vizoso <tomeu@tomeuvizoso.net> wrote: > > On Thu, Apr 25, 2024 at 8:59 PM Jeffrey Hugo <quic_jhugo@quicinc.com> wrote: > > > > On 4/24/2024 12:37 AM, Tomeu Vizoso wrote: > > > If we expose a render node for NPUs without rendering capabilities, the > > > userspace stack will offer it to compositors and applications for > > > rendering, which of course won't work. > > > > > > Userspace is probably right in not questioning whether a render node > > > might not be capable of supporting rendering, so change it in the kernel > > > instead by exposing a /dev/accel node. > > > > > > Before we bring the device up we don't know whether it is capable of > > > rendering or not (depends on the features of its blocks), so first try > > > to probe a rendering node, and if we find out that there is no rendering > > > hardware, abort and retry with an accel node. > > > > > > Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> > > > Cc: Oded Gabbay <ogabbay@kernel.org> > > > > I hope Oded chimes in as Accel maintainer. I think Airlie/Vetter had > > also previously mentioned they'd have opinions on what is Accel vs DRM. > > > > This gets a nack from me in its current state. This is not a strong > > nack, and I don't want to discourage you. I think there is a path forward. > > > > The Accel subsystem documentation says that accel drivers will reside in > > drivers/accel/ but this does not. > > Indeed, there is that code organization aspect. > > > Also, the commit text for "accel: add dedicated minor for accelerator > > devices" mentions - > > > > "for drivers that > > declare they handle compute accelerator, using a new driver feature > > flag called DRIVER_COMPUTE_ACCEL. It is important to note that this > > driver feature is mutually exclusive with DRIVER_RENDER. Devices that > > want to expose both graphics and compute device char files should be > > handled by two drivers that are connected using the auxiliary bus > > framework." > > > > I don't see any of that happening here (two drivers connected by aux > > bus, one in drivers/accel). > > Well, the text refers to devices, not drivers. The case we are talking > about is a driver that wants to sometimes expose an accel node, and > sometimes a render node, depending on the hardware it is dealing with. > So there would either be a device exposing a single render node, or a > device exposing a single accel node. > > Though by using the auxiliary bus we could in theory solve the code > organization problem mentioned above, I'm not quite seeing how to do > this in a clean way. The driver in /drivers/gpu/drm would have to be a > DRM driver that doesn't register a DRM device, but registers a device > in the auxiliary bus for the driver in /drivers/accel to bind to? Or > are you seeing some possibility that would fit better in the current > DRM framework? > > > I think this is the first case we've had of a combo DRM/Accel usecase, > > and so there isn't an existing example to refer you to on how to > > structure things. I think you are going to be the first example where > > we figure all of this out. > > Yep, I will be grateful for any ideas on how to structure this. > > > On a more implementation note, ioctls for Accel devices should not be > > marked DRM_RENDER_ALLOW. Seems like your attempt to reuse as much of > > the code as possible trips over this. > > Indeed, thanks. > > Cheers, > > Tomeu > > > -Jeff
On Thu, May 09, 2024 at 03:53:01PM +0200, Tomeu Vizoso wrote: > Oded, Dave, > > Do you have an opinion on this? > > Thanks, > > Tomeu Hi Tomeu, Sorry for not replying earlier, I was down with Covid (again...). To your question, I don't have an objection to what you are suggesting. My personal view of accel is that it is an integral part of DRM and therefore, if there is an *existing* drm driver that wants to create an accel node, I'm not against it. There is the question of why you want to expose an accel node, and here I would like to hear Dave's and Sima's opinion on your suggested solution as it may affect the direction of other drm drivers. Thanks, Oded. p.s. Please only use bottom-posting when replying, thanks :) > > On Fri, Apr 26, 2024 at 8:10 AM Tomeu Vizoso <tomeu@tomeuvizoso.net> wrote: > > > > On Thu, Apr 25, 2024 at 8:59 PM Jeffrey Hugo <quic_jhugo@quicinc.com> wrote: > > > > > > On 4/24/2024 12:37 AM, Tomeu Vizoso wrote: > > > > If we expose a render node for NPUs without rendering capabilities, the > > > > userspace stack will offer it to compositors and applications for > > > > rendering, which of course won't work. > > > > > > > > Userspace is probably right in not questioning whether a render node > > > > might not be capable of supporting rendering, so change it in the kernel > > > > instead by exposing a /dev/accel node. > > > > > > > > Before we bring the device up we don't know whether it is capable of > > > > rendering or not (depends on the features of its blocks), so first try > > > > to probe a rendering node, and if we find out that there is no rendering > > > > hardware, abort and retry with an accel node. > > > > > > > > Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> > > > > Cc: Oded Gabbay <ogabbay@kernel.org> > > > > > > I hope Oded chimes in as Accel maintainer. I think Airlie/Vetter had > > > also previously mentioned they'd have opinions on what is Accel vs DRM. > > > > > > This gets a nack from me in its current state. This is not a strong > > > nack, and I don't want to discourage you. I think there is a path forward. > > > > > > The Accel subsystem documentation says that accel drivers will reside in > > > drivers/accel/ but this does not. > > > > Indeed, there is that code organization aspect. > > > > > Also, the commit text for "accel: add dedicated minor for accelerator > > > devices" mentions - > > > > > > "for drivers that > > > declare they handle compute accelerator, using a new driver feature > > > flag called DRIVER_COMPUTE_ACCEL. It is important to note that this > > > driver feature is mutually exclusive with DRIVER_RENDER. Devices that > > > want to expose both graphics and compute device char files should be > > > handled by two drivers that are connected using the auxiliary bus > > > framework." > > > > > > I don't see any of that happening here (two drivers connected by aux > > > bus, one in drivers/accel). > > > > Well, the text refers to devices, not drivers. The case we are talking > > about is a driver that wants to sometimes expose an accel node, and > > sometimes a render node, depending on the hardware it is dealing with. > > So there would either be a device exposing a single render node, or a > > device exposing a single accel node. > > > > Though by using the auxiliary bus we could in theory solve the code > > organization problem mentioned above, I'm not quite seeing how to do > > this in a clean way. The driver in /drivers/gpu/drm would have to be a > > DRM driver that doesn't register a DRM device, but registers a device > > in the auxiliary bus for the driver in /drivers/accel to bind to? Or > > are you seeing some possibility that would fit better in the current > > DRM framework? > > > > > I think this is the first case we've had of a combo DRM/Accel usecase, > > > and so there isn't an existing example to refer you to on how to > > > structure things. I think you are going to be the first example where > > > we figure all of this out. > > > > Yep, I will be grateful for any ideas on how to structure this. > > > > > On a more implementation note, ioctls for Accel devices should not be > > > marked DRM_RENDER_ALLOW. Seems like your attempt to reuse as much of > > > the code as possible trips over this. > > > > Indeed, thanks. > > > > Cheers, > > > > Tomeu > > > > > -Jeff
Hi Tomeu, Am Mittwoch, dem 24.04.2024 um 08:37 +0200 schrieb Tomeu Vizoso: > If we expose a render node for NPUs without rendering capabilities, the > userspace stack will offer it to compositors and applications for > rendering, which of course won't work. > > Userspace is probably right in not questioning whether a render node > might not be capable of supporting rendering, so change it in the kernel > instead by exposing a /dev/accel node. > > Before we bring the device up we don't know whether it is capable of > rendering or not (depends on the features of its blocks), so first try > to probe a rendering node, and if we find out that there is no rendering > hardware, abort and retry with an accel node. > I thought about this for a while. My opinion is that this is the wrong approach. We are adding another path to the kernel driver, potentially complicating the userspace side, as now the NPU backend needs to look for both render and accel nodes. While currently accel and drm are pretty closely related and we can share most of the driver, it might still be a maintenance hassle in the long run. On the other hand we already have precedence of compute only DRM devices exposing a render node: there are AMD GPUs that don't expose a graphics queue and are thus not able to actually render graphics. Mesa already handles this in part via the PIPE_CAP_GRAPHICS and I think we should simply extend this to not offer a EGL display on screens without that capability. Regards, Lucas > Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> > Cc: Oded Gabbay <ogabbay@kernel.org> > --- > drivers/gpu/drm/etnaviv/etnaviv_drv.c | 46 ++++++++++++++++++++++----- > 1 file changed, 38 insertions(+), 8 deletions(-) > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c b/drivers/gpu/drm/etnaviv/etnaviv_drv.c > index 6500f3999c5f..8e7dd23115f4 100644 > --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c > +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c > @@ -11,6 +11,7 @@ > #include <linux/platform_device.h> > #include <linux/uaccess.h> > > +#include <drm/drm_accel.h> > #include <drm/drm_debugfs.h> > #include <drm/drm_drv.h> > #include <drm/drm_file.h> > @@ -488,10 +489,10 @@ static const struct drm_ioctl_desc etnaviv_ioctls[] = { > ETNA_IOCTL(PM_QUERY_SIG, pm_query_sig, DRM_RENDER_ALLOW), > }; > > -DEFINE_DRM_GEM_FOPS(fops); > +DEFINE_DRM_GEM_FOPS(render_fops); > +DEFINE_DRM_ACCEL_FOPS(accel_fops); > > -static const struct drm_driver etnaviv_drm_driver = { > - .driver_features = DRIVER_GEM | DRIVER_RENDER, > +static struct drm_driver etnaviv_drm_driver = { > .open = etnaviv_open, > .postclose = etnaviv_postclose, > .gem_prime_import_sg_table = etnaviv_gem_prime_import_sg_table, > @@ -500,7 +501,6 @@ static const struct drm_driver etnaviv_drm_driver = { > #endif > .ioctls = etnaviv_ioctls, > .num_ioctls = DRM_ETNAVIV_NUM_IOCTLS, > - .fops = &fops, > .name = "etnaviv", > .desc = "etnaviv DRM", > .date = "20151214", > @@ -508,15 +508,20 @@ static const struct drm_driver etnaviv_drm_driver = { > .minor = 4, > }; > > -/* > - * Platform driver: > - */ > -static int etnaviv_bind(struct device *dev) > +static int etnaviv_bind_with_type(struct device *dev, u32 type) > { > struct etnaviv_drm_private *priv; > struct drm_device *drm; > + bool is_compute_only = true; > int ret; > > + etnaviv_drm_driver.driver_features = DRIVER_GEM | type; > + > + if (type == DRIVER_RENDER) > + etnaviv_drm_driver.fops = &render_fops; > + else > + etnaviv_drm_driver.fops = &accel_fops; > + > drm = drm_dev_alloc(&etnaviv_drm_driver, dev); > if (IS_ERR(drm)) > return PTR_ERR(drm); > @@ -553,6 +558,18 @@ static int etnaviv_bind(struct device *dev) > > load_gpu(drm); > > + for (unsigned int i = 0; i < ETNA_MAX_PIPES; i++) { > + struct etnaviv_gpu *g = priv->gpu[i]; > + > + if (g && (g->identity.minor_features8 & chipMinorFeatures8_COMPUTE_ONLY) == 0) > + is_compute_only = false; > + } > + > + if (type == DRIVER_RENDER && is_compute_only) { > + ret = -EINVAL; > + goto out_unbind; > + } > + > ret = drm_dev_register(drm, 0); > if (ret) > goto out_unbind; > @@ -571,6 +588,19 @@ static int etnaviv_bind(struct device *dev) > return ret; > } > > +/* > + * Platform driver: > + */ > +static int etnaviv_bind(struct device *dev) > +{ > + int ret = etnaviv_bind_with_type(dev, DRIVER_RENDER); > + > + if (ret == -EINVAL) > + return etnaviv_bind_with_type(dev, DRIVER_COMPUTE_ACCEL); > + > + return ret; > +} > + > static void etnaviv_unbind(struct device *dev) > { > struct drm_device *drm = dev_get_drvdata(dev);
Hi Lucas, On Fri, May 10, 2024 at 10:34 AM Lucas Stach <l.stach@pengutronix.de> wrote: > > Hi Tomeu, > > Am Mittwoch, dem 24.04.2024 um 08:37 +0200 schrieb Tomeu Vizoso: > > If we expose a render node for NPUs without rendering capabilities, the > > userspace stack will offer it to compositors and applications for > > rendering, which of course won't work. > > > > Userspace is probably right in not questioning whether a render node > > might not be capable of supporting rendering, so change it in the kernel > > instead by exposing a /dev/accel node. > > > > Before we bring the device up we don't know whether it is capable of > > rendering or not (depends on the features of its blocks), so first try > > to probe a rendering node, and if we find out that there is no rendering > > hardware, abort and retry with an accel node. > > > I thought about this for a while. My opinion is that this is the wrong > approach. We are adding another path to the kernel driver, potentially > complicating the userspace side, as now the NPU backend needs to look > for both render and accel nodes. While currently accel and drm are > pretty closely related and we can share most of the driver, it might > still be a maintenance hassle in the long run. > > On the other hand we already have precedence of compute only DRM > devices exposing a render node: there are AMD GPUs that don't expose a > graphics queue and are thus not able to actually render graphics. Mesa > already handles this in part via the PIPE_CAP_GRAPHICS and I think we > should simply extend this to not offer a EGL display on screens without > that capability. The problem with this is that the compositors I know don't loop over /dev/dri files, trying to create EGL screens and moving to the next one until they find one that works. They take the first render node (unless a specific one has been configured), and assumes it will be able to render with it. To me it seems as if userspace expects that /dev/dri/renderD* devices can be used for rendering and by breaking this assumption we would be breaking existing software. Which is what I understood to be the whole point behind the decision to create a new device file hierarchy for accelerators. Or am I missing something? Adding Daniel Stone to CC in case he wants to give his opinion from the compositor point of view. Cheers, Tomeu > Regards, > Lucas > > > Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> > > Cc: Oded Gabbay <ogabbay@kernel.org> > > --- > > drivers/gpu/drm/etnaviv/etnaviv_drv.c | 46 ++++++++++++++++++++++----- > > 1 file changed, 38 insertions(+), 8 deletions(-) > > > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c b/drivers/gpu/drm/etnaviv/etnaviv_drv.c > > index 6500f3999c5f..8e7dd23115f4 100644 > > --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c > > @@ -11,6 +11,7 @@ > > #include <linux/platform_device.h> > > #include <linux/uaccess.h> > > > > +#include <drm/drm_accel.h> > > #include <drm/drm_debugfs.h> > > #include <drm/drm_drv.h> > > #include <drm/drm_file.h> > > @@ -488,10 +489,10 @@ static const struct drm_ioctl_desc etnaviv_ioctls[] = { > > ETNA_IOCTL(PM_QUERY_SIG, pm_query_sig, DRM_RENDER_ALLOW), > > }; > > > > -DEFINE_DRM_GEM_FOPS(fops); > > +DEFINE_DRM_GEM_FOPS(render_fops); > > +DEFINE_DRM_ACCEL_FOPS(accel_fops); > > > > -static const struct drm_driver etnaviv_drm_driver = { > > - .driver_features = DRIVER_GEM | DRIVER_RENDER, > > +static struct drm_driver etnaviv_drm_driver = { > > .open = etnaviv_open, > > .postclose = etnaviv_postclose, > > .gem_prime_import_sg_table = etnaviv_gem_prime_import_sg_table, > > @@ -500,7 +501,6 @@ static const struct drm_driver etnaviv_drm_driver = { > > #endif > > .ioctls = etnaviv_ioctls, > > .num_ioctls = DRM_ETNAVIV_NUM_IOCTLS, > > - .fops = &fops, > > .name = "etnaviv", > > .desc = "etnaviv DRM", > > .date = "20151214", > > @@ -508,15 +508,20 @@ static const struct drm_driver etnaviv_drm_driver = { > > .minor = 4, > > }; > > > > -/* > > - * Platform driver: > > - */ > > -static int etnaviv_bind(struct device *dev) > > +static int etnaviv_bind_with_type(struct device *dev, u32 type) > > { > > struct etnaviv_drm_private *priv; > > struct drm_device *drm; > > + bool is_compute_only = true; > > int ret; > > > > + etnaviv_drm_driver.driver_features = DRIVER_GEM | type; > > + > > + if (type == DRIVER_RENDER) > > + etnaviv_drm_driver.fops = &render_fops; > > + else > > + etnaviv_drm_driver.fops = &accel_fops; > > + > > drm = drm_dev_alloc(&etnaviv_drm_driver, dev); > > if (IS_ERR(drm)) > > return PTR_ERR(drm); > > @@ -553,6 +558,18 @@ static int etnaviv_bind(struct device *dev) > > > > load_gpu(drm); > > > > + for (unsigned int i = 0; i < ETNA_MAX_PIPES; i++) { > > + struct etnaviv_gpu *g = priv->gpu[i]; > > + > > + if (g && (g->identity.minor_features8 & chipMinorFeatures8_COMPUTE_ONLY) == 0) > > + is_compute_only = false; > > + } > > + > > + if (type == DRIVER_RENDER && is_compute_only) { > > + ret = -EINVAL; > > + goto out_unbind; > > + } > > + > > ret = drm_dev_register(drm, 0); > > if (ret) > > goto out_unbind; > > @@ -571,6 +588,19 @@ static int etnaviv_bind(struct device *dev) > > return ret; > > } > > > > +/* > > + * Platform driver: > > + */ > > +static int etnaviv_bind(struct device *dev) > > +{ > > + int ret = etnaviv_bind_with_type(dev, DRIVER_RENDER); > > + > > + if (ret == -EINVAL) > > + return etnaviv_bind_with_type(dev, DRIVER_COMPUTE_ACCEL); > > + > > + return ret; > > +} > > + > > static void etnaviv_unbind(struct device *dev) > > { > > struct drm_device *drm = dev_get_drvdata(dev); >
On Fri, May 10, 2024 at 10:34 AM Lucas Stach <l.stach@pengutronix.de> wrote: > > Hi Tomeu, > > Am Mittwoch, dem 24.04.2024 um 08:37 +0200 schrieb Tomeu Vizoso: > > If we expose a render node for NPUs without rendering capabilities, the > > userspace stack will offer it to compositors and applications for > > rendering, which of course won't work. > > > > Userspace is probably right in not questioning whether a render node > > might not be capable of supporting rendering, so change it in the kernel > > instead by exposing a /dev/accel node. > > > > Before we bring the device up we don't know whether it is capable of > > rendering or not (depends on the features of its blocks), so first try > > to probe a rendering node, and if we find out that there is no rendering > > hardware, abort and retry with an accel node. > > > I thought about this for a while. My opinion is that this is the wrong > approach. We are adding another path to the kernel driver, potentially > complicating the userspace side, as now the NPU backend needs to look > for both render and accel nodes. Forgot to mention in my earlier reply today that with the proposed solution no changes are needed in the Gallium drivers, only in the pipeloader component in Mesa, and in the Gallium frontends. But those changes are needed anyway to support the upcoming compute-only NPUs, such as Rockchip's. These are the changes I needed to make to the userspace to go with this kernel patch: https://gitlab.freedesktop.org/tomeu/mesa/-/commit/6b0db4cce406c574d2b7710208df9c8bd1ab6345 Cheers, Tomeu > While currently accel and drm are > pretty closely related and we can share most of the driver, it might > still be a maintenance hassle in the long run. > > On the other hand we already have precedence of compute only DRM > devices exposing a render node: there are AMD GPUs that don't expose a > graphics queue and are thus not able to actually render graphics. Mesa > already handles this in part via the PIPE_CAP_GRAPHICS and I think we > should simply extend this to not offer a EGL display on screens without > that capability. > > Regards, > Lucas > > > Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> > > Cc: Oded Gabbay <ogabbay@kernel.org> > > --- > > drivers/gpu/drm/etnaviv/etnaviv_drv.c | 46 ++++++++++++++++++++++----- > > 1 file changed, 38 insertions(+), 8 deletions(-) > > > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c b/drivers/gpu/drm/etnaviv/etnaviv_drv.c > > index 6500f3999c5f..8e7dd23115f4 100644 > > --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c > > @@ -11,6 +11,7 @@ > > #include <linux/platform_device.h> > > #include <linux/uaccess.h> > > > > +#include <drm/drm_accel.h> > > #include <drm/drm_debugfs.h> > > #include <drm/drm_drv.h> > > #include <drm/drm_file.h> > > @@ -488,10 +489,10 @@ static const struct drm_ioctl_desc etnaviv_ioctls[] = { > > ETNA_IOCTL(PM_QUERY_SIG, pm_query_sig, DRM_RENDER_ALLOW), > > }; > > > > -DEFINE_DRM_GEM_FOPS(fops); > > +DEFINE_DRM_GEM_FOPS(render_fops); > > +DEFINE_DRM_ACCEL_FOPS(accel_fops); > > > > -static const struct drm_driver etnaviv_drm_driver = { > > - .driver_features = DRIVER_GEM | DRIVER_RENDER, > > +static struct drm_driver etnaviv_drm_driver = { > > .open = etnaviv_open, > > .postclose = etnaviv_postclose, > > .gem_prime_import_sg_table = etnaviv_gem_prime_import_sg_table, > > @@ -500,7 +501,6 @@ static const struct drm_driver etnaviv_drm_driver = { > > #endif > > .ioctls = etnaviv_ioctls, > > .num_ioctls = DRM_ETNAVIV_NUM_IOCTLS, > > - .fops = &fops, > > .name = "etnaviv", > > .desc = "etnaviv DRM", > > .date = "20151214", > > @@ -508,15 +508,20 @@ static const struct drm_driver etnaviv_drm_driver = { > > .minor = 4, > > }; > > > > -/* > > - * Platform driver: > > - */ > > -static int etnaviv_bind(struct device *dev) > > +static int etnaviv_bind_with_type(struct device *dev, u32 type) > > { > > struct etnaviv_drm_private *priv; > > struct drm_device *drm; > > + bool is_compute_only = true; > > int ret; > > > > + etnaviv_drm_driver.driver_features = DRIVER_GEM | type; > > + > > + if (type == DRIVER_RENDER) > > + etnaviv_drm_driver.fops = &render_fops; > > + else > > + etnaviv_drm_driver.fops = &accel_fops; > > + > > drm = drm_dev_alloc(&etnaviv_drm_driver, dev); > > if (IS_ERR(drm)) > > return PTR_ERR(drm); > > @@ -553,6 +558,18 @@ static int etnaviv_bind(struct device *dev) > > > > load_gpu(drm); > > > > + for (unsigned int i = 0; i < ETNA_MAX_PIPES; i++) { > > + struct etnaviv_gpu *g = priv->gpu[i]; > > + > > + if (g && (g->identity.minor_features8 & chipMinorFeatures8_COMPUTE_ONLY) == 0) > > + is_compute_only = false; > > + } > > + > > + if (type == DRIVER_RENDER && is_compute_only) { > > + ret = -EINVAL; > > + goto out_unbind; > > + } > > + > > ret = drm_dev_register(drm, 0); > > if (ret) > > goto out_unbind; > > @@ -571,6 +588,19 @@ static int etnaviv_bind(struct device *dev) > > return ret; > > } > > > > +/* > > + * Platform driver: > > + */ > > +static int etnaviv_bind(struct device *dev) > > +{ > > + int ret = etnaviv_bind_with_type(dev, DRIVER_RENDER); > > + > > + if (ret == -EINVAL) > > + return etnaviv_bind_with_type(dev, DRIVER_COMPUTE_ACCEL); > > + > > + return ret; > > +} > > + > > static void etnaviv_unbind(struct device *dev) > > { > > struct drm_device *drm = dev_get_drvdata(dev); >
Hi, On Mon, 20 May 2024 at 08:39, Tomeu Vizoso <tomeu@tomeuvizoso.net> wrote: > On Fri, May 10, 2024 at 10:34 AM Lucas Stach <l.stach@pengutronix.de> wrote: > > Am Mittwoch, dem 24.04.2024 um 08:37 +0200 schrieb Tomeu Vizoso: > > > If we expose a render node for NPUs without rendering capabilities, the > > > userspace stack will offer it to compositors and applications for > > > rendering, which of course won't work. > > > > > > Userspace is probably right in not questioning whether a render node > > > might not be capable of supporting rendering, so change it in the kernel > > > instead by exposing a /dev/accel node. > > > > > > Before we bring the device up we don't know whether it is capable of > > > rendering or not (depends on the features of its blocks), so first try > > > to probe a rendering node, and if we find out that there is no rendering > > > hardware, abort and retry with an accel node. > > > > On the other hand we already have precedence of compute only DRM > > devices exposing a render node: there are AMD GPUs that don't expose a > > graphics queue and are thus not able to actually render graphics. Mesa > > already handles this in part via the PIPE_CAP_GRAPHICS and I think we > > should simply extend this to not offer a EGL display on screens without > > that capability. > > The problem with this is that the compositors I know don't loop over > /dev/dri files, trying to create EGL screens and moving to the next > one until they find one that works. > > They take the first render node (unless a specific one has been > configured), and assumes it will be able to render with it. > > To me it seems as if userspace expects that /dev/dri/renderD* devices > can be used for rendering and by breaking this assumption we would be > breaking existing software. Mm, it's sort of backwards from that. Compositors just take a non-render DRM node for KMS, then ask GBM+EGL to instantiate a GPU which can work with that. When run in headless mode, we don't take render nodes directly, but instead just create an EGLDisplay or VkPhysicalDevice and work backwards to a render node, rather than selecting a render node and going from there. So from that PoV I don't think it's really that harmful. The only complication is in Mesa, where it would see an etnaviv/amdgpu/... render node and potentially try to use it as a device. As long as Mesa can correctly skip, there should be no userspace API implications. That being said, I'm not entirely sure what the _benefit_ would be of exposing a render node for a device which can't be used by any 'traditional' DRM consumers, i.e. GL/Vulkan/winsys. Cheers, Daniel
On Mon, May 20, 2024 at 1:19 PM Daniel Stone <daniel@fooishbar.org> wrote: > > Hi, > > On Mon, 20 May 2024 at 08:39, Tomeu Vizoso <tomeu@tomeuvizoso.net> wrote: > > On Fri, May 10, 2024 at 10:34 AM Lucas Stach <l.stach@pengutronix.de> wrote: > > > Am Mittwoch, dem 24.04.2024 um 08:37 +0200 schrieb Tomeu Vizoso: > > > > If we expose a render node for NPUs without rendering capabilities, the > > > > userspace stack will offer it to compositors and applications for > > > > rendering, which of course won't work. > > > > > > > > Userspace is probably right in not questioning whether a render node > > > > might not be capable of supporting rendering, so change it in the kernel > > > > instead by exposing a /dev/accel node. > > > > > > > > Before we bring the device up we don't know whether it is capable of > > > > rendering or not (depends on the features of its blocks), so first try > > > > to probe a rendering node, and if we find out that there is no rendering > > > > hardware, abort and retry with an accel node. > > > > > > On the other hand we already have precedence of compute only DRM > > > devices exposing a render node: there are AMD GPUs that don't expose a > > > graphics queue and are thus not able to actually render graphics. Mesa > > > already handles this in part via the PIPE_CAP_GRAPHICS and I think we > > > should simply extend this to not offer a EGL display on screens without > > > that capability. > > > > The problem with this is that the compositors I know don't loop over > > /dev/dri files, trying to create EGL screens and moving to the next > > one until they find one that works. > > > > They take the first render node (unless a specific one has been > > configured), and assumes it will be able to render with it. > > > > To me it seems as if userspace expects that /dev/dri/renderD* devices > > can be used for rendering and by breaking this assumption we would be > > breaking existing software. > > Mm, it's sort of backwards from that. Compositors just take a > non-render DRM node for KMS, then ask GBM+EGL to instantiate a GPU > which can work with that. When run in headless mode, we don't take > render nodes directly, but instead just create an EGLDisplay or > VkPhysicalDevice and work backwards to a render node, rather than > selecting a render node and going from there. > > So from that PoV I don't think it's really that harmful. The only > complication is in Mesa, where it would see an etnaviv/amdgpu/... > render node and potentially try to use it as a device. As long as Mesa > can correctly skip, there should be no userspace API implications. > > That being said, I'm not entirely sure what the _benefit_ would be of > exposing a render node for a device which can't be used by any > 'traditional' DRM consumers, i.e. GL/Vulkan/winsys. What I don't understand yet from Lucas proposal is how this isn't going to break existing userspace. I mean, even if we find a good way of having userspace skip non-rendering render nodes, what about existing userspace that isn't able to do that? Any updates to newer kernels are going to break them. Regards, Tomeu
Hi Lucas, Do you have any idea on how not to break userspace if we expose a render node? Cheers, Tomeu On Wed, Jun 12, 2024 at 4:26 PM Tomeu Vizoso <tomeu@tomeuvizoso.net> wrote: > > On Mon, May 20, 2024 at 1:19 PM Daniel Stone <daniel@fooishbar.org> wrote: > > > > Hi, > > > > On Mon, 20 May 2024 at 08:39, Tomeu Vizoso <tomeu@tomeuvizoso.net> wrote: > > > On Fri, May 10, 2024 at 10:34 AM Lucas Stach <l.stach@pengutronix.de> wrote: > > > > Am Mittwoch, dem 24.04.2024 um 08:37 +0200 schrieb Tomeu Vizoso: > > > > > If we expose a render node for NPUs without rendering capabilities, the > > > > > userspace stack will offer it to compositors and applications for > > > > > rendering, which of course won't work. > > > > > > > > > > Userspace is probably right in not questioning whether a render node > > > > > might not be capable of supporting rendering, so change it in the kernel > > > > > instead by exposing a /dev/accel node. > > > > > > > > > > Before we bring the device up we don't know whether it is capable of > > > > > rendering or not (depends on the features of its blocks), so first try > > > > > to probe a rendering node, and if we find out that there is no rendering > > > > > hardware, abort and retry with an accel node. > > > > > > > > On the other hand we already have precedence of compute only DRM > > > > devices exposing a render node: there are AMD GPUs that don't expose a > > > > graphics queue and are thus not able to actually render graphics. Mesa > > > > already handles this in part via the PIPE_CAP_GRAPHICS and I think we > > > > should simply extend this to not offer a EGL display on screens without > > > > that capability. > > > > > > The problem with this is that the compositors I know don't loop over > > > /dev/dri files, trying to create EGL screens and moving to the next > > > one until they find one that works. > > > > > > They take the first render node (unless a specific one has been > > > configured), and assumes it will be able to render with it. > > > > > > To me it seems as if userspace expects that /dev/dri/renderD* devices > > > can be used for rendering and by breaking this assumption we would be > > > breaking existing software. > > > > Mm, it's sort of backwards from that. Compositors just take a > > non-render DRM node for KMS, then ask GBM+EGL to instantiate a GPU > > which can work with that. When run in headless mode, we don't take > > render nodes directly, but instead just create an EGLDisplay or > > VkPhysicalDevice and work backwards to a render node, rather than > > selecting a render node and going from there. > > > > So from that PoV I don't think it's really that harmful. The only > > complication is in Mesa, where it would see an etnaviv/amdgpu/... > > render node and potentially try to use it as a device. As long as Mesa > > can correctly skip, there should be no userspace API implications. > > > > That being said, I'm not entirely sure what the _benefit_ would be of > > exposing a render node for a device which can't be used by any > > 'traditional' DRM consumers, i.e. GL/Vulkan/winsys. > > What I don't understand yet from Lucas proposal is how this isn't > going to break existing userspace. > > I mean, even if we find a good way of having userspace skip > non-rendering render nodes, what about existing userspace that isn't > able to do that? Any updates to newer kernels are going to break them. > > Regards, > > Tomeu
On Thu, May 09, 2024 at 05:41:18PM +0300, Oded Gabbay wrote: > On Thu, May 09, 2024 at 03:53:01PM +0200, Tomeu Vizoso wrote: > > Oded, Dave, > > > > Do you have an opinion on this? > > > > Thanks, > > > > Tomeu > Hi Tomeu, > > Sorry for not replying earlier, I was down with Covid (again...). > > To your question, I don't have an objection to what you are > suggesting. My personal view of accel is that it is an integral part of > DRM and therefore, if there is an *existing* drm driver that wants to > create an accel node, I'm not against it. Yeah, there's a continum from "clearly 3d gpu" to "compute AI accelerator", with everything possible in-between shipping somewhere. Collaboration is the important part, hair-splitting on where exactly the driver should be is kinda secondary. I mean beyond "don't put a pure 3d driver into accel or vice versa" of course :-) > There is the question of why you want to expose an accel node, and > here I would like to hear Dave's and Sima's opinion on your suggested > solution as it may affect the direction of other drm drivers. So existing userspace that blindly assumes that any render node will give it useful 3d acceleration, then that's broken already. - kernel with new driver support but old mesa without that driver already gives you that, even for a pure 3d chip. - intel (and I think also amd) have pure compute chips without 3d, so this issue already exists Same for the other directions, 3d gpus have variable amounts of compute chips nowadays. That leaves imo just the pragmatic choice, and if we need to complicate the init flow of the kernel driver just for a different charnode major, then I don't really see the point. And if we do see the point in this, I think the right approach would be if we split the init flow further into allocating the drm_device, and then in a 2nd step either allocate the accel or render uapi stuff as needed. The DRIVER_FOO flags just aren't super flexible for this kinda of stuff and have a bit a midlayer taste to them. Cheers, Sima > > Thanks, > Oded. > > p.s. > Please only use bottom-posting when replying, thanks :) > > > > > On Fri, Apr 26, 2024 at 8:10 AM Tomeu Vizoso <tomeu@tomeuvizoso.net> wrote: > > > > > > On Thu, Apr 25, 2024 at 8:59 PM Jeffrey Hugo <quic_jhugo@quicinc.com> wrote: > > > > > > > > On 4/24/2024 12:37 AM, Tomeu Vizoso wrote: > > > > > If we expose a render node for NPUs without rendering capabilities, the > > > > > userspace stack will offer it to compositors and applications for > > > > > rendering, which of course won't work. > > > > > > > > > > Userspace is probably right in not questioning whether a render node > > > > > might not be capable of supporting rendering, so change it in the kernel > > > > > instead by exposing a /dev/accel node. > > > > > > > > > > Before we bring the device up we don't know whether it is capable of > > > > > rendering or not (depends on the features of its blocks), so first try > > > > > to probe a rendering node, and if we find out that there is no rendering > > > > > hardware, abort and retry with an accel node. > > > > > > > > > > Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> > > > > > Cc: Oded Gabbay <ogabbay@kernel.org> > > > > > > > > I hope Oded chimes in as Accel maintainer. I think Airlie/Vetter had > > > > also previously mentioned they'd have opinions on what is Accel vs DRM. > > > > > > > > This gets a nack from me in its current state. This is not a strong > > > > nack, and I don't want to discourage you. I think there is a path forward. > > > > > > > > The Accel subsystem documentation says that accel drivers will reside in > > > > drivers/accel/ but this does not. > > > > > > Indeed, there is that code organization aspect. > > > > > > > Also, the commit text for "accel: add dedicated minor for accelerator > > > > devices" mentions - > > > > > > > > "for drivers that > > > > declare they handle compute accelerator, using a new driver feature > > > > flag called DRIVER_COMPUTE_ACCEL. It is important to note that this > > > > driver feature is mutually exclusive with DRIVER_RENDER. Devices that > > > > want to expose both graphics and compute device char files should be > > > > handled by two drivers that are connected using the auxiliary bus > > > > framework." > > > > > > > > I don't see any of that happening here (two drivers connected by aux > > > > bus, one in drivers/accel). > > > > > > Well, the text refers to devices, not drivers. The case we are talking > > > about is a driver that wants to sometimes expose an accel node, and > > > sometimes a render node, depending on the hardware it is dealing with. > > > So there would either be a device exposing a single render node, or a > > > device exposing a single accel node. > > > > > > Though by using the auxiliary bus we could in theory solve the code > > > organization problem mentioned above, I'm not quite seeing how to do > > > this in a clean way. The driver in /drivers/gpu/drm would have to be a > > > DRM driver that doesn't register a DRM device, but registers a device > > > in the auxiliary bus for the driver in /drivers/accel to bind to? Or > > > are you seeing some possibility that would fit better in the current > > > DRM framework? > > > > > > > I think this is the first case we've had of a combo DRM/Accel usecase, > > > > and so there isn't an existing example to refer you to on how to > > > > structure things. I think you are going to be the first example where > > > > we figure all of this out. > > > > > > Yep, I will be grateful for any ideas on how to structure this. > > > > > > > On a more implementation note, ioctls for Accel devices should not be > > > > marked DRM_RENDER_ALLOW. Seems like your attempt to reuse as much of > > > > the code as possible trips over this. > > > > > > Indeed, thanks. > > > > > > Cheers, > > > > > > Tomeu > > > > > > > -Jeff
On Mon, Jun 17, 2024 at 07:01:05PM +0200, Tomeu Vizoso wrote: > Hi Lucas, > > Do you have any idea on how not to break userspace if we expose a render node? So if you get a new chip with an incompatible 3d block, you already have that issue. And I hope etnaviv userspace can cope. Worst case you need to publish a fake extremely_fancy_3d_block to make sure old mesa never binds against an NPU-only instance. Or mesa just doesn't cope, in which case we need a etnaviv-v2-we_are_sorry drm driver name, or something like that. -Sima > > Cheers, > > Tomeu > > On Wed, Jun 12, 2024 at 4:26 PM Tomeu Vizoso <tomeu@tomeuvizoso.net> wrote: > > > > On Mon, May 20, 2024 at 1:19 PM Daniel Stone <daniel@fooishbar.org> wrote: > > > > > > Hi, > > > > > > On Mon, 20 May 2024 at 08:39, Tomeu Vizoso <tomeu@tomeuvizoso.net> wrote: > > > > On Fri, May 10, 2024 at 10:34 AM Lucas Stach <l.stach@pengutronix.de> wrote: > > > > > Am Mittwoch, dem 24.04.2024 um 08:37 +0200 schrieb Tomeu Vizoso: > > > > > > If we expose a render node for NPUs without rendering capabilities, the > > > > > > userspace stack will offer it to compositors and applications for > > > > > > rendering, which of course won't work. > > > > > > > > > > > > Userspace is probably right in not questioning whether a render node > > > > > > might not be capable of supporting rendering, so change it in the kernel > > > > > > instead by exposing a /dev/accel node. > > > > > > > > > > > > Before we bring the device up we don't know whether it is capable of > > > > > > rendering or not (depends on the features of its blocks), so first try > > > > > > to probe a rendering node, and if we find out that there is no rendering > > > > > > hardware, abort and retry with an accel node. > > > > > > > > > > On the other hand we already have precedence of compute only DRM > > > > > devices exposing a render node: there are AMD GPUs that don't expose a > > > > > graphics queue and are thus not able to actually render graphics. Mesa > > > > > already handles this in part via the PIPE_CAP_GRAPHICS and I think we > > > > > should simply extend this to not offer a EGL display on screens without > > > > > that capability. > > > > > > > > The problem with this is that the compositors I know don't loop over > > > > /dev/dri files, trying to create EGL screens and moving to the next > > > > one until they find one that works. > > > > > > > > They take the first render node (unless a specific one has been > > > > configured), and assumes it will be able to render with it. > > > > > > > > To me it seems as if userspace expects that /dev/dri/renderD* devices > > > > can be used for rendering and by breaking this assumption we would be > > > > breaking existing software. > > > > > > Mm, it's sort of backwards from that. Compositors just take a > > > non-render DRM node for KMS, then ask GBM+EGL to instantiate a GPU > > > which can work with that. When run in headless mode, we don't take > > > render nodes directly, but instead just create an EGLDisplay or > > > VkPhysicalDevice and work backwards to a render node, rather than > > > selecting a render node and going from there. > > > > > > So from that PoV I don't think it's really that harmful. The only > > > complication is in Mesa, where it would see an etnaviv/amdgpu/... > > > render node and potentially try to use it as a device. As long as Mesa > > > can correctly skip, there should be no userspace API implications. > > > > > > That being said, I'm not entirely sure what the _benefit_ would be of > > > exposing a render node for a device which can't be used by any > > > 'traditional' DRM consumers, i.e. GL/Vulkan/winsys. > > > > What I don't understand yet from Lucas proposal is how this isn't > > going to break existing userspace. > > > > I mean, even if we find a good way of having userspace skip > > non-rendering render nodes, what about existing userspace that isn't > > able to do that? Any updates to newer kernels are going to break them. > > > > Regards, > > > > Tomeu
Am Mittwoch, dem 26.06.2024 um 09:28 +0200 schrieb Daniel Vetter: > On Mon, Jun 17, 2024 at 07:01:05PM +0200, Tomeu Vizoso wrote: > > Hi Lucas, > > > > Do you have any idea on how not to break userspace if we expose a render node? > > So if you get a new chip with an incompatible 3d block, you already have > that issue. And I hope etnaviv userspace can cope. > > Worst case you need to publish a fake extremely_fancy_3d_block to make > sure old mesa never binds against an NPU-only instance. > > Or mesa just doesn't cope, in which case we need a etnaviv-v2-we_are_sorry > drm driver name, or something like that. Mesa doesn't cope right now. Mostly because of the renderonly thing where we magically need to match render devices to otherwise render incapable KMS devices. The way this matching works is that the renderonly code tries to open a screen on a rendernode and if that succeeds we treat it as the matching render device. The core of the issue is that we have no way of specifying which kind of screen we need at that point, i.e. if the screen should have 3D render capabilities or if compute-only or even NN-accel-only would be okay. So we can't fail screen creation if there is no 3D engine, as this would break the teflon case, which needs a screen for the NN accel, but once we successfully create a screen reanderonly might treat the thing as a rendering device. So we are kind of stuck here between breaking one or the other use- case. I'm leaning heavily into the direction of just fixing Mesa, so we can specify the type of screen we need at creation time to avoid the renderonly issue, porting this change as far back as reasonably possible and file old userspace into shit-happens. Regards, Lucas > > > > > Cheers, > > > > Tomeu > > > > On Wed, Jun 12, 2024 at 4:26 PM Tomeu Vizoso <tomeu@tomeuvizoso.net> wrote: > > > > > > On Mon, May 20, 2024 at 1:19 PM Daniel Stone <daniel@fooishbar.org> wrote: > > > > > > > > Hi, > > > > > > > > On Mon, 20 May 2024 at 08:39, Tomeu Vizoso <tomeu@tomeuvizoso.net> wrote: > > > > > On Fri, May 10, 2024 at 10:34 AM Lucas Stach <l.stach@pengutronix.de> wrote: > > > > > > Am Mittwoch, dem 24.04.2024 um 08:37 +0200 schrieb Tomeu Vizoso: > > > > > > > If we expose a render node for NPUs without rendering capabilities, the > > > > > > > userspace stack will offer it to compositors and applications for > > > > > > > rendering, which of course won't work. > > > > > > > > > > > > > > Userspace is probably right in not questioning whether a render node > > > > > > > might not be capable of supporting rendering, so change it in the kernel > > > > > > > instead by exposing a /dev/accel node. > > > > > > > > > > > > > > Before we bring the device up we don't know whether it is capable of > > > > > > > rendering or not (depends on the features of its blocks), so first try > > > > > > > to probe a rendering node, and if we find out that there is no rendering > > > > > > > hardware, abort and retry with an accel node. > > > > > > > > > > > > On the other hand we already have precedence of compute only DRM > > > > > > devices exposing a render node: there are AMD GPUs that don't expose a > > > > > > graphics queue and are thus not able to actually render graphics. Mesa > > > > > > already handles this in part via the PIPE_CAP_GRAPHICS and I think we > > > > > > should simply extend this to not offer a EGL display on screens without > > > > > > that capability. > > > > > > > > > > The problem with this is that the compositors I know don't loop over > > > > > /dev/dri files, trying to create EGL screens and moving to the next > > > > > one until they find one that works. > > > > > > > > > > They take the first render node (unless a specific one has been > > > > > configured), and assumes it will be able to render with it. > > > > > > > > > > To me it seems as if userspace expects that /dev/dri/renderD* devices > > > > > can be used for rendering and by breaking this assumption we would be > > > > > breaking existing software. > > > > > > > > Mm, it's sort of backwards from that. Compositors just take a > > > > non-render DRM node for KMS, then ask GBM+EGL to instantiate a GPU > > > > which can work with that. When run in headless mode, we don't take > > > > render nodes directly, but instead just create an EGLDisplay or > > > > VkPhysicalDevice and work backwards to a render node, rather than > > > > selecting a render node and going from there. > > > > > > > > So from that PoV I don't think it's really that harmful. The only > > > > complication is in Mesa, where it would see an etnaviv/amdgpu/... > > > > render node and potentially try to use it as a device. As long as Mesa > > > > can correctly skip, there should be no userspace API implications. > > > > > > > > That being said, I'm not entirely sure what the _benefit_ would be of > > > > exposing a render node for a device which can't be used by any > > > > 'traditional' DRM consumers, i.e. GL/Vulkan/winsys. > > > > > > What I don't understand yet from Lucas proposal is how this isn't > > > going to break existing userspace. > > > > > > I mean, even if we find a good way of having userspace skip > > > non-rendering render nodes, what about existing userspace that isn't > > > able to do that? Any updates to newer kernels are going to break them. > > > > > > Regards, > > > > > > Tomeu >
On Wed, Jun 26, 2024 at 09:26:40AM GMT, Daniel Vetter wrote: > On Thu, May 09, 2024 at 05:41:18PM +0300, Oded Gabbay wrote: > > On Thu, May 09, 2024 at 03:53:01PM +0200, Tomeu Vizoso wrote: > > > Oded, Dave, > > > > > > Do you have an opinion on this? > > > > > > Thanks, > > > > > > Tomeu > > Hi Tomeu, > > > > Sorry for not replying earlier, I was down with Covid (again...). > > > > To your question, I don't have an objection to what you are > > suggesting. My personal view of accel is that it is an integral part of > > DRM and therefore, if there is an *existing* drm driver that wants to > > create an accel node, I'm not against it. > > Yeah, there's a continum from "clearly 3d gpu" to "compute AI > accelerator", with everything possible in-between shipping somewhere. > Collaboration is the important part, hair-splitting on where exactly the > driver should be is kinda secondary. I mean beyond "don't put a pure 3d > driver into accel or vice versa" of course :-) > > > There is the question of why you want to expose an accel node, and > > here I would like to hear Dave's and Sima's opinion on your suggested > > solution as it may affect the direction of other drm drivers. > > So existing userspace that blindly assumes that any render node will give > it useful 3d acceleration, then that's broken already. > > - kernel with new driver support but old mesa without that driver already > gives you that, even for a pure 3d chip. > > - intel (and I think also amd) have pure compute chips without 3d, so this > issue already exists > > Same for the other directions, 3d gpus have variable amounts of compute > chips nowadays. > > That leaves imo just the pragmatic choice, and if we need to complicate > the init flow of the kernel driver just for a different charnode major, > then I don't really see the point. > > And if we do see the point in this, I think the right approach would be if > we split the init flow further into allocating the drm_device, and then in > a 2nd step either allocate the accel or render uapi stuff as needed. The > DRIVER_FOO flags just aren't super flexible for this kinda of stuff and > have a bit a midlayer taste to them. Being able to defer render allocation would be extremely useful for MSM too as it's not currently possible to mask the driver_features during drm_dev_init()
Hi, On Wed, 26 Jun 2024 at 09:28, Lucas Stach <l.stach@pengutronix.de> wrote: > Mesa doesn't cope right now. Mostly because of the renderonly thing > where we magically need to match render devices to otherwise render > incapable KMS devices. The way this matching works is that the > renderonly code tries to open a screen on a rendernode and if that > succeeds we treat it as the matching render device. > > The core of the issue is that we have no way of specifying which kind > of screen we need at that point, i.e. if the screen should have 3D > render capabilities or if compute-only or even NN-accel-only would be > okay. So we can't fail screen creation if there is no 3D engine, as > this would break the teflon case, which needs a screen for the NN > accel, but once we successfully create a screen reanderonly might treat > the thing as a rendering device. > So we are kind of stuck here between breaking one or the other use- > case. I'm leaning heavily into the direction of just fixing Mesa, so we > can specify the type of screen we need at creation time to avoid the > renderonly issue, porting this change as far back as reasonably > possible and file old userspace into shit-happens. Yeah, honestly this sounds like the best solution to me too. Cheers, Daniel
On Wed, Jun 26, 2024 at 11:42:24AM +0300, Dmitry Baryshkov wrote: > On Wed, Jun 26, 2024 at 09:26:40AM GMT, Daniel Vetter wrote: > > On Thu, May 09, 2024 at 05:41:18PM +0300, Oded Gabbay wrote: > > > On Thu, May 09, 2024 at 03:53:01PM +0200, Tomeu Vizoso wrote: > > > > Oded, Dave, > > > > > > > > Do you have an opinion on this? > > > > > > > > Thanks, > > > > > > > > Tomeu > > > Hi Tomeu, > > > > > > Sorry for not replying earlier, I was down with Covid (again...). > > > > > > To your question, I don't have an objection to what you are > > > suggesting. My personal view of accel is that it is an integral part of > > > DRM and therefore, if there is an *existing* drm driver that wants to > > > create an accel node, I'm not against it. > > > > Yeah, there's a continum from "clearly 3d gpu" to "compute AI > > accelerator", with everything possible in-between shipping somewhere. > > Collaboration is the important part, hair-splitting on where exactly the > > driver should be is kinda secondary. I mean beyond "don't put a pure 3d > > driver into accel or vice versa" of course :-) > > > > > There is the question of why you want to expose an accel node, and > > > here I would like to hear Dave's and Sima's opinion on your suggested > > > solution as it may affect the direction of other drm drivers. > > > > So existing userspace that blindly assumes that any render node will give > > it useful 3d acceleration, then that's broken already. > > > > - kernel with new driver support but old mesa without that driver already > > gives you that, even for a pure 3d chip. > > > > - intel (and I think also amd) have pure compute chips without 3d, so this > > issue already exists > > > > Same for the other directions, 3d gpus have variable amounts of compute > > chips nowadays. > > > > That leaves imo just the pragmatic choice, and if we need to complicate > > the init flow of the kernel driver just for a different charnode major, > > then I don't really see the point. > > > > And if we do see the point in this, I think the right approach would be if > > we split the init flow further into allocating the drm_device, and then in > > a 2nd step either allocate the accel or render uapi stuff as needed. The > > DRIVER_FOO flags just aren't super flexible for this kinda of stuff and > > have a bit a midlayer taste to them. > > Being able to defer render allocation would be extremely useful for MSM > too as it's not currently possible to mask the driver_features during > drm_dev_init() Eh I think less driver_features and more explicit (like drm_mode_config_init() instead of also having to set DRIVER_MODESET) stuff would be better in general. But they keep popping up, because it's an easy hack to get things going. Over the years I've managed to remove a lot of them tough. -Sima
On Wed, Jun 26, 2024 at 11:39:01AM +0100, Daniel Stone wrote: > Hi, > > On Wed, 26 Jun 2024 at 09:28, Lucas Stach <l.stach@pengutronix.de> wrote: > > Mesa doesn't cope right now. Mostly because of the renderonly thing > > where we magically need to match render devices to otherwise render > > incapable KMS devices. The way this matching works is that the > > renderonly code tries to open a screen on a rendernode and if that > > succeeds we treat it as the matching render device. > > > > The core of the issue is that we have no way of specifying which kind > > of screen we need at that point, i.e. if the screen should have 3D > > render capabilities or if compute-only or even NN-accel-only would be > > okay. So we can't fail screen creation if there is no 3D engine, as > > this would break the teflon case, which needs a screen for the NN > > accel, but once we successfully create a screen reanderonly might treat > > the thing as a rendering device. > > So we are kind of stuck here between breaking one or the other use- > > case. I'm leaning heavily into the direction of just fixing Mesa, so we > > can specify the type of screen we need at creation time to avoid the > > renderonly issue, porting this change as far back as reasonably > > possible and file old userspace into shit-happens. > > Yeah, honestly this sounds like the best solution to me too. Yeah mesa sounds kinda broken here ... What might work in the kernel is if you publish a fake 3d engine that's too new for broken mesa, if that's enough to make it fail to bind? And if mesa still happily binds against that, then yeah it's probably too broken and we need etnaviv-v2 (as a drm driver uapi name, I think that's what mesa filters?) for anything new (including the NN-only ones). I would still try to avoid that, but just in case someone screams about regressions. -Sima
On Wed, 26 Jun 2024 at 18:52, Daniel Vetter <daniel@ffwll.ch> wrote: > On Wed, Jun 26, 2024 at 11:39:01AM +0100, Daniel Stone wrote: > > On Wed, 26 Jun 2024 at 09:28, Lucas Stach <l.stach@pengutronix.de> wrote: > > > So we are kind of stuck here between breaking one or the other use- > > > case. I'm leaning heavily into the direction of just fixing Mesa, so we > > > can specify the type of screen we need at creation time to avoid the > > > renderonly issue, porting this change as far back as reasonably > > > possible and file old userspace into shit-happens. > > > > Yeah, honestly this sounds like the best solution to me too. > > Yeah mesa sounds kinda broken here ... > > What might work in the kernel is if you publish a fake 3d engine that's > too new for broken mesa, if that's enough to make it fail to bind? And if > mesa still happily binds against that, then yeah it's probably too broken > and we need etnaviv-v2 (as a drm driver uapi name, I think that's what > mesa filters?) for anything new (including the NN-only ones). > > I would still try to avoid that, but just in case someone screams about > regressions. It's not just etnaviv, it's literally every Mesa driver which works with decoupled render/display. So that would be etnaviv-v2, panfrost-v2, panthor-v2, v3d-v2, powervr-v2, ... albeit those don't tend to have multiple instances. Anyway, I'm still leaning towards the answer being: this is not an etnaviv regression caused by NPU, it's a longstanding generic Mesa issue for which the answer is to fix the known fragility. Cheers, Daniel
On Wed, Jun 26, 2024 at 9:26 PM Daniel Stone <daniel@fooishbar.org> wrote: > > On Wed, 26 Jun 2024 at 18:52, Daniel Vetter <daniel@ffwll.ch> wrote: > > On Wed, Jun 26, 2024 at 11:39:01AM +0100, Daniel Stone wrote: > > > On Wed, 26 Jun 2024 at 09:28, Lucas Stach <l.stach@pengutronix.de> wrote: > > > > So we are kind of stuck here between breaking one or the other use- > > > > case. I'm leaning heavily into the direction of just fixing Mesa, so we > > > > can specify the type of screen we need at creation time to avoid the > > > > renderonly issue, porting this change as far back as reasonably > > > > possible and file old userspace into shit-happens. > > > > > > Yeah, honestly this sounds like the best solution to me too. > > > > Yeah mesa sounds kinda broken here ... > > > > What might work in the kernel is if you publish a fake 3d engine that's > > too new for broken mesa, if that's enough to make it fail to bind? And if > > mesa still happily binds against that, then yeah it's probably too broken > > and we need etnaviv-v2 (as a drm driver uapi name, I think that's what > > mesa filters?) for anything new (including the NN-only ones). > > > > I would still try to avoid that, but just in case someone screams about > > regressions. Thanks everybody for chiming in. > It's not just etnaviv, it's literally every Mesa driver which works > with decoupled render/display. So that would be etnaviv-v2, > panfrost-v2, panthor-v2, v3d-v2, powervr-v2, ... albeit those don't > tend to have multiple instances. TBH, I think VeriSilicon is the only IP vendor that has recycled a render-only IP into a compute-only IP. That is why I liked the approach of conditionally creating an accel node, as it neatly reflects that reality. > Anyway, I'm still leaning towards the answer being: this is not an > etnaviv regression caused by NPU, it's a longstanding generic Mesa > issue for which the answer is to fix the known fragility. My understanding of the consensus so far is that Mesa should be fixed so that Gallium drivers can fail at screen init if the device doesn't support some new usage flags that we would be adding. If for some reason that doesn't work, we would be looking at having etnaviv use a different kind of driver name, such as etnaviv-npu or etnaviv-compute. Did I get it right? Thanks, Tomeu
Hi, On Fri, 28 Jun 2024 at 10:43, Tomeu Vizoso <tomeu@tomeuvizoso.net> wrote: > On Wed, Jun 26, 2024 at 9:26 PM Daniel Stone <daniel@fooishbar.org> wrote: > > It's not just etnaviv, it's literally every Mesa driver which works > > with decoupled render/display. So that would be etnaviv-v2, > > panfrost-v2, panthor-v2, v3d-v2, powervr-v2, ... albeit those don't > > tend to have multiple instances. > > TBH, I think VeriSilicon is the only IP vendor that has recycled a > render-only IP into a compute-only IP. > > That is why I liked the approach of conditionally creating an accel > node, as it neatly reflects that reality. > > > Anyway, I'm still leaning towards the answer being: this is not an > > etnaviv regression caused by NPU, it's a longstanding generic Mesa > > issue for which the answer is to fix the known fragility. > > My understanding of the consensus so far is that Mesa should be fixed > so that Gallium drivers can fail at screen init if the device doesn't > support some new usage flags that we would be adding. > > If for some reason that doesn't work, we would be looking at having > etnaviv use a different kind of driver name, such as etnaviv-npu or > etnaviv-compute. > > Did I get it right? Yep, wfm. :) Cheers, Daniel
On Wed, Jun 26, 2024 at 08:26:04PM +0100, Daniel Stone wrote: > On Wed, 26 Jun 2024 at 18:52, Daniel Vetter <daniel@ffwll.ch> wrote: > > On Wed, Jun 26, 2024 at 11:39:01AM +0100, Daniel Stone wrote: > > > On Wed, 26 Jun 2024 at 09:28, Lucas Stach <l.stach@pengutronix.de> wrote: > > > > So we are kind of stuck here between breaking one or the other use- > > > > case. I'm leaning heavily into the direction of just fixing Mesa, so we > > > > can specify the type of screen we need at creation time to avoid the > > > > renderonly issue, porting this change as far back as reasonably > > > > possible and file old userspace into shit-happens. > > > > > > Yeah, honestly this sounds like the best solution to me too. > > > > Yeah mesa sounds kinda broken here ... > > > > What might work in the kernel is if you publish a fake 3d engine that's > > too new for broken mesa, if that's enough to make it fail to bind? And if > > mesa still happily binds against that, then yeah it's probably too broken > > and we need etnaviv-v2 (as a drm driver uapi name, I think that's what > > mesa filters?) for anything new (including the NN-only ones). > > > > I would still try to avoid that, but just in case someone screams about > > regressions. > > It's not just etnaviv, it's literally every Mesa driver which works > with decoupled render/display. So that would be etnaviv-v2, > panfrost-v2, panthor-v2, v3d-v2, powervr-v2, ... albeit those don't > tend to have multiple instances. So essentially mesa just burns&crashes when old mesa runs on a newer kernel with support for a chip that mesa doesn't know about? > Anyway, I'm still leaning towards the answer being: this is not an > etnaviv regression caused by NPU, it's a longstanding generic Mesa > issue for which the answer is to fix the known fragility. If the above is correct, then yes I think we should just fix mesa. Feels like the breakage is too obviously there, and that's all we'll do unless the screaming gets too loud. -Sima
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c b/drivers/gpu/drm/etnaviv/etnaviv_drv.c index 6500f3999c5f..8e7dd23115f4 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c @@ -11,6 +11,7 @@ #include <linux/platform_device.h> #include <linux/uaccess.h> +#include <drm/drm_accel.h> #include <drm/drm_debugfs.h> #include <drm/drm_drv.h> #include <drm/drm_file.h> @@ -488,10 +489,10 @@ static const struct drm_ioctl_desc etnaviv_ioctls[] = { ETNA_IOCTL(PM_QUERY_SIG, pm_query_sig, DRM_RENDER_ALLOW), }; -DEFINE_DRM_GEM_FOPS(fops); +DEFINE_DRM_GEM_FOPS(render_fops); +DEFINE_DRM_ACCEL_FOPS(accel_fops); -static const struct drm_driver etnaviv_drm_driver = { - .driver_features = DRIVER_GEM | DRIVER_RENDER, +static struct drm_driver etnaviv_drm_driver = { .open = etnaviv_open, .postclose = etnaviv_postclose, .gem_prime_import_sg_table = etnaviv_gem_prime_import_sg_table, @@ -500,7 +501,6 @@ static const struct drm_driver etnaviv_drm_driver = { #endif .ioctls = etnaviv_ioctls, .num_ioctls = DRM_ETNAVIV_NUM_IOCTLS, - .fops = &fops, .name = "etnaviv", .desc = "etnaviv DRM", .date = "20151214", @@ -508,15 +508,20 @@ static const struct drm_driver etnaviv_drm_driver = { .minor = 4, }; -/* - * Platform driver: - */ -static int etnaviv_bind(struct device *dev) +static int etnaviv_bind_with_type(struct device *dev, u32 type) { struct etnaviv_drm_private *priv; struct drm_device *drm; + bool is_compute_only = true; int ret; + etnaviv_drm_driver.driver_features = DRIVER_GEM | type; + + if (type == DRIVER_RENDER) + etnaviv_drm_driver.fops = &render_fops; + else + etnaviv_drm_driver.fops = &accel_fops; + drm = drm_dev_alloc(&etnaviv_drm_driver, dev); if (IS_ERR(drm)) return PTR_ERR(drm); @@ -553,6 +558,18 @@ static int etnaviv_bind(struct device *dev) load_gpu(drm); + for (unsigned int i = 0; i < ETNA_MAX_PIPES; i++) { + struct etnaviv_gpu *g = priv->gpu[i]; + + if (g && (g->identity.minor_features8 & chipMinorFeatures8_COMPUTE_ONLY) == 0) + is_compute_only = false; + } + + if (type == DRIVER_RENDER && is_compute_only) { + ret = -EINVAL; + goto out_unbind; + } + ret = drm_dev_register(drm, 0); if (ret) goto out_unbind; @@ -571,6 +588,19 @@ static int etnaviv_bind(struct device *dev) return ret; } +/* + * Platform driver: + */ +static int etnaviv_bind(struct device *dev) +{ + int ret = etnaviv_bind_with_type(dev, DRIVER_RENDER); + + if (ret == -EINVAL) + return etnaviv_bind_with_type(dev, DRIVER_COMPUTE_ACCEL); + + return ret; +} + static void etnaviv_unbind(struct device *dev) { struct drm_device *drm = dev_get_drvdata(dev);
If we expose a render node for NPUs without rendering capabilities, the userspace stack will offer it to compositors and applications for rendering, which of course won't work. Userspace is probably right in not questioning whether a render node might not be capable of supporting rendering, so change it in the kernel instead by exposing a /dev/accel node. Before we bring the device up we don't know whether it is capable of rendering or not (depends on the features of its blocks), so first try to probe a rendering node, and if we find out that there is no rendering hardware, abort and retry with an accel node. Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Cc: Oded Gabbay <ogabbay@kernel.org> --- drivers/gpu/drm/etnaviv/etnaviv_drv.c | 46 ++++++++++++++++++++++----- 1 file changed, 38 insertions(+), 8 deletions(-)