Message ID | 1642587791-13222-1-git-send-email-ivo.g.dimitrov.75@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | drm: omapdrm: Fix excessive GEM buffers DMM/CMA usage | expand |
gentle ping On 19.01.22 г. 12:23 ч., Ivaylo Dimitrov wrote: > This patch series fixes excessive DMM or CMA usage of GEM buffers leading to > various runtime allocation failures. The series enables daily usage of devices > without exausting limited resources like CMA or DMM space if GPU rendering is > needed. > > The first patch doesn't bring any functional changes, it just moves some > TILER/DMM related code to a separate function, to simplify the review of the > next two patches. > > The second patch allows off-CPU rendering to non-scanout buffers. Without that > patch, it is basically impossible to use the driver allocated GEM buffers on > OMAP3 for anything else but a basic CPU rendered examples as if we want GPU > rendering, we must allocate buffers as scanout buffers, which are CMA allocated. > CMA soon gets fragmented and we start seeing allocation failures. Such failres > in Xorg cannot be handeled gracefully, so the system is basically unusable. > > Third patch fixes similar issue on OMAP4/5, where DMM/TILER spaces get > fragmented with time, leading to allocation failures. > > Series were tested on Motolola Droid4 and Nokia N900, with OMAP DDX and > PVR EXA from https://github.com/maemo-leste/xf86-video-omap > > Ivaylo Dimitrov (3): > drm: omapdrm: simplify omap_gem_pin > drm: omapdrm: Support exporting of non-contiguous GEM BOs > drm: omapdrm: Do no allocate non-scanout GEMs through DMM/TILER > > drivers/gpu/drm/omapdrm/omap_gem.c | 198 +++++++++++++++++------------- > drivers/gpu/drm/omapdrm/omap_gem.h | 3 +- > drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c | 5 +- > 3 files changed, 116 insertions(+), 90 deletions(-) >
Hi Am 19.01.22 um 11:23 schrieb Ivaylo Dimitrov: > This patch series fixes excessive DMM or CMA usage of GEM buffers leading to > various runtime allocation failures. The series enables daily usage of devices > without exausting limited resources like CMA or DMM space if GPU rendering is > needed. > > The first patch doesn't bring any functional changes, it just moves some > TILER/DMM related code to a separate function, to simplify the review of the > next two patches. > > The second patch allows off-CPU rendering to non-scanout buffers. Without that > patch, it is basically impossible to use the driver allocated GEM buffers on > OMAP3 for anything else but a basic CPU rendered examples as if we want GPU > rendering, we must allocate buffers as scanout buffers, which are CMA allocated. > CMA soon gets fragmented and we start seeing allocation failures. Such failres > in Xorg cannot be handeled gracefully, so the system is basically unusable. > > Third patch fixes similar issue on OMAP4/5, where DMM/TILER spaces get > fragmented with time, leading to allocation failures. > > Series were tested on Motolola Droid4 and Nokia N900, with OMAP DDX and > PVR EXA from https://github.com/maemo-leste/xf86-video-omap > > Ivaylo Dimitrov (3): > drm: omapdrm: simplify omap_gem_pin > drm: omapdrm: Support exporting of non-contiguous GEM BOs > drm: omapdrm: Do no allocate non-scanout GEMs through DMM/TILER With the little expertise I have with omapdrm: Acked-by: Thomas Zimmermann <tzimmermann@suse.de> > > drivers/gpu/drm/omapdrm/omap_gem.c | 198 +++++++++++++++++------------- > drivers/gpu/drm/omapdrm/omap_gem.h | 3 +- > drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c | 5 +- > 3 files changed, 116 insertions(+), 90 deletions(-) >
Hi Ivaylo, On 19/01/2022 12:23, Ivaylo Dimitrov wrote: > This patch series fixes excessive DMM or CMA usage of GEM buffers leading to > various runtime allocation failures. The series enables daily usage of devices > without exausting limited resources like CMA or DMM space if GPU rendering is > needed. > > The first patch doesn't bring any functional changes, it just moves some > TILER/DMM related code to a separate function, to simplify the review of the > next two patches. > > The second patch allows off-CPU rendering to non-scanout buffers. Without that > patch, it is basically impossible to use the driver allocated GEM buffers on > OMAP3 for anything else but a basic CPU rendered examples as if we want GPU > rendering, we must allocate buffers as scanout buffers, which are CMA allocated. > CMA soon gets fragmented and we start seeing allocation failures. Such failres > in Xorg cannot be handeled gracefully, so the system is basically unusable. > > Third patch fixes similar issue on OMAP4/5, where DMM/TILER spaces get > fragmented with time, leading to allocation failures. I think this is just hacking around the problem. The problem is that omapdrm is being used by some as a generic buffer allocator. Those users should be changed to use a their own allocator or a generic allocator. And we could then drop the OMAP_BO_SCANOUT flag, as all buffers would be scanout buffers. Or do we have a regression in the driver? My understanding is that this has never really worked. Tomi
Hi Tomi, On 17.02.22 г. 15:03 ч., Tomi Valkeinen wrote: > Hi Ivaylo, > > On 19/01/2022 12:23, Ivaylo Dimitrov wrote: >> This patch series fixes excessive DMM or CMA usage of GEM buffers >> leading to >> various runtime allocation failures. The series enables daily usage of >> devices >> without exausting limited resources like CMA or DMM space if GPU >> rendering is >> needed. >> >> The first patch doesn't bring any functional changes, it just moves some >> TILER/DMM related code to a separate function, to simplify the review >> of the >> next two patches. >> >> The second patch allows off-CPU rendering to non-scanout buffers. >> Without that >> patch, it is basically impossible to use the driver allocated GEM >> buffers on >> OMAP3 for anything else but a basic CPU rendered examples as if we >> want GPU >> rendering, we must allocate buffers as scanout buffers, which are CMA >> allocated. >> CMA soon gets fragmented and we start seeing allocation failures. Such >> failres >> in Xorg cannot be handeled gracefully, so the system is basically >> unusable. >> >> Third patch fixes similar issue on OMAP4/5, where DMM/TILER spaces get >> fragmented with time, leading to allocation failures. > > I think this is just hacking around the problem. The problem is that > omapdrm is being used by some as a generic buffer allocator. Those users Well, the user of omap_bo interface I know is xf86-video-omap. Unless if by users you mean 'kernel users' which I know none. I think that if 'we' are to teach xorg omap DDX (or any other user in that regard) to use GPU driver allocator for non-scanout buffers and omapdrm for scanout, it will become a mess. Not impossible though, just way more complicated than the $series. Also, why do omapdrm allow allocation of non-linear buffers and CPU (userspace) access to them, but refuses to export them to kernel drivers? Isn't that the whole point of DMABUF stuff? This is not consistent to me. The series fixes that inconsistency, nothing more. > should be changed to use a their own allocator or a generic allocator. SGX driver/userspace has and uses its own allocator, however, I think there is more than that - what about TILER/VRFB? Do you say that SGX userspace shall be smart enough to requests TILER buffers from omapdrm when scanout buffer is requested and use its own allocator when not? Actually I was thinking about something like that, and it is achievable now we have: https://github.com/maemo-leste/sgx-ddk-um/blob/master/dbm/dbm.c (REed SGX 1.17 ddk gbm backend) > And we could then drop the OMAP_BO_SCANOUT flag, as all buffers would be > scanout buffers. > And what about OMAP_BO_TILED_XX stuff? To me this is even more of a hack, but it is what it is. Do I get it correctly that you want to get rid of omap_bo_new/_tiled and have only dumb buffers available in omapdrm? TBH this would be great, however I still don't see how a TILER/VRFB buffer would be allocated, given that flags in drm_mode_create_dumb is not used anywhere in the kernel(AFAIK). Unless all scanout buffers are allocated through TILER/VRFB (which is a good idea IMO). > Or do we have a regression in the driver? My understanding is that this > has never really worked. > There are couple of patches in omapdrm that change around BO flags and their meaning so I think there is a regression, as the same userspace/DDX on linux 5.9 results in only 2 linear buffers being allocated, but as SGX driver has different version as well, I can't be 100% sure without going through a lengthy assessment of SGX driver/omapdrm code and patches since 5.9. Which I am not going to do as I don't see what the benefit will be. Please consider this patch series as a fix to an inconsistency, as it is merely that, it does not really bring any new functionality in terms of what is allocated. Thanks and regards, Ivo
On 17/02/2022 18:21, Ivaylo Dimitrov wrote: > Hi Tomi, > > On 17.02.22 г. 15:03 ч., Tomi Valkeinen wrote: >> Hi Ivaylo, >> >> On 19/01/2022 12:23, Ivaylo Dimitrov wrote: >>> This patch series fixes excessive DMM or CMA usage of GEM buffers >>> leading to >>> various runtime allocation failures. The series enables daily usage >>> of devices >>> without exausting limited resources like CMA or DMM space if GPU >>> rendering is >>> needed. >>> >>> The first patch doesn't bring any functional changes, it just moves some >>> TILER/DMM related code to a separate function, to simplify the review >>> of the >>> next two patches. >>> >>> The second patch allows off-CPU rendering to non-scanout buffers. >>> Without that >>> patch, it is basically impossible to use the driver allocated GEM >>> buffers on >>> OMAP3 for anything else but a basic CPU rendered examples as if we >>> want GPU >>> rendering, we must allocate buffers as scanout buffers, which are CMA >>> allocated. >>> CMA soon gets fragmented and we start seeing allocation failures. >>> Such failres >>> in Xorg cannot be handeled gracefully, so the system is basically >>> unusable. >>> >>> Third patch fixes similar issue on OMAP4/5, where DMM/TILER spaces get >>> fragmented with time, leading to allocation failures. >> >> I think this is just hacking around the problem. The problem is that >> omapdrm is being used by some as a generic buffer allocator. Those users > > Well, the user of omap_bo interface I know is xf86-video-omap. Unless if > by users you mean 'kernel users' which I know none. > > I think that if 'we' are to teach xorg omap DDX (or any other user in > that regard) to use GPU driver allocator for non-scanout buffers and > omapdrm for scanout, it will become a mess. Not impossible though, just > way more complicated than the $series. Also, why do omapdrm allow > allocation of non-linear buffers and CPU (userspace) access to them, but > refuses to export them to kernel drivers? Isn't that the whole point of > DMABUF stuff? This is not consistent to me. The series fixes that > inconsistency, nothing more. > >> should be changed to use a their own allocator or a generic allocator. > > SGX driver/userspace has and uses its own allocator, however, I think > there is more than that - what about TILER/VRFB? Do you say that SGX > userspace shall be smart enough to requests TILER buffers from omapdrm > when scanout buffer is requested and use its own allocator when not? All I'm saying is that omapdrm should not support allocating buffers that are not usable by the omapdrm hardware. It doesn't make any sense. > Actually I was thinking about something like that, and it is achievable > now we have: > > https://github.com/maemo-leste/sgx-ddk-um/blob/master/dbm/dbm.c (REed > SGX 1.17 ddk gbm backend) > >> And we could then drop the OMAP_BO_SCANOUT flag, as all buffers would >> be scanout buffers. >> > > And what about OMAP_BO_TILED_XX stuff? To me this is even more of a > hack, but it is what it is. Yes, I agree, I don't think those OMAP_BO_TILED_* values should be exposed to userspace. But I also agree to the "it is what it is" =). > Do I get it correctly that you want to get rid of omap_bo_new/_tiled and > have only dumb buffers available in omapdrm? TBH this would be great, > however I still don't see how a TILER/VRFB buffer would be allocated, > given that flags in drm_mode_create_dumb is not used anywhere in the > kernel(AFAIK). Unless all scanout buffers are allocated through > TILER/VRFB (which is a good idea IMO). We can't get rid of those as they're userspace API. >> Or do we have a regression in the driver? My understanding is that >> this has never really worked. >> > > There are couple of patches in omapdrm that change around BO flags and > their meaning so I think there is a regression, as the same > userspace/DDX on linux 5.9 results in only 2 linear buffers being > allocated, but as SGX driver has different version as well, I can't be > 100% sure without going through a lengthy assessment of SGX > driver/omapdrm code and patches since 5.9. Which I am not going to do as > I don't see what the benefit will be. > > Please consider this patch series as a fix to an inconsistency, as it is > merely that, it does not really bring any new functionality in terms of > what is allocated. I've considered, and I think I agree. The design of omapdrm + tiler is broken in my opinion, but it's there, it has userspace APIs, and it's all old code. It's probably not worth the effort to try to clean it up, while still somehow keeping the old userspace working. I've had these patchesin my work branch for a while and I haven't seen any issues. I'll keep them there for a bit longer and I'll look at these patches a bit more, but I think I'll merge them at some point. Tomi
Hi, On 19/01/2022 12:23, Ivaylo Dimitrov wrote: > This patch series fixes excessive DMM or CMA usage of GEM buffers leading to > various runtime allocation failures. The series enables daily usage of devices > without exausting limited resources like CMA or DMM space if GPU rendering is > needed. > > The first patch doesn't bring any functional changes, it just moves some > TILER/DMM related code to a separate function, to simplify the review of the > next two patches. > > The second patch allows off-CPU rendering to non-scanout buffers. Without that > patch, it is basically impossible to use the driver allocated GEM buffers on > OMAP3 for anything else but a basic CPU rendered examples as if we want GPU > rendering, we must allocate buffers as scanout buffers, which are CMA allocated. > CMA soon gets fragmented and we start seeing allocation failures. Such failres > in Xorg cannot be handeled gracefully, so the system is basically unusable. > > Third patch fixes similar issue on OMAP4/5, where DMM/TILER spaces get > fragmented with time, leading to allocation failures. > > Series were tested on Motolola Droid4 and Nokia N900, with OMAP DDX and > PVR EXA from https://github.com/maemo-leste/xf86-video-omap > > Ivaylo Dimitrov (3): > drm: omapdrm: simplify omap_gem_pin > drm: omapdrm: Support exporting of non-contiguous GEM BOs > drm: omapdrm: Do no allocate non-scanout GEMs through DMM/TILER > > drivers/gpu/drm/omapdrm/omap_gem.c | 198 +++++++++++++++++------------- > drivers/gpu/drm/omapdrm/omap_gem.h | 3 +- > drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c | 5 +- > 3 files changed, 116 insertions(+), 90 deletions(-) > I have pushed this to drm-misc-next. Tomi
Hi, On 28.03.22 г. 12:46 ч., Tomi Valkeinen wrote: > Hi, > > On 19/01/2022 12:23, Ivaylo Dimitrov wrote: >> This patch series fixes excessive DMM or CMA usage of GEM buffers >> leading to >> various runtime allocation failures. The series enables daily usage of >> devices >> without exausting limited resources like CMA or DMM space if GPU >> rendering is >> needed. >> >> The first patch doesn't bring any functional changes, it just moves some >> TILER/DMM related code to a separate function, to simplify the review >> of the >> next two patches. >> >> The second patch allows off-CPU rendering to non-scanout buffers. >> Without that >> patch, it is basically impossible to use the driver allocated GEM >> buffers on >> OMAP3 for anything else but a basic CPU rendered examples as if we >> want GPU >> rendering, we must allocate buffers as scanout buffers, which are CMA >> allocated. >> CMA soon gets fragmented and we start seeing allocation failures. Such >> failres >> in Xorg cannot be handeled gracefully, so the system is basically >> unusable. >> >> Third patch fixes similar issue on OMAP4/5, where DMM/TILER spaces get >> fragmented with time, leading to allocation failures. >> >> Series were tested on Motolola Droid4 and Nokia N900, with OMAP DDX and >> PVR EXA from https://github.com/maemo-leste/xf86-video-omap >> >> Ivaylo Dimitrov (3): >> drm: omapdrm: simplify omap_gem_pin >> drm: omapdrm: Support exporting of non-contiguous GEM BOs >> drm: omapdrm: Do no allocate non-scanout GEMs through DMM/TILER >> >> drivers/gpu/drm/omapdrm/omap_gem.c | 198 >> +++++++++++++++++------------- >> drivers/gpu/drm/omapdrm/omap_gem.h | 3 +- >> drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c | 5 +- >> 3 files changed, 116 insertions(+), 90 deletions(-) >> > > I have pushed this to drm-misc-next. > Great, next is VRFB and TE support for Droid4 panel, as soon as I find some spare time :) Ivo