| Message ID | 20201007171238.1795964-20-mperttunen@nvidia.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | Host1x/TegraDRM UAPI |
Hi Mikko,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tegra-drm/drm/tegra/for-next]
[also build test ERROR on tegra/for-next linus/master v5.9-rc8 next-20201007]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Mikko-Perttunen/Host1x-TegraDRM-UAPI/20201008-034403
base:   git://anongit.freedesktop.org/tegra/linux.git drm/tegra/for-next
config: arm64-randconfig-r004-20201008 (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/6a3b3d79ce4488695cc0745edd19015fc2220d97
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mikko-Perttunen/Host1x-TegraDRM-UAPI/20201008-034403
        git checkout 6a3b3d79ce4488695cc0745edd19015fc2220d97
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   In file included from drivers/gpu/drm/tegra/uapi/uapi.c:12:
>> drivers/gpu/drm/tegra/uapi/../drm.h:84:1: error: attempted to randomize userland API struct tegra_drm_client_ops
      84 | };
         | ^
>> drivers/gpu/drm/tegra/uapi/uapi.c:62:5: warning: no previous prototype for 'close_channel_ctx' [-Wmissing-prototypes]
      62 | int close_channel_ctx(int id, void *p, void *data)
         |     ^~~~~~~~~~~~~~~~~
--
   In file included from drivers/gpu/drm/tegra/uapi/submit.c:18:
>> drivers/gpu/drm/tegra/uapi/../drm.h:84:1: error: attempted to randomize userland API struct tegra_drm_client_ops
      84 | };
         | ^

vim +84 drivers/gpu/drm/tegra/uapi/../drm.h

d43f81cbaf4353 drivers/gpu/host1x/drm/drm.h Terje Bergstrom 2013-03-22  74
53fa7f7204c97d drivers/gpu/host1x/drm/drm.h Thierry Reding  2013-09-24  75  struct tegra_drm_client_ops {
53fa7f7204c97d drivers/gpu/host1x/drm/drm.h Thierry Reding  2013-09-24  76  	int (*open_channel)(struct tegra_drm_client *client,
c88c363072c6dc drivers/gpu/host1x/drm/drm.h Thierry Reding  2013-09-26  77  			    struct tegra_drm_context *context);
c88c363072c6dc drivers/gpu/host1x/drm/drm.h Thierry Reding  2013-09-26  78  	void (*close_channel)(struct tegra_drm_context *context);
c40f0f1afcb1dc drivers/gpu/drm/tegra/drm.h  Thierry Reding  2013-10-10  79  	int (*is_addr_reg)(struct device *dev, u32 class, u32 offset);
0f563a4bf66e51 drivers/gpu/drm/tegra/drm.h  Dmitry Osipenko 2017-06-15  80  	int (*is_valid_class)(u32 class);
c88c363072c6dc drivers/gpu/host1x/drm/drm.h Thierry Reding  2013-09-26  81  	int (*submit)(struct tegra_drm_context *context,
d43f81cbaf4353 drivers/gpu/host1x/drm/drm.h Terje Bergstrom 2013-03-22  82  		      struct drm_tegra_submit *args, struct drm_device *drm,
d43f81cbaf4353 drivers/gpu/host1x/drm/drm.h Terje Bergstrom 2013-03-22  83  		      struct drm_file *file);
d43f81cbaf4353 drivers/gpu/host1x/drm/drm.h Terje Bergstrom 2013-03-22 @84  };
d43f81cbaf4353 drivers/gpu/host1x/drm/drm.h Terje Bergstrom 2013-03-22  85

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
07.10.2020 20:12, Mikko Perttunen wrote:
> +int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
> +                                struct drm_file *file)
> +{

Hello, Mikko!

Could you please tell which host1x clients are going to be upstreamed
and will need this IOCTL?
On 10/19/20 5:21 AM, Dmitry Osipenko wrote:
> 07.10.2020 20:12, Mikko Perttunen wrote:
>> +int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
>> +                                struct drm_file *file)
>> +{
>
> Hello, Mikko!
>
> Could you please tell which host1x clients are going to be upstreamed
> and will need this IOCTL?

Hi Dmitry!

It is needed for any engine/job that wants to access memory, as this
IOCTL must be used to map memory for the engine. So all of them.

Downstream doesn't have an equivalent IOCTL because it (currently) does
mapping at submit time, but that is suboptimal because

- it requires doing relocations in the kernel, which isn't required for
  T186+
- it's a big performance penalty, due to which the downstream kernel has
  the "deferred dma-buf unmapping" feature: unmapping a dma_buf may not
  immediately unmap it in case it's used later, so that the next
  "mapping" is faster. A feature we'd preferably get rid of.
- because the above feature is not controlled by the user, it can cause
  variance in submit times.

On the other hand, we cannot (at least always) do the mapping at
allocation/import time, because

- a single FD may have multiple channel_ctx's, and an allocation/import
  may need to be used in any subset of them
- the import IOCTL is fixed and doesn't have the parameters we'd need to
  do this at import time
- overall it's more orthogonal to have GEM object acquisition in one
  step and mapping in another.

Maybe that's not quite what you asked, but it's some background anyway :)

Cheers,
Mikko
19.10.2020 11:13, Mikko Perttunen wrote:
> On 10/19/20 5:21 AM, Dmitry Osipenko wrote:
>> 07.10.2020 20:12, Mikko Perttunen wrote:
>>> +int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
>>> +                                struct drm_file *file)
>>> +{
>>
>> Hello, Mikko!
>>
>> Could you please tell which host1x clients are going to be upstreamed
>> and will need this IOCTL?
>
> Hi Dmitry!
>
> It is needed for any engine/job that wants to access memory, as this
> IOCTL must be used to map memory for the engine. So all of them.
>
> Downstream doesn't have an equivalent IOCTL because it (currently) does
> mapping at submit time, but that is suboptimal because
>
> - it requires doing relocations in the kernel, which isn't required for
>   T186+
> - it's a big performance penalty, due to which the downstream kernel has
>   the "deferred dma-buf unmapping" feature: unmapping a dma_buf may not
>   immediately unmap it in case it's used later, so that the next
>   "mapping" is faster. A feature we'd preferably get rid of.
> - because the above feature is not controlled by the user, it can cause
>   variance in submit times.
>
> On the other hand, we cannot (at least always) do the mapping at
> allocation/import time, because
>
> - a single FD may have multiple channel_ctx's, and an allocation/import
>   may need to be used in any subset of them
> - the import IOCTL is fixed and doesn't have the parameters we'd need to
>   do this at import time
> - overall it's more orthogonal to have GEM object acquisition in one
>   step and mapping in another.
>
> Maybe that's not quite what you asked, but it's some background anyway :)

I'm asking this question because right now there is only one potential
client for this IOCTL, the VIC. If other clients aren't supposed to be
part of the DRM driver (for example NVDEC, which should probably be a
V4L driver), then the DRM driver will have only the single VIC, and in
this case we shouldn't need this IOCTL, because DRM and V4L should use
the generic dmabuf API for importing and exporting buffers.

I'm also not quite sure whether the current model of the unified Tegra
DRM driver is suitable for having separated engines. Perhaps each
separated engine should just have its own rendering node?
On 10/19/20 8:27 PM, Dmitry Osipenko wrote:
> 19.10.2020 11:13, Mikko Perttunen wrote:
...
> I'm asking this question because right now there is only one potential
> client for this IOCTL, the VIC. If other clients aren't supposed to be
> part of the DRM driver (for example NVDEC, which should probably be a
> V4L driver), then the DRM driver will have only the single VIC, and in
> this case we shouldn't need this IOCTL, because DRM and V4L should use
> the generic dmabuf API for importing and exporting buffers.

This IOCTL is required for GR2D/GR3D too, as they need to access memory
as well. This is a different step from import/export: first you import
or allocate your memory so you have a GEM handle, then you map it to the
channel, which does the IOMMU mapping (if there is an IOMMU).

> I'm also not quite sure whether the current model of the unified Tegra
> DRM driver is suitable for having separated engines. Perhaps each
> separated engine should just have its own rendering node?

Yeah, having separate nodes per engine might be better, e.g. so it's
easier to do access control. I don't have a strong opinion on that
though.

Mikko
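[To make the two-step model described above concrete, a hypothetical
userspace flow could look as follows. This is a sketch only: the ioctl
number and the struct layout (channel_ctx, handle, mapping_id fields)
are assumptions based on this series, not a final ABI.]

#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Assumed argument layout for TEGRA_CHANNEL_MAP; not a final ABI. */
struct drm_tegra_channel_map_args {
	uint32_t channel_ctx;	/* context ID from TEGRA_CHANNEL_OPEN */
	uint32_t handle;	/* GEM handle from create or PRIME import */
	uint32_t flags;
	uint32_t mapping_id;	/* returned: ID referenced by submit bufs */
};

/* Hypothetical ioctl number, for illustration only. */
#define DRM_IOCTL_TEGRA_CHANNEL_MAP_EX \
	_IOWR('d', 0x50, struct drm_tegra_channel_map_args)

static int map_bo_to_channel(int drm_fd, uint32_t channel_ctx,
			     uint32_t gem_handle, uint32_t *mapping_id)
{
	struct drm_tegra_channel_map_args args = {
		.channel_ctx = channel_ctx,
		.handle = gem_handle,
	};

	/* Step 1 was GEM create/import; step 2 creates the engine
	 * (IOMMU) mapping once, up front. */
	if (ioctl(drm_fd, DRM_IOCTL_TEGRA_CHANNEL_MAP_EX, &args) < 0)
		return -errno;

	/* Submits then reference the mapping cheaply by ID. */
	*mapping_id = args.mapping_id;
	return 0;
}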
On Mon, Oct 19, 2020 at 7:27 PM Dmitry Osipenko <digetx@gmail.com> wrote:
> 19.10.2020 11:13, Mikko Perttunen wrote:
...
> I'm asking this question because right now there is only one potential
> client for this IOCTL, the VIC. If other clients aren't supposed to be
> part of the DRM driver (for example NVDEC, which should probably be a
> V4L driver), then the DRM driver will have only the single VIC, and in
> this case we shouldn't need this IOCTL, because DRM and V4L should use
> the generic dmabuf API for importing and exporting buffers.

Randomly jumping in here ...

So if you have a drm driver with userspace in mesa3d already, the usual
approach is to have a libva implementation (ideally in mesa3d too, using
the gallium framework so that a lot of the boring integration glue is
taken care of already) directly on top of drm. No v4l driver needed at
all here.

And it sounds like this nvdec thing would fit that bill pretty neatly.

> I'm also not quite sure whether the current model of the unified Tegra
> DRM driver is suitable for having separated engines. Perhaps each
> separated engine should just have its own rendering node?

Above model using libva driver in userspace for nvdec would avoid this
issue too.
-Daniel
On 10/20/20 2:40 PM, Daniel Vetter wrote:
> On Mon, Oct 19, 2020 at 7:27 PM Dmitry Osipenko <digetx@gmail.com> wrote:
...
> Randomly jumping in here ...
>
> So if you have a drm driver with userspace in mesa3d already, the usual
> approach is to have a libva implementation (ideally in mesa3d too,
> using the gallium framework so that a lot of the boring integration
> glue is taken care of already) directly on top of drm. No v4l driver
> needed at all here.
>
> And it sounds like this nvdec thing would fit that bill pretty neatly.

Something like this would be my preference as well.

Mikko

>
>> I'm also not quite sure whether the current model of the unified Tegra
>> DRM driver is suitable for having separated engines. Perhaps each
>> separated engine should just have its own rendering node?
>
> Above model using libva driver in userspace for nvdec would avoid this
> issue too.
> -Daniel
20.10.2020 12:18, Mikko Perttunen wrote:
>> I'm asking this question because right now there is only one potential
>> client for this IOCTL, the VIC. If other clients aren't supposed to be
>> part of the DRM driver (for example NVDEC, which should probably be a
>> V4L driver), then the DRM driver will have only the single VIC, and in
>> this case we shouldn't need this IOCTL, because DRM and V4L should use
>> the generic dmabuf API for importing and exporting buffers.
>
> This IOCTL is required for GR2D/GR3D too, as they need to access memory
> as well. This is a different step from import/export: first you import
> or allocate your memory so you have a GEM handle, then you map it to
> the channel, which does the IOMMU mapping (if there is an IOMMU).

This doesn't answer my question. I don't have a full picture and for now
will remain dubious about this IOCTL, but it should be fine to have it
in the form of WIP patches (maybe a staging feature) until userspace
code and hardware specs become available.

Some more comments:

1. Older Tegra SoCs do not have restrictions which would prevent
grouping IOMMU domains as the kernel driver wishes. It's fine to have
one static mapping per GEM that can be accessed by all DRM devices,
which is why CHANNEL_MAP is questionable.

2. IIUC, the mappings need to be done per device group/stream and not
per channel_ctx. It looks like nothing stops channel contexts from
guessing the mapping addresses of another context, does it?

I'm suggesting that each GEM should have a per-device mapping and the
new IOCTL should create these GEM mappings instead of the channel_ctx
mappings (a sketch of this idea follows below).

3. We shouldn't need channel contexts at all; a single "DRM file"
context should be enough.

4. The new UAPI needs to be separated into several parts in the next
revision, one patch for each new feature.

The first patches should be the ones that are relevant to the existing
userspace code, like support for the waits.

Partial mappings should be a separate feature because it's a
questionable feature that needs to be proved by a real userspace first.
Maybe it would be even better to drop it for starters, to ease
reviewing.

Waiting for fences on host1x should be a separate feature.

The syncfile support needs to be a separate feature as well because I
don't see a use-case for it right now.

I'd like to see DRM_SCHED and syncobj support. I can help you with it
if it's out of your scope for now.
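[For illustration, the per-device mapping suggested in point 2 could be
sketched as below: each GEM object keeps a small list of (device,
mapping) pairs that all channel contexts on the same device would
share. The list, lock, and helper names are assumptions, not code from
this series.]

#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/kref.h>
#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/scatterlist.h>

/* One cached mapping per (GEM object, device) pair; illustrative only. */
struct tegra_bo_device_mapping {
	struct list_head list;
	struct device *dev;	/* the engine this mapping belongs to */
	struct sg_table *sgt;
	dma_addr_t iova;
	struct kref ref;
};

/* Reuse an existing per-device mapping if one exists; the caller
 * creates and adds a new one when NULL is returned. */
static struct tegra_bo_device_mapping *
tegra_bo_find_device_mapping(struct list_head *mappings, struct mutex *lock,
			     struct device *dev)
{
	struct tegra_bo_device_mapping *map, *found = NULL;

	mutex_lock(lock);
	list_for_each_entry(map, mappings, list) {
		if (map->dev == dev) {
			kref_get(&map->ref);
			found = map;
			break;
		}
	}
	mutex_unlock(lock);

	return found;
}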
On 10/22/20 7:20 AM, Dmitry Osipenko wrote:
> 20.10.2020 12:18, Mikko Perttunen wrote:
...
> This doesn't answer my question. I don't have a full picture and for
> now will remain dubious about this IOCTL, but it should be fine to have
> it in the form of WIP patches (maybe a staging feature) until userspace
> code and hardware specs become available.
>
> Some more comments:
>
> 1. Older Tegra SoCs do not have restrictions which would prevent
> grouping IOMMU domains as the kernel driver wishes. It's fine to have
> one static mapping per GEM that can be accessed by all DRM devices,
> which is why CHANNEL_MAP is questionable.

Sure, on older Tegras this is not strictly necessary because the
firewall can check pointers. It's not related to IOMMU grouping.

> 2. IIUC, the mappings need to be done per device group/stream and not
> per channel_ctx. It looks like nothing stops channel contexts from
> guessing the mapping addresses of another context, does it?
>
> I'm suggesting that each GEM should have a per-device mapping and the
> new IOCTL should create these GEM mappings instead of the channel_ctx
> mappings.

In the absence of context isolation, this is correct. But with context
isolation (which is next on my upstream todo-list), on supported chips
(T186+), we can map to individual contexts, which are associated with
channel_ctx's. Without context isolation, this IOCTL just maps to the
underlying engine.

The list of mappings can also be used by the firewall later; as
mentioned, just the relocs would be enough for that, but I think there's
a good orthogonality in CHANNEL_MAP describing memory regions accessible
by the engine, and relocations just doing relocations.

> 3. We shouldn't need channel contexts at all; a single "DRM file"
> context should be enough.

Yeah, maybe we could just have one "inline" channel context in the DRM
file that's just initialized by the CHANNEL_OPEN IOCTL.

> 4. The new UAPI needs to be separated into several parts in the next
> revision, one patch for each new feature.

I'll try to split up where possible.

> The first patches should be the ones that are relevant to the existing
> userspace code, like support for the waits.

Can you elaborate on what you mean by this?

> Partial mappings should be a separate feature because it's a
> questionable feature that needs to be proved by a real userspace first.
> Maybe it would be even better to drop it for starters, to ease
> reviewing.

Considering that the "no-op" support for it (map the whole buffer but
just keep track of the starting offset) is only a couple of lines, I'd
like to keep it in.

> Waiting for fences on host1x should be a separate feature.

OK.

> The syncfile support needs to be a separate feature as well because I
> don't see a use-case for it right now.

I'll separate it; the reason it's there is to avoid the overhead of the
extra ID/threshold -> sync_file conversion IOCTL if you need it.

> I'd like to see DRM_SCHED and syncobj support. I can help you with it
> if it's out of your scope for now.

I already wrote some code for syncobj that I can probably pull in.
Regarding DRM_SCHED, help is accepted. However, we should keep using the
hardware scheduler, and considering it's a bigger piece of work, let's
not block this series on it.

Mikko
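[For reference, the ID/threshold -> sync_file conversion whose overhead
is being discussed is roughly the following pattern: wrap a syncpoint
threshold in a dma_fence and expose it as a sync_file fd. A minimal
sketch, assuming host1x_fence_create() from this series; the sync_file
calls are the standard in-kernel API.]

#include <linux/dma-fence.h>
#include <linux/fcntl.h>
#include <linux/file.h>
#include <linux/host1x.h>
#include <linux/sync_file.h>

static int syncpt_threshold_to_sync_file(struct host1x_syncpt *sp,
					 u32 threshold)
{
	struct dma_fence *fence;
	struct sync_file *file;
	int fd;

	/* host1x_fence_create() is introduced by this series */
	fence = host1x_fence_create(sp, threshold);
	if (IS_ERR(fence))
		return PTR_ERR(fence);

	fd = get_unused_fd_flags(O_CLOEXEC);
	if (fd < 0) {
		dma_fence_put(fence);
		return fd;
	}

	file = sync_file_create(fence);
	dma_fence_put(fence);	/* the sync_file holds its own reference */
	if (!file) {
		put_unused_fd(fd);
		return -ENOMEM;
	}

	fd_install(fd, file->file);

	return fd;
}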
26.10.2020 12:11, Mikko Perttunen wrote:
>> The first patches should be the ones that are relevant to the existing
>> userspace code, like support for the waits.
>
> Can you elaborate on what you mean by this?

All features that don't have an immediate real use-case should be placed
later in the series, because we may defer merging those patches until we
see userspace that uses them. We can't really decide whether these are
decisions that we won't regret later on; only practical application can
confirm their correctness.

>> Partial mappings should be a separate feature because it's a
>> questionable feature that needs to be proved by a real userspace
>> first. Maybe it would be even better to drop it for starters, to ease
>> reviewing.
>
> Considering that the "no-op" support for it (map the whole buffer but
> just keep track of the starting offset) is only a couple of lines, I'd
> like to keep it in.

There is no tracking in the current code which prevents duplicated
mappings; will we need to care about it? This is a bit too questionable
a feature for now, IMO. I'd like to see it as a separate patch.

...

>> I'd like to see DRM_SCHED and syncobj support. I can help you with it
>> if it's out of your scope for now.
>
> I already wrote some code for syncobj that I can probably pull in.
> Regarding DRM_SCHED, help is accepted. However, we should keep using
> the hardware scheduler, and considering it's a bigger piece of work,
> let's not block this series on it.

I'd like to see all the custom IOCTLs deprecated and replaced with the
generic DRM API wherever possible. Hence, I think these should be
mandatory features that we need to focus on. The current WIP userspace
already uses and relies on DRM_SCHED.
On 10/27/20 9:06 PM, Dmitry Osipenko wrote:
> 26.10.2020 12:11, Mikko Perttunen wrote:
>>> The first patches should be the ones that are relevant to the
>>> existing userspace code, like support for the waits.
>>
>> Can you elaborate on what you mean by this?
>
> All features that don't have an immediate real use-case should be
> placed later in the series, because we may defer merging those patches
> until we see userspace that uses them. We can't really decide whether
> these are decisions that we won't regret later on; only practical
> application can confirm their correctness.

I was more referring to the "support for waits" part, should have
clarified that.

>>> Partial mappings should be a separate feature because it's a
>>> questionable feature that needs to be proved by a real userspace
>>> first. Maybe it would be even better to drop it for starters, to ease
>>> reviewing.
>>
>> Considering that the "no-op" support for it (map the whole buffer but
>> just keep track of the starting offset) is only a couple of lines, I'd
>> like to keep it in.
>
> There is no tracking in the current code which prevents duplicated
> mappings; will we need to care about it? This is a bit too questionable
> a feature for now, IMO. I'd like to see it as a separate patch.

I don't think there is any need to special-case duplicated mappings. I
think this is a pretty obvious feature to have, but I can rename them to
reserved0 and reserved1 and require that reserved0 is zero and reserved1
is the size of the passed GEM object.

...

>>> I'd like to see DRM_SCHED and syncobj support. I can help you with it
>>> if it's out of your scope for now.
>>
>> I already wrote some code for syncobj that I can probably pull in.
>> Regarding DRM_SCHED, help is accepted. However, we should keep using
>> the hardware scheduler, and considering it's a bigger piece of work,
>> let's not block this series on it.
>
> I'd like to see all the custom IOCTLs deprecated and replaced with the
> generic DRM API wherever possible. Hence, I think these should be
> mandatory features that we need to focus on. The current WIP userspace
> already uses and relies on DRM_SCHED.

From my point of view, the ABI needs to be designed such that it can
replace what we have downstream, i.e. it needs to support the use cases
the downstream nvhost ABI currently supports. Otherwise there is no
migration path to upstream and it's not worth it for me to work on this.
Although, I don't see what this ABI is missing that your userspace would
rely on. Does it submit jobs in reverse order that would deadlock if
drm_sched didn't reorder them based on prefences, or something?

Software-based scheduling might make sense in situations where the
channel is shared by all processes, but at least for modern chips that
are designed to use hardware scheduling, I want to use that capability.

Mikko
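[For context, the "couple of lines" no-op partial-mapping support could
look roughly like this sketch: the whole GEM object is still mapped, but
the requested window is validated and folded into the engine-visible
address. The iova/iova_end fields exist in this series' struct
tegra_drm_mapping; the offset/length parameters are the ioctl fields
under discussion, so their names are assumptions.]

static int tegra_drm_mapping_apply_window(struct tegra_drm_mapping *mapping,
					  u64 offset, u64 length,
					  u64 gem_size)
{
	/* reject windows that fall outside the GEM object */
	if (length == 0 || offset >= gem_size || length > gem_size - offset)
		return -EINVAL;

	/* keep the full mapping, but report only the requested window */
	mapping->iova += offset;
	mapping->iova_end = mapping->iova + length;

	return 0;
}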
28.10.2020 12:54, Mikko Perttunen wrote:
> On 10/27/20 9:06 PM, Dmitry Osipenko wrote:
>> 26.10.2020 12:11, Mikko Perttunen wrote:
>>>> The first patches should be the ones that are relevant to the
>>>> existing userspace code, like support for the waits.
>>>
>>> Can you elaborate on what you mean by this?
>>
>> All features that don't have an immediate real use-case should be
>> placed later in the series, because we may defer merging those patches
>> until we see userspace that uses them. We can't really decide whether
>> these are decisions that we won't regret later on; only practical
>> application can confirm their correctness.
>
> I was more referring to the "support for waits" part, should have
> clarified that.

The "support for waits" is support for the WAIT_SYNCPT command exposed
to userspace, which we could utilize right now.

>>>> Partial mappings should be a separate feature because it's a
>>>> questionable feature that needs to be proved by a real userspace
>>>> first. Maybe it would be even better to drop it for starters, to
>>>> ease reviewing.
>>>
>>> Considering that the "no-op" support for it (map the whole buffer but
>>> just keep track of the starting offset) is only a couple of lines,
>>> I'd like to keep it in.
>>
>> There is no tracking in the current code which prevents duplicated
>> mappings; will we need to care about it? This is a bit too
>> questionable a feature for now, IMO. I'd like to see it as a separate
>> patch.
>
> I don't think there is any need to special-case duplicated mappings. I
> think this is a pretty obvious feature to have, but I can rename them
> to reserved0 and reserved1 and require that reserved0 is zero and
> reserved1 is the size of the passed GEM object.

I'm now concerned about the reserved fields after seeing this reply from
Daniel Vetter:

https://www.mail-archive.com/nouveau@lists.freedesktop.org/msg36324.html

If DRM IOCTL structs are zero-extended, then perhaps we won't need to
reserve anything?

>> ...
>>>> I'd like to see DRM_SCHED and syncobj support. I can help you with
>>>> it if it's out of your scope for now.
>>>
>>> I already wrote some code for syncobj that I can probably pull in.
>>> Regarding DRM_SCHED, help is accepted. However, we should keep using
>>> the hardware scheduler, and considering it's a bigger piece of work,
>>> let's not block this series on it.
>>
>> I'd like to see all the custom IOCTLs deprecated and replaced with the
>> generic DRM API wherever possible. Hence, I think these should be
>> mandatory features that we need to focus on. The current WIP userspace
>> already uses and relies on DRM_SCHED.
>
> From my point of view, the ABI needs to be designed such that it can
> replace what we have downstream, i.e. it needs to support the use cases
> the downstream nvhost ABI currently supports. Otherwise there is no
> migration path to upstream and it's not worth it for me to work on
> this.

The downstream needs should be irrelevant for the upstream; please read
this:

https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements

It may happen that some of the downstream features become useful for
upstream, but we won't know until we see the full userspace code.

We don't have a comprehensive userspace which could utilize all the new
features, and that's why the upstream driver has been stagnant for many
years now. The grate-drivers would greatly benefit from the updated ABI,
but I think we need at least a usable mesa driver first; that's why I
haven't bothered to upstream anything from the WIP UAPI v2.

In order to upstream new UAPI features we will need:

1. Hardware specs (from vendor or reverse-engineered).
2. Regression tests.
3. A non-toy opensource userspace.

> Although, I don't see what this ABI is missing that your userspace
> would rely on. Does it submit jobs in reverse order that would deadlock
> if drm_sched didn't reorder them based on prefences, or something?

It's the opposite: we don't have userspace which needs the majority of
the proposed ABI. This needs to be fixed before we can seriously
consider merging the new features.

I'm pretty sure that you were already aware of all the upstreaming
requirements and we will see usable opensource userspace at some point,
correct?

For now it will be good to have this series in the form of
work-in-progress patches, continuing to discuss and update it. Meanwhile
we will need to work on the userspace part as well, which is a much
bigger blocker.

> Software-based scheduling might make sense in situations where the
> channel is shared by all processes, but at least for modern chips that
> are designed to use hardware scheduling, I want to use that capability.

Software-based scheduling is indeed needed for the older SoCs in order
not to block hardware channels and job-submission code paths. Maybe it
could become useful for newer SoCs as well once we get closer to a
usable userspace :)

It will be great to have hardware-based scheduling supported, but I
assume that it needs to be done on top of DRM_SCHED. This should allow
us to remove all the buggy host1x pushbuffer locking code (which is
known to deadlock) and support non-host1x fences for free.
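[For background on the zero-extension point raised above: drm_ioctl()
copies in only the size userspace passes and zero-fills the rest of the
kernel-side struct, so an ioctl struct can grow by appending fields
whose zero value means "old behavior". A sketch with illustrative, not
real, field names:]

/* v1: what old userspace compiles against. */
struct drm_tegra_channel_map_v1 {
	__u32 channel_ctx;
	__u32 handle;
	__u32 flags;
	__u32 mapping_id;
};

/* v2: fields appended later. Old userspace passes the short struct;
 * drm_ioctl() zero-fills the tail, so zero must mean "old behavior". */
struct drm_tegra_channel_map_v2 {
	__u32 channel_ctx;
	__u32 handle;
	__u32 flags;
	__u32 mapping_id;
	__u64 offset;	/* 0 == map from the start of the object */
	__u64 length;	/* 0 == map the whole object */
};

/* kernel side then only needs to interpret the zero defaults, e.g.:
 *
 *	if (args->length == 0)
 *		args->length = gem_size - args->offset;
 */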
On 10/31/20 1:13 AM, Dmitry Osipenko wrote:
> 28.10.2020 12:54, Mikko Perttunen wrote:
...
> I'm now concerned about the reserved fields after seeing this reply
> from Daniel Vetter:
>
> https://www.mail-archive.com/nouveau@lists.freedesktop.org/msg36324.html
>
> If DRM IOCTL structs are zero-extended, then perhaps we won't need to
> reserve anything?

I guess for the channel_map we can drop the offset/length; I just think
it's fairly obvious that an IOMMU mapping API lets you specify from
where and how much you want to map. Sure, it's not a functionality
blocker, as it can simply be implemented in userspace by shifting the
reloc offset / IOVA equivalently, but it will reduce IO address space
usage and prevent access to memory that was not intended to be mapped to
the engine. The latter becomes a major PITA if you need to create safety
documentation at this level; I don't know if this is relevant on Linux
or not.

> The downstream needs should be irrelevant for the upstream; please read
> this:
>
> https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements
>
> It may happen that some of the downstream features become useful for
> upstream, but we won't know until we see the full userspace code.
>
> We don't have a comprehensive userspace which could utilize all the new
> features, and that's why the upstream driver has been stagnant for many
> years now. The grate-drivers would greatly benefit from the updated
> ABI, but I think we need at least a usable mesa driver first; that's
> why I haven't bothered to upstream anything from the WIP UAPI v2.
>
> In order to upstream new UAPI features we will need:
>
> 1. Hardware specs (from vendor or reverse-engineered).
> 2. Regression tests.
> 3. A non-toy opensource userspace.
>
>> Although, I don't see what this ABI is missing that your userspace
>> would rely on. Does it submit jobs in reverse order that would
>> deadlock if drm_sched didn't reorder them based on prefences, or
>> something?
>
> It's the opposite: we don't have userspace which needs the majority of
> the proposed ABI. This needs to be fixed before we can seriously
> consider merging the new features.
>
> I'm pretty sure that you were already aware of all the upstreaming
> requirements and we will see usable opensource userspace at some point,
> correct?

I am well aware of that. I'm not saying that we should copy the
downstream stack. I am saying that when designing an ABI, we should
consider all information available on what kind of features would be
required from it.

Going through the proposed TegraDRM UAPI, there are some features that
would probably not be immediately used by userspace, or supported in a
non-NOOP fashion by the kernel:

* Map offset/length
* IOVA of mapping
* Creation of sync_file postfence
* Waiting for sync_file prefences
* SUBMIT_CONTEXT_SETUP flag
* Having two syncpt_incrs
* Reservations?

I suppose we can remove all of that for now, as long as we ensure that
there is a path to add them back. I'm just a bit concerned that we'll
end up with 10 different ABI revisions and userspace will have to do a
version detection dance and enable things depending on the driver
version.

Anything else to remove?

Regarding things like the explicit channel_map: sure, we could map
things implicitly at submit time, but it is a huge performance
improvement, on newer chips at least. So technically userspace doesn't
need it, but going by that argument we can get rid of a lot of kernel
functionality; after all, it's only needed if you want your hardware to
perform well.

> For now it will be good to have this series in the form of
> work-in-progress patches, continuing to discuss and update it.
> Meanwhile we will need to work on the userspace part as well, which is
> a much bigger blocker.

I'm hoping that porting the userspace won't take that long. It shouldn't
be that big of a hurdle.

>> Software-based scheduling might make sense in situations where the
>> channel is shared by all processes, but at least for modern chips that
>> are designed to use hardware scheduling, I want to use that
>> capability.
>
> Software-based scheduling is indeed needed for the older SoCs in order
> not to block hardware channels and job-submission code paths. Maybe it
> could become useful for newer SoCs as well once we get closer to a
> usable userspace :)

Considering that many products were successfully shipped without
software-based scheduling, I wouldn't consider it "needed".

> It will be great to have hardware-based scheduling supported, but I
> assume that it needs to be done on top of DRM_SCHED. This should allow
> us to remove all the buggy host1x pushbuffer locking code (which is
> known to deadlock) and support non-host1x fences for free.

If it is known to deadlock, we should fix it. In any case, which kind of
scheduler is used shouldn't affect the ABI, and we already have a
functional implementation in the host1x driver, so we should merge the
ABI first rather than wait another year while the rest of the driver is
redesigned and rewritten.

Mikko
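[For context on what the DRM_SCHED conversion discussed here would
involve: the driver implements drm_sched_backend_ops and lets the
scheduler decide when run_job() pushes work to the host1x channel. A
rough sketch against the scheduler API of this era (~v5.9); the
tegra_sched_job wrapper and the job->syncpt fields are assumptions,
with host1x_fence_create() coming from this series.]

#include <drm/gpu_scheduler.h>
#include <linux/host1x.h>
#include <linux/slab.h>

/* Illustrative wrapper tying a host1x job to a drm_sched job. */
struct tegra_sched_job {
	struct drm_sched_job base;
	struct host1x_job *job;
};

static struct dma_fence *tegra_sched_run_job(struct drm_sched_job *sched_job)
{
	struct tegra_sched_job *tjob =
		container_of(sched_job, struct tegra_sched_job, base);

	/* push to the hardware channel once dependencies have signaled */
	if (host1x_job_submit(tjob->job))
		return NULL;

	/* hand the job's completion fence back to the scheduler */
	return host1x_fence_create(tjob->job->syncpt, tjob->job->syncpt_end);
}

static void tegra_sched_timedout_job(struct drm_sched_job *sched_job)
{
	/* channel/engine recovery would go here */
}

static void tegra_sched_free_job(struct drm_sched_job *sched_job)
{
	struct tegra_sched_job *tjob =
		container_of(sched_job, struct tegra_sched_job, base);

	drm_sched_job_cleanup(sched_job);
	host1x_job_put(tjob->job);
	kfree(tjob);
}

static const struct drm_sched_backend_ops tegra_sched_ops = {
	.run_job = tegra_sched_run_job,
	.timedout_job = tegra_sched_timedout_job,
	.free_job = tegra_sched_free_job,
};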
09.11.2020 17:53, Mikko Perttunen wrote:
...
> I guess for the channel_map we can drop the offset/length; I just think
> it's fairly obvious that an IOMMU mapping API lets you specify from
> where and how much you want to map. Sure, it's not a functionality
> blocker, as it can simply be implemented in userspace by shifting the
> reloc offset / IOVA equivalently, but it will reduce IO address space
> usage and prevent access to memory that was not intended to be mapped
> to the engine.

It could be a good feature, but I'd want to see how userspace will
benefit from using it in practice before putting effort into it. Could
you please give examples of how this feature will be useful for
userspace? What is the use-case for allocating a single buffer and then
mapping it partially? Is this needed for a userspace memory pool or
something else?

...

> I am well aware of that. I'm not saying that we should copy the
> downstream stack. I am saying that when designing an ABI, we should
> consider all information available on what kind of features would be
> required from it.
>
> Going through the proposed TegraDRM UAPI, there are some features that
> would probably not be immediately used by userspace, or supported in a
> non-NOOP fashion by the kernel:
>
> * Map offset/length
> * IOVA of mapping
> * Creation of sync_file postfence
> * Waiting for sync_file prefences
> * SUBMIT_CONTEXT_SETUP flag
> * Having two syncpt_incrs
> * Reservations?
>
> I suppose we can remove all of that for now, as long as we ensure that
> there is a path to add them back. I'm just a bit concerned that we'll
> end up with 10 different ABI revisions and userspace will have to do a
> version detection dance and enable things depending on the driver
> version.
>
> Anything else to remove?

I guess that should be enough for now. Reservations are needed for the
grate drivers, but they need to be done in conjunction with the DRM
scheduler. It should be fine if you remove reservations; I'll then take
a look at how to integrate them and drm-sched on top of your changes.

> Regarding things like the explicit channel_map: sure, we could map
> things implicitly at submit time, but it is a huge performance
> improvement, on newer chips at least. So technically userspace doesn't
> need it, but going by that argument we can get rid of a lot of kernel
> functionality; after all, it's only needed if you want your hardware to
> perform well.

I have no doubt that it's better to have static mappings, but it's not
obvious how this should be implemented without seeing the full picture.
I mean that maybe it would be possible to avoid these special IOCTLs by
changing the way the hardware is exposed to userspace, such that a
generic UAPI could be used to achieve the same goals.

...

> If it is known to deadlock, we should fix it. In any case, which kind
> of scheduler is used shouldn't affect the ABI, and we already have a
> functional implementation in the host1x driver, so we should merge the
> ABI first rather than wait another year while the rest of the driver is
> redesigned and rewritten.

It should perhaps be fine to start extending the ABI, but then it should
stay under DRM_TEGRA_STAGING until we are confident that it's all good.
diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile index d6cf202414f0..059322e88943 100644 --- a/drivers/gpu/drm/tegra/Makefile +++ b/drivers/gpu/drm/tegra/Makefile @@ -3,6 +3,9 @@ ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG tegra-drm-y := \ drm.o \ + uapi/uapi.o \ + uapi/submit.o \ + uapi/gather_bo.o \ gem.o \ fb.o \ dp.o \ diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c index 7124b0b0154b..88226dd0fd88 100644 --- a/drivers/gpu/drm/tegra/drm.c +++ b/drivers/gpu/drm/tegra/drm.c @@ -20,24 +20,20 @@ #include <drm/drm_prime.h> #include <drm/drm_vblank.h> +#include "uapi.h" #include "drm.h" #include "gem.h" #define DRIVER_NAME "tegra" #define DRIVER_DESC "NVIDIA Tegra graphics" #define DRIVER_DATE "20120330" -#define DRIVER_MAJOR 0 +#define DRIVER_MAJOR 1 #define DRIVER_MINOR 0 #define DRIVER_PATCHLEVEL 0 #define CARVEOUT_SZ SZ_64M #define CDMA_GATHER_FETCHES_MAX_NB 16383 -struct tegra_drm_file { - struct idr contexts; - struct mutex lock; -}; - static int tegra_atomic_check(struct drm_device *drm, struct drm_atomic_state *state) { @@ -90,7 +86,8 @@ static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp) if (!fpriv) return -ENOMEM; - idr_init(&fpriv->contexts); + idr_init(&fpriv->legacy_contexts); + xa_init_flags(&fpriv->contexts, XA_FLAGS_ALLOC1); mutex_init(&fpriv->lock); filp->driver_priv = fpriv; @@ -432,7 +429,7 @@ static int tegra_client_open(struct tegra_drm_file *fpriv, if (err < 0) return err; - err = idr_alloc(&fpriv->contexts, context, 1, 0, GFP_KERNEL); + err = idr_alloc(&fpriv->legacy_contexts, context, 1, 0, GFP_KERNEL); if (err < 0) { client->ops->close_channel(context); return err; @@ -487,13 +484,13 @@ static int tegra_close_channel(struct drm_device *drm, void *data, mutex_lock(&fpriv->lock); - context = idr_find(&fpriv->contexts, args->context); + context = idr_find(&fpriv->legacy_contexts, args->context); if (!context) { err = -EINVAL; goto unlock; } - idr_remove(&fpriv->contexts, context->id); + idr_remove(&fpriv->legacy_contexts, context->id); tegra_drm_context_free(context); unlock: @@ -512,7 +509,7 @@ static int tegra_get_syncpt(struct drm_device *drm, void *data, mutex_lock(&fpriv->lock); - context = idr_find(&fpriv->contexts, args->context); + context = idr_find(&fpriv->legacy_contexts, args->context); if (!context) { err = -ENODEV; goto unlock; @@ -541,7 +538,7 @@ static int tegra_submit(struct drm_device *drm, void *data, mutex_lock(&fpriv->lock); - context = idr_find(&fpriv->contexts, args->context); + context = idr_find(&fpriv->legacy_contexts, args->context); if (!context) { err = -ENODEV; goto unlock; @@ -566,7 +563,7 @@ static int tegra_get_syncpt_base(struct drm_device *drm, void *data, mutex_lock(&fpriv->lock); - context = idr_find(&fpriv->contexts, args->context); + context = idr_find(&fpriv->legacy_contexts, args->context); if (!context) { err = -ENODEV; goto unlock; @@ -734,11 +731,23 @@ static int tegra_gem_get_flags(struct drm_device *drm, void *data, #endif static const struct drm_ioctl_desc tegra_drm_ioctls[] = { -#ifdef CONFIG_DRM_TEGRA_STAGING - DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_gem_create, + DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_OPEN, tegra_drm_ioctl_channel_open, + DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_CLOSE, tegra_drm_ioctl_channel_close, + DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_MAP, tegra_drm_ioctl_channel_map, DRM_RENDER_ALLOW), - DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_gem_mmap, + DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_UNMAP, 
tegra_drm_ioctl_channel_unmap, DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_SUBMIT, tegra_drm_ioctl_channel_submit, + DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_drm_ioctl_gem_create, + DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_drm_ioctl_gem_mmap, + DRM_RENDER_ALLOW), +#ifdef CONFIG_DRM_TEGRA_STAGING + DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE_LEGACY, tegra_gem_create, DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP_LEGACY, tegra_gem_mmap, DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(TEGRA_SYNCPT_READ, tegra_syncpt_read, DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(TEGRA_SYNCPT_INCR, tegra_syncpt_incr, @@ -792,10 +801,11 @@ static void tegra_drm_postclose(struct drm_device *drm, struct drm_file *file) struct tegra_drm_file *fpriv = file->driver_priv; mutex_lock(&fpriv->lock); - idr_for_each(&fpriv->contexts, tegra_drm_context_cleanup, NULL); + idr_for_each(&fpriv->legacy_contexts, tegra_drm_context_cleanup, NULL); + tegra_drm_uapi_close_file(fpriv); mutex_unlock(&fpriv->lock); - idr_destroy(&fpriv->contexts); + idr_destroy(&fpriv->legacy_contexts); mutex_destroy(&fpriv->lock); kfree(fpriv); } diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h index 3fc42fd97911..42f8a301555c 100644 --- a/drivers/gpu/drm/tegra/drm.h +++ b/drivers/gpu/drm/tegra/drm.h @@ -58,6 +58,11 @@ struct tegra_drm { struct tegra_display_hub *hub; }; +static inline struct host1x *tegra_drm_to_host1x(struct tegra_drm *tegra) +{ + return dev_get_drvdata(tegra->drm->dev->parent); +} + struct tegra_drm_client; struct tegra_drm_context { diff --git a/drivers/gpu/drm/tegra/uapi.h b/drivers/gpu/drm/tegra/uapi.h new file mode 100644 index 000000000000..5c422607e8fa --- /dev/null +++ b/drivers/gpu/drm/tegra/uapi.h @@ -0,0 +1,63 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* Copyright (c) 2020 NVIDIA Corporation */ + +#ifndef _TEGRA_DRM_UAPI_H +#define _TEGRA_DRM_UAPI_H + +#include <linux/dma-mapping.h> +#include <linux/idr.h> +#include <linux/kref.h> +#include <linux/xarray.h> + +#include <drm/drm.h> + +struct drm_file; +struct drm_device; + +struct tegra_drm_file { + /* Legacy UAPI state */ + struct idr legacy_contexts; + struct mutex lock; + + /* New UAPI state */ + struct xarray contexts; +}; + +struct tegra_drm_channel_ctx { + struct tegra_drm_client *client; + struct host1x_channel *channel; + struct xarray mappings; +}; + +struct tegra_drm_mapping { + struct kref ref; + + struct device *dev; + struct host1x_bo *bo; + struct sg_table *sgt; + enum dma_data_direction direction; + dma_addr_t iova; + dma_addr_t iova_end; +}; + +int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data, + struct drm_file *file); +int tegra_drm_ioctl_channel_close(struct drm_device *drm, void *data, + struct drm_file *file); +int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data, + struct drm_file *file); +int tegra_drm_ioctl_channel_unmap(struct drm_device *drm, void *data, + struct drm_file *file); +int tegra_drm_ioctl_channel_submit(struct drm_device *drm, void *data, + struct drm_file *file); +int tegra_drm_ioctl_gem_create(struct drm_device *drm, void *data, + struct drm_file *file); +int tegra_drm_ioctl_gem_mmap(struct drm_device *drm, void *data, + struct drm_file *file); + +void tegra_drm_uapi_close_file(struct tegra_drm_file *file); +void tegra_drm_mapping_put(struct tegra_drm_mapping *mapping); +struct tegra_drm_channel_ctx * +tegra_drm_channel_ctx_lock(struct tegra_drm_file *file, u32 id); + +#endif diff --git a/drivers/gpu/drm/tegra/uapi/gather_bo.c 
b/drivers/gpu/drm/tegra/uapi/gather_bo.c new file mode 100644 index 000000000000..b487a0d44648 --- /dev/null +++ b/drivers/gpu/drm/tegra/uapi/gather_bo.c @@ -0,0 +1,86 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (c) 2020 NVIDIA Corporation */ + +#include <linux/scatterlist.h> +#include <linux/slab.h> + +#include "gather_bo.h" + +static struct host1x_bo *gather_bo_get(struct host1x_bo *host_bo) +{ + struct gather_bo *bo = container_of(host_bo, struct gather_bo, base); + + kref_get(&bo->ref); + + return host_bo; +} + +static void gather_bo_release(struct kref *ref) +{ + struct gather_bo *bo = container_of(ref, struct gather_bo, ref); + + kfree(bo->gather_data); + kfree(bo); +} + +void gather_bo_put(struct host1x_bo *host_bo) +{ + struct gather_bo *bo = container_of(host_bo, struct gather_bo, base); + + kref_put(&bo->ref, gather_bo_release); +} + +static struct sg_table * +gather_bo_pin(struct device *dev, struct host1x_bo *host_bo, dma_addr_t *phys) +{ + struct gather_bo *bo = container_of(host_bo, struct gather_bo, base); + struct sg_table *sgt; + int err; + + if (phys) { + *phys = virt_to_phys(bo->gather_data); + return NULL; + } + + sgt = kzalloc(sizeof(*sgt), GFP_KERNEL); + if (!sgt) + return ERR_PTR(-ENOMEM); + + err = sg_alloc_table(sgt, 1, GFP_KERNEL); + if (err) { + kfree(sgt); + return ERR_PTR(err); + } + + sg_init_one(sgt->sgl, bo->gather_data, bo->gather_data_words*4); + + return sgt; +} + +static void gather_bo_unpin(struct device *dev, struct sg_table *sgt) +{ + if (sgt) { + sg_free_table(sgt); + kfree(sgt); + } +} + +static void *gather_bo_mmap(struct host1x_bo *host_bo) +{ + struct gather_bo *bo = container_of(host_bo, struct gather_bo, base); + + return bo->gather_data; +} + +static void gather_bo_munmap(struct host1x_bo *host_bo, void *addr) +{ +} + +const struct host1x_bo_ops gather_bo_ops = { + .get = gather_bo_get, + .put = gather_bo_put, + .pin = gather_bo_pin, + .unpin = gather_bo_unpin, + .mmap = gather_bo_mmap, + .munmap = gather_bo_munmap, +}; diff --git a/drivers/gpu/drm/tegra/uapi/gather_bo.h b/drivers/gpu/drm/tegra/uapi/gather_bo.h new file mode 100644 index 000000000000..6b4c9d83ac91 --- /dev/null +++ b/drivers/gpu/drm/tegra/uapi/gather_bo.h @@ -0,0 +1,22 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* Copyright (c) 2020 NVIDIA Corporation */ + +#ifndef _TEGRA_DRM_SUBMIT_GATHER_BO_H +#define _TEGRA_DRM_SUBMIT_GATHER_BO_H + +#include <linux/host1x.h> +#include <linux/kref.h> + +struct gather_bo { + struct host1x_bo base; + + struct kref ref; + + u32 *gather_data; + size_t gather_data_words; +}; + +extern const struct host1x_bo_ops gather_bo_ops; +void gather_bo_put(struct host1x_bo *host_bo); + +#endif diff --git a/drivers/gpu/drm/tegra/uapi/submit.c b/drivers/gpu/drm/tegra/uapi/submit.c new file mode 100644 index 000000000000..95141f1516e5 --- /dev/null +++ b/drivers/gpu/drm/tegra/uapi/submit.c @@ -0,0 +1,675 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (c) 2020 NVIDIA Corporation */ + +#include <linux/dma-fence-array.h> +#include <linux/file.h> +#include <linux/host1x.h> +#include <linux/iommu.h> +#include <linux/kref.h> +#include <linux/list.h> +#include <linux/nospec.h> +#include <linux/pm_runtime.h> +#include <linux/sync_file.h> + +#include <drm/drm_drv.h> +#include <drm/drm_file.h> + +#include "../uapi.h" +#include "../drm.h" +#include "../gem.h" + +#include "gather_bo.h" +#include "submit.h" + +static struct tegra_drm_mapping * +tegra_drm_mapping_get(struct tegra_drm_channel_ctx *ctx, u32 id) +{ + struct tegra_drm_mapping 
*mapping; + + xa_lock(&ctx->mappings); + mapping = xa_load(&ctx->mappings, id); + if (mapping) + kref_get(&mapping->ref); + xa_unlock(&ctx->mappings); + + return mapping; +} + +static void *alloc_copy_user_array(void __user *from, size_t count, size_t size) +{ + unsigned long copy_err; + size_t copy_len; + void *data; + + if (check_mul_overflow(count, size, ©_len)) + return ERR_PTR(-EINVAL); + + data = kvmalloc(copy_len, GFP_KERNEL); + if (!data) + return ERR_PTR(-ENOMEM); + + copy_err = copy_from_user(data, from, copy_len); + if (copy_err) { + kvfree(data); + return ERR_PTR(-EFAULT); + } + + return data; +} + +static int submit_copy_gather_data(struct drm_device *drm, + struct gather_bo **pbo, + struct drm_tegra_channel_submit *args) +{ + unsigned long copy_err; + struct gather_bo *bo; + size_t copy_len; + + if (args->gather_data_words == 0) { + drm_info(drm, "gather_data_words can't be 0"); + return -EINVAL; + } + + if (check_mul_overflow((size_t)args->gather_data_words, (size_t)4, ©_len)) + return -EINVAL; + + bo = kzalloc(sizeof(*bo), GFP_KERNEL); + if (!bo) + return -ENOMEM; + + kref_init(&bo->ref); + host1x_bo_init(&bo->base, &gather_bo_ops); + + bo->gather_data = kmalloc(copy_len, GFP_KERNEL | __GFP_NOWARN); + if (!bo->gather_data) { + kfree(bo); + return -ENOMEM; + } + + copy_err = copy_from_user(bo->gather_data, + u64_to_user_ptr(args->gather_data_ptr), + copy_len); + if (copy_err) { + kfree(bo->gather_data); + kfree(bo); + return -EFAULT; + } + + bo->gather_data_words = args->gather_data_words; + + *pbo = bo; + + return 0; +} + +static int submit_write_reloc(struct gather_bo *bo, + struct drm_tegra_submit_buf *buf, + struct tegra_drm_mapping *mapping) +{ + /* TODO check that target_offset is within bounds */ + dma_addr_t iova = mapping->iova + buf->reloc.target_offset; + u32 written_ptr = (u32)(iova >> buf->reloc.shift); + + if (buf->flags & DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR) + written_ptr |= BIT(39); + + if (buf->reloc.gather_offset_words >= bo->gather_data_words) + return -EINVAL; + + buf->reloc.gather_offset_words = array_index_nospec( + buf->reloc.gather_offset_words, bo->gather_data_words); + + bo->gather_data[buf->reloc.gather_offset_words] = written_ptr; + + return 0; +} + +static void submit_unlock_resv(struct tegra_drm_submit_data *job_data, + struct ww_acquire_ctx *acquire_ctx) +{ + u32 i; + + for (i = 0; i < job_data->num_used_mappings; i++) { + struct tegra_bo *bo = host1x_to_tegra_bo( + job_data->used_mappings[i].mapping->bo); + + dma_resv_unlock(bo->gem.resv); + } + + ww_acquire_fini(acquire_ctx); +} + +static int submit_handle_resv(struct tegra_drm_submit_data *job_data, + struct ww_acquire_ctx *acquire_ctx, + struct xarray *implicit_fences) +{ + struct tegra_drm_used_mapping *mappings = job_data->used_mappings; + struct tegra_bo *bo; + int contended = -1; + int err; + u32 i; + + /* Based on drm_gem_lock_reservations */ + + ww_acquire_init(acquire_ctx, &reservation_ww_class); + +retry: + if (contended != -1) { + bo = host1x_to_tegra_bo(mappings[contended].mapping->bo); + + err = dma_resv_lock_slow_interruptible(bo->gem.resv, acquire_ctx); + if (err) { + ww_acquire_done(acquire_ctx); + return err; + } + } + + for (i = 0; i < job_data->num_used_mappings; i++) { + bo = host1x_to_tegra_bo(mappings[i].mapping->bo); + + if (i == contended) + continue; + + err = dma_resv_lock_interruptible(bo->gem.resv, acquire_ctx); + if (err) { + int j; + + for (j = 0; j < i; j++) { + bo = host1x_to_tegra_bo(mappings[j].mapping->bo); + dma_resv_unlock(bo->gem.resv); + } + + if 
+static int submit_handle_resv(struct tegra_drm_submit_data *job_data,
+			      struct ww_acquire_ctx *acquire_ctx,
+			      struct xarray *implicit_fences)
+{
+	struct tegra_drm_used_mapping *mappings = job_data->used_mappings;
+	struct tegra_bo *bo;
+	int contended = -1;
+	int err;
+	u32 i;
+
+	/* Based on drm_gem_lock_reservations */
+
+	ww_acquire_init(acquire_ctx, &reservation_ww_class);
+
+retry:
+	if (contended != -1) {
+		bo = host1x_to_tegra_bo(mappings[contended].mapping->bo);
+
+		err = dma_resv_lock_slow_interruptible(bo->gem.resv, acquire_ctx);
+		if (err) {
+			ww_acquire_done(acquire_ctx);
+			return err;
+		}
+	}
+
+	for (i = 0; i < job_data->num_used_mappings; i++) {
+		bo = host1x_to_tegra_bo(mappings[i].mapping->bo);
+
+		if (i == contended)
+			continue;
+
+		err = dma_resv_lock_interruptible(bo->gem.resv, acquire_ctx);
+		if (err) {
+			int j;
+
+			for (j = 0; j < i; j++) {
+				bo = host1x_to_tegra_bo(mappings[j].mapping->bo);
+				dma_resv_unlock(bo->gem.resv);
+			}
+
+			if (contended != -1 && contended >= i) {
+				bo = host1x_to_tegra_bo(mappings[contended].mapping->bo);
+				dma_resv_unlock(bo->gem.resv);
+			}
+
+			if (err == -EDEADLK) {
+				contended = i;
+				goto retry;
+			}
+
+			ww_acquire_done(acquire_ctx);
+			return err;
+		}
+	}
+
+	ww_acquire_done(acquire_ctx);
+
+	for (i = 0; i < job_data->num_used_mappings; i++) {
+		bo = host1x_to_tegra_bo(mappings[i].mapping->bo);
+
+		if (mappings[i].flags & DRM_TEGRA_SUBMIT_BUF_RESV_WRITE) {
+			err = drm_gem_fence_array_add_implicit(implicit_fences,
+							       &bo->gem, true);
+			if (err < 0)
+				goto unlock_resv;
+		} else if (mappings[i].flags & DRM_TEGRA_SUBMIT_BUF_RESV_READ) {
+			err = dma_resv_reserve_shared(bo->gem.resv, 1);
+			if (err < 0)
+				goto unlock_resv;
+
+			err = drm_gem_fence_array_add_implicit(implicit_fences,
+							       &bo->gem, false);
+			if (err < 0)
+				goto unlock_resv;
+		}
+	}
+
+	return 0;
+
+unlock_resv:
+	submit_unlock_resv(job_data, acquire_ctx);
+
+	return err;
+}
+
+static int submit_process_bufs(struct drm_device *drm, struct gather_bo *bo,
+			       struct tegra_drm_submit_data *job_data,
+			       struct tegra_drm_channel_ctx *ctx,
+			       struct drm_tegra_channel_submit *args,
+			       struct ww_acquire_ctx *acquire_ctx,
+			       bool *need_implicit_fences)
+{
+	struct tegra_drm_used_mapping *mappings;
+	struct drm_tegra_submit_buf *bufs;
+	int err;
+	u32 i;
+
+	bufs = alloc_copy_user_array(u64_to_user_ptr(args->bufs_ptr),
+				     args->num_bufs, sizeof(*bufs));
+	if (IS_ERR(bufs))
+		return PTR_ERR(bufs);
+
+	mappings = kcalloc(args->num_bufs, sizeof(*mappings), GFP_KERNEL);
+	if (!mappings) {
+		err = -ENOMEM;
+		goto done;
+	}
+
+	*need_implicit_fences = false;
+
+	for (i = 0; i < args->num_bufs; i++) {
+		struct drm_tegra_submit_buf *buf = &bufs[i];
+		struct tegra_drm_mapping *mapping;
+
+		if (buf->flags & ~(DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR |
+				   DRM_TEGRA_SUBMIT_BUF_RESV_READ |
+				   DRM_TEGRA_SUBMIT_BUF_RESV_WRITE)) {
+			err = -EINVAL;
+			goto drop_refs;
+		}
+
+		if (buf->reserved[0] || buf->reserved[1]) {
+			err = -EINVAL;
+			goto drop_refs;
+		}
+
+		mapping = tegra_drm_mapping_get(ctx, buf->mapping_id);
+		if (!mapping) {
+			drm_info(drm, "invalid mapping_id for buf: %u",
+				 buf->mapping_id);
+			err = -EINVAL;
+			goto drop_refs;
+		}
+
+		err = submit_write_reloc(bo, buf, mapping);
+		if (err) {
+			tegra_drm_mapping_put(mapping);
+			goto drop_refs;
+		}
+
+		mappings[i].mapping = mapping;
+		mappings[i].flags = buf->flags;
+
+		if (buf->flags & (DRM_TEGRA_SUBMIT_BUF_RESV_READ |
+				  DRM_TEGRA_SUBMIT_BUF_RESV_WRITE))
+			*need_implicit_fences = true;
+	}
+
+	job_data->used_mappings = mappings;
+	job_data->num_used_mappings = i;
+
+	err = 0;
+
+	goto done;
+
+drop_refs:
+	while (i--)
+		tegra_drm_mapping_put(mappings[i].mapping);
+
+	kfree(mappings);
+	job_data->used_mappings = NULL;
+
+done:
+	kvfree(bufs);
+
+	return err;
+}
+
+static int submit_get_syncpt(struct drm_device *drm, struct host1x_job *job,
+			     struct drm_tegra_channel_submit *args)
+{
+	struct drm_tegra_submit_syncpt_incr *incr;
+	struct host1x_syncpt *sp;
+
+	if (args->syncpt_incrs[1].num_incrs != 0) {
+		drm_info(drm, "Only 1 syncpoint supported for now");
+		return -EINVAL;
+	}
+
+	incr = &args->syncpt_incrs[0];
+
+	if ((incr->flags & ~DRM_TEGRA_SUBMIT_SYNCPT_INCR_CREATE_SYNC_FILE) ||
+	    incr->reserved[0] || incr->reserved[1] || incr->reserved[2])
+		return -EINVAL;
+
+	/* Syncpt ref will be dropped on job release */
+	sp = host1x_syncpt_fd_get(incr->syncpt_fd);
+	if (IS_ERR(sp))
+		return PTR_ERR(sp);
+
+	job->syncpt = sp;
+	job->syncpt_incrs = incr->num_incrs;
+
+	return 0;
+}
+
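+/*
+ * Validate a GATHER_UPTR command and append it to the job. The offset
+ * and length are checked against the size of the copied gather data so
+ * userspace cannot make the engine fetch past the staging buffer.
+ */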
+static int submit_job_add_gather(struct host1x_job *job,
+				 struct tegra_drm_channel_ctx *ctx,
+				 struct drm_tegra_submit_cmd_gather_uptr *cmd,
+				 struct gather_bo *bo, u32 *offset,
+				 struct tegra_drm_submit_data *job_data)
+{
+	u32 next_offset;
+
+	if (cmd->reserved[0] || cmd->reserved[1] || cmd->reserved[2])
+		return -EINVAL;
+
+	/* Check for maximum gather size */
+	if (cmd->words > 16383)
+		return -EINVAL;
+
+	if (check_add_overflow(*offset, cmd->words, &next_offset))
+		return -EINVAL;
+
+	if (next_offset > bo->gather_data_words)
+		return -EINVAL;
+
+	host1x_job_add_gather(job, &bo->base, cmd->words, *offset * 4);
+
+	*offset = next_offset;
+
+	return 0;
+}
+
+static int submit_create_job(struct drm_device *drm, struct host1x_job **pjob,
+			     struct gather_bo *bo,
+			     struct tegra_drm_channel_ctx *ctx,
+			     struct drm_tegra_channel_submit *args,
+			     struct tegra_drm_submit_data *job_data,
+			     bool need_implicit_fences)
+{
+	struct drm_tegra_submit_cmd *cmds;
+	u32 i, gather_offset = 0;
+	struct host1x_job *job;
+	u32 num_cmds;
+	int err;
+
+	cmds = alloc_copy_user_array(u64_to_user_ptr(args->cmds_ptr),
+				     args->num_cmds, sizeof(*cmds));
+	if (IS_ERR(cmds))
+		return PTR_ERR(cmds);
+
+	num_cmds = args->num_cmds;
+	if (need_implicit_fences)
+		num_cmds++;
+
+	job = host1x_job_alloc(ctx->channel, num_cmds, 0);
+	if (!job) {
+		err = -ENOMEM;
+		goto done;
+	}
+
+	err = submit_get_syncpt(drm, job, args);
+	if (err < 0)
+		goto free_job;
+
+	job->client = &ctx->client->base;
+	job->class = ctx->client->base.class;
+	job->serialize = true;
+
+	if (need_implicit_fences) {
+		u32 threshold = host1x_syncpt_incr_max(job->syncpt, 1) - 1;
+
+		host1x_job_add_wait(job, host1x_syncpt_id(job->syncpt), threshold);
+	}
+
+	for (i = 0; i < args->num_cmds; i++) {
+		struct drm_tegra_submit_cmd *cmd = &cmds[i];
+
+		if (cmd->type == DRM_TEGRA_SUBMIT_CMD_GATHER_UPTR) {
+			err = submit_job_add_gather(job, ctx, &cmd->gather_uptr,
+						    bo, &gather_offset,
+						    job_data);
+			if (err)
+				goto free_job;
+		} else if (cmd->type == DRM_TEGRA_SUBMIT_CMD_WAIT_SYNCPT) {
+			if (cmd->wait_syncpt.reserved[0] ||
+			    cmd->wait_syncpt.reserved[1]) {
+				err = -EINVAL;
+				goto free_job;
+			}
+
+			host1x_job_add_wait(job, cmd->wait_syncpt.id,
+					    cmd->wait_syncpt.threshold);
+		} else if (cmd->type == DRM_TEGRA_SUBMIT_CMD_WAIT_SYNC_FILE) {
+			struct dma_fence *f;
+
+			if (cmd->wait_sync_file.reserved[0] ||
+			    cmd->wait_sync_file.reserved[1] ||
+			    cmd->wait_sync_file.reserved[2]) {
+				err = -EINVAL;
+				goto free_job;
+			}
+
+			f = sync_file_get_fence(cmd->wait_sync_file.fd);
+			if (!f) {
+				err = -EINVAL;
+				goto free_job;
+			}
+
+			err = dma_fence_wait(f, true);
+			dma_fence_put(f);
+
+			if (err)
+				goto free_job;
+		} else {
+			err = -EINVAL;
+			goto free_job;
+		}
+	}
+
+	if (gather_offset == 0) {
+		drm_info(drm, "Job must have at least one gather");
+		err = -EINVAL;
+		goto free_job;
+	}
+
+	*pjob = job;
+
+	err = 0;
+	goto done;
+
+free_job:
+	host1x_job_put(job);
+
done:
+	kvfree(cmds);
+
+	return err;
+}
+
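+/*
+ * Create the job's completion fence, publish it in the DMA reservations
+ * of the buffers used by the job, and optionally export it to userspace
+ * as a sync_file FD.
+ */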
+static int submit_create_postfences(struct host1x_job *job,
+				    struct drm_tegra_channel_submit *args)
+{
+	struct drm_tegra_submit_syncpt_incr *incr = &args->syncpt_incrs[0];
+	struct tegra_drm_submit_data *job_data = job->user_data;
+	struct dma_fence *fence;
+	int err = 0;
+	u32 i;
+
+	fence = host1x_fence_create(job->syncpt, job->syncpt_end);
+	if (IS_ERR(fence))
+		return PTR_ERR(fence);
+
+	incr->fence_value = job->syncpt_end;
+
+	for (i = 0; i < job_data->num_used_mappings; i++) {
+		struct tegra_drm_used_mapping *um = &job_data->used_mappings[i];
+		struct tegra_bo *bo = host1x_to_tegra_bo(um->mapping->bo);
+
+		if (um->flags & DRM_TEGRA_SUBMIT_BUF_RESV_READ)
+			dma_resv_add_shared_fence(bo->gem.resv, fence);
+
+		if (um->flags & DRM_TEGRA_SUBMIT_BUF_RESV_WRITE)
+			dma_resv_add_excl_fence(bo->gem.resv, fence);
+	}
+
+	if (incr->flags & DRM_TEGRA_SUBMIT_SYNCPT_INCR_CREATE_SYNC_FILE) {
+		struct sync_file *sf;
+
+		err = get_unused_fd_flags(O_CLOEXEC);
+		if (err < 0)
+			goto put_fence;
+
+		sf = sync_file_create(fence);
+		if (!sf) {
+			/* Don't leak the FD reserved above. */
+			put_unused_fd(err);
+			err = -ENOMEM;
+			goto put_fence;
+		}
+
+		fd_install(err, sf->file);
+		incr->sync_file_fd = err;
+		err = 0;
+	}
+
+put_fence:
+	dma_fence_put(fence);
+
+	return err;
+}
+
+static void release_job(struct host1x_job *job)
+{
+	struct tegra_drm_client *client =
+		container_of(job->client, struct tegra_drm_client, base);
+	struct tegra_drm_submit_data *job_data = job->user_data;
+	u32 i;
+
+	for (i = 0; i < job_data->num_used_mappings; i++)
+		tegra_drm_mapping_put(job_data->used_mappings[i].mapping);
+
+	kfree(job_data->used_mappings);
+	kfree(job_data);
+
+	pm_runtime_put_autosuspend(client->base.dev);
+}
+
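+/*
+ * IOCTL entry point for job submission: copy in the gather and buffer
+ * tables, patch relocations, build and pin a host1x job, power up the
+ * engine and queue the job, then return postfences to userspace.
+ */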
+int tegra_drm_ioctl_channel_submit(struct drm_device *drm, void *data,
+				   struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct drm_tegra_channel_submit *args = data;
+	struct tegra_drm_submit_data *job_data;
+	struct ww_acquire_ctx acquire_ctx;
+	struct tegra_drm_channel_ctx *ctx;
+	DEFINE_XARRAY(implicit_fences);
+	bool need_implicit_fences;
+	struct host1x_job *job;
+	struct gather_bo *bo;
+	u32 i;
+	int err;
+
+	if (args->reserved0 || args->reserved1)
+		return -EINVAL;
+
+	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
+	if (!ctx)
+		return -EINVAL;
+
+	/* Allocate gather BO and copy gather words in. */
+	err = submit_copy_gather_data(drm, &bo, args);
+	if (err)
+		goto unlock;
+
+	job_data = kzalloc(sizeof(*job_data), GFP_KERNEL);
+	if (!job_data) {
+		err = -ENOMEM;
+		goto put_bo;
+	}
+
+	/* Get data buffer mappings and do relocation patching. */
+	err = submit_process_bufs(drm, bo, job_data, ctx, args, &acquire_ctx,
+				  &need_implicit_fences);
+	if (err)
+		goto free_job_data;
+
+	/* Allocate host1x_job and add gathers and waits to it. */
+	err = submit_create_job(drm, &job, bo, ctx, args,
+				job_data, need_implicit_fences);
+	if (err)
+		goto free_job_data;
+
+	/* Map gather data for Host1x. */
+	err = host1x_job_pin(job, ctx->client->base.dev);
+	if (err)
+		goto put_job;
+
+	/* Boot engine. */
+	err = pm_runtime_get_sync(ctx->client->base.dev);
+	if (err < 0)
+		goto put_pm_runtime;
+
+	job->user_data = job_data;
+	job->release = release_job;
+	job->timeout = 10000;
+
+	/*
+	 * job_data is now part of job reference counting, so don't release
+	 * it from here.
+	 */
+	job_data = NULL;
+
+	if (need_implicit_fences) {
+		xa_init_flags(&implicit_fences, XA_FLAGS_ALLOC);
+
+		/*
+		 * Lock DMA reservations, reserve fence slots and
+		 * retrieve prefences.
+		 */
+		err = submit_handle_resv(job->user_data, &acquire_ctx,
+					 &implicit_fences);
+		if (err)
+			goto put_pm_runtime;
+	}
+
+	/* Submit job to hardware. */
+	err = host1x_job_submit(job);
+	if (err)
+		goto unlock_resv;
+
+	/* Return postfences to userspace and add fences to DMA reservations. */
+	err = submit_create_postfences(job, args);
+
+	if (need_implicit_fences) {
+		struct dma_fence *fence;
+		unsigned long index;
+		int wait_err = 0;
+
+		/* Unlock DMA reservations. */
+		submit_unlock_resv(job->user_data, &acquire_ctx);
+
+		/* Wait for prefences and unblock the job. */
+		xa_for_each(&implicit_fences, index, fence) {
+			wait_err = dma_fence_wait(fence, false);
+			if (wait_err < 0)
+				break;
+		}
+
+		xa_destroy(&implicit_fences);
+
+		if (wait_err == 0)
+			host1x_syncpt_incr(job->syncpt);
+	}
+
+	goto put_job;
+
+unlock_resv:
+	if (need_implicit_fences) {
+		submit_unlock_resv(job->user_data, &acquire_ctx);
+		xa_destroy(&implicit_fences);
+	}
+put_pm_runtime:
+	if (!job->release)
+		pm_runtime_put(ctx->client->base.dev);
+	host1x_job_unpin(job);
+put_job:
+	host1x_job_put(job);
+free_job_data:
+	if (job_data && job_data->used_mappings) {
+		for (i = 0; i < job_data->num_used_mappings; i++)
+			tegra_drm_mapping_put(job_data->used_mappings[i].mapping);
+
+		kfree(job_data->used_mappings);
+	}
+
+	kfree(job_data);
put_bo:
+	gather_bo_put(&bo->base);
unlock:
+	mutex_unlock(&fpriv->lock);
+	return err;
+}
diff --git a/drivers/gpu/drm/tegra/uapi/submit.h b/drivers/gpu/drm/tegra/uapi/submit.h
new file mode 100644
index 000000000000..0a165e9e4bda
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi/submit.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#ifndef _TEGRA_DRM_UAPI_SUBMIT_H
+#define _TEGRA_DRM_UAPI_SUBMIT_H
+
+struct tegra_drm_used_mapping {
+	struct tegra_drm_mapping *mapping;
+	u32 flags;
+};
+
+struct tegra_drm_submit_data {
+	struct tegra_drm_used_mapping *used_mappings;
+	u32 num_used_mappings;
+};
+
+#endif
diff --git a/drivers/gpu/drm/tegra/uapi/uapi.c b/drivers/gpu/drm/tegra/uapi/uapi.c
new file mode 100644
index 000000000000..9bd7766e759a
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi/uapi.c
@@ -0,0 +1,326 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#include <linux/host1x.h>
+#include <linux/iommu.h>
+#include <linux/list.h>
+
+#include <drm/drm_drv.h>
+#include <drm/drm_file.h>
+
+#include "../uapi.h"
+#include "../drm.h"
+
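+/*
+ * Look up a channel context by ID. On success, returns with fpriv->lock
+ * held; the caller must drop it with mutex_unlock() when done with the
+ * context. On failure the lock has already been released.
+ */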
+struct tegra_drm_channel_ctx *
+tegra_drm_channel_ctx_lock(struct tegra_drm_file *file, u32 id)
+{
+	struct tegra_drm_channel_ctx *ctx;
+
+	mutex_lock(&file->lock);
+	ctx = xa_load(&file->contexts, id);
+	if (!ctx)
+		mutex_unlock(&file->lock);
+
+	return ctx;
+}
+
+static void tegra_drm_mapping_release(struct kref *ref)
+{
+	struct tegra_drm_mapping *mapping =
+		container_of(ref, struct tegra_drm_mapping, ref);
+
+	if (mapping->sgt)
+		dma_unmap_sgtable(mapping->dev, mapping->sgt,
+				  mapping->direction, DMA_ATTR_SKIP_CPU_SYNC);
+
+	host1x_bo_unpin(mapping->dev, mapping->bo, mapping->sgt);
+	host1x_bo_put(mapping->bo);
+
+	kfree(mapping);
+}
+
+void tegra_drm_mapping_put(struct tegra_drm_mapping *mapping)
+{
+	kref_put(&mapping->ref, tegra_drm_mapping_release);
+}
+
+static void tegra_drm_channel_ctx_close(struct tegra_drm_channel_ctx *ctx)
+{
+	struct tegra_drm_mapping *mapping;
+	unsigned long mapping_id;
+
+	xa_for_each(&ctx->mappings, mapping_id, mapping)
+		tegra_drm_mapping_put(mapping);
+
+	xa_destroy(&ctx->mappings);
+
+	host1x_channel_put(ctx->channel);
+
+	kfree(ctx);
+}
+
+void tegra_drm_uapi_close_file(struct tegra_drm_file *file)
+{
+	struct tegra_drm_channel_ctx *ctx;
+	unsigned long ctx_id;
+
+	xa_for_each(&file->contexts, ctx_id, ctx)
+		tegra_drm_channel_ctx_close(ctx);
+
+	xa_destroy(&file->contexts);
+}
+
+int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data,
+				 struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct tegra_drm *tegra = drm->dev_private;
+	struct drm_tegra_channel_open *args = data;
+	struct tegra_drm_client *client = NULL;
+	struct tegra_drm_channel_ctx *ctx;
+	int err;
+
+	if (args->flags || args->reserved[0] || args->reserved[1])
+		return -EINVAL;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	err = -ENODEV;
+	list_for_each_entry(client, &tegra->clients, list) {
+		if (client->base.class == args->host1x_class) {
+			err = 0;
+			break;
+		}
+	}
+	if (err)
+		goto free_ctx;
+
+	if (client->shared_channel) {
+		ctx->channel = host1x_channel_get(client->shared_channel);
+	} else {
+		ctx->channel = host1x_channel_request(&client->base);
+		if (!ctx->channel) {
+			err = -EBUSY;
+			goto free_ctx;
+		}
+	}
+
+	err = xa_alloc(&fpriv->contexts, &args->channel_ctx, ctx,
+		       XA_LIMIT(1, U32_MAX), GFP_KERNEL);
+	if (err < 0)
+		goto put_channel;
+
+	ctx->client = client;
+	xa_init_flags(&ctx->mappings, XA_FLAGS_ALLOC1);
+
+	args->hardware_version = client->version;
+
+	return 0;
+
put_channel:
+	host1x_channel_put(ctx->channel);
free_ctx:
+	kfree(ctx);
+
+	return err;
+}
+
+int tegra_drm_ioctl_channel_close(struct drm_device *drm, void *data,
+				  struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct drm_tegra_channel_close *args = data;
+	struct tegra_drm_channel_ctx *ctx;
+
+	if (args->reserved[0])
+		return -EINVAL;
+
+	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
+	if (!ctx)
+		return -EINVAL;
+
+	xa_erase(&fpriv->contexts, args->channel_ctx);
+
+	mutex_unlock(&fpriv->lock);
+
+	tegra_drm_channel_ctx_close(ctx);
+
+	return 0;
+}
+
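+/*
+ * Map a GEM object for use by an engine. If the engine has no IOMMU
+ * domain, or uses the shared host1x IOMMU group, the buffer is pinned
+ * through host1x and its physical address is used directly; otherwise
+ * the buffer is mapped into the engine's domain through the DMA API.
+ */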
+int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
+				struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct drm_tegra_channel_map *args = data;
+	struct tegra_drm_mapping *mapping;
+	struct tegra_drm_channel_ctx *ctx;
+	struct drm_gem_object *gem;
+	u32 mapping_id;
+	int err = 0;
+
+	if (args->flags & ~DRM_TEGRA_CHANNEL_MAP_READWRITE)
+		return -EINVAL;
+	if (args->reserved[0] || args->reserved[1])
+		return -EINVAL;
+
+	if (!IS_ALIGNED(args->offset, 0x1000) ||
+	    !IS_ALIGNED(args->length, 0x1000))
+		return -EINVAL;
+
+	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
+	if (!ctx)
+		return -EINVAL;
+
+	mapping = kzalloc(sizeof(*mapping), GFP_KERNEL);
+	if (!mapping) {
+		err = -ENOMEM;
+		goto unlock;
+	}
+
+	kref_init(&mapping->ref);
+
+	gem = drm_gem_object_lookup(file, args->handle);
+	if (!gem) {
+		err = -EINVAL;
+		goto free_mapping;
+	}
+
+	if (args->offset >= gem->size || args->length > gem->size ||
+	    args->offset > gem->size - args->length) {
+		err = -EINVAL;
+		goto put_gem;
+	}
+
+	mapping->dev = ctx->client->base.dev;
+	mapping->bo = &container_of(gem, struct tegra_bo, gem)->base;
+
+	if (!iommu_get_domain_for_dev(mapping->dev) ||
+	    ctx->client->base.group) {
+		host1x_bo_pin(mapping->dev, mapping->bo, &mapping->iova);
+	} else {
+		mapping->direction = DMA_TO_DEVICE;
+		if (args->flags & DRM_TEGRA_CHANNEL_MAP_READWRITE)
+			mapping->direction = DMA_BIDIRECTIONAL;
+
+		mapping->sgt = host1x_bo_pin(mapping->dev, mapping->bo, NULL);
+		if (IS_ERR(mapping->sgt)) {
+			err = PTR_ERR(mapping->sgt);
+			goto put_gem;
+		}
+
+		err = dma_map_sgtable(mapping->dev, mapping->sgt,
+				      mapping->direction,
+				      DMA_ATTR_SKIP_CPU_SYNC);
+		if (err)
+			goto unpin;
+
+		/* TODO only map the requested part */
+		mapping->iova = sg_dma_address(mapping->sgt->sgl) + args->offset;
+		mapping->iova_end = mapping->iova + gem->size;
+	}
+
+	err = xa_alloc(&ctx->mappings, &mapping_id, mapping,
+		       XA_LIMIT(1, U32_MAX), GFP_KERNEL);
+	if (err < 0)
+		goto unmap;
+
+	mutex_unlock(&fpriv->lock);
+
+	/* TODO: if appropriate, return actual IOVA */
+	args->iova = U64_MAX;
+	args->mapping_id = mapping_id;
+
+	return 0;
+
unmap:
+	if (mapping->sgt) {
+		dma_unmap_sgtable(mapping->dev, mapping->sgt,
+				  mapping->direction, DMA_ATTR_SKIP_CPU_SYNC);
+	}
unpin:
+	host1x_bo_unpin(mapping->dev, mapping->bo, mapping->sgt);
put_gem:
+	drm_gem_object_put(gem);
free_mapping:
+	kfree(mapping);
unlock:
+	mutex_unlock(&fpriv->lock);
+	return err;
+}
+
+int tegra_drm_ioctl_channel_unmap(struct drm_device *drm, void *data,
+				  struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct drm_tegra_channel_unmap *args = data;
+	struct tegra_drm_channel_ctx *ctx;
+	struct tegra_drm_mapping *mapping;
+
+	if (args->reserved[0] || args->reserved[1])
+		return -EINVAL;
+
+	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
+	if (!ctx)
+		return -EINVAL;
+
+	mapping = xa_erase(&ctx->mappings, args->mapping_id);
+
+	mutex_unlock(&fpriv->lock);
+
+	if (!mapping)
+		return -EINVAL;
+
+	tegra_drm_mapping_put(mapping);
+
+	return 0;
+}
+
+int tegra_drm_ioctl_gem_create(struct drm_device *drm, void *data,
+			       struct drm_file *file)
+{
+	struct drm_tegra_gem_create *args = data;
+	struct tegra_bo *bo;
+
+	if (args->flags)
+		return -EINVAL;
+
+	bo = tegra_bo_create_with_handle(file, drm, args->size, args->flags,
+					 &args->handle);
+	if (IS_ERR(bo))
+		return PTR_ERR(bo);
+
+	return 0;
+}
+
+int tegra_drm_ioctl_gem_mmap(struct drm_device *drm, void *data,
+			     struct drm_file *file)
+{
+	struct drm_tegra_gem_mmap *args = data;
+	struct drm_gem_object *gem;
+	struct tegra_bo *bo;
+
+	gem = drm_gem_object_lookup(file, args->handle);
+	if (!gem)
+		return -EINVAL;
+
+	bo = to_tegra_bo(gem);
+
+	args->offset = drm_vma_node_offset_addr(&bo->gem.vma_node);
+
+	drm_gem_object_put(gem);
+
+	return 0;
+}
Implement the new UAPI, and bump the TegraDRM major version.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v3:
* Remove WRITE_RELOC. Relocations are now patched implicitly
  when patching is needed.
* Directly call PM runtime APIs on devices instead of using
  power_on/power_off callbacks.
* Remove incorrect mutex unlock in tegra_drm_ioctl_channel_open
* Use XA_FLAGS_ALLOC1 instead of XA_FLAGS_ALLOC
* Accommodate for removal of timeout field and inlining of
  syncpt_incrs array.
* Copy entire user arrays at a time instead of going through
  elements one-by-one.
* Implement waiting of DMA reservations.
* Split out gather_bo implementation into a separate file.
* Fix length parameter passed to sg_init_one in gather_bo
* Cosmetic cleanup.
---
 drivers/gpu/drm/tegra/Makefile         |   3 +
 drivers/gpu/drm/tegra/drm.c            |  46 +-
 drivers/gpu/drm/tegra/drm.h            |   5 +
 drivers/gpu/drm/tegra/uapi.h           |  63 +++
 drivers/gpu/drm/tegra/uapi/gather_bo.c |  86 ++++
 drivers/gpu/drm/tegra/uapi/gather_bo.h |  22 +
 drivers/gpu/drm/tegra/uapi/submit.c    | 675 +++++++++++++++++++++++++
 drivers/gpu/drm/tegra/uapi/submit.h    |  17 +
 drivers/gpu/drm/tegra/uapi/uapi.c      | 326 ++++++++++++
 9 files changed, 1225 insertions(+), 18 deletions(-)
 create mode 100644 drivers/gpu/drm/tegra/uapi.h
 create mode 100644 drivers/gpu/drm/tegra/uapi/gather_bo.c
 create mode 100644 drivers/gpu/drm/tegra/uapi/gather_bo.h
 create mode 100644 drivers/gpu/drm/tegra/uapi/submit.c
 create mode 100644 drivers/gpu/drm/tegra/uapi/submit.h
 create mode 100644 drivers/gpu/drm/tegra/uapi/uapi.c
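
For reviewers, here is a minimal userspace sketch of the intended calling
sequence (channel open, then a single-gather submit). The struct and field
names are taken from this patch; the ioctl request macros and the uapi
header path are assumptions based on the rest of the series, not verbatim
API references:

    /* Hypothetical userspace sketch; DRM_IOCTL_TEGRA_CHANNEL_OPEN and
     * DRM_IOCTL_TEGRA_CHANNEL_SUBMIT are assumed names from the uapi
     * header added elsewhere in this series. */
    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include "drm/tegra_drm.h"	/* assumed uapi header from this series */

    int submit_one_gather(int drm_fd, int syncpt_fd, uint32_t host1x_class,
    			  uint32_t *gather, uint32_t gather_words)
    {
    	struct drm_tegra_channel_open open_args;
    	struct drm_tegra_channel_submit submit_args;
    	struct drm_tegra_submit_cmd cmd;
    	int err;

    	/* Open a channel context for the target engine class. */
    	memset(&open_args, 0, sizeof(open_args));
    	open_args.host1x_class = host1x_class;
    	err = ioctl(drm_fd, DRM_IOCTL_TEGRA_CHANNEL_OPEN, &open_args);
    	if (err < 0)
    		return err;

    	/* One GATHER_UPTR command consuming the whole gather buffer. */
    	memset(&cmd, 0, sizeof(cmd));
    	cmd.type = DRM_TEGRA_SUBMIT_CMD_GATHER_UPTR;
    	cmd.gather_uptr.words = gather_words;

    	memset(&submit_args, 0, sizeof(submit_args));
    	submit_args.channel_ctx = open_args.channel_ctx;
    	submit_args.num_cmds = 1;
    	submit_args.cmds_ptr = (uintptr_t)&cmd;
    	submit_args.gather_data_words = gather_words;
    	submit_args.gather_data_ptr = (uintptr_t)gather;

    	/* Request one increment on a syncpoint held as an FD. */
    	submit_args.syncpt_incrs[0].syncpt_fd = syncpt_fd;
    	submit_args.syncpt_incrs[0].num_incrs = 1;

    	err = ioctl(drm_fd, DRM_IOCTL_TEGRA_CHANNEL_SUBMIT, &submit_args);
    	if (err < 0)
    		return err;

    	/* syncpt_incrs[0].fence_value now holds the syncpoint threshold
    	 * to wait on for job completion. */
    	return 0;
    }

Note how no bufs are passed here: per submit_process_bufs()/submit_create_job()
above, buffer mappings are only needed for relocations or implicit fencing,
while at least one gather is mandatory.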