Message ID | 20230323230755.1094832-3-dmitry.osipenko@collabora.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Add sync object UAPI support to VirtIO-GPU driver | expand |
Hi Dmitry, Have you considered creating a few DRM helpers for this functionality? AFAICT this is the third driver which supports syncobj timelines and looking at one of the implementations ... it is not great. Note that this suggestion is _not_ a blocker. On 2023/03/24, Dmitry Osipenko wrote: > Add sync object DRM UAPI support to VirtIO-GPU driver. It's required > for enabling a full-featured Vulkan fencing by Venus and native context > VirtIO-GPU Mesa drivers. > > Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> > --- > +static int > +virtio_gpu_parse_deps(struct virtio_gpu_submit *submit) > +{ > + struct drm_virtgpu_execbuffer *exbuf = submit->exbuf; > + struct drm_virtgpu_execbuffer_syncobj syncobj_desc; > + size_t syncobj_stride = exbuf->syncobj_stride; > + struct drm_syncobj **syncobjs; > + int ret = 0, i; > + > + if (!submit->num_in_syncobjs) > + return 0; > + > + syncobjs = kvcalloc(submit->num_in_syncobjs, sizeof(*syncobjs), > + GFP_KERNEL); Please add an inline note about the decision behind the allocators used, both here and in the parse_post_deps below. IIRC there was some nice discussion between you and Rob in earlier revisions. > + if (!syncobjs) > + return -ENOMEM; > + > + for (i = 0; i < submit->num_in_syncobjs; i++) { > + uint64_t address = exbuf->in_syncobjs + i * syncobj_stride; > + struct dma_fence *fence; > + > + if (copy_from_user(&syncobj_desc, > + u64_to_user_ptr(address), > + min(syncobj_stride, sizeof(syncobj_desc)))) { > + ret = -EFAULT; > + break; > + } > + > + if (syncobj_desc.flags & ~VIRTGPU_EXECBUF_SYNCOBJ_FLAGS) { > + ret = -EINVAL; > + break; > + } > + > + ret = drm_syncobj_find_fence(submit->file, syncobj_desc.handle, > + syncobj_desc.point, 0, &fence); > + if (ret) > + break; > + > + ret = virtio_gpu_dma_fence_wait(submit, fence); > + > + dma_fence_put(fence); > + if (ret) > + break; If going the DRM helpers route: The above two are effectively the only variance across vendors - a simple function point as arg should suffice. Might want to use internal flags, but that's also trivial. > + submit->in_syncobjs = syncobjs; > + > + return ret; > +} > + > +static void virtio_gpu_reset_syncobjs(struct drm_syncobj **syncobjs, > + uint32_t nr_syncobjs) > +{ > + uint32_t i; > + > + for (i = 0; i < nr_syncobjs; i++) { > + if (syncobjs[i]) > + drm_syncobj_replace_fence(syncobjs[i], NULL); Side note: the drm_syncobj_put() called immediately after also calls replace/reset fence internally. Although reading from the docs, I'm not sure if relying on that is a wise move. Just thought I'd point it out. > > + ret = virtio_gpu_parse_deps(&submit); > + if (ret) > + goto cleanup; > + > + ret = virtio_gpu_parse_post_deps(&submit); > + if (ret) > + goto cleanup; > + I think we should zero num_(in|out)_syncobjs when the respective parse fails. Otherwise we get one "cleanup" within the parse function itself and a second during the cleanup_submit. Haven't looked at it too closely but I suspect that will trigger an UAF or two. > ret = virtio_gpu_install_out_fence_fd(&submit); > if (ret) > goto cleanup; > @@ -294,6 +512,7 @@ int virtio_gpu_execbuffer_ioctl(struct drm_device *dev, void *data, > goto cleanup; > > virtio_gpu_submit(&submit); > + virtio_gpu_process_post_deps(&submit); Any particular reason why the virtio_gpu_reset_syncobjs is deferred to virtio_gpu_cleanup_submit(). Having it just above the process_post_deps (similar to msm) allows the reader to get closure about the in syncobjs. This is just personal preference, so don't read too much into it. HTH Emil
On 3/30/23 20:24, Emil Velikov wrote: > Hi Dmitry, > > Have you considered creating a few DRM helpers for this functionality? > > AFAICT this is the third driver which supports syncobj timelines and > looking at one of the implementations ... it is not great. Note that > this suggestion is _not_ a blocker. Would like to see a third driver starting to use the exactly same drm_execbuffer_syncobj struct because UABI part isn't generic, though it's a replica of the MSM driver for now. The virtio-gpu is only at the beginning of starting to use sync objects, compared to MSM driver. Will be better to defer the generalization until virtio-gpu will become more mature, like maybe after a year since the time virtio userspace will start using sync objects, IMO. ... >> +static void virtio_gpu_reset_syncobjs(struct drm_syncobj **syncobjs, >> + uint32_t nr_syncobjs) >> +{ >> + uint32_t i; >> + >> + for (i = 0; i < nr_syncobjs; i++) { >> + if (syncobjs[i]) >> + drm_syncobj_replace_fence(syncobjs[i], NULL); > > Side note: the drm_syncobj_put() called immediately after also calls > replace/reset fence internally. Although reading from the docs, I'm not > sure if relying on that is a wise move. > > Just thought I'd point it out. The drm_syncobj_put() doesn't call replace/reset fence until syncobj is freed. We drop the old fence for active/alive in-syncobj here after handling the fence-wait, this makes syncobj reusable, otherwise userpsace would have to re-create syncobjs after each submission. >> >> + ret = virtio_gpu_parse_deps(&submit); >> + if (ret) >> + goto cleanup; >> + >> + ret = virtio_gpu_parse_post_deps(&submit); >> + if (ret) >> + goto cleanup; >> + > > I think we should zero num_(in|out)_syncobjs when the respective parse > fails. Otherwise we get one "cleanup" within the parse function itself > and a second during the cleanup_submit. Haven't looked at it too closely > but I suspect that will trigger an UAF or two. There are checks for NULL pointers in the code that will prevent the UAF. I'll add zeroing of the nums for more consistency. >> ret = virtio_gpu_install_out_fence_fd(&submit); >> if (ret) >> goto cleanup; >> @@ -294,6 +512,7 @@ int virtio_gpu_execbuffer_ioctl(struct drm_device *dev, void *data, >> goto cleanup; >> >> virtio_gpu_submit(&submit); >> + virtio_gpu_process_post_deps(&submit); > > Any particular reason why the virtio_gpu_reset_syncobjs is deferred to > virtio_gpu_cleanup_submit(). Having it just above the process_post_deps > (similar to msm) allows the reader to get closure about the in syncobjs. > > This is just personal preference, so don't read too much into it. The job submission path should be short as possible in general. Technically, virtio_gpu_process_post_deps() should be fast, but since I'm not 100% sure about all the corner cases, it's better to hold until job is sent out. Thank you very much for the review! I'll address the rest of comments in v5.
On Sun, 2 Apr 2023 at 18:45, Dmitry Osipenko <dmitry.osipenko@collabora.com> wrote: > > On 3/30/23 20:24, Emil Velikov wrote: > > Hi Dmitry, > > > > Have you considered creating a few DRM helpers for this functionality? > > > > AFAICT this is the third driver which supports syncobj timelines and > > looking at one of the implementations ... it is not great. Note that > > this suggestion is _not_ a blocker. > > Would like to see a third driver starting to use the exactly same > drm_execbuffer_syncobj struct because UABI part isn't generic, though > it's a replica of the MSM driver for now. > > The virtio-gpu is only at the beginning of starting to use sync objects, > compared to MSM driver. Will be better to defer the generalization until > virtio-gpu will become more mature, like maybe after a year since the > time virtio userspace will start using sync objects, IMO. > I wasn't talking about generic UAPI, but having drm helpers instead. The former (as you pointed out) would need time to crystallize. While the latter can be done even today. > ... > >> +static void virtio_gpu_reset_syncobjs(struct drm_syncobj **syncobjs, > >> + uint32_t nr_syncobjs) > >> +{ > >> + uint32_t i; > >> + > >> + for (i = 0; i < nr_syncobjs; i++) { > >> + if (syncobjs[i]) > >> + drm_syncobj_replace_fence(syncobjs[i], NULL); > > > > Side note: the drm_syncobj_put() called immediately after also calls > > replace/reset fence internally. Although reading from the docs, I'm not > > sure if relying on that is a wise move. > > > > Just thought I'd point it out. > > The drm_syncobj_put() doesn't call replace/reset fence until syncobj is > freed. We drop the old fence for active/alive in-syncobj here after > handling the fence-wait, this makes syncobj reusable, otherwise > userpsace would have to re-create syncobjs after each submission. > I see, thanks. > >> > >> + ret = virtio_gpu_parse_deps(&submit); > >> + if (ret) > >> + goto cleanup; > >> + > >> + ret = virtio_gpu_parse_post_deps(&submit); > >> + if (ret) > >> + goto cleanup; > >> + > > > > I think we should zero num_(in|out)_syncobjs when the respective parse > > fails. Otherwise we get one "cleanup" within the parse function itself > > and a second during the cleanup_submit. Haven't looked at it too closely > > but I suspect that will trigger an UAF or two. > > There are checks for NULL pointers in the code that will prevent the > UAF. I'll add zeroing of the nums for more consistency. > Riiiight the drm_syncobj is attached to the encapsulating struct virtio_gpu_submit _only_ on success. By clearing the num variables, the NULL checks will no longer be needed ... in case you'd want to drop that. Either way - even as-is the code is safe. > >> ret = virtio_gpu_install_out_fence_fd(&submit); > >> if (ret) > >> goto cleanup; > >> @@ -294,6 +512,7 @@ int virtio_gpu_execbuffer_ioctl(struct drm_device *dev, void *data, > >> goto cleanup; > >> > >> virtio_gpu_submit(&submit); > >> + virtio_gpu_process_post_deps(&submit); > > > > Any particular reason why the virtio_gpu_reset_syncobjs is deferred to > > virtio_gpu_cleanup_submit(). Having it just above the process_post_deps > > (similar to msm) allows the reader to get closure about the in syncobjs. > > > > This is just personal preference, so don't read too much into it. > > The job submission path should be short as possible in general. > Technically, virtio_gpu_process_post_deps() should be fast, but since > I'm not 100% sure about all the corner cases, it's better to hold until > job is sent out. > Ack, thanks again -Emil
On Mon, 3 Apr 2023 at 14:00, Emil Velikov <emil.l.velikov@gmail.com> wrote: > > > I think we should zero num_(in|out)_syncobjs when the respective parse > > > fails. Otherwise we get one "cleanup" within the parse function itself > > > and a second during the cleanup_submit. Haven't looked at it too closely > > > but I suspect that will trigger an UAF or two. > > > > There are checks for NULL pointers in the code that will prevent the > > UAF. I'll add zeroing of the nums for more consistency. > > > > Riiiight the drm_syncobj is attached to the encapsulating struct > virtio_gpu_submit _only_ on success. > By clearing the num variables, the NULL checks will no longer be > needed ... in case you'd want to drop that. > > Either way - even as-is the code is safe. > Err or not. The NULL check itself will cause NULL pointer deref. In more detail: in/out syncobjs are memset() to NULL in virtio_gpu_init_submit(). The virtio_gpu_parse_(|post_)deps() will fail and leave them unchanged. Then virtio_gpu_free_syncobjs() and virtio_gpu_reset_syncobjs() will trigger a NULL ptr deref, because they are accessing the elements of a the (NULL) array. Apart from the num_(in|out)_syncobjcs=0, I would drop the NULL checks - they give a false sense of security IMHO. -Emil
On 4/3/23 16:22, Emil Velikov wrote: > On Mon, 3 Apr 2023 at 14:00, Emil Velikov <emil.l.velikov@gmail.com> wrote: > >>>> I think we should zero num_(in|out)_syncobjs when the respective parse >>>> fails. Otherwise we get one "cleanup" within the parse function itself >>>> and a second during the cleanup_submit. Haven't looked at it too closely >>>> but I suspect that will trigger an UAF or two. >>> >>> There are checks for NULL pointers in the code that will prevent the >>> UAF. I'll add zeroing of the nums for more consistency. >>> >> >> Riiiight the drm_syncobj is attached to the encapsulating struct >> virtio_gpu_submit _only_ on success. >> By clearing the num variables, the NULL checks will no longer be >> needed ... in case you'd want to drop that. >> >> Either way - even as-is the code is safe. >> > > Err or not. The NULL check itself will cause NULL pointer deref. > > In more detail: in/out syncobjs are memset() to NULL in > virtio_gpu_init_submit(). The virtio_gpu_parse_(|post_)deps() will > fail and leave them unchanged. Then virtio_gpu_free_syncobjs() and > virtio_gpu_reset_syncobjs() will trigger a NULL ptr deref, because > they are accessing the elements of a the (NULL) array. > > Apart from the num_(in|out)_syncobjcs=0, I would drop the NULL checks > - they give a false sense of security IMHO. The reset/free are both under the NULL check on cleanup. I think it should work okay on error. Will improve it anyways to make more intuitive. Thanks! static void virtio_gpu_cleanup_submit(struct virtio_gpu_submit *submit) { if (submit->in_syncobjs) { virtio_gpu_reset_syncobjs(submit->in_syncobjs, submit->num_in_syncobjs); virtio_gpu_free_syncobjs(submit->in_syncobjs, submit->num_in_syncobjs); }
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c b/drivers/gpu/drm/virtio/virtgpu_drv.c index add075681e18..a22155577152 100644 --- a/drivers/gpu/drm/virtio/virtgpu_drv.c +++ b/drivers/gpu/drm/virtio/virtgpu_drv.c @@ -176,7 +176,8 @@ static const struct drm_driver driver = { * If KMS is disabled DRIVER_MODESET and DRIVER_ATOMIC are masked * out via drm_device::driver_features: */ - .driver_features = DRIVER_MODESET | DRIVER_GEM | DRIVER_RENDER | DRIVER_ATOMIC, + .driver_features = DRIVER_MODESET | DRIVER_GEM | DRIVER_RENDER | DRIVER_ATOMIC | + DRIVER_SYNCOBJ | DRIVER_SYNCOBJ_TIMELINE, .open = virtio_gpu_driver_open, .postclose = virtio_gpu_driver_postclose, diff --git a/drivers/gpu/drm/virtio/virtgpu_submit.c b/drivers/gpu/drm/virtio/virtgpu_submit.c index 42c79869f192..a18b21f9d07a 100644 --- a/drivers/gpu/drm/virtio/virtgpu_submit.c +++ b/drivers/gpu/drm/virtio/virtgpu_submit.c @@ -14,11 +14,24 @@ #include <linux/uaccess.h> #include <drm/drm_file.h> +#include <drm/drm_syncobj.h> #include <drm/virtgpu_drm.h> #include "virtgpu_drv.h" +struct virtio_gpu_submit_post_dep { + struct drm_syncobj *syncobj; + struct dma_fence_chain *chain; + uint64_t point; +}; + struct virtio_gpu_submit { + struct virtio_gpu_submit_post_dep *post_deps; + unsigned int num_out_syncobjs; + + struct drm_syncobj **in_syncobjs; + unsigned int num_in_syncobjs; + struct virtio_gpu_object_array *buflist; struct drm_virtgpu_execbuffer *exbuf; struct virtio_gpu_fence *out_fence; @@ -58,6 +71,189 @@ static int virtio_gpu_dma_fence_wait(struct virtio_gpu_submit *submit, return 0; } +static void virtio_gpu_free_syncobjs(struct drm_syncobj **syncobjs, + uint32_t nr_syncobjs) +{ + uint32_t i = nr_syncobjs; + + while (i--) { + if (syncobjs[i]) + drm_syncobj_put(syncobjs[i]); + } + + kvfree(syncobjs); +} + +static int +virtio_gpu_parse_deps(struct virtio_gpu_submit *submit) +{ + struct drm_virtgpu_execbuffer *exbuf = submit->exbuf; + struct drm_virtgpu_execbuffer_syncobj syncobj_desc; + size_t syncobj_stride = exbuf->syncobj_stride; + struct drm_syncobj **syncobjs; + int ret = 0, i; + + if (!submit->num_in_syncobjs) + return 0; + + syncobjs = kvcalloc(submit->num_in_syncobjs, sizeof(*syncobjs), + GFP_KERNEL); + if (!syncobjs) + return -ENOMEM; + + for (i = 0; i < submit->num_in_syncobjs; i++) { + uint64_t address = exbuf->in_syncobjs + i * syncobj_stride; + struct dma_fence *fence; + + if (copy_from_user(&syncobj_desc, + u64_to_user_ptr(address), + min(syncobj_stride, sizeof(syncobj_desc)))) { + ret = -EFAULT; + break; + } + + if (syncobj_desc.flags & ~VIRTGPU_EXECBUF_SYNCOBJ_FLAGS) { + ret = -EINVAL; + break; + } + + ret = drm_syncobj_find_fence(submit->file, syncobj_desc.handle, + syncobj_desc.point, 0, &fence); + if (ret) + break; + + ret = virtio_gpu_dma_fence_wait(submit, fence); + + dma_fence_put(fence); + if (ret) + break; + + if (syncobj_desc.flags & VIRTGPU_EXECBUF_SYNCOBJ_RESET) { + syncobjs[i] = + drm_syncobj_find(submit->file, syncobj_desc.handle); + if (!syncobjs[i]) { + ret = -EINVAL; + break; + } + } + } + + if (ret) { + virtio_gpu_free_syncobjs(syncobjs, i); + return ret; + } + + submit->in_syncobjs = syncobjs; + + return ret; +} + +static void virtio_gpu_reset_syncobjs(struct drm_syncobj **syncobjs, + uint32_t nr_syncobjs) +{ + uint32_t i; + + for (i = 0; i < nr_syncobjs; i++) { + if (syncobjs[i]) + drm_syncobj_replace_fence(syncobjs[i], NULL); + } +} + +static void +virtio_gpu_free_post_deps(struct virtio_gpu_submit_post_dep *post_deps, + uint32_t nr_syncobjs) +{ + uint32_t i = nr_syncobjs; + + while (i--) { + kfree(post_deps[i].chain); + drm_syncobj_put(post_deps[i].syncobj); + } + + kvfree(post_deps); +} + +static int virtio_gpu_parse_post_deps(struct virtio_gpu_submit *submit) +{ + struct drm_virtgpu_execbuffer *exbuf = submit->exbuf; + struct drm_virtgpu_execbuffer_syncobj syncobj_desc; + struct virtio_gpu_submit_post_dep *post_deps; + size_t syncobj_stride = exbuf->syncobj_stride; + int ret = 0, i; + + if (!submit->num_out_syncobjs) + return 0; + + post_deps = kvcalloc(submit->num_out_syncobjs, sizeof(*post_deps), + GFP_KERNEL); + if (!post_deps) + return -ENOMEM; + + for (i = 0; i < submit->num_out_syncobjs; i++) { + uint64_t address = exbuf->out_syncobjs + i * syncobj_stride; + + if (copy_from_user(&syncobj_desc, + u64_to_user_ptr(address), + min(syncobj_stride, sizeof(syncobj_desc)))) { + ret = -EFAULT; + break; + } + + post_deps[i].point = syncobj_desc.point; + + if (syncobj_desc.flags) { + ret = -EINVAL; + break; + } + + if (syncobj_desc.point) { + post_deps[i].chain = dma_fence_chain_alloc(); + if (!post_deps[i].chain) { + ret = -ENOMEM; + break; + } + } + + post_deps[i].syncobj = + drm_syncobj_find(submit->file, syncobj_desc.handle); + if (!post_deps[i].syncobj) { + ret = -EINVAL; + break; + } + } + + if (ret) { + virtio_gpu_free_post_deps(post_deps, i); + return ret; + } + + submit->post_deps = post_deps; + + return 0; +} + +static void +virtio_gpu_process_post_deps(struct virtio_gpu_submit *submit) +{ + struct virtio_gpu_submit_post_dep *post_deps = submit->post_deps; + struct dma_fence *fence = &submit->out_fence->f; + uint32_t i; + + if (!post_deps) + return; + + for (i = 0; i < submit->num_out_syncobjs; i++) { + if (post_deps[i].chain) { + drm_syncobj_add_point(post_deps[i].syncobj, + post_deps[i].chain, + fence, post_deps[i].point); + post_deps[i].chain = NULL; + } else { + drm_syncobj_replace_fence(post_deps[i].syncobj, fence); + } + } +} + static int virtio_gpu_fence_event_create(struct drm_device *dev, struct drm_file *file, struct virtio_gpu_fence *fence, @@ -121,6 +317,18 @@ static int virtio_gpu_init_submit_buflist(struct virtio_gpu_submit *submit) static void virtio_gpu_cleanup_submit(struct virtio_gpu_submit *submit) { + if (submit->in_syncobjs) { + virtio_gpu_reset_syncobjs(submit->in_syncobjs, + submit->num_in_syncobjs); + + virtio_gpu_free_syncobjs(submit->in_syncobjs, + submit->num_in_syncobjs); + } + + if (submit->post_deps) + virtio_gpu_free_post_deps(submit->post_deps, + submit->num_out_syncobjs); + if (!IS_ERR(submit->buf)) kvfree(submit->buf); @@ -173,6 +381,8 @@ static int virtio_gpu_init_submit(struct virtio_gpu_submit *submit, return err; } + submit->num_out_syncobjs = exbuf->num_out_syncobjs; + submit->num_in_syncobjs = exbuf->num_in_syncobjs; submit->out_fence = out_fence; submit->fence_ctx = fence_ctx; submit->ring_idx = ring_idx; @@ -285,6 +495,14 @@ int virtio_gpu_execbuffer_ioctl(struct drm_device *dev, void *data, if (ret) goto cleanup; + ret = virtio_gpu_parse_deps(&submit); + if (ret) + goto cleanup; + + ret = virtio_gpu_parse_post_deps(&submit); + if (ret) + goto cleanup; + ret = virtio_gpu_install_out_fence_fd(&submit); if (ret) goto cleanup; @@ -294,6 +512,7 @@ int virtio_gpu_execbuffer_ioctl(struct drm_device *dev, void *data, goto cleanup; virtio_gpu_submit(&submit); + virtio_gpu_process_post_deps(&submit); virtio_gpu_complete_submit(&submit); cleanup: virtio_gpu_cleanup_submit(&submit); diff --git a/include/uapi/drm/virtgpu_drm.h b/include/uapi/drm/virtgpu_drm.h index 7b158fcb02b4..ce4948aacafd 100644 --- a/include/uapi/drm/virtgpu_drm.h +++ b/include/uapi/drm/virtgpu_drm.h @@ -64,6 +64,16 @@ struct drm_virtgpu_map { __u32 pad; }; +#define VIRTGPU_EXECBUF_SYNCOBJ_RESET 0x01 +#define VIRTGPU_EXECBUF_SYNCOBJ_FLAGS ( \ + VIRTGPU_EXECBUF_SYNCOBJ_RESET | \ + 0) +struct drm_virtgpu_execbuffer_syncobj { + __u32 handle; + __u32 flags; + __u64 point; +}; + /* fence_fd is modified on success if VIRTGPU_EXECBUF_FENCE_FD_OUT flag is set. */ struct drm_virtgpu_execbuffer { __u32 flags; @@ -73,7 +83,11 @@ struct drm_virtgpu_execbuffer { __u32 num_bo_handles; __s32 fence_fd; /* in/out fence fd (see VIRTGPU_EXECBUF_FENCE_FD_IN/OUT) */ __u32 ring_idx; /* command ring index (see VIRTGPU_EXECBUF_RING_IDX) */ - __u32 pad; + __u32 syncobj_stride; + __u32 num_in_syncobjs; + __u32 num_out_syncobjs; + __u64 in_syncobjs; + __u64 out_syncobjs; }; #define VIRTGPU_PARAM_3D_FEATURES 1 /* do we have 3D features in the hw */
Add sync object DRM UAPI support to VirtIO-GPU driver. It's required for enabling a full-featured Vulkan fencing by Venus and native context VirtIO-GPU Mesa drivers. Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> --- drivers/gpu/drm/virtio/virtgpu_drv.c | 3 +- drivers/gpu/drm/virtio/virtgpu_submit.c | 219 ++++++++++++++++++++++++ include/uapi/drm/virtgpu_drm.h | 16 +- 3 files changed, 236 insertions(+), 2 deletions(-)